Web commands

  • Download a web page and all its requisites:
    wget -r -p -nc -nH --level=1 http://pypi.python.org/simple/python-ldap/
    
  • Create a PNG image of a rendered HTML page:
    kwebdesktop 1024 768 capture.png http://slashdot.org/
    
  • Search all files for malformed HTML entities (in this case non-breaking spaces that don’t end with a semicolon):
    grep -RIi --extended-regexp '&nbsp[^;]' ./
    
  • Here is a one-liner I use to ping some pages on the Internet to force our corporate proxy to refresh its internal cache:
    for EGG in BeautifulSoup PIL Plone; do wget --server-response -O /dev/null http://pypi.python.org/simple/$EGG/; done
    
  • Create a minimal self-signed SSL certificate with an unencrypted private key, no issuer information and a validity period of 10 years (key and certificate end up in the same file):
    openssl req -x509 -nodes -subj '/' -days 3650 -newkey rsa:2048 -keyout self-signed.pem -out self-signed.pem
    
  • Create a self-signed SSL certificate and its (unencrypted) private key as two separate files (source):
    openssl genrsa -out private.key 2048
    openssl req -new -subj '/' -key private.key -out certreq.csr
    openssl x509 -req -days 3650 -in certreq.csr -signkey private.key -out self-signed.pem
    rm certreq.csr
    
  • View certificate details:
    openssl x509 -noout -text -in self-signed.pem
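
The malformed-entity search from the list above can be tried safely on a throwaway directory. The sketch below is only an illustration: the function name find_bad_nbsp and the sample files are made up, and the extended pattern, which also catches an &nbsp sitting at the very end of a line, is a suggested variation rather than the original command.

```shell
#!/bin/sh
# Hypothetical helper: list lines with "&nbsp" not followed by a semicolon.
# The "|$" alternative also catches "&nbsp" at the end of a line, which
# the pattern '&nbsp[^;]' misses because it requires a following character.
find_bad_nbsp() {
    grep -RIi --extended-regexp '&nbsp([^;]|$)' "$1"
}

# Throwaway sample data (made-up file names).
demo_dir=$(mktemp -d)
printf 'good&nbsp;entity\n' > "$demo_dir/good.html"
printf 'bad&nbspentity\n'   > "$demo_dir/bad.html"
printf 'trailing&nbsp\n'    > "$demo_dir/trailing.html"

# Prints the offending lines from bad.html and trailing.html only.
find_bad_nbsp "$demo_dir"
```

Once the output looks right on the sample data, the same function can be pointed at a real tree, for example find_bad_nbsp ./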
    

Bad FTP mirrors with fmirror or wget? Use lftp!

Today I found that my websites were not being backed up as expected. I was using fmirror (v0.8.4) to copy them from my hosting provider to my backup machine. Here is the command line I was using:

fmirror -kRS -u kevin -p pass -s ftp.website.com -r /html -l /mnt/removable/website_backup/current

But fmirror seems to ignore sub-directories below a certain depth. One possible cause is unusual file names (spaces, UTF-8 characters, etc.).
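
For anyone who wants to test that hypothesis, one rough approach is to list every path that contains something other than plain printable ASCII. This is a sketch under assumptions, not part of the original investigation: the function name list_odd_names is made up, and it needs a locally reachable copy of the tree (an older backup, for instance).

```shell
#!/bin/sh
# Hypothetical helper: print paths under "$1" that contain spaces,
# control characters, or non-ASCII bytes (e.g. UTF-8 encoded characters).
# LC_ALL=C makes grep compare raw bytes, so the range !-~ covers the
# printable ASCII characters other than the space. The -a flag (GNU and
# BSD grep) forces text mode even when odd bytes are present.
list_odd_names() {
    ( cd "$1" && find . -print | LC_ALL=C grep -a '[^!-~]' )
}

# Throwaway sample data (made-up file names).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt"
touch "$demo_dir/$(printf 'caf\303\251.txt')"

# Lists the two unusual names and leaves plain.txt out.
list_odd_names "$demo_dir"
```

Comparing that list with the directories missing from the mirror would show whether the unusual names line up with the gaps.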

Because I didn’t have the time to investigate further, I looked for an alternative. So I tried wget (v1.10) with the following command:

wget -r -nH -N --cut-dirs=1 -l0 -np --cache=off ftp://kevin:pass@ftp.website.com:21/html -o ../backup.log

This worked perfectly on small websites. But on my biggest one (hundreds of MB), wget aborted with the following error:

*** glibc detected *** double free or corruption (top): 0x08097750 ***

It seems to be a known limitation of wget: “Wget has got serious problems retrieving huge sites” (source: “Possible Alternatives to WGET”).

So I went back to basics by using the good old lftp, which is efficient and reliable. Here is the command:

lftp -c 'open -e "mirror -e . ./ " ftp://kevin:pass@ftp.website.com:21/html'
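
For a scheduled backup it may be convenient to wrap that command in a small script. The following is a minimal sketch under stated assumptions: the function names mirror_script and run_backup are made up, the URL, local directory and log path are simply the placeholders used earlier in this post, and --verbose is added to mirror only to get a per-file trace in the log.

```shell
#!/bin/sh
# Hypothetical wrapper around the lftp mirror command.
# All values below are placeholders reused from the examples in this post.
FTP_URL='ftp://kevin:pass@ftp.website.com:21/html'
LOCAL_DIR='/mnt/removable/website_backup/current'
LOG_FILE='../backup.log'   # relative to LOCAL_DIR, since we cd there first

# Build the lftp command string. "mirror -e" deletes local files that no
# longer exist on the server; ". ./" mirrors the remote directory opened
# by the URL into the current local directory.
mirror_script() {
    printf 'open -e "mirror -e --verbose . ./" %s' "$1"
}

# Run the mirror from inside the backup directory and append to the log.
run_backup() {
    cd "$LOCAL_DIR" || return 1
    lftp -c "$(mirror_script "$FTP_URL")" >> "$LOG_FILE" 2>&1
}

# Uncomment to run the backup for real:
# run_backup
```

Keeping the command construction in its own function makes it easy to print and inspect the exact lftp invocation before letting it delete anything locally.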