Regular expression, MIME type, Media Type, proxy, SOCKS, Tor, curl, yt-dlp

Scraping

  • Download a web page and all its requisites:
$ wget -r -p -nc -nH --level=1 https://pypi.python.org/simple/python-ldap/
  • Check that the local SOCKS proxy started by Tor Browser is working:
$ curl --preproxy 127.0.0.1:9150 "https://check.torproject.org"
  • Reuse the local Tor Browser proxy to download a video:
$ yt-dlp --proxy socks5://127.0.0.1:9150 "https://www.video-provider.com/watch/random_id"
  • Create a PNG image of a rendered HTML page:
$ kwebdesktop 1024 768 capture.png https://slashdot.org/

Servers

  • Test that your site is sending gzipped content:
$ curl -i -H "Accept-Encoding: gzip,deflate" https://kevin.deldycke.com 2>&1 | grep gzip
  • Request a few pages from the internet to force a corporate proxy to refresh its internal cache:
$ for EGG in BeautifulSoup PIL Plone; do wget --server-response -O /dev/null https://pypi.python.org/simple/$EGG/; done
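  • Debug mysterious numbers by decoding them as ASCII bytes; here 1195725856 turns out to be the start of an HTTP request, "GET " (source):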
$ echo 'obase=16; 1195725856' | bc | xxd -r -ps | od -cb
0000000   G   E   T
        107 105 124 040
0000004
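
The same decoding can be sketched in plain POSIX shell, for machines where bc or xxd are not installed. The decode_int function name is made up for this example; it assumes a non-negative integer argument and peels off one byte at a time, least significant byte first.

```shell
# decode_int is a hypothetical helper, not a standard tool.
# It prints the bytes of a non-negative integer, most significant byte first.
# Note: n, out and byte are plain global variables (POSIX sh has no "local").
decode_int() {
    n=$1
    out=""
    while [ "$n" -gt 0 ]; do
        # Lowest byte of the remaining value.
        byte=$(( n % 256 ))
        # Prepend it as a 3-digit octal escape such as \107.
        out="$(printf '\\%03o' "$byte")$out"
        n=$(( n / 256 ))
    done
    # The escapes are deliberately used as the printf format string,
    # so that printf turns them back into raw bytes.
    printf "$out"
}

# Usage: decode_int 1195725856 | od -c
```

Octal escapes are used because they are the only byte escapes POSIX guarantees in a printf format string.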

Certificates

  • Create a minimal self-signed SSL certificate with an unencrypted private key, no issuer information and a validity period of 10 years:
$ openssl req -x509 -nodes -subj '/' -days 3650 -newkey rsa:2048 -keyout self-signed.pem -out self-signed.pem
  • Create a self-signed SSL certificate and its (unencrypted) private key as two separate files (source):
$ openssl genrsa -out private.key 2048
$ openssl req -new -subj '/' -key private.key -out certreq.csr
$ openssl x509 -req -days 3650 -in certreq.csr -signkey private.key -out self-signed.pem
$ rm certreq.csr
  • View certificate details:
$ openssl x509 -noout -text -in self-signed.pem
  • Fetch the first certificate of the chain presented by a server:
$ openssl s_client -connect imap.gmail.com:993 -showcerts 2>&1 < /dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | sed -ne '1,/-END CERTIFICATE-/p' > ~/gmail.pem
  • Fetch the last certificate of the chain presented by a server:
$ openssl s_client -connect imap.gmail.com:993 -showcerts 2>&1 < /dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | tac | sed -ne '1,/-BEGIN CERTIFICATE-/p' | tac > ./google.pem
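
The two commands above each keep only one end of the chain. As a rough sketch, the whole chain can also be split into one file per certificate with awk. The split_chain helper and the output file prefix are names made up for this example; they are not part of openssl.

```shell
# split_chain is a hypothetical helper: it reads text on stdin and writes
# each PEM certificate block to <prefix>1.pem, <prefix>2.pem, ... in order.
# Anything outside BEGIN/END CERTIFICATE markers is ignored.
split_chain() {
    awk -v prefix="$1" '
        # A new certificate starts: pick the next output file name.
        /-BEGIN CERTIFICATE-/ { n++; out = sprintf("%s%d.pem", prefix, n) }
        # Copy lines only while inside a certificate block.
        out != ""             { print > out }
        # The certificate ends: close the file and stop copying.
        /-END CERTIFICATE-/   { close(out); out = "" }
    '
}

# Usage (same s_client invocation as above, prefix chosen arbitrarily):
#   openssl s_client -connect imap.gmail.com:993 -showcerts 2>&1 < /dev/null | split_chain ./gmail-
```

Each resulting file can then be inspected with the "View certificate details" command.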

MIME type

  • List the distinct MIME types of all files under a directory:
$ find ./www -type f -exec file --mime-type -b "{}" \; | sort | uniq

Markup

  • Search for non-breaking space entities that are missing their trailing semicolon:
$ grep -RIi --extended-regexp '&nbsp[^;]' ./
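
One caveat with this pattern: [^;] has to consume a character, so an &nbsp sitting at the very end of a line is not reported. Below is a small sketch of a variant that also matches at end of line, run against a throwaway sample file. The file name and its content are made up for the demonstration, and the grep options assume GNU or BSD grep, like the command above.

```shell
# Throwaway sample file with one valid entity and two broken ones.
demo=$(mktemp -d)
printf '%s\n' 'ok: a&nbsp;b' 'bad: a&nbspb' 'bad at end of line: a&nbsp' > "$demo/sample.html"

# Original pattern: requires one more character after the entity,
# so the entity at end of line goes unnoticed.
original=$(grep -RIih --extended-regexp '&nbsp[^;]' "$demo")

# Variant: accept either a non-semicolon character or the end of the line.
extended=$(grep -RIih --extended-regexp '&nbsp([^;]|$)' "$demo")

printf 'original pattern:\n%s\n\nextended pattern:\n%s\n' "$original" "$extended"
rm -rf "$demo"
```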