Regular expression

  • Count the number of lines with at least one occurrence of the y character:
$ cat test.txt
asd  dd :; >
y YYYyy  yyy
 .

asdkjlyes
    kjkjhkjhy

$ grep -o '.*y.*' ./test.txt | wc -l
3
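
As a cross-check, grep can count matching lines itself with its standard -c option, which avoids the pipe to wc. The sketch below rebuilds the sample file in a temporary location; the helper name count_matching_lines is made up for illustration.

```shell
#!/bin/sh
# count_matching_lines PATTERN FILE (hypothetical helper name)
# Prints the number of lines in FILE that contain at least one match of PATTERN.
count_matching_lines() {
    grep -c -- "$1" "$2"
}

# Recreate the sample file shown above in a temporary location.
sample=$(mktemp)
printf '%s\n' 'asd  dd :; >' 'y YYYyy  yyy' ' .' '' 'asdkjlyes' '    kjkjhkjhy' > "$sample"

count_matching_lines 'y' "$sample"    # prints 3
rm -f "$sample"
```

Note that grep -c exits with a non-zero status when the count is 0, which matters in scripts running under set -e.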

Replace

  • Text replacement:
$ sed 's/string to replace/replacement string/g' original-file.txt > new-file.txt
  • Dynamic, in-place replacement of unreleased text with today’s date:
$ sed -i "s/unreleased/$(date +'%Y-%m-%d')/" ./changelog.md
  • Replace all occurrences of str1 with str2 in all files below the /folder path:
$ find /folder -type f -print -exec sed -i 's/str1/str2/g' "{}" \;
  • Same as above but ignore all content of .svn folders and .zip files:
$ find /folder -type f -not -regex ".*\/\.svn\/.*" -not -iname "*\.zip" -print -exec sed -i 's/str1/str2/g' "{}" \;
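
Because these recursive replacements rewrite files in place, it can help to keep a backup of each file. The following is a minimal sketch, not the method used above: it assumes a sed that accepts a suffix glued to -i (GNU and BSD sed both do), and the helper name replace_in_tree is hypothetical. It also ends -exec with + instead of \; so that find passes many files to each sed invocation.

```shell
#!/bin/sh
# replace_in_tree DIR OLD NEW (hypothetical helper name)
# Replaces OLD with NEW in every regular file below DIR, keeping a .bak copy
# of each file sed processes. OLD is a sed regex and NEW a sed replacement,
# so characters such as / & and \ must be escaped by the caller.
replace_in_tree() {
    find "$1" -type f ! -name '*.bak' \
        -exec sed -i.bak "s/$2/$3/g" {} +
}

# Demo on a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'str1 and str1\n' > "$demo/a.txt"
printf 'nothing here\n'  > "$demo/sub/b.txt"

replace_in_tree "$demo" str1 str2
cat "$demo/a.txt"        # str2 and str2
cat "$demo/a.txt.bak"    # str1 and str1 (the original content)
rm -rf "$demo"
```

The ! -name '*.bak' test keeps find from feeding freshly created backups back into sed.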
  • Remove trailing spaces and tabs from every XML file:
$ find /folder -iname "*.xml" -exec sed -i 's/[ \t]*$//' "{}" \;
  • Place a new --- line at the start of each .markdown file:
$ find ./folder -iname "*.markdown" -exec sed -i '1s/^/---\n/' "{}" \;
  • Place a new --- line before the first empty line of each .markdown file:
$ find ./folder -iname "*.markdown" -exec sed -i '0,/^$/s//---\n/' "{}" \;
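
The 0,/^$/ address in the command above is a GNU sed extension. On systems without it, the same "first empty line only" behaviour can be approximated with awk. This is a sketch under that assumption: the helper name insert_before_first_blank is hypothetical, and it prints to standard output rather than editing in place.

```shell
#!/bin/sh
# insert_before_first_blank FILE (hypothetical helper name)
# Prints FILE with a --- line inserted before its first empty line only.
insert_before_first_blank() {
    awk '!done && /^$/ { print "---"; done = 1 } { print }' "$1"
}

# Demo: only the first of the two empty lines gets a --- line before it.
sample=$(mktemp)
printf '%s\n' 'title: demo' '' 'First paragraph.' '' 'Second paragraph.' > "$sample"
insert_before_first_blank "$sample"
rm -f "$sample"
```

The done flag is what limits the insertion to the first match; later empty lines are printed unchanged.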
  • Remove lines starting with prefix1: or prefix2: in all .markdown files:
$ find /folder -iname "*.markdown" -exec perl -p -i -e 's/^(prefix1|prefix2): .*\n//sg' "{}" \;
  • Remove lines matching a regex (encoding particular markdown TOC entries), save the result in place and save a backup of the original content in a .bak file:
$ gawk -i inplace -v INPLACE_SUFFIX=.bak '!/^- \[(Contribute|Contributing|Licence|License)\]\(#.+\)$/{print}' ./readme.md
  • Use sed address ranges to find, in a Markdown file, all blocks that start with a ::: directive and end with a blank line, then replace the letter a with XXX inside each matched block. Notice that occurrences of a outside the blocks are left untouched:
$ cat ./example.md

This is a code block:

:::shell-session
→ apache
→ java
→ python

This is another block:

:::shell-session
→ rust
→ haskell
→ javascript

This is a random sentence.

$ sed "/^:::/,/^$/ s/a/XXX/g" ./example.md

This is a code block:

:::shell-session
→ XXXpXXXche
→ jXXXvXXX
→ python

This is another block:

:::shell-session
→ rust
→ hXXXskell
→ jXXXvXXXscript

This is a random sentence.
  • In the same spirit as above, but this time to find indented blocks starting with :::, then wrap them in triple-backtick fences:
$ cat ./example.md

This is a code block:

    :::shell-session
    → apache
    → java
    → python

This is another block:

    :::shell-session
    → rust
    → haskell
    → javascript

This is a random sentence.

$ find ./folder -iname "*.md" \
> -exec sed -i "/^    :::/,/^$/ s/^$/    \`\`\`\n/" "{}" \; \
> -exec sed -i "/^    :::/,/^$/ s/:::/\`\`\`/"      "{}" \;

$ cat ./example.md

This is a code block:

    ```shell-session
    → apache
    → java
    → python
    ```

This is another block:

    ```shell-session
    → rust
    → haskell
    → javascript
    ```

This is a random sentence.
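
The /start/,/end/ range address used in the two examples above restricts the command that follows it to the lines from a match of the first pattern through the next match of the second. Below is a small sketch for trying this out on a throwaway file; the helper name zero_in_blocks and the sample content are made up for illustration.

```shell
#!/bin/sh
# zero_in_blocks FILE (hypothetical helper name)
# Prints FILE with every "o" replaced by "0", but only inside blocks that
# start with a ::: line and end at the next empty line.
zero_in_blocks() {
    sed '/^:::/,/^$/ s/o/0/g' "$1"
}

sample=$(mktemp)
printf '%s\n' 'foo outside' ':::shell' 'foo inside' '' 'foo after' > "$sample"
zero_in_blocks "$sample"
# foo outside
# :::shell
# f00 inside
#
# foo after
rm -f "$sample"
```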
  • Strip in-place the block of text starting with XXX and ending with an empty line:
$ cat ./example.md

This is a code block:

XXX{shell-session}
→ apache
→ java
→ python

This is a random sentence.

$ perl -i -ne "print if not /XXX/ .. /^$/" ./example.md

$ cat ./example.md

This is a code block:

This is a random sentence.
  • Same as above, but with sed:
$ sed -i "/^XXX/,/^$/ d" ./example.md
  • Python one-liner to delete the first occurrence of a block of text delimited by triple-backtick fences. Unlike the methods above, this one is not thrown off by blank lines within the text block:
$ python -c 'import re; from pathlib import Path; file = Path("./example.md"); file.write_text(re.sub(r"^\`\`\`.*?\`\`\`\n\n", "", file.read_text(), count=1, flags=re.MULTILINE | re.DOTALL))'
  • Append the content of the addendum.txt file to all .markdown files:
$ find ./folder -iname "*.markdown" -print -exec bash -c 'cat ./addendum.txt >> "$1"' _ "{}" \;
  • Replace all accented characters with their unaccented variants (thanks Matthieu for the tip):
$ echo "éÈça-$" | iconv -t ASCII//translit

Date & Time

  • Get the date of last week:
$ date +"%Y-%m-%d" -d last-week
  • Get the current date in English:
$ env LC_TIME=en date +"%a %b %d %Y"
  • Get the number of seconds since epoch:
$ date +%s
  • Convert back epoch time to human-readable date:
$ date --date=@1234567890
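
The --date=@... form above is specific to GNU date. As a hedged sketch, the helper below (hypothetical name epoch_to_utc) first tries the GNU syntax and falls back to the -r SECONDS form that BSD and macOS date are assumed to accept. It prints in UTC so the result does not depend on the local time zone.

```shell
#!/bin/sh
# epoch_to_utc EPOCH (hypothetical helper name)
# Prints the UTC date and time for an epoch timestamp. GNU date takes the
# timestamp as -d @EPOCH; the -r EPOCH form is tried only if that fails.
epoch_to_utc() {
    date -u -d "@$1" +'%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -u -r "$1" +'%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 1234567890    # 2009-02-13 23:31:30
epoch_to_utc 0             # 1970-01-01 00:00:00
```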

Transcoding

  • In place charset transcoding:
$ recode utf-8..latin-1 utf8text.txt
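
Where recode is not installed, iconv is a common alternative. Unlike recode, it does not edit in place: it writes the converted text to standard output. The sketch below assumes an iconv that knows the UTF-8 and ISO-8859-1 charset names; the helper name transcode_copy is hypothetical.

```shell
#!/bin/sh
# transcode_copy FROM TO INPUT OUTPUT (hypothetical helper name)
# Writes a copy of INPUT, converted from charset FROM to charset TO, into
# OUTPUT. INPUT itself is left untouched.
transcode_copy() {
    iconv -f "$1" -t "$2" "$3" > "$4"
}

src=$(mktemp)
dst=$(mktemp)
printf '\303\251\n' > "$src"    # "é" encoded in UTF-8 (two bytes) plus a newline
transcode_copy UTF-8 ISO-8859-1 "$src" "$dst"
wc -c < "$src"                  # 3 bytes
wc -c < "$dst"                  # 2 bytes: é is a single byte in Latin-1
rm -f "$src" "$dst"
```

INPUT and OUTPUT must be different files, since the shell truncates OUTPUT before iconv starts reading.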

Edition

Additional References