Tonight I found the ultimate regex to get HTML tags out of a string. It was written a year ago by Phil Haack on his blog . His regex is quite bullet-proof: it’s able to parse HTML tags written on multiple lines which contain any sort of attributes (with or without a value, with single or double quotes).

Unfortunately his regular expression was designed for Microsoft . NET , so I’ve spend some time to convert it to PHP . Here is the result:

$regex = "/<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/i";

And finally, my version based on the one above:

$regex = "/<\/?\w+((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/i";

The latter include the following enhancement:

  • accept hyphens as attribute‚Äôs middle characters ( thanks Ged )

Related content