<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>Kevin Deldycke &#187; PCRE</title> <atom:link href="http://kevin.deldycke.com/tag/perl-compatible-regular-expressions/feed/" rel="self" type="application/rss+xml" /><link>http://kevin.deldycke.com</link> <description>Free software engineer &#38; wannabe videomaker</description> <lastBuildDate>Fri, 03 Feb 2012 19:08:27 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>Ultimate Regular Expression for HTML tag parsing with PHP</title><link>http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/</link> <comments>http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/#comments</comments> <pubDate>Fri, 23 Mar 2007 22:27:09 +0000</pubDate> <dc:creator>Kev</dc:creator> <category><![CDATA[English]]></category> <category><![CDATA[HTML]]></category> <category><![CDATA[parsing]]></category> <category><![CDATA[PCRE]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[regexp]]></category><guid isPermaLink="false">http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/</guid> <description><![CDATA[Disclaimer: this is a dirty hack ! To parse HTML or XML, use a dedicated library. Tonight I found the ultimate regex to get HTML tags out of a string. 
It was written a year ago by Phil Haack on &#8230; <a href="http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p><em><strong>Disclaimer</strong>: this is a dirty hack! To parse HTML or XML, <a href="#comment-4740">use a dedicated library</a>.</em></p><p>Tonight I found the ultimate <a href="http://en.wikipedia.org/wiki/Regular_expression">regex</a> to get HTML tags out of a string. It was <a href="http://haacked.com/archive/2005/04/22/Matching_HTML_With_Regex.aspx">written a year ago by Phil Haack on his blog</a>. His regex is quite bullet-proof: it can match HTML tags that span multiple lines and carry any sort of attribute (with or without a value, in single or double quotes).</p><p>Unfortunately, his regular expression was designed for Microsoft .NET, so I&#8217;ve spent some time converting it to PHP. Here is the result:</p><pre class="brush: php; title: ; notranslate">
$regex = &quot;/&lt;\/?\w+((\s+\w+(\s*=\s*(?:\&quot;.*?\&quot;|'.*?'|[^'\&quot;&gt;\s]+))?)+\s*|\s*)\/?&gt;/i&quot;;
</pre><p>And finally, my version based on the one above:</p><pre class="brush: php; title: ; notranslate">
$regex = &quot;/&lt;\/?\w+((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\&quot;.*?\&quot;|'.*?'|[^'\&quot;&gt;\s]+))?)+\s*|\s*)\/?&gt;/i&quot;;
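
// Usage sketch, not part of the original post: the sample $html string below
// is made up for illustration. It collects every tag that $regex finds in a
// string using PHP's preg_match_all().
$html = '&lt;p class=&quot;intro&quot;&gt;Hello&lt;br/&gt;&lt;a data-id=5 href=&quot;x.html&quot;&gt;link&lt;/a&gt;&lt;/p&gt;';
preg_match_all($regex, $html, $matches);
// $matches[0] holds the full text of each matched tag, in document order.
foreach ($matches[0] as $tag) {
    echo $tag, &quot;\n&quot;;
}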
</pre><p>The latter includes the following enhancement:</p><ul><li>it accepts hyphens in the middle of attribute names (<a href="http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/#comment-3167">thanks Ged</a>)</li></ul> ]]></content:encoded> <wfw:commentRss>http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/feed/</wfw:commentRss> <slash:comments>31</slash:comments> </item> </channel> </rss>
