Tag Archive for 'HTML'

Python ultimate regular expression to catch HTML tags

1 year and 3 months ago I’ve came with a PHP regexp to parse HTML tag soup. Here is an improved version, in Python (my favorite language so far), that is normally much prone to detect strange HTML tags. It also support attributes without value so it’s closer to the HTML specification, but doesn’t strictly stick to it in order to catch tag soup and malformatted tags.

ultimate_regexp = "(?i)<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>"

And here is it applied in a trivial example (in a python shell):

>>> import re
>>>
>>> content = """This is the <strong>content</strong> in which we want to
<em>find</em> <a href="http://en.wikipedia.org/wiki/Html">HTML</a> tags."""
>>>
>>> ultimate_regexp = "(?i)<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>"
>>>
>>> for match in re.finditer(ultimate_regexp, content):
...   print repr(match.group())
...
'<strong>'
'</strong>'
'<em>'
'</em>'
'<a href="http://en.wikipedia.org/wiki/Html">'
'</a>'
>>>

How-to add a corner banner to a K2 Wordpress theme’s style

In this post I will give you all the technical details to create a corner banner for the wordpress K2 theme. This solution is uninstrusive as it can be bundled with a K2 style without modifying the K2 core theme.

We will use the new hooks from the brand new K2 1.0RC6. So first, we have to create a functions.php file in our style directory (example: /wp-content/themes/k2/styles/my-style). Then add the following PHP code in it:

<?php

// Add HTML code required by our corner banner
function add_corner_banner() {
  ?>
  <a id="cornerbanner" href="http://coolcavemen.com/news/new-website-beta-released/" title="New website released as beta version !"></a>
  <?php
}

// Call add_corner_banner() method on each template_body_top hook
add_action(‘template_body_top’, ‘add_corner_banner’);

?>

This code tell K2 to replace the template_body_top hook define in all K2 pages, by the result of the add_corner_banner() PHP function. This function is coded to return the HTML code we need for the corner banner.

Then we need to add the following CSS code to our style (/wp-content/themes/k2/styles/my-style/my-style.css):

#cornerbanner {
  background: url("/wp-content/themes/k2/styles/my-style/corner-banner.png") no-repeat;
  display: block;
  height: 205px;
  width: 205px;
  position: absolute;
  top: 0;
  right: 0;
  z-index: 999;
  text-decoration: none;
}

This CSS code refer to the corner-banner.png which is a 205×205 px PNG image with an alpha channel to simulate shadows and fine transparency. Here is the Gimp xcf source file I used to generate it.

This CSS code is designed for a top right banner. If you need a top left banner, replace:

  right: 0;

by

  left: 0;

This also work for horizontal positioning:

  top: 0;

can be replaced by

  bottom: 0;

That’s all ! My solution is not supposed to work (and was not tested) with Internet Explorer as the latter is known to have terrible PNG transparency support. You can still apply fixes on my code using iepngfix, jquery or PNG8 images.

I’ve provided you with all the technical details to create a corner banner and add it to your K2 style seamlessly. It’s now up to you to adapt it to your needs. Be Creative ! Oh, and by the way, when you’ll change the banner PNG file, do not forget to update the CSS code with your image width and height.

Update: my friend QPX sent me an alternative banner made with photoshop: here is the ready-to-use PNG file and the photoshop source file.

How-to inherit CSS width attributes for Internet Explorer

Let’s say you rely on a third party CSS framework that set the default layout of your content. The following CSS rule is part of the framework:

img {
  width: 100%;
}

This CSS directive tell all your images to use the full width available to them.

Now, for any reason (aesthetic, layout, etc.), you want to reset this behaviour.

One solution (the laziest) is to remove those three lines from the original CSS file. But if you’re like me, this sound too dirty for you as you don’t want to modify the original CSS file (I like to avoid patches on third party tools and libraries I don’t maintain).

Another solution consist in overriding this width attribute in another CSS file that you will call after the original one. This case is covered by the CSS 2.1 specification which define the inherit value:

img {
  width: inherit;
}

This solution is perfect and work as expected in Firefox. Unfortunately, and without any surprise, it doesn’t with Microsoft’s browser as IE has anecdotical support of CSS’s inherit.

But today I found a trick to fix this in both Firefox and Internet Explorer. The workaround is to use the auto value instead of inherit:

img {
  width: auto;
}

I’ve tested it with both Firefox 3.0rc1 and Internet Explorer 6.0.2800.1106CO.

Of course this solution is not generic: it only work in my case because img html tags has width attributes that support the auto value.

Ultimate Regular Expression for HTML tag parsing with PHP

Disclaimer: this is a dirty hack ! To parse HTML or XML, use a dedicated library.

Tonight I found the ultimate regex to get HTML tags out of a string. It was written a year ago by Phil Haack on his blog. His regex is quite bullet-proof: it’s able to parse HTML tags written on multiple lines which contain any sort of attributes (with or without a value, with single or double quotes).

Unfortunately his regular expression was designed for Microsoft .NET, so I’ve spend some time to convert it to PHP. Here is the result:

$regex = "/<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/i";

And finally, my version based on the one above:

$regex = "/<\/?\w+((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/i";

The latter include the following enhancement:

  • accept hyphens as attribute’s middle characters (thanks Ged)