UltraMega Blog
7 Apr 09

Creating a BBCode Parser

Have you ever wanted to implement BBCode, the special formatting codes used by forums, into your own PHP scripts? Well, it's actually pretty easy to accomplish using some simple regular expressions and the preg_replace PHP function. This mini-tutorial will show you how to create a function that you can use on any string to convert BBCode into its XHTML equivalent.

The advantage of using BBCode instead of allowing XHTML in user input is that users can safely format their content without the risk of invalid code breaking the page formatting. BBCode also tends to be easier to understand than XHTML because of its simplified syntax.

Anyway, here is how we'll do it. We're going to use the preg_replace function to convert the text using regex patterns. This function accepts either strings or arrays as patterns and replacements, which allows us to convert all types of BBCode in one call of the function. We just create an indexed array of regex patterns and a matching array of replacements, so that's what we'll do first:

// The array of regex patterns to look for
$format_search = array(
   '#\[b\](.*?)\[/b\]#is', // Bold ([b]text[/b])
   '#\[i\](.*?)\[/i\]#is', // Italics ([i]text[/i])
   '#\[u\](.*?)\[/u\]#is', // Underline ([u]text[/u])
   '#\[s\](.*?)\[/s\]#is', // Strikethrough ([s]text[/s])
   '#\[quote\](.*?)\[/quote\]#is', // Quote ([quote]text[/quote])
   '#\[code\](.*?)\[/code\]#is', // Monospaced code ([code]text[/code])
   '#\[size=([1-9]|1[0-9]|20)\](.*?)\[/size\]#is', // Font size 1-20px ([size=20]text[/size])
   '#\[color=\#?([A-F0-9]{3}|[A-F0-9]{6})\](.*?)\[/color\]#is', // Font color ([color=#00F]text[/color])
   '#\[url=((?:ftp|https?)://.*?)\](.*?)\[/url\]#i', // Hyperlink with descriptive text ([url=http://url]text[/url])
   '#\[url\]((?:ftp|https?)://.*?)\[/url\]#i', // Hyperlink ([url]http://url[/url])
   '#\[img\](https?://.*?\.(?:jpg|jpeg|gif|png|bmp))\[/img\]#i' // Image ([img]http://url_to_image[/img])
);
// The matching array of strings to replace matches with,
// in the same order as the patterns above
$format_replace = array(
   '<strong>$1</strong>',
   '<em>$1</em>',
   '<span style="text-decoration: underline;">$1</span>',
   '<span style="text-decoration: line-through;">$1</span>',
   '<blockquote>$1</blockquote>',
   '<pre>$1</pre>',
   '<span style="font-size: $1px;">$2</span>',
   '<span style="color: #$1;">$2</span>',
   '<a href="$1">$2</a>',
   '<a href="$1">$1</a>',
   '<img src="$1" alt="" />'
);

As you can see, $format_search is a list of regex patterns (read the comments to see what each one does) and $format_replace is a list of the XHTML versions, in the same order.

More about the regex used here:

  • We used '#' as the opening and closing delimiters instead of the usual '/' so we don't have to escape the '/' with a backslash
  • The characters after the closing '#' are modifiers: 'i' makes the match case-insensitive, and 's' lets '.' match newline characters, so a tag can span multiple lines
  • (.*?) matches any and all characters up to the first instance of the next part of the pattern (i.e. '(.*?)[/b]' matches everything until it finds '[/b]')
  • In the replacements, '$1' refers to the text captured by the first parenthesized part of the pattern, '$2' to the second, etc.
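To make the last few points concrete, here is a small sketch comparing the lazy (.*?) with a greedy (.*), and the same pattern with and without the 's' modifier. The variable names and sample strings are made up for illustration.

```php
<?php
$text = "[b]one[/b] and [b]two[/b]";

// Lazy (.*?) stops at the first closing tag, so each pair is converted on its own.
$lazy = preg_replace('#\[b\](.*?)\[/b\]#is', '<strong>$1</strong>', $text);
// $lazy is: <strong>one</strong> and <strong>two</strong>

// Greedy (.*) runs to the LAST closing tag and swallows the tags in between.
$greedy = preg_replace('#\[b\](.*)\[/b\]#is', '<strong>$1</strong>', $text);
// $greedy is: <strong>one[/b] and [b]two</strong>

// Without 's', '.' does not match a newline, so a tag spanning two lines is left alone.
$multiline = "[b]line one\nline two[/b]";
$without_s = preg_replace('#\[b\](.*?)\[/b\]#i', '<strong>$1</strong>', $multiline);
$with_s    = preg_replace('#\[b\](.*?)\[/b\]#is', '<strong>$1</strong>', $multiline);

echo $lazy, "\n", $greedy, "\n", $without_s, "\n", $with_s, "\n";
```

This is why every pattern in the list uses the lazy form: with the greedy one, two bold sections in the same post would be merged into one.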

All we have to do now is pass these arrays to the function:

$str = preg_replace($format_search, $format_replace, $str);
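If the array form of preg_replace is new to you, here is a cut-down sketch with just two patterns. Each pattern is paired with the replacement at the same index, and the patterns are applied one after another, each working on the result of the previous one. The sample string and the shortened array names are only for illustration.

```php
<?php
// Two patterns and their replacements, paired by position
$search = array(
    '#\[b\](.*?)\[/b\]#is',
    '#\[i\](.*?)\[/i\]#is',
);
$replace = array(
    '<strong>$1</strong>',
    '<em>$1</em>',
);

$str = 'Some [b]bold and [i]italic[/i][/b] text';

// First pass converts [b]...[/b], second pass converts [i]...[/i] in that result
$str = preg_replace($search, $replace, $str);

echo $str;
// Some <strong>bold and <em>italic</em></strong> text
```

Because the pairing is purely positional, the two arrays have to stay in the same order when you add or remove tags.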

Then we wrap it all in a function that accepts $str and returns the formatted string:

function bbcode_format($str){
   // Convert all special HTML characters into entities to display literally.
   // ENT_QUOTES also escapes single quotes; the charset is given explicitly
   // because older PHP versions do not default to UTF-8.
   $str = htmlentities($str, ENT_QUOTES, 'UTF-8');
   // The array of regex patterns to look for
   $format_search = array(
      '#\[b\](.*?)\[/b\]#is', // Bold ([b]text[/b])
      '#\[i\](.*?)\[/i\]#is', // Italics ([i]text[/i])
      '#\[u\](.*?)\[/u\]#is', // Underline ([u]text[/u])
      '#\[s\](.*?)\[/s\]#is', // Strikethrough ([s]text[/s])
      '#\[quote\](.*?)\[/quote\]#is', // Quote ([quote]text[/quote])
      '#\[code\](.*?)\[/code\]#is', // Monospaced code ([code]text[/code])
      '#\[size=([1-9]|1[0-9]|20)\](.*?)\[/size\]#is', // Font size 1-20px ([size=20]text[/size])
      '#\[color=\#?([A-F0-9]{3}|[A-F0-9]{6})\](.*?)\[/color\]#is', // Font color ([color=#00F]text[/color])
      '#\[url=((?:ftp|https?)://.*?)\](.*?)\[/url\]#i', // Hyperlink with descriptive text ([url=http://url]text[/url])
      '#\[url\]((?:ftp|https?)://.*?)\[/url\]#i', // Hyperlink ([url]http://url[/url])
      '#\[img\](https?://.*?\.(?:jpg|jpeg|gif|png|bmp))\[/img\]#i' // Image ([img]http://url_to_image[/img])
   );
   // The matching array of strings to replace matches with,
   // in the same order as the patterns above
   $format_replace = array(
      '<strong>$1</strong>',
      '<em>$1</em>',
      '<span style="text-decoration: underline;">$1</span>',
      '<span style="text-decoration: line-through;">$1</span>',
      '<blockquote>$1</blockquote>',
      '<pre>$1</pre>',
      '<span style="font-size: $1px;">$2</span>',
      '<span style="color: #$1;">$2</span>',
      '<a href="$1">$2</a>',
      '<a href="$1">$1</a>',
      '<img src="$1" alt="" />'
   );
   // Perform the actual conversion
   $str = preg_replace($format_search, $format_replace, $str);
   // Convert line breaks into <br /> tags
   $str = nl2br($str);
   return $str;
}

See an example of this function in action here
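If you'd rather try it locally, here is a rough sketch. To keep it runnable on its own it uses a shortened copy of the function with only two tags; the name bbcode_format_short and the sample input are made up for this example. It shows the three steps in order: raw HTML is escaped, BBCode is converted, and line breaks become <br /> tags. It also shows that a [url] tag whose content doesn't start with ftp://, http:// or https:// is simply left as plain text.

```php
<?php
// Shortened, hypothetical copy of bbcode_format() with only two tags,
// so that this example runs on its own.
function bbcode_format_short($str) {
    // Escape HTML first so user-supplied markup is displayed literally
    $str = htmlentities($str, ENT_QUOTES, 'UTF-8');
    $search = array(
        '#\[b\](.*?)\[/b\]#is',
        '#\[url\]((?:ftp|https?)://.*?)\[/url\]#i',
    );
    $replace = array(
        '<strong>$1</strong>',
        '<a href="$1">$1</a>',
    );
    $str = preg_replace($search, $replace, $str);
    // nl2br() inserts <br /> before each newline
    return nl2br($str);
}

$input  = "[b]Hello[/b] <script>alert(1)</script>\n[url]javascript:alert(1)[/url]";
$output = bbcode_format_short($input);

echo $output;
// <strong>Hello</strong> &lt;script&gt;alert(1)&lt;/script&gt;<br />
// [url]javascript:alert(1)[/url]
```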

This is just a simple implementation of regular expressions to find and replace special character sequences. You may want to improve the replacement format for things like quotes or code blocks to make them more useful. You can also add CSS classes to the replacements to make things look nice.
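As one possible direction, here is a sketch of what richer quote and code replacements could look like. The [quote=Name] syntax and the class names bb-quote and bb-code are assumptions made for this example, not something the function above supports, so adjust them to whatever your stylesheet defines.

```php
<?php
// Hypothetical extra patterns: an attributed quote plus class names for styling.
// The attributed form comes first so it is tried before the plain [quote].
$styled_search = array(
    '#\[quote=([\w ]{1,30})\](.*?)\[/quote\]#is', // [quote=Name]text[/quote] (assumed syntax)
    '#\[quote\](.*?)\[/quote\]#is',               // [quote]text[/quote]
    '#\[code\](.*?)\[/code\]#is',                 // [code]text[/code]
);
$styled_replace = array(
    '<blockquote class="bb-quote"><cite>$1 wrote:</cite> $2</blockquote>',
    '<blockquote class="bb-quote">$1</blockquote>',
    '<pre class="bb-code"><code>$1</code></pre>',
);

$quote = preg_replace($styled_search, $styled_replace, '[quote=Steve]Hello[/quote]');
$code  = preg_replace($styled_search, $styled_replace, '[code]echo 1;[/code]');

echo $quote, "\n", $code, "\n";
// <blockquote class="bb-quote"><cite>Steve wrote:</cite> Hello</blockquote>
// <pre class="bb-code"><code>echo 1;</code></pre>
```

The name in [quote=Name] is limited to letters, digits, underscores and spaces here, which keeps anything unexpected out of the generated markup.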

Posted by Steve

Comments (9) Trackbacks (1)
  1. Though it’s compact and cute, the url tag could be used for various exploits, the htmlentities() notwithstanding (using javascript: schemes, for example).

    Perhaps you ought to change the (.*?) to a regex that will match only proper urls.

  2. Do you recommend any BBCODE editor for the front end user?

  3. I don’t mean to be mean or anything, but this is a really bad BBCode parser.

    What about generating valid HTML? If I did [b]bold [i]italic[/b] still italic[/i] it would generate:

    <strong>bold <em>italic</strong> still italic</em>

    when it should be:
    <strong>bold <em>italic</em></strong> still italic[/i]

    • Yes, this is very basic and has much room for improvement. If you give it malformed BBCode, you’ll get malformed HTML. So obviously this shouldn’t be used on a public interface. This is really just meant to demonstrate the concept.

  4. Thanks!
    Is it possible to adapt this in order to create something like Tidypost?
    It was a form script used in 2003/2005.
    It generated code that you could post into the topic field.
    It had input fields like image url, album url, preview, genre, bitrate, etc.

    example:
    http://www.clubbingspain.com/phpBB/dj-sets-propios/applejux-you-won-t-like-this-applejux-mix-t13869.html

    thanks

