2014-03-11

I’ve been on a long journey trying to find a great code highlighter, and I’ve used so many of them that I can’t even remember them all. These are the ones I can remember right now:

SyntaxHighlighter

Google Prettifier

highlight.js

GeSHi

Right now I’m using highlight.js, but it isn’t exactly what I want. What I want is to be able to highlight most “words” or reserved words, such as built-in functions, objects, etc., which this highlighter and most of the others miss. I know it’s not an important thing, but it was stuck in my head, until now.

Finally, I’ve found Pygments, the one that perfectly matches what I’ve been looking for, and it’s the same one used by GitHub. The only obstacle right now is that it’s a Python-based syntax highlighter and I’m using WordPress, which is built on PHP.

Installation

But hey, we can get over it. There is a solution: first, we need to get Python installed on our server so we can use Pygments.

We aren’t going to go too deep into installation, because there are so many OS flavors out there and it could be slightly different on each one of them.

Python

First of all, check whether you already have Python installed by typing python on your command line.

If it is not installed, take a look at the Python Downloads page and download the installer for your OS.

PIP Installer

According to its site, there are two ways to install pip:

The first and recommended way is to download get-pip.py and run it from your command line with python get-pip.py.

The second way is to use a package manager, by running one of these two possible commands. As mentioned before, which one applies depends on your server OS.

Or:

NOTE: you can use any package manager you prefer, such as easy_install. For the sake of example, and because it is the one used on the Pygments site, I used pip.

Pygments

To install Pygments, run this command: pip install Pygments

If you are on a server where your user doesn’t have root access, you won’t be able to install it with the previous command. In that case, run it with the --user flag (pip install --user Pygments) to install the module in the user directory.

Everything is installed now, so what we have to do next is work with PHP and some Python code.

PHP + Python

The way it’s going to work is by executing a Python script from PHP using exec(), passing it the language name and the filename of the file containing the code to be highlighted.

Python

The first thing we are going to do is create the Python script that converts plain code into highlighted code using Pygments.

So let’s go step by step through how to create the Python script.

First we import all the required modules:

The sys module provides the argv list, which contains all the arguments passed to the Python script.

highlight from pygments is the main function; together with a lexer, it generates the highlighted code. You will read a bit more about lexers below.

HtmlFormatter determines how the generated code is formatted, and we are going to use the HTML format. Here is a list of available formatters in case you are wondering.

This block of code takes the second argument (sys.argv[1]) and transforms it to lowercase, just to make sure it is always lowercase, because "php" !== "PHP". The third argument, sys.argv[2], is the path of the file containing the code, so we open it, read its contents, and close it. The first argument (sys.argv[0]) is the Python script’s name.
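The argument handling just described might look like the following minimal sketch. The helper name read_arguments is hypothetical (it is not taken from the original build.py), and it accepts an argv-style list so it can be tried without a command line.

```python
def read_arguments(argv):
    """Return (language, code) from an argv-style list.

    argv[0] is the script name, argv[1] the language name and
    argv[2] the path of the file that holds the code.
    """
    # Lowercase the language so "PHP" and "php" are treated the same.
    language = argv[1].lower()
    # The with-block opens the file, reads it, and closes it again.
    with open(argv[2], "r", encoding="utf-8") as handle:
        code = handle.read()
    return language, code
```

Inside the script itself this would be called as read_arguments(sys.argv) after importing sys.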

Now it’s time to import the lexer. This block of code creates a lexer depending on the language we need to analyze. A lexer analyzes our code and picks out each reserved word, symbol, built-in function, and so forth.

In this case, after the lexer analyzes the code, the result is formatted as HTML, wrapping each “word” in an HTML element with a class. By the way, the class names are not descriptive at all, so a function does not get the class “function”, but this is not something to worry about right now.

The variable language contains the name of the language of the code we want to convert. We use lexer = get_lexer_by_name( language ) to get a lexer by its name; the function is self-explanatory. But why do we check for php and guess first, you may ask? We check for php because if we use get_lexer_by_name('php') and the PHP code does not have the opening tag <?php, the code is not highlighted well, or at least not as we expect. Instead we create the specific PHP lexer with lexer = PhpLexer(startinline=True); passing startinline=True as a parameter means the opening PHP tag is no longer required. guess is a string we pass from PHP to let Pygments know that we don’t know which language it is, or that the language was not provided, and we need it to be guessed.
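As a rough sketch of that decision, assuming Pygments is installed: the function name pick_lexer is hypothetical, while PhpLexer, guess_lexer and get_lexer_by_name are the Pygments functions named in the text.

```python
def pick_lexer(language, code):
    """Return a Pygments lexer for `language`, guessing when asked to."""
    # Imported inside the function so this sketch can be loaded even
    # on a machine where Pygments is not installed yet.
    from pygments.lexers import PhpLexer, get_lexer_by_name, guess_lexer

    if language == "php":
        # startinline=True lets PHP code without "<?php" be highlighted.
        return PhpLexer(startinline=True)
    if language == "guess":
        # Let Pygments inspect the code and pick the most likely lexer.
        return guess_lexer(code)
    # Raises pygments.util.ClassNotFound for unknown language names.
    return get_lexer_by_name(language)
```

A real script would probably import these names once at the top of the file; the local import here only keeps the sketch loadable on its own.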

There is a list of available lexers on their site.

The final step in Python is creating the HTML formatter, performing the highlighting, and outputting the HTML containing the highlighted code.

The formatter is given linenos=False so it does not generate line numbers, and nowrap=True so it does not wrap the generated code in a div. This is a personal decision; the code will be wrapped using PHP.

Next, highlight is given code, containing the actual code; lexer, containing the language lexer; and the formatter we created on the line above, which tells highlight how we want our code formatted.

Finally, the highlighted code is output.

That’s about it for Python; that’s the script that is going to build the highlighting.
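Put together, the formatting step could be sketched like this, assuming Pygments is installed. The wrapper name to_html is hypothetical, and the lexer is fixed to PHP only to keep the example short.

```python
def to_html(code):
    """Highlight PHP `code` and return the bare highlighted HTML."""
    # Local imports keep the sketch loadable without Pygments.
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import PhpLexer

    lexer = PhpLexer(startinline=True)
    # linenos=False: no line numbers; nowrap=True: no wrapping element.
    formatter = HtmlFormatter(linenos=False, nowrap=True)
    # highlight() runs the lexer over the code and formats the tokens.
    return highlight(code, lexer, formatter)
```

Calling print(to_html('echo "hi";')) writes the span-wrapped tokens to standard output, which is what the PHP side reads.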

Here is the complete file: build.py

PHP – WordPress

Let’s jump to WordPress and create a basic plugin to handle the code that needs to be highlighted.

It doesn’t matter if you have never created a WordPress plugin in your entire life. This plugin is just a file with PHP functions in it, so you will be fine without WordPress plugin development knowledge, though you do need some knowledge of WordPress development.

Create a folder inside wp-content/plugins named wp-pygments (it can be whatever you want), copy build.py, the Python script we just created, into it, and create a new PHP file named wp-pygments.php (ideally the same name as the directory).

The code below just lets WordPress know the plugin’s name and other information. This code goes at the top of wp-pygments.php.

Add a filter on the_content to look for <pre> tags. The expected markup is:

NOTE: HTML tags need to be encoded; for example, < needs to be &lt; so the parser doesn’t get confused and get it all wrong.

Here class is the language of the code inside the pre tags. If there is no class, or it is empty, guess is passed to build.py.

The preg_replace_callback function executes the mb_pygments_convert_code callback every time there is a match in the content for the provided regex pattern: /<pre(\s?class\="(.*?)")?[^>]?.*?>.*?<code>(.*?)<\/code>.*?<\/pre>/sim. It should match any <pre><code> in a post’s or page’s content.

What about sim? These are three pattern modifier flags. From php.net:

s: If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines.

i: If this modifier is set, letters in the pattern match both upper and lower case letters.

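m: By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). When this modifier is set, the start-of-line (^) and end-of-line ($) constructs also match right after and right before each newline.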
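To see what the modifiers change, here is a small illustration using Python’s re module, whose re.S, re.I and re.M flags correspond to the PCRE s, i and m modifiers. The sample string and the shortened pattern are made up for the demonstration.

```python
import re

html = "<PRE><CODE>line one\nline two</CODE></PRE>"
pattern = r"<code>(.*?)</code>"

# No flags: "<code>" does not match "<CODE>", and "." stops at "\n".
no_flags = re.search(pattern, html)

# re.I ignores case and re.S lets "." cross newlines. re.M only changes
# how ^ and $ behave, so it has no effect on this pattern.
with_flags = re.search(pattern, html, re.S | re.I | re.M)

print(no_flags)             # None
print(with_flags.group(1))  # both lines of code, newline included
```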

This can be done with DOMDocument() as well. Replace /<pre(\s?class\="(.*?)")?[^>]?.*?>.*?<code>(.*?)<\/code>.*?<\/pre>/sim with this:

The code below is from the mb_pygments_convert_code function.

Reviewing the code above:

It defines a constant holding the absolute path of the plugin’s directory.

$pygments_build is the full path where the Python script is located. Every time there is a match, an array called $matches containing 4 elements is passed to the callback. Take this as an example of matched code from a post’s or page’s content:

The element at position [0] is the whole <pre> match, and its value is:

The element at position [1] is the class attribute name with its value, and its value is:

The element at position [2] is the class attribute value without its name, and its value is:

The element at position [3] is the code itself without its pre tags, and its value is:
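The four positions can be illustrated with Python’s re module. This is only a sketch: the sample markup is invented, and the pattern is written under the assumption that it is closed by </code> and </pre>.

```python
import re

# Python spelling of the pattern; the closing </code> and </pre>
# parts are an assumption about the intended expression.
PRE_CODE = re.compile(
    r'<pre(\s?class="(.*?)")?[^>]?.*?>.*?<code>(.*?)</code>.*?</pre>',
    re.S | re.I | re.M,
)

sample = '<pre class="php"><code>&lt;?php echo 1; ?&gt;</code></pre>'
match = PRE_CODE.search(sample)

# Groups 0..3 line up with $matches[0]..$matches[3] in PHP.
whole, class_attr, language, code = (match.group(i) for i in range(4))

print(whole == sample)   # True
print(repr(class_attr))  # ' class="php"'
print(language)          # php
print(code)              # &lt;?php echo 1; ?&gt;
```

When the pre tag has no class attribute, groups 1 and 2 come back empty (None in Python), which is the case where guess is passed instead.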

It creates a temporary file containing the code that will be passed to the Python script. This is a better way to hand over the code; passing the whole thing as a parameter would be a mess.

It writes the code to the file, but first we decode all the HTML entities so Pygments can convert them properly.

It builds the Python command to be used; the resulting command is:

It executes the command just created; if it returns 0, everything worked fine in the Python script. exec() collects the lines output by the Python script into an array, so we join that array into one string to be the source code. If it does not return 0, we stick with the code without highlighting.
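The same flow (decode, write a temporary file, run the script, check the exit status, fall back) can be sketched in Python for illustration. This is not the plugin’s PHP code: the function name is hypothetical and subprocess.run stands in for PHP’s exec().

```python
import html
import os
import subprocess
import sys
import tempfile


def highlight_with_script(script_path, language, encoded_code):
    """Run `script_path language tmpfile` and return what it prints.

    Falls back to the original, unhighlighted code when the script
    exits with a non-zero status.
    """
    # Decode entities such as &lt; so the highlighter sees real code.
    code = html.unescape(encoded_code)

    # Hand the code over in a temporary file instead of as an argument.
    fd, tmp_path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(code)
        result = subprocess.run(
            [sys.executable, script_path, language, tmp_path],
            capture_output=True,
            text=True,
        )
    finally:
        # Always remove the temporary file, even if the run fails.
        os.unlink(tmp_path)

    # Exit status 0 means the script succeeded.
    if result.returncode == 0:
        return result.stdout
    return encoded_code
```

Passing the arguments as a list avoids shell quoting problems, which is the same concern that motivates the temporary file.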

Improving it by Caching

So far it works fine, but we have to save time and processing. Imagine 100 <pre> tags in a post’s content: it would create 100 files and call the Python script 100 times, so let's cache this baby.

Transient API

WordPress provides the ability to store data in the database temporarily with the Transient API.

First, let's add an action to the save_post hook, so every time the post is saved we convert the code and cache it.

If it is a revision we don't do anything; otherwise we get the post content and call the Pygments content filter function.

Let's create some functions to handle the cache.

At the beginning of mb_pygments_content_filter(), add some lines to check whether there is a cached version for the post.

And at the end of mb_pygments_content_filter(), add a line to save the post cache.

Finally, when the plugin is uninstalled we need to remove all the cache entries we created. This is a bit tricky, so we use the $wpdb object to delete them all with a query.
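The check-at-the-start, save-at-the-end pattern can be sketched independently of WordPress. Everything here is hypothetical: TransientCache is a tiny in-memory stand-in for a store with expiry, and the key format and one-hour lifetime are assumptions, not values from the plugin.

```python
import time


class TransientCache:
    """In-memory stand-in for a key/value store whose entries expire."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries are dropped and treated as missing.
            del self._items[key]
            return None
        return value


def filter_content(post_id, content, cache, convert):
    key = "post_%d_content" % post_id  # hypothetical key format
    # Beginning of the filter: reuse the cached result if there is one.
    cached = cache.get(key)
    if cached is not None:
        return cached
    # Otherwise do the expensive conversion...
    highlighted = convert(content)
    # ...and, at the end of the filter, store the result (assumed 1 hour).
    cache.set(key, highlighted, ttl_seconds=3600)
    return highlighted
```

With this shape, the expensive convert step runs once per post until the entry expires or is replaced.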

Read the full article at: Pygments on PHP & WordPress




