Over the winter vacation I relaxed and spent time away from thinking about the startup fundraising process. I had a latent interest in Jekyll ever since Tom Preston-Werner (Cofounder of GitHub) created, released and wrote about it in Blogging Like a Hacker. I finally got the chance to sit down and fully grok Jekyll. I decided to migrate my site from WordPress to Jekyll. I designed a new layout, imported my database and created some custom features to get it all to my liking. If you're reading this in an RSS reader, you will want to click through to see the new site.


Yup, still love my Kindle.

(Disclaimer: At over 8,000 words this is the longest post I have written (runner up). If you have any questions or if anything in here is presented in a confusing manner, or is just plain wrong, please don't hesitate to leave a comment or send me an email. I will clean up my Jekyll blog repo soon and make it public.)

Why the big move?

Like any other hacker I just wanted to learn a new tool.

I have been running this blog on WordPress for about 5.5 years. I first ran into WordPress when I was running a MediaWiki-powered website about computer modding and was curious about other CMSs. I fell in love with WordPress and set it up on my 1.42GHz G4 Mac Mini (which I had overclocked to a whopping 1.5GHz by unsoldering two SMT resistors). My first few months of blogging were surreal — several of my posts made it on digg and Lifehacker in 2005. I thought that was the coolest thing ever. I continued writing.

I soon received enough traffic to kill the 2.5-inch hard drive in my Mac Mini, which had been hosting this blog in my Georgia Tech dorm. It was around this time that I began modifying my site and picking up basic web design and development skills. I moved my blog over to Media Temple and have been happy there since. I am now on a developer-aimed VPS called the ProDev (ve) — 4GB of RAM on a dual quad-core 2.26GHz Xeon with lucid-flavored Ubuntu.

I am not leaving WordPress for any frustrations or problems. WordPress is a very capable and extensible CMS. Most anything you want done with WordPress is a search away. If you think your WP-powered site is slow, there are a number of fixes for the speed freaks, including generating static files and working with a CDN. The WordPress community has been amazingly helpful. I went to WordCamps, sponsored one and helped out other WP users through tutorials and forums. A big thank you to folks like Matt Mullenweg, Michael Heilemann and Mark Jaquith.

Content is king

I wanted to start from scratch and completely rebuild this blog. This time around I was going to focus on content. Over the last few years I have experimented with monetizing this site, as you may have noticed. I tried various affiliate programs, CPA ads, AdSense, Amazon Associates, RSS ads and private sales. It was starting to work. I wasn't the next John Chow but I was making $3,000 a month during this blog's financial peak around mid-2009.

Then it went down to a few hundred a month over the next year and a half. There are a few reasons for that:

I began posting less and less due to increasing startup obligations. That's to be expected. This blog is a great side-project and hobby but not a full-time job.

This blog had become known for in-depth, long-form content and as such I didn't blog unless I knew I had enough substance to make for an interesting article. Things I would have written about, but to a lesser extent, would go unpublished entirely.

I made a few SEO mistakes that knocked my PR, and thus traffic, down. The first was using an early version of bbPress to run forums here. That version did not add rel="nofollow" to links on user profile pages. Tens of thousands of spammers signed up and those profile pages linked to all sorts of sites. Google did not like that. My second mistake was running a translation plugin that generated every article in many languages. It eventually got to the point where Russian-translated posts would rank higher than the English posts and it became impossible for readers to find articles. I couldn't even find my own posts on Google, even when I knew the exact title. Third, I had some mysterious redirection issue that plagued my site for too long. Pages would randomly redirect. I couldn't reproduce it, and neither could Media Temple, but I got tons of reports about it from my readers. I thought I had fixed it but it still occurred once every 50 or so page loads. Google ended up indexing articles with different URLs. It was a mess.

Google noted that my site loaded 88% slower than other sites. It was partially all the ads my site was running and partially all the images I put in my long posts.

Traffic also dwindled during this period. It was time to reboot. Whatever I was going to do needed to change all this and get me blogging more, one way or another. I fixed that with bits. More on that later.

What is Jekyll?

Oops, guess I didn't explain exactly what it is yet. Here's what the repository says:

Jekyll is a simple, blog aware, static site generator. It takes a template directory (representing the raw form of a website), runs it through Textile or Markdown and Liquid converters, and spits out a complete, static website suitable for serving with Apache or your favorite web server.

Jekyll is not really a CMS. There is no admin panel to edit, write and manage posts. But there is vim, emacs, TextMate, gEdit, Redcar, Notepad++ or your text editor of choice. In a nutshell: write your posts in markdown or textile (or in my case just keep using HTML like I've always been using in my posts and keep it future-proof), run jekyll and it will create a site directory filled with beautiful static, flat HTML files. You don't even need a database on your server...

I freaking love having my entire site in static files. It's a nerd feeling that's hard to explain. As a Mac user, I just need to activate Spotlight with two keystrokes and I can instantly find any old blog post.

You'll want to tell Spotlight to ignore the _site directory. Seductive wallpaper by Silk.

Or while I'm in my editor of choice I can activate PeepOpen to find and open any post. Or I can run Ack in Project in TextMate to find any string inside of my posts. Or maybe I was curious and wanted to list the top 5 posts by word count?
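As an illustration of that last question, here is a minimal Ruby sketch (not the original snippet from this post). It assumes posts live as plain files in a directory such as _posts, and it counts every whitespace-separated token, so YAML front matter and HTML tags inflate the numbers slightly. The method name is made up for the example.

```ruby
# Return the n longest posts in a directory as [filename, word_count] pairs,
# longest first. Counts whitespace-separated tokens, markup included.
def top_posts_by_word_count(posts_dir, n = 5)
  files = Dir.glob(File.join(posts_dir, "*")).select { |path| File.file?(path) }
  counts = files.map { |path| [File.basename(path), File.read(path).split.size] }
  counts.sort_by { |_name, words| -words }.first(n)
end
```

Calling top_posts_by_word_count("_posts") from the site root and printing each pair gives a quick leaderboard.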

Flat files are just cool.

The Plan

After I was sure that I wanted to embark on this journey I had to think about how this would all work and what sacrifices I would have to make. I would need to implement some custom stuff to get some features and pages I was used to with WordPress. It was also important that I kept the exact same URL structure.

Here was the initial list of tasks that had to be completed/built:

Import WordPress database and retain tags

Move all images to Amazon CloudFront and rewrite posts to use new image URL

Make HTML files for regular pages and 404/500

Make a search page using Google Custom Search

Be able to parse <!--more-->'s used in WP posts and show post up to that tag for previews.

Create entire new layout and use Typekit because it makes the wannabe-designer in me happy.

Figure out layouts and includes for various parts of the site

Tags and individual tag pages + fix Jekyll issue with not supporting tags with spaces

Archives listed by month/year and individual archive pages to keep URLs like /2011/01

Sitemap that lists posts, archive pages and tag pages

Create a feed template and ensure it correctly redirects to FeedBurner

Create include in feed so I can put RSS ads at some point

Compass for Sass

List related posts

Do lots of .htaccess work to make sure URLs are as close to the old structure as possible and use link rel="canonical" where appropriate.

Ditch Mint for web stats and go database-free with something like Chartbeat or Reinvigorate (I also have Google Analytics)

Migrate all comments to Disqus

Get next/previous post links working

Create new section of the site for short-form content, create separate archives page and feed

Be able to put custom meta descriptions from content in YAML front matter in posts if wanted.

Write a rakefile to ease some routine tasks like generation and deploy

miscellany...

A lot of work was ahead.

Getting started with Jekyll

The first thing I did was create a new GitHub repository for the blog. Then I had to begin creating the file and directory structure Jekyll expects. Here's what my Jekyll directory looks like:

Rather than creating all these files from scratch, a good first step is to fork someone else's Jekyll blog and modify it as you see fit. My Jekyll repository isn't public yet (I still have a lot of cleaning up to do), but Tom Preston-Werner's Jekyll repo is popular for forking. Just be sure to remove all of his posts and images before you publish your new site.

The directory tree listed above is not what you put in your web server's public html directory. Instead, you point the web server to the _site directory. That's where the entire generated site and static HTML files are stored. Posts written in markdown, textile or regular HTML go in _posts while _layouts and _includes are reserved for HTML/Liquid markup layouts for various content types (page, post, homepage, etc) and any necessary HTML fragments/partials, respectively. As you can imagine, the _drafts folder is where you can stash posts you don't want to be generated and published until you're ready to move them to the posts folder.

Jekyll pays close attention to files that contain YAML Front Matter and Liquid template tags. A YAML front matter block at the beginning of any file can contain custom page variables as well as predefined ones such as: layout, title, date, tags and categories.
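To make the idea concrete, here is a small Ruby sketch of how a front matter block can be separated from the rest of a file. It is an approximation for illustration, not Jekyll's actual implementation; the regular expression and the split_front_matter name are assumptions made for this example.

```ruby
require "yaml"
require "date"

# A leading block delimited by lines of three dashes.
FRONT_MATTER = /\A---\s*\n(.*?)\n?^---\s*$\n?/m

# Split raw file text into [variables, body]. Text without front matter
# comes back with an empty hash and the body untouched.
def split_front_matter(text)
  match = FRONT_MATTER.match(text)
  return [{}, text] unless match

  data = YAML.safe_load(match[1], permitted_classes: [Date, Time]) || {}
  [data, match.post_match]
end
```

Given a post that starts with layout, title and tags between two lines of dashes, the first element is a hash of those variables and the second is the post body.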

Liquid on the other hand is a markup language created by the Shopify folks that makes for easy layout creation. Liquid tags are bound either by curly braces and percent signs ({% ... %}) or by double curly braces ({{ ... }}). The latter is for outputting content while the former is for conditionals and setting up loops. Here is part of my index.html file:

You can see it has a bit of YAML front matter, then it includes a file (my yellow "call to action" bar) that is stored in the _includes directory, then creates a few post loops and outputs content. I have two loops here because I want the first post displayed differently (that's what the post_listing.html include is for) and then the rest displayed in a simple list.

Here's what the post_listing.html include looks like:

While in the site.posts loop, this include has access to the template data for each post. There are some sections where variables are piped through filters. Several are included with Jekyll, such as date_to_string.

You can also specify no layout with "nil" and still access template data. For example, here's how I created my atom feed.

That's the basic gist of Jekyll layouts and templating. My particular setup is a bit more complex with 5 layouts and 9 includes, whereas Tom's blog contains just two layouts and no includes. Learn from his setup and expand as you see fit! I'm purposely being a bit brief here as getting the layout setup is pretty straightforward.

_config.yml

Jekyll's config file is a good place to start while building out your site. The default configuration and various configuration settings are explained on the Jekyll wiki. The default settings should be satisfactory, but you'll want to set markdown to rdiscount (explained later) and adjust the permalink style.

You may opt to put in various custom variables like I did with description and root_desc, which I use in various parts of my layouts. Also, base_url is handy. When you're developing locally you can keep it set to a forward slash so that the generated site links to other local posts; when you're ready to deploy live, change it to your domain. There is no definitive argument for including the full domain versus relative URLs (SEOs go both ways on it), but I simply prefer including the entire URL.

Local Dev Environment

By now you'll want to actually install Jekyll itself. If you don't plan on doing any Jekyll hacking and will just be using it as is, you can just use the ruby gem:

If you think you'll be doing some Jekyll hacking of your own, or using someone else's fork (there are tons of great Jekyll forks to be found), it's best to fork, clone and add the path to your bash profile. For example, cloning and running my Jekyll fork:

Now you just have to add that freshly-cloned Jekyll to your path.

Add this line near the end, editing the path and directory name for your fork accordingly:

Then run source ~/.bash_profile. Or you could use caret quick substitution: ^vim^source. It edits the last entered command but replaces "vim" with "source" and runs it. Handy tip for your bash repertoire.

RDiscount

Discount is a fast C implementation of John Gruber's Markdown markup language, while the RDiscount extension makes this Discount Markdown processor available via a Ruby C extension library. If you write posts in Markdown, you will need RDiscount to process the markup and convert the post to HTML. I don't usually write in Markdown, but I like that it does basic things like add <p>'s for separate lines. There are other options like Maruku instead of RDiscount but I would not recommend it. Maruku was slow for me and didn't know how to render some of my post markup, resulting in errors like "REXML could not parse this XML/HTML". Stick with RDiscount and you should be fine:

sudo gem install rdiscount

If you use RDiscount you'll have to use Pygments for code syntax highlighting (what I use in this post). There are other options as well, such as CodeRay, which appears to only work with kramdown (a pure-Ruby Markdown converter that is slower than RDiscount, though CodeRay is faster than Pygments). Or you can just use GitHub Gist embeds for all your code needs.

I went with RDiscount + Pygments and I've been happy with it so far. You'll need easy_install (a package manager like RubyGems but for Python) to install Pygments if you do not have it yet.

sudo easy_install Pygments

jekyll.dev

The next thing I did was set up Jekyll's _site directory with Apache on my MacBook Pro for easy local development. Jekyll comes with its own web server that's great for local testing (jekyll --server) but I prefer setting up a vhost.

You probably know the drill with adding directories and virtual hosts to Apache but here's a refresher. Open up the Apache conf file in your editor of choice. This can be found at /etc/apache2/httpd.conf in OS X. Add the sections below, editing the name of your jekyll directory and ServerName as necessary. The VirtualHost section usually goes at the very end of httpd.conf. If you want to be in good form you can create another .conf file altogether instead of editing the main Apache conf, but I digress.

The jekyll-blog directory doesn't actually live in /Library/WebServer/Documents/ on my setup; I prefer to have jekyll-blog in my home directory and just symlink it into the WebServer Documents folder.

Then add the line below to the end of your /etc/hosts file.

127.0.0.1       jekyll.dev

Restart Apache and try heading to http://jekyll.dev in your browser. You should get some generic Apache page if you don't have any files in the _site directory yet. When you generate the site you can start browsing the complete site locally.

Compass for Sass

I'm a huge Sass advocate and have been using it to generate my CSS for the last two years. Compass, a popular "Sass-based CSS Meta-Framework", was one of the first things I set up when designing the new site. For those new to Sass, here's how the official site describes it:

Sass makes CSS fun again. Sass is an extension of CSS3, adding nested rules, variables, mixins, selector inheritance, and more. It’s translated to well-formatted, standard CSS using the command line tool or a web-framework plugin.

sudo gem install compass

Then create config.rb and place it in the site root.

Run compass compile or compass watch to generate the CSS. Later in this post I share a rake task for site generation that compiles the Sass too.

Importing WordPress Posts

Now that you know how Jekyll processes files and generates the site, it's time to import your database. A set of migration scripts already comes with Jekyll and currently supports CSV, Drupal, Marley, Mephisto, MovableType, TextPattern, Typo, WordPress and WordPress.com. I ended up slightly modifying another user's custom WordPress migration script that added the ability to add tags to posts. My tweak dealt with rewriting image URLs in my posts:
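The gist of such a tweak can be sketched in a few lines of Ruby. This is an illustration rather than the code from the importer itself, and both URL prefixes below are placeholders I made up; substitute the old uploads path and the new CDN host for your own site.

```ruby
# Hypothetical prefixes for illustration only.
OLD_IMAGE_PREFIX = "http://example.com/wp-content/uploads/"
NEW_IMAGE_PREFIX = "http://cdn.example.com/uploads/"

# Point every image URL in a post body at the new host. The block form of
# gsub keeps the replacement literal (no backslash escapes are interpreted).
def rewrite_image_urls(content, old_prefix = OLD_IMAGE_PREFIX, new_prefix = NEW_IMAGE_PREFIX)
  content.gsub(old_prefix) { new_prefix }
end
```

An importer would run each post's content through this before writing the file to _posts.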

Here is the complete WordPress importer script I used. It's easiest to copy your database to whichever computer you'll be running the script from, import it into MySQL and then provide the script with those database credentials. After running the script, I had a new _posts folder filled with all of my posts in markdown files with the correct YAML front matter including tags and title. The date was not placed in the YAML but is present in the name of the file (ex: 2011-01-20-my-post-slug.markdown), which is used when generating the site. However, if you post many times per day, the date in the filename is not specific enough and you might run into issues where Jekyll doesn't know in which order to display posts published on the same day. To fix that you'll want to edit the importer to include a timestamp in the YAML for each post. I believe Harper Reed's migration script does just that.
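A sketch of what that could look like on the importer side, assuming each imported row provides a title, a slug, a list of tags and a publish time. The helper names are hypothetical, and whether your Jekyll version honors a date key in the front matter is worth verifying before relying on it.

```ruby
require "yaml"

# Filename in the YYYY-MM-DD-slug.markdown form mentioned above.
def post_filename(published_at, slug)
  "#{published_at.strftime('%Y-%m-%d')}-#{slug}.markdown"
end

# Front matter block that carries a full timestamp, so posts published on
# the same day can still be ordered.
def front_matter(title, tags, published_at)
  data = {
    "layout" => "post",
    "title"  => title,
    "tags"   => tags,
    "date"   => published_at.strftime("%Y-%m-%d %H:%M:%S")
  }
  data.to_yaml + "---\n"
end
```

Writing front_matter(...) followed by the post body into a file named by post_filename(...) yields a post with both a dated filename and an explicit timestamp.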

CloudFront for Images

As for how I was going to move 460MB of images from my server to Amazon S3 for use with CloudFront, I used a nifty command line tool on my server called s3cmd. But it can be done easily via drag and drop with something like Cyberduck or Transmit. Just remember to change the ACL such that all images are publicly viewable. If you opt for the s3cmd route, after installing via brew or apt-get run s3cmd --configure to get started.

Structure of my S3 bucket deployed as a CF distribution. For CF distribution details, type s3cmd cfinfo, find the distribution ID, then try s3cmd cfinfo cf://[ID]

After the initial big upload, I wrote a task in my rakefile to make it easy to upload new images for a post. When I write a post I usually have a temporary folder on my desktop called new_post where I put all the images I want to use in the new post. I often link images to larger versions of the images and wanted this task to detect similar file names (example_img.png and example_img_1200.png, with the latter being a 1200px wide version of the former) and generate the proper HTML for an image linked to a larger version.

Example of the types of filenames in the new_post folder:

Now I just run rake cloudfront to upload the images with the proper ACL, clean up the filename, insert alt/title tags, detect different versions of images and provide me with the code for easy copying in TextMate. I know this is tied to my particular blogging workflow and may not apply to everyone but I wanted to share as it saves me lots of time.
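The filename-matching part of that workflow is the interesting bit, so here is a simplified Ruby sketch of it. This is not the rake task itself: it skips the upload entirely, the CDN prefix is a placeholder, and the helper names are invented for the example. It pairs name.png with name_1200.png (any numeric suffix) and emits a linked image when a larger version exists.

```ruby
# Hypothetical CDN prefix for illustration only.
CDN_BASE = "http://cdn.example.com/uploads/"

# Return [small, large_or_nil] pairs. "foo_1200.png" counts as the large
# version of "foo.png" only when "foo.png" is also present.
def pair_images(filenames)
  larges = {}
  filenames.each do |name|
    ext  = File.extname(name)
    base = File.basename(name, ext)
    match = base.match(/\A(.+)_(\d+)\z/)
    small = match && (match[1] + ext)
    larges[small] = name if small && filenames.include?(small)
  end
  (filenames - larges.values).sort.map { |name| [name, larges[name]] }
end

# Minimal escaping for text placed inside an HTML attribute.
def escape_attr(text)
  text.gsub("&", "&amp;").gsub('"', "&quot;").gsub("<", "&lt;")
end

# Build the markup, wrapping the image in a link when a large version exists.
def image_html(small, large, base_url = CDN_BASE)
  label = escape_attr(File.basename(small, ".*").tr("_-", " "))
  img = %(<img src="#{base_url}#{small}" alt="#{label}" title="#{label}" />)
  large ? %(<a href="#{base_url}#{large}">#{img}</a>) : img
end
```

Feeding the contents of the new_post folder through pair_images and then image_html produces one snippet per image, ready to paste into a post.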

(View this CloudFront rake task as a gist instead)

Disqus for Comments

Jekyll is all about static files so I can't do anything like serve my own commenting system. I decided to migrate my ~25,000 WordPress comments to the popular Disqus commenting system. I was worried this would be a long and painful process but was surprised at how easy it was. I simply installed their WordPress plugin and told it to migrate my comments to Disqus. The process did take a while (about 10 hours) before I noticed that all the comments for each post were loading properly. Comments that were threaded in WordPress were properly threaded in Disqus. Sweet!

As long as I kept the post URLs the same, there would be no problem adding Disqus to the Jekyll site. I created a comments.html include of the Disqus embed code that I put in my post layout.

That's all there was to it! There is one slight drawback, or plus depending on how you view things, to this approach. Disqus loads all comments after page load, via JS. This means that comments will not be indexed by Google. That's good if you write with SEO prowess and don't want user comments mucking up your perfect mix of keywords. That's bad if you're like me and think that user comments add tremendous value and want others to be able to find posts while searching for something mentioned in a comment.

Website Analytics

For the last few years I ran both Google Analytics and Mint. Google Analytics tends to be my "backup" analytics logging tool. I don't really check it too often but I like knowing that it's there keeping track of everything. I used Mint to simply look at more recent traffic patterns, popular referrers for the day and so on. I would check it more often than Google Analytics; up to maybe 5 times on a new post day.

With this site migration I decided it was time to lay Mint and its MySQL database to rest. I didn't want to run a mysqld process anymore. I decided to sign up for both Chartbeat and Reinvigorate until I decided which one I liked more. Both cost roughly $10 per month at my tier. I have been using both for about a month. I'm not in love with either of them at the moment. Chartbeat has a neat dashboard with real-time data but makes it hard to get basic information like unique visitors and pageviews per day. I know that's not their target metric but it would be nice to add. It's like selling me a car that tells me current MPG but not average MPG.

Reinvigorate on the other hand does not give off quite the real-time vibe as Chartbeat does (and for some reason Reinvigorate reports roughly half as many active visitors as Chartbeat does; guess they have different definitions for active visitor). Reinvigorate has loads of data, much like Google Analytics, and you can get access to hourly, daily, monthly traffic, heatmaps, visitor details, top referrers, keywords and more, but it's spread out over some 20 pages and will take you an afternoon of clicking to find what you're looking for.

Maybe I'll just stick to Google Analytics and invest the money saved in a nice low expense ratio index fund. Or a haircut.

Features vs Generation Time

In the end, even after I built out complete archives pages and tag pages, I ended up ditching them entirely. Why? For simplicity and in interest of keeping site generation time minimal. With all these features and extra pages to generate, it took Jekyll 50 minutes to generate my site. That was 50 minutes between me and publishing a new post, changing something in the layout, et cetera. Running through 1,100+ posts and hundreds more archive and tag pages processing markdown, pygments and liquid is no easy feat. Jekyll is not made for large sites.

I ended up taking that restriction and using it for the better. Did I really need tags and individual archive pages? I asked a bunch of people on Twitter whether they used tags for site navigation. It came back as a resounding no. Most people considered them clutter. Search is the killer app now, no need for tags in my opinion. I yanked them all out and 301 redirected tag and individual archive pages to my single archives page.

Site generation time went down to around 6 minutes on my 2.8GHz Core 2 Duo after I took them out.

Generation is handled by this task in my Rakefile:
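For readers following along without the original task, a generation step along these lines can be sketched in plain Ruby. This is my assumption of its general shape (compile the Sass, then run Jekyll), not the exact task; the constant and method names are hypothetical, and newer Jekyll versions expect jekyll build rather than a bare jekyll.

```ruby
# Commands to run in order. Assumes `compass` and `jekyll` are on the PATH.
GENERATE_COMMANDS = ["compass compile", "jekyll"].freeze

# Run each command and stop at the first failure. The runner is injectable
# so the sequence can be exercised without shelling out.
def generate_site(runner = ->(cmd) { system(cmd) })
  GENERATE_COMMANDS.each do |cmd|
    puts "Running: #{cmd}"
    raise "Command failed: #{cmd}" unless runner.call(cmd)
  end
end

# Inside a Rakefile this could be wired up as:
#   task :generate do
#     generate_site
#   end
```

Stopping on the first failure means a Sass compile error never results in a half-built site being deployed.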

Before this run-in with archives and tags I got related posts ("Latent Semantic Indexing") working after compiling and installing GSL with rb-gsl. It took a while to generate the list of related posts when I only had a handful of posts in my local Jekyll environment. When I put all my posts in and tried to generate them it took longer than 10 hours. I don't know exactly how long because I tried it twice and killed it after 10 hours — that wasn't going to fly and I decided to just list recent posts instead. I had considered spinning up a large EC2 instance to generate it but doing that each time I had a new post was going to be a pricey nuisance.

For those with fewer posts interested in implementing tag pages, I made a rake task for it as shown in this gist. Getting individual archive pages working required adding some of the archive support built by Mike West into my Jekyll fork. While I didn't end up using the full archive page support, it did allow me to organize the post listing in my single archives page by month and year (mentioned below).

Custom Features

A few features I did end up implementing and keeping include a second post type called "bit", MultiViews support, a filter to recognize WordPress "more" tags and collated posts.

MultiViews

There are two main ways of getting Jekyll to create permalinks. In the _config.yml you can either set permalink to something ending in .html or not. If the permalink structure ends in .html, Jekyll will end up generating posts as html files and dump them directly in _site and Apache will serve them as yoursite.com/your-post-slug.html. Jekyll will also link to posts on the site with the .html extension (that's what putting post.url in a posts loop will output).

If you set the permalink structure without any html extension, Jekyll will generate a ton of index.html files stored within their own directory named the slug of the post. Apache will serve it without any extensions as well, but will by nature keep a trailing slash since it is loading the index.html file inside of the directory. For example: _site/some-long-post-slug/index.html => yoursite.com/some-long-post-slug/
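The two behaviors boil down to one rule, sketched here in Ruby. This mirrors the behavior described above rather than quoting Jekyll's source, and destination_path is a name made up for the example.

```ruby
# Map a post URL to the file written under the site directory. URLs ending
# in .html become a file; anything else becomes a directory holding index.html.
def destination_path(site_dir, url)
  path = File.join(site_dir, url)
  path = File.join(path, "index.html") unless url.end_with?(".html")
  path
end
```

So /your-post-slug.html lands at _site/your-post-slug.html, while /some-long-post-slug/ lands at _site/some-long-post-slug/index.html.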

Alright Paul, so where's the issue?

I don't want a bunch of long-named directories with index.html files. It makes it hard to search for posts locally if everything comes up as index.html. Just having post html files and fewer directories is much easier to deal with IMO.

I don't want permalinks to end in .html

I also don't want a trailing slash on permalinks (which is what happens with the index.html route)

I want Jekyll to generate post links without the .html extension even though I told it in the config to use .html

MultiViews is an Apache feature aimed at content negotiation: serving up files for resources that don't exist. So even though /long-post-slug doesn't exist, Apache will end up serving /long-post-slug.html.

The effect of MultiViews is as follows: if the server receives a request for /some/dir/foo, if /some/dir has MultiViews enabled, and /some/dir/foo does not exist, then the server reads the directory looking for files named foo.*, and effectively fakes up a type map which names all those files, assigning them the same media types and content-encodings it would have if the client had asked for one of them by name.

Apache docs

I already set up MultiViews in the Apache configuration (you can also set it in .htaccess) so the only pieces left are 1) coaxing Jekyll into processing post urls without the html extension and then 2) having Apache redirect post-slug.html to the extension-free post-slug (otherwise both versions would load and Google would index both, detect duplicate content and spread PageRank between them, which is not very canonical).

Fortunately both were a quick fix away. I ran across Henrik's Jekyll fork where he introduced a MultiViews setting in _config.yml and then rewrote the url method to remove the extension if multiviews is enabled and placed the url logic in another method. I applied the same method in an updated Jekyll (v0.10.0). Just set multiviews: true in the config file.
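The core of that change is tiny. The sketch below shows the idea in isolation; it is not the actual patch from Henrik's fork or mine, and post_url is a stand-in name, with the config passed as a plain hash like the one loaded from _config.yml.

```ruby
# Strip the .html extension from generated links when multiviews is on, so
# Apache's MultiViews can resolve /post-slug to post-slug.html on disk.
def post_url(permalink_url, config)
  config["multiviews"] ? permalink_url.sub(/\.html\z/, "") : permalink_url
end
```

The files on disk keep their .html extension; only the links Jekyll writes into pages change.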

And finally, a few .htaccess lines to take care of the duplicate urls:

<!--more--> content filter

This allows me to return just the part of the post before the more tag in my templates. For example, I wanted to use this on tag pages, archive pages and on new posts on the homepage. Alternatively, if you don't use or don't want to use the more tag in your posts, you can get the same effect with something like {{ post.content | truncatewords: 75 | textilize }}.

I added this to filters.rb in my Jekyll fork:
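A filter of this kind can be sketched as follows. Treat it as an illustration of the approach rather than the exact code in my fork; the module and method names are assumptions. Liquid filters are ordinary Ruby methods in a module, which a standalone plugin would register with Liquid::Template.register_filter.

```ruby
module MoreFilter
  # The WordPress "more" marker, tolerating spaces and letter case.
  MORE_TAG = /<!--\s*more\s*-->/i

  # Return only the content before the first more tag. Content without a
  # more tag is returned whole.
  def preview(content)
    content.to_s.split(MORE_TAG, 2).first.to_s
  end
end
```

In a template this would be used as a filter on the post content, for example piping post.content through preview.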

Bit post type

I wanted a post type similar to an aside but wanted it to remain entirely separate from regular posts. Bits would not share tags, be listed in the main RSS feed, et cetera. Something like this could have been done by adding another field to the YAML front matter in each bit and checking for the presence or exclusion of that value while looping through posts, or by simply making a bits folder and manually adding posts there, but then I wouldn't be able to loop through them for a bits archive, feed or the sitemap.

For ease of use, cleaner logic and faster generation times (less stuff for Liquid to do) I decided to make a Bit class. It's essentially a direct copy of the Post class with appropriate variable changes/additions made throughout Jekyll.

View bit.rb on GitHub.

Collated posts

And last but not least, I wanted slightly better archive pages. I didn't just want a list of every post. I wanted them broken up into sections for year and month.

This snippet, among some related archive code, was added to the render method in site.rb:
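The collation itself is a nested grouping by year and then month. Here is a self-contained Ruby sketch of that idea, not the code from site.rb; ArchivePost is a stand-in struct invented for the example, whereas in Jekyll the items would be real post objects with a date.

```ruby
require "date"

# Stand-in for a post; only the fields needed for the example.
ArchivePost = Struct.new(:title, :date)

# Group posts as { year => { month => [posts] } }, newest first at every level.
def collate(posts)
  collated = Hash.new { |years, y| years[y] = Hash.new { |months, m| months[m] = [] } }
  posts.sort_by(&:date).reverse.each do |post|
    collated[post.date.year][post.date.month] << post
  end
  collated
end
```

Because Ruby hashes keep insertion order, iterating the result walks years, then months, then posts from newest to oldest, which is the order an archives page wants.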

That allowed me to use this crazy markup to create the archives page:

View the complete archives file in this gist.

Performance

By nature, Jekyll will be fast: as fast as your nginx, Apache or other web server setup can dish out tiny static html files. By offloading all image resources to a CDN, I reduced the number of HTTP requests the server has to reply to for a single page load. I could have also done the same thing with my Sass-compiled CSS but I change it so often that I prefer having it served from my server rather than dealing with CDN cache invalidation and versioning issues. I ended up keeping Apache as my web server; I don't get the kind of traffic to warrant an nginx setup. My box can handle a 50,000 pageview day no problem and that's the most I've seen from any sort of Hacker News/Reddit/&c fiasco.

I decided to install the much-hyped Apache 2.2 module by Google called mod_pagespeed:

Pagespeed should be installed and you'll see some new files in the /etc/apache2/mods-enabled/ directory. Now let's take the red pill and see what kind of configuration options are available. Open up pagespeed.conf and uncomment/enter these lines:

Read about more mod_pagespeed filters and settings in the docs; this is only scratching the surface. In particular, take a look at the rewrite_images filter as well as ModPagespeedDomain if you do any CDN stuff. Pagespeed can also provide basic statistics if you enable the following:

After you're done fine-tuning pagespeed settings save the conf file and then restart Apache:

Fire up Chrome browser and load up your site. Right-click anywhere on the site, click Inspect Element, click the Network tab then refresh the site to fill up the network pane. Click on the name of the actual HTML for the page. Make sure it's not a 304 (that's generally good but if it's coming from cache you can't see the mod_pagespeed headers to confirm if properly installed). Load a page you haven't visited before, clear your cache or try enabling private mode. Once you're able to load a page with a 200 you should see X-Mod-Pagespeed at the end of the Headers pane:

<img src="http://turbo.paulstamatiou.com/uploads/2011/01/pstam_jekyll_pagespeed_header.png" alt="Google mod_pagespeed ena
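If you would rather check from a script than from the browser, the same header can be looked up in Ruby. This is a small sketch under my own assumptions: pagespeed_version is a made-up helper that takes a hash of response headers and matches the name case-insensitively.

```ruby
# Return the value of the X-Mod-Pagespeed header, or nil when it is absent.
# Accepts plain string values or arrays of values.
def pagespeed_version(headers)
  pair = headers.find { |name, _value| name.to_s.downcase == "x-mod-pagespeed" }
  pair && Array(pair[1]).first
end

# Example usage against a live site (requires network access):
#   require "net/http"
#   response = Net::HTTP.get_response(URI("http://example.com/"))
#   puts pagespeed_version(response.to_hash) || "header not present"
```

A fresh request from a script sidesteps the browser cache issue, since a 304 response will not show the header.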
