Davidghedini.com

Apache Roller: Export Blog Posts to PDF, Word, and PPT with Eclipse and BIRT

2011-09-11

This post will cover how to export your Apache Roller blog posts to multiple formats using Eclipse and BIRT.

At the footer of each of my posts I have the option to export the post to PDF, Word, PowerPoint or HTML (print preview) as shown below.

At the outset, I'll say that while this solution works well for me, it has a fair number of issues, which I've listed at the bottom.

There are, of course, a number of other ways to accomplish exporting, using iText and other tools.

Despite the additional memory overhead, I chose Eclipse and BIRT because the solution is extremely simple to implement. It also lends itself to serving multiple blogs from a single BIRT instance and even from a single Eclipse report document.

What you will need:

1. Eclipse. I am using Helios (eclipse-reporting-helios-SR2).

2. BIRT Runtime Viewer. I've used both 2.5.2 and 3.7.

Step 1: Create a Data Source.

For our data source, rather than my PostgreSQL database, I'll use my Atom feed at http://www.davidghedini.com/pg/feed/entries/atom.

Using my Atom feed means fewer security worries (no authentication required) and having my data source already in XML format.

In Eclipse, in the Data Pane, right click Data Source and select 'New Data Source'. Select 'XML Data Source'.

Give your Data Source a name. I'll name my Data Source 'DsDavesBlog'.

For the URL of the Data Source, enter your Apache Roller Atom feed URL.

Click Test Connection to test and then click Finish.

Step 2: Create a Data Set.

In Eclipse, in the Data Pane, right click on Data Sets and select 'New Data Set'.

Under XML Data Source, highlight the Data Source you created in step 1 (we used 'DsDavesBlog').

Give your new Data Set a name as shown below. I'm calling mine 'dSetDavesBlog'. Click Next

In the New XML Data Set screen, select 'Use the XML File Defined in the Data Source' as shown below. Click Next

You should now see the Row Mapping screen as shown below. For XPath Expression, choose 'entry' from the tree menu and then use the add arrow. The resulting value of XPath Expression at right should now be '/feed/entry' as shown below.

You should now see the Column Mapping screen as shown below. Expand the 'entry' tree node as shown below.

Use the center arrows to add 'title' and 'content' as shown below.

'title' is our blog post title and 'content' is the actual blog post content.

Expand the 'link' tree entry beneath 'entry' as shown below. Highlight the @href element and use the center arrow to add the /link/@href element.

The /link/@href XPath expression will provide a link to the post's permalink page.

Your Column Mapping screen should now look as below.

Click the Show Sample Data button to verify the data set is pulling the correct elements

Close the Show Sample Data screen and click Finish.

Step 3: Create Your Eclipse Report Document.

Now, create your layout in Eclipse.

For a basic report document like mine, Eclipse is simple to use.

You can simply create a grid and then drag your Data Set items into the appropriate locations. I've also added some text elements for the report document header and for social networking links.

The one Data Set element that requires special attention is the 'content' Data Set item. The Content Type for this element must be HTML as shown below.

To set the Content Type as HTML, drag a Dynamic Text field into the report document from the Palette Window. You can then bind the 'content' data set item to the Dynamic Text field.

Your 'content' report element should now look as below.

Step 4: Add A Parameter

As part of the procedure for calling our Eclipse document(s) we will need to create a parameter for our report. In our case, the post title will serve as the parameter.

In the Data pane, right click on Report Parameters and select 'New Parameter'.

I'll call my parameter pTitle.

Now, in the Data pane, double click on the Data Set that we created above (dSetDavesBlog).

The Edit Data Set Window appears.

Select 'Filters' from the left menu, then click the 'New' button.

As shown below, in the New Filter Condition window, select row["title"] from the first drop down, 'Equal To' from the middle drop down, and the entry that opens the Expression Builder from the right drop down.

In the Expression Builder window, click on 'Report Parameters' from the Category selection box.

In the Sub Category selection box, click on '--All--'.

In the Double Click to Insert selection box, double click on the parameter we created, pTitle as shown below. Click OK.

The New Filter Condition window will now look like below. Click OK.

The Edit Data Set window will now look like below.

Click OK.
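The filter set up above boils down to a row-by-row comparison between the 'title' column and the pTitle parameter. BIRT expressions are written in JavaScript, so the condition can be mimicked outside of BIRT. The following is only a rough sketch: the sample rows and the stand-in params object are made up for illustration, and in a real report BIRT supplies both.

```javascript
// Hypothetical rows, standing in for what the data set returns from /feed/entry.
const rows = [
  { title: "First Post", content: "<p>Hello</p>", link: "http://example.com/entry/first_post" },
  { title: "Second Post", content: "<p>World</p>", link: "http://example.com/entry/second_post" }
];

// Stand-in for BIRT's params object; inside a report BIRT provides this.
const params = { pTitle: { value: "Second Post" } };

// Same condition as the filter: row["title"] Equal To params["pTitle"].value
function applyTitleFilter(rows, params) {
  return rows.filter(function (row) {
    return row["title"] === params["pTitle"].value;
  });
}

const matched = applyTitleFilter(rows, params);
console.log(matched.length, matched[0].link);
```

In this sketch the comparison is exact, so the title passed in has to match the feed title character for character.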

Now, it's time to test our parameter. Click the Preview Tab in the report window.

The Enter Parameters pop up window should appear as below.

Enter the title of one of your posts and then click OK.

The report should return the post you passed as the parameter value as shown below.

We now have a report that we can use to export our posts to PDF, Word, PPT, Excel, or HTML (Print Preview).

Now, it's time to install BIRT.

Step 5: Install the BIRT Runtime Viewer.

Installing the BIRT Runtime Viewer is very simple. We will be installing on our existing Tomcat instance. For JBoss, WebSphere, WebLogic, you can find instructions here: http://www.eclipse.org/birt/phoenix/deploy/viewerSetup.php

1. Download the BIRT Runtime Viewer here: http://download.eclipse.org/birt/downloads/

2. Extract the contents.

3. Copy the WebViewerExample directory to your Tomcat webapps directory.

4. For my site, I renamed the WebViewerExample to 'export'. You can name it whatever you like.

5. Start or Restart your Tomcat instance.

6. You should now be able to reach the runtime viewer at http://YourDomain.com:8080/export

Note: In my set up I am using Jason Weathersby's Filename Generator, which you can download here: http://www.birt-exchange.org/org/devshare/deploying-birt-reports/1322-filename-generator-that-uses-a-url-parameter-to-name-export/

Now that BIRT is up and running, you need to make a few changes.

1. Delete the webcontent directory from BIRT. You do not need it and keeping the webcontent directory will cause issues if a file is not found and when using HTML as the export type.

2. Set PermSize. If you do not set PermSize you will encounter memory errors. Simply add the desired parameters to your catalina.sh as shown below, or use whatever method you like for setting your JAVA_OPTS.

Step 6: Publish Report to BIRT.

Publish your report to your BIRT directory under webapps. You need only publish the rptdesign document.

Step 7: Calling the Report from Your Roller Posts.

In order to call the reports from our posts we will need to have jQuery available. You can probably use straight JavaScript, but I'm too lazy for that.

If you do not already have jQuery available, add the following line to the header of your blog. You can use whatever version you wish.

To dynamically build the BIRT urls for calling your reports, you can do the following:

1. Get or create icons for PDF, Word, etc... If you like, you can download mine in zip format here:

http://www.davidghedini.com/DocImages.zip

In my script below, I have the image paths as /images/DocImages.

2. Create a JavaScript file to build the urls based on the document type you are exporting. These are the urls that need to be created both to render the links in the footer and to call the reports in various formats from BIRT.

For my blog, I use http://www.davidghedini.com/js/birt-rollers.js as shown below.

As you can see above, we first use a bit of jQuery at the beginning:

The above captures the post title on both multi-post pages and single (permalink) pages.

The remainder of the js file is simply building the appropriate url for the document type we want to export to.
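As a rough illustration of that URL-building step (not the actual contents of birt-rollers.js), here is a minimal sketch. It assumes the viewer is deployed at /export, that the report is called through the viewer's run servlet with the __report and __format URL parameters, and that the report parameter is named pTitle as in Step 4. The function name, the format keys, and the report file name are hypothetical.

```javascript
// Export types mapped to the value passed in BIRT's __format URL parameter.
const FORMATS = { pdf: "pdf", word: "doc", powerpoint: "ppt", html: "html" };

// viewerBase: where the viewer is deployed (assumed "/export" here).
// reportFile: the published .rptdesign file name (hypothetical).
// postTitle:  the value handed to the pTitle report parameter.
function buildExportUrl(viewerBase, reportFile, format, postTitle) {
  if (!Object.prototype.hasOwnProperty.call(FORMATS, format)) {
    throw new Error("Unsupported format: " + format);
  }
  return viewerBase + "/run" +
    "?__report=" + encodeURIComponent(reportFile) +
    "&__format=" + FORMATS[format] +
    "&pTitle=" + encodeURIComponent(postTitle);
}

const pdfUrl = buildExportUrl("/export", "DavesBlog.rptdesign", "pdf", "Hello & Welcome");
console.log(pdfUrl);
```

Encoding the title with encodeURIComponent matters because post titles often contain spaces, ampersands, and other characters that would otherwise break the query string.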

As noted above, I am using Jason Weathersby's Filename Generator to set the exported document's file name to the post title. If you do not have or want to use this, remove the following from the end of var urlStart:

3. Upload your birt-rollers.js file to your server.

4. Add the JavaScript below to your _Day template, where var rollerId is the Eclipse document name and the URL is the location where you uploaded birt-rollers.js.

I added the above javascript to my template pages as below:

As noted at the start of this post, there are a number of issues with this solution.

I'm reasonably sure someone with better Java skills (read: anyone) could make a better go of it.

Also, there are a number of free online services that convert to some formats. The problem I had with these online services is that they work using the url of your post. The result is that the document produced is little more than a screen shot as it includes sidebars, menus, and other elements I didn't want in my document.

I wanted the body of my document to capture just my entryBox div.

Problems (ones that I know about):

1. Handling of HTML content is spotty. Some tags can be escaped while others do not work well.

2. I cannot get the CSS of Syntax Highlighting to work in the document.

3. I have no idea how to set page breaks.

4. BIRT adds memory overhead (hence the PermSize setting noted above).

Again, the issues could probably be addressed by those with Java and/or Eclipse skills.

The upside for me is:

1. No-brainer deployment. You can implement the basic solution in less than 30 minutes.

2. A single BIRT instance can serve any number of blogs. Just change the value of var rollerId to the name of the document being used for the particular blog.

3. Because it uses the Atom feed as the data source rather than a JNDI data source or similar, there should be no significant security concerns with multiple bloggers sharing the same BIRT instance.

4. Export to any supported BIRT format - PDF, HTML, Word, PowerPoint, CSV, and XLS.

5. Eclipse document can easily be reused for any blog. Simply change the atom url for the data source. Taking it a step further, rather than using a static text field for the document title, you could use the XPath to /entry/name (or any other identifier).

Finally, as the data source is always the Roller Atom feed, if you wanted to serve any number of blogs without having to update even the data source, you could set your Data Source dynamically. You can do this by adding a new parameter (pDataSource below) and changing your data source XML as below:

You could then set your Data Source in your javascript using something like:
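One way this might look, as a sketch only: the helper below appends a pDataSource parameter, carrying the feed URL, to a viewer URL that has already been built. The function name and the example report file name are hypothetical, and it assumes the report reads pDataSource as described above.

```javascript
// Appends the pDataSource report parameter to an existing viewer URL.
// feedUrl is the Atom feed the report should read from.
function withDataSource(viewerUrl, feedUrl) {
  const separator = viewerUrl.indexOf("?") === -1 ? "?" : "&";
  return viewerUrl + separator + "pDataSource=" + encodeURIComponent(feedUrl);
}

const exportUrl = withDataSource(
  "/export/run?__report=blog.rptdesign&__format=pdf",
  "http://www.davidghedini.com/pg/feed/entries/atom"
);
console.log(exportUrl);
```

Each blog's template would then pass its own feed URL, while the report document itself stays unchanged.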