2014-11-20

While you are twiddling your trackball thumbs waiting for the DACA explosion, here is something slightly useful.

I finally bought Edward Tufte’s classic of graphical design, The Visual Display of Quantitative Information. It’s so good it has no competitors, like the Department of Water Engineering at the Technical University of Delft. [Update 22/22: this is incorrect, see comments.] Struggling to find a niggle against a nearly perfect work, I can only complain that he compares his masterpiece to Strunk and White’s error-packed The Elements of Style: a “malign little compendium of bad advice” (Stephen Dodson); “the book’s toxic mix of purism, atavism, and personal eccentricity is not underpinned by a proper grounding in English grammar” (Geoffrey Pullum). If you have any serious professional or even amateur interest in charts, buy Tufte’s book: he gets all your money as self-publisher, as commercial publishing houses refused to cede him the full graphical control he demanded. More fools they.

The book lucidly combines general graphical principles on “the revelation of the complex” and a plethora of striking and even amazing examples of good and bad practice. It would be a disservice to offer a dummy’s summary on this blog. What I tried to do was to investigate how much of his specific good advice, as opposed to the general principles, can be put into effect using a standard office software suite. I have LibreOffice. Most of the features apply in Excel, which does offer more: in some cases misguidedly, as in the 3D pyramid stacked histograms, which have a variable lie factor because the data correspond to the heights of the pyramid slices while the eye reads their volume. If you want to make marginal or bubble plots, you will need specialised software like this or this.
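To see why the pyramid distortion varies from slice to slice, here is a rough sketch in Python. It assumes an idealised pyramid whose cross-section shrinks linearly to a point at the top, so that volume scales with the cube of the distance from the apex. The function name `pyramid_slice_distortion` and the sample heights are mine, invented for illustration. The ratio it returns is in the spirit of Tufte's lie factor, not his exact definition.

```python
def pyramid_slice_distortion(heights):
    """For slices stacked bottom to top, return volume share / height share.

    A value of 1.0 would mean the slice looks as big as the data say it is.
    Assumes an idealised pyramid tapering to a point (hypothetical model).
    """
    total = sum(heights)
    ratios = []
    bottom = 0.0
    for h in heights:
        top = bottom + h
        # Volume of a similar pyramid scales with the cube of its height,
        # so a slice is the difference of two such cubes, measured from the apex.
        volume_share = ((total - bottom) ** 3 - (total - top) ** 3) / total ** 3
        height_share = h / total
        ratios.append(volume_share / height_share)
        bottom = top
    return ratios


# Two equal data values: the bottom slice is visually inflated, the top one shrunk.
print(pyramid_slice_distortion([1.0, 1.0]))
```

Two equal values end up looking very unequal, and the size of the error depends on where in the stack a value happens to sit.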

I’ll take a worked example. Warning: the page below the fold is large, with many images, pushing the envelope on resolution. The WordPress software seems to muddy the resolution of images so you will need to click on each to get a proper view.

Tufte tells us to choose interesting, rich data. Obvious, but often ignored. I’ll use those from the annual LLNL energy flowcharts for the US economy. Here is that for 2013.



This is itself a fine piece of work. The only thing wrong is that the areas of the total boxes at the right do not match those at the left. I once located a higher-resolution version, presumably the original, where the boxes are correct. Whoever converted this very large image to png format to fit on a webpage truncated the boxes. Keep control of your work.

They have charts for five previous years. It’s very difficult to compare years. Let’s try to chart the changes between 2008 and 2013, for a subset of key data. I’m especially interested in wasted energy, which happens at the right-hand side.

Let’s start with the popular pie charts. Tufte is against them:

A table is nearly always better than a single pie chart; the only worse design than a single pie chart is several of them, for the viewer is asked to compare quantities located in spatial disarray ….

Here they are. I left the default output of LibreOffice, with few exceptions: the data legends are in bold; the pies were carefully resized so that areas correspond to the true ratios of the data.
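Resizing pies so that their areas match the data means scaling the radius by the square root of the ratio of totals, not by the ratio itself. A minimal sketch of that arithmetic follows; the function name `pie_radius` and the example numbers are hypothetical, not the values I used.

```python
import math


def pie_radius(total, reference_total, reference_radius=1.0):
    """Radius that makes a pie's area proportional to its total.

    Area grows with the square of the radius, so the radius must grow
    with the square root of the data ratio.
    """
    if total < 0 or reference_total <= 0:
        raise ValueError("totals must be non-negative, reference must be positive")
    return reference_radius * math.sqrt(total / reference_total)


# A pie representing four times the reference total gets twice the radius.
print(pie_radius(4.0, 1.0))
```

Scaling the radius directly by the data ratio would exaggerate the larger pie's area by that same ratio again.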


The lesser problems include garish colours, oversaturated so that any legends placed within them are unreadable, and a legend train wreck at the top of the 2008 wasted energy pie on thin pie slices. For some reason the lettering has turned out poorly. The colours can be fixed by editing them; the lettering only by deleting from the chart software and re-entering by hand in a picture editor. The software changed the order of the sectors when I added electricity generation to the waste chart – I’ve no idea how to fix this. (The net output of electricity generation is included in the other sectors, see the flowchart.)

The main problem of difficult visual comparison is unfixable. What would you say is the ratio of useful to wasted energy in either year? The latter is clearly more, but could be by anything from 20% to 60%. The eye is not nearly as good at estimating areas as lengths. The true increase is 40%.

So let’s follow Tufte’s advice and put our 10 data points in a table.



I take issue with Tufte here. A table is fine for at most a double comparison of a set of data: say, within a year between sectors, and between useful and wasted energy. Add a third variable such as time, and it gets confusing. So we will see what we can do with stacked bar charts. Here is the raw default result.

This is already a considerable improvement. It is quite easy to run a visual comparison both between useful and wasted energy within each year (adjacent columns), and of useful or wasted energy between years (alternate columns). The intuitive perception of the ratios of the column heights is much more accurate. All the sectors are in the right order. The software offers percentage bar charts, but what’s the point? The percentages are just as clear visually without them, and the varying height adds another useful datum, the absolute total.

The remaining weaknesses are of visual comfort, elegance and legibility. First we add a white horizontal grid as a discreet reference point – a tip from Tufte. This suggested a pastel coloured background. To make the white lines run through the columns, I made them 50% transparent. You want to start with a well-saturated colour for this to work. I played around with the colours to make an agreeable effect. Tufte does not offer much advice on colours, in this book at any rate. I like pastels, and chose related colours for my four final consumption sectors, and a contrasting one for the electricity waste, a category of its own. I also added data labels within the columns, allowing the precise numbers to be read off directly.
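The advice to start from a well-saturated colour follows from how transparency is usually composited: each colour channel of the column is averaged with whatever lies behind it. The sketch below assumes simple linear alpha blending in RGB, which may not be exactly what LibreOffice does internally. The function name `blend` and the sample colours are made up for illustration.

```python
def blend(foreground, background, opacity):
    """Composite an RGB colour over a background with the given opacity (0 to 1).

    Assumes plain linear blending per channel; real software may differ.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(
        round(opacity * f + (1.0 - opacity) * b)
        for f, b in zip(foreground, background)
    )


# A strong red at 50% over a pale cream background ends up as a soft pink.
print(blend((200, 40, 40), (250, 240, 220), 0.5))
```

Where a white grid line runs behind the column, the same blend comes out lighter still, which is what lets the line show through.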

What are the tick labels on the left-hand axis doing? The numbers are already in the columns. So we get rid of them. Tufte suggests, for scatter plots, replacing regular interval ticks on the axes with exact marginal coordinate values; not feasible with ordinary tools. Similarly for truncating the lines of the axes to the data range. Another way of combining numbers with charts is to put a table below the columns, as with this good example from EPIA.

Tufte insists that revision is as necessary for charts as it is for writing. First, I added the column totals, a useful piece of information, using a picture editor (SansSerif PhotoPlus Starter edition – free). More important, I decided to change the units, a substantive, not a graphical, issue. The quad (quadrillion BTU) is a standard unit for discussing very large quantities of energy, as for the US economy. But it reflects the era of fossil fuels we are leaving for one powered predominantly by renewable electricity. For geeks, the quad is deprecated as not an SI unit. There is no loss in intuitive grasp in shifting to SI. Neither the quad nor the BTU has any day-to-day resonance, unlike the kilowatt, roughly the power delivered by a small horse. (Racing cyclists can sustain 400 watts for a while.) The terawatt (trillion watts) is too small a measure for the US economy, so let me introduce you to the petawatt, a quadrillion watts or a billion megawatts. Get used to the prefix: the NSA are already up to exabytes – the next jump beyond petabytes – at their Borgesian Utah data centre. That’s thousands of petabytes of selfies and emails, no more useful (judging by Benghazi and ISIS) than Smaug’s bed of gold. So I recalculated the spreadsheet in petawatt-hours. A small explanation went into the chart.
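For anyone repeating the conversion, here is a small sketch of the arithmetic. It assumes the International Table BTU of about 1055.056 joules; other BTU definitions differ slightly in later digits. The constant and function names are my own.

```python
JOULES_PER_BTU = 1055.056   # International Table BTU, rounded (assumption)
BTU_PER_QUAD = 1e15         # a quad is a quadrillion BTU
JOULES_PER_PWH = 3.6e18     # 1 Wh = 3600 J, and peta means 1e15


def quads_to_pwh(quads):
    """Convert energy in quads to petawatt-hours."""
    return quads * BTU_PER_QUAD * JOULES_PER_BTU / JOULES_PER_PWH


# One quad is a little under 0.3 PWh.
print(round(quads_to_pwh(1.0), 4))
```

So a column of quads shrinks by a factor of a bit more than three when restated in petawatt-hours, while the ratios between the columns stay exactly as they were.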

Finally I decided to replace the data legends manually in the picture editor, allowing the placement I wanted. I added explanations of the electricity issue and the units, and my name as author. Here’s the end product. Not great, but I think a decent piece of work. I fancy I’m not far from the limits imposed by today’s bog-standard software. In a few years my grandchildren will be emulating Hans Rosling’s dynamic bubble plot (2.30 minutes in).

Was it worth the effort? I gained no new insights from the work, and you should not expect any if you emulate it. Chart design is all about getting across your thinking about data to your audience, not refining it for yourself. By that standard, I hope I succeeded. Let me know. I’d have liked to add thin line borders to the column segments, but neither LibreOffice nor Excel offers this, and I felt I’d invested more than enough time in the project already.

What, then, are the points the chart illustrates?

There is a colossal amount of energy waste, and it’s overwhelmingly in just two sectors, electricity generation and transport. Shift to renewable generation (100% efficient by accounting definition) and electric vehicles (something like 85% efficient plug-to-wheel) and you would save waste, and hence carbon emissions, equal to the entire useful energy consumption of the country, with no other changes in lifestyle or the production basket. Replacing current primary energy production is unnecessary, and it’s the wrong metric. Focus on useful energy and waste. (The accounting convention for renewables and nuclear is incidentally correct. Inefficiencies in the form of unconverted wind and sunlight, and heat from reactors, are absolutely trivial environmentally, unlike the emissions from wasted fossil fuels. Conversion efficiency gains are of course welcome there, and costs create a sufficient incentive to pursue them.)

US industry is remarkably energy-efficient compared to commerce (including government) and households. How it rates against German industry is another matter.

Obama’s presidency, in spite of sound policies, has not yet achieved significant reductions in energy consumption or gains in efficiency. Some of these policies, notably the EPA coal regulations and vehicle mileage standards, will certainly have a bigger impact in the future.
