2011-12-01

Many have pointed out the flaws in the GDP and CPI. Yet these numbers are still the primary economic metrics used across academia and the media. This essay comes not to criticize but to destroy. The GDP and CPI numbers are not just imperfect; they are off by so much, and in so many different directions, that they should be tossed out entirely. For every purpose for which GDP may be used, a better measure exists. We’ll start this essay by decimating GDP, and then I’ll present the alternatives.

Composition Problems

The BEA calculates GDP by totaling the dollar expenditures of every person and organization in the United States (this is called “Nominal GDP”), and then deflating that total by the change in a price index. This results in a number the bureau refers to as “Real GDP”.

The “Nominal Gross Domestic Product” metric is grossly misnamed. The units of the number are dollars, thus it is a measure of money, not “product”. If the government printed $100 trillion, national income would skyrocket, but that does not mean any more goods would be produced; it would just mean that prices would rise. The number is a measure of the supply and velocity of money; it has nothing to do with the production of goods. The name should be “Gross Expenditures” or “Gross Money Flow”.

To calculate actual economic growth, the government takes the growth in expenditures and subtracts the growth in prices. The difference is the actual growth in production of goods. At least that is the theory.
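
The deflation step can be sketched in a few lines. This is only an illustration of the arithmetic, not the BEA's actual procedure; the function name and the sample figures are hypothetical.

```python
def real_growth(nominal_growth, price_growth):
    """Growth in output implied by deflating nominal growth by price growth.

    Both arguments are fractional rates (0.05 means 5%).
    """
    return (1 + nominal_growth) / (1 + price_growth) - 1


# Hypothetical figures: spending grows 5%, the price index grows 3%.
exact = real_growth(0.05, 0.03)
shortcut = 0.05 - 0.03  # the "subtract the growth in prices" approximation

print(f"exact: {exact:.4f}, shortcut: {shortcut:.4f}")
```

Either way, the answer depends entirely on which price-growth figure is plugged in, which is where the trouble starts.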

The first problem is the composition of the price index. This is such a big problem that by itself it invalidates the use of GDP as a metric. The price index is calculated by tracking the prices of a basket of goods.

The trouble is that the basket excludes a) all goods that are produced but not measured/traded and b) goods that the economy can no longer produce at all, for whatever reason.

The famous example from Paul Samuelson is that if a man married his maid, then, all else being equal, GDP would fall. Or if a nanny has a child and quits her nanny job to take care of that child, GDP falls, despite the total amount of child care being produced remaining exactly the same.

The most glaring absence from the “goods basket” is leisure time. And in fact any non-economic good is excluded: working on the house, taking care of children, doing private research, writing a book to distribute for free over the Internet, etc.

Futurists in the 1800’s used to imagine that as society grew richer, people would work very short weeks and dedicate most of their time to leisure, learning and the arts. In such a scenario, GDP would actually be stagnant or declining, as GDP stats would not pick up the increased production of leisure, learning, or arts. In policy circles, officials think that greater GDP is always good. But we have no idea if that’s actually the case - we might be better off under scenarios where GDP is falling.

The astute reader may complain, “But GDP is not trying to be a measure of well being, it is supposed to be a measure of output, to be weighed in policy decisions against other factors.” But this is incorrect. GDP does try to measure “well being”, because there is an entire system of “hedonics” built into the price index. These hedonic measures try to capture how much a good has improved well being.

Nor does GDP measure output; it just measures changes in a price index. If the entire industrial base atrophies, manufacturing disappears, and a country survives off of exporting its currency like 16th Century Spain or modern Iceland, numbers such as the GDP will not pick this up until it is far, far too late.

Finally, policy makers with an economics background do overly focus on GDP. Just pick up a random economics paper or read a Fed report to Congress. These documents mostly discuss what factors or policies will raise GDP. They never discuss the scenarios in which raising GDP is actually desirable. The policy papers rarely concern the trade-offs with other factors such as leisure time, commute time, community, environment, etc.

Exports and Imports

The price index used by GDP excludes all imported goods. This has some logic to it. GDP is supposed to measure output of the U.S. If the U.S. imports all its wool, and the price of wool rises due to a sheep epidemic in Australia, that is not representative of a fall in output in the U.S.

But this logic collapses upon further inspection. Let’s say there is a thriving export industry making airplanes. U.S. manufacturers build the planes and export them to Saudi Arabia, and Saudi Arabia ships back oil. Now let’s say that due to bad management the quality of the planes declines. Saudi Arabia buys European-made Airbus planes instead, and the American companies go out of business. What will happen? National Income in dollars of the U.S. will be the same (remember National Income is simply a measure of the supply of dollars and people’s desire for cash balances; it has nothing to do with goods produced). The price of oil will rise significantly. The U.S. has fewer goods to exchange for oil and thus the dollar must weaken against Saudi Arabia’s currency. The price of other goods will rise a bit due to increased oil costs, but not as much. Thus if you exclude oil from the price index, you would miss this huge drop in production. And in fact, this has happened over the last decade as GDP continues to climb despite the disappearance of the U.S. manufacturing base.

So we have good reasons for excluding imports from the price index, and good reasons for including imports. Which is correct? The economists at the Bureau of Economic Analysis try to patch it up the best they can, and kludge a number together.

The reasonable person must admit: “We do not know”. You fundamentally cannot reduce all of a nation’s output to a single number. Any attempt to do so will simply combine dozens of different assumptions that will react or cancel each other out in weird ways, giving you a resulting number that is completely useless.

And we are still not done yet.

Substitutions

If the price of steak rises, consumers will shift their consumption to another good, perhaps ground beef. This will in turn change the weightings used by the CPI and GDP, so that the more expensive good, now being consumed less, will get weighted less. The CPI allows limited substitution, while the GDP is a full substitution index. These different assumptions can have dramatic differences in the resulting numbers. John Williams writes:

Nonetheless, if steak prices were to rise, strapped consumers indeed likely would shift to cheaper meats, and such would be reflected in the broader weighting categories. The problem from the BLS standpoint, though, was that those weightings traditionally were recast only every ten years. So, the broad-category reweighting process was accelerated, with a reweighting in 1998, and, thereafter, reweightings were structured formally for every two years starting in 2002. This process moved the CPI closer to a fully substitution-based index. These approaches, in conjunction with other methodological changes ranging from increased use of discount-store surveying to shifting quality and hedonic adjustments, resulted in meaningful downside adjustments to reported annual CPI inflation of roughly 300 basis points (3%), of which 28 basis points currently is estimated by the BLS as the effect of geometric weighting. The period involved here, from the early-1990s to date, was dominated by efforts to address the Greenspan/Boskin contention that the CPI “overstated” inflation. Nonetheless, the resulting, current CPI was and still is not a fully-substitution-based inflation index. An experimental substitution-based index, the Chain Weighted CPI-U (C-CPI-U), is published monthly by the BLS along with the CPI-U and CPI-W. At present, the C-CPI-U is showing the annual inflation rate running about 80 basis points (0.8%) below the official CPI-U. It is plotted as one of the three CPI measures in the inflation graph on the www.shadowstats.com home page.
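
To see how much the substitution assumption alone can move the result, here is a minimal sketch comparing a fixed-basket index with a geometric-mean index, which implicitly assumes consumers shift spending away from the good whose price rose. The prices, quantities, and function names are hypothetical, and this is a textbook simplification rather than the BLS's actual procedure.

```python
import math

# Hypothetical base-period prices, new prices, and base-period quantities.
base_prices = {"steak": 10.0, "ground beef": 4.0}
new_prices = {"steak": 13.0, "ground beef": 4.0}
base_qty = {"steak": 20, "ground beef": 30}


def fixed_basket_index(p0, p1, q0):
    """Cost of the unchanged base-period basket at new prices versus old prices."""
    old_cost = sum(p0[g] * q0[g] for g in q0)
    new_cost = sum(p1[g] * q0[g] for g in q0)
    return new_cost / old_cost


def geometric_index(p0, p1, q0):
    """Geometric mean of price relatives, weighted by base-period spending shares."""
    total = sum(p0[g] * q0[g] for g in q0)
    log_index = sum(
        (p0[g] * q0[g] / total) * math.log(p1[g] / p0[g]) for g in q0
    )
    return math.exp(log_index)


fixed = fixed_basket_index(base_prices, new_prices, base_qty)
geometric = geometric_index(base_prices, new_prices, base_qty)
print(f"fixed basket: {fixed:.4f}, geometric: {geometric:.4f}")
```

Same prices, same starting basket, and the two formulas already disagree about how much prices rose.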

Infrastructure

GDP numbers measure the dollar value spent on infrastructure, but they do not measure the amount actually produced. If the government pours money down the drain in wasteful and corrupt projects, this will show up in the figures as net production.

GDP numbers do not show depreciation. So if existing infrastructure is crumbling faster than it gets replaced, GDP might show up as actually growing while in reality things are falling apart.

If the entire city of Detroit gets destroyed in riots, fires, crime waves, and ethnic cleansing, and is left in ruins, this will not show up in GDP numbers. In fact, GDP might actually increase, since the destruction would spur the creation of new housing (which does show up in the numbers) and the average home price might actually fall (reducing the GDP deflator) as violence makes homes unlivable.

Hedonic Hell

Both the Consumer Price Index and GDP deflator rely on “hedonic adjustments”. They adjust the numbers based on improvements in quality. On the surface, there is some plausibility to these adjustments. Cars have risen in price since 1970, but they have also improved greatly in quality. We now have air bags, crumple zones, better mileage, greater durability, etc. If you just look at price, you may think that the quality of life has dropped when in fact it has risen.

But the trouble is that none of these improvements are numerically quantifiable. Try answering for yourself. How much better is a 2010 computer than a 2002 computer? 2.3 times better? 1.7 times better? 20% worse? If you cannot answer this question numerically, for yourself, how can anyone answer it, especially for the entire country?

The 2010 computer has three times the processing power, so is it three times better? But most people will never use this processing power, so for them the quality is the same. Those who hate Windows Vista might argue that the quality has actually declined. The answer is entirely subjective, and defies a numerical categorization.

The BEA uses several methods for adjusting based on hedonics. All of these methods have a surface plausibility, but upon a deeper look are completely invalid.

Method one is overlap pricing. For a brief period of the year, an appliance company might sell both the 2009 microwave model and the 2010 model at the same time. The price difference between the models can be used as the hedonic adjustment. This method is completely absurd. New products often sell for more even if they are not really any better. New game consoles sell for huge premiums upon release, but quickly fall in price. It costs $12 to watch GI Joe II in the theater the week it comes out, while at the same time it costs $3 to watch The Avengers in a second-run theater. Does this mean that GI Joe is four times better than The Avengers? Overlap pricing actually measures the amount of price discrimination a company is using to gain extra money from people who want to show status by having the very latest gadget. There is no predictable relationship to the overall increase in quality.

Method two is the explicit quality adjustment method, in which the government tracks the amount spent on improvements. For instance, if a car company spent $100 per model adding a new support beam for safety, that would be counted as $100 worth of quality improvements. Again, this is invalid. Just because a company spent $x on an improvement does not mean the product actually increased in quality by that much. This number can also miss whatever the company has taken out of the product. The new oven might have a slick digital interface, but perhaps some of the internal parts have been replaced by plastic. Nor is it easy to distinguish improvements that are marketing gimmicks from actual long-term quality improvements. The CPI excludes money spent on cosmetic changes like new paint colors and reshaped bumpers. But take the example of the Lexus that can parallel park itself. Is that a valuable long-term quality improvement, or really just a style change that allows its rich owner to show off?

Method three is to measure some component of the product - such as processing speed or gas mileage - and detect how much it improves. Again, this is almost always invalid because it is impossible to relate something like processing speed to an overall quality value for the product. Doubling the processing speed of the computer has no hedonic impact on my mother’s ability to send email to her friends.

These problems are not merely academic; hedonic adjustments have a dramatic impact on the numbers.

From 1996 to 2010, the average sale price of an American car rose from $16,901 to $23,182 (37%). The Ford Taurus rose in price from $18,545 in 1996 to $25,018 in 2010. The cheapest Ford rose in price from $11,430 (the Escort) to $13,320 (the Fiesta). The cheapest Honda rose in price from $9,980 (the Civic) to $14,900 (the Fit).

Depending on the measure we use, then, the prices of cars rose by roughly 17% to 49%.
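
The percentage changes implied by the quoted sticker prices are easy to check. The sketch below uses only the 1996 and 2010 prices cited above; the variable and function names are just for illustration.

```python
# Prices as quoted in the text: (1996 price, 2010 price), in dollars.
car_prices = {
    "average American car": (16901, 23182),
    "Ford Taurus": (18545, 25018),
    "cheapest Ford (Escort to Fiesta)": (11430, 13320),
    "cheapest Honda (Civic to Fit)": (9980, 14900),
}


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


changes = {
    name: percent_change(old, new) for name, (old, new) in car_prices.items()
}
for name, change in changes.items():
    print(f"{name}: {change:.1f}%")
```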

Yet the CPI index for automobiles actually fell by 1.5% from January 1996 to April 2010. In other words, the Bureau of Labor Statistics has decided that the 2011 cars are ~35% better in quality than the 1996 cars.

Compare the 1996 Ford Escort specs to the 2011 Ford Fiesta specs. The new model has no better gas mileage, no greater trunk space, and no more seats. The new model does have more horsepower, but that’s not going to get you to your destination any faster.

I do not know exactly how the BLS determined that the price of cars did not actually rise from 1996 to 2010. The general outlines of their methodology are publicly available, but not the details.

What’s amazing, though, is that if you use the straight price numbers, and use gas mileage, car space, and speed to calculate hedonic changes, then there has been essentially no economic growth in the automobile sector. Growth is wiped out. The assumptions used by the BLS thus create the economic growth. Change the assumptions and you get very different growth numbers.

Of course, in other respects, common statistics may be dramatically understating growth. Thanks to Rhapsody + Netflix + Google Books + the Kindle I have instant access to a virtually unlimited number of books, movies, and music for a total monthly price of $25. My Google Books collection alone would have cost thousands of dollars to build a few years ago. But I downloaded it all for free.

ShadowStats does its own calculation of CPI numbers. By using pre-Boskin Commission methodology, the ShadowStats estimate of CPI is nearly double the official number.

The point is not that GDP numbers are overstating growth or understating growth. The point is that GDP numbers have absolutely no meaning whatsoever. The numbers are very sensitive to the assumptions you make, and a wide range of plausible assumptions can be used, each producing a very different GDP number. The GDP flunks a sensitivity analysis and is therefore useless. If the number seems close to your intuitive sense of how fast the economy has grown, it is because the calculations were retroactively fitted to match your intuitive sense.
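
A toy sensitivity check makes the point concrete. Every number and label below is hypothetical; the sketch simply holds nominal growth fixed and swaps in different price-growth estimates of the kind that different methodologies might produce.

```python
# Hypothetical: nominal spending grew 5% over the year.
nominal_growth = 0.05

# Hypothetical price-growth estimates under different methodological choices.
price_growth_assumptions = {
    "heavy hedonic adjustment": 0.015,
    "middle-of-the-road index": 0.03,
    "fixed basket, no hedonics": 0.06,
}

# Real growth implied by each assumption, using the same deflation formula.
real_growth_by_assumption = {
    label: (1 + nominal_growth) / (1 + price_growth) - 1
    for label, price_growth in price_growth_assumptions.items()
}

for label, growth in real_growth_by_assumption.items():
    print(f"{label}: {growth * 100:+.2f}%")

# Gap between the most optimistic and most pessimistic result.
spread = max(real_growth_by_assumption.values()) - min(
    real_growth_by_assumption.values()
)
```

With these made-up inputs, the same economy is either growing at a healthy clip or shrinking, depending purely on which price index is chosen.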

Adjusting the data

Every year, the BEA makes adjustments and revisions to previous years’ GDP data. For instance, the growth numbers for Q3 2002 were revised downward in three successive revisions. The end result was changing the growth rate from 3.3% to 2.2%. In the 2009 comprehensive revision, the growth rate for 2008 was changed from 1.1% to 0.4%.

Economist Jeremy Nalewaik has pointed out that GDP tends to be adjusted in the direction of the GDI estimates. (GDP and GDI should be identical: GDP is calculated by adding up expenditures while GDI is calculated by adding up incomes.)

Again, the point is not that these adjustments are right or wrong. The point is that the results are extremely sensitive to the assumptions and adjustments made. The end result is that GDP numbers will simply replicate what the people doing the adjustments think it should look like.

Composition problems with the CPI

The CPI excludes the price of housing. Instead it uses owners’ equivalent rent. The claim is that since money paid for a home is actually income for another person, it is not necessary to include the home price in the index. But this claim applies to the price of every good. Money you pay for oil goes to the shareholders of the oil company. Money you pay for services is someone else’s income. The net result of excluding housing was a much lower inflation estimate for the past decade.

We all know that from 2006 to 2010 the housing market crashed. Across the nation housing prices dropped dramatically. The Case-Shiller index reported that home prices fell by 31.5%. Yet the CPI index for housing costs (based on equivalent rent) actually rose by 7.7%. Again, by using a different methodology, the CPI produces a wildly different number.

The issue becomes even more complicated when you include foreign investment. Imagine China is implementing a mercantilist policy. China bids up the price of a current home and buys it from the owner (via the channel of mortgage backed securities). The owner then uses the money to buy products from China. China then sells the home to a new buyer at a higher price, who takes out a mortgage. The net result is that America is a net seller of home equity and in return has gotten goods. The price of a home will be pushed up. In the GDP statistics this will actually show up as economic growth (since the cheap Chinese goods will push down the GDP deflator). But in reality, there has been no growth.

“Give me four parameters, and I can fit an elephant. Give me five, and I can wiggle its trunk”

The most seductive call of the GDP is that the resulting number seems somewhat plausible. But that’s part of the problem.

I don’t know exactly what goes on in the heads of the gnomes at the BEA. But I assume it’s no different than what goes on with a college senior trying to write a thesis, or a marketing department trying to figure out ROI numbers for their product.

As we noted, there is a huge range of “adjustments” that go into making the GDP. Everything from excluding oil imports, to including paid child care but not household child care, to the various hedonic adjustments: all are dubious adjustments and fudge factors. The art is that you take these assumptions, and you keep tweaking them until you get something that “feels right”. While this sounds nefarious, the economist might not think so. For he assumes that it is possible to boil down the economic output to a number. He also assumes that the plausible adjustments are the best possible. Therefore, any tweak to those adjustments to make it “feel right” is a tweak towards greater accuracy; it’s fine tuning.

One analyst at a government agency in Canada writes:

Much of our time is spent “forecasting,” which basically means making a common-sense appraisal of what some indicator or variable will do in the coming years, and creating a statistical model that confirms it. The second step adds nothing of value to the prediction - the math is just there for show, a means of impressing the innumerate by camouflaging shot-in-the-dark guesses in rigorous clothing.

Forecasting is a different field than compiling GDP statistics, but this quote shows the general mindset that exists.

The problem is that you cannot boil the economy down to a single number. The result of this entire process is pure numerology. You are data mining a pre-determined conclusion. Numerology is when people calculate numbers from the Bible to get results. Since the Bible is so big, you can get pretty much any number you want. It is similar with GDP. The space in which you can make adjustments and tweak variables is so great, you can get any result you want. So the fact that the numbers say GDP grows 2% a year is not the result of “science”, but of pseudo-science. It’s a deeply complicated model that simply spits back the assumptions of the model’s creator.

The result that the designers of the CPI wanted was a reduction in Social Security payments. Congress felt that Social Security payments were too high, and they wanted to balance the budget, so they assigned the Boskin Commission to redesign the formula. Now there is some validity to this. IMO, seniors were getting too much. But it’s not because the CPI was “overstating” inflation. It was because the components of inflation that were rising the fastest affected seniors the least. Healthcare costs were rising, but seniors get covered by Medicare. Housing costs were shooting up, but far more seniors are sellers than buyers. But in changing the CPI numbers to stop overpaying seniors, the designers ruined the number as an overall measure of well being.

GDP and developing countries

All the problems of GDP apply 100X when we’re talking about developing countries. Yale economist Chris Blattman writes:

Doesn’t it strike you as odd that the World Development Indicators have annual infant mortality data for most countries in Africa for most years? It should. Most of that data is interpolated, and the rest is (as often as not) close to made up. It’s not just the human development indicators. You wouldn’t want to be inside the sausage factory that is the GDP calculation in Chad.

A commenter on his blog, Mona, follows up: “As someone who, another lifetime ago, worked on the World Development Indicators, I can corroborate the claims in that last paragraph!”

Often GDP numbers for developing countries look plausible. But that could just be because the number was fitted to be plausible. If the GDP number counters your intuition, you certainly cannot treat the GDP number as authoritative. Thus there is no reason to use the GDP number at all.

What is the overall bias in the RGDP number?

The Real GDP number is NGDP adjusted by a price index. NGDP is a measure of monetary inflation, in other words it is a measure of the supply and velocity of money. Thanks to excluding certain classes of goods such as imports and real estate, and including hedonics, the price index is less sensitive to growth in the money supply than is NGDP. So in any period of monetary inflation the economy will appear to be growing, while in periods of monetary stability the economy will appear to be shrinking. What this means is that the Real GDP number has a built in bias that makes policy makers confuse inflation with growth. Policy makers might think the economy is growing, as in 2006, but in reality the purchasing power of the average worker is eroding as the inflation is going to the well connected while the price of oil and food rises.

And of course, due to the mechanisms of the modern business cycle, inflation will be associated with high utilization of economic resources (labor and factories) while deflation will be associated with recessions. Another bad problem is that there are two types of growth that get badly mixed up. There is the “growth” that occurs when exiting a downturn - this is a growth in utilization as idle resources go back to work. This kind of growth can be stimulated by inflation. Then there is the growth due to the creation of new technologies and products. This growth has very little to do with inflation. Unfortunately all these mixed definitions lead to smart people writing hopelessly confused essays where they try to apply policies that stimulate utilization growth to the problem of technological growth, or they use charts of RGDP (i.e. monetary inflation) to examine the problems of technological growth (which has nothing to do with RGDP).

Is the GDP good for anything?

“GDP” refers to either nominal GDP or Real GDP. Nominal GDP is a misnomer. It’s a measure of the flow of dollars, not of production. However, comparing the nominal GDP of Country A in 2010 to the nominal GDP of country B in the same year can be useful. Because there is an exchange rate between the two countries, comparing any cross section of income (nominal GDP, median income, wage of the average McDonald’s worker, total taxable income, etc, etc) will give a reasonable comparison of quality of life. Of course, there is still much room for fudge (Purchasing power parity, potential double counting in the GDP calculations, etc).

It is so-called real GDP that is completely useless. Real GDP attempts to put a dollar price on the change of output over time. But since there is no exchange of goods between 2010 and 1970, this can only be calculated via the tortuous statistics that I discussed above. These statistics simply replicate what the economists want to find; they do not add any information, and only open opportunities for being misled.

How to replace GDP and CPI

The last argument of the GDP/CPI fans is that, “GDP/CPI may be imperfect, but it’s the best number we got for (insert use case here).”

We will demolish this one last argument. In some cases we must accept that the attempt to use any number is fundamentally flawed. But in many cases there is actually a replacement number that is far more accurate and sensible for the given purpose. Let’s go through each use case one by one.

Measuring changes in well being: the Basic Living Index

Is there any way we can measure well-being numerically?

Creating some sort of “national happiness number” is an impossible task. It will contain all the problems of measuring GDP and then some. It will bury the underlying data and create endless arguments that it’s not taking into account x, y, or z.

The best way to measure changes in well being over time is to do the following:

Create an index where you have a precisely defined set of goods in the basket that maintain their definition over the time period that you are measuring. For instance, the basket might include:

100 dozen eggs

200 pounds of ground beef

300 pounds of flour

one day at the hospital, one prescription drug, 2 hours with a doctor

350 gallons of gasoline

enough heating gas for a year

the cheapest car that can legally drive on the highway, cost amortized over its lifetime

the median home amortized

The goods should be weighted by the actual, typical consumption over the course of the year. Then the total cost of buying one year’s worth of the goods should be divided by the median wage of a 30 year old male worker. The basic living index now has a precise definition that holds across the entire time period: “The number of hours you must work to meet your basic needs of food, shelter, clothing, heat, and healthcare.”
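
As a minimal sketch of the calculation, assuming a cut-down basket: the quantities for eggs, beef, flour, and gasoline follow the example basket above, while every price, the wage, and the function name are made-up placeholders.

```python
# Hypothetical unit prices for a few basket items; quantities follow the
# example basket. A real index would cover the full basket with observed prices.
basket = [
    # (item, annual quantity, price per unit in dollars)
    ("dozen eggs", 100, 2.00),
    ("pound of ground beef", 200, 3.50),
    ("pound of flour", 300, 0.50),
    ("gallon of gasoline", 350, 3.00),
]

median_hourly_wage = 20.00  # hypothetical wage of the reference worker


def basic_living_index(items, hourly_wage):
    """Hours of work needed to buy one year's worth of the basket."""
    annual_cost = sum(quantity * price for _, quantity, price in items)
    return annual_cost / hourly_wage


hours = basic_living_index(basket, median_hourly_wage)
print(f"{hours:.1f} hours of work per year")
```

Repeating the calculation with each year's observed prices and wage gives a series in hours, a unit that means the same thing in every year.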

Then leave everything else to subjective discussions and interpretations. Do not try to turn subjective things into a number, leave them for the reader to decide. Each person for themselves can debate subjective claims such as: The median home in 1970 was smaller, but had a shorter commute. We have more access to music now, but the live music sucks.

Monetary Control

The primary number the government should use for monetary control is total personal income. Total income is simply a measure of the supply and demand for money. If the supply of money increases, national income will rise as a person will have more money to spend. If demand for money falls, by definition the price at which a person is willing to exchange money for goods will rise, and thus national income will rise.

By preventing national income from falling, the government can prevent or stop recessions, which are caused by falling aggregate demand. And by controlling the rise of national income, the government can prevent destabilizing bubbles from developing (I’ll explain this point in a future essay).

The second number the government should use is an index of the prices of alternative stores of value. Money is fundamentally a store of value. If the prices of other stores of value are rising with respect to dollars, that is a good sign that the currency is frying. An index of alternative stores of value would include: central city real estate, farmland, stocks, oil, and gold.
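
One simple way such an index could be built is sketched below. The equal weighting is an assumption of this sketch, not something the essay specifies, and all prices are hypothetical.

```python
# Hypothetical dollar prices of each store of value in a base year and today.
base_year = {
    "central city real estate": 250.0,
    "farmland": 3000.0,
    "stocks": 1200.0,
    "oil": 60.0,
    "gold": 900.0,
}
today = {
    "central city real estate": 300.0,
    "farmland": 3300.0,
    "stocks": 1260.0,
    "oil": 75.0,
    "gold": 1350.0,
}


def store_of_value_index(p0, p1):
    """Equal-weighted average of price relatives, scaled so the base year is 100."""
    relatives = [p1[asset] / p0[asset] for asset in p0]
    return 100 * sum(relatives) / len(relatives)


index = store_of_value_index(base_year, today)
print(f"index: {index:.1f}")
```

A reading well above 100 would mean the alternatives are, on average, gaining value against the dollar.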

Illustration

Sometimes you wish to adjust prices for the purpose of illustration. You may be reading a book that cites the price of a movie ticket in 1910. What does that price mean to someone today? The best way to adjust the price is by using a measure of wage. Median wage is good if the number is available. Otherwise use the wage of a typical farm hand or carpenter. A wage number is less susceptible to fudge factors (although still not perfect), and is more directly aligned with what you actually want to know. If you are adjusting a price for illustration purposes, you are trying to figure out what the price means to you. The best way to illustrate the price is by telling you how many hours it would take to earn that movie theater ticket.
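
In code the conversion is a single division. The 1910 ticket price and both wages below are made-up numbers for illustration only.

```python
def hours_of_work(price, hourly_wage):
    """Hours a worker must work to earn the given price."""
    return price / hourly_wage


# Hypothetical figures: a 10-cent ticket at a 20-cent hourly wage in 1910,
# versus a $12 ticket at a $20 hourly wage today.
ticket_1910 = hours_of_work(price=0.10, hourly_wage=0.20)
ticket_today = hours_of_work(price=12.00, hourly_wage=20.00)

print(f"1910: {ticket_1910 * 60:.0f} minutes of work")
print(f"today: {ticket_today * 60:.0f} minutes of work")
```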

Indexing Social Security

Social Security should be indexed to national income. In fact, the way to solve the entire social security crisis is to simply state that 12.6% of national income will go to social security, and whatever that income buys, it buys. If the economy produces fewer goods because seniors retire, there is no inflation adjustment possible that can give those seniors the promised income. If the cost of living increases because the economy is shrinking, everyone must bear the pain. Conversely, if the economy grows really fast there is no reason to exclude seniors from this windfall by reducing their income.

Comparing cross country wealth

Let’s say we want to compare the wealth of two countries. The best number to use is some sort of income measure adjusted for exchange rate. The best way is to find the types of jobs the median worker is employed in in that country - carpenter, taxi driver, factory worker, secretary or whatever - use those to estimate a nominal median wage, and then multiply by the population. If the country has accurate measurements of total income, then that could be used. But for third world nations where no good national income numbers exist, the median wage method will give more meaningful numbers.
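
A rough sketch of the median wage method follows. The two countries, their wages, populations, and exchange rates are all invented for the example.

```python
def income_estimate_usd(median_annual_wage, population, usd_per_local_unit):
    """Median annual wage times population, converted to dollars."""
    return median_annual_wage * population * usd_per_local_unit


# Hypothetical inputs: wages are in each country's own currency.
country_a = income_estimate_usd(
    median_annual_wage=30_000, population=10_000_000, usd_per_local_unit=0.5
)
country_b = income_estimate_usd(
    median_annual_wage=4_000, population=50_000_000, usd_per_local_unit=1.0
)

print(f"Country A: ${country_a:,.0f}")
print(f"Country B: ${country_b:,.0f}")
```

Dropping the population factor gives the per-worker comparison, which is often the more interesting number.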

Measuring the start and end of a recession

Use the unemployment rate. This will eliminate the absurdity of people saying the recession is over when the unemployment rate is still at 10% and rising. Any measure of utilization is also useful, such as automobiles manufactured as a percentage of peak output.
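
A utilization measure of this kind is simple to compute. The production figures below are hypothetical, and "peak" is taken to mean the highest output seen up to that year, which is one reasonable reading among several.

```python
# Hypothetical annual automobile production, in millions of units.
production = {2005: 12.0, 2006: 11.3, 2007: 10.8, 2008: 8.7, 2009: 5.7, 2010: 7.8}


def utilization_vs_peak(series):
    """Each year's output as a percentage of the highest output seen so far."""
    result = {}
    peak = 0.0
    for year in sorted(series):
        peak = max(peak, series[year])
        result[year] = 100 * series[year] / peak
    return result


utilization = utilization_vs_peak(production)
for year, pct in utilization.items():
    print(f"{year}: {pct:.1f}% of peak")
```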

Measuring national output

For some purposes - such as military planning, studying the history of economic development in a country, or gauging the depth of a depression - it is useful to have an actual measure of output. In this case the various industrial outputs should be measured directly. Compile a list of statistics for all industries - miles of railway, tons of steel, units of automobiles, cargo containers shipped, kWh produced, airplane flights made, the total horsepower of all machinery, bushels of wheat, phones per capita, etc. The units on these figures are the units of the thing you are measuring. There is no way to convert the units to dollars and compare the results across time periods. Instead just compare the output directly.
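
A sketch of what comparing the output directly might look like, with made-up figures: each series is compared only with itself across time, and nothing is summed across series or converted to dollars.

```python
# Hypothetical physical output figures for two years, each in its own unit.
output = {
    "steel (tons)": {1970: 1000, 2010: 800},
    "electricity (kWh)": {1970: 500, 2010: 1500},
    "wheat (bushels)": {1970: 200, 2010: 300},
}


def output_ratios(data, start, end):
    """Ratio of end-year to start-year output, computed separately per series."""
    return {series: years[end] / years[start] for series, years in data.items()}


ratios = output_ratios(output, 1970, 2010)
for series, ratio in ratios.items():
    print(f"{series}: {ratio:.2f}x")
```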
