Seven years ago, I moved to Wicker Park, Chicago. The neighborhood, once heavily Hispanic, was being inundated by hipsters and yuppies, and its taquerias, some run by Mexican families who had immigrated to Chicago a generation earlier, were finding new audiences for their wares: the creative professional on her lunch break, the bro on his late-night bar crawl.
I was destined to be one of their best customers. I’m a burritophile. But of the 19 taquerias within a short walk of my apartment, which was the best? I decided to try them all, comparing them two at a time by ordering the same food item (say, a carne asada burrito) and knocking out the weaker alternative in an NCAA-style elimination tournament. Thus began the Burrito Bracket.
The Wicker Park version of the burrito bracket played down to a final five and then I got distracted, partly because I was beginning work on what would eventually become FiveThirtyEight. (I think of La Pasadita, the No. 1 seed, as the unofficial champion.) My burrito dreams were deferred. But I’d still like to know where to find the best burrito in Chicago. In fact, I’d like to find the best burrito in the country.
That’s what we’re about to do. We’re launching a national, 64-restaurant Burrito Bracket. We’ve convened a Burrito Selection Committee. We’ve hired an award-winning journalist, Anna Maria Barry-Jester, to be our burrito correspondent. She’s already out traveling the country and sampling burritos from every establishment that made the bracket.
It’s a little crazy, but we think it needs to be done. And we think we’re the right people to do it. One reason is that narrowing the field to 64 contenders is a massive problem of time and scale. It perhaps couldn’t be done adequately if not for a little data mining and number crunching. In 2007, I was able to sample every burrito restaurant — in one neighborhood, in one city. But there are 67,391 restaurants in the United States that serve a burrito. (I’ll tell you how we came up with that figure in a moment.) To try each one, even if you consumed a different burrito for breakfast, lunch and dinner each day, would require more than 60 years and run you close to 50 million calories.
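If you'd like to check that arithmetic, it takes only a few lines. (The figure of roughly 750 calories per burrito is my own assumption, not something from the data.)

```python
# Back-of-the-envelope check on the figures above.
N_RESTAURANTS = 67_391
BURRITOS_PER_DAY = 3          # breakfast, lunch and dinner
CALORIES_PER_BURRITO = 750    # a rough average; burritos vary widely

years = N_RESTAURANTS / BURRITOS_PER_DAY / 365.25
calories = N_RESTAURANTS * CALORIES_PER_BURRITO

print(f"{years:.1f} years")                      # ~61.5 years
print(f"{calories / 1e6:.1f} million calories")  # ~50.5 million
```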
We need some way to narrow the list of possibilities. Fortunately, Anna and I were able to enlist some help. The past seven years have produced explosive growth for crowdsourced review sites like Yelp. Yelp provided us with statistics on every burrito-selling establishment in the United States.
The Yelp data was the starting point for FiveThirtyEight’s Burrito Bracket, which will officially launch early next week and whose solemn (but not sole) mission is to find America’s best burrito. There are three major phases in the project, each of which I’ve already hinted at:
Step 1: Data mining. Analyze the Yelp data to create an overall rating called Value Over Replacement Burrito (VORB) and provide guidance for the next stages of the project. (This step is already done, and I’ll be describing the process in some detail in this article.)
Step 2: Burrito Selection Committee. Convene a group of burrito experts from around the country, who will use the VORB scores and other resources to scout for the nation’s best burritos and vote the most promising candidates into a 64-restaurant bracket — 16 contenders in each of four regions: California, West, South and Northeast. (The committee has already met, and we’ll reveal the 64 entrants in a series of articles later this week and this weekend.)
Step 3: Taste test. Have Anna visit each of the 64 competitors, eat their burritos, rate and document her experiences, and eventually choose one winner in a multi-round tournament. (Anna will be posting her first reviews early next week. She’s worked as a documentary photographer and multimedia journalist, and as a producer at ABC News and Univision, where she’s spent years reporting on Hispanic-American culture.1)
This project involves a mix of seriousness and whimsy. Anna and I both have a lifelong obsession with Mexican-American food. We also know that burritos are not a matter of great national importance.
But the question of how consumers might use crowdsourced data to make better decisions is an important one. Billions of dollars turn upon customer reviews at sites like Yelp, Amazon, Netflix and HealthGrades. How should you evaluate crowdsourced reviews as compared to the recommendations from a professional critic, or a trusted friend? Are there identifiable biases in the review sites and ways to correct for them? When using sites like Yelp, should you pay more attention to the number of reviews, or to the average rating?
We’re only going to be able to scratch the surface of these questions, some of which have received too little empirical study. But burritos provide a good way to experiment precisely because they represent a relatively narrow range of experience. There are different burrito styles across the country — more than you might gather if your burrito-eating ambitions have never ventured beyond Taco Bell. But there are fewer parameters to control for when rating burritos than when comparing movies, or doctors, or colleges.
We’ll eventually crown a national champion burrito, but with no illusions that the bracket can offer a definitive result. We hope, nevertheless, that there is some value in our approach of blending analytics and first-hand experience. It isn’t a perfect analogy, but in some ways this project represents an attempt to engage in a “Moneyball”-style experiment pitting statistics (as represented by the Yelp ratings) against “scouts” (as represented by the members of the Burrito Selection Committee — all of whom have extensive experience working in or writing about the food industry). During our deliberations, our professionals sometimes had strident disagreements with the Yelp ratings. So Anna (who had not previously been a professional food reviewer) will visit restaurants that are rated highly by Yelp but that our “scouts” consider mediocre, and others that they vouch for but that Yelpers largely dislike or ignore.
The rest of this article will describe our procedures for working through the Yelp data and selecting the field of 64 burrito-selling establishments. If you have more of an appetite for burritos than for detail, you might want to scroll through this quickly and check back over the next few days as we begin to reveal and review our selections.
Step 1: Data mining
The data we got from Yelp contained 67,391 listings for businesses that had at least one review mentioning “burrito” or “burritos” and were open as of Feb. 1 of this year.2 I use the term burrito-selling establishments (BSEs) to describe them because the list was not strictly limited to businesses categorized as “restaurants” by Yelp: food trucks and grocery stores are also potentially eligible, for example, so long as they make something clearly recognizable as a burrito available for commercial sale.3
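Conceptually, the filtering step is simple. Here's a minimal sketch; the field names are hypothetical stand-ins, not Yelp's actual schema:

```python
import re
from datetime import date

# Matches "burrito" or "burritos" as whole words, case-insensitively.
BURRITO_RE = re.compile(r"\bburritos?\b", re.IGNORECASE)

def is_bse(business, reviews, cutoff=date(2014, 2, 1)):
    """A listing counts as a burrito-selling establishment (BSE) if it
    was still open as of the cutoff date and at least one of its
    reviews mentions "burrito" or "burritos"."""
    closed = business.get("closed_date")  # hypothetical field name
    if closed is not None and closed <= cutoff:
        return False
    return any(BURRITO_RE.search(review["text"]) for review in reviews)
```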
What is a burrito, you might ask? In developing VORB, I resolved this question by basing the scores solely on those Yelp reviews that mentioned “burrito” or “burritos.”4 This was a subject of considerable debate among the Burrito Selection Committee, however. For instance, a Japanese restaurant in Plano, Texas, came up fairly high on the VORB list because it sells a highly regarded item called a “superman burrito.” The “superman burrito,” however, is really a Texas-sized sushi roll. One of our panelists, worried about the dishonor to Texas’s reputation, threatened bodily harm to anyone who voted to include it in the bracket. More often, however, our group was tolerant and inclusive, so Anna’s itinerary will include breakfast burritos, vegan burritos, Korean fusion burritos, a chimichanga and even something called a “Mexican hamburger.”
Quality vs. quantity of reviews
Another challenge was in making the best use of the different types of data that Yelp made available to us. For instance, would you rather go to a restaurant rated at 4.5 stars based on 50 reviews, or one rated at 4.1 stars based on 500 reviews? (Yelp ratings range between 1 and 5 stars, with higher ratings being more favorable.)
One issue is that the restaurant with 50 reviews has a larger margin of error. Yelp reviews take a long time to converge to the mean. Restaurant reviews are not like movie reviews or book reviews where everyone is evaluating the same product. Instead, customers with different tastes are evaluating many different menu items. They’re also having different customer service experiences, and they might be more or less sensitive to factors like price, décor or portion size.
Based on an analysis of Yelp’s publicly available Academic Dataset, which covers all reviews in Phoenix, Arizona, I developed a formula that accounts for the degree of mean-reversion and predicts what a restaurant’s average review will be going forward (as well as the margin of error associated with the prediction). I found that a higher volume of reviews is a fairly powerful signal in predicting future review quality.5 Even mediocre reviews tend to favorably predict star ratings going forward, provided that there are a lot of them.
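The fitted formula itself lives in the footnotes, but the general shape of any mean-reversion estimate like this is shrinkage toward a prior. Here's a minimal sketch, in which both the prior mean and the shrinkage weight are illustrative choices rather than the fitted values:

```python
def predicted_rating(avg_stars, n_reviews, prior=3.3, weight=25):
    """Shrink an observed average rating toward a prior mean. With few
    reviews, the prediction hugs the prior; as reviews pile up, it
    converges to the observed average. prior=3.3 and weight=25 are
    stand-ins for the values fitted on the Phoenix dataset."""
    return (n_reviews * avg_stars + weight * prior) / (n_reviews + weight)

# The question posed above: 4.5 stars on 50 reviews, or 4.1 on 500?
print(predicted_rating(4.5, 50))   # ~4.10
print(predicted_rating(4.1, 500))  # ~4.06
```

Under these illustrative parameters, the lightly reviewed 4.5-star place narrowly comes out ahead; a heavier shrinkage weight would flip the verdict.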
This may still undersell the importance of a restaurant’s popularity, however. Popular restaurants tend to attract a wider and less experienced set of reviewers,6 who may view the restaurant differently and be less predisposed toward liking it.7 In addition, my research found that the volume of reviews — and not just the average rating — was a fairly powerful indicator of a restaurant’s likelihood of appearing on “best of” lists as prepared by professional food critics and may therefore serve as a useful proxy for “expert” judgment.
The formula I developed, VORB, is inspired by the baseball statistic VORP, or Value Over Replacement Player. Just as VORP measures both the quality of a baseball player’s performance and how often he plays, VORB accounts for both the quality and the quantity of a restaurant’s reviews.
Regional and other adjustments to Yelp reviews
Also like VORP, which adjusts a player’s statistics for his environment (it’s easier to post superficially better batting statistics in a small ballpark like Fenway Park than in a cavernous one like Dodger Stadium), VORB considers a restaurant’s location.
Yelp was founded in San Francisco and adopted early there and elsewhere in California. Even now, Yelp’s usage is considerably heavier in urban areas, and urban areas have much greater population densities and potentially more visitors per restaurant. So it’s much easier for a restaurant to accumulate a large number of Yelp reviews in San Francisco or Los Angeles than in Spokane, Washington, or Lafayette, Louisiana.
The solution was to use chain restaurants as a common denominator. While not all Chipotles or Taco Bells are identical to one another, they control for a lot more variables than two independent taquerias do.
More specifically, the geographic adjustment is based on the number of reviews for the 50 Mexican chain restaurants nearest to a BSE’s ZIP code.8 The most extreme cases were several ZIP codes near San Francisco, where chain restaurants were reviewed almost six times more often than the same chains elsewhere in the country, and ZIP code 25704 (Huntington, West Virginia), where nearby chain restaurants were reviewed about one-seventeenth as often as the national average. Roughly speaking, we’d need to divide the number of Yelp reviews for San Francisco restaurants by six, and multiply the number of Yelp reviews for Huntington restaurants by 17, to make them comparable.
I also discovered substantial differences in average star ratings between the same chains in different parts of the country. Orlando, Florida, is the biggest outlier; chain restaurants there were rated 0.5 stars higher than the same chains elsewhere in the country. (Reviews of one Baja Fresh location in Orlando, for example, contain the sort of praise that is usually lavished upon restaurants like L’Arpège.) By contrast, the stingiest chain reviews were in the 90021 ZIP code, Los Angeles, where Mexican chains were rated about 0.4 stars lower than the national average. This presumably implies, among other things, that the standards for Mexican food are much higher in Los Angeles than in Orlando. Scores for BSEs were adjusted accordingly.9
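In code, the two chain-based corrections look roughly like the sketch below. The field names and the distance() helper are hypothetical, and the actual index construction surely differs in detail:

```python
from statistics import mean

def chain_baselines(chains, zip_code, k=50):
    """Compare the k Mexican chain locations nearest a BSE's ZIP code
    against all chain locations nationally, yielding a local
    review-volume multiplier and a local star-rating offset.
    distance() is a hypothetical ZIP-to-ZIP distance helper."""
    nearest = sorted(chains, key=lambda c: distance(zip_code, c["zip"]))[:k]
    national_volume = mean(c["n_reviews"] for c in chains)
    national_stars = mean(c["avg_stars"] for c in chains)
    local_volume = mean(c["n_reviews"] for c in nearest)
    local_stars = mean(c["avg_stars"] for c in nearest)
    return local_volume / national_volume, local_stars - national_stars

def adjust_bse(bse, volume_factor, star_offset):
    """Deflate review counts where Yelp is heavily used (a factor near
    6 around San Francisco), inflate them where it's sparse (about 1/17
    in Huntington, W.Va.), and counter local grade inflation (+0.5
    stars in Orlando) or deflation (-0.4 stars in ZIP 90021)."""
    return {"n_reviews": bse["n_reviews"] / volume_factor,
            "avg_stars": bse["avg_stars"] - star_offset}
```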
I made two other minor adjustments to the Yelp ratings. One is for how recently a restaurant opened: Accumulating 100 reviews in six months is more impressive than doing so in six years.10 Another rewards restaurants that have a higher share of 5-star reviews: My analysis of the Phoenix data found that the proportion of 5-star reviews favorably predicted future star ratings, even after accounting for the average star rating.11
The best burritos according to VORB
Another similarity between VORB and VORP is that both rely on the notion of “replacement level.” In baseball, a replacement-level player is one who is just on the fringe of being a major leaguer — perhaps a third baseman who hits about .245. The analogous concept in the restaurant industry would be an establishment that is just on the fringe of being a viable business. Imagine a generic-looking Chinese takeout place, serving plates of General Tso’s chicken to a lonely customer or two.
On Yelp’s rating scale, replacement level appears to be somewhere near 3.3 stars. In the Phoenix data, 3.3 stars represented the breakpoint in predicting future star ratings. That is to say, ratings above 3.3 stars tended to favorably predict future star ratings, while those below 3.3 stars tended to negatively predict them.12 Also, average reviews for lightly reviewed businesses — like the Chinese restaurant I asked you to imagine — tend to cluster around 3.3 stars.
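With a replacement level in hand, and purely as an illustration of the VORP analogy (the actual derivation is in footnote 13), a stripped-down VORB might multiply quality above replacement by a dampened measure of review volume:

```python
import math

REPLACEMENT_LEVEL = 3.3  # stars; the break-even point in the Phoenix data

def vorb(adj_stars, adj_reviews):
    """Illustrative only. Quality above replacement, scaled by the log
    of adjusted review volume so that volume has diminishing returns.
    The actual formula also folds in the recency and five-star-share
    adjustments described above."""
    return max(adj_stars - REPLACEMENT_LEVEL, 0.0) * math.log1p(adj_reviews)

# El Farolito's raw numbers as of Feb. 1 (before regional adjustment):
print(vorb(4.25, 1840))  # ~7.1 on this toy scale
```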
I suspect you’re hungry for some data. (A more complete derivation of the VORB formula can be found in the footnotes.13) The BSEs with the 20 highest VORB scores in the country are as follows:
The list is heavy on California restaurants, despite the regional adjustment that corrects for the fact that California businesses are heavily reviewed by Yelpers. The top-rated burrito-selling establishment, according to VORB, is El Farolito in San Francisco’s Mission District. As of Feb. 1, it had 1,840 reviews that mentioned burritos, with an average rating of 4.25 stars. Only Lucha Libre Gourmet Taco Shop in San Diego had more burrito reviews, and its star rating was not as strong.
El Farolito’s presence on this list was no surprise to us. Last year, it won Esquire’s billing as “the most life-changing burrito in America.” I’ve been there myself a few times and can vouch for it serving a darned good burrito. But “life-changing”? I’d want to have a lot more burrito-eating experiences — in different parts of the country — before conceding the top spot to El Farolito.
That’s what Anna is going to do. This list of 20 restaurants ought to make a very good first cut. You’d do much better to walk into any of them than to pick a burrito place at random. But part of our purpose is to evaluate the reliability of crowdsourced reviews, rather than to take them for granted. Our test will be to taste the results for ourselves. Anna will visit most of the restaurants you see on the top 20 list14 and a number of others in the top 50 or the top 100. She’ll also be visiting some others with average or even poor VORB scores. We’ll see how many false positives and false negatives there are.
Step 2: Burrito Selection Committee
I’m a fan of sites like Yelp, but they have some complicated aspects, such as that previous reviews can potentially influence future reviews. (I’d recommend Duncan Watts’ book “Everything is Obvious” for some research on this.) Moreover, Yelp reviewers are within their rights to consider a restaurant’s price, service and decor in addition to the quality of its food. There might also be other biases: A restaurant known to serve margaritas to underage patrons might receive good ratings for all the wrong reasons. We’re concerned only with the food.
It’s for reasons like these that we formed the Burrito Selection Committee, rather than simply sending Anna to the restaurants with the highest VORB scores. Sixty-four burritos is a lot, but not compared to a field of more than 67,000 candidates. We hope the committee can prevent Anna from wasting time at boozy brunch joints that have little chance of serving America’s best burrito, or missing those that Yelpers downrate because of the surly service but where the burritos are top-notch.
Finally, there are still a few parts of the country — the rural South and Southwest, predominantly Spanish-speaking neighborhoods in mid-sized cities — where Yelp usage is so sparse as to not provide much of a signal at all. The committee wanted a few representatives from these locales.
Professional reviewers can have their own biases, of course. Sometimes this is because they are after different things — Oklahoma Joe’s Barbecue, which is located in a gas station in Kansas City, Kansas, is rated as the third-best restaurant in the country, according to Yelp. Oklahoma Joe’s is, in fact, a life-changing experience,15 but not likely to make the Michelin Guide.
Burritos represent less of a trade-off between fancy and folksy food — they’re almost always on the unpretentious side of the scale. Nonetheless, there were some restaurants in the VORB Top 20 that our panelists insisted were mediocre at best, and others that they thought Yelpers were crazy for rating so poorly. In these cases, was the crowd wrong, or were our panelists snobs? Anna will be taste-testing her way to a view on that question.
The regions
Before I introduce the members of the panel, a little more about the bracket itself. The first thing I did, even before receiving the data from Yelp, was to divide the country into four regions, as the NCAA basketball tournament does. The idea was to split the 50 states and the District of Columbia into four regions of roughly equal burrito strength. These divisions were based on a combination of three factors, weighted equally: (i) the Mexican-American population in each state; (ii) the number of Mexican restaurants in each state; and (iii) the relative popularity of the word “burrito” as a Google search term in each state, multiplied by the state’s population. (A sketch of how such an index might be computed appears after the list.) The regions I came up with were as follows:
California, the Burrito State;
The Northeast, a burrito-sparse region that stretches all the way from Missouri to Maine16;
The South, including parts of Texas and Oklahoma;
The West, excluding California, but otherwise consisting of the western half of the continental U.S., plus Alaska and Hawaii.
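As promised, here is a sketch of how the equal-weight index behind these divisions might be computed. The field names are placeholders, and standardizing each factor before averaging is my illustrative choice:

```python
from statistics import mean, stdev

FACTORS = ["mex_am_population",    # Mexican-American population
           "mexican_restaurants",  # number of Mexican restaurants
           "burrito_search_pop"]   # Google interest in "burrito" x population

def burrito_strength(states):
    """Equal-weight index per state. Each factor is z-scored across
    states so that no single factor dominates, then the three are
    averaged. States can then be grouped into four regions of roughly
    equal total strength."""
    scores = {name: 0.0 for name in states}
    for factor in FACTORS:
        values = [s[factor] for s in states.values()]
        mu, sigma = mean(values), stdev(values)
        for name, s in states.items():
            scores[name] += (s[factor] - mu) / sigma / len(FACTORS)
    return scores
```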
Texas and Oklahoma were the only states split across regions; I messed with Texas because failing to do so would have produced a potentially imbalanced map. (Specifically, the states are split at 97°24’ West longitude, which is close to the center of population of each state. Dallas, Houston and Tulsa are east of this line and placed in the South region; Austin, San Antonio, El Paso and Oklahoma City are west of it and are in the West region.)

As it turns out, the regions may be somewhat imbalanced anyway. Based on the cumulative VORB of all BSEs in each region, California has more than twice as much burrito goodness as the West and Northeast, and about four times more than the South.
One reason may be that burritos figure especially prominently in Cal-Mex cuisine (as opposed to Tex-Mex cuisine, which is more focused on tacos, enchiladas, fajitas and so forth). If we were basing the regions solely on VORB, a roughly even division might be: (i) Northern California; (ii) Southern California; (iii) everywhere west of the Mississippi, except California; and (iv) everywhere else.
But it could also be that there’s something about Yelp reviews, or about the VORB calculations, that biases them in favor of California. (For instance, the regional adjustments in VORB might be suspect.) Moreover, there is something to be said for burrito diversity: Does the 17th-most-promising burrito in California deserve to go in ahead of the best one in Alabama? For now, California looks like the grupo burrito de la muerte — but we’re curious to see whether it completely dominates the tournament or winds up being overhyped.
The bracket
In contrast to the NCAA basketball tournament, which requires six rounds to pare 64 teams to one champion, the Burrito Bracket will consist of three rounds. In each round, Anna will choose one BSE from among four to advance to the next stage.
The first round consists of intra-region competition and will reduce 64 competitors to 16. Within each region, the top four entrants were ranked and seeded by the selection committee, while the other 12 contenders went unseeded. Somewhat like in a tennis tournament, the seeded BSEs are protected from facing one another in the first round. Instead, they are grouped into pods with three unseeded BSEs; Anna and I chose these groupings based on geography and other thematic factors.
In the second round, the surviving BSEs will again be matched up four at a time. In contrast to the first round, the second round will consist of competition across regions. (This will be when California has a chance to assert itself.) The total number of BSEs will be reduced from 16 to four.
The third round simply consists of determining one national champion burrito from among the final four.
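Structurally, then, the whole tournament is three rounds of pods of four. A sketch (this ignores the seeding and regional constraints, and pick_winner stands in for Anna's taste test):

```python
def run_bracket(bses, pick_winner):
    """Reduce 64 BSEs to a champion in three rounds: 64 -> 16 -> 4 -> 1.
    Each round groups the survivors into pods of four and advances one
    winner per pod."""
    while len(bses) > 1:
        pods = [bses[i:i + 4] for i in range(0, len(bses), 4)]
        bses = [pick_winner(pod) for pod in pods]
    return bses[0]
```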
Members of the committee
The Burrito Selection Committee consisted of six members: Anna, me and one representative from each of the four regions:
Gustavo Arellano represents California. He is an editor at OC Weekly and has a weekly syndicated column called “Ask A Mexican.” He is the author of “Taco USA: How Mexican Food Conquered America.”
Jeffrey Pilcher, representing the West, is the author of “Planet Taco: A Global History of Mexican Food” and a professor of history at the University of Minnesota.
David Chang is the Northeast correspondent. He was the winner of the 2013 James Beard Foundation Outstanding Chef award and is the founder and owner of the Momofuku Restaurant Group.17
Finally, Bill Addison represents the South. He recently became the restaurant editor at Eater and was formerly at Atlanta Magazine and the San Francisco Chronicle, where he once ate 100 burritos in 10 weeks.
Our deliberations
Anna will be introducing the committee members to you at more length in the coming days. She’ll also be describing the deliberation process and its results in some detail, so what follows is just a quick summary.
The committee met in New York in early March.18 Before the meeting, each regional representative prepared a cheat sheet listing potential entrants from his region. Anna and I encouraged the panelists to look at VORB scores along with other factors, including other crowdsourced review sites (e.g. TripAdvisor), professional reviews and “best of” lists, and first-hand knowledge or experience.
The degree to which the committee members weighed the Yelp ratings against the other factors varied. Some of that was smoothed out during the deliberation process, where all six members had a vote. (I generally played the role of advocating for more rather than less emphasis on the VORB scores.) But each region contains a mix of BSEs that were rated very highly by VORB and others that were not.
We told the committee members to focus on evaluating the potential quality of the burritos themselves — including their taste, the quality of the ingredients and the balance of flavors — while ignoring price, service, atmosphere and the quality of non-burrito dishes on the menu. We also told them to look for evidence of a “house specialty,” meaning a particular type of burrito that receives especially rave reviews. (Technically speaking, we are hoping to find the best burrito in America rather than the best taqueria or burrito-selling establishment. If it’s clear, for example, that the carnitas burrito is amazing at Barney’s Burrito Castle, we shouldn’t care that Barney also serves a mediocre chicken burrito.)
We also told the committee to consider, as a secondary factor, the representativeness and diversity of their lists, such as by geography or by different styles of cuisine. The purpose of this was not to engage in “burrito affirmative action” (as one of our panelists described it) but instead to avoid duplicative or “second-best” experiences.19 The committee generally preferred to send Anna to high-risk, high-upside BSEs over those that were thought to be very good but which had little chance of providing the very best burrito in the United States.
The voting process itself was inspired by the one the NCAA uses to select and seed the annual men’s and women’s college basketball tournaments. There were four rounds of balloting in each region, each of which permitted four BSEs at a time into the bracket. Deliberations took place between each round of voting, with committee members lobbying for their favorites. After the initial set of 16 BSEs was selected for each region, the committee was allowed to override its previous selections by majority vote, which it sometimes did, such as to correct for a perceived imbalance of representation among different parts of the region.
The resulting list was diverse. We wound up with representatives from every corner of the country — Maine, Key West, Florida, Seattle and Hawaii — along with “flyover” states from Iowa to Idaho.
Step 3: Taste test
Anna has already made her initial visit to many of the BSEs, and has come up with a 5-factor system to rate their burritos on a scale of 0 to 100. She and I also agreed upon several additional ground rules before she began her visits:
She should seek to identify the “house specialty” burrito at each restaurant, consulting the chefs or owners for advice where necessary (her Spanish fluency has proved helpful in some cases).
She should attempt to consume at least half of the burrito unless it literally or figuratively proves to be inedible.
She should concentrate on the quality of the burrito alone rather than other factors.
She should judge each BSE based on its performance on the day of her visit, rather than hedging it against her previous expectations.
She should attempt to avoid experiences that would not be available to an anonymous patron — although we recognize that this will become harder as the identity of the list becomes more public.
Anna has been having a lot of fun on her travels so far. Now it’s time to hand the project over to her; we hope you’ll follow along the way.