Fivethirtyeight.com

How FiveThirtyEight Is Forecasting The 2017 NCAA Tournament

2017-03-13

Editor’s note: This article is an adapted version of one we published last year about how our March Madness predictions work.

Welcome to FiveThirtyEight’s March Madness predictions of the men’s and women’s NCAA basketball tournaments. We’ve been issuing probabilistic March Madness forecasts in some form since 2011, when FiveThirtyEight was just a couple of us writing for The New York Times.

Here’s how we computed everything in this year’s forecast.

Live win probabilities

Our interactive graphic will include a dashboard that shows the score and time remaining in every game as it’s played, as well as the chance that each team will win that game. These probabilities are derived using logistic regression analysis, which lets us plug the current state of a game into a model to produce each team’s probability of winning. Specifically, we used play-by-play data from the past five seasons of Division I NCAA basketball to fit a model that incorporates:

Time remaining in the game

Score difference

Pre-game win probabilities

Which team has possession, with a special adjustment if the team is shooting free throws.

These in-game win probabilities won’t account for everything. If a key player has fouled out of a game, for example, his or her team’s win probability is probably a bit lower than we’ve listed. There are also a few places where the model experiences momentary uncertainty: In the handful of seconds between the moment when a player is fouled and the free throws that follow, we use the team’s average free-throw percentage. Still, these probabilities ought to do a reasonably good job of showing which games are competitive and which are in the bag.
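A minimal sketch of how a logistic model could turn the four inputs listed above into a win probability. The coefficients, the functional form (the pre-game prior fading as the clock runs down, the score difference scaled by time remaining) and the function names are all assumptions made for illustration; the article does not publish the fitted model.

```python
import math

# Hypothetical coefficients for illustration only; the fitted values are not
# given in the article.
B_PREGAME = 1.0      # weight on the pre-game log-odds
B_SCORE = 0.45       # weight on the time-scaled score difference
B_POSSESSION = 0.15  # bump for the team holding the ball


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def in_game_win_prob(score_diff, seconds_left, pregame_prob,
                     possession=0, pending_free_throws=0, ft_pct=0.7,
                     game_seconds=2400):
    """Estimate team A's chance of winning from the current game state.

    score_diff: team A's points minus team B's points.
    possession: +1 if team A has the ball, -1 if team B does, 0 if unknown.
    pending_free_throws: free throws about to be shot by the team with the
        ball, credited at that team's average percentage (ft_pct).
    """
    # Credit pending free throws at the shooter's average rate (assumed form).
    expected_diff = score_diff + possession * pending_free_throws * ft_pct
    minutes_left = seconds_left / 60.0
    frac_left = seconds_left / game_seconds
    z = (B_PREGAME * logit(pregame_prob) * frac_left   # prior fades over time
         + B_SCORE * expected_diff / math.sqrt(minutes_left + 1.0)
         + B_POSSESSION * possession)
    return 1.0 / (1.0 + math.exp(-z))
```

With these made-up coefficients, a 10-point lead with one minute left comes out at roughly 96 percent for evenly matched teams, versus roughly 67 percent for the same lead with 40 minutes left, which is the qualitative behavior the text describes.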

We built a separate in-game probability model for the women’s tournament that works in exactly the same way but uses historical women’s data. Thus, we’ll be updating our forecasts live for both the men’s and women’s tournaments.

Excitement index

Our March Madness “excitement index” (loosely based on Brian Burke’s NFL work) is a measure of how much each team’s chances of winning changed over the course of the game and is a good reference for picking the best games to flip to.

The calculation is simple: It’s the average change in win probability per basket scored, weighted by the amount of time remaining in the game. This means that a late-game basket has more influence on a game’s rating than a basket near the beginning of the game. We give additional weight to changes in win probability in overtime. Ratings range from 0 to 10, except in extreme cases where they can exceed 10.
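One way the calculation could look in code, as a rough sketch. The weighting function, the overtime weight and the scale factor below are assumptions; the article gives only the general recipe (average swing per basket, with late and overtime baskets counting more).

```python
def excitement_index(start_prob, baskets, regulation_seconds=2400.0,
                     overtime_weight=3.0, scale=100.0):
    """Weighted average swing in win probability per basket.

    start_prob: team A's win probability at tip-off.
    baskets: iterable of (seconds_left, in_overtime, prob_after) tuples in
        game order, where prob_after is team A's win probability after
        the basket.
    overtime_weight and scale are hypothetical tuning constants.
    """
    weighted_swings = 0.0
    weight_total = 0.0
    prev_prob = start_prob
    for seconds_left, in_overtime, prob_after in baskets:
        swing = abs(prob_after - prev_prob)
        if in_overtime:
            weight = overtime_weight
        else:
            # Weight grows from 1 at tip-off to 2 at the final buzzer.
            weight = 1.0 + (1.0 - seconds_left / regulation_seconds)
        weighted_swings += weight * swing
        weight_total += weight
        prev_prob = prob_after
    if weight_total == 0.0:
        return 0.0
    return scale * weighted_swings / weight_total
```

Under this weighting, a game whose biggest swing comes in the final minutes scores higher than one with the same swings in the opposite order.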

Elo ratings

The methodology for our men’s forecasts is largely the same as last year’s. But we’ve developed our own computer rating system, Elo, which we include along with the five computer rankings and two human rankings we used previously.

If you’ve followed FiveThirtyEight, you’ll know that we’re big fans of Elo ratings, which we’ve introduced for the NBA, the NFL and other sports. We’ve now applied them for men’s college basketball teams dating back to the 1950s, using game data from ESPN, Sports-Reference.com and other sources.

Our methodology for calculating these Elo ratings is very similar to the one we use for the NBA. They rely on relatively simple information: specifically, the final score, home-court advantage and the location of each game. (College basketball teams perform significantly worse when they travel a long distance to play a game.) They also account for a team’s conference (at the beginning of each season, a team’s Elo rating is regressed toward the mean of other schools in its conference) and for whether the game was an NCAA Tournament game. We’ve found that historically, there are actually fewer upsets in the NCAA Tournament than you’d expect from the difference in teams’ Elo ratings, perhaps because the games are played under better and fairer conditions in the tournament than in the regular season. Our Elo ratings account for this and also weight tournament games slightly more heavily than regular-season ones.
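For readers unfamiliar with Elo, here is a minimal sketch of the textbook update, plus the preseason regression toward a conference mean described above. The K-factor, home-court bonus, tournament weight and regression fraction are placeholder values, and the sketch leaves out the margin-of-victory and travel-distance pieces; it is not the exact formula behind the ratings in this article.

```python
def expected_score(rating_a, rating_b, home_edge=0.0):
    """Standard Elo win expectancy for team A; home_edge is added to A."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b + home_edge) / 400.0))


def update_ratings(rating_a, rating_b, a_won, k=20.0, home_edge=0.0,
                   tournament=False, tournament_weight=1.1):
    """Return the new (rating_a, rating_b) after one game.

    k, home_edge and tournament_weight are hypothetical constants.
    """
    exp_a = expected_score(rating_a, rating_b, home_edge)
    # Tournament games move ratings a bit more than regular-season games.
    k_eff = k * tournament_weight if tournament else k
    delta = k_eff * ((1.0 if a_won else 0.0) - exp_a)
    return rating_a + delta, rating_b - delta


def regress_to_conference(rating, conference_mean, fraction=0.5):
    """Pull a rating part of the way toward its conference mean before
    a new season starts (fraction is a hypothetical value)."""
    return rating + fraction * (conference_mean - rating)
```

As a usage example, `expected_score(2142, 1454)` gives the higher-rated team a win expectancy of about 0.98 under the standard 400-point scale, before any of the adjustments described in the text.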

Elo ratings for the 68 teams that qualified for the men’s tournament follow.

| TEAM | REGION | SEED | ELO RATING | COMPOSITE RATING | PROBABILITY OF FINAL 4 (%) | PROBABILITY OF CHAMPS (%) |
|---|---|---|---|---|---|---|
| Villanova | East | 1 | 2142 | 95.2 | 40.2 | 15.0 |
| Gonzaga | West | 1 | 2029 | 93.7 | 41.5 | 13.8 |
| Kansas | Midwest | 1 | 2058 | 92.2 | 38.0 | 10.4 |
| Kentucky | South | 2 | 2054 | 92.3 | 30.2 | 8.2 |
| North Carolina | South | 1 | 2030 | 91.7 | 29.9 | 7.0 |
| Duke | East | 2 | 2044 | 92.3 | 23.7 | 6.7 |
| Louisville | Midwest | 2 | 1978 | 90.8 | 21.6 | 5.0 |
| Arizona | West | 2 | 2038 | 89.0 | 16.1 | 4.4 |
| West Virginia | West | 4 | 1966 | 90.8 | 14.7 | 3.5 |
| UCLA | South | 3 | 1965 | 88.0 | 9.8 | 2.5 |
| Virginia | East | 5 | 1924 | 90.0 | 9.6 | 2.5 |
| Saint Mary’s (CA) | West | 7 | 1888 | 87.4 | 11.8 | 2.1 |
| Purdue | Midwest | 4 | 1932 | 88.6 | 10.6 | 2.0 |
| Wichita State | South | 10 | 1972 | 88.9 | 8.4 | 2.0 |
| Southern Methodist | East | 6 | 2019 | 88.4 | 7.2 | 1.7 |
| Iowa State | Midwest | 5 | 1959 | 87.9 | 9.0 | 1.7 |
| Baylor | East | 3 | 1925 | 87.7 | 6.4 | 1.4 |
| Oregon | Midwest | 3 | 2026 | 87.3 | 6.6 | 1.2 |
| Butler | South | 4 | 1892 | 86.5 | 8.6 | 1.1 |
| Florida | East | 4 | 1946 | 87.8 | 5.7 | 1.1 |
| Florida State | West | 3 | 1897 | 87.2 | 7.0 | 1.0 |
| Cincinnati | South | 6 | 1903 | 87.4 | 5.3 | 0.9 |
| Wisconsin | East | 8 | 1874 | 87.8 | 4.4 | 0.9 |
| Michigan | Midwest | 7 | 1968 | 86.9 | 5.0 | 0.8 |
| Notre Dame | West | 5 | 1932 | 86.7 | 3.9 | 0.6 |
| Creighton | Midwest | 6 | 1887 | 84.4 | 2.8 | 0.4 |
| Oklahoma State | Midwest | 10 | 1863 | 84.7 | 2.0 | 0.3 |
| Miami (FL) | Midwest | 8 | 1867 | 84.6 | 1.6 | 0.2 |
| Arkansas | South | 8 | 1827 | 83.2 | 1.7 | 0.2 |
| Vanderbilt | West | 9 | 1816 | 83.8 | 1.3 | 0.1 |
| Rhode Island | Midwest | 11 | 1847 | 84.0 | 1.3 | 0.1 |
| Kansas State | South | 11 | 1745 | 83.1 | 0.8 | 0.1 |
| South Carolina | East | 7 | 1745 | 83.1 | 1.1 | 0.1 |
| Seton Hall | South | 9 | 1864 | 83.0 | 1.2 | 0.1 |
| Dayton | South | 7 | 1800 | 82.8 | 1.1 | 0.1 |
| Marquette | East | 10 | 1830 | 83.0 | 0.9 | 0.1 |
| Michigan State | Midwest | 9 | 1791 | 82.8 | 1.0 | <0.1 |
| Wake Forest | South | 11 | 1797 | 83.0 | 0.7 | <0.1 |
| Xavier | West | 11 | 1773 | 82.3 | 0.9 | <0.1 |
| Virginia Commonwealth | West | 10 | 1823 | 82.9 | 0.9 | <0.1 |
| Middle Tennessee | South | 12 | 1816 | 81.3 | 1.2 | <0.1 |
| Maryland | West | 6 | 1754 | 82.5 | 0.9 | <0.1 |
| Northwestern | West | 8 | 1764 | 82.6 | 0.8 | <0.1 |
| Minnesota | South | 5 | 1827 | 81.2 | 1.0 | <0.1 |
| Providence | East | 11 | 1805 | 81.8 | 0.3 | <0.1 |
| Southern California | East | 11 | 1764 | 81.2 | 0.2 | <0.1 |
| Nevada | Midwest | 12 | 1827 | 80.7 | 0.2 | <0.1 |
| Princeton | West | 12 | 1824 | 80.0 | 0.2 | <0.1 |
| North Carolina-Wilmington | East | 12 | 1798 | 80.2 | 0.2 | <0.1 |
| Virginia Tech | East | 9 | 1822 | 80.0 | 0.1 | <0.1 |
| Vermont | Midwest | 13 | 1786 | 79.5 | 0.1 | <0.1 |
| Bucknell | West | 13 | 1679 | 77.9 | 0.1 | <0.1 |
| East Tennessee State | East | 13 | 1721 | 78.1 | 0.1 | <0.1 |
| Winthrop | South | 13 | 1664 | 75.5 | 0.1 | <0.1 |
| Florida Gulf Coast | West | 14 | 1619 | 75.8 | <0.1 | <0.1 |
| New Mexico State | East | 14 | 1630 | 75.6 | <0.1 | <0.1 |
| Iona | Midwest | 14 | 1608 | 75.5 | <0.1 | <0.1 |
| Kent State | South | 14 | 1625 | 74.3 | <0.1 | <0.1 |
| Troy | East | 15 | 1643 | 73.3 | <0.1 | <0.1 |
| Northern Kentucky | South | 15 | 1614 | 72.8 | <0.1 | <0.1 |
| South Dakota State | West | 16 | 1624 | 72.8 | <0.1 | <0.1 |
| North Dakota | West | 15 | 1591 | 72.3 | <0.1 | <0.1 |
| Texas Southern | South | 16 | 1502 | 71.0 | <0.1 | <0.1 |
| Jacksonville State | Midwest | 15 | 1548 | 71.2 | <0.1 | <0.1 |
| North Carolina Central | Midwest | 16 | 1513 | 71.0 | <0.1 | <0.1 |
| UC-Davis | Midwest | 16 | 1528 | 69.9 | <0.1 | <0.1 |
| Mount St. Mary’s | East | 16 | 1454 | 69.8 | <0.1 | <0.1 |
| New Orleans | East | 16 | 1524 | 69.2 | <0.1 | <0.1 |

2017 NCAA Tournament team ratings

Note, however, that Elo is still just one of six computer rankings that we use for the men’s tournament. The other five are ESPN’s BPI, Jeff Sagarin’s “predictor” ratings, Ken Pomeroy’s ratings, Joel Sokol’s LRMC ratings, and Sonny Moore’s computer power ratings. In addition, we use two human-generated rating systems: the selection committee’s 68-team “S-Curve”, and a composite of preseason ratings from coaches and media polls. The eight systems — six computer-generated and two human-generated — are weighted equally in coming up with a team’s overall rating.
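Equal weighting amounts to a simple average, as in the sketch below. It assumes each system's rating has already been converted to a common scale, a step this article does not describe; the function name and the input layout are hypothetical.

```python
from statistics import mean


def composite_rating(system_ratings):
    """Equal-weight average of one team's ratings across systems.

    system_ratings: dict mapping a system name (e.g. "elo", "bpi") to that
        system's rating for the team, already on a common scale.
    """
    if not system_ratings:
        raise ValueError("need at least one rating system")
    # Every system counts the same, so the composite is the plain mean.
    return mean(system_ratings.values())
```

Because each of the eight systems carries a weight of one-eighth, no single ranking, Elo included, can move a team's overall rating very far on its own.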

We’ve calculated Elo ratings for men’s teams only. For women’s ratings, we rely on the same composite of ratings systems that we used last year. You can find more about the methodology for our women’s forecasts here.

As has been the case previously, our ratings are also adjusted for travel distance and (for men’s teams only) player injuries. Our injury adjustment has been slightly improved to account for the higher or lower caliber of replacement players on different teams.