Umwbullet.com

Statistical Modeling, Knowing Your Limitations, and Some Reflections

2016-02-14

I tweeted today about the difference between good and bad models, and the importance of recognizing that fancy stats aren’t necessarily good stats. I wanted to write a little bit about that, post some thoughts about understanding the limitations of models, and reflect on my own model’s successes and misses.

With the semi-regular attacks on statistical models in the media, I see the wagons circle on Analytics Twitter™, and understandably so. People work hard on their models, and as far as I can tell the vast majority of people do it for little or no reward beyond Twitter likes and an increased following. When they feel that work (or the similar work of others) is dismissed as something done by weaklings in air-conditioned rooms, it’s easy to feel attacked. That becomes a vicious cycle though, where we end up trusting “stats” in general over anything else, regardless of the method. So when we have a dozen or more prediction models out there, all coming to different conclusions, it’s important to evaluate those models to see which one is closest to the truth.

Everyone is entitled to use their own criteria, and really learning how to evaluate models involves more statistical training than I can provide in a simple blog post. For me, I don’t trust any model that doesn’t post its methods and predictions publicly. There can be some proprietary elements, but if I don’t have enough info to evaluate how someone came to their conclusions I discount that model’s conclusions heavily.[1] I don’t necessarily like models that include salary data, because they over-value teams like Manchester City who overpay for players and they assume that higher-paid players are better. More than anything, I want to see public predictions and some sort of validation of those predictions. Let me see how well your model does, let me see where it succeeds and where it misses, and hopefully learn from that. Again, people can post what they choose, but I try to be as transparent as possible and I think people have responded to that.
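For anyone wondering what “validation” could look like in practice, here is a minimal sketch of one common way to score probabilistic match predictions, the Brier score. This is not how any particular model (including mine) is evaluated; the function names and the forecasts are made up purely for illustration.

```python
from typing import Sequence, Tuple

# Order of the three probabilities in every forecast (an assumption of this sketch).
OUTCOMES = ("home", "draw", "away")


def brier_score(probs: Sequence[float], actual: str) -> float:
    """Multi-class Brier score for one match.

    Sum of squared differences between the forecast probabilities and the
    one-hot actual outcome: 0.0 is a perfect forecast, 2.0 is the worst.
    """
    if actual not in OUTCOMES:
        raise ValueError(f"unknown outcome: {actual!r}")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(
        (p - (1.0 if outcome == actual else 0.0)) ** 2
        for p, outcome in zip(probs, OUTCOMES)
    )


def mean_brier(forecasts: Sequence[Tuple[Sequence[float], str]]) -> float:
    """Average Brier score over a set of (probabilities, actual outcome) pairs."""
    return sum(brier_score(p, a) for p, a in forecasts) / len(forecasts)


# Hypothetical forecasts, NOT real predictions or results.
forecasts = [
    ((0.60, 0.25, 0.15), "home"),  # favourite won: low (good) score
    ((0.50, 0.30, 0.20), "away"),  # upset: high (bad) score
]

print(round(mean_brier(forecasts), 4))
```

Lower is better, and a model that always says 1/3 for each outcome scores about 0.667 per match, so that is a rough baseline any published model should beat.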

I’ve been fortunate to gain a lot of followers very quickly – this was just supposed to be a fun project for me, a way to learn some machine learning techniques, and maybe people would enjoy it. To have picked up 2100 followers in ~6 months is beyond anything I thought would happen, and I couldn’t be more appreciative of all the people who read and share my work. I’m glad people find what I do interesting, and hope to continue over this year. I have needed to adjust my thinking accordingly, and wanted to post those thoughts/concerns publicly so people could evaluate them and my project accordingly.

My goal isn’t to call out any model or stat specifically, so I want to talk about my model for a minute. I don’t do so out of narcissism, and will focus on the “growth opportunities” as much as I do the successes. You can learn both from being right and being wrong, but in many cases the opportunities to improve are in fixing bad predictions rather than congratulating yourself for your correct ones.

I’ve posted this before, but the original goal of my model was to quantify the contribution of individual players. I experimented with some “Points Above Replacement” metrics, but got some pushback, so I put that on the backburner until I could validate the model and confidently assert its value. So I decided to let my pre-season predictions run the entire season and see how well they do. As of last week I’m leading the 90+ entrants in Scoreboard Journalism’s prediction contest, and was the closest to identifying the black swan that is Leicester City by tapping them for 8th place with 60 points. Overall I’m pleased with the model’s results, but I wanted to mention a few caveats and add some cautionary notes on my predictions, so that the people who follow me can better understand what I’m doing. As a reference, here are this week’s predicted probabilities of each team’s final table position.

Arsenal leads the pack with a ~75% probability of finishing first. Leicester City is in second with a ~15% chance, and Spurs and City both have a ~5% chance of winning the league. Arsenal fans should be happy with this, but there are some caveats here. Here is my diagnostic plot, showing MOTSON’s predicted points vs. the actual points earned.

Arsenal, United, Southampton, and Man City are all basically on the regression line, which means MOTSON has predicted their points almost perfectly through 26 weeks. They’ve all hovered around that line all season. United was around +5 for a while, but quickly slipped back to the mean as their form dropped to what it’s been recently. Southampton was around -5, but has improved in recent weeks. Some temporary deviations are to be expected, so what this means is that my model has a really good handle on exactly how good Arsenal, United, Southampton, and City are. When the title race seemed like a two-team race between Arsenal and City, I was very confident in how highly my model rated Arsenal (even when the rest of the world was picking City), and that confidence seems to have been validated recently.
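The idea behind that kind of diagnostic is simple enough to sketch: subtract predicted points from actual points and see who sits far from zero. The snippet below is only an illustration. The team names and point totals are placeholders rather than MOTSON’s numbers, the 5-point threshold is an arbitrary choice, and I’m assuming the sign convention where a positive residual means a team is ahead of its prediction.

```python
from typing import Dict, List


def residuals(predicted: Dict[str, float], actual: Dict[str, float]) -> Dict[str, float]:
    """Actual minus predicted points per team (positive = over-performing)."""
    return {team: actual[team] - predicted[team] for team in predicted}


def flag_deviations(resid: Dict[str, float], threshold: float = 5.0) -> List[str]:
    """Teams whose residual is at least `threshold` points in either direction."""
    return sorted(team for team, r in resid.items() if abs(r) >= threshold)


# Placeholder numbers for illustration only, NOT real predictions or results.
predicted_points = {"Team A": 48, "Team B": 40, "Team C": 35}
actual_points = {"Team A": 49, "Team B": 53, "Team C": 28}

resid = residuals(predicted_points, actual_points)
print(resid)                   # per-team gap between actual and predicted points
print(flag_deviations(resid))  # teams the model may have a poor handle on
```

Teams near zero are the ones the model seems to understand; the flagged ones are where the predictions deserve extra skepticism.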

My model has done better with Leicester than anyone else’s, but it still underestimates their ability by a significant amount. How much, I’m still not entirely sure. It did like Arsenal to beat them at home, which happened, but it also liked City to beat them, which didn’t happen. To be fair, the simple in-season results model liked City in that match as well, so it may have just been an upset, but it’s hard to tell. Regardless, Leicester’s overperformance means the model likely underestimates their “true” ability, which means their predicted likelihood of winning the title is understated. How much? I’m not sure, but I am personally confident the number is higher than 15%.

The same thing goes for Spurs: they’re in the middle of a special season where they’re out-performing expectations. They’re not doing it as much as Leicester obviously, but MOTSON really seems to have underrated them. So their number is probably higher than the 5% chance they’re being given right now, but again, I’m not sure by how much.
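Because the title probabilities have to add up to 100%, any upward revision for Leicester and Spurs has to come out of the other contenders. Here is a rough sketch of that arithmetic. The starting numbers are the approximate ones quoted above; the revised Leicester and Spurs values are purely hypothetical (I’ve said I don’t know the real ones), and scaling the remaining teams proportionally is just one simple assumption.

```python
from typing import Dict


def rescale_others(probs: Dict[str, float], overrides: Dict[str, float]) -> Dict[str, float]:
    """Pin some teams to new probabilities and shrink or grow the rest
    proportionally so that everything still sums to 1."""
    fixed = sum(overrides.values())
    if not 0.0 <= fixed <= 1.0:
        raise ValueError("overridden probabilities must total between 0 and 1")
    free_total = sum(p for team, p in probs.items() if team not in overrides)
    if free_total <= 0:
        raise ValueError("no probability mass left to rescale")
    scale = (1.0 - fixed) / free_total
    return {
        team: overrides[team] if team in overrides else p * scale
        for team, p in probs.items()
    }


# Approximate title probabilities quoted in this post.
model_probs = {"Arsenal": 0.75, "Leicester": 0.15, "Spurs": 0.05, "City": 0.05}

# Hypothetical "what if" values, chosen only to show the mechanics.
what_if = {"Leicester": 0.25, "Spurs": 0.10}

adjusted = rescale_others(model_probs, what_if)
for team, p in adjusted.items():
    print(f"{team}: {p:.1%}")
```

With those made-up inputs Arsenal drops from roughly 75% to roughly 61%, which is the direction of the effect I’d expect even though I can’t pin down its size.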

I’m torn on whether I want to keep presenting the model’s predictions as-is, knowing that the percentages are skewed in Arsenal’s favor. For me, it’s an academic exercise, but it’s taken on more of a following than I anticipated, so I wanted to be transparent about what I think is going on with the model. I’m not altering it from the pre-season, so it comes with certain assumptions: primarily, that Leicester City will play like the 8th best team in the EPL instead of a top 3 team, and that Spurs are a top 6 team but not a top 3 team. Those assumptions affect the model’s predictions, and they’ve become particularly relevant the last couple of weeks, so it’s something people should be aware of when they evaluate my model (and everyone else’s).

I’m confident Arsenal will end the season right around 75 points. I’m not as confident that Leicester will end at 71 (where I’m currently predicting them) or that Spurs will only have 68. The model is presented as such because for this type of work you don’t update just because you want to. That’s not necessarily a bad thing, because it eliminates recency bias. If I had updated whenever I felt like it, I’d have put West Ham in the top 4 early in the year, dropped Southampton into the bottom half of the table six weeks ago, and handed City the title at least a half dozen times (like many other modelers did, by the way). Those would have been big mistakes, and would have happened because I trusted my own (flawed) judgment instead of the model’s. Trust in the numbers, but be aware of their limitations. This applies to any statistics you read, including, but not limited to, mine. I’m just more transparent about it than others.

[1] Personally I think people overvalue the proprietary nature of their models. If you’re truly good at statistical modeling you should be able to just outperform other people regardless of how much you share. I would also never pay anyone who isn’t transparent with how they do things, but who am I to tell people how to earn or spend their money?