2016-10-03

Cathy O'Neil, data scientist and author of Weapons of Math Destruction, talks with EconTalk host Russ Roberts about the ideas in her book. O'Neil argues that the commercial application of big data often harms individuals in unknown ways. She contends that the poor are particularly vulnerable to exploitation. Examples discussed include prison sentencing, college rankings, evaluations of teachers, and targeted advertising. O'Neil calls for more transparency and ethical standards when using data.

Time: 1:11:09

Readings and Links related to this podcast episode

Related Readings

This week's guest:

Cathy O'Neil's Home page

Cathy O'Neil on Twitter.

Cathy O'Neil on Wall St and Occupy Wall Street. Previous EconTalk episode with Cathy O'Neil. February 2013.

This week's focus:

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O'Neil. Amazon.com.

Additional ideas and people mentioned in this podcast episode:

Susan Athey on Machine Learning, Big Data, and Causation. EconTalk. September 2016.

"The Ethical Data Scientist," by Cathy O'Neil. Slate, February 2016.

"U.S. News's corrupt college rankings," by Robert L. Woodbury. Originally published in Connection, the journal of the New England Board of Higher Education. Available at college-advisor.com.

"Why Nate Silver is Not Just Wrong But Maliciously Wrong," by Cathy O'Neil. Nakedcapitalism.com, December 20, 2012.

A few more readings and background resources:

Prisons and the prison system:

What Is the LSI-R? PDF file. The Level of Service Inventory-Revised, used to assess recidivism risk for offenders in Rhode Island. Rhode Island government website.

David Skarbek on Prison Gangs and the Social Order of the Underworld. EconTalk. March 2015.

Paul Robinson on Cooperation, Punishment and the Criminal Justice System. EconTalk. August 2015.

Pettit on the Prison Population, Survey Data and African-American Progress. EconTalk. December 2012.

An Introduction to Value at Risk (VAR), by David Harper. Investopedia, August 21, 2015.

Discrimination, by Linda Gorman. Concise Encyclopedia of Economics.

Lawrence Klein. Biography. Concise Encyclopedia of Economics. Big data proponent in economics, 1940s-1960s.

A few more EconTalk podcast episodes:

Adam D'Angelo on Knowledge, Experimentation, and Quora. EconTalk. August 2016.

Hanushek on Teachers. EconTalk. August 2011.

Highlights

0:33

Intro. [Recording date: August 26, 2016.]

Russ Roberts: One of the great titles of all time, Weapons of Math Destruction. What are they?

Cathy O'Neil: They are algorithms that I think are problematic. And I can define them for you. They have three properties. The first is that they are widespread--which is to say they are being deployed on many, many people to make very important decisions about those people's lives. So it could be how long they go to jail, whether they get a job or not, whether they get a loan. Things that matter to people. That's the first characteristic. The second is that they are secret in some sense: either there's a secret formula, that the people who get scored by these algorithms--usually a scoring system--it's either a secret formula that they don't really understand, or sometimes even a secret algorithm that they don't even know that they're being scored by. And then finally, they are destructive in some way: they have a destructive effect on the people who get badly scored or they sometimes even create feedback loops--pernicious feedback loops--that are overall destructive to society as a whole.

Russ Roberts: Let's talk about those feedback loops, because you give some examples in the book of where I would call it a misunderstanding of a false correlation--or not a false correlation: a correlation that's not causative--is misinterpreted and it feeds back on itself. So, can you give us an example of that?

Cathy O'Neil: Sure. Pretty much every chapter in my book has an example of one of these problematic algorithms. But I guess one of the ones I worry about the most, if we want to jump in, is a family of models, actually, called 'recidivism risk scores,' that judges all across the country--

Russ Roberts: That's 'recidivism,' right?

Cathy O'Neil: Recidivism risk, yeah.

Russ Roberts: The risk of getting back on the bad side of the law and ending up in jail, for example.

Cathy O'Neil: Right. So they are basically--they are scores given to people who are entering jail or prison. And 97% of people eventually leave. So the question is: How likely is this person to return? And so these algorithms measure the likelihood for a given criminal defendant to return. And they are given, like, basically--there are categories: either it's low risk, medium risk, or high risk. And that score is given to the judge in sentencing. Or, sometimes in paroling, or even in setting bail. But I'll focus on the sentencing. So, it might not be obvious, and it's actually not obvious. We can talk about it. But if you are at higher risk of recidivism, then the judge tends to sentence you for longer. And so we can get into what I think is problematic about the scoring systems themselves. But let me just discuss the feedback loop. The feedback loop here, which I consider extremely pernicious, is that when you are put in jail for longer, then by the time you get out of jail, you typically have fewer resources and fewer job prospects, and you are more of an outsider--more isolated from your community, you have fewer community ties. And you end up back in jail. So it's a kind of--it creates its own reality. By being labeled high risk, you become high risk. If that makes sense.

Russ Roberts: Yeah. So, that's a theory--right?--the idea that prison is not much of a rehabilitation experience and that in fact it could be the opposite. Right? It could be an opportunity if you spend more time with people who, instead of making you a more productive person in legal ways, make you a more productive person in illegal ways when you do get out. Do we know anything about whether that's true? It's a hard question to answer.

Cathy O'Neil: There certainly have been studies to this effect. And, by the way, I'm not claiming that this is inherently true. I mean, it's theoretically possible for prisons to be wonderful places where people have resources and they learn--you know, they go to college and they end up, because they spent a full 4 years there instead of 3, they end up with a college degree. And it actually improves their life after prison. But the studies that we know about don't point to that.

5:32

Russ Roberts: Okay. So carry on. But that's a fact of--that's an issue of how, whether prison sentences should be structured the way they are, and of what the experience of being in prison should be like. Some would argue it could have a deterrent effect; maybe it's not in practice. But how does the data part of this interact--the riskiness and the length of the sentence--to have a feedback loop that's pernicious?

Cathy O'Neil: Right. So, the scores themselves are calculated in problematic ways. So the first thing to understand about these scoring systems is that they basically--there's two types of data that go into the recidivism risk scores. The first is interactions with the police. And the second is kind of questionnaires that most of these scoring systems have. And then they use all of this information--the kind of police record with the answers to the questions--and they have a logistic model that they train to figure out the risk of coming back to jail.

Russ Roberts: A logistic model is just a technical style of--an attempt to isolate the impact of the individual variables in this kind of 1-0 setting: Come back or not come back.

Cathy O'Neil: Right. Well, it's actually a probability, but you have a threshold. If it's above, like, 65% or something, you'll say this person is likely to come back. I don't know the exact thresholds they set. Nor do I actually have a problem with using a logistic regression. I don't even have a problem with calculating this probability. What I have a problem with is, sort of, the interpretation of the score itself. So, to be clear, we have to take a step back and understand how data and the justice system work, and what kind of data we are talking about here. And so, you know, everybody who has been alive for the last few years has looked around and seen all these, you know, Black Lives Matter movement issues. A lot of reports--the Ferguson Report, the recent Baltimore Report, the Chicago police report--all point to police practices which, at the very least, we can all agree are uneven. So there's much more scrutiny of poor and minority neighborhoods. There's just many, many more police interactions in those communities. Um, which leads to an actually biased data set coming out of that practice. So, I already have a problem with that kind of data going into these recidivism risk scores. And--I just want to be up front about this--I want to make the point that if we were only taking into consideration violent crimes, I would have less of a problem. But we're not. We're taking into consideration a lot of things that we'd consider broken-windows-policing-type interactions with the police.
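[To make the mechanics concrete, here is a minimal sketch in Python of the kind of model being described: counts from a police record plus questionnaire answers go into a logistic regression, and the predicted probability is bucketed into the low/medium/high label a judge sees. The feature names, the made-up data, and the cutoffs are all hypothetical; this is not the LSI-R or any vendor's actual model.]

```python
# Illustrative sketch only -- not the LSI-R or any actual vendor model.
# Features, training data, and thresholds are all invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.poisson(2, n),       # count of prior police interactions
    rng.integers(0, 2, n),   # questionnaire: "high-crime neighborhood?" (0/1)
    rng.integers(0, 2, n),   # questionnaire: "family had police contact?" (0/1)
])
y = rng.integers(0, 2, n)    # observed re-arrest, 1/0 (fabricated labels)

model = LogisticRegression().fit(X, y)

def risk_category(features, low=0.35, high=0.65):
    """Bucket the predicted probability into the label the judge sees."""
    p = model.predict_proba([features])[0, 1]
    return "low" if p < low else ("medium" if p < high else "high")

# Note that the questionnaire items act as proxies for class and race,
# which is exactly the objection O'Neil raises above.
print(risk_category([3, 1, 1]))
```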

Russ Roberts: Explain what that is.

Cathy O'Neil: That's the stuff like nuisance crimes. Like, having a joint in your pocket. Peeing on the sidewalk. Things that are associated with poverty, more or less. And things for which poor people are much more likely to get in trouble with the police than richer people or whiter people. So, that's one of the problems: the data coming in from the police interactions is biased. The other thing is that often the questions that are asked in the corresponding questionnaire are actually proxies for race and class as well. So, there's a very widespread version of this recidivism risk score called the LSI-R (Level of Service Inventory-Revised). One of the questions on the LSI-R is, you know, 'Did you come from a high-crime neighborhood?' So, it's a very direct proxy. The answer to that question is a very direct proxy for class. There is another question which is, 'Have members of your family historically had interactions with the police?' This is obviously, again--it goes back to: if you are a poor, black person, then the chances of your saying yes to that are much higher. I would also point out that that's a question that would probably be considered unconstitutional if it were asked in open court--if a lawyer said, 'Oh, this person's father was in jail, Judge, so please sentence this person for longer.' That would not fly. But because it's embedded in this scoring system, it somehow gets through. And the reason it gets through is because it's mathematical. People think that because it's algorithmic and because it's mathematical--

Russ Roberts: It's science--

Cathy O'Neil: It's scientific, yes. That they think it's objective and fair by construction. And so, the biggest point of my book is to push back against that idea.

10:20

Russ Roberts: And that's where you and I have tremendous common ground. Right? So, in many ways--we'll turn to some other examples in a minute--but in many ways a lot of the examples that you give are just, to me, really bad social science run amok. Which becomes more possible when there's more data. Which is the world we're increasingly living in--

Cathy O'Neil: Yeah. I would make sure that--right up front that I'm not against using data.

Russ Roberts: I know.

Cathy O'Neil: But I'm not--

Russ Roberts: That's good to say. I know you're not, but it's good to say.

Cathy O'Neil: I'm a data scientist. And I promote good uses of data. What I'm seeing more and more, and the reason I wrote the book, is very unthoughtful uses of data being used in very high impact situations. Unfairly. And so we might agree completely. I don't know if we have disagreement, Russ. But I'm sure you'll find it if we do.

Russ Roberts: We'll dig up some. But it's an interesting example. You are a data scientist. I'm an economist. And of course we're in favor of using data and evidence and facts, but using them well. And using them wisely. It's an interesting challenge, how to react to that: if it becomes increasingly difficult to do that. So, to come to a narrative that you write about as well in the book, which is financial issues: I have friends who argue, 'Well, of course we have to use technical, mathematical measures of risk, because that's the best we can do.' And that's certainly true: That's the best that we can do in most cases. Sometimes. But what if, by putting the risk into this mathematical formulation, you become insensitive to it? You start to think you have it under control? That, psychologically, even though you know it's a flawed measure, and you could list all the assumptions that went into it that you know were not accurate--about, say, the distribution of the error function or the likelihood of a black swan--even though you are totally aware of that, day after day of looking at the data and your model and saying, 'Everything's fine today,' you get lulled into a false sense of security. In which case maybe this is a weapon of math destruction. And it's very difficult for technically trained, rational, left-brained people to say, 'Yeah, I shouldn't overuse that because I'm prone to use it badly.'

Cathy O'Neil: Yeah. You bring up a really important point. I don't have a simple answer to it. But the truth is, it's really difficult even for trained professionals to understand uncertainty on a daily basis. With a lot of these things, the uncertainty is extreme. It's not the same thing as, say, the Value at Risk measure, which can be deceiving, even for people who kind of understand its failings. If that's an example you had in mind.

Russ Roberts: That is what I had in mind.

Cathy O'Neil: I mean, let's just go there. Value at Risk--I was a researcher at RiskMetrics, which kind of developed and marketed and sold Value at Risk. It was clearly flawed. Of course, it was easy for me to say--I actually got there in 2009. But I feel like, if somebody had been in charge of being worried about Value at Risk being misinterpreted, they wouldn't have had to go too far to find the way people were--and I'll use shorthand here--the way people were stuffing risk into the tail in order to game the 95% VaR risk measure. And I don't want to get too wonky here. But the point being that we had a sort of industry standard of worrying about 95% VaR. Sometimes 99%. What that meant was that we never looked further afield than that kind of risk.

Russ Roberts: Right. That's a perfect example. I assume by 95 or 99 you mean 1 in 20 or 1 in 100 chance.

Cathy O'Neil: One in 20. Exactly. The worst return in 20 days.
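[For listeners who want the arithmetic, here is a minimal sketch of historical Value at Risk in the "worst return in 20 days" sense just described. The returns are simulated; a real desk would use its own P&L history, and RiskMetrics-style VaR is typically parametric rather than a simple percentile.]

```python
# Minimal sketch of historical VaR on simulated data -- not RiskMetrics' method.
import numpy as np

rng = np.random.default_rng(1)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=1000)  # made-up daily returns

var_95 = -np.percentile(daily_returns, 5)   # loss exceeded on roughly 1 day in 20
var_99 = -np.percentile(daily_returns, 1)   # loss exceeded on roughly 1 day in 100
print(f"95% VaR: {var_95:.2%}   99% VaR: {var_99:.2%}")

# The conversational point: nothing in these two numbers says how bad the
# losses are beyond the cutoff, so risk "stuffed into the tail" is invisible.
```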

Russ Roberts: So, when you have a 99 and that's your standard and it never gets close to it, after a while you start to think everything's great. And of course that's not true. Let's go back to the prison example. You are a consulting firm--I assume this is a privately designed, for-money, for-profit measure that some Department of Justice grant has funded or is paying for. And who wants to say, 'Oh [?] I'm not sure we should really use this because it's got all these proxies that might not be accurate for what we're trying to measure. So, I would just use it as a crude rule of thumb. But I wouldn't rely on it.' But that's not really a very good career move. It's not a very good move for a person at the Department of Justice, let alone the consulting firm. So isn't part of the problem here the temptation to soft-pedal the problems in these kinds of models when you are being paid, on either end, as the buyer or the seller?

Cathy O'Neil: I mean, great point. I would even emphasize that in the case of the justice system, what we're dealing with currently is a very, very problematic situation, where judges are probably less reliable than these terrible models. So, in other words, I wouldn't say, 'Hey, let's go to the old days,' when we just relied on judges who were often more racist than the models I'm worried about. What I am worried about--and yes, so that's one thing. The next thing is, 'Yes, I built a model but it's not very good.' Right? No one wants to say that.

Russ Roberts: 'But it's still a bargain. You got a good deal, trust me. It's great for what it is.'

Cathy O'Neil: That actually is the context for the--they could probably honestly say, 'I built a model and it's better than what you have.' Right? Yeah. And there's another thing going on, by the way. I interviewed somebody, like, you know, on background, who is a person who models, who builds recidivism risk models. And I asked him what the rules were around his models. And in particular I said, 'Well, would you use race directly as an attribute in this logistic regression?'

Russ Roberts: Let me guess.

Cathy O'Neil: And he said, 'Oh, no, no, I would never use race--'

Russ Roberts: Of course not--

Cathy O'Neil: 'because that would be--that would cause racial disparities in the results, in the scoring.' And I said, 'Well, would you ever use zip code?' And he said, 'Yeah, maybe.' Well, that's a proxy for race. In a segregated country like ours, what's really the difference? And he said, 'Yeah, no, you're right, but it's so much more accurate when you do that.' It is more accurate. But what does that mean? When you think about it, what that means is, well, police really do profile people. So, yes, it is really more accurate. In other words, this doesn't--we want mathematical algorithms and scoring systems to simplify our lives. And some of them do. Like, I'll tell you one of my favorite scoring systems. If you've visited New York City, it's the restaurant grades. You know, there's a big sign, a big piece of paper in every restaurant window saying, you know, what their score was the last time the Sanitation Department came and checked out their kitchen. And you know not to go to a restaurant that doesn't have an A grade. Right? Why does that work so well? Because it simplifies a relatively thorny and opaque question, which is: Is this a hygienic restaurant? And we don't know if it's a perfect system. But it does really have this magic bullet feel to it, which is: That's all I need to know. Thank you.

Russ Roberts: Well, we know it's not a perfect system because on the night you ate there maybe the people didn't wash their hands that day; and it was three weeks after the inspector and everybody's falling back into [?] behavior--

Cathy O'Neil: Of course, of course. Absolutely.

Russ Roberts: You raise an important issue throughout your book, which is: These kinds of simple indices, like, what's the probability of recidivism--which is a big, complicated thing, obviously, that's very person-dependent, but we're going to simplify it as a function of 8 variables. The same thing is true of the grade from the Department of Health. The problem with a lot of these is of course that they can be gamed by the people being scored to achieve a high score that doesn't represent high quality.

Cathy O'Neil: So, it can be. And actually there was an interesting blog post about the prevalence of restaurant scores--so they started out as numbers, I guess, and then they turned into grades--that are just above the cutoff. So, there is clearly something slightly unstatistical about that. But at the same time, you know--and we also don't really know what we need in a clean restaurant. But it is, crudely put, a good guide for us as consumers.

Russ Roberts: There's some information there. That's what I would say.

Cathy O'Neil: There's some information there. The problem with recidivism scores is what we've done is we've basically given the power to a class of scientists, data scientists, who focus on accuracy only. And, again, when I talked to the person I interviewed, I said, 'You know, is accuracy--is the only thing we care about accuracy?' I would care more about causality, right? And you mentioned the word 'causal.' Like, the question should not be, 'Is this person poor?' or 'Is this a poor minority person?' The question should be, 'Is this person going to commit another crime that we can prevent?' And, like--in other words, they can't do anything about having grown up in a poor neighborhood. For that fact to be used against them doesn't seem right.

20:28

Russ Roberts: I want to dig into this a little deeper, because if things go as planned this episode will air shortly after a conversation with Susan Athey, who is a machine learning econometrician, who makes a distinction in our interview between prediction and causation. And that's what you're talking about, I think--we should clarify this and go a little deeper. When you say 'accurate,' it very well may be the case that people from this particular zip code or people with these characteristics have a higher chance of committing a crime when they come back out of jail. And therefore ending up back in jail. And that would be the "prediction" part: it fits the data well. These characteristics "predict"--they may not predict for this person very well but they do predict for these classes of people--these groups--according to the variables that you've actually measured. And that is not necessarily what we care about in a justice system. Because, I think your argument--correct me if I'm wrong--your argument is that if we observe in these neighborhoods a lot more police presence, we may actually see more kinds of police interactions and even arrests--sometimes for smaller rather than larger crimes--that will confirm the model in the sense that it's "predictive." But it's not really describing the fact that these people are necessarily more likely to be bad people; they are just more likely to get swept up by the police. Is that kind of what you're getting at?

Cathy O'Neil: Yeah. That's a really good description. Let me just reframe that a little bit, which is: I would look at the system as a whole. And it's not just police. It's also the way our jobs work for poor people, or don't work. The way our economy offers opportunities to [?] or doesn't. But I guess the simplest way to put it is that when you give someone a score this way and then you hold them accountable in a certain sense--by which I mean judges actually sentence people to longer if they have higher scores--in a very direct sense you are punishing them for that score. And so you are laying the blame on them. You are pointing a finger at them; you are saying, 'You have a bad score; I'm holding you responsible for that.' And the question is, of course, 'Why do you have a bad score?' Is that because of what you've done?

Russ Roberts: And who you are.

Cathy O'Neil: Or is it because of the police system you live in? Is it because of the economic opportunities you are given or not given because of who you are, how you were born, how you were raised? And the point is that that's a very hard question which I'm not equipped to answer by myself. But I am equipped to say that as a data scientist it should not be my job to decide this.

Russ Roberts: Yeah. I just want to clarify what I said before, because I think it might be somewhat confusing. If I fit the data on what's the probability of somebody coming back into prison, I may have variables in there that correlate with that probability, but they are not causal. It just happens to be the case that, for people from these neighborhoods--because of police presence at a certain time, or different allocations of resources, or whatever it is: school quality--it may turn out to be true. It doesn't imply that this person in particular, when they go back into that neighborhood, will have that experience. Because there could be a correlation that's not causal. And I think that's the distinction that machine learning is unable to make--even though "it fit the data really well," it's really good for predicting what happened in the past, it may not be good for predicting what happens in the future because those correlations may not be sustained.

Cathy O'Neil: And we hope they aren't, in that situation. Let me give you another example; and you said it very well. It's a thought experiment that your listeners might enjoy. I'm imagining that there's a tech company and they want to hire engineers. That happens a lot, actually. And they decide to--they are having trouble finding good engineers, so they want to use a machine learning algorithm to help them sort through resumes. And of course they have their own history of hiring people, and those people either succeeded or they didn't succeed in their company. But they have to define success for this model to sort through the historical data and look for people who look like they have succeeded. That's basically what--when you want to build a model you have to define your data set; you have to say what success looks like; and [?] to feed the algorithm--you should choose an algorithm--but once you've chosen the algorithm you have to tell it, 'Look for this; look for patterns of people that look like this success story.' Now imagine that they define success as someone who has been there for 3 years and has been promoted at least twice. Now imagine that they run this machine learning algorithm; it gets trained on their historical hiring practices; and they set it on the new data set, which is new applications for engineering jobs. And they find that, like, no women get through the filter: that the algorithm literally rejects all the women applicants. What would that mean?

Russ Roberts: It obviously means women aren't good at being engineers.

Cathy O'Neil: I've set it up, an extreme case; probably not happening.

Russ Roberts: Playing straight-person to your--

Cathy O'Neil: Right, right. Thank you: Straight man. I set it up to be extreme, but the point being like the algorithm would not say, 'Hey, you guys should check to make sure your culture is welcoming to women.' Right? It would instead just say, like, 'Women do not succeed at this company; throw them out.'

Russ Roberts: Or it could be that the applicants--there aren't very many women in the data set because you have a poor history in the past and there's a lot of noise in the data, so women are just not matched to those characteristics that you found. But certainly the culture example would be more dramatic, right? If you have a sexist culture, women are going to look like they can't get those promotions, and as a result you are going to be encouraged not to hire them in the future by the machine learning. And then you'll see how smart you were--you'll think you're really smart.

Cathy O'Neil: If you don't like that example--

Russ Roberts: I like that example.

Cathy O'Neil: Well, I'm just going to say, think about Fox News and women anchors. It's not that they don't have any women. It's that the women that they have are pushed out. Right?

Russ Roberts: I don't know if that's true.

Cathy O'Neil: I'm not saying that this is actually happening in a given engineering firm. I'm just making the point that machine learning algorithms are dumb. They don't understand the 'why.' They only understand the 'what happened.'

Russ Roberts: I think that's important to emphasize. There are patterns; sometimes patterns are very dramatic. But that doesn't mean they'll be sustained in the future or that they should be sustained. Right?

Cathy O'Neil: Exactly.
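[The thought experiment above can be made concrete with a toy simulation. Everything here is fabricated: "success" is defined as at least 3 years of tenure and 2 promotions, and the simulated history withholds promotions from women of equal skill, so the trained resume filter simply reproduces that pattern without ever asking why.]

```python
# Toy version of the hiring thought experiment -- all data fabricated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
is_woman = rng.integers(0, 2, n)
skill = rng.normal(size=n)

# Simulated biased history: equal skill, but promotions are rarer for women.
promotions = rng.poisson(np.clip(1.5 + skill - 1.0 * is_woman, 0.1, None))
tenure_years = rng.integers(1, 6, n)
succeeded = ((tenure_years >= 3) & (promotions >= 2)).astype(int)

# Train the "filter" on resume features and the biased success label.
X = np.column_stack([is_woman, skill])
clf = LogisticRegression().fit(X, succeeded)

# Two applicants with identical skill, differing only in gender:
applicants = np.array([[0, 1.0], [1, 1.0]])
print(clf.predict_proba(applicants)[:, 1])  # lower predicted "success" for the woman
```

[The model never sees the culture that produced the labels; it only sees that women did not "succeed," which is the pattern-without-why point made above.]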

27:20

Russ Roberts: A friend of mine worked at a company and said he noticed that everyone there--he was an intern--he said he noticed that everyone there who had a permanent job had only gone to 3 different universities. I don't think that was a coincidence; it was a starting filter for their resumes. And it's not a bad place to start. Obviously there are good universities; I'm not going to name them; I don't remember them, actually. But they were good universities; but that's not necessarily--that's one way to reduce the cost of sifting through a lot of resumes. It's a very crude and perhaps not a terrible way to save time and cost. But as you get to these more sophisticated methods, as you point out, you get this opportunity to draw false conclusions. Right? It's pretty straightforward.

Cathy O'Neil: I mean, it's interesting. Because, you know, it's kind of obvious once you say it. But these algorithms, you know, as sophisticated as they are--and they sometimes are: they use deep learning, they are neural network algorithms--I wouldn't call that 'sophisticated,' but they are certainly unintelligible.

Russ Roberts: They are fancy.

Cathy O'Neil: They don't make moral decisions. They literally only pick up patterns that already exist. So, it would be great--and sort of the Big Data promise is that you throw data against a wall and truth falls out. The Big Data promise is that somehow the truth is embedded in historical practices. But that's only true if historical practices are perfect. So, as soon as we have a firm that has--an engineering firm that has like really mastered what it means to find good engineers--as soon as we have that then we should make a machine learning algorithm to mimic that. But I don't think we have that yet.

29:20

Russ Roberts: And I think the other point you make which I think is important--I'm not sure I agree with it in all the cases you give: there's not always a mechanism for making the model better. So, in the case of the engineers, you'd consistently hire men. You slowly would weed out the women in that case, or you wouldn't hire them to start with. And you'd have a model that you'd be foolishly thinking had worked pretty well, but in fact you've made a mistake. Now, I would argue that firms that do that have an incentive to at least think about whether they are making a mistake: whether their big data models are serving them well. And I think we are in early days. So, one argument would be, against your pessimism about these models, would be, 'Well, we're just starting. Sure, they make some mistakes now but we're going to get better.' In fact, the evangelists would say, 'It's just going to get better and better. Of course they're imperfect.' What are your thoughts on that optimism and pessimism?

Cathy O'Neil: I'm actually one of those people. I know we're going to get better. What I'm trying to point out is that we can't assume we're already good. What I'm objecting to are high-stakes decisions being made when there's no actual check or monitor on the fairness or the actual meaningfulness of the scores themselves. And I say, 'meaningfulness,' because I'm thinking about the teacher-value-added model--

Russ Roberts: I was just going to ask you about that.

Cathy O'Neil: Yeah. I don't think the problem there is discrimination, per se. Like, actually a lot of the teachers are women. It's a very diverse field. There might be some discrimination issues around it. But the biggest problem is that it's not very meaningful. We have these scores that are typically between 0 and 100. And some work has been done to see just how consistent the scores are. And it's abysmal.

Russ Roberts: Let's back up. Put the uses and the Value-added model in context, because listeners won't know what it is. This is an attempt to evaluate teacher quality and use that evaluation to either--typically to fire the worst teachers under various mandates. Right?

Cathy O'Neil: Yeah. It goes back a couple of decades and a few Presidencies. The idea is: Fix education by getting rid of the bad teachers. And we have this myth of these terrible teachers that are ruining education. And I'm not saying there aren't--

Russ Roberts: Yeah; I wouldn't call that a total myth. I think there are some lousy teachers.

Cathy O'Neil: There absolutely are bad teachers; and there are bad schools. But, I'm just claiming--and I'll repeat myself--that, you know, there might be a problem, but if you have a solution that doesn't actually solve the problem then you are getting nowhere. And I think the value-added model for teachers is an example of that. So, what they've done--the first generation of teacher assessment tools was pretty crude and obviously flawed. And that was to sort of just count the number of students in a given teacher's class who, like, were proficient in their subject by the end of the year. And the reason that was super-crude was that essentially performance on standardized tests is highly correlated to poverty. Across the nation. And across the world, in fact. And when you just counted the number of students in a given class that attained proficiency and punished the teachers who had very few of those students, then you were basically punishing teachers of poor students. And it was pretty clear that that wasn't good enough. Like, that wasn't--it wasn't discerning enough as a way of finding bad teachers. Or another way of thinking about it was, 'These kids weren't proficient in Third Grade. Why would they suddenly be proficient in Fourth Grade?'

Russ Roberts: Yeah. You are not controlling for the initial quality of the students that the teachers had to deal with. So that's clearly wrong.

Cathy O'Neil: Exactly. Right. So that's clearly wrong. So, they wanted to do exactly what you just said: they wanted to control for the students themselves. So, what they've developed is what I call a 'derivative model.' It depends on another model, which is in the background, which estimates what a given student should get at the end of their fourth grade year, let's say. And it is based on what they got at the end of third grade--reasonably enough--as well as a few other attributes like what school district they are in, like whether they qualify for school lunches--which is a proxy for poverty. Various things. So, now, just imagine: Everybody in your class--you are a teacher, a fourth grade teacher--everybody in your class has an expected score at the end of the year. What does your score end up being? What's your Value Added score? It's going to be essentially the difference between--the collection of differences, because you have a bunch of students--the differences between what your students actually get versus what they were expected to get. [More to come, 34:19]
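[To spell out the calculation being described, here is a stripped-down sketch: a background model--here, a simple regression on last year's score and a poverty proxy--produces each student's expected score, and the teacher's value-added is the average of actual minus expected over her class. The data, the variables, and the background model are invented for illustration; actual value-added systems differ by district.]

```python
# Stripped-down value-added sketch on invented data -- not any district's model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 5000
prior_score = rng.normal(70, 10, n)      # end-of-third-grade score
free_lunch = rng.integers(0, 2, n)       # poverty proxy
actual_score = 0.9 * prior_score - 3 * free_lunch + 8 + rng.normal(0, 8, n)

# Background model: expected fourth-grade score given prior score and proxy.
background = LinearRegression().fit(
    np.column_stack([prior_score, free_lunch]), actual_score)

def value_added(prior, lunch, actual):
    """Average of (actual - expected) over one teacher's class."""
    expected = background.predict(np.column_stack([prior, lunch]))
    return float(np.mean(actual - expected))

# One hypothetical class of 25 students:
idx = rng.choice(n, 25, replace=False)
print(value_added(prior_score[idx], free_lunch[idx], actual_score[idx]))
# With classes this small the noise term dominates, which is consistent with
# the point above about how inconsistent these scores can be from year to year.
```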
