2016-05-09


What is machine learning? How is it transforming our lives and workplaces? What might the future hold? Pedro Domingos of the University of Washington and author of The Master Algorithm talks with EconTalk host Russ Roberts about the present and future of machine learning. Domingos stresses the iterative and ever-improving nature of machine learning. He is fundamentally an optimist about the potential of machine learning with ever-larger amounts of data to transform the human experience.

Time: 1:05:50


Readings and Links related to this podcast episode

Related Readings


This week's guest:

Pedro Domingos's Home page

Pedro Domingos on Twitter: @pmddomingos.

This week's focus:

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, by Pedro Domingos at Amazon.com.

Machine Learning, by Pedro Domingos. Coursera class, U. of Washington.

Additional ideas and people mentioned in this podcast episode:

Chris Anderson and the Long Tail. EconTalk. August 2006.

Medical Uses of Machine Learning:

Robert Aronowitz on Risky Medicine. EconTalk. November 2015.

Brian Nosek on the Reproducibility Project. EconTalk. November 2015.

Causality and randomized trials:

A/B Testing. Wikipedia.

Adam Cifu on Ending Medical Reversal. EconTalk. February 2016.

Connectionism. Wikipedia.

"The Common Sense of Bayesianism," by Bryan Caplan. EconLog, July 2005.

A few more readings and background resources:

Uplift Modelling. Wikipedia.

"It Had to Be You", as sung by Frank Sinatra. Music by Isham Jones; lyrics by Gus Kahn. Metrolyrics.com.

A few more EconTalk podcast episodes:

Nick Bostrom on Superintelligence. EconTalk. December 2014.

Richard Jones on Transhumanism. EconTalk. April 2016.

David Mindell on Our Robots, Ourselves. EconTalk. November 2015.

Tyler Cowen on Inequality, the Future, and Average is Over. EconTalk. September 2013.

Joel Mokyr on Growth, Innovation, and Stagnation. EconTalk. November 2013.

David Autor on the Future of Work and Polanyi's Paradox. EconTalk. October 2014.

Highlights


0:33

Intro. [Recording date: May 2, 2016.] Russ: This book is an introduction to machine learning; it's an introduction to a world that actually we're all immersed in without realizing it, most of us, as well as its implications for our lives and the future of humanity. It's a really fascinating book. We're going to go through its main ideas today. Let's start with the most basic one: What is machine learning? Guest: Machine learning is computers programming themselves instead of having to be programmed by us. In the first stage of the information age we had to tell computers what to do, right? So, most of the things that people see computers doing, somebody actually wrote a program in painstaking detail, explaining to the computer exactly what to do, step by step. And this is very slow and costly, and so limits the rate of progress in what computers can do. The new idea in machine learning is that we don't actually have to program the computers any more. They look at the data that we generate, and from that data they try to figure out what to do. Like, for example, a recommender system from Amazon or Netflix, from the things you've bought or clicked on, or that other people have bought or clicked on, will try to predict what it is that you want to buy and then recommend that to you. Russ: You describe it as a technology that builds itself, an artifact that designs other artifacts. That gives it a sort of volitional consciousness; but these exercises by computers don't have a will of their own. We still have to tell them a goal. And so, we're not writing the code, but the programmers are telling them something that they want to accomplish. Correct? Guest: Yeah, exactly. In some sense what we are doing is programming them at a higher level. We just tell them in general terms what it is that we want them to do, and then we let them figure out by themselves how to do it. And then the whole trick becomes: How do we set their goals? But you are right. These machine-learning programs, no matter how much they learn, do not have goals of their own. It is still our job to tell them what their goals should be. Russ: Now, something like recommendations for Amazon or Netflix--I think in the back of our minds most of us have some idea of how that would happen. We know it has something to do with looking at other people like us and seeing what they liked, and then figuring we'll probably like it, too. And when you actually have to figure out how to make that happen, it's not so straightforward. And a chunk of the book is about that, as well as other challenges. But can you give us some other examples--which you do in the book--of what these kinds of algorithms are helping us with or are doing for us online that we're probably not aware of? Guest: Yes. So, first of all there are the online things, which is actually where people have for the most part met machine learning, although they may not be aware of it. But it's not just recommender systems. When you do a web search, Google or Bing use machine learning to figure out what results to give to you. When you go to Facebook, Facebook uses machine learning to figure out what updates to show you. Twitter uses it for tweets. Just about everything. For example, online dating sites actually use machine learning to propose potential dates to people. And, you know, these days a third of all marriages start on the Internet, so there are actually children alive today who wouldn't have been born if not for machine learning. But it's not just online.
It's also in real life, if you will. So, just as an example, a self-driving car is essentially powered by machine learning algorithms. We actually don't know how to program a car to drive by itself. It learns by observing people and the road and trying to figure out how steering and braking and whatnot follow from what the camera and the other sensors are showing. But other things--like your smart phone. Your smart phone is full of machine learning algorithms. They do the speech recognition; they do the spelling correction. They choose what things to show you, at what time. As another example, a lot of things in finance are done by machine learning. Companies use machine learning to screen job applicants: so, some people have or don't have a job because a machine learning algorithm decided that they looked good or not. And the same thing with getting raises. Medical diagnosis: machine learning algorithms are typically better than human doctors at diagnosing various things, like for example deciding by looking at an x-ray whether there is a tumor in somebody's chest or not. And so on. The list is really endless.
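
To make the recommender example concrete, here is a minimal sketch of user-based collaborative filtering--one simple version of the "people like you liked this" idea described above. The ratings, user names, and cosine-similarity weighting are all invented for illustration; production systems at Amazon or Netflix are far more elaborate.

```python
# Minimal user-based collaborative filtering: recommend items that
# users with similar tastes liked. Toy data for illustration only.
import math

# user -> {item: rating on a 1-5 scale}; absent items are unrated
ratings = {
    "ann":  {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob":  {"book_a": 4, "book_b": 3, "book_d": 5},
    "cara": {"book_b": 1, "book_c": 2, "book_d": 4},
}

def cosine_similarity(u, v):
    """Similarity of two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def recommend(target, ratings, k=2):
    """Score unrated items by similarity-weighted ratings of the
    k most similar other users."""
    others = [(cosine_similarity(ratings[target], ratings[o]), o)
              for o in ratings if o != target]
    others.sort(reverse=True)
    scores = {}
    for sim, o in others[:k]:
        for item, r in ratings[o].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ann", ratings))  # -> ['book_d']
```

For "ann", the only unrated item is book_d, and it gets recommended because the users most similar to her both rated it highly.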

5:41

Russ: A lot of the examples you gave--and I don't know if this is just by chance; and I didn't feel this when I was reading your book--but a lot of the examples that you give are examples of what I would call matching. Dating is an obvious example, but so is, really, Netflix and Amazon: You are trying to match me with movies or books or products that I would be likely to buy. Is that a huge portion of the use and application of machine learning, or do I just happen to--is that a coincidence? Guest: No, it's not a coincidence. So, all of these applications are instances of really the same phenomenon, which is: The Internet has created this world of really infinite choice, where anybody can buy anything from anybody, anywhere, any time. And this is great; but what it creates is a problem of too many choices, right? If you want to buy a book and you go to Barnes and Noble, you go to the relevant section--let's say, mystery books. And you can actually take a look at all of them, because there's maybe only 500, or not even. Whereas on Amazon there's--who knows? Maybe a million. And therefore, something else needs to come in to help you make those choices. And that something else is machine learning. So, machine learning is kind of like the next logical step from the Internet. The Internet creates all these choices because it puts everybody in contact with everybody. But then what machine learning does is it actually lets you make those choices without having to look at every item yourself. In some sense what machine learning is trying to do is, for example, pick out the books that you would have picked out if you were able to look through all 500,000 of them. So, you know, there's this notion of the Long Tail, right? And how the Internet has made it possible to buy things from the long tail. But that can really only happen once there is a way for you to explore the long tail. It is long. And machine learning is what does that for us. Russ: But there's a lot bigger promise. It's interesting. A lot of life is about matching. You gave an example that I think of as filtering. But it really is matching, also, right? It's: How do I pare down and get rid of stuff that I'm not so interested in, and match what I am interested in to my time? Guest: Yeah. And matching buyers and sellers--a little bit of what machine learning does is it makes the economy work more smoothly. Because matching is a problem--in economics there's this abstraction, right, that there's a market and all buyers can trade with all sellers. But the devil is in the details. The interesting thing is that machine learning actually makes that closer to being possible than at any time before. So, you know, some things are matching people with items; some things are matching people with other people. So, there are various versions of this with different flavors and different constraints on them. But a very interesting example of all this is the market for ads online. When you see a web page or search results, you get a bunch of ads. And those ads are actually selected in a fraction of a second by an auction among all the possible ads that could be shown--based on the content of the ads, and on the text of the page, and whatever the ad network, like Google, knows about you as a user--and also what different advertisers are
willing to pay for the slot--it actually runs the auction to decide what to show. And the machine learning is actually what makes that work well. So, the better machine learning is at predicting how likely you are to click on something, the better this will work for everybody--for you, the user, because you don't see useless ads; for the advertiser, because their money isn't wasted; and for Google or whoever, because they make more money. So, like, the ideal model of how economics works in some sense is actually being enabled by machine learning, literally at the rate of billions of auctions per day just for the case of ads. Russ: Yeah. It's a--the technical term is it's reducing transaction costs, search costs, and other types of things that make transactions less effective than they otherwise would be.
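
A rough sketch of that auction logic: each candidate ad is scored by its bid times a machine-learned predicted click-through rate (pCTR), and the winner pays just enough to beat the runner-up, in the spirit of a generalized second-price auction. The advertisers, bids, and probabilities below are made up, and real ad exchanges add many more factors (quality scores, budgets, reserve prices).

```python
# Toy ad auction: rank ads by bid * predicted click-through rate.
# The pCTR values stand in for a machine-learned model's output;
# the bids and second-price-style charge are illustrative only.

ads = [
    # (advertiser, bid per click in $, model's predicted P(click))
    ("shoes_r_us",  2.00, 0.010),
    ("fancy_socks", 0.50, 0.050),
    ("mega_mart",   1.50, 0.012),
]

# Expected revenue per impression = bid * pCTR
ranked = sorted(ads, key=lambda a: a[1] * a[2], reverse=True)
winner, runner_up = ranked[0], ranked[1]

# Generalized second-price idea: the winner pays just enough per
# click to keep its expected value above the runner-up's.
price_per_click = runner_up[1] * runner_up[2] / winner[2]

print("winner:", winner[0])                 # fancy_socks (0.025 > 0.020 > 0.018)
print("pays per click: $%.2f" % price_per_click)   # $0.40
```

Note that the highest bidder does not win here: the low-bid ad with a high predicted click probability has the highest expected value, which is exactly the role the learned model plays.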

9:44

Russ: But, just to stick with this matching for a minute, which is intriguing me. You hold out the promise--we'll talk about it in a minute--of a vast array of knowledge that could be accessible through machine learning. But of course some of that is matching as well; we just don't normally think of it that way. So, a person who has cancer, God forbid, and has to figure out what treatment to use: That is a matching problem in a certain dimension. Especially if you don't even know which medicines to search among. And there may be medicines to search among that aren't in existence yet. So, you have to, in theory, figure out what those should be. Or could be. Right? Guest: Yeah. Exactly right. So, that is a great example, because, you know, medical diagnosis a priori doesn't seem similar to these other things, but yet it is very similar. In the same way that Amazon recommends a book for you or Netflix recommends a movie, what a medical diagnosis system can do is recommend a drug for the particular thing that ails you. And cancer is a great example, because cancer is hard to cure because it's not a single disease. Everybody's cancer is different. And even the same person's cancer is different three months later, because it has mutated. And so the best solution is probably to have something like a machine learning algorithm that, given the patient's genome, and their medical history, and the mutations of the cancer, tries to predict what is the best, you know, drug to treat that cancer. It could even be designing a new drug, which is also something that is done by machine learning. But that is very much how this works. Having said that, you know, the problem of finding a good drug for somebody's cancer is much, much harder than the problem of finding a book or a movie to recommend. But, you know, we are getting there. Russ: And the cost of a mistake--if I don't like the movie, after 20 minutes I'll turn it off. Obviously a bad, ineffective drug is a lot more serious. Guest: Exactly. Russ: I have to say: I'm a very optimistic person, and I have a lot of faith in the human enterprise writ large--not so much in any one human. I have very little faith in any one human, which is why I'm suspicious of experts and power that are centralized. But I'm very optimistic about the human enterprise. But you may be more optimistic than I am. Which is impressive--is all I'm trying to say. I want to take this medical example for a moment. We've had guests on the program recently--Adam Cifu and Robert Aronowitz--who talk about some of the challenges of medical diagnosis and some of the ineffectiveness. And Cifu particularly talks about medical reversal--techniques, surgeries, devices once thought to work well where the empirical evidence suggests they not only don't work well, they are often harmful. So, machine learning can't solve that problem, it seems to me. And yet there's such a cheery tone to all of this. Do you want to react? Guest: Well, actually, on the contrary. So, by the way, I am an optimist; but I think I also worry a lot about the things that might go wrong. And I actually think that's the right balance. I think we should be optimistic because, you know, the historical record justifies that. And, you know, if you are not optimistic, then you won't try to improve things. At the same time there are a lot of things that can go wrong, and we need to be continuously worrying about them. And the problem that you just mentioned is one of them.
But actually, largely the cure for that problem is machine learning: things could be more evidence-based. The problem is that today there is a lot of, you know, surgery and treatments and whatnot that get, you know, deployed widely on the basis of very little evidence. On the basis of one little study on a thousand people--there's this whole problem, as you know, of reproducibility and these results that happen by chance. Russ: Yup. [?] Guest: Exactly. So, the solution for that is to be more evidence-based: to have more evidence, to have more trials--if we get the data from the patients as things are rolled out little by little, we never go from just a thousand patients to, you know, 10 million people in one step with nobody ever looking at it again. Where medicine is going is that we are continuously getting the feedback from the results of treatments. And, based on those results, certain things will rise or sink. And certain things that might have looked promising in the first study, you know, if on the next 10,000 people it doesn't work so well, then we drop it. Or, what might happen is we realize: Well, this only works well for this 5% of people. But machine learning lets us figure out which 5% it is. And then it's not used for the other 95%; but for this 5% it is used. And again, these things are starting to happen, but we just need to move more in this direction, despite the resistance from various directions. And, by the way: This is not just in medicine. I think for every x we should be doing more evidence-based x. Right? Evidence-based policy, etc., etc.

14:45

Russ: So, yeah, that's the question: Will the fundamental complexity of, say, the human body or the macroeconomy yield itself to the magic of machine learning? And, I'm somewhat skeptical. I like the idea--that, yep, it hasn't worked so well so far in this particular area, say--whether it's machine learning or not, we just have to try harder. The question is whether it is ultimately doable or achievable or not, right? So-- Guest: It may not be. So, you are right. So, machine learning can only learn things that are learnable. And there is a lot of theory about what is learnable and what is not learnable. And it is possible that, for example, the economy or the human body are at some point too complex for us to be able to predict them well. But remember, the goal in machine learning is not to predict perfectly. It's just to predict well enough to be useful. Russ: Or better. Guest: It's a little different from the laws of physics. Let me give you an extreme example. One of the oldest and most successful applications of machine learning is in predicting, you know, the stock market, and foreign exchange, and things like that. And to predict whether a stock will go up or down is extremely difficult. But if you are 51% accurate, consistently, that's enough to make you very, very rich. If you come to me and say that you are 55% accurate, I think you are probably making a mistake somewhere, or you are trying to con me. So, just being slightly better than chance is actually enough to do well. And a lot of the most successful applications of machine learning were the ones where the starting point was very low. So, my guess is what's going to happen is: Well, we are not going to be able to predict the economy perfectly, or the human body perfectly; but we are going to be able to predict them much, much better than we are predicting now, and get correspondingly better results. Russ: Yeah; I'm not so sure. I think of something that's in the news these days, that we've talked about before on the program, which is the impact of the minimum wage. So, most attempts to assess the impact of the minimum wage on workers--their wages, their employment--of course, there are other aspects we tend not to talk about because they are not easy to measure: their on-the-job training, the way they are treated day-to-day, politely or not; those things we just don't observe, usually. But attempts to do that use traditional statistical methods--about which the current generation of young econometricians are extremely optimistic, believing they have solved most of the problems that were there in the past. Machine learning takes a very different approach, if I understand it correctly, from the statistical, causal-relationship ways that people have tried to assess impact in the past. Can you talk about that a little bit? Sort of--not sort of. Can you talk about the differences between traditional attempts to assess relationships, such as regression analysis and statistics, and what machine learning might provide that might be better? Guest: Yes, so I think there are two big differences. One is that the traditional statistical methods are very, very limited. They are things like linear regression. The only relationships you can find are the ones that follow a straight line. And this will make you blind to most things that are actually going on in the real world. Machine learning embraces the full gamut of nonlinear models. It's just, you know, it's like having a Ferrari versus having a bicycle.
There is really no comparison. And the other one is that traditional statistical analysis is based on very small amounts of data. It's like, let's make decisions about the minimum wage based on some aggregate data. That aggregate data has left out most of the important details. In machine learning, you can make decisions based on what you can learn from all the individuals--look at the individuals and see what happened with them. And then, finally, I think the way a lot of these big decisions will get made using machine learning is not that you do a big study up front and then decide what's going to happen--it's that you start to deploy things a little bit here and there and see what happens, and adjust according to that. And this is actually what all the, you know, companies on the Web do. They do these things called A/B tests and uplift modeling. Which is basically trying out the two conditions, right? Do I have a minimum wage here, do I not have a minimum wage there; and comparing how the two turn out. And by doing that carefully you can do not just correlation modeling, but actually suss out what causes what. And then you can, you know, take actions based on the causal connections that you discover that way.
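
A minimal sketch of the A/B-test arithmetic being described: randomize units into two conditions, compare outcome rates, and ask whether the gap exceeds what chance would produce. The counts below are fabricated; real uplift modeling goes further and models each individual's response to the treatment.

```python
# Minimal A/B test: randomize units into control (A) and treatment (B),
# compare conversion rates, and check whether the gap is larger than
# chance would explain. Counts below are made up for illustration.
import math

n_a, conv_a = 10_000, 420    # condition A: no change
n_b, conv_b = 10_000, 505    # condition B: the change being tested

p_a, p_b = conv_a / n_a, conv_b / n_b
lift = p_b - p_a

# Two-proportion z-test (normal approximation)
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = lift / se

print(f"A: {p_a:.3f}  B: {p_b:.3f}  lift: {lift:+.3f}  z: {z:.2f}")
# Because assignment was random, a large |z| supports a causal
# reading of the lift, not merely a correlation.
```

With these made-up counts the lift is +0.85 percentage points and z is about 2.9, which is unlikely under pure chance; the random assignment is what licenses the causal interpretation.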

19:09

Russ: So, just--we should clarify, because we've been talking for a few minutes already: What are the differences between machine learning, big data, data mining, artificial intelligence? These are buzzwords that we hear, I think, used interchangeably in many cases--which is not accurate. Can you talk about the differences between those? Guest: Sure. So, machine learning is a subfield of artificial intelligence. Artificial intelligence is the subset of computer science that deals with, in some sense, the hardest problems. It's trying to make computers do the things that in humans require intelligence--like, you know, problem solving, vision, understanding language, common sense, etc., etc. Of all these capabilities, the most important one is the ability to learn. In fact, if we made a robot tomorrow that was as intelligent as a human but couldn't learn, the following day it would have fallen behind and would never catch up again. So, machine learning is very much at the center of artificial intelligence. Russ: Maybe with the people you hang out with. But yeah, I get your point. Go ahead. Not everybody learns every day. But I understand what you are trying to say there--about falling behind. Some people are pretty static. Guest: No, but here's the thing. This is a very natural thought. But even when we human beings don't think of ourselves as learning, we are learning all the time. I'm talking about learning even at the lowest levels, like perception. A lot of things that we don't even notice, that we take for granted. Those are actually the hardest things. It's one thing to be, like, learning to do a new job or learning to do something in a new way. Those in some sense are actually the hardest things for people to do; and you are right: many people just kind of go along without learning much at that level. But I am talking about much simpler things that we human beings take for granted, although they are actually the ones that computers have the most trouble with. And, you know, this is what machine learning helps with--like, all the recent advances in computer vision and in speech recognition and whatnot: they were driven by machine learning. People often don't understand, you know, why computers are so bad at these things. It's because we are learning all the time, subconsciously, and the computers didn't know how. And, you know, we are still not as good as people at doing these things, by any, you know--by any stretch. But we are getting there with the help of learning.

21:24

Russ: So, I interrupted you. So, machine learning is a subset of artificial intelligence. What do people mean when they say 'big data' or 'data mining'? They are very closely related to machine learning, though. Guest: Yeah. So, big data is just a lot of data, right? And the relationship between machine learning and big data--you know, what seemed like big data yesterday is just data now; and what seems like big data today will tomorrow be, you know, a normal amount of data. But there is a symbiotic relationship between machine learning and big data. You can learn from small data. You can learn from a hundred, a thousand data points--it doesn't have to be a million or a billion. But as the amount of available data grows exponentially, machine learning grows exponentially more powerful as well, just by piggy-backing on that. And this is indeed very powerful, because where before you could maybe program a thousand rules by hand, today you can easily get a million by pushing a button and mining the data, and then you get the rules from that. So, big data drives machine learning. Machine learning also drives big data. Because at the end of the day, the reason we capture all that data and store it and clean it and process it is so that you can then, you know, go and learn models from it. So machine learning is what makes big data useful. And big data is what makes machine learning powerful. Data mining is a term that actually came from the business community. It was originally applied to this notion of companies' large databases: there's probably lots of good information in there, along with maybe a lot of information that is not relevant; and the goal is to kind of, like, mine out the gold nuggets from that lode. It's a term that's increasingly less used today. It's been largely replaced by 'data science'-- Russ: Science? Guest: which is another buzzword. But I think 'science' is a better word than 'data mining,' and the way I would describe data science is that it has machine learning at its core, but it also has all the high-performance computing, parallel processing on the one hand, information integration, and then things like visualization, reporting the results, interacting with the humans who use the system [?]. It's the complete set of things that you need to have to make the most of the data that you have, whether it's in business or in science or in government or whatever. Russ: So, in the case of machine learning, it still struggles with this issue of causality. I just want to come back to this for a second; then we'll move on. But it's often looking for patterns in the data; it's looking for matches and for similarities. It's looking for probabilities that something is more like something else. And ultimately it can get better and better as it gets more data: it's seeing those correlations. But it fundamentally can't answer the questions of causation that we care about. And therefore it is-- Guest: No, no, no. Not true. Not true. This is what I was saying. This is actually a very common misconception about machine learning: that it can only discover correlations. Machine learning can and does and has discovered causation, not just correlation. And as I mentioned before, this is actually happening as we speak.
We are all users of Google and Facebook and Amazon--you've probably unknowingly been a part of, I don't know, hundreds of these A/B tests, whose goal is exactly to suss out what is the causal effect of making a change, for example, on the website. And they are able to do that because it's experimental data. So, from observational data that was just gathered for whatever purpose, it's often hard to suss out causality, although even in that case there are techniques, you know, that within some limits can discover causal relations from observational data. But let's set that aside. For the most part, causality, the way scientists discover causality, is by performing experiments. And you can perform experiments in machine learning, and this is what all these web companies do: they try out both conditions and they see what the results are. And this is just what people were doing in medicine, you know, decades ago: we'll have a control group of patients who don't take anything, or take a placebo; we'll have a treatment group of patients who take the actual drug; and then we see what the differences in the results are. People in computer science call these A/B tests, you know, and more generally they're called randomized trials. But by doing randomized trials you can suss out causality. But with machine learning you can actually do a lot more than that. You can figure out not just, 'Well, does this drug work or does it not work?' You can figure out: Who does it work for? Who are the 5% of people for whom this drug is a cure? Because you can then, you know, not only look at the two conditions but look at all the other variables involved. So machine learning is actually an extraordinarily powerful way to discover causal relationships. It's just that most people haven't realized that yet. Russ: Well, I have to disagree with you. I'll make my case one more time. You can respond; and then we're going to move on. And some of this, I suppose, is somewhat a philosophical disagreement. But, randomized trials are certainly--are often--better than observational data where we attempt to control for unobservables. But that doesn't mean there aren't unobservables in the randomized trials. Certainly many randomized trial results do not generalize, have not generalized. And that's because we really don't understand the causal mechanisms. So if you tell me that because this drug has worked on this set of people with these characteristics, it will work on other people with the same set of characteristics, it might be true; it might be better than a random guess as to what will work for them. But if you don't understand the causal mechanism that's going on at, say, the cellular level, you don't know that these people really are similar to the other people. They are just similar observationally, for what you have data on. You can't conclude that it's going to work for them; and oftentimes it doesn't. And we know that, because some of these randomized trials fail to scale, or fail to apply in different settings. And that's because, just as in the observational cases, there's missing data on things we don't observe. Guest: Yeah. This really boils down to what you mean by causality. Right? You know, of course, there is a long history of people discussing that throughout history. But at the end of the day, I think it really is just the following: Causality is being able to predict the effects of your actions. Russ: Yup. Guest: If you know how to predict the effects of your actions, that's all that you need, right?
And, you know, a notion of causality that does not involve predicting the effects of your actions--I'm not sure what it would mean or how it would be different from correlation. And so, you know, all I'm saying is that you can use machine learning to predict the effects of your actions. Now, a lot of the problems that you are pointing out, they are very real: missing data, insufficient data, and so on and so forth. That limits how well you can do your causal modeling. Russ: Right. Guest: For example, in the case of cancer: Is it enough to have this high-level model--these drugs work on these patients? Or do you really need to go down to modeling how the cell works, and the gene regulatory networks, and whatnot? And I think that ultimately you need to go down to the latter. But we can get the data for the latter from, you know, microarrays and gene sequencing and whatnot. And the former will get you some part of the way there. And there are guarantees you can establish--which unfortunately a lot of these trials fail to meet--that will let you know with high confidence whether you have something that generalizes or not. I mean, the entire field of machine learning is all about, 'How can I be confident that I have generalized well from the data that I have seen to what I haven't seen?' And again, this is induction; it's not deduction. So it's never perfect. But it's far from, you know, just picking up correlations.
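
To illustrate the "who does it work for?" point: with randomized assignment plus covariates, you can estimate the treatment effect within subgroups rather than only on average. The records and the single covariate below are fabricated; in practice one would use many covariates and a learned model (such as decision trees) rather than a hand-picked split.

```python
# Beyond the average effect: estimate treatment effects within
# subgroups to find who a treatment actually works for.
# All records are fabricated for illustration.
from collections import defaultdict

# (subgroup defined by a covariate, treated?, outcome: 1 = recovered)
records = [
    ("mutation_x", True, 1), ("mutation_x", True, 1), ("mutation_x", False, 0),
    ("mutation_x", True, 1), ("mutation_x", False, 0), ("mutation_x", False, 1),
    ("other",      True, 0), ("other",      True, 1), ("other",      False, 1),
    ("other",      True, 0), ("other",      False, 0), ("other",      False, 1),
]

stats = defaultdict(lambda: {"treated": [0, 0], "control": [0, 0]})
for group, treated, outcome in records:
    arm = "treated" if treated else "control"
    stats[group][arm][0] += outcome   # successes
    stats[group][arm][1] += 1         # total

for group, arms in stats.items():
    rate = {a: s / n for a, (s, n) in arms.items()}
    effect = rate["treated"] - rate["control"]
    print(f"{group}: effect {effect:+.2f} "
          f"(treated {rate['treated']:.2f}, control {rate['control']:.2f})")
```

On this toy data the treatment helps the "mutation_x" subgroup and not the other one, which is exactly the kind of heterogeneity an average-only trial result conceals.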

29:10

Russ: So, let's--you've divided the types of machine learning into 5 different kinds: symbolists, connectionists, evolutionaries, Bayesians, and analogizers. All of them are sort of attempts to mimic things we think the brain does in certain ways, it seems to me. Maybe that's not an accurate way to describe it. But, talk briefly about each kind, if you would. And a little bit about how people who focus on those techniques--what are some of the ways they use machine learning. Guest: Yes. So, some of them mimic the brain, but some very much don't. The people who do those would be very resistant to the notion that what they are doing is modeling the brain, because they think, you know, the brain is just a pile of evolutionary hacks and there's no guarantee that it's actually doing the optimal thing. So, their goal is to discover the optimal thing straight out. So, like, to go through these: The connectionists are the people whose agenda is to do machine learning inspired by how the brain works. It's reverse engineering the brain. It's: The brain is a network of neurons, so let's build a model of a neuron, connect it up in a network, and then try to learn in the same way the brain does, which is by adjusting the strengths of the connections between the neurons. Everything that you've learned is encoded in how strong the connections between the neurons are. If the connection between neuron A and neuron B is strong and neuron A fires, that makes neuron B more likely to fire. So, the stronger the connection, the more neuron A will tend to make neuron B fire. And because all the knowledge is encoded in the connections, this school of thought is often known as connectionism. And they are inspired by the human brain, but only up to a point, because at the end of the day it's machine learning, and the goal is to just learn however we can. So at the end of the day a lot of these connectionist models, even though they were originally inspired by the brain, often end up looking very different from what the brain does. Now, another approach that is inspired by nature, although not the brain, is the evolutionary approach. And the idea there is that, well, evolution is the greatest learning algorithm on earth. It created you and me, not just the brain but every animal and plant, every living creature that exists. So, why don't we try to evolve programs in the same way that nature evolves creatures? And we know roughly how evolution works: evolution is very much an algorithm. It's a search algorithm. It's something that's very familiar to computer scientists; and indeed these days biologists do tend to view evolution that way. It starts out with a population; each one of them performs the task; the ones that do best are the fittest; and then they get to mate with each other and they produce offspring. And then the next generation will be better at the task. And it turns out we can do amazing things that way. So, that's the evolutionary approach. The analogizer approach in some ways is inspired by what people do--not the brain per se, but more at the level of psychology. It's reasoning by analogy. A lot of the people who read the book often say, 'The analogizer, that's the one that really resonates with me.' Because people-- Russ: It's the easiest to understand, I think. Guest: Yeah. It's easy. It's a very natural one. We do that all the time.
There's a lot of evidence from psychology, you know: when we see a new problem, what we do is we try to retrieve from our memories similar situations that we were in, in the past. And then we extrapolate the solutions. Like, I am a doctor; I am diagnosing a patient; I find the patients in the past that had the most similar symptoms; and I hypothesize that this patient has the same diagnosis. Very simple idea, but it's very powerful. So, that's doing things based on analogy. It actually has inspirations from various quarters, but psychologists are probably the single most important ones. Russ: Of course, you have to be careful: you can pick the wrong analogy. Guest: Absolutely. You could go wrong. Russ: It's really the--I always think of a CEO (Chief Executive Officer) I once knew who confessed to me his company went bankrupt because he picked the wrong Harvard case study. He was a Harvard MBA (Master of Business Administration), and he "picked the wrong one." Whatever that means. That's a strange--interesting way that he thought about a disastrous end, financially. But I think that's the power of the case study approach, right? You say, 'Oh, I see it. It's just like--'. But of course sometimes you mis-see. And that's a challenge of that approach. Guest: That happens all the time. Even in, I mean, in history: a lot of the biggest mistakes were based on the wrong analogies. Like: this war is like that one, so we're going to fight it the same way. And then, guess what? It wasn't. So, you know, you could pick the wrong case. Or, you could actually pick the right case, but transfer the wrong things. Whenever you are doing an analogy between two things, some things are similar, some things aren't. And if you transfer the wrong things, it could also fall flat that way. Nevertheless, in practice there are many things that work very well with analogy. One that people are familiar with is these recommender systems--you look for similar people, people with similar tastes, and then you extrapolate from one to the other. Or things like call centers, right? You call up, you know, Dell, say, 'My PC (Personal Computer) is on the fritz,' and your problem is probably similar to other problems that they've encountered before. So they can try to extrapolate. So, yeah; this is a very interesting approach. And all of them have their pros and cons. Then there's the symbolic learning approach. And what the symbolists do is--again, this is more of a first-principles approach. It's: 'We're not necessarily trying to imitate people. We're just trying to formulate induction as the inverse of deduction.' In the same way that subtraction is the inverse of addition: deduction is going from general rules to specific facts; induction is the opposite, but we can formulate it that way and solve it that way. Like, for example, if I tell you that Socrates is a human and humans are mortal, I can infer by deduction that Socrates is mortal. That's deduction. Well, the inverse of that is to say, 'What information am I missing, if I know that Socrates is human, in order to infer that he is mortal?' And the information that you are missing is that humans are mortal. And so you can induce a rule that way. And this is very, very powerful, because you can induce different rules from different things, and then you can chain them together in new ways to answer completely different questions from the ones that you originally saw. And then finally there's the Bayesians. The Bayesians come from statistics.
And what they do is they try to formulate, from first principles, what the optimal solution to the learning problem is. And the way they look at it is like this: I have a range of hypotheses that I could use to explain my data; I have some degree to which I believe each one of them a priori. I will always be uncertain about which is the right hypothesis, because induction is always uncertain. But what I am going to do is quantify the uncertainty with probability. And then, as I see more evidence, I update the probability with which I believe each hypothesis. So, the hypotheses that are consistent with the data will tend to become more probable, and the other ones will become less probable.
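
That Bayesian recipe fits in a few lines of code. Here is a toy version: three hypotheses about a coin's bias, a uniform prior, and a posterior update after each observed flip. The hypotheses and the data are invented for illustration.

```python
# Bayesian learning in miniature: three hypotheses about a coin's
# bias, with beliefs updated after each observed flip (toy example).

hypotheses = {"fair": 0.5, "heads_biased": 0.8, "tails_biased": 0.2}
prior = {h: 1 / 3 for h in hypotheses}          # equal belief a priori

def update(belief, flip):
    """One step of Bayes' rule: posterior proportional to
    likelihood times prior, then renormalized."""
    posterior = {}
    for h, p_heads in hypotheses.items():
        likelihood = p_heads if flip == "H" else 1 - p_heads
        posterior[h] = likelihood * belief[h]
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

belief = prior
for flip in "HHTHHHHT":                          # observed data
    belief = update(belief, flip)

for h, p in sorted(belief.items(), key=lambda x: -x[1]):
    print(f"{h}: {p:.3f}")
# Hypotheses consistent with the data gain probability; others lose it.
```

After six heads in eight flips, the heads-biased hypothesis dominates (about 0.73), the fair hypothesis retains some probability (about 0.27), and the tails-biased one is all but ruled out.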

36:12

Russ: So, one of the things I found--as sort of a sub-theme of the book--is that these different styles of machine learning have their ups and downs in the profession. So, there are times when one of these 5 looks ascendant and the others look like they are not going to contribute anything again. But they often come back. Just comment on that. I found that fascinating. Guest: I mean, exactly. So, machine learning has this amazing history, which is part of what I think made the book fun to write, and hopefully makes it fun to read as well. It's that in any given decade there is one of these paradigms that is sweeping all before it; but then, come the next decade, the paradigm has fallen behind and another one is in the lead. And this is happening even as we speak. So, right now the dominant paradigm is connectionism--you know, it's known by the name of Deep Learning. But Deep Learning is really just a type of connectionism. And this is actually the third coming of connectionism. Connectionism had its first heyday in the 1950s and 1960s. And the symbolists basically killed it by showing that there were all these important things that it couldn't do. And then it was dead for, like, 10, 20 years. And then it had another coming in the 1980s, where, you know, some of those problems were solved. And then there were a lot of applications that were possible. But then it kind of faded away again: the Bayesians kind of ruled for a decade. And then the analogizers ruled for another decade. But now, guess what? The connectionists have come back again. So it will be interesting to see, you know, what happens next decade. Maybe it will be another one of these paradigms that comes up. Maybe connectionism really has taken off, you know, permanently. Or maybe it will be an entirely new paradigm, which to me would actually be the most exciting outcome. Russ: You seem to--you argue that there is a Master Algorithm, a single unifying possible way to move forward that would allow us to invent everything that's ever been invented, learn everything that could ever be learned, and so on. So, make the case for why you think that could be true. Guest: Exactly. So that is what I think--that none of these 5 paradigms is going to solve the whole learning problem. And the reason is very simple: It's that each of these 5 paradigms has a particular problem that it deals with very well. And again, you know, there is a lot of promise there; but because all of these problems are real, there is no single algorithm among them that is actually going to solve all of them. So, what you need to solve all of them is the unification of the five. And again, this is a very normal thing in science and in technology: people develop all these different models of different things, but then somebody else comes along and unifies them into one. Right? The quintessential example of this is physics. Like, you know, Maxwell unified electricity and magnetism and light into electromagnetism. And today the standard model unifies three of the four main forces of nature. And there are things like string theory that hopefully will, you know, unify them with the last one, which is gravity. So, in a way, what I think we need to look for--what a lot of us have been looking for in machine learning--is a similar grand unified theory of learning: one algorithm that unifies, you know, all of these paradigms; that has all of the capabilities that each one of them has, in a single algorithm.
And then this algorithm will be able to learn all of the different things that these different types of learning are able to learn. And in principle, it will be able to learn absolutely anything. All of these, you know, candidate master algorithms from the individual paradigms--they have these theorems that say: If you give me enough data, I can learn any function. But it's one thing to say that in theory. It's another thing to do that with a reasonable amount of data and computing. And this is where we have a lot of progress to make. But this is also, I think, where we can at some point have such an algorithm. And actually I don't think we are that far from having an algorithm that is essentially as good as each one of these in its own domain. And therefore can replace all of them. Russ: So, I--this desire for a grand, unified theory is very human. We try to do it in a lot of areas. But just to take the most simple form of human interaction, which is some sharing of emotions, right? So, you and I don't know each other. We've never met. Here we are talking for the first time. We talked for about 4 minutes before this started. We joked around a little bit. And there are certain things I would say to you that I would not say to other people. But there are many, many things I won't say to you, because, you know--I don't know you. Those I save for my family; there are other things I save for my spouse; there are other things I say to myself--and thank goodness those don't get said out loud. There's a unique way of communicating depending on the circle of intimacy that I'm interacting with at the time. So: Why should I think that there is one general way of learning? Why isn't it possible that there are different ways for different problems? Guest: No, so, I mean, I actually agree with everything that you just said. But actually I think it is somewhat orthogonal to the issue we are talking about here, right? If we have a universal learning algorithm, first of all it will only learn things that can be learned. And it will only learn what the data that it's given allows it to learn. Right? At the end of the day, you still need to have the data to give to the algorithm, so it will learn the things that it will learn. So, now, the question is: Why should there be one and not many? The truth is that there are many. Right? In the same way that there are many forces in nature. But you can unify them all into one model. So, really, you know, what a Master Algorithm does is it just shows what the relationships are between all these things: What are the ways in which this more general algorithm can be specialized to each of these? And we don't know, at the end of the day, to what extent this can be done. But what we know from the history of science and technology is that this is a very productive enterprise. You often discover new things this way. You often are able to do things that you weren't, you know, you weren't able to do before. Think about it this way: You know, most problems--it's an 80-20 rule, right? There's 20% of the work that does 80% of the job. And the Master Algorithm is not going to make, you know, engineering or specialization for different things unnecessary. It's just going to do 80% of the job. You know, let me give you an analogy, not from science but from technology. Think of the microprocessor. Right? There is nothing more important to the information age than the microprocessor.
A microprocessor, when you think about it, is one circuit that can be programmed to do anything. Before the microprocessor, people had to design and fabricate a different digital circuit for every job. And, you know, at one time Intel had this Japanese company that wanted them to build, you know, like 12 chips that did 12 different things, very quickly; and Intel was like, 'We can't do that. Let's just do one chip, and then program it to do the different things.' Now, the thing about the microprocessor is that a microprocessor is not the best way to do anything. Whatever the problem is, there's always what's called an application-specific integrated circuit that will do it better than a microprocessor. And yet, microprocessors are what we use to do 99.9% of things. Precisely because it's just one thing, and everybody can have it on their desktop or in their smartphone; and then it's just a matter of programming it. So, we do sacrifice some efficiency. But at the end of the day, it's used for everything. And really, the Master Algorithm is just, for learning, the same idea that the microprocessor is for integrated circuits.

43:27

Russ: So, I'm going to read a somewhat lengthy paragraph that charmed me, from the book. And then I want to ask you a philosophical question about it. So here's the passage:
If you're a parent, the entire mystery of learning unfolds before your eyes in the first three years of your child's life. A newborn baby can't talk, walk, recognize objects, or even understand that an object continues to exist when the baby isn't looking at it. But month after month, in steps large and small, by trial and error, great conceptual leaps, the child figures out how the world works, how people behave, how to communicate. By a child's third birthday all this learning has coalesced into a stable self, a stream of consciousness that will continue throughout life. Older children and adults can time-travel--aka remember things past, but only so far back. If we could revisit ourselves as infants and toddlers and see the world again through those newborn eyes, much of what puzzles us about learning--even about existence itself--would suddenly seem obvious. But as it is, the greatest mystery in the universe is not how it begins or ends, or what infinitesimal threads it's woven from. It's what goes on in the small child's mind--how a pound of gray jelly can grow into the seat of consciousness.
So, I thought that was very beautiful. And then you imagined something called Robby the Robot, that would somehow simulate the experience of a child and learn the same way a child learns. So, talk about how Robby the Robot might work; and then I'll ask my philosophical question. Guest: Yes. So, there are several approaches to solving the problem of AI. So: How can we create robots and computers that are as intelligent as people? And, you know, one of them, for example, is to mimic evolution. Another one is to just build a big knowledge base. But in some ways the most intriguing one is this idea of building a robot baby. Right? We human beings are the existence proof of intelligence--in fact, if we didn't have that, we wouldn't even be trying for this. So, one possible path to artificial intelligence (AI), and the only one that we know is guaranteed to work, right, is to actually have a real being in the real world learning from experience in the same way that a baby does. And so the idea of the robot baby is: let's just create something that has a brain--it doesn't have to be at the level of neurons, just at the level of capabilities--that has the same capabilities that the brain, the mind, if you will, of a newborn baby has. And if it does have those capabilities, and then we give it the same experience that a newborn baby has, then two or three years later we will have solved the problem. So, that's the promise of this approach. Russ: So, the thought, the philosophical thought that I had: I was down in the basement the other day with my wife, and we were sorting through boxes of stuff that we don't look at except once a year, when we go down in the basement and decide what to throw out and what to keep. And one of the boxes that we keep, even though we never examine it except on that once-a-year trip through the boxes, is a box of stuffed animals that our children had when they were babies. And we just--we don't want to throw it out. I don't know if our kids will ever want to use them with their children--if they have children; we don't have any grandchildren, but I think we imagine the possibility that they would be used again. But I think something else is going on there. And if our children were in the basement with us, going through that, and they saw the animal or the stuffed item that they had when they were, say, 2 and a half or 3 years old, that was incredibly precious to them--and of course has no value whatsoever to them right now--they would have, just as we have as parents, an incredible stab of emotional reaction. A nostalgia. A feeling that I can't imagine Robby the Robot would ever have. Am I wrong? [More to come, 47:19]
