2016-12-29



His mission: To responsibly unleash the power of data to benefit all Americans.

On a recent episode of Recode Decode, hosted by Kara Swisher, U.S. Chief Data Scientist DJ Patil talked about his Silicon Valley roots, the implications of big data and how health-care data is the next frontier.

You can read some of the highlights from Kara’s interview with DJ at that link, or listen to it in the audio player above. Below, we’ve posted a lightly edited complete transcript of their conversation.

If you like this, be sure to subscribe to Recode Decode on iTunes, Google Play Music, TuneIn and Stitcher.

Transcript by Celia Fogel.

Kara Swisher: Today in the red chair is DJ Patil from the White House’s Office of Science and Technology Policy. DJ is the country’s first-ever chief data scientist and, since assuming that role in February of 2015, he has worked on several technological initiatives within the government and recruited other tech talent to the White House. Previously he worked at places like LinkedIn, eBay, RelateIQ. DJ, welcome to Recode Decode.

DJ Patil: Thank you.

How ya doing?

I’m well.

Good. Let’s go a little bit into your background, because you’re a Silicon Valley guy. You’ve been around forever, correct?

I have, but not in the usual way.

Well, give me your background, because we’re going to get into what you’ve been doing as chief data scientist, which means you’re Chief Geek of the United States. Or one of them.

One of them. We think of POTUS as the Chief Geek.

Oh, he’s the chief, okay. Do we have to say POTUS? That’s right, POTUS.

Well, we could say the president, too.

The president.

It just sounds easier as a Twitter handle.

So talk a little bit about your background and how you got to where you got and why you’re qualified for this job.

Well, I actually grew up here in Silicon Valley. Moved here in 1984, Cupertino. Cupertino at that time ...

Why? What was the reason?

My dad, actually. He didn’t get tenured, so he decided to try and build a company instead.

What was he a professor of?

At MIT, he was electrical engineering and was working on the idea of how to actually build semiconductors in high density, very-large-scale integration. People didn’t believe at the time that that could be done just through simulations and what’s called fabless semicon. So he took his company, which at that time was just an idea with some grad students called Patil Systems Incorporated. We were in Utah, where he had a small professorship, and then started out here, and the company got renamed as Cirrus Logic back then.

I’ve heard of it.

It was kind of one of the early days. But Cupertino, back then, was mostly Moffett Field support. A lot of military, a lot of Cupertino Electric. I went to a school named Monte Vista, which many people now know as a powerhouse. Back then, it was a very different place. I think there were eight Indians, maybe 20 Asians altogether. Just very different. It was phenomenal. You know, a lot of people had gun ownership. We went shooting and did a lot of target practice and those things. It was a place where you actually interacted with a very different version of Silicon Valley.

Right, farming.

Farming and agriculture.

And military.

And military. And so I had a very different experience growing up that way. Also, I wasn’t a very good student. So eventually — because of my math classes, oddly enough — I went to De Anza junior college, which most people know as the Flint Center. De Anza is one of those seminal institutions. This is why I’m such a big advocate of community college, because community college is what got me sorted out. My girlfriend was taking this class called calculus, I took the same classes she did, and it turned out I fell in love with it. It was amazing. So it was off to the races for me.

A lot of people I talk to were mathematicians in school early, in the fourth grade, and they had a parent that pushed them. But you weren’t.

No, I was quite the opposite. My dad was so incredibly busy trying to make a company that there was no time to push me. Everyone just wandered around and hung out by themselves. It was a very safe community. So there wasn’t a lot of pressure. But I think what it did do is it gave me space to be creative and try lots of other things.

I was doing many other things. I learned how to etch my own chip designs and all these things, because there were a lot of vocational classes for electronics. I learned how to do drafting. But it was very different.

I was able to take that community college experience [and] go to UCSD. [I] did my undergraduate degree in very theoretical mathematics, but very much working on data. I was really interested in oceanography and those things with data. Then graduated rather quickly and was able to go to University of Maryland, where I did my doctorate in nonlinear dynamics and chaos theory with the guy who coined the term “chaos theory.” I was there for about 10 years.

What were you trying to do there? What was the goal? What were you trying to study? Explain the theory for idiots.

Sure. So the idea of chaos theory is that the world is incredibly susceptible to small changes. So a butterfly flapping its wings in one place could cause ...

Right, the old Ray Bradbury, right?

Exactly ... could cause a tornado somewhere else, or a lack of a tornado. So a lot of people thought, well, this had been understood for weather. So we took a very fresh look at it and we showed that when you look at that five-day, seven-day, 10-day forecast when you open up the newspaper, or now just open up an app, how do you quantify whether that’s a good forecast or bad? How do you quantify the relative margins of success of that forecast, the quality of it?

What we found is that in different times, you could quantify that. You could say this is a storm, but the storm has a high degree of unpredictability or very high degree of predictability by running many simulations and actually figuring out how you could fly a plane out there to take the observations, to dramatically improve the forecast or just say, “There’s nothing you can do to improve this forecast, just because it is so high-dimensional chaotic.” That became a big deal, because it helped all the major weather forecast centers change the way they approached this and is still being implemented to this day. It’s an idea called the Maryland Ensemble Kalman Filter.
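[Editor's sketch: the idea of running many simulations to quantify predictability can be illustrated with a toy example. This is not the weather model or the Kalman filter described above. It assumes a one-line stand-in system (the logistic map), and the function names `logistic_step` and `ensemble_spread` are hypothetical. Each ensemble member starts from a slightly different initial condition; if the members stay together the forecast is predictable, and if they fan out it is not.]

```python
import random
import statistics


def logistic_step(x, r):
    """One step of the logistic map, a toy stand-in for a forecast model."""
    return r * x * (1.0 - x)


def ensemble_spread(x0, r, steps, members=50, noise=1e-6, seed=0):
    """Run an ensemble of slightly perturbed starts and track its spread.

    Returns the population standard deviation of the ensemble at each step.
    Growing spread suggests low predictability; shrinking spread suggests
    high predictability.
    """
    rng = random.Random(seed)
    # Perturb the initial condition a little for each ensemble member.
    ensemble = [x0 + rng.uniform(-noise, noise) for _ in range(members)]
    spreads = [statistics.pstdev(ensemble)]
    for _ in range(steps):
        ensemble = [logistic_step(x, r) for x in ensemble]
        spreads.append(statistics.pstdev(ensemble))
    return spreads


# r = 2.5 settles to a fixed point (predictable regime);
# r = 4.0 is a standard chaotic regime of the map.
predictable = ensemble_spread(0.3, r=2.5, steps=50)
chaotic = ensemble_spread(0.3, r=4.0, steps=50)

print("initial spread:          ", predictable[0])
print("final spread, predictable:", predictable[-1])
print("final spread, chaotic:    ", chaotic[-1])
```

[In the predictable regime the tiny initial differences die out, while in the chaotic regime they grow until the members are scattered across the whole range. An operational system would use a real atmospheric model and observations; the spread statistic here is only the simplest possible measure.]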

Oh!

I know, we’re very original.

Just trips off the tongue.

Exactly. And the cool thing about this is, you know, we talk about scale often here in Silicon Valley. The [place where] I’ve had honestly the greatest scale is my weather work. Because … think about the population of the world that receives a weather forecast and depends on it.

It’s everybody.

It’s almost everybody. You know, of the nearly seven billion people now, let’s call it three to four billion. So I feel pretty good on that side of what we’re able to do. And people forget often that some of these unsexy areas actually have the greatest lift.

For effect of the human race.

For effect to the human race.

So you did weather.

I did weather.

And then?

9/11 happened. So I was part of the second wave of people who were asked to come in to think about threats against U.S. interests and the idea of, how do we use large amounts of data to find signal in the noise. And there’s this question because the 9/11 hijackers …

Yes, there was a lot of data.

There was a lot of data, but we didn’t see the signal, so what is right, what is wrong. Also, at that time, there were a lot of questions around privacy and security and responsibility of what’s happening. We’re having a similar parallel conversation today. Then, the program was called Total Information Awareness, and I was at the side of one of the people that was asked to come and help right beside that program, make sure it was in the right. That’s why ethics, and the ethics of data, is so critical to me — because it was like, “How did we get there with these things?” So we did a lot of that work and also ended up doing a lot of work in bioweapons proliferation prevention in central Asia, finding places that were doing bad things and figuring out what to do.

So you were looking at data to try to get signals of attacks, or …

Also going in country.

To find out how we could better understand.

Yeah, how do you bring together — not just as some esoteric step backward from the world, look at data and do it from a lab, but using the combination of data in the field to make smarter decisions.

Right, so as opposed to the old intelligence, which was hand to hand people …

That’s right. It was a very augmented approach [to] thinking about the problem. And those were early days. The interesting thing is, we often have this narrative right now of Silicon Valley is coming to save D.C. They forget that …

No, it’s just Silicon Valley that has that [idea], but go ahead.

That’s true. And everyone forgets that all of us that were part of that big data wave, here in Silicon Valley, almost all of us came from the national security apparatus. We were all doing this in some form. Especially the people who were helping fight fraud at eBay or PayPal or those types of things.

Sure. Well, the government did invent the internet. Forgetful as we are of that.

Well, and also self-driving cars come from DARPA. This is one of the things that we forget, that the spark is often national. The flame, and the culturing of the flame and making sure it all works, that is the rest of all the muscles that we have built over time and stuff like that.

So you’re working in the government on a very important issue, obviously everyone’s concerned with attacks and how to prevent them. Why did you shift here? What was the impetus to do that? Because you can’t have stopped working on that problem, it’s still not solved.

No, and it was time to pass the baton. One of the big things for us was [that] my wife and I were commuting and we had a child coming on the way. And so we had to figure out, how do we be in the same place together. So we just both packed up our bags, she was in New York, I was in D.C., and we just relocated here and saw what would happen. And the interesting thing is, most companies passed on me. You know, all the usual names …

Because?

Didn’t think I had much to add. They said, “Well, we’ll see what you could do.” Luckily enough, my mom was at a dinner party and happened to see Rajiv Dutta, who was president of Skype at the time, and she harassed him into taking a call with me.

“Talk to my son.”

“Talk to my son, do me a favor, talk to this kid.” And he had the foresight to say, like, “Hey, maybe there’s actually value here.” I had some other friends working at eBay at the time, and they said there was this new initiative to work across these companies. So I was able to get in and start building things. And when we started building things, one of the things that people didn’t realize at the time was [that] an adversary who was attacking you was evolving faster than you could ever build rules. Especially on fraud or security. So we had to take a different approach.

We took an approach that was, now, what people would call very similar to deep learning. It was neural networks, it was fast training, all these things. It was an idea that has been in national security and government circles for a long time and we just applied it.

And what was the big problem there, that people were just — fraud all over?

Fraud, and you have a bad guy who’s just, you know: You found a hole, you’ve patched it, and they found another way. And your cycle time is so fast. I remember this, that even at some of these companies, the time for greatest attack is like 5 pm on Friday, and the attacks keep going till like Sunday night. [laughs]

Yeah, because you’re not there to fix it.

They’re not there to fix it. They know your down times, they’re very motivated to find your weaknesses in any dimension and you need to augment systems. This is why I think the work that we’re doing on artificial intelligence at the White House is so important. We think of this as, you know, big infrastructure and super-high-power data scientists and machine-learning people doing this, but the person who’s got your medical records is some physician in practice with three people, and now we have all that information digital. Good, because it helps us get better care faster — but how do we protect them? How do we make sure that their systems are just as well protected?

They’re not.

They’re not, and that’s ... How do we bring up everyone’s? And they’re not for many different layers. It turns out that’s been one of the most interesting things. One reason is because the infrastructure of the technology that they’re built on is very old and antiquated. A lot of these things are built on top of billing systems. The other is, the classic two-factor authentication and other good hygiene techniques that are just 101.

Right, I was just yelling at someone about that.

Yeah. I get around, and in my talks or if I’m talking to a group, I just always — and I just did this at a major hospital — [ask] how many people in the audience of the physicians had two-factor auth on. It was only like a third. They didn’t even know what it was, and I think it’s our fault because we call it “two-factor auth,” which is like ...

Yes, all your names so far have just been awful. [DP laughs] So you were doing fraud there, trying to prevent fraud. And I probably saw the beginnings of the stuff that’s going on now, like how they become more and more sophisticated, these players.

That’s right. The evolutionary war has been going on for a substantially long period of time. And what has happened is, we’re just seeing it now bleed over to more and more areas. Especially as we get online and especially with the Internet of Things and people not realizing what can happen.

One of the classic problems that we’re also seeing is the way the people are trained. So if you have a person who’s training and taking Computer Science 101 and they’re learning about a database, they learn about a database but they never learn about an overflow attack or any of the classic ways that you can compromise a database. Same way with scripting or any of these things. So our belief is, and what we’ve called for, is [that] every student that is training in any technical area must have security and ethics built in as part of the core curriculum. Because now we’re in that day and era where somebody builds one of these Internet of Things [devices] and they haven’t even thought through what the attacker is going to be like.

Right, exactly. We’ll get to that in a minute. So you worked there, and then you worked at LinkedIn and ...

I became real good friends with Reid Hoffman, and when we were thinking about what’s the thing that could carry LinkedIn to the next level, [we] realized that it’s sitting on an amazing data asset.

Right, it is all data, that’s why it got bought.

And how do you make that come alive? One of the big differences that we did is, like many of the other companies, data was like a research team, or some R&D spent a lot of time writing papers, and then you try to go knock on the door of product and they say, “Not now, we got this other idea.” We flipped it completely around [to] where I was one of the direct product leads. And so I was able to take those ideas and advocate for them with the rest of the product teams and say, “How can we make this?”

So you were trying to take the data that you’d collected and make it useful to the users.

That’s right. So everything from people you may know to you’ve viewed this profile, you might like this profile, jobs you might like, all of those areas.

Why was there that disconnect? Because it seems like data would be at the center of all product development.

I think one of the challenges we have is we have a lot of people who don’t actually come from technical backgrounds but they’re more focused on, “How do I just get this thing to grow? How do I do this?” And this is like classic marketing slash product but not true deep technical product. It’s a veneer of product, rather than saying, “Okay, how am I truly going to make this come alive and make this system work for you and make it super simple.”

And you know, if we look at a lot of our systems, we go, “Geez, why doesn’t it work that way?” We’re seeing this, and one of the things that’s been I think really interesting to me is we’re starting to see voice. You know, one of the things is whether it’s, “What do we call this voice, what word do we use? Do we use Nicole versus Sally versus Tom versus Jerome versus Tanya?”

These things actually have big impacts. But also their tenor and how they talk to us, the softness. They don’t have that approach. And so they almost have an arrogance or coldness. It’s off-putting, but you can make data actually fail gracefully and then you will help it along and you will make it better.

Really?

You want a classic example?

All I want to do is kill Siri. That’s all I want to do. I hate Siri.

[laughs] Well, so like it doesn’t correct, right? Autocorrect. Why doesn’t autocorrect ... every time, you’re like, “I entered this five times.”

Right, exactly. Why doesn’t autocorrect correct?

That’s a failure of the feedback loop as it’s getting data to actually turn that into your system. Now imagine that was equivalent to data that was coming in from an attacker. If the system didn’t take that in right away, you would lose a lot of money. And so they figured out that the economic incentive is so much on that side that version of the technology has not been put in that way. Like autofill, autocomplete, those are in some areas, but the ubiquity of this for everywhere it could be is astonishing. In fact, one of the things that we’ve been really focusing on is how do you get that into every government process.

Right, where you learn.

That’s right, where it learns.

We’re going to talk about that after the break. So you went from these companies. How did you get to the White House?

It wasn’t my intention, to be honest.

Or back to government.

So one of the things I’ve always wanted to do was go back to public service in some form. I’ve never known if that’s national or just, you know, helping out your local town. The thing that was particularly ... when you get that call, it’s weird. You don’t know how to respond. You’re like, what does this even mean? What do you do? To be honest, the thing I first did is I went and asked my wife, because RelateIQ had just been acquired and so there’s obviously a lot of — as everyone knows here — if you don’t stay [long] enough, there’s a lot of implications.

Sure.

So I asked my wife what she’d do. And she didn’t even bat an eye. She was like, “You have to put your hat in the ring, you have to start thinking about it.” The thing that I think she saw that I couldn’t see is that we always love to talk about mission and the mission of our companies and that’s important. But when you’re in these roles, you don’t worry about stock options, you don’t worry about ...

Right, it’s just the mission.

It’s just the mission. There’s nothing else other than the mission. The secretary of defense has a great way of saying it. He says, “There’s nothing more powerful than waking up in the morning knowing you’re part of something bigger.” And once you’re part of it, it’s extraordinarily liberating.

It can be. It absolutely can be. Because it doesn’t have the same implications or incentives attached to it.

That’s right. And the fear and the paranoia you have is letting a portion of society down or making a catastrophic error in which people are going to get hurt or killed.

Right.

And that is rewarding in a remarkable way.

It’s a different set of incentives for sure.

It’s different incentives.

We’re here with DJ Patil, the chief data scientist of the United States of America. He’s ending his tenure now, he showed up in early 2015. When we get back we’re going to talk about some of the things he’s been doing as chief data scientist and later we’ll be talking also about what happens next after the Obama presidency, which has been one of the most tech friendly or tech fast-forward presidencies and what’s coming afterward.

[ad]

We’re here with DJ Patil, the chief data scientist of the United States of America. You work for the Office of Science and Technology [Policy], which reports to the president. Let’s talk a little bit about your tenure there. So you were just talking about how you went there, you have to do it, it’s a big job. You had previously been in government, unlike a lot of techies, and you liked it. You liked your time in government presumably.

You’re the first chief data scientist; why is that? And given data is so important to our government and we churn out so much of it — the government does — what do you think you’ve accomplished? Let’s talk about a couple things — and [what] you’re going to do — in the next short amount of time.

I think the thing that’s most interesting is first, why do we need a new title? Like, what’s this role? And the part that I’ve always found incredibly curious to me is how do you get a president who’s really a constitutional law professor so excited about data and technology. And I think it really comes back from his time at his first campaign where he saw technology and data in particular transformative for how to reach out to the electorate and interact with them.

And every turn where they have leveraged data and technology, the administration has gotten disproportionate benefit and ability to be effective, and so it’s become more natural. There’s effectively a chief economist, the chair of Council of Economic Advisers. There’s a chief statistician. There’s a lot of really great people with data. What’s been fascinating is they don’t always talk to each other. And what our role also is, what is the usable data for everybody.

And so like all things in the government, you start with a mission. When we were thinking about our mission, the one we really settled on with the president is to responsibly unleash the power of data to benefit all Americans. You know, so we’ll [get a] return on our investment on data.

You’re collecting all this data and you want to do something with it.

We want to do something with it.

Now cities have been doing that for much longer, correct?

No, the government ... well, it depends. I guess the government has because we have a census and so the census has been going on for the entire history of the country. And that has been some of our most powerful data. But there’s a lot of other data, the economic forecasts, and the weather data. All of these things are as core essence, the source of truth. And there’s an incredible amount of process that goes on to make this data.

What hasn’t happened is the ability to flip this around [and] say, “What happens when we open it up and how do people use it?” That’s why responsibility is in there as well. It’s chosen very carefully because opening up weather data is very different than opening up health data. I guess the way you would say it is if you want to get some data, where do you go to get that? How do you do it?

Right, where do you?

So it’s data.gov. It’s actually really simple. That is the one-stop area where everyone can go download the data that they want.

That’s available, the data that’s available.

That’s available, that’s currently out there. And the president has an executive order that says all data that the federal government produces by default now must be open and machine readable. The thing is, we’re not there 100 percent across everything because there’s a lot of legacy systems.

What, were they on pieces of paper before?

Or PDFs.

Right, PDFs. Yeah. Sure.

A lot of PDFs. And so the data isn’t actually usable. And there’s no cycle for if you’re looking at that data and you say, “Hey, I found a problem,” there’s no way to tell the person who’s producing the data, “Hey, I found an error.”

Also, people haven’t realized the disproportionate value that happens when that data’s opened up. And as an example, there was this kid that we have, we have the White House science fair now, thanks to this president, and the year before last we had this kid, he was 17 years old at the time, and he wanted to use artificial intelligence, machine learning algorithms, just to play with. So he found an open data set called dbGaP, which is basically DNA snippet data that’s relevant to cancer.

Which is government data?

It’s government data, it’s held by the National Institutes of Health. He’s looking at these sites on the genome that are relevant for cancer, how to think about cancer. His algorithms compete with the best algorithms out there in the world on AI. And he’s 17 years old, he just doesn’t know better. And he’s just playing with open data. The way I got my work ...

And what was his point? What was he trying to do?

He’s just actually playing with AI and then he’s falling into this area of realizing like, “Wait, actually I can work on cancer.” And so now he’s able to broaden his research. My own weather data, I was downloading every night weather data on all the computers I could get my hands on from the National Weather Service. So my own research is literally built on open data. I would not have my entire research career if it wasn’t for the weather [data] that was put out there.

So your goal, number one, as data scientist was to open up more and more data, correct?

Open up more data and make it usable and then have people use the data.

Meaning readable.

Not just readable but you should build something with that data. So one of those examples is precision medicine.

Which has been a big initiative.

Which is a big initiative, the president announced it the year before last State of the Union.

And why precision medicine?

The big thing here [is], we now can get our genome sequenced. The costs have gone [down], they’re just dropping every decade just radically. So we’re now at about $1,000 and a little while ago it was $10,000, before that was $10 million ...

And it will be free.

Yeah exactly, it’s going to be paid for. Why would it get paid for?

Because it’s data.

Well, not only because ... so that gets to a really important point.

Can you sell your data.

Like who owns the data? Which is a big problem we work on. But before we get to that, the part there is actually if you get it sequenced, when you go to the pharmacist, why is it that nobody says, “Hey, is your genome on file?” Nobody checks if there’s anything in there.

It’s not, I was just at a pharmacy.

And even so, the drugs that you get, the pill that you get, the pill that I get, it’s not even often tested truly across ethnic[ity] or gender. And so we have a real [problem]: Truly to enter the genomic era, we need a different approach.

I think the way we do medicine right now, it’s like an expression a friend of mine once wrote, it’s like throwing a hammer at a piano to make music.

In some cases, yes.

You know, just everyone gets the same ...

Everyone gets broad brushstroke things. There are populations that get highly highly tailored medicine.

That would be white guys, correct?

Well, research institutions. It’s basically you’re around research institutions and the challenge there is it is definitely upper middle class that has access to that. So we’re not giving the population at large ...

So your point in bringing the data together around precision medicine is that we’re wasting efforts by not precisely medicating people correctly.

Well, to do this first, what we need to do is we don’t even have a place where we have a consistent data set that says, “Hey, this is the genomic data of a large population of every ethnicity, all our gender diversity, and we’re following it along to look for those classic big data correlations.” What’s fascinating about this is we don’t even look at some of the basic results that are not even genomic, but population health, that says, “Oh, look, by the way over here, here’s a population that has increased amounts of chronic fatigue syndrome …”

Or diabetes.

Or diabetes. The Vioxx signature, for those that remember the Vioxx [recall], there’s a clear-cut signal that says Vioxx was causing a problem. No one looked at the data. Like if you think of a monitoring system, nobody monitored it.

Why is that? They’re producing these reams and reams of data, and we are doing that every day on everything, our movements, our traffic movements, what we eat, what we buy, everything we’re doing is producing [data]. And now with the phones, where we go, everything we download, everything we look at. So you pick precision medicine because it was one area of waste, presumably, that we’re wasting ...

No, not just waste, but opportunity. There’s a lot of people who this would massively benefit. A lot of people with so-called rare genomic disorders that aren’t so rare. They’re actually very common.

Right. If we understood the patterns.

If we understood the patterns.

And we medicated them correctly.

That’s right.

So where are we right now in that? Where have you taken this?

So there’s a few pieces that have happened that are necessary to make this work. First of all is that we have to have the health records in a digital form, and those have to be safe and secure. [KS laughs] So the good news, here’s the good news: Over 10 years ago it was something like 90 percent were still on triplicate paper. Now it’s 97 percent of hospitals are on electronic medical records. The downside is your doctor’s spending more time typing than talking to you.

Right, exactly.

And so we have a human computer interaction problem that needs to get solved. The other problem is that there’s a question of whose data is it. And we believe in a world where if it’s your data sitting in some type of hospital database that’s your record, you should have access to it. And you should have the ability to correct any errors that are in there. And all of that to make a better system.

And you should not have the ability for one group to block your information going to another, which has been happening. And we’ve put a lot of rules and other incentive structures in place to do that. The second part of that is how do we get together to build this type of research cohort? How do we get to make it an all-volunteer effort, national effort, to do this? And the National Institutes of Health will be driving that new program and that will be coming out late this year, beginning of next year as they start doing more and more prototypes of how to collect that data, build it out, in high quality.

But we have another one that turns out to be a cornerstone, that is veterans at the Veterans Affairs, have over 500,000 [who] have stepped up to say they want to continue to serve and have given their data to do really high-quality sequencing.

Oh, after they’ve left the military.

After they’ve already left the military. And they’re just saying, “Use the data to help another veteran.” That dataset has the potential to unlock unbelievable insights including on cancer and other types of diseases.

And these veterans themselves.

That’s right.

Because they probably suffer a unique set of circumstances.

That’s right. And those first set of research results will be coming back later this year and will be aiming ... they have to go through the classic science process.

The goal presumably in the end is to get everybody.

That’s right. And these are broadly, broadly applicable things. Can I just tell you about one problem of this?

Sure. Absolutely.

So very recently there was a research team — Zak Kohane is the leader of this in Boston — he was able to show that there’s a test that has been done on African-American males for sudden cardiac death syndrome.

You particularly hear about this with a player playing sports and they collapse. And that genomic test has been giving a lot of false positives. Why has it been giving false positives? Because there haven’t been enough healthy African-American males in that research cohort. So a lot of people have been misdiagnosed. And it also shows that the complexity of how the genome is is far greater.

One of the things that people miss about this, there’s two things people often forget about. For our health-care system, one of the reasons the Affordable Care Act that the president put in place is so critical is that it says you cannot be limited in your health care, there’s no preexisting conditions. When we get to the genome, every one of us has a preexisting condition, it’s being human.

The other thing that’s in there, as we start to move forward with this, is just that we forget America’s an incredibly diverse place. And the diversity of our population as we go after the next generation of health care, our diversity as a population is our asset.

We have lots of signals.

We have a lot of signals. And the fact that we have such an amazing ethnicity diversity across the country is going to give us insights that are going to help one population versus another in ways that I don’t think we even can unlock.

If we continue with this.

If we continue with this.

Right. So the second thing you’re doing is around policing. Can you talk about that briefly, what you’re doing? And of course it’s an enormous topic because one of the issues obviously in this election and everywhere else is how different communities are treated differently.

Absolutely. So the president in the wake of Ferguson and a lot of the shootings and the race relations that we’re seeing across the country, he stood up a group to give them a report on 21st century policing, task force on 21st century policing. A large number of those recommendations all say data and technology. Use data and technology. Body cameras, collect the data. How much do we know about police shooting? Do we know about what’s called low-level use of force? An officer pushing you, all these other things. Turns out there isn’t a lot of technology being used here. There’s not a lot of data. Data’s not even ...

Well, body cameras. Body cameras is the way people think of it.

Body cameras is kind of like the carte blanche solution. So we have taken that on to say, what does that actually look like? And there’s two big projects we’ve stood up. One is the president’s police data initiative. And the other is data-driven justice. Police data initiative works with police departments to say, “Hey, how can we actually open up our data to provide transparency and people can use it to help us think through problems faster?”

So if something’s happening in a certain area, it starts to see patterns.

See patterns or also, is this a policing structure that we want?

Right, is it working.

One of the first things that we find is when the police departments try to release the data, they realize their data hasn’t been collected well. So they actually can’t make good assessments.

Right.

Or they don’t even know.

Or everything is anecdotal.

Exactly. Everything is just ...

When it’s not anecdotal.

It’s not. And so the very basic questions of just use of force and these type of things are out there.

Anecdotal is always wrong, I think. Is that correct? [laughs]

Almost always.

Almost always.

And one of the shocking things is what you expect from one department could be totally different from another department, even if they’re just a few miles from each other. Because there’s no way for them to share or collaborate.

Or use best practices.

There’s no best practices. So what we did is we have now over 40 million Americans are covered by this thing, all the major cities. And all the police departments get together and they meet every other week to talk about how they’re opening up the data, how do they make this usable, and what can come about from it.

Just one of the things that’s been really fascinating, because we often think of this as just police transparency, there’s a department in the South that has been working with the University of Chicago to look at excessive use of force data. And so these data scientists came in and they started looking at machine learning techniques.

And they started going, “What are the features that cause this problem?” And the first set of signals are all the usual suspects. You know, “You’re a bad actor, you obviously shouldn’t be here.” Suddenly in the middle, two interesting signals show up. One is that you responded recently to a suicide. Another one is you responded to domestic violence where a child was present.

Oh, they got upset.

They got upset. So imagine, you go to one of the things, suicide’s exceptionally messy. It’s a very horrible situation, emotionally, physically. And suddenly you’re done, you’ve written up the case, and somebody says, “Get out there,” and you’re back on beat patrol. And somebody’s flippant with you. Like, where did the system fail? The dispatch system doesn’t take this data into account to say ...

“Don’t send this guy.”

“Let’s give this officer — male, female — like, give this officer time to decompress.” Let’s treat them as a human rather than a robot.

Sure. It’s so easy.

So that is now being put into place. They also realized that, gee, domestic violence where a child is present, fights break out a lot. Highly emotionally charged. Why are we sending just two officers? Send more officers. Because that’s going to stabilize the situation. So they’re now taking this test-and-iterate model and measuring for effectiveness to do this. So that’s one side.

I was just recently at an event in Oakland. In Oakland they’re doing voice recognition — everyone’s focused on the data cameras on people, which have yielded very emotional things and a lot of proof of real violence, although it only tells part of the story, like you said. You don’t know what happened before that.

One of the things that was interesting is how they could tell who the police officer was talking to, their race, by the language, by word clouds. With African-American people they would say “hey” and “man.” With white people they would say “sir” and “hello,” which are kinder words, which are obviously more respectful words. And it was really interesting to see it. And they were trying to figure out what they can do from that. People don’t even realize what they’re saying. Although you think they would.

That’s why the open data actually is so important. Because if we don’t get that into other people’s hands, the police department has no money.

Are they open to this or resistant, “We’re not racist, we’re not this, we’re not that.”

So it depends on the city, but by and large we have found an unbelievable ... These 129 cities have stepped up for police data initiative, they’re some of the biggest. They’re like LAPD and Oakland and they want to do this.

I’ll tell you, one of the troubling things that we all have to watch for. There’s increased legislation at the state level that is coming in to say very classically, “This is our transparency measure,” but they aren’t. They actually prevent people from accessing the data. And we have to really look closely at, are those in the police officer’s best interests? Are they truly in the citizen’s best interest or are they just some de facto [situation], somebody writing that rule that just prevents anybody from really doing anything useful for finding a solution.

That’s the place that we really need to work on. Because otherwise we don’t know. And I have no idea if what you see from Oakland, with the words that they use, is different from what we see in LA or even San Jose.

Sure, it was just an interesting way to use data. And get some learning from it.

Exactly. And that’s what we’re seeing. That’s like this case where people are using this machine learning with these things, we have to unleash the potential of people doing smart things with the data. And flipping it around. Can I give you the other one?

Sure, absolutely. And then we’re going to get back and talk about where things are going.

Sure. So the data-driven justice, and that covers now 91 million Americans, has 130 plus now — I think, it keeps going up — cities that have joined in. And what it does is it says, “Our criminal justice system right now is causing a ridiculous drain on society.” And we all know this. But it is, just as a kind of quick numbers, we have 11.4 million people that are going through 3,100 jails every year.

So that number just doesn’t make sense. And we’re not talking prison. Actually 90-plus percent will never go on to prison. And they on average stay there 23 days. So we have created a cycle where we just cycle people through our local jails. That cost structure is one of our most expensive cost structures. Cook County jail, which is one of our largest single institution jails in the United States, one third of the population is mentally ill. Why are they sitting in jail? We should get them to the treatment they want.

They need.

They need! They need that treatment. And we also have opioid issues, all these things. So we need to get people to the care they need rather than forcing people to languish in jail. What we have found is that if you take your data from your department of corrections, policing, that whole infrastructure, criminal justice, and move it over to the health side, you can very quickly identify who are the people that you see most often and need help.

And we’re not talking like crazy big data. We’re talking like, “Have you seen so and so?” “No, go fish.” It’s like pass the spreadsheet. And the cost savings are unbelievable. Miami Dade, Florida did this, they trained their officers in crisis intervention to get them into the right type of mental health thing. Year 1 they saved more than $10 million but more importantly they closed a full jail.

Well, could it be that people don’t want those jails closed? I mean, there’s a financial incentive for many people.

Typically that’s prisons, that’s the private prisons. This is local jail. So these people never even get to the prison side of this problem. And they actually never were going to get to the prison. They are people who are just cycling.

Right. It’s just using data to bring costs down of our government.

Well, and get people to the right care they need, most importantly. Stabilize them. And so that’s the part that’s particularly exciting. And that’s why this data-driven justice initiative, when we just move some data a little bit, we can see large-scale transformation of society.

How people behave. How governments behave.

How government behaves.

All right, we’re here with DJ Patil, the chief data scientist, who is making a lot of sense, which is really disturbing to me on so many levels. [laughs] The government can work! But first ...

[ad]

I’m here with DJ Patil, the chief data scientist of the United States of America. We’re talking about fascinating things like improving policing through data, improving health through data. Everything gets help through data! Everything gets help through data! We have so much data.

But there’s also the downside of data, too.

All right, tell me.

One of the things we’re very concerned about is the intersection of big data and civil rights. We released two reports on this, on not only privacy but data and algorithms and what happens when people don’t have transparency into algorithms.

You know, people do even nefarious things with data. We’ve seen recent examples of this. When you go in front of a judge, people are using data, you don’t know where that data comes from, how it’s being used, to make an assessment about the type of bail you should get. It’s been shown that some of these are what you might deem as racist. And how do we make sure that someone’s not just slapping a label on something that says “big data solution” or “data science solution” and people go, “Ooh, that’s good.” We have to think through those things. If people think, “Oh, that’s just in these kind of net small areas,” there’s all sorts of cases. And we’ve seen this on image search or the fact that even Pokémon Go or other places of e-commerce sites, they haven’t supported certain environments because there’s a lack of data.

How do we think about that? How do we make sure that that has happened? That’s why it’s been so important that anybody who’s out there and anybody who’s listening to this, you’re a data scientist and you do not have training on ethics, you better go get some. If you’re in a data science training program and they’re not teaching you about ethics, you are not in the program that is the cutting edge.

That’s part of a bigger problem in Silicon Valley in general, don’t you find?

Diversity of teams in all of this. One of the fascinating things is, there’s a picture I love that we have which is called “One of the Jumbos,” and there’s the footwear of all the people just standing around who are part of the president’s national security team. There’s pumps, there’s all shoes of all types. You can tell the color of the feet, all types of ethnicities. That’s how we get to better decisions, when we have the diversity.

The Obama administration is quite diverse, comparatively. I mean, here you must be like, “Hello.”

In fact, when I look at the U.S. Digital Service or I look at 18F or any of these tech employers ...

No, I’ve met a lot of them. It looks like Silicon Valley should look.

Exactly. Even the CTO team. That’s how we’re supposed to be.

So why isn’t it? You worked in all these companies ...

I think it’s just frankly laziness.

I do too. As you know, I’ve said it 100 times. We have a show on diversity this week, that we just pulled out all the thoughts that people we’ve talked to around diversity. And why ... you know, here you are trying to build something for the U.S. government that has a diverse mentality, a diverse point of view. And I’m not just thinking about race and gender; it’s age, disability, all kinds of experience, economic level. What happens here in Silicon Valley when you come back here? What is your assessment?

I have two reactions. The first is, I wish people would get out and see more of the country. You know, we talk about user research but our user researchers usually go hang out at the corner coffee shop and talk to a few people. I wish people would go out and hang out in the middle of Iowa. I wish they would go out to Texas. I wish they would go out to New Orleans. I wish they would get out to North Dakota. And see what people need. I think you would see that there’s a very different world out there.

When we say, “Oh, that’s just an edge case.” Those aren’t edge cases, those are huge populations and they have names. And when you put the names against them and you start actually talking to them, you see a different world problem. I find it really interesting that if you look at many of the people who were in that so called “data wave,” we were first national security people, then we were social network kind of people. Now a lot of those people are all in some form of health care. You know, cancer research and other type of areas.

And I think there’s a different focus that as people have kind of started to wake up and say, “Ah, there’s more out there.” To really get there, I think we also need our venture capital community to also reflect those values and the diversity.

Yeah. Not so much.

And you’ve written and talked about this. And I see when we evaluate a problem, it is so unbelievably liberating when you’re around a team and somebody’s like, “Have you thought about it this way? Did you know about this?” And you’re like, “No.” And no one’s hating you for having that blind spot, but people are using it as an opportunity to tell you about a different group that you haven’t been able to interact with.

Do you imagine you’ll come back to Silicon Valley after?

I will. And the specific reason I will is my kids are here and my wife wants to be here. But I’ll tell you, like one of the things that’s a challenge is, I have two tough things for me as a parent. One is our kids go to a school where they don’t interact with a broad ethnicity. Two is they don’t interact with society where the people that support our services and our most critical infrastructure, including those that serve — and whether it’s police, firemen, or fire people, Air Force, Marines, pick your favorite — they don’t meet those people. They don’t interact with them. And one of the most powerful things that I’ve taken away this year is watching my kids experience that. And see that their world has shifted. And seeing that world and how far we are from it is ...

You mean it’s not all froyo and Palo Alto?

Exactly.

What? What?

It’s not just Pokémon walking around and just those types of things. Not to hate on Pokémon Go, in fact one of the most popular places to play it is the White House. But it’s ... I think we have an opportunity to tackle some of our hardest problems.

I love the fact that there’s increasing numbers of people who are going into prisons, and David Hornik is one of those people who’s really kind of championed saying, “What can we do there?” Or people who have started saying, “What can we do about the homeless population and how do we think about that not just as a narrow niche?” There’s so much more that we have the opportunity to do and it’s not philanthropy.

What do you want to do? What would you do then? And I want to know what will happen to all these Obama administration initiatives. Because you know, one of the things that we’re looking at in this election besides just the horror of it, the fresh horror every day, is the idea that neither of these candidates are tech-forward. One less so than the other. Trump for sure, he’s back in the 70s, and I don’t think Secretary Clinton is very far along. Not where Obama is for sure.

Our focus right now is to maximize the opportunity we have.

How many days do you have?

We have 10 weeks left. [KS laughs] I don’t go by days, and the reason I don’t go by days is if you measure things in weeks, you can run a two-minute drill and you can actually plan and game out what you can and can’t accomplish and be hyper-effective. Days are just a way of getting yourself freaked out. But weeks is a shipping schedule. So we have very clear milestones.

So what are you shipping?

Well ...

And how do you keep it in place?

The biggest one that we do is we work with the agencies and ...

Right, the individual, you’ve put people in the U.S. digital service all over.

There’s people but there’s also these programs. So for example the police data initiative has really been developed jointly with the Department of Justice and what’s called the COPS Office, which is [the Office of Community Oriented Policing Services]. And that’s where it lives. And that’s who basically runs it. Precision medicine is run by the National Institutes of Health. It’s being run by Eric Dishman, who is not only a person who got this type of personalized treatment to save his life and even received an organ as a result, but is a deep technologist.

And he’ll stay.

He’s going to stay.

These are not appointed ...

This is a career person who is not just, you know, this is a real solid hitter on every dimension and he’s going to run that with a great team that is being built out of the National Institutes of Health. So our programs graduate and they go into different places. And many of them have continued to be there as things that are going to happen. And so we look at it as a measure of these things — in the John Lilly way of framing these things and others — is like we have to fire ourselves out of jobs.

As you move them to other places.

As we move to other places.

But I think the worry is here’s a presidency that has pushed these things into these areas. What happens if there’s not that? And there will not be that, from what I can tell.

Well, I can’t comment on any of the elections because we have the Hatch Act, obviously. But the part there that we believe is the case — and here’s the thing that I found in every case of working with an agency, is the agencies have and the career staff have wanted to do this for a long time. They’ve been saying like, “Where have you guys been and why haven’t you unleashed us to let us do this?” And so now that they’ve been unleashed and they have the runway and the mandate, they’re going to maximize that opportunity. What I think we need to do in Silicon Valley to make sure this continues is, if we just say, “Ah, okay, we tried,” and disengage ...

“We’re back to the BlackBerry, people.”

Yeah. Like, we lose. This only works if we have a model where we are continuing to serve, not by providing advice only, but by literally taking a pause from here and going and stepping into the world and taking a tour of duty.

And being a part of it.

And being a part of the tour.

Do you imagine they all do that in a Trump presidency? I cannot see it except for our friend Peter Thiel.

Well, so I can say, I served in the Bush administration as well. And I think problems of national importance transcend whoever’s in the office. Whatever your political views are, whichever way they go, I think there’s always an opportunity to serve in some fashion.

Silicon Valley should not pull away from it. I want to end talking about something that President Obama said, and he’s done a lot of different initiatives, the AI initiative, all kinds of things that have been happening. He was talking about the difficulty he has dealing with techies. Now you’re coming back here and as much as he appreciates them, and that’s clear, there’s all kinds of problems, encryption is one, which you were not involved with but that’s a big issue, going to keep going, sort of tension between them.

The other is privacy. The other is monopoly power. There
