Note-Code: Videos: Machine Learning: Google's Vision

0:00 TOM SIMONITE: Hi.
0:10 Good morning.
0:11 Welcome to day three of Google I/O,
0:14 and what should be a fun conversation about machine
0:16 learning and artificial intelligence.
0:18 My name is Tom Simonite.
0:19 I'm San Francisco bureau chief for MIT Technology Review.
0:23 And like all of you, I've been hearing a lot recently
0:26 about the growing power of machine learning.
0:28 We've seen some striking results come out
0:30 of academic and industrial research labs,
0:33 and they've moved very quickly into the hands of developers,
0:36 who have been using them to make new products and services
0:39 and companies.
0:40 I'm joined by three people this morning
0:42 who can tell us about how this new technology
0:45 and the capabilities it brings are coming out into the world.
0:48 They are Aparna Chennapragada, who
0:51 is the director of product management
0:53 and worked on the Google Now mobile assistant,
0:56 Jeff Dean, who leads the Google Brain research group here
1:00 in Mountain View, and John Giannandrea,
1:02 who is head of search and machine intelligence at Google.
1:06 Thanks for joining me, all of you.
1:08 We're going to talk for about 30 minutes,
1:10 and then there will be time for questions from the floor.
1:15 John, why don't we start with you?
1:16 You could set the scene for us.
1:19 Artificial intelligence and machine learning
1:21 are not brand new concepts.
1:23 They've been around for a long time,
1:24 but we're suddenly hearing a lot more about them.
1:27 Large companies and small companies
1:28 are investing more in this technology,
1:30 and there's a lot of excitement.
1:31 You can even get a large number of people
1:33 to come to a talk about this thing early in the morning.
1:37 So what's going on?
1:39 Tell these people why they're here.
1:41 JOHN GIANNANDREA: What's going on?
1:41 Yeah, thanks, Tom.
1:42 I mean, I think in the last few years,
1:44 we've seen extraordinary results in fields that hadn't really
1:48 moved the needle for many years, like speech recognition
1:51 and image understanding.
1:52 The error rates are just falling dramatically,
1:55 mostly because of advances in deep neural networks,
1:58 so-called deep learning.
2:00 I think these techniques are not new.
2:03 People have been using neural networks for many, many years.
2:06 But a combination of events over the last few years
2:09 has made them much more effective,
2:11 and caused us to invest a lot in getting them
2:14 into the hands of developers.
2:17 People talk about it in terms of AI winters,
2:19 and things like this.
2:20 I think we're kind of an AI spring right now.
2:23 We're just seeing remarkable progress
2:25 across a huge number of fields.
2:26 TOM SIMONITE: OK.
2:27 And now, how long have you worked
2:28 in artificial intelligence, John?
2:30 JOHN GIANNANDREA: Well, we started
2:31 investing heavily in this at Google about four years ago.
2:33 I mean, we've been working in these fields,
2:35 like speech recognition, for over a decade.
2:38 But we kind of got serious about our investments
2:40 about four years ago, and getting organized
2:44 to do things that ultimately resulted
2:46 in the release of things like TensorFlow, which
2:48 Jeff's team's worked on.
2:49 TOM SIMONITE: OK.
2:49 And we'll talk more about that later, I'm sure.
2:52 Aparna, give us a perspective from the view of someone
2:56 who builds products.
2:57 So John says this technology has suddenly
2:59 become more powerful and accurate and useful.
3:03 Does that open up new horizons for you,
3:05 when you're thinking about what you can build?
3:06 APARNA CHENNAPRAGADA: Yeah, absolutely.
3:08 I think for me, these are great as a technology.
3:12 But as a means to an end, they're
3:13 powerful tool kits to help solve real problems, right?
3:17 And for us, as building products, and for you guys,
3:20 too, there's two ways that machine learning
3:22 changes the game.
3:24 One is that it can turbo charge existing use cases-- that
3:27 is, existing problems like speech recognition--
3:30 by dramatically changing some technical components
3:33 that power the product.
3:34 If you're building a voice enabled assistant, the word
3:37 error rate that John was talking about, as soon as it dropped,
3:40 we actually saw the usage go up.
3:42 So the product gets more usable as machine learning improves
3:46 the underlying engine.
3:47 Same thing with translation.
3:48 As translation gets better, Google Translate,
3:51 it scales to 100-plus languages.
3:54 And photos is a great example.
3:55 You've heard Sundar talk about it, too,
3:57 that as soon as you have better image understanding,
4:00 the photo labeling gets better, and therefore, I
4:02 can organize my photos.
4:03 So it's a means to an end.
4:04 That's one way, certainly, that we have seen.
4:06 But I think the second way that's, personally, far more
4:09 exciting to see is where it can unlock new product use cases.
4:14 So turbocharging existing use cases is one thing,
4:17 but where can you kind of see problems
4:19 that really weren't thought of as AI or data problems?
4:22 And thanks to mobile, here-- 3 billion phones-- a lot
4:26 of the real world problems are turning into AI problems,
4:29 right?
4:29 Transportation, health, and so on.
4:31 That's pretty exciting, too.
4:32 TOM SIMONITE: OK.
4:33 And so is one consequence of this
4:35 that we can make computers less annoying, do you think?
4:38 I mean, that would be nice.
4:40 We'd all had these experiences where
4:41 you have a very clear idea of what it is you're trying to do,
4:44 but it feels like the software is doing
4:46 everything it can to stop you.
4:47 Maybe that's a form of artificial intelligence, too.
4:50 I don't know.
4:50 But can you make more seamless experiences
4:53 that just make life easier?
4:55 APARNA CHENNAPRAGADA: Yeah.
4:56 And I think in this case, again, one of the things
4:59 to think about is, how do you make sure-- especially
5:01 as you build products-- how do you
5:03 make sure your interface scales with the intelligence?
5:06 The UI needs to be proportional to AI.
5:09 I cannot believe I said some pseudo formula in front of Jeff
5:12 Dean.
5:14 But I think that's really important,
5:15 to make sure that the UI scales with the AI.
5:18 TOM SIMONITE: OK.
5:19 And Jeff, for people like Aparna,
5:23 building products, to do that, we
5:26 need this kind of translation step
5:27 which your group is working on.
5:29 So Google Brain is a research group.
5:30 Works in some very fundamental questions in its field.
5:33 But you also build this infrastructure,
5:36 which you're kind of inventing from scratch, that makes
5:38 it possible to use this stuff.
5:41 JEFF DEAN: Yeah.
5:42 I mean, I think, obviously, in order
5:44 to make progress on these kinds of problems,
5:46 it's really important to be able to try lots of experiments
5:50 and do that as quickly as you can.
5:52 There's a very fundamental difference
5:55 between having an experiment take a few hours,
5:58 versus something that takes six weeks.
5:59 It's just a very different model of doing science.
6:03 And so, one of the things we work on
6:06 is trying to build really scalable systems that are also
6:10 flexible and easy to express new kinds of machine learning
6:13 ideas.
6:14 So that's how TensorFlow came about.
6:16 It's sort of our internal research vehicle,
6:19 but also robust enough to take something you've done and done
6:23 lots of experiments on, and then, when you get something
6:25 that works well, to take that and move it into a production
6:28 environment, run things on phones or in data
6:31 centers, on RTPUs, that we announced a couple days ago.
6:36 And that seamless transition from research
6:39 to putting things into real products
6:41 is what we're all about.
6:43 TOM SIMONITE: OK.
6:44 And so, TensorFlow is this very flexible package.
6:48 It's very valuable to Google.
6:49 You're building a lot of things on top of it.
6:51 But you're giving it away for free.
6:52 Have you thought this through?
6:54 Isn't this something you should be keeping closely held?
6:56 JEFF DEAN: Yeah.
6:57 There was actually a little bit of debate internally.
7:00 But I think we decided to open source it,
7:02 and it's got a nice Apache 2.0 license which basically
7:05 means you can take it and do pretty much whatever
7:07 you want with it.
7:09 And the reason we did that is several fold.
7:14 One is, we think it's a really good way of making research
7:18 ideas and machine learning propagate more quickly
7:20 throughout the community.
7:22 People can publish something they've done,
7:26 and people can pick up that thing
7:27 and reproduce those people's results or build on them.
7:30 And if you look on GitHub, there's
7:33 like 1,500 repositories, now, that mention TensorFlow,
7:36 and only five of them are from Google.
7:38 And so, it's people doing all kinds of stuff with TensorFlow.
7:41 And I think that free exchange of ideas and accelerating
7:43 of that is one of the main reasons we did that.
7:47 TOM SIMONITE: OK.
7:47 And where is this going?
7:49 So I imagine, right now, that TensorFlow is mostly
7:52 used by people who are quite familiar with machine learning.
7:55 But ultimately, the way I hear people
7:59 talk about machine learning, it's
8:00 just going to be used by everyone everywhere.
8:03 So can developers who don't have much
8:05 of a background in this stuff pick it up yet?
8:07 Is that possible?
8:08 JEFF DEAN: Yeah.
8:09 So I think, actually, there's a whole set
8:12 of ways in which people can take advantage of machine learning.
8:15 One is, as a fundamental machine learning researcher,
8:18 you want to develop new algorithms.
8:19 And that's going to be a relatively small fraction
8:21 of people in the world.
8:23 But as new algorithms and models are developed
8:26 to solve particular problems, those models
8:29 can be applied in lots of different kinds of things.
8:31 If you look at the use of machine learning
8:35 in the diabetic retinopathy stuff
8:36 that Sundar mentioned a couple days ago,
8:39 that's a very similar problem to a lot of other problems
8:41 where you're trying to look at an image
8:43 and detect some part of it that's unusual.
8:45 We have a similar problem of finding text
8:48 in Street View images so that we can read the text.
8:51 And that looks pretty similar to a model
8:54 to detect diseased parts of an eye, just different training
8:58 data, but the same model.
8:59 So I think the broader set of models
9:02 will be accessible to more and more people.
9:05 And then there's even an easier way,
9:07 where you don't really need much machine learning knowledge
9:09 at all, and that is to use pre-trained APIs.
9:12 Essentially, you can use our Cloud Vision API
9:15 or our Speech APIs very simply.
9:17 You just give us an image, and we give you back good stuff.
9:20 And as part of the TensorFlow flow open source,
9:22 we also released, for example, an inception model that
9:26 does image classification that's the same model as underlies
9:29 Google Photos.
9:30 TOM SIMONITE: OK.
9:31 So will it be possible for someone-- maybe they're
9:33 an experienced builder of apps, but don't know much about
9:37 machine learning-- they could just
9:39 have an idea and kind of use these building blocks to put it
9:42 together?
9:42 JEFF DEAN: Yeah.
9:42 Actually, I think one of the reasons TensorFlow has taken
9:45 off, is the tutorials in TensorFlow are actually
9:47 quite good at illustrating six or seven important kinds
9:53 of models in machine learning, and showing people
9:55 how they work, stepping through both the machine learning
9:58 that's going on underneath, and also how you express them
10:01 in TensorFlow.
10:01 That's been pretty well received.
10:03 TOM SIMONITE: OK.
10:04 And Aparna, I think we've seen in the past
10:06 that when a new platform of mode of interaction comes forward,
10:10 we have to experiment with it for some time
10:13 before we figure out what works, right?
10:16 And sometimes, when we look back,
10:17 we might think, oh, those first generation
10:19 mobile apps were kind of clunky, and maybe not so smart.
10:23 How are we going with that process
10:25 here, where we're starting to have to understand
10:28 what types of interaction work?
10:30 APARNA CHENNAPRAGADA: Yeah.
10:32 And I think it's one of the things that's not intuitive
10:34 when you start out, you rush out into a new area,
10:36 like we've all done.
10:38 So one experience, for example, when
10:39 we started working on Google Now, one thing we realized
10:42 is, it's really important to make sure
10:44 that, depending on the product domain, some of these black box
10:49 systems, you need to pay attention
10:51 to what we call internally as the wow to WTH ratio.
10:55 That is, as soon as you kind of say,
10:57 hey, there are some delightful magical moments, right?
11:00 But then, if you kind of get it wrong,
11:02 there's a high cost to the user.
11:04 So to give you an example, in Google Search,
11:06 let's say you search for, I don't know, Justin Timberlake,
11:09 and we got a slightly less relevant answer.
11:12 Not a big deal, right?
11:13 But then, if the assistant told you to sit in the car,
11:16 go drive to the airport, and you missed
11:18 your flight, what the hell?
11:21 So I think it's really important to get that ratio right,
11:23 especially in the early stages of this new platform.
11:27 The other thing we noticed also is
11:29 that explainability or interpretability really builds
11:33 trust in many of these cases.
11:35 So you want to be careful about looking
11:37 at which parts of the problem you use machine learning
11:41 and you drop this into.
11:43 You want to look at problems that are easy for machines
11:46 and hard for humans, the repetitive things,
11:48 and then make sure that those are the problems that you
11:50 throw machine learning against.
11:52 But you don't want to be unpredictable and inscrutable.
11:56 TOM SIMONITE: And one mode of interaction that everyone seems
11:59 to be very excited about, now, is this idea
12:01 of conversational interface.
12:02 So we saw the introduction on Wednesday of Google Assistant,
12:06 but lots of other companies are building these things, too.
12:10 Do we know that definitely works?
12:13 What do we know about how you design
12:15 a conversational interface, or what the limitations
12:17 and strengths are?
12:19 APARNA CHENNAPRAGADA: I think, again, at a broad level,
12:21 you want to make sure that you can have this trust.
12:24 So [INAUDIBLE] domains make it easy.
12:26 So it's very hard to make a very horizontal system
12:29 work that works for anything.
12:31 But I'm actually pretty excited at the progress.
12:33 We just launched-- open sourced-- the sentence parser,
12:36 Parsey Mcparseface.
12:37 I just wanted to say that name.
12:41 But it's really exciting, because then you say,
12:43 OK, you're starting to see the beginning of conversational,
12:46 or at least a natural language sentence understanding,
12:49 and then you have building blocks that build on top of it.
12:52 TOM SIMONITE: OK.
12:52 And John, with your search hat on for a second,
12:56 we heard on Wednesday that, I think, 20% of US searches
13:01 are now done by voice.
13:02 So people have clearly got comfortable with this,
13:04 and you've managed to provide something
13:06 that they want to use.
13:09 Is the Assistant interface to search
13:12 going to grow in a similar way, do you think?
13:14 Is it going to take over a big chunk of people's search
13:17 queries?
13:18 JOHN GIANNANDREA: Yeah.
13:19 We think of the Assistant as a fundamentally different product
13:22 than search, and I think it's going
13:24 to be used in a different way.
13:25 But we've been working on what we
13:26 call voice search for many, many years,
13:28 and we have this evidence that people
13:30 like it and are using it.
13:32 And I would say our key differentiator, there, is just
13:36 the depth of search, and the number of questions
13:38 we can answer, and the kinds of complexities
13:40 that we can deal with.
13:43 I think language and dialogue is the big unsolved problem
13:46 in computer science.
13:48 So imagine you're reading an article
13:50 and then writing a shorter version of it.
13:52 That's currently beyond the state of the art.
13:54 I think the important thing about the open source release
13:56 we did of the parser is it's using TensorFlow as well.
14:02 So in the same way as Jeff explained,
14:03 the functionality of this in Google Photos for finding
14:06 your photos is actually available open source,
14:08 and people can actually play with it
14:09 and run a cloud version of it.
14:11 We feel the same way about natural language understanding,
14:13 and we have many more years of investment
14:15 to make in getting to really natural dialogue systems,
14:19 where you can say anything you want,
14:20 and we have a good shot of understanding it.
14:23 So for us, this is a journey.
14:25 Clearly, we have a fairly usable product in voice search today.
14:29 And the Assistant, we hope, when we launch
14:31 later this year, people will similarly
14:33 like to use it and find it useful.
14:36 TOM SIMONITE: OK.
14:36 Do you need a different monetization model
14:39 for the Assistant dialogue?
14:40 Is that something--
14:42 JOHN GIANNANDREA: We're really focused, right now,
14:42 on building something that users like to use.
14:45 I think Google has a long history
14:46 of trying to build things that people find useful.
14:49 And if they find them useful, and they use them at scale,
14:52 then we'll figure out a way to actually have a business
14:54 to support that.
14:56 TOM SIMONITE: OK.
14:57 So you mentioned that there are still
14:58 a lot of open research questions here,
14:59 so maybe we could talk about that a little bit.
15:03 As you described, there have been
15:05 some very striking improvements in machine learning recently,
15:08 but there's a lot that can't be done.
15:09 I mean, if I go to my daughter's preschool,
15:11 I would see young children learning and using
15:14 language in ways that your software can't match right now.
15:17 So can you give us a summary of the territory that's
15:21 still to be explored?
15:22 JOHN GIANNANDREA: Yeah.
15:23 There's a lot still to be done.
15:25 I think there's a couple of areas
15:28 which researchers around the world
15:30 are furiously trying to attack.
15:32 So one is learning from smaller numbers of examples.
15:35 Today, the learning systems that we have,
15:37 including deep neural networks, typically
15:39 require really large numbers of examples.
15:41 Which is why, as Jeff was describing,
15:43 they can take a long time to train,
15:44 and the experiment time can be slow.
15:48 So it's great that we can give systems
15:51 hundreds of thousands or millions of labeled examples,
15:53 but clearly, small children don't need to do that.
15:56 They can learn from very small numbers of examples.
15:58 So that's an open problem.
16:00 I think another very important problem in machine learning
16:02 is what the researchers call transfer learning, which
16:05 is learning something in one domain,
16:07 and then being able to apply it in another.
16:09 Right now, you have to build a system
16:11 to learn one particular task, and then that's not
16:13 transferable to another task.
16:14 So for example, the AlphaGo system that
16:17 won the Go Championship in Korea,
16:20 that system can't, a priori, play chess or tic tac toe.
16:24 So that's a big, big open problem
16:26 in machine learning that lots of people are interested in.
16:28 TOM SIMONITE: OK.
16:29 And Jeff, this is kind of on your group, to some extent,
16:33 isn't it?
16:34 You need to figure this out.
16:35 Are there particular avenues or recent results
16:38 that you would highlight that seem to be promising?
16:41 JEFF DEAN: Yeah.
16:42 I think we're making, actually, pretty significant progress
16:46 in doing a better job of language understanding.
16:48 I think, if you look at where computer vision was three
16:53 or four or five years ago, it was
16:54 kind of just starting to show signs of life,
16:57 in terms of really making progress.
16:58 And I think we're starting to see the same thing in language
17:02 understanding kinds of models, translation, parsing, question
17:06 answering kinds of things.
17:08 In terms of open problems, I think unsupervised
17:12 learning, being able to learn from observations
17:14 of the world that are not labeled,
17:15 and then occasionally getting a few labeled examples that
17:18 tell you, these are important things about the world
17:21 to pay attention to, that's really
17:23 one of the key open challenges in machine learning.
17:27 And one more, I would add, is, right now,
17:31 what you need a lot of machine learning expertise for
17:34 is to kind of device the right model structure
17:36 for a particular kind of problem.
17:38 For an image problem, I should use convolutional neural nets,
17:41 or for language problems, I should use this particular kind
17:44 of recurrent neural net.
17:46 And I think one of the things that
17:48 would be really powerful and amazing
17:50 is if the system itself could device the right structure
17:54 for the data it's observing.
17:57 So learning model structure concurrently
17:59 with trying to solve some set of tasks, I think,
18:02 would be a really great open research problem.
18:05 TOM SIMONITE: OK.
18:05 So instead of you having to design the system
18:08 and then setting it loose to learn,
18:11 the learning system would build itself, to some extent?
18:13 JEFF DEAN: Right.
18:14 Right now, you kind of define the scaffolding of the model,
18:17 and then you fiddle with parameters
18:18 as part of the learning process, but you don't sort of
18:21 introduce new kinds of connections
18:22 in the model structure itself.
18:24 TOM SIMONITE: Right.
18:25 OK.
18:25 And unsupervised learning, just giving it that label,
18:29 it makes it sound like one unitary problem, which
18:31 may not be true.
18:32 But will big progress on that come
18:36 from one flash of insight and a new algorithm,
18:41 or will it be-- I don't know-- a longer slog?
18:46 JEFF DEAN: Yeah.
18:47 If I knew, that would be [INAUDIBLE].
18:50 I have a feeling that it's not going to be, like,
18:53 100 different things.
18:54 I feel like there's a few key insights
18:57 that new kinds of learning algorithms
19:00 could pick up on as to what aspects
19:03 of the world the model is observing are important.
19:06 And knowing which things are important
19:08 is one of the key things about unsupervised learning.
19:11 TOM SIMONITE: OK.
19:12 Aparna, so what Jeff's team kind of works out, eventually,
19:18 should come through into your hands,
19:19 and you could build stuff with it.
19:21 Is there something that you would really
19:23 like him to invent tomorrow, so you can start building
19:26 stuff with it the day after?
19:28 APARNA CHENNAPRAGADA: Auto generate emails.
19:30 No, I'm kidding.
19:32 I do think, actually, what's interesting is, you've heard
19:35 these building blocks, right?
19:36 So machine perception, computer vision, wasn't a thing,
19:40 and now it's actually reliable.
19:41 Language understanding, it's getting there.
19:44 Translation is getting there.
19:45 To me, the next other building block you can make machines do
19:49 is hand-eye coordination.
19:51 So you've seen the robot arms video
19:53 that Sundar talked about and showed at the keynote,
19:56 but imagine if you could kind of have these rote tasks that
20:00 are harder, tedious for humans, but if you
20:03 had reliable hand-eye coordination built in, that's
20:07 in a learned system versus a controlled system code
20:09 that you usually write, and it's very brittle,
20:11 suddenly, it opens up a lot more opportunities.
20:13 Just off the top of my head, why isn't there
20:16 anything for, like, elderly care?
20:18 Like, you are an 80-year-old woman with a bad back,
20:21 and you're picking up things.
20:23 Why isn't there something there?
20:24 Or even something as mundane with natural language
20:27 understanding, right?
20:28 I have a seven-year-old.
20:29 I'm a mom of a 7-year-old.
20:31 Why isn't there something for, I don't know,
20:33 math homework, with natural language understanding?
20:36 JOHN GIANNANDREA: So I think one of things
20:38 we've learned in the last few years
20:39 is that things that are hard for people
20:42 to do, we can teach computers to do,
20:44 and things that are easy for us to do
20:45 are still the hard problems for computers.
20:47 TOM SIMONITE: Right.
20:48 OK.
20:49 And does that mean we're still missing some big new field
20:56 we need to invent?
20:57 Because most of the things we've been talking about so far
20:59 have been built on top of this deep learning
21:01 and neural network.
21:02 JOHN GIANNANDREA: I think robotics work is interesting,
21:04 because it gives the computer system an embodiment
21:08 in the world, right?
21:10 So learning from tactile environments
21:13 is a new kind of learning, as opposed to just looking
21:16 at unsupervised or supervised.
21:17 Just reading text is a particular environment.
21:21 Perception, looking at images, looking at audio,
21:23 trying to understand what this song is,
21:25 that's another kind of problem.
21:27 I think interacting with the real world
21:29 is a whole other kind of problem.
21:30 TOM SIMONITE: Right.
21:30 OK.
21:31 That's interesting.
21:33 Maybe this is a good time to talk a little bit more
21:35 about DeepMind.
21:35 I know that they are very interested in this idea
21:38 of embodiment, the idea you have to submerge this learning
21:43 agent in a world that it can learn from.
21:45 Can you explain how they're approaching this?
21:47 JOHN GIANNANDREA: Yeah, sure.
21:48 I mean, DeepMind is another research group
21:49 that we have at Google, and we work closely with them
21:52 all the time.
21:53 They are particularly interested in learning from simulations.
21:57 So they've done a lot of work with video games
21:59 and simulations of physical environments,
22:01 and that's one of the research directions that they have.
22:04 It's been very productive.
22:06 TOM SIMONITE: OK.
22:08 Is it just games?
22:09 Are they moving into different types of simulation?
22:12 JOHN GIANNANDREA: Well, there's a very fine line
22:14 between a video game-- a three-dimensional video game--
22:17 and a physics simulation already environment, right?
22:20 I mean, some video games are, in fact,
22:22 full simulations of worlds, so there's not really
22:26 a bright line there.
22:27 TOM SIMONITE: OK.
22:27 And do DeepMind work on robotics?
22:29 They don't, I didn't think.
22:30 JOHN GIANNANDREA: They're doing a bunch of work
22:32 in a bunch of different fields, some of which
22:33 gets published, some of which is not.
22:35 TOM SIMONITE: OK.
22:36 And the robot arms that we saw in the keynote on Wednesday,
22:40 are they within your group, Jeff?
22:41 JEFF DEAN: Yes.
22:42 TOM SIMONITE: OK.
22:42 So can you tell us about that project?
22:44 JEFF DEAN: Sure.
22:44 So that was a collaboration between our group
22:46 and the robotics teams in Google X. Actually, what happened was,
22:52 one of our researchers discovered
22:53 that the robotics team, actually,
22:55 had 20 unused arms sitting in a closet somewhere.
22:59 They were a model that was going to be discontinued
23:01 and not actually used.
23:02 So we're like, hey, we should set these up in a room.
23:06 And basically, just the idea of having
23:10 a little bit larger scale robotics test environment
23:12 than just one arm, which is what you typically
23:14 have in a physical robotics lab, would
23:18 make it possible to do a bit more exploratory research.
23:22 So one of the first things we did with that was just
23:24 have the robots learn to pick up objects.
23:27 And one of the nice properties that has,
23:29 it's a completely supervised problem.
23:32 The robot can try to grab something,
23:34 and if it closes its griper all the way, it failed.
23:36 And if it didn't close it all the way,
23:38 and it picked something up, it succeeded.
23:40 And so it's learning from raw camera pixel inputs
23:44 directly to torque motor controls.
23:45 And there's just a neural net there
23:47 that's trained to pick things up based on the observations it's
23:51 making of things as it approaches a particular object.
23:55 TOM SIMONITE: And is that quite a slow process?
23:57 I mean, that fact that you have multiple arms going
23:59 at once made me think that, maybe, you
24:02 were trying to maximize your throughput, or something.
24:04 JEFF DEAN: Right.
24:05 So if you have 20 arms, you get 20 times as much experience.
24:08 And if you think about how small kids learn to pick stuff up,
24:11 it takes them maybe a year, or something,
24:13 to go from being able to move their arm to really be
24:17 able to grasp simple objects.
24:19 And by parallelizing this across more arms,
24:22 you can pool the experience of the robotic arms a bit.
24:24 TOM SIMONITE: I see.
24:25 OK.
24:27 JEFF DEAN: And they need less sleep.
24:29 TOM SIMONITE: Right.
24:31 John, at the start of the session,
24:32 you referred to this concept of AI winter,
24:35 and you said you thought it was spring.
24:39 When do we know that it's summer?
24:43 JOHN GIANNANDREA: Summer follows spring.
24:45 I mean, there's still a lot of unsolved problems.
24:47 I think problems around dialogue and language
24:49 are the ones that I'm particularly interested in.
24:52 And so, until we can teach a computer to really read,
24:56 I don't think we can declare that it's summer.
24:59 I mean, if you can imagine a computer's really reading
25:02 and internalizing a document.
25:04 So it's interesting.
25:05 So translation is reading a paragraph in one language
25:08 and writing it in another language.
25:10 In order to do that really, really well,
25:12 you have to be able to paraphrase.
25:13 You have to be able to reorder words, and so on and so
25:15 forth So imagine translating something
25:17 from English to English.
25:18 So you read a paragraph, and you write a different paragraph.
25:21 If we could do that, I think I would declare summer.
25:25 TOM SIMONITE: OK.
25:26 Reading is-- well, there are different levels of reading,
25:30 aren't there?
25:31 Do you know--
25:33 JOHN GIANNANDREA: If you can paraphrase, then you really--
25:35 TOM SIMONITE: Then you think that-- if you
25:36 could reach that level.
25:37 JOHN GIANNANDREA: And actually understood--
25:37 TOM SIMONITE: Then you've got some argument.
25:39 JOHN GIANNANDREA: And to a certain extent,
25:40 today, our translation systems, which
25:42 are not perfect by any means, are getting better.
25:45 They do do some of that.
25:46 They do do some paraphrasing.
25:48 They do do some re-ordering.
25:49 They do do a remarkable amount of language understanding.
25:52 So I'm hopeful researchers around the world
25:54 will get there.
25:55 And it's very important to us that our natural language
25:57 APIs become part of our cloud platform,
25:59 and that people can experiment with it, and help.
26:02 JEFF DEAN: One thing I would say is,
26:04 I don't think there's going to be
26:05 this abrupt line between spring and summer, right?
26:08 There's going to be developments that push the state of the art
26:11 forward in lots of different areas in kind
26:13 of this smooth gradient of capabilities.
26:16 And at some point, something becomes
26:18 possible that didn't used to be possible,
26:21 and people kind of move the goalposts
26:23 of what they think of as really, truly hard problems.
26:28 APARNA CHENNAPRAGADA: The classic joke, right?
26:30 It's only AI until it starts working,
26:32 and then it's computer science.
26:34 JEFF DEAN: Like, if you'd asked me four years ago,
26:36 could a computer write a sentence
26:38 given an image as input?
26:40 And I would have said, I don't think they
26:42 can do that for a little while.
26:43 And they can actually do that today,
26:44 and that's kind of a good example of something
26:46 that has made a lot of progress in the last few years.
26:48 And now you sort of say, OK, that's in our tool
26:51 chest of capabilities.
26:53 TOM SIMONITE: OK.
26:53 But if we're not that great at predicting
26:56 how the progress goes, does that mean we can't see winter,
27:00 if it comes back?
27:04 JOHN GIANNANDREA: If we stop seeing progress,
27:06 then I think we could question what the future's going
27:09 to look like.
27:10 But today, the rate of-- I think researchers in the field
27:14 are excited about this, and maybe the field
27:16 is a little bit over-hyped because of the rate of progress
27:18 we're seeing.
27:19 Because something like speech recognition,
27:21 which didn't work for my wife five years ago,
27:23 and now works flawlessly, because image identification
27:29 is now working better than human raters for many fields.
27:32 So there's these narrow fields for which algorithms are not
27:36 superhuman in their capabilities.
27:37 So we're seeing tremendous progress.
27:39 And so it's very exciting for people working in this field.
27:42 TOM SIMONITE: OK.
27:43 Great.
27:44 I should just note that, in a couple of minutes,
27:46 we will open up the floor for questions.
27:48 There are microphones here and here in the main seating area,
27:52 and there's one microphone up in the press area, which
27:55 I can't see right now, but hopefully you
27:57 can figure out where it is.
28:01 Sundar Pichai, CEO of Google, has spoken a lot recently
28:04 about how he thinks we're moving from a world which
28:06 is mobile-first to AI-first.
28:11 I'm interested to hear what you think that means.
28:13 Maybe, Aparna, you could speak to that.
28:16 APARNA CHENNAPRAGADA: I interpret
28:18 it a couple different ways.
28:19 One is, if you look at how mobile's changed,
28:21 how you experience computing, it's
28:25 not happened at one level of the stack, right?
28:28 It's at the interface level, it's
28:29 at the information level, and infrastructure.
28:31 And I think that's the same thing that's
28:33 going to happen with AI and any of these machine learning
28:36 techniques, which is, you'll have infrastructure layer
28:39 improvements.
28:39 You saw the announcement about TPU.
28:41 You'll have a bunch of algorithms and models
28:44 improvements at the intelligence and information layer,
28:47 and there will be interface changes.
28:48 So the best UI is probably no UI.
28:51 TOM SIMONITE: Right.
28:52 OK.
28:53 John, what does AI-first mean to you?
28:57 JOHN GIANNANDREA: I think it means
28:58 that this assistant kind of layer is available to you
29:01 wherever you are.
29:02 Whether you're in your car, or whether it's
29:05 ambient in your house, or whether you're
29:07 using your mobile device or laptop,
29:10 that there is this smart assistance
29:12 that you find very quietly useful to you all the time.
29:17 Kind of how Google search is for most people today.
29:19 I think most people would not want search engines taken away
29:23 from them, right?
29:24 So I think that being that useful to people,
29:26 so that people take it for granted,
29:27 and then it's ambient across all your devices,
29:29 is what AI-first means to me.
29:31 TOM SIMONITE: And we're in the early stages of this,
29:33 do you think?
29:34 JOHN GIANNANDREA: Yeah.
29:35 It's a journey, I think.
29:36 It's a multi-year journey
29:37 TOM SIMONITE: OK.
29:38 Great.
29:39 So thanks for a fascinating conversation.
29:41 Now, we'll let someone else ask the questions for a little bit.
29:44 I will alternate between the press mic and the mics
29:49 down here at the front.
29:51 Please keep your questions short,
29:53 so we can get through more of them,
29:54 and make sure they're questions, not statements.
29:58 We will start with the press mic, wherever it is.
30:13 MALE SPEAKER: There's nobody there.
30:14 TOM SIMONITE: I really doubt the press has no questions.
30:18 What's happening?
30:18 Why don't we start with the developer mic
30:20 right here on the right?
30:23 AUDIENCE: I have a philosophical question about prejudice.
30:28 People tend to have prejudice.
30:31 Do you think this is a step stone
30:33 that we need to take in artificial intelligence,
30:36 and how would society accept that?
30:40 JOHN GIANNANDREA: I'm not sure I understand the question.
30:43 Some people have prejudice, and?
30:46 AUDIENCE: Some people have the tendency
30:49 to have prejudice, which might lead to behaviors
30:53 such as discrimination.
30:56 TOM SIMONITE: So the question is,
30:57 will the systems that the people build have biases?
31:00 JOHN GIANNANDREA: Oh, I see.
31:01 I see.
31:02 Will people's prejudices creep into machine learning systems?
31:05 I think that is a risk.
31:07 I think it all depends on the training data that we choose.
31:10 We've already seen some issues with this kind of problem.
31:13 So I think it all depends on carefully
31:14 selecting training data, particularly
31:16 for supervised systems.
31:19 TOM SIMONITE: OK.
31:21 Is the press mic working, at this point?
31:23 SEAN HOLLISTER: Hi.
31:24 I'm Sean Hollister, up here in the press mic.
31:26 TOM SIMONITE: Great.
31:27 Go for it.
31:28 SEAN HOLLISTER: Hi, there.
31:29 I wanted to ask about the role of privacy in machine learning.
31:33 You need a lot of data to make these observations
31:38 and to help people with machine learning.
31:41 I give all my photos to Google Photos,
31:44 and I wonder what happens to them afterwards.
31:47 What allows Google to see what they
31:49 are, and is that ever shared in any way with anyone else?
31:53 Personally, I don't care very much about that.
31:55 I'm not worried my photos are going
31:57 to get out to other folks, but where do they go?
32:00 What do you do with them?
32:01 And to what degree are they protected?
32:04 JEFF DEAN: Do you want to take that one?
32:06 APARNA CHENNAPRAGADA: I think this
32:07 is one of the most important things
32:09 that we look at across products.
32:12 So even with photos, or Google Now,
32:14 or voice, and all of these things.
32:16 There's actually two principles we codify into building this.
32:20 One is, there's a very explicit--
32:22 it's a very transparent contract between the user
32:25 and the product that is, you basically know what benefits
32:29 you're getting with the data, and the data
32:31 is there to help you.
32:32 That's one principle.
32:34 But the second is, by default, it's an opt-in experience.
32:39 You're in the driver's seat.
32:40 In some sense, let's say, you're saying,
32:42 hey, I do want to get traffic information when
32:45 I'm on Shoreline, because it's clogged up to Shoreline
32:48 Amphitheater, you, of course, need the system
32:50 to know where your location is.
32:51 Because you don't want to know how the traffic is in Napa.
32:55 So having that contract be transparent, but also
32:58 an opt-in, I think it really addresses the equation.
33:04 But I think the other thing to add in here
33:06 is also that, by definition, all of these are for your eyes
33:11 only, right?
33:12 In terms of, like, all your data is yours, and that's an axiom.
33:16 JOHN GIANNANDREA: And to answer his question,
33:18 we would never share his photos.
33:19 We train models based on other photos that are not yours,
33:24 and then the machine looks at your photos,
33:26 and it can label it, but we would never
33:27 share your private photo there.
33:29 SEAN HOLLISTER: To what degree is advertising
33:31 anonymously-targeted at folks like me,
33:34 based on the contents of things I upload,
33:37 little inferences you make in the meta data?
33:40 Is any of that going to advertisers in any way,
33:44 even in aggregate, hey, this is a person who
33:47 seems to like dogs?
33:50 JOHN GIANNANDREA: For your photos?
33:51 No.
33:52 Absolutely not.
33:52 APARNA CHENNAPRAGADA: No.
33:53 TOM SIMONITE: OK.
33:53 Let's go to this mic right here.
33:55 AUDIENCE: My questions is for Aparna, about,
33:58 what is the thought process behind creating a new product?
34:02 Because there are so many things that these guys are creating.
34:05 So how do you go from-- because it's kind of obvious right
34:08 now to see if you have my emails,
34:10 and you know that I'm traveling tomorrow to New York,
34:14 it's kind of simple to do that on my calendar
34:16 and create an event.
34:17 How do you go from robotic arms, trying
34:21 to understand how to get things, to an actual product?
34:25 The question is, what is the thought process behind it?
34:27 APARNA CHENNAPRAGADA: Yeah.
34:27 I'll give you the short version of it.
34:29 And, obviously, there's a longer version of it.
34:32 Wait for the medium post.
34:33 But I think the short version of it
34:35 is, to echo one thing JG said, you
34:38 want to pick problems that are easy for machines
34:41 and hard for humans.
34:42 So AI plus machine learning is not
34:45 going to turn a non-problem into a real problem
34:47 that people need solving.
34:49 It's like, you can take Christopher Nolan and Ben
34:53 Affleck, and you can still end up with Batman Versus Superman.
34:56 So you want to make sure that the problem you're solving
34:59 is a real one.
35:00 Many of our failures, even internally
35:02 and external, like frenzy around bots and AI,
35:06 is when you kid yourself that the problem needs solving.
35:09 And the second one, the second quick insight there,
35:12 is that you also want to build an iterative model.
35:15 That is, you want to kind of start small, and say, hey,
35:18 travel needs some assistance.
35:19 What are the top five things that people need help with?
35:22 And see which of these things can scale.
35:25 JEFF DEAN: I would add one thing to that,
35:26 which is, often, we're doing research
35:29 on a particular kind of problem.
35:31 And then, when we have something we think is useful,
35:34 we'll share that internally, as presentations or whatever,
35:37 and maybe highlight a few places where
35:39 we think this kind of technology could be used.
35:42 And that's sort of a good way to inform the product designers
35:45 about what kinds of things are now possible that
35:49 didn't used to be possible.
35:50 TOM SIMONITE: OK.
35:51 Let's have another question from the press section up there.
35:54 AUDIENCE: Yeah.
35:54 There's a lot of talk, lately, about sort of a fear of AI.
35:59 Elon Musk likened it to summoning the demon.
36:04 Whether that's overblown or not, whether it's
36:07 perception versus reality, there seems
36:10 to be a lot of mistrust or fear of going
36:13 too far in this direction.
36:15 How much stock you put into that?
36:18 And how do you win the trust of the public, when
36:22 you show experiments like the robot arm thing
36:24 on the keynote, which was really cool, but sort
36:26 of simultaneously creepy at the same time?
36:29 JOHN GIANNANDREA: So I get this question a lot.
36:31 I think there's this notion that's
36:34 been in the press for the last couple of years
36:36 about so-called super intelligence,
36:38 that somehow AI will beget more AI,
36:40 and then it will be exponential.
36:42 I think researchers in the field don't put much stock in that.
36:46 I don't think we think it's a real concern yet.
36:48 In fact, I think we're a long way away
36:49 from it being a concern.
36:51 There are some researchers who actually
36:53 think about these ethical problems,
36:55 and think about AI safety, and we
36:56 think that's really important.
36:57 And we work on this stuff with them,
37:00 and we support that kind of work.
37:01 But I think it's a concern that is decades and decades away.
37:06 It's also conflated with the fact
37:08 that people look at things like robots learning
37:10 to pick things up, and that's somehow
37:12 inherently scary to people.
37:14 I think it's our job, when we bring products
37:16 to market, to do it in a thoughtful way
37:19 that people find genuinely useful.
37:21 So a good example I would give you is, in Google products,
37:26 when you're looking for a place, like a coffee shop
37:28 or something, we'll show you when it's busy.
37:30 And that's the product of fairly advanced machine learning
37:34 that takes aggregate signals in a privacy-preserving way
37:36 and says, yeah, this coffee shop is really
37:38 busy on a Saturday morning.
37:39 That doesn't seem scary to me, right?
37:41 That doesn't seem anything like a bad thing
37:46 to bring into the world.
37:47 So I think there's a bit of a disconnect between the somewhat
37:50 extended hype, and the actual use of this technology
37:52 in everyday products.
37:54 TOM SIMONITE: OK.
37:54 Next question.
37:55 AUDIENCE: Thank you.
37:56 So given Google's source of revenue
37:58 and the high use of ad blockers, is there
38:02 any possibility of using machine learning
38:04 to maybe ensure that the appropriate ads are served?
38:07 Or if there's multiple versions of the same ad,
38:10 that the ad that would apply most to me
38:12 would be served to me, and to a different user,
38:14 a different version, and things like that?
38:16 Is that on the roadmap?
38:17 JEFF DEAN: Yeah.
38:18 I think, in general, there's a lot
38:20 of potential applications of machine
38:21 learning to advertising.
38:24 Google has actually been using machine
38:25 learning in our advertising system for more than a decade.
38:29 And I think one of the things about deciding
38:34 what ads to show to users is, you
38:35 want them to be relevant and useful to that user.
38:38 And it's better to not show an ad at all,
38:40 if you don't have something that seems plausibly relevant.
38:44 And that's always been Google's advertising philosophy.
38:47 And other websites on the web don't necessarily quite
38:51 have the same balance, in that respect.
38:53 But I do think there's plenty of opportunity to continue
38:56 to improve advertising systems and make them better,
38:59 so that you see less ads, but they're actually more useful.
39:03 TOM SIMONITE: OK.
39:03 Next question from at the top.
39:05 JACK CLARK: Jack Clark with Bloomberg News.
39:08 So how do you differentiate to the user
39:12 between a sponsored advert, and one that is provided by your AI
39:17 naturally?
39:18 How do I know that the burger joint you're suggesting
39:21 is like a paid-for link, or is it a genuine link?
39:26 JEFF DEAN: So in our user interfaces,
39:27 we always clearly delimit advertisements.
39:30 And in general, all ads that we show
39:33 are selected algorithmically by our systems.
39:36 They're not like, you can just give us an ad,
39:38 and we will always show it to someone.
39:40 We always decide what is the likelihood
39:43 that this ad is going to be useful to someone,
39:45 before we decide to show that advertiser's ad.
39:48 JACK CLARK: Does this extend to stuff like Google Home, where
39:51 it will say, this is a sponsored restaurant
39:53 we're going to send you to.
39:57 JEFF DEAN: I don't know that product.
39:58 JOHN GIANNANDREA: I mean, we haven't
40:00 launched Google Home yet.
40:01 So a lot of these product decisions are still to be made.
40:05 I think we do, as a general rule,
40:08 clearly identify when something is sponsored
40:10 versus when it's organic.
40:13 TOM SIMONITE: OK.
40:13 Next question here.
40:15 AUDIENCE: Hi.
40:15 This is a question for Jeff Dean.
40:19 I'm very much intrigued by the Google Brain project
40:22 that you're doing.
40:22 Very cool t-shirt.
40:25 The question is, what is the road map of that,
40:28 and how does it relate to the point of singularity?
40:32 JEFF DEAN: Aha.
40:34 So the road map of-- this is sort of the project code name
40:41 for the team that I work on.
40:43 Basically, the team was developed
40:45 to investigate the use of advanced methods
40:49 in machine learning to solve difficult problems in AI.
40:54 And we're continuing to work on pushing the state
40:57 of the art in that area.
40:59 And I think that means working in lots of different areas,
41:01 building the right kinds of hardware with TPUs,
41:04 building the right systems infrastructure with things
41:07 like TensorFlow.
41:08 Solving the right research problems
41:10 that are not connected to products,
41:14 and then figuring out ways in which machine learning can
41:17 be used to advance different kinds of fields,
41:22 as we solve different problems along the road.
41:25 I'm not a big believer in the singularity.
41:27 I think all exponentials look like exponentials
41:30 at the beginning, but then they run out of stuff.
41:34 TOM SIMONITE: OK.
41:35 Thanks for the question.
41:36 Back to the pressbox.
41:38 STEVEN MAX PATTERSON: Hi.
41:39 Steven Max Patterson, IDG.
41:41 I was looking at Google Home and Google Assistant,
41:45 and it looks like it's really a platform.
41:50 And it's a composite of other platforms,
41:53 like the Knowledge Graph, Google Cloud Speech, Google machine
41:58 learning, the Awareness API.
42:00 Is this a feature that other consumer device manufacturers
42:06 could include, and is that the intent and direction of Google,
42:09 is to make this a platform?
42:13 JOHN GIANNANDREA: It's definitely
42:14 the case that most of our machine learning APIs
42:18 are migrating to the cloud platform, which enables people
42:21 to use, for example, our speech capabilities in other products.
42:25 I think the Google Assistant is intended to be, actually,
42:27 a holistic product delivered from Google.
42:29 That makes sense.
42:30 But it may make sense to syndicate
42:32 that to other manufacturers at some point.
42:34 We don't have any plans to do that today.
42:36 But in general, we're trying to be
42:37 as open as we can with the component pieces
42:39 that you just mentioned, and make
42:41 them available as Cloud APIs, and in many cases,
42:43 as open source solutions as well.
42:45 JEFF DEAN: Right.
42:46 I think one of the things about that
42:47 is, making those individual pieces available
42:49 enables everyone in the world to take advantage of some
42:53 of the machine learning research we've done,
42:55 and be able to do things like label images,
42:57 or do speech recognition really well.
42:59 And then they can go off and build
43:00 really cool, amazing things that aren't necessarily
43:03 the kinds of things we're working on.
43:05 JOHN GIANNANDREA: Yeah, and many companies are doing this today.
43:07 They're using our translate APIs.
43:08 They're using our Cloud Speech APIs today.
43:12 TOM SIMONITE: Right.
43:13 We have time for one last quick question from this mic here.
43:15 AUDIENCE: Hi.
43:16 I'm [INAUDIBLE].
43:18 John, you said that you would declare summer
43:22 if, in language understanding, it
43:25 would be able to translate from one paragraph in English
43:30 to another paragraph in English.
43:32 Don't you think that making that possible requires
43:35 really complete understanding of the world, and everything
43:40 that's going on, just to catch the emotional level that
43:44 is in the paragraph, or even the physical understanding
43:47 of the world around us?
43:50 JOHN GIANNANDREA: Yeah, I do.
43:52 I use that example because it is really, really hard.
43:55 So I don't think we're going to be done for many, many years.
43:58 I think there's a lot of work to do.
44:00 We built the Google Knowledge Graph, in part,
44:02 to answer that question, so that we actually
44:03 had some semantic understanding of at least
44:05 the things in the world, and some of the relationships
44:07 between them.
44:08 But yeah, it's a very hard problem.
44:09 And I used that example because it's
44:11 pretty clear we won't be done for a long time.
44:13 TOM SIMONITE: OK.
44:14 Sorry, there's no time for other questions.
44:16 Thanks for the question.
44:17 A good forward-looking note to end on.
44:20 We'll see how it works out over the coming years.
44:23 Thank you for joining me, all of you on stage,
44:25 and thanks for the questions and coming for the session.
44:29 [MUSIC PLAYING]
Transcripción : Youtube

Note-Code: Videos

Machine Learning: Google's Vision - Google I/O 2016

No hay comentarios.:

Publicar un comentario

Entradas populares 00

Entradas populares 30