0:00 [MUSIC PLAYING]
0:06 Six lines of code is all it takes
0:08 to write your first Machine Learning program.
0:10 My name's Josh Gordon, and today I'll
0:11 walk you through writing Hello World for Machine Learning.
0:14 In the first few episodes of the series,
0:16 we'll teach you how to get started with Machine
0:17 Learning from scratch.
0:19 To do that, we'll work with two open source libraries,
0:21 scikit-learn and TensorFlow.
0:23 We'll see scikit in action in a minute.
0:25 But first, let's talk quickly about what Machine Learning is
0:27 and why it's important.
0:29 You can think of Machine Learning as a subfield
0:31 of artificial intelligence.
0:32 Early AI programs typically excelled at just one thing.
0:35 For example, Deep Blue could play chess
0:37 at a championship level, but that's all it could do.
0:40 Today we want to write one program that
0:41 can solve many problems without needing to be rewritten.
0:45 AlphaGo is a great example of that.
0:47 As we speak, it's competing in the World Go Championship.
0:50 But similar software can also learn to play Atari games.
0:53 Machine Learning is what makes that possible.
0:55 It's the study of algorithms that
0:57 learn from examples and experience
0:59 instead of relying on hard-coded rules.
1:00 So that's the state-of-the-art.
1:02 But here's a much simpler example
1:03 we'll start coding up today.
1:05 I'll give you a problem that sounds easy but is
1:07 impossible to solve without Machine Learning.
1:09 Can you write code to tell the difference
1:11 between an apple and an orange?
1:12 Imagine I asked you to write a program that takes an image
1:15 file as input, does some analysis,
1:17 and outputs the type of fruit.
1:18 How can you solve this?
1:20 You'd have to start by writing lots of manual rules.
1:22 For example, you could write code
1:23 to count how many orange pixels there are and compare that
1:26 to the number of green ones.
1:27 The ratio should give you a hint about the type of fruit.
1:30 That works fine for simple images like these.
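As a rough sketch of what such a hand-written rule might look like: the thresholds and the `guess_fruit` name are made up for illustration, and the pixels are assumed to arrive as plain (r, g, b) tuples rather than through any particular image library.

```python
def guess_fruit(pixels):
    """Brittle hand-coded rule: pixels is a list of (r, g, b) tuples."""
    # Count pixels that look orange (strong red, medium green, little blue).
    orange_ish = sum(1 for r, g, b in pixels
                     if r > 200 and 100 < g < 180 and b < 100)
    # Count pixels that look green (green dominates both other channels).
    green_ish = sum(1 for r, g, b in pixels if g > r and g > b)
    return "orange" if orange_ish > green_ish else "apple"
```

Rules like this break the moment the lighting changes or the photo is black and white, which is exactly the point being made here.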
1:33 But as you dive deeper into the problem,
1:34 you'll find the real world is messy, and the rules you
1:37 write start to break.
1:38 How would you write code to handle black-and-white photos
1:41 or images with no apples or oranges in them at all?
1:44 In fact, for just about any rule you write,
1:46 I can find an image where it won't work.
1:48 You'd need to write tons of rules,
1:50 and that's just to tell the difference between apples
1:52 and oranges.
1:53 If I gave you a new problem, you'd need to start all over again.
1:57 Clearly, we need something better.
1:59 To solve this, we need an algorithm
2:00 that can figure out the rules for us,
2:02 so we don't have to write them by hand.
2:04 And for that, we're going to train a classifier.
2:07 For now you can think of a classifier as a function.
2:10 It takes some data as input and assigns a label to it
2:13 as output.
2:14 For example, I could have a picture
2:15 and want to classify it as an apple or an orange.
2:18 Or I have an email, and I want to classify it
2:20 as spam or not spam.
2:22 The technique to write the classifier
2:23 automatically is called supervised learning.
2:26 It begins with examples of the problem you want to solve.
2:29 To code this up, we'll work with scikit-learn.
2:31 Here, I'll download and install the library.
2:34 There are a couple different ways to do that.
2:35 But for me, the easiest has been to use Anaconda.
2:38 This makes it easy to get all the dependencies set up
2:40 and works well cross-platform.
2:42 With the magic of video, I'll fast forward
2:44 through downloading and installing it.
2:45 Once it's installed, you can test
2:47 that everything is working properly
2:48 by starting a Python script and importing sklearn.
2:51 Assuming that worked, that's line one of our program down,
2:53 five to go.
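A minimal sanity check, assuming the install succeeded:

```python
# If this import runs without error, scikit-learn is ready to use.
import sklearn
print(sklearn.__version__)
```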
2:56 To use supervised learning, we'll
2:57 follow a recipe with a few standard steps.
3:00 Step one is to collect training data.
3:02 These are examples of the problem we want to solve.
3:04 For our problem, we're going to write a function
3:06 to classify a piece of fruit.
3:08 For starters, it will take a description of the fruit
3:10 as input and predict whether it's
3:11 an apple or an orange as output, based on features
3:14 like its weight and texture.
3:16 To collect our training data, imagine
3:18 we head out to an orchard.
3:19 We'll look at different apples and oranges
3:21 and write down measurements that describe them in a table.
3:23 In Machine Learning these measurements
3:25 are called features.
3:26 To keep things simple, here we've used just two--
3:28 how much each fruit weighs in grams and its texture, which
3:31 can be bumpy or smooth.
3:33 A good feature makes it easy to discriminate
3:35 between different types of fruit.
3:37 Each row in our training data is an example.
3:40 It describes one piece of fruit.
3:42 The last column is called the label.
3:44 It identifies what type of fruit is in each row,
3:46 and there are just two possibilities--
3:47 apples and oranges.
3:49 The whole table is our training data.
3:51 Think of these as all the examples
3:53 we want the classifier to learn from.
3:55 The more training data you have, the better a classifier
3:57 you can create.
3:59 Now let's write down our training data in code.
4:01 We'll use two variables-- features and labels.
4:04 Features contains the first two columns,
4:06 and labels contains the last.
4:07 You can think of features as the input
4:09 to the classifier and labels as the output we want.
4:13 I'm going to change the variable types of all features
4:15 to ints instead of strings, so I'll use 0 for bumpy and 1
4:18 for smooth.
4:19 I'll do the same for our labels, so I'll use 0 for apple
4:22 and 1 for orange.
4:23 These are lines two and three in our program.
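Written out, those two lines might look like this; the specific weights are illustrative stand-ins for whatever measurements ended up in the table:

```python
# Each row is [weight in grams, texture], with texture 0 = bumpy, 1 = smooth.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
# One label per row: 0 = apple, 1 = orange.
labels = [0, 0, 1, 1]
```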
4:26 Step two in our recipe is to use these examples to train
4:29 a classifier.
4:30 The type of classifier we'll start with
4:32 is called a decision tree.
4:34 We'll dive into the details of how
4:35 these work in a future episode.
4:37 But for now, it's OK to think of a classifier as a box of rules.
4:41 That's because there are many different types of classifier,
4:43 but the input and output type is always the same.
4:47 I'm going to import the tree.
4:49 Then on line four of our script, we'll create the classifier.
4:52 At this point, it's just an empty box of rules.
4:54 It doesn't know anything about apples and oranges yet.
4:56 To train it, we'll need a learning algorithm.
4:58 If a classifier is a box of rules,
5:00 then you can think of the learning algorithm
5:02 as the procedure that creates them.
5:04 It does that by finding patterns in your training data.
5:06 For example, it might notice oranges tend to weigh more,
5:09 so it'll create a rule saying that the heavier a fruit is,
5:11 the more likely it is to be an orange.
5:14 In scikit, the training algorithm
5:16 is included in the classifier object, and it's called fit.
5:19 You can think of fit as being a synonym for "find patterns
5:21 in data."
5:23 We'll get into the details of how
5:24 this happens under the hood in a future episode.
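Putting the pieces so far together, the import, the empty classifier, and the call to fit might be sketched like this (the training lists reuse the same illustrative weights as the table):

```python
from sklearn import tree

# Illustrative training data: [weight in grams, texture (0 = bumpy, 1 = smooth)]
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

clf = tree.DecisionTreeClassifier()  # an empty box of rules
clf = clf.fit(features, labels)      # "find patterns in data"
```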
5:27 At this point, we have a trained classifier.
5:29 So let's take it for a spin and use it to classify a new fruit.
5:32 The input to the classifier is the features for a new example.
5:36 Let's say the fruit we want to classify
5:37 is 150 grams and bumpy.
5:39 The output will be 0 if it's an apple or 1 if it's an orange.
5:43 Before we hit Enter and see what the classifier predicts,
5:46 let's think for a sec.
5:47 If you had to guess, what would you say the output should be?
5:51 To figure that out, compare this fruit to our training data.
5:53 It looks like it's similar to an orange
5:55 because it's heavy and bumpy.
5:57 That's what I'd guess anyway, and if we hit Enter,
5:59 it's what our classifier predicts as well.
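Assembled into one runnable sketch, again with illustrative training values, the prediction step looks like this. Since 150 grams and bumpy matches the heavy, bumpy oranges in the training data, the classifier outputs 1:

```python
from sklearn import tree

features = [[140, 1], [130, 1], [150, 0], [170, 0]]  # illustrative values
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

clf = tree.DecisionTreeClassifier().fit(features, labels)

# A new fruit: 150 grams and bumpy -- heavy and bumpy, like an orange.
print(clf.predict([[150, 0]]))  # -> [1], i.e. orange
```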
6:01 If everything worked for you, then
6:03 that's it for your first Machine Learning program.
6:06 You can create a new classifier for a new problem
6:08 just by changing the training data.
6:10 That makes this approach far more reusable
6:13 than writing new rules for each problem.
6:15 Now, you might be wondering why we described our fruit
6:17 using a table of features instead of using pictures
6:19 of the fruit as training data.
6:21 Well, you can use pictures, and we'll
6:23 get to that in a future episode.
6:25 But, as you'll see later on, the way we did it here
6:27 is more general.
6:29 The neat thing is that programming with Machine
6:30 Learning isn't hard.
6:32 But to get it right, you need to understand
6:33 a few important concepts.
6:35 I'll start walking you through those in the next few episodes.
6:37 Thanks very much for watching, and I'll see you then.
6:40 [MUSIC PLAYING]