Hello World - Machine Learning Recipes #1

0:00   [MUSIC PLAYING]
0:06   Six lines of code is all it takes
0:08   to write your first Machine Learning program.
0:10   My name's Josh Gordon, and today I'll
0:11   walk you through writing Hello World for Machine Learning.
0:14   In the first few episodes of the series,
0:16   we'll teach you how to get started with Machine
0:17   Learning from scratch.
0:19   To do that, we'll work with two open source libraries,
0:21   scikit-learn and TensorFlow.
0:23   We'll see scikit in action in a minute.
0:25   But first, let's talk quickly about what Machine Learning is
0:27   and why it's important.
0:29   You can think of Machine Learning as a subfield
0:31   of artificial intelligence.
0:32   Early AI programs typically excelled at just one thing.
0:35   For example, Deep Blue could play chess
0:37   at a championship level, but that's all it could do.
0:40   Today we want to write one program that
0:41   can solve many problems without needing to be rewritten.
0:45   AlphaGo is a great example of that.
0:47   As we speak, it's competing in the World Go Championship.
0:50   But similar software can also learn to play Atari games.
0:53   Machine Learning is what makes that possible.
0:55   It's the study of algorithms that
0:57   learn from examples and experience
0:59   instead of relying on hard-coded rules.
1:00   So that's the state of the art.
1:02   But here's a much simpler example
1:03   we'll start coding up today.
1:05   I'll give you a problem that sounds easy but is
1:07   impossible to solve without Machine Learning.
1:09   Can you write code to tell the difference
1:11   between an apple and an orange?
1:12   Imagine I asked you to write a program that takes an image
1:15   file as input, does some analysis,
1:17   and outputs the type of fruit.
1:18   How can you solve this?
1:20   You'd have to start by writing lots of manual rules.
1:22   For example, you could write code
1:23   to count how many orange pixels there are and compare that
1:26   to the number of green ones.
1:27   The ratio should give you a hint about the type of fruit.
1:30   That works fine for simple images like these.
1:33   But as you dive deeper into the problem,
1:34   you'll find the real world is messy, and the rules you
1:37   write start to break.
1:38   How would you write code to handle black-and-white photos
1:41   or images with no apples or oranges in them at all?
1:44   In fact, for just about any rule you write,
1:46   I can find an image where it won't work.
1:48   You'd need to write tons of rules,
1:50   and that's just to tell the difference between apples
1:52   and oranges.
1:53   If I gave you a new problem, you'd need to start all over again.
1:57   Clearly, we need something better.
1:59   To solve this, we need an algorithm
2:00   that can figure out the rules for us,
2:02   so we don't have to write them by hand.
2:04   And for that, we're going to train a classifier.
2:07   For now you can think of a classifier as a function.
2:10   It takes some data as input and assigns a label to it
2:13   as output.
2:14   For example, I could have a picture
2:15   and want to classify it as an apple or an orange.
2:18   Or I have an email, and I want to classify it
2:20   as spam or not spam.
2:22   The technique to write the classifier
2:23   automatically is called supervised learning.
2:26   It begins with examples of the problem you want to solve.
2:29   To code this up, we'll work with scikit-learn.
2:31   Here, I'll download and install the library.
2:34   There are a couple different ways to do that.
2:35   But for me, the easiest has been to use Anaconda.
2:38   This makes it easy to get all the dependencies set up
2:40   and works well cross-platform.
2:42   With the magic of video, I'll fast forward
2:44   through downloading and installing it.
2:45   Once it's installed, you can test
2:47   that everything is working properly
2:48   by starting a Python script and importing sklearn.
2:51   Assuming that worked, that's line one of our program down,
2:53   five to go.
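That first line might look like the sketch below, assuming scikit-learn was already installed (for example through Anaconda); a successful import confirms the setup works:

```python
# Line one: import scikit-learn to verify the installation.
import sklearn

# If the import succeeds, the library and its dependencies are set up.
print(sklearn.__version__)
```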
2:56   To use supervised learning, we'll
2:57   follow a recipe with a few standard steps.
3:00   Step one is to collect training data.
3:02   These are examples of the problem we want to solve.
3:04   For our problem, we're going to write a function
3:06   to classify a piece of fruit.
3:08   For starters, it will take a description of the fruit
3:10   as input and predict whether it's
3:11   an apple or an orange as output, based on features
3:14   like its weight and texture.
3:16   To collect our training data, imagine
3:18   we head out to an orchard.
3:19   We'll look at different apples and oranges
3:21   and write down measurements that describe them in a table.
3:23   In Machine Learning these measurements
3:25   are called features.
3:26   To keep things simple, here we've used just two--
3:28   how much each fruit weighs in grams and its texture, which
3:31   can be bumpy or smooth.
3:33   A good feature makes it easy to discriminate
3:35   between different types of fruit.
3:37   Each row in our training data is an example.
3:40   It describes one piece of fruit.
3:42   The last column is called the label.
3:44   It identifies what type of fruit is in each row,
3:46   and there are just two possibilities--
3:47   apples and oranges.
3:49   The whole table is our training data.
3:51   Think of these as all the examples
3:53   we want the classifier to learn from.
3:55   The more training data you have, the better a classifier
3:57   you can create.
3:59   Now let's write down our training data in code.
4:01   We'll use two variables-- features and labels.
4:04   Features contains the first two columns,
4:06   and labels contains the last.
4:07   You can think of features as the input
4:09   to the classifier and labels as the output we want.
4:13   I'm going to change the variable type of the features
4:15   from strings to ints, so I'll use 0 for bumpy and 1
4:18   for smooth.
4:19   I'll do the same for our labels, so I'll use 0 for apple
4:22   and 1 for orange.
4:23   These are lines two and three in our program.
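Written out, those two lines might look like this; the exact weights below are illustrative values, not measurements from the video:

```python
# Each example is [weight in grams, texture], with 0 = bumpy and 1 = smooth.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]

# One label per example above: 0 = apple, 1 = orange.
labels = [0, 0, 1, 1]
```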
4:26   Step two in our recipe is to use these examples to train
4:29   a classifier.
4:30   The type of classifier we'll start with
4:32   is called a decision tree.
4:34   We'll dive into the details of how
4:35   these work in a future episode.
4:37   But for now, it's OK to think of a classifier as a box of rules.
4:41   That's because there are many different types of classifiers,
4:43   but the input and output types are always the same.
4:47   I'm going to import the tree.
4:49   Then on line four of our script, we'll create the classifier.
4:52   At this point, it's just an empty box of rules.
4:54   It doesn't know anything about apples and oranges yet.
4:56   To train it, we'll need a learning algorithm.
4:58   If a classifier is a box of rules,
5:00   then you can think of the learning algorithm
5:02   as the procedure that creates them.
5:04   It does that by finding patterns in your training data.
5:06   For example, it might notice oranges tend to weigh more,
5:09   so it'll create a rule saying that the heavier a fruit is,
5:11   the more likely it is to be an orange.
5:14   In scikit, the training algorithm
5:16   is included in the classifier object, and it's called fit.
5:19   You can think of fit as being a synonym for "find patterns
5:21   in data."
5:23   We'll get into the details of how
5:24   this happens under the hood in a future episode.
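Lines four and five might look like this sketch, with the training data repeated from before (weights are illustrative):

```python
from sklearn import tree

# Training data: [weight in grams, texture], 0 = bumpy, 1 = smooth.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

# Line four: create the classifier -- at this point an empty box of rules.
clf = tree.DecisionTreeClassifier()

# Line five: fit finds patterns in the training data.
clf = clf.fit(features, labels)
```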
5:27   At this point, we have a trained classifier.
5:29   So let's take it for a spin and use it to classify a new fruit.
5:32   The input to the classifier is the features for a new example.
5:36   Let's say the fruit we want to classify
5:37   is 150 grams and bumpy.
5:39   The output will be 0 if it's an apple or 1 if it's an orange.
5:43   Before we hit Enter and see what the classifier predicts,
5:46   let's think for a sec.
5:47   If you had to guess, what would you say the output should be?
5:51   To figure that out, compare this fruit to our training data.
5:53   It looks like it's similar to an orange
5:55   because it's heavy and bumpy.
5:57   That's what I'd guess anyway, and if we hit Enter,
5:59   it's what our classifier predicts as well.
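Putting it all together, the final line and the full six-line program might look like this sketch (training weights are illustrative; the new fruit is the 150-gram, bumpy one from the video):

```python
from sklearn import tree

# Training data: [weight in grams, texture], 0 = bumpy, 1 = smooth.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

# Line six: classify a new fruit that weighs 150 grams and is bumpy.
print(clf.predict([[150, 0]]))  # -> [1], i.e. an orange
```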
6:01   If everything worked for you, then
6:03   that's it for your first Machine Learning program.
6:06   You can create a new classifier for a new problem
6:08   just by changing the training data.
6:10   That makes this approach far more reusable
6:13   than writing new rules for each problem.
6:15   Now, you might be wondering why we described our fruit
6:17   using a table of features instead of using pictures
6:19   of the fruit as training data.
6:21   Well, you can use pictures, and we'll
6:23   get to that in a future episode.
6:25   But, as you'll see later on, the way we did it here
6:27   is more general.
6:29   The neat thing is that programming with Machine
6:30   Learning isn't hard.
6:32   But to get it right, you need to understand
6:33   a few important concepts.
6:35   I'll start walking you through those in the next few episodes.
6:37   Thanks very much for watching, and I'll see you then.
6:40   [MUSIC PLAYING]
Transcript: YouTube
