Google Developers
Let’s Write a Pipeline - Machine Learning Recipes #4
0:00 [MUSIC PLAYING]
0:06 Welcome back.
0:07 We've covered a lot of ground already,
0:09 so today I want to review and reinforce concepts.
0:12 To do that, we'll explore two things.
0:14 First, we'll code up a basic pipeline
0:16 for supervised learning.
0:17 I'll show you how multiple classifiers
0:19 can solve the same problem.
0:21 Next, we'll build up a little more intuition
0:23 for what it means for an algorithm to learn something
0:25 from data, because that sounds kind of magical, but it's not.
0:29 To kick things off, let's look at a common experiment
0:31 you might want to do.
0:33 Imagine you're building a spam classifier.
0:35 That's just a function that labels an incoming email
0:37 as spam or not spam.
0:39 Now, say you've already collected a data set
0:41 and you're ready to train a model.
0:42 But before you put it into production,
0:44 there's a question you need to answer first--
0:46 how accurate will it be when you use it to classify emails that
0:49 weren't in your training data?
0:51 As best we can, we want to verify our models work well
0:54 before we deploy them.
0:56 And we can do an experiment to help us figure that out.
0:59 One approach is to partition our data set into two parts.
1:02 We'll call these Train and Test.
1:05 We'll use Train to train our model
1:07 and Test to see how accurate it is on new data.
1:10 That's a common pattern, so let's see how it looks in code.
1:13 To kick things off, let's import a data set into scikit-learn.
1:17 We'll use Iris again, because it's handily included.
1:20 Now, we already saw Iris in episode two.
1:21 But what we haven't seen before is
1:23 that I'm calling the features x and the labels y.
1:26 Why is that?
1:28 Well, that's because one way to think of a classifier
1:30 is as a function.
1:32 At a high level, you can think of x as the input
1:34 and y as the output.
1:36 I'll talk more about that in the second half of this episode.
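The on-screen code isn't captured in the transcript, but a minimal sketch of this step in scikit-learn (which bundles the Iris data set) might look like this:

```python
# Load the Iris data set and name the features X and the labels y,
# matching the input/output view of a classifier described above.
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # inputs: sepal/petal length and width, one row per flower
y = iris.target  # outputs: 0, 1, or 2, one label per species of iris

print(X.shape)  # (150, 4): 150 examples with 4 features each
print(y.shape)  # (150,): one label per example
```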
1:39 After we import the data set, the first thing we want to do
1:42 is partition it into Train and Test.
1:44 And to do that, we can import a handy utility,
1:46 and it makes the syntax clear.
1:48 We're taking our x's and our y's,
1:50 or our features and labels, and partitioning them
1:52 into two sets.
1:54 X_train and y_train are the features and labels
1:56 for the training set.
1:57 And X_test and y_test are the features and labels
2:00 for the testing set.
2:02 Here, I'm just saying that I want half the data to be
2:04 used for testing.
2:05 So if we have 150 examples in Iris, 75 will be in Train
2:09 and 75 will be in Test.
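The transcript doesn't name the utility; in current scikit-learn it's `train_test_split` from `sklearn.model_selection` (older releases shipped it in `sklearn.cross_validation`). A sketch:

```python
# Partition the features and labels into Train and Test,
# holding out half of the data for testing.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

print(len(X_train), len(X_test))  # 75 75: half of Iris's 150 examples in each set
```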
2:11 Now we'll create our classifier.
2:13 I'll use two different types here
2:14 to show you how they accomplish the same task.
2:17 Let's start with the decision tree we've already seen.
2:20 Note there's only two lines of code
2:22 that are classifier-specific.
2:25 Now let's train the classifier using our training data.
2:28 At this point, it's ready to be used to classify data.
2:31 And next, we'll call the predict method
2:33 and use it to classify our testing data.
2:35 If you print out the predictions,
2:37 you'll see they're a list of numbers.
2:38 These correspond to the type of Iris
2:40 the classifier predicts for each row in the testing data.
2:44 Now let's see how accurate our classifier
2:46 was on the testing set.
2:48 Recall that up top, we have the true labels for the testing
2:50 data.
2:51 To calculate our accuracy, we can
2:53 compare the predicted labels to the true labels,
2:55 and tally up the score.
2:57 There's a convenience method in scikit-learn
2:59 we can import to do that.
3:00 Notice here, our accuracy was over 90%.
3:03 If you try this on your own, it might be a little bit different
3:06 because of some randomness in how the Train/Test
3:08 data is partitioned.
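Putting the pieces together, a runnable sketch of the whole experiment; assuming the convenience method for scoring is `accuracy_score` from `sklearn.metrics`:

```python
# Train a decision tree on the training half, classify the testing half,
# and score accuracy by comparing predictions to the true test labels.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5)

# The only two classifier-specific lines:
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

predictions = clf.predict(X_test)      # a list of predicted iris types
accuracy = accuracy_score(y_test, predictions)
print(accuracy)  # usually over 0.9; varies with the random Train/Test split
```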
3:10 Now, here's something interesting.
3:11 By replacing these two lines, we can use a different classifier
3:14 to accomplish the same task.
3:16 Instead of using a decision tree,
3:18 we'll use one called k-nearest neighbors.
3:20 If we run our experiment, we'll see that the code
3:23 works in exactly the same way.
3:25 The accuracy may be different when you run it,
3:27 because this classifier works a little bit differently
3:29 and because of the randomness in the Train/Test split.
3:32 Likewise, if we wanted to use a more sophisticated classifier,
3:35 we could just import it and change these two lines.
3:38 Otherwise, our code is the same.
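As a sketch of the swap: k-nearest neighbors is `KNeighborsClassifier` in scikit-learn, and only the two classifier-specific lines change.

```python
# Same pipeline, different classifier: only the two
# classifier-specific lines change.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5)

clf = KNeighborsClassifier()   # was: DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
```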
3:40 The takeaway here is that while there are many different types
3:42 of classifiers, at a high level, they have a similar interface.
3:49 Now let's talk a little bit more about what
3:50 it means to learn from data.
3:53 Earlier, I said we called the features x and the labels y,
3:56 because they were the input and output of a function.
3:58 Now, of course, a function is something we already
4:00 know from programming.
4:02 def classify-- there's our function.
4:04 As we already know in supervised learning,
4:06 we don't want to write this ourselves.
4:09 We want an algorithm to learn it from training data.
4:12 So what does it mean to learn a function?
4:15 Well, a function is just a mapping from input
4:17 to output values.
4:18 Here's a function you might have seen before-- y
4:20 equals mx plus b.
4:22 That's the equation for a line, and there
4:24 are two parameters-- m, which gives the slope;
4:27 and b, which gives the y-intercept.
4:29 Given these parameters, of course,
4:31 we can plot the function for different values of x.
4:34 Now, in supervised learning, our classify function
4:36 might have some parameters as well,
4:38 but the input x are the features for an example we
4:41 want to classify, and the output y
4:43 is a label, like Spam or Not Spam, or a type of flower.
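To make the function view concrete, here's a toy, hand-written version (the names and the dot-classifying rule are invented for illustration; in supervised learning we want to learn such a body rather than write it):

```python
# y = mx + b: a function with two parameters, m (slope) and b (y-intercept).
def line(x, m, b):
    return m * x + b

# A classifier has the same shape: parameters plus an input x (the features)
# produce an output y (a label). This toy rule labels a 2D point by which
# side of the line it falls on.
def classify(x, m, b):
    px, py = x
    return 'green' if py > line(px, m, b) else 'red'

print(line(2, m=3, b=1))           # 7
print(classify((1, 5), m=1, b=0))  # green: the point (1, 5) is above y = x
```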
4:47 So what could the body of the function look like?
4:49 Well, that's the part we want to write algorithmically
4:51 or in other words, learn.
4:53 The important thing to understand here
4:55 is we're not starting from scratch
4:57 and pulling the body of the function out of thin air.
5:00 Instead, we start with a model.
5:01 And you can think of a model as the prototype for,
5:04 or the rules that define, the body of our function.
5:07 Typically, a model has parameters
5:08 that we can adjust with our training data.
5:10 And here's a high-level example of how this process works.
5:14 Let's look at a toy data set and think about what kind of model
5:17 we could use as a classifier.
5:19 Pretend we're interested in distinguishing
5:20 between red dots and green dots, some of which
5:23 I've drawn here on a graph.
5:25 To do that, we'll use just two features--
5:27 the x- and y-coordinates of a dot.
5:29 Now let's think about how we could classify this data.
5:32 We want a function that considers
5:34 a new dot it's never seen before,
5:35 and classifies it as red or green.
5:38 In fact, there might be a lot of data we want to classify.
5:40 Here, I've drawn our testing examples
5:42 in light green and light red.
5:44 These are dots that weren't in our training data.
5:47 The classifier has never seen them before, so how can
5:49 it predict the right label?
5:51 Well, imagine if we could somehow draw a line
5:53 across the data like this.
5:56 Then we could say the dots to the left
5:57 of the line are green and dots to the right of the line are
6:00 red.
6:00 And this line can serve as our classifier.
6:03 So how can we learn this line?
6:05 Well, one way is to use the training data to adjust
6:08 the parameters of a model.
6:09 And let's say the model we use is a simple straight line
6:12 like we saw before.
6:14 That means we have two parameters to adjust-- m and b.
6:17 And by changing them, we can change where the line appears.
6:21 So how could we learn the right parameters?
6:23 Well, one idea is that we can iteratively adjust
6:25 them using our training data.
6:27 For example, we might start with a random line
6:29 and use it to classify the first training example.
6:32 If it gets it right, we don't need to change our line,
6:35 so we move on to the next one.
6:36 But on the other hand, if it gets it wrong,
6:38 we could slightly adjust the parameters of our model
6:41 to make it more accurate.
6:43 The takeaway here is this.
6:44 One way to think of learning is using training data
6:47 to adjust the parameters of a model.
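Here is a minimal sketch of that loop on made-up data, using a perceptron-style update rule as one simple way to "slightly adjust" m and b; the points, labels, and learning rate are all invented for illustration:

```python
# A minimal sketch of "learning as adjusting parameters": start with a
# bad line, and slightly nudge m and b whenever it misclassifies a dot.

# Toy training data: (x, y) points labeled +1 ("green", above the line
# we hope to learn) or -1 ("red", below it).
train = [((1, 5), 1), ((2, 6), 1), ((3, 1), -1), ((4, 2), -1)]

m, b = 0.0, 10.0  # a deliberately bad starting line, above all the dots
lr = 0.1          # how much to adjust the parameters on each mistake

for _ in range(5000):                      # repeat until an error-free pass
    mistakes = 0
    for (px, py), label in train:
        predicted = 1 if py > m * px + b else -1
        if predicted != label:             # wrong: slightly adjust the line
            m -= lr * label * px
            b -= lr * label
            mistakes += 1
    if mistakes == 0:
        break

# After training, the line separates the dots.
print(all((1 if py > m * px + b else -1) == label for (px, py), label in train))
```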
6:50 Now, here's something really special.
6:52 It's called TensorFlow Playground.
6:55 This is a beautiful example of a neural network
6:57 you can run and experiment with right in your browser.
7:00 Now, this deserves its own episode for sure,
7:02 but for now, go ahead and play with it.
7:03 It's awesome.
7:04 The playground comes with different data
7:06 sets you can try out.
7:08 Some are very simple.
7:09 For example, we could use our line to classify this one.
7:12 Some data sets are much more complex.
7:15 This data set is especially hard.
7:17 And see if you can build a network to classify it.
7:20 Now, you can think of a neural network
7:21 as a more sophisticated type of classifier,
7:24 like a decision tree or a simple line.
7:26 But in principle, the idea is similar.
7:29 OK.
7:29 Hope that was helpful.
7:30 I just created a Twitter that you can follow
7:32 to be notified of new episodes.
7:33 And the next one should be out in a couple of weeks,
7:36 depending on how much work I'm doing for Google I/O. Thanks,
7:38 as always, for watching, and I'll see you next time.
Transcript: YouTube
What Makes a Good Feature? - Machine Learning Recipes #3
0:06 JOSH GORDON: Classifiers are only
0:08 as good as the features you provide.
0:10 That means coming up with good features
0:12 is one of your most important jobs in machine learning.
0:14 But what makes a good feature, and how can you tell?
0:17 If you're doing binary classification,
0:19 then a good feature makes it easy to decide
0:21 between two different things.
0:23 For example, imagine we wanted to write a classifier
0:26 to tell the difference between two types of dogs--
0:29 greyhounds and Labradors.
0:30 Here we'll use two features-- the dog's height in inches
0:34 and their eye color.
0:35 Just for this toy example, let's make a couple assumptions
0:38 about dogs to keep things simple.
0:40 First, we'll say that greyhounds are usually
0:43 taller than Labradors.
0:44 Next, we'll pretend that dogs have only two eye
0:47 colors-- blue and brown.
0:48 And we'll say the color of their eyes
0:50 doesn't depend on the breed of dog.
0:53 This means that one of these features is useful
0:55 and the other tells us nothing.
0:57 To understand why, we'll visualize them using a toy
1:01 dataset I'll create.
1:02 Let's begin with height.
1:04 How useful do you think this feature is?
1:06 Well, on average, greyhounds tend
1:08 to be a couple inches taller than Labradors, but not always.
1:11 There's a lot of variation in the world.
1:13 So when we think of a feature, we
1:15 have to consider how it looks for different values
1:17 in a population.
1:19 Let's head into Python for a programmatic example.
1:22 I'm creating a population of 1,000
1:24 dogs-- 50/50 greyhounds and Labradors.
1:27 I'll give each of them a height.
1:29 For this example, we'll say that greyhounds
1:31 are on average 28 inches tall and Labradors are 24.
1:35 Now, all dogs are a bit different.
1:37 Let's say that height is normally distributed,
1:39 so we'll make both of these plus or minus 4 inches.
1:42 This will give us two arrays of numbers,
1:44 and we can visualize them in a histogram.
1:47 I'll add a parameter so greyhounds are in red
1:49 and Labradors are in blue.
1:51 Now we can run our script.
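A sketch of that script with NumPy and matplotlib; the population numbers, averages, and colors come from the narration, while the headless backend and output file name are assumptions (the video shows the plot interactively):

```python
# Build the toy dog population: 500 greyhounds averaging 28 inches and
# 500 Labradors averaging 24, heights normally distributed +/- 4 inches.
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

greyhounds = 500
labs = 500

grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# Histogram of heights: greyhounds in red, Labradors in blue.
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'b'])
plt.savefig('dog_heights.png')  # or plt.show() in an interactive session
```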
1:53 This shows how many dogs in our population have a given height.
1:57 There's a lot of data on the screen,
1:58 so let's simplify it and look at it piece by piece.
2:03 We'll start with dogs on the far left
2:05 of the distribution-- say, who are about 20 inches tall.
2:08 Imagine I asked you to predict whether a dog with this height
2:11 was a lab or a greyhound.
2:13 What would you do?
2:14 Well, you could figure out the probability of each type
2:16 of dog given their height.
2:18 Here, it's more likely the dog is a lab.
2:20 On the other hand, if we go all the way
2:22 to the right of the histogram and look
2:24 at a dog who is 35 inches tall, we
2:26 can be pretty confident they're a greyhound.
2:29 Now, what about a dog in the middle?
2:31 You can see the graph gives us less information
2:33 here, because the probability of each type of dog is close.
2:36 So height is a useful feature, but it's not perfect.
2:40 That's why in machine learning, you almost always
2:42 need multiple features.
2:43 Otherwise, you could just write an if statement
2:45 instead of bothering with the classifier.
2:47 To figure out what types of features you should use,
2:50 do a thought experiment.
2:52 Pretend you're the classifier.
2:53 If you were trying to figure out if this dog is
2:55 a lab or a greyhound, what other things would you want to know?
3:00 You might ask about their hair length,
3:01 or how fast they can run, or how much they weigh.
3:04 Exactly how many features you should use
3:06 is more of an art than a science,
3:08 but as a rule of thumb, think about how many you'd
3:10 need to solve the problem.
3:12 Now let's look at another feature like eye color.
3:15 Just for this toy example, let's imagine
3:17 dogs have only two eye colors, blue and brown.
3:20 And let's say the color of their eyes
3:22 doesn't depend on the breed of dog.
3:24 Here's what a histogram might look like for this example.
3:28 For most values, the distribution is about 50/50.
3:32 So this feature tells us nothing,
3:33 because it doesn't correlate with the type of dog.
3:36 Including a useless feature like this in your training
3:39 data can hurt your classifier's accuracy.
3:41 That's because there's a chance they might appear useful purely
3:45 by accident, especially if you have only a small amount
3:48 of training data.
3:50 You also want your features to be independent.
3:52 And independent features give you
3:54 different types of information.
3:56 Imagine we already have a feature-- height in inches--
3:59 in our dataset.
4:00 Ask yourself, would it be helpful
4:02 if we added another feature, like height in centimeters?
4:05 No, because it's perfectly correlated with one
4:08 we already have.
4:09 It's good practice to remove highly correlated features
4:12 from your training data.
4:14 That's because a lot of classifiers
4:15 aren't smart enough to realize that height in inches
4:18 and height in centimeters are the same thing,
4:20 so they might double count how important this feature is.
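A quick sketch of why the two features carry identical information (toy numbers, invented for illustration):

```python
# Height in centimeters is just height in inches times 2.54, so the two
# features are perfectly correlated and the second adds nothing new.
import numpy as np

np.random.seed(0)                            # toy data, fixed for reproducibility
height_in = 24 + 4 * np.random.randn(100)    # heights in inches
height_cm = height_in * 2.54                 # the same heights in centimeters

correlation = np.corrcoef(height_in, height_cm)[0, 1]
print(round(correlation, 6))  # 1.0: perfectly correlated
```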
4:23 Last, you want your features to be easy to understand.
4:26 For a new example, imagine you want
4:28 to predict how many days it will take
4:30 to mail a letter between two different cities.
4:33 The farther apart the cities are, the longer it will take.
4:37 A great feature to use would be the distance
4:39 between the cities in miles.
4:42 A much worse pair of features to use
4:44 would be the cities' locations given by their latitude
4:47 and longitude.
4:48 And here's why.
4:48 I can look at the distance and make
4:51 a good guess of how long it will take the letter to arrive.
4:54 But learning the relationship between latitude, longitude,
4:56 and time is much harder and would require many more
5:00 examples in your training data.
5:01 Now, there are techniques you can
5:03 use to figure out exactly how useful your features are,
5:05 and even what combinations of them are best,
5:08 so you never have to leave it to chance.
5:11 We'll get to those in a future episode.
5:13 Coming up next time, we'll continue building our intuition
5:16 for supervised learning.
5:17 We'll show how different types of classifiers
5:19 can be used to solve the same problem and dive a little bit
5:22 deeper into how they work.
5:24 Thanks very much for watching, and I'll see you then.
Transcript: YouTube
0:08 as good as the features you provide.
0:10 That means coming up with good features
0:12 is one of your most important jobs in machine learning.
0:14 But what makes a good feature, and how can you tell?
0:17 If you're doing binary classification,
0:19 then a good feature makes it easy to decide
0:21 between two different things.
0:23 For example, imagine we wanted to write a classifier
0:26 to tell the difference between two types of dogs--
0:29 greyhounds and Labradors.
0:30 Here we'll use two features-- the dog's height in inches
0:34 and their eye color.
0:35 Just for this toy example, let's make a couple assumptions
0:38 about dogs to keep things simple.
0:40 First, we'll say that greyhounds are usually
0:43 taller than Labradors.
0:44 Next, we'll pretend that dogs have only two eye
0:47 colors-- blue and brown.
0:48 And we'll say the color of their eyes
0:50 doesn't depend on the breed of dog.
0:53 This means that one of these features is useful
0:55 and the other tells us nothing.
0:57 To understand why, we'll visualize them using a toy
1:01 dataset I'll create.
1:02 Let's begin with height.
1:04 How useful do you think this feature is?
1:06 Well, on average, greyhounds tend
1:08 to be a couple inches taller than Labradors, but not always.
1:11 There's a lot of variation in the world.
1:13 So when we think of a feature, we
1:15 have to consider how it looks for different values
1:17 in a population.
1:19 Let's head into Python for a programmatic example.
1:22 I'm creating a population of 1,000
1:24 dogs-- 50-50 greyhound Labrador.
1:27 I'll give each of them a height.
1:29 For this example, we'll say that greyhounds
1:31 are on average 28 inches tall and Labradors are 24.
1:35 Now, all dogs are a bit different.
1:37 Let's say that height is normally distributed,
1:39 so we'll make both of these plus or minus 4 inches.
1:42 This will give us two arrays of numbers,
1:44 and we can visualize them in a histogram.
1:47 I'll add a parameter so greyhounds are in red
1:49 and Labradors are in blue.
1:51 Now we can run our script.
1:53 This shows how many dogs in our population have a given height.
1:57 There's a lot of data on the screen,
1:58 so let's simplify it and look at it piece by piece.
2:03 We'll start with dogs on the far left
2:05 of the distribution-- say, who are about 20 inches tall.
2:08 Imagine I asked you to predict whether a dog with his height
2:11 was a lab or a greyhound.
2:13 What would you do?
2:14 Well, you could figure out the probability of each type
2:16 of dog given their height.
2:18 Here, it's more likely the dog is a lab.
2:20 On the other hand, if we go all the way
2:22 to the right of the histogram and look
2:24 at a dog who is 35 inches tall, we
2:26 can be pretty confident they're a greyhound.
2:29 Now, what about a dog in the middle?
2:31 You can see the graph gives us less information
2:33 here, because the probability of each type of dog is close.
2:36 So height is a useful feature, but it's not perfect.
2:40 That's why in machine learning, you almost always
2:42 need multiple features.
2:43 Otherwise, you could just write an if statement
2:45 instead of bothering with the classifier.
2:47 To figure out what types of features you should use,
2:50 do a thought experiment.
2:52 Pretend you're the classifier.
2:53 If you were trying to figure out if this dog is
2:55 a lab or a greyhound, what other things would you want to know?
3:00 You might ask about their hair length,
3:01 or how fast they can run, or how much they weigh.
3:04 Exactly how many features you should use
3:06 is more of an art than a science,
3:08 but as a rule of thumb, think about how many you'd
3:10 need to solve the problem.
3:12 Now let's look at another feature like eye color.
3:15 Just for this toy example, let's imagine
3:17 dogs have only two eye colors, blue and brown.
3:20 And let's say the color of their eyes
3:22 doesn't depend on the breed of dog.
3:24 Here's what a histogram might look like for this example.
3:28 For most values, the distribution is about 50/50.
3:32 So this feature tells us nothing,
3:33 because it doesn't correlate with the type of dog.
3:36 Including a useless feature like this in your training
3:39 data can hurt your classifier's accuracy.
3:41 That's because there's a chance they might appear useful purely
3:45 by accident, especially if you have only a small amount
3:48 of training data.
3:50 You also want your features to be independent.
3:52 And independent features give you
3:54 different types of information.
3:56 Imagine we already have a feature-- height and inches--
3:59 in our dataset.
4:00 Ask yourself, would it be helpful
4:02 if we added another feature, like height in centimeters?
4:05 No, because it's perfectly correlated with one
4:08 we already have.
4:09 It's good practice to remove highly correlated features
4:12 from your training data.
4:14 That's because a lot of classifiers
4:15 aren't smart enough to realize that height in inches
4:18 in centimeters are the same thing,
4:20 so they might double count how important this feature is.
4:23 Last, you want your features to be easy to understand.
4:26 For a new example, imagine you want
4:28 to predict how many days it will take
4:30 to mail a letter between two different cities.
4:33 The farther apart the cities are, the longer it will take.
4:37 A great feature to use would be the distance
4:39 between the cities in miles.
4:42 A much worse pair of features to use
4:44 would be the city's locations given by their latitude
4:47 and longitude.
4:48 And here's why.
4:48 I can look at the distance and make
4:51 a good guess of how long it will take the letter to arrive.
4:54 But learning the relationship between latitude, longitude,
4:56 and time is much harder and would require many more
5:00 examples in your training data.
5:01 Now, there are techniques you can
5:03 use to figure out exactly how useful your features are,
5:05 and even what combinations of them are best,
5:08 so you never have to leave it to chance.
5:11 We'll get to those in a future episode.
5:13 Coming up next time, we'll continue building our intuition
5:16 for supervised learning.
5:17 We'll show how different types of classifiers
5:19 can be used to solve the same problem and dive a little bit
5:22 deeper into how they work.
5:24 Thanks very much for watching, and I'll see you then.
Transcript: YouTube
Visualizing a Decision Tree - Machine Learning Recipes #2
0:00 [MUSIC PLAYING]
0:06 Last episode, we used a decision tree as our classifier.
0:09 Today we'll add code to visualize it
0:10 so we can see how it works under the hood.
0:13 There are many types of classifiers
0:14 you may have heard of before-- things like neural nets
0:16 or support vector machines.
0:17 So why did we use a decision tree to start?
0:20 Well, they have a unique property--
0:21 they're easy to read and understand.
0:23 In fact, they're one of the few models that are interpretable,
0:26 where you can understand exactly why the classifier makes
0:28 a decision.
0:29 That's amazingly useful in practice.
0:33 To get started, I'll introduce you
0:34 to a real data set we'll work with today.
0:37 It's called Iris.
0:38 Iris is a classic machine learning problem.
0:41 In it, you want to identify what type of flower
0:43 you have based on different measurements,
0:45 like the length and width of the petal.
0:46 The data set includes three different types of flowers.
0:49 They're all species of iris-- setosa, versicolor,
0:52 and virginica.
0:53 Scrolling down, you can see we're
0:55 given 50 examples of each type, so 150 examples total.
1:00 Notice there are four features that are
1:01 used to describe each example.
1:03 These are the length and width of the sepal and petal.
1:06 And just like in our apples and oranges problem,
1:08 the first four columns give the features and the last column
1:11 gives the labels, which is the type of flower in each row.
1:15 Our goal is to use this data set to train a classifier.
1:18 Then we can use that classifier to predict what species
1:21 of flower we have if we're given a new flower that we've never
1:23 seen before.
1:25 Knowing how to work with an existing data set
1:26 is a good skill, so let's import Iris into scikit-learn
1:29 and see what it looks like in code.
1:32 Conveniently, the friendly folks at scikit
1:33 provided a bunch of sample data sets,
1:35 including Iris, as well as utilities
1:37 to make them easy to import.
1:39 We can import Iris into our code like this.
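The import code shown on screen isn't in the transcript; it is essentially scikit-learn's standard loader, which looks like this:

```python
from sklearn.datasets import load_iris

# Load the bundled Iris dataset into a Bunch object with
# .data, .target, .feature_names, and .target_names attributes.
iris = load_iris()
print(iris.data.shape)  # (150, 4): 150 examples, 4 features each
```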
1:42 The data set includes both the table
1:44 from Wikipedia as well as some metadata.
1:47 The metadata tells you the names of the features
1:49 and the names of different types of flowers.
1:52 The features and examples themselves
1:54 are contained in the data variable.
1:56 For example, if I print out the first entry,
1:58 you can see the measurements for this flower.
2:00 These index to the feature names, so the first value
2:03 refers to the sepal length, and the second to sepal width,
2:06 and so on.
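Printing the first entry looks something like this; the values shown are the well-known first row of the Iris table:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)  # names of the four measurements
# Measurements for the first flower, in the same order
# as the feature names: [5.1 3.5 1.4 0.2]
print(iris.data[0])
```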
2:09 The target variable contains the labels.
2:11 Likewise, these index to the target names.
2:14 Let's print out the first one.
2:16 A label of 0 means it's a setosa.
2:19 If you look at the table from Wikipedia,
2:21 you'll notice that we just printed out the first row.
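A sketch of that step:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.target[0])                     # 0
# The label indexes into target_names, so 0 means setosa.
print(iris.target_names[iris.target[0]])  # 'setosa'
```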
2:24 Now both the data and target variables have 150 entries.
2:27 If you want, you can iterate over them
2:29 to print out the entire data set like this.
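The loop might look like:

```python
from sklearn.datasets import load_iris

iris = load_iris()
# Print every example's label and features.
for i in range(len(iris.target)):
    print("Example %d: label %s, features %s"
          % (i, iris.target[i], iris.data[i]))
```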
2:32 Now that we know how to work with the data set,
2:34 we're ready to train a classifier.
2:35 But before we do that, first we need to split up the data.
2:39 I'm going to remove several of the examples
2:41 and put them aside for later.
2:43 We'll call the examples I'm putting aside our testing data.
2:46 We'll keep these separate from our training data,
2:48 and later on we'll use our testing examples
2:50 to test how accurate the classifier is
2:53 on data it's never seen before.
2:55 Testing is actually a really important part
2:57 of doing machine learning well in practice,
2:59 and we'll cover it in more detail in a future episode.
3:02 Just for this exercise, I'll remove one example
3:04 of each type of flower.
3:06 And as it happens, the data set is
3:07 ordered so the first setosa is at index 0,
3:10 and the first versicolor is at 50, and so on.
3:14 The syntax looks a little bit complicated, but all I'm doing
3:16 is removing three entries from the data and target variables.
3:21 Then I'll create two new sets of variables-- one
3:24 for training and one for testing.
3:26 Training will have the majority of our data,
3:28 and testing will have just the examples I removed.
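The slightly complicated syntax mentioned above is typically done with NumPy's delete. A sketch, using the indices 0, 50, and 100 for the first flower of each species:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
test_idx = [0, 50, 100]  # first example of each species

# Training data: everything except the three test rows.
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# Testing data: just the three rows we removed.
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
print(train_data.shape)  # (147, 4)
```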
3:31 Now, just as before, we can create a decision tree
3:33 classifier and train it on our training data.
3:40 Before we visualize it, let's use the tree
3:42 to classify our testing data.
3:44 We know we have one flower of each type,
3:47 and we can print out the labels we expect.
3:50 Now let's see what the tree predicts.
3:52 We'll give it the features for our testing data,
3:54 and we'll get back labels.
3:56 You can see the predicted labels match our testing data.
3:59 That means it got them all right.
4:01 Now, keep in mind, this was a very simple test,
4:04 and we'll go into more detail down the road.
4:07 Now let's visualize the tree so we can
4:09 see how the classifier works.
4:11 To do that, I'm going to copy-paste
4:13 some code in from scikit's tutorials,
4:15 and because this code is for visualization
4:16 and not machine-learning concepts,
4:18 I won't cover the details here.
4:20 Note that I'm combining the code from these two examples
4:22 to create an easy-to-read PDF.
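The visualization code in the video pipes Graphviz "dot" output through pydot to write the PDF; since pydot's API has shifted over the years, here is a sketch of just the scikit-learn half, which emits dot text that any Graphviz renderer can turn into the PDF:

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
# Fit on the full dataset for brevity; the video fits on the
# training split instead.
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

# Emit the tree in Graphviz "dot" format as a string.
dot = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True, impurity=False)
print(dot[:60])
```

Rendering the string, for example with the `dot` command-line tool, is the separate step that produces the PDF shown in the video.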
4:26 I can run our script and open up the PDF,
4:28 and we can see the tree.
4:30 To use it to classify data, you start by reading from the top.
4:33 Each node asks a yes or no question
4:35 about one of the features.
4:37 For example, this node asks if the petal width
4:39 is less than 0.8 centimeters.
4:41 If it's true for the example you're classifying, go left.
4:44 Otherwise, go right.
4:46 Now let's use this tree to classify an example
4:48 from our testing data.
4:50 Here are the features and label for our first testing flower.
4:53 Remember, you can find the feature names
4:54 by looking at the metadata.
4:56 We know this flower is a setosa, so let's see
4:58 what the tree predicts.
5:00 I'll resize the windows to make this easier to see.
5:03 And the first question the tree asks
5:04 is whether the petal width is less than 0.8 centimeters.
5:08 That's the fourth feature.
5:09 The answer is true, so we proceed left.
5:11 At this point, we're already at a leaf node.
5:14 There are no other questions to ask,
5:15 so the tree gives us a prediction, setosa,
5:18 and it's right.
5:19 Notice the label is 0, which indexes to that type of flower.
5:23 Now let's try our second testing example.
5:25 This one is a versicolor.
5:27 Let's see what the tree predicts.
5:29 Again we read from the top, and this time the petal width
5:31 is greater than 0.8 centimeters.
5:33 The answer to the tree's question is false,
5:35 so we go right.
5:36 The next question the tree asks is whether the petal width
5:39 is less than 1.75.
5:40 It's trying to narrow it down.
5:42 That's true, so we go left.
5:44 Now it asks if the petal length is less than 4.95.
5:47 That's true, so we go left again.
5:49 And finally, the tree asks if the petal width
5:51 is less than 1.65.
5:52 That's true, so left it is.
5:54 And now we have our prediction-- it's a versicolor,
5:57 and that's right again.
5:58 You can try the last one on your own as an exercise.
6:01 And remember, the way we're using the tree
6:03 is the same way it works in code.
6:05 So that's how you quickly visualize and read
6:07 a decision tree.
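As a sketch of that point, the walk we just did can be written as plain if statements. The thresholds are copied from the video's tree, with some right-hand branches collapsed for brevity, so a freshly trained tree may differ slightly:

```python
def classify(sepal_len, sepal_wid, petal_len, petal_wid):
    # Hand-written version of the splits walked through above.
    if petal_wid < 0.8:
        return "setosa"
    if petal_wid < 1.75:
        if petal_len < 4.95 and petal_wid < 1.65:
            return "versicolor"
        return "virginica"
    return "virginica"

print(classify(5.1, 3.5, 1.4, 0.2))  # -> setosa
print(classify(7.0, 3.2, 4.7, 1.4))  # -> versicolor
```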
6:08 There's a lot more to learn here,
6:09 especially how they're built automatically from examples.
6:12 We'll get to that in a future episode.
6:14 But for now, let's close with an essential point.
6:17 Every question the tree asks must be about one
6:19 of your features.
6:20 That means the better your features are, the better a tree
6:22 you can build.
6:23 And the next episode will start looking
6:25 at what makes a good feature.
6:26 Thanks very much for watching, and I'll see you next time.
6:28 [MUSIC PLAYING]
Transcript: YouTube