0:06 JOSH GORDON: Classifiers are only
0:08 as good as the features you provide.
0:10 That means coming up with good features
0:12 is one of your most important jobs in machine learning.
0:14 But what makes a good feature, and how can you tell?
0:17 If you're doing binary classification,
0:19 then a good feature makes it easy to decide
0:21 between two different things.
0:23 For example, imagine we wanted to write a classifier
0:26 to tell the difference between two types of dogs--
0:29 greyhounds and Labradors.
0:30 Here we'll use two features-- the dog's height in inches
0:34 and their eye color.
0:35 Just for this toy example, let's make a couple assumptions
0:38 about dogs to keep things simple.
0:40 First, we'll say that greyhounds are usually
0:43 taller than Labradors.
0:44 Next, we'll pretend that dogs have only two eye
0:47 colors-- blue and brown.
0:48 And we'll say the color of their eyes
0:50 doesn't depend on the breed of dog.
0:53 This means that one of these features is useful
0:55 and the other tells us nothing.
0:57 To understand why, we'll visualize them using a toy
1:01 dataset I'll create.
1:02 Let's begin with height.
1:04 How useful do you think this feature is?
1:06 Well, on average, greyhounds tend
1:08 to be a couple inches taller than Labradors, but not always.
1:11 There's a lot of variation in the world.
1:13 So when we think of a feature, we
1:15 have to consider how it looks for different values
1:17 in a population.
1:19 Let's head into Python for a programmatic example.
1:22 I'm creating a population of 1,000
1:24 dogs-- split 50-50 between greyhounds and Labradors.
1:27 I'll give each of them a height.
1:29 For this example, we'll say that greyhounds
1:31 are on average 28 inches tall and Labradors are 24.
1:35 Now, all dogs are a bit different.
1:37 Let's say that height is normally distributed,
1:39 so we'll make both of these plus or minus 4 inches.
1:42 This will give us two arrays of numbers,
1:44 and we can visualize them in a histogram.
1:47 I'll add a parameter so greyhounds are in red
1:49 and Labradors are in blue.
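The script itself isn't reproduced in this transcript, but a minimal sketch matching the description might look like the following (treating "plus or minus 4 inches" as a standard deviation of 4 inches):

```python
import numpy as np
import matplotlib.pyplot as plt

# A population of 1,000 dogs, split 50-50 between the two breeds.
greyhounds = 500
labs = 500

# Heights are normally distributed: mean 28 inches for greyhounds,
# 24 for Labradors, each with a standard deviation of 4 inches.
grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# Visualize both arrays in one histogram:
# greyhounds in red, Labradors in blue.
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'b'])
plt.show()
```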
1:51 Now we can run our script.
1:53 This shows how many dogs in our population have a given height.
1:57 There's a lot of data on the screen,
1:58 so let's simplify it and look at it piece by piece.
2:03 We'll start with dogs on the far left
2:05 of the distribution-- say, those about 20 inches tall.
2:08 Imagine I asked you to predict whether a dog with this height
2:11 was a lab or a greyhound.
2:13 What would you do?
2:14 Well, you could figure out the probability of each type
2:16 of dog given their height.
2:18 Here, it's more likely the dog is a lab.
2:20 On the other hand, if we go all the way
2:22 to the right of the histogram and look
2:24 at a dog who is 35 inches tall, we
2:26 can be pretty confident they're a greyhound.
2:29 Now, what about a dog in the middle?
2:31 You can see the graph gives us less information
2:33 here, because the probability of each type of dog is close.
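To make those three cases concrete, here's a hypothetical calculation (not from the video) that applies Bayes' rule to the two normal distributions assumed above, using SciPy:

```python
from scipy.stats import norm

def p_greyhound(height):
    """Probability a dog of this height is a greyhound, assuming
    N(28, 4) greyhounds, N(24, 4) labs, and a 50/50 prior."""
    grey = norm.pdf(height, loc=28, scale=4)
    lab = norm.pdf(height, loc=24, scale=4)
    return grey / (grey + lab)

for h in [20, 26, 35]:
    print(h, round(p_greyhound(h), 2))
# 20 inches -> ~0.18: probably a lab
# 26 inches -> 0.50: could be either
# 35 inches -> ~0.90: very likely a greyhound
```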
2:36 So height is a useful feature, but it's not perfect.
2:40 That's why in machine learning, you almost always
2:42 need multiple features.
2:43 Otherwise, you could just write an if statement
2:45 instead of bothering with the classifier.
2:47 To figure out what types of features you should use,
2:50 do a thought experiment.
2:52 Pretend you're the classifier.
2:53 If you were trying to figure out if this dog is
2:55 a lab or a greyhound, what other things would you want to know?
3:00 You might ask about their hair length,
3:01 or how fast they can run, or how much they weigh.
3:04 Exactly how many features you should use
3:06 is more of an art than a science,
3:08 but as a rule of thumb, think about how many you'd
3:10 need to solve the problem.
3:12 Now let's look at another feature like eye color.
3:15 Just for this toy example, let's imagine
3:17 dogs have only two eye colors, blue and brown.
3:20 And let's say the color of their eyes
3:22 doesn't depend on the breed of dog.
3:24 Here's what a histogram might look like for this example.
3:28 For most values, the distribution is about 50/50.
3:32 So this feature tells us nothing,
3:33 because it doesn't correlate with the type of dog.
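As a quick sketch of that (reusing the population from before, with eye color assigned at random, independent of breed):

```python
import numpy as np

# Eye color is random and independent of breed.
grey_eyes = np.random.choice(['blue', 'brown'], size=500)
lab_eyes = np.random.choice(['blue', 'brown'], size=500)

# Both breeds come out roughly 50/50 blue vs. brown, so knowing
# a dog's eye color tells you nothing about its breed.
print((grey_eyes == 'blue').mean())  # ~0.5
print((lab_eyes == 'blue').mean())   # ~0.5
```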
3:36 Including a useless feature like this in your training
3:39 data can hurt your classifier's accuracy.
3:41 That's because there's a chance it might appear useful purely
3:45 by accident, especially if you have only a small amount
3:48 of training data.
3:50 You also want your features to be independent.
3:52 Independent features give you
3:54 different types of information.
3:56 Imagine we already have a feature-- height in inches--
3:59 in our dataset.
4:00 Ask yourself, would it be helpful
4:02 if we added another feature, like height in centimeters?
4:05 No, because it's perfectly correlated with one
4:08 we already have.
4:09 It's good practice to remove highly correlated features
4:12 from your training data.
4:14 That's because a lot of classifiers
4:15 aren't smart enough to realize that height in inches
4:18 and height in centimeters are the same thing,
4:20 so they might double count how important this feature is.
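One hypothetical way to catch this before training (pandas isn't used in the video, but it makes the check easy) is to look at pairwise correlations and drop one feature from any near-perfectly correlated pair:

```python
import numpy as np
import pandas as pd

height_in = 24 + 4 * np.random.randn(1000)
df = pd.DataFrame({
    'height_in': height_in,
    'height_cm': height_in * 2.54,                # same thing, new units
    'weight_lb': 60 + 10 * np.random.randn(1000), # genuinely new info
})

# height_in vs. height_cm correlates at exactly 1.0.
print(df.corr())

# Keep one copy and drop the redundant feature.
df = df.drop(columns=['height_cm'])
```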
4:23 Last, you want your features to be easy to understand.
4:26 For a new example, imagine you want
4:28 to predict how many days it will take
4:30 to mail a letter between two different cities.
4:33 The farther apart the cities are, the longer it will take.
4:37 A great feature to use would be the distance
4:39 between the cities in miles.
4:42 A much worse pair of features to use
4:44 would be the cities' locations given by their latitude
4:47 and longitude.
4:48 And here's why.
4:48 I can look at the distance and make
4:51 a good guess of how long it will take the letter to arrive.
4:54 But learning the relationship between latitude, longitude,
4:56 and time is much harder and would require many more
5:00 examples in your training data.
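If latitude and longitude are all you have, one option is to do that work yourself and hand the classifier the derived feature instead. A sketch using the standard haversine formula (the function name and cities here are our own illustration):

```python
from math import radians, sin, cos, asin, sqrt

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in miles (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3956 * asin(sqrt(a))  # Earth's radius is ~3,956 miles

# San Francisco to New York: roughly 2,570 miles.
print(distance_miles(37.77, -122.42, 40.71, -74.01))
```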
5:01 Now, there are techniques you can
5:03 use to figure out exactly how useful your features are,
5:05 and even what combinations of them are best,
5:08 so you never have to leave it to chance.
5:11 We'll get to those in a future episode.
5:13 Coming up next time, we'll continue building our intuition
5:16 for supervised learning.
5:17 We'll show how different types of classifiers
5:19 can be used to solve the same problem and dive a little bit
5:22 deeper into how they work.
5:24 Thanks very much for watching, and I'll see you then.