What Makes a Good Feature? - Machine Learning Recipes #3

0:06   JOSH GORDON: Classifiers are only
0:08   as good as the features you provide.
0:10   That means coming up with good features
0:12   is one of your most important jobs in machine learning.
0:14   But what makes a good feature, and how can you tell?
0:17   If you're doing binary classification,
0:19   then a good feature makes it easy to decide
0:21   between two different things.
0:23   For example, imagine we wanted to write a classifier
0:26   to tell the difference between two types of dogs--
0:29   greyhounds and Labradors.
0:30   Here we'll use two features-- the dog's height in inches
0:34   and their eye color.
0:35   Just for this toy example, let's make a couple assumptions
0:38   about dogs to keep things simple.
0:40   First, we'll say that greyhounds are usually
0:43   taller than Labradors.
0:44   Next, we'll pretend that dogs have only two eye
0:47   colors-- blue and brown.
0:48   And we'll say the color of their eyes
0:50   doesn't depend on the breed of dog.
0:53   This means that one of these features is useful
0:55   and the other tells us nothing.
0:57   To understand why, we'll visualize them using a toy
1:01   dataset I'll create.
1:02   Let's begin with height.
1:04   How useful do you think this feature is?
1:06   Well, on average, greyhounds tend
1:08   to be a couple inches taller than Labradors, but not always.
1:11   There's a lot of variation in the world.
1:13   So when we think of a feature, we
1:15   have to consider how it looks for different values
1:17   in a population.
1:19   Let's head into Python for a programmatic example.
1:22   I'm creating a population of 1,000
1:24   dogs-- 50-50 greyhound Labrador.
1:27   I'll give each of them a height.
1:29   For this example, we'll say that greyhounds
1:31   are on average 28 inches tall and Labradors are 24.
1:35   Now, all dogs are a bit different.
1:37   Let's say that height is normally distributed,
1:39   so we'll make both of these plus or minus 4 inches.
1:42   This will give us two arrays of numbers,
1:44   and we can visualize them in a histogram.
1:47   I'll add a parameter so greyhounds are in red
1:49   and Labradors are in blue.
1:51   Now we can run our script.
1:53   This shows how many dogs in our population have a given height.
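
The script itself is not part of this transcript, so the following is only a minimal sketch of the steps just described. It assumes the standard library's `random.gauss` for the normal distribution, and the seed, the 2-inch bin width, and the helper name `height_counts` are illustrative choices. It prints the per-bin counts as text; a plotting library could draw the same counts as a red and blue histogram.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so reruns give the same population (an assumption)

N_PER_BREED = 500  # 1,000 dogs in total, split 50-50

# Heights in inches: mean 28 for greyhounds, mean 24 for Labradors,
# both with a standard deviation of 4.
greyhound_heights = [random.gauss(28, 4) for _ in range(N_PER_BREED)]
lab_heights = [random.gauss(24, 4) for _ in range(N_PER_BREED)]


def height_counts(heights, bin_width=2):
    """Count dogs per height bin; each bin is keyed by its lower edge."""
    return Counter(int(h // bin_width) * bin_width for h in heights)


grey_counts = height_counts(greyhound_heights)
lab_counts = height_counts(lab_heights)

# Text histogram: one row per bin, G = greyhounds, L = Labradors.
for edge in sorted(set(grey_counts) | set(lab_counts)):
    print(f"{edge:>3} in  G:{grey_counts[edge]:>3}  L:{lab_counts[edge]:>3}")
```
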
1:57   There's a lot of data on the screen,
1:58   so let's simplify it and look at it piece by piece.
2:03   We'll start with dogs on the far left
2:05   of the distribution-- say, who are about 20 inches tall.
2:08   Imagine I asked you to predict whether a dog with this height
2:11   was a lab or a greyhound.
2:13   What would you do?
2:14   Well, you could figure out the probability of each type
2:16   of dog given their height.
2:18   Here, it's more likely the dog is a lab.
2:20   On the other hand, if we go all the way
2:22   to the right of the histogram and look
2:24   at a dog who is 35 inches tall, we
2:26   can be pretty confident they're a greyhound.
2:29   Now, what about a dog in the middle?
2:31   You can see the graph gives us less information
2:33   here, because the probability of each type of dog is close.
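
One way to make "the probability of each type of dog given their height" concrete is to count, among simulated dogs near a given height, what share are greyhounds. This is a rough sketch under the same assumed population as in the episode (means 28 and 24, standard deviation 4). The window of plus or minus 2 inches and the function name `fraction_greyhound` are assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed for repeatability (an assumption)

greyhound_heights = [random.gauss(28, 4) for _ in range(500)]
lab_heights = [random.gauss(24, 4) for _ in range(500)]


def fraction_greyhound(height, half_width=2.0):
    """Share of dogs within +/- half_width inches of `height` that are greyhounds."""
    greys = sum(abs(h - height) <= half_width for h in greyhound_heights)
    labs = sum(abs(h - height) <= half_width for h in lab_heights)
    total = greys + labs
    return greys / total if total else None  # None when no dogs fall in the window


# Far left, middle, and far right of the distribution.
for height in (20, 26, 35):
    print(height, fraction_greyhound(height))
```

A share well below one half favors a lab, a share well above one half favors a greyhound, and a share near one half means height alone says little.
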
2:36   So height is a useful feature, but it's not perfect.
2:40   That's why in machine learning, you almost always
2:42   need multiple features.
2:43   Otherwise, you could just write an if statement
2:45   instead of bothering with the classifier.
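
To see what "just an if statement" on a single feature buys you, here is a hedged sketch that labels every simulated dog using a height threshold alone. The 26-inch cutoff (the midpoint of the two assumed averages) and the names `predict` and `THRESHOLD` are illustrative, not taken from the episode.

```python
import random

random.seed(0)  # fixed seed for repeatability (an assumption)

# (height in inches, true breed) for 1,000 simulated dogs.
dogs = [(random.gauss(28, 4), "greyhound") for _ in range(500)]
dogs += [(random.gauss(24, 4), "lab") for _ in range(500)]

THRESHOLD = 26  # inches; midpoint of the two average heights (an assumption)


def predict(height):
    # The entire "classifier" is one if statement on a single feature.
    if height > THRESHOLD:
        return "greyhound"
    return "lab"


correct = sum(predict(height) == breed for height, breed in dogs)
accuracy = correct / len(dogs)
print(f"accuracy with height alone: {accuracy:.2f}")
```

Because the two height distributions overlap, the rule should beat a coin flip yet still mislabel a sizable share of dogs, which is the gap that additional features are meant to close.
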
2:47   To figure out what types of features you should use,
2:50   do a thought experiment.
2:52   Pretend you're the classifier.
2:53   If you were trying to figure out if this dog is
2:55   a lab or a greyhound, what other things would you want to know?
3:00   You might ask about their hair length,
3:01   or how fast they can run, or how much they weigh.
3:04   Exactly how many features you should use
3:06   is more of an art than a science,
3:08   but as a rule of thumb, think about how many you'd
3:10   need to solve the problem.
3:12   Now let's look at another feature like eye color.
3:15   Just for this toy example, let's imagine
3:17   dogs have only two eye colors, blue and brown.
3:20   And let's say the color of their eyes
3:22   doesn't depend on the breed of dog.
3:24   Here's what a histogram might look like for this example.
3:28   For most values, the distribution is about 50/50.
3:32   So this feature tells us nothing,
3:33   because it doesn't correlate with the type of dog.
3:36   Including a useless feature like this in your training
3:39   data can hurt your classifier's accuracy.
3:41   That's because there's a chance it might appear useful purely
3:45   by accident, especially if you have only a small amount
3:48   of training data.
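
The "useful purely by accident" effect can be simulated. In this sketch eye color is a fair coin flip that ignores breed, and we measure how far apart the blue-eye rates of the two breeds land at different sample sizes. The function names, the trial count, and the sample sizes are assumptions chosen for illustration.

```python
import random


def blue_eye_gap(n_per_breed, rng):
    """Gap in blue-eye rate between breeds when eye color is a fair coin flip."""
    grey_blue = sum(rng.random() < 0.5 for _ in range(n_per_breed))
    lab_blue = sum(rng.random() < 0.5 for _ in range(n_per_breed))
    return abs(grey_blue - lab_blue) / n_per_breed


def average_gap(n_per_breed, trials=500, seed=0):
    """Average gap over many simulated training sets of the same size."""
    rng = random.Random(seed)
    return sum(blue_eye_gap(n_per_breed, rng) for _ in range(trials)) / trials


for n in (5, 50, 500):
    print(f"{n:>3} dogs per breed: average gap {average_gap(n):.3f}")
```

With only a few dogs per breed, the gap is often large enough to look like a real pattern, even though none exists. It shrinks as the training set grows.
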
3:50   You also want your features to be independent.
3:52   And independent features give you
3:54   different types of information.
3:56   Imagine we already have a feature-- height in inches--
3:59   in our dataset.
4:00   Ask yourself, would it be helpful
4:02   if we added another feature, like height in centimeters?
4:05   No, because it's perfectly correlated with one
4:08   we already have.
4:09   It's good practice to remove highly correlated features
4:12   from your training data.
4:14   That's because a lot of classifiers
4:15   aren't smart enough to realize that height in inches
4:18   and in centimeters are the same thing,
4:20   so they might double count how important this feature is.
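
A simple check for this kind of redundancy is the correlation between two feature columns. The sketch below computes the Pearson correlation by hand for height in inches and the same height in centimeters. The 0.95 cutoff and the helper name `pearson` are illustrative assumptions, not a standard rule.

```python
import math
import random

random.seed(0)  # fixed seed for repeatability (an assumption)

height_in = [random.gauss(26, 4) for _ in range(1000)]
height_cm = [h * 2.54 for h in height_in]  # same measurement, different unit


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(height_in, height_cm)
print(f"correlation: {r:.6f}")
if abs(r) > 0.95:  # cutoff is an illustrative choice, not a standard value
    print("redundant feature: keep only one of the two")
```
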
4:23   Last, you want your features to be easy to understand.
4:26   For a new example, imagine you want
4:28   to predict how many days it will take
4:30   to mail a letter between two different cities.
4:33   The farther apart the cities are, the longer it will take.
4:37   A great feature to use would be the distance
4:39   between the cities in miles.
4:42   A much worse pair of features to use
4:44   would be the cities' locations given by their latitude
4:47   and longitude.
4:48   And here's why.
4:48   I can look at the distance and make
4:51   a good guess of how long it will take the letter to arrive.
4:54   But learning the relationship between latitude, longitude,
4:56   and time is much harder and would require many more
5:00   examples in your training data.
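
If your data only has coordinates, you can derive the easier feature yourself. The sketch below turns two latitude and longitude pairs into a single distance in miles using the haversine (great-circle) formula. That formula is my choice for illustration and is not mentioned in the episode, and the city coordinates are approximate.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius of the Earth


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates, used only for illustration.
new_york = (40.71, -74.01)
los_angeles = (34.05, -118.24)

# Four raw coordinate values collapse into one feature that relates
# directly to how long the letter takes.
print(round(distance_miles(*new_york, *los_angeles)), "miles")
```
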
5:01   Now, there are techniques you can
5:03   use to figure out exactly how useful your features are,
5:05   and even what combinations of them are best,
5:08   so you never have to leave it to chance.
5:11   We'll get to those in a future episode.
5:13   Coming up next time, we'll continue building our intuition
5:16   for supervised learning.
5:17   We'll show how different types of classifiers
5:19   can be used to solve the same problem and dive a little bit
5:22   deeper into how they work.
5:24   Thanks very much for watching, and I'll see you then.
Transcript: YouTube
