Datasets often have many more features than we could possibly process. For example, let's say our job was to predict a country's poverty rate. We would probably start by matching a country's name with its poverty rate, but that would not help us predict the poverty rate of a new country. So we start thinking about possible causes of poverty. But how many possible causes of poverty are there? Factors might include a country's economy, lack of education, high divorce rate, overpopulation, and so on. If each one of these causes was a feature used to help predict the poverty rate, we would end up with a countless number of features. If you're a mathematician, you might think of these features as axes in a high-dimensional space, and every country's poverty rate is then a single point in this high-dimensional space...




















































