Datasets often have many more features than we could possibly process. For example, let's say our job was to predict a country's poverty rate. We would probably start by matching a country's name with its poverty rate, but that would not help us predict the poverty rate of a new country. So we start thinking about possible causes of poverty. But how many possible causes of poverty are there? Factors might include a country's economy, lack of education, high divorce rate, overpopulation, and so on. If each one of these causes was a feature used to help predict the poverty rate, we would end up with a countless number of features. If you're a mathematician, you might think of these features as axes in a high-dimensional space, and every country's poverty rate is then a single point in this high-dimensional space...
United States
United Kingdom
India
Germany
France
Canada
Russia
Spain
Brazil
Australia
Argentina
Austria
Belgium
Bulgaria
Chile
Colombia
Cyprus
Czechia
Denmark
Ecuador
Egypt
Estonia
Finland
Greece
Hungary
Indonesia
Ireland
Italy
Japan
Latvia
Lithuania
Luxembourg
Malaysia
Malta
Mexico
Netherlands
New Zealand
Norway
Philippines
Poland
Portugal
Romania
Singapore
Slovakia
Slovenia
South Africa
South Korea
Sweden
Switzerland
Taiwan
Thailand
Turkey
Ukraine