Encoding Categorical Variables
Categorical variables are those whose values are selected from a group of categories or labels. For example, the Home owner variable with the values of owner and non-owner is categorical, and so is the Marital status variable with the values of never married, married, divorced, and widowed. In some categorical variables, the labels have an intrinsic order; for example, in the Student's grade variable, the values of A, B, C, and Fail are ordered, with A being the highest grade and Fail being the lowest. These are called ordinal categorical variables. Variables in which the categories do not have an intrinsic order are called nominal categorical variables, such as the City variable, with the values of London, Manchester, Bristol, and so on.
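The distinction between ordinal and nominal variables can be made explicit in code. The following is a minimal sketch, assuming pandas is available; the toy DataFrame and the column names grade and city are hypothetical and only mirror the examples in the paragraph above.

```python
import pandas as pd

# Hypothetical toy data mirroring the examples in the text.
df = pd.DataFrame({
    "grade": ["B", "A", "Fail", "C"],
    "city": ["London", "Bristol", "Manchester", "London"],
})

# Ordinal variable: declare the categories from lowest to highest
# and mark them as ordered.
grade_order = ["Fail", "C", "B", "A"]
df["grade"] = pd.Categorical(df["grade"], categories=grade_order, ordered=True)

# Nominal variable: categories carry no order, so ordered stays False (the default).
df["city"] = pd.Categorical(df["city"])

# Order-based operations are meaningful for the ordinal variable only.
passed = df["grade"] > "Fail"   # element-wise comparison against a category
best = df["grade"].max()        # highest grade according to grade_order

print(df.dtypes)
print(passed.tolist())
print(best)
```

Declaring the order once keeps later comparisons and sorting consistent with the meaning of the labels. For the unordered city column, pandas supports equality checks but not greater-than or less-than comparisons, which matches the idea that nominal categories have no ranking.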
The values of categorical variables are often encoded as strings. To train most machine learning models, we need to transform those strings into numbers. The act of replacing strings with numbers is called categorical...