Handling Categorical Variables
A categorical variable takes its values from a limited set of string or numeric labels for an attribute. For instance, gender can be "Male" or "Female". There are two types of categories: nominal and ordinal. In nominal categorical data, the values of the attribute have no inherent order; gender is an example. Ordinal categories have an order among their values. For instance, the temperature values "Low," "Medium," and "High" are ordered.
- Label Encoding: String literals need to be converted to numeric values; for example, "Male" can take the value 1 and "Female" the value 2. This is called integer encoding or label encoding. Because the integer values have a natural ordering, this encoding may be suitable for categorical data that is ordinal.
- One-Hot Encoding: For nominal categories, label encoding is not suitable as the natural order of the numbers may be learned by the machine...
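The two encodings above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `label_encode` and `one_hot_encode`, the sample values, and the integer codes chosen for the temperature levels are all assumptions made for the example.

```python
# Ordinal attribute: an explicit mapping keeps the order Low < Medium < High.
# The specific integers (1, 2, 3) are an illustrative choice.
TEMPERATURE_ORDER = {"Low": 1, "Medium": 2, "High": 3}


def label_encode(values, mapping):
    """Replace each category string with its integer code."""
    return [mapping[v] for v in values]


def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))  # fixed column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Label encoding applied to an ordinal attribute.
temperatures = ["Low", "High", "Medium", "Low"]
encoded_temperatures = label_encode(temperatures, TEMPERATURE_ORDER)

# One-hot encoding applied to a nominal attribute: no column outranks another.
genders = ["Male", "Female", "Female", "Male"]
gender_categories, gender_rows = one_hot_encode(genders)
```

Each one-hot row contains exactly one 1, so the encoded values carry no implied ranking between categories, whereas the integer codes for temperature deliberately preserve the ranking.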