Processing high-cardinality categorical data
When processing high-cardinality categorical features, we could use the previously mentioned one-hot encoding strategy. However, the resulting matrix may be so sparse (mostly zero values) that it prevents our DNN from converging to a good solution, or it may make the dataset infeasible to handle (sparse matrices made dense can occupy a large amount of memory).
A better solution is to pass them to our DNN as numerically labeled features and let a Keras embedding layer handle them (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding). An embedding layer is just a matrix of weights that converts the high-cardinality categorical input into a lower-dimensional numerical output. It is essentially a weighted linear combination whose weights are optimized to convert the categories into numbers that best help the prediction process.
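As a minimal sketch of this idea (the vocabulary size, output dimension, and sample labels below are illustrative assumptions, not values from the text), a Keras Embedding layer maps integer-labeled categories to dense vectors:

```python
import numpy as np
import tensorflow as tf

# Illustrative assumption: a categorical feature with 10,000 distinct
# levels, already mapped to integer labels in the range [0, 9999].
vocabulary_size = 10_000
embedding_dim = 16  # each category becomes a 16-dimensional dense vector

embedding = tf.keras.layers.Embedding(input_dim=vocabulary_size,
                                      output_dim=embedding_dim)

# A batch of four samples, each carrying one integer-labeled category.
labels = np.array([[3], [42], [9999], [7]])
vectors = embedding(labels)
print(vectors.shape)  # (4, 1, 16)

# When this layer sits inside a DNN, its weight matrix is trained together
# with the rest of the network, so the learned vectors are tuned to help
# the prediction task rather than being fixed in advance.
```

Used this way, the layer replaces a 10,000-column one-hot block with a compact 16-dimensional representation per category.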
Under the hood, the embedding layer converts...