Using Wide & Deep models
Linear models can boast a great advantage over complex models: they are efficient and easily interpretable, even when you work with many features and with features that interact with each other. Google researchers mentioned this aspect as the power of memorization because your linear model records the association between the features and the target into single coefficients. On the other hand, neural networks are blessed with the power of generalization, because in their complexity (they use multiple layers of weights and they interrelate each input), they can manage to approximate the general rules that govern the outcome of a process.
Wide & Deep models, as conceived by Google researchers (https://arxiv.org/abs/1606.07792), can blend memorization and generalization because they combine a linear model, applied to numeric features, together with generalization, applied to sparse features, such as categories encoded into a sparse matrix. Therefore...