Feature engineering
Feature engineering is perhaps the most important topic in machine learning. Whether a model succeeds or fails at predicting the future depends largely on how well its features are engineered to improve lift. What often separates an experienced data scientist from a novice is the ability to engineer useful features from the data sets provided, and this is perhaps the most difficult and time-consuming aspect of machine learning. This is where understanding the business problem is key. Feature engineering is more an art than a science, and it is essential for framing the problem in the first place. So what is feature engineering?
Feature engineering is the process of transforming raw data into features that better represent the underlying business problem to the predictive models, resulting in improved model accuracy on unseen data.
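To make the definition concrete, the following is a minimal plain-Python sketch (not Spark's API) that turns hypothetical raw customer records into features a model could learn from more easily. The field names, the reference date, and the specific features (recency, average order value, a repeat-customer flag) are illustrative assumptions, not a prescribed recipe.

```python
from datetime import date

# Hypothetical raw records, as an operational system might store them.
raw_customers = [
    {"id": 1, "last_purchase": date(2024, 3, 2), "total_spent": 250.0, "num_orders": 5},
    {"id": 2, "last_purchase": date(2024, 1, 15), "total_spent": 90.0, "num_orders": 1},
]


def engineer_features(row, as_of):
    """Derive behavioural features from one raw customer record (illustrative only)."""
    orders = row["num_orders"]
    return {
        "id": row["id"],
        # Recency: number of days between the last purchase and the reference date.
        "days_since_last_purchase": (as_of - row["last_purchase"]).days,
        # Average basket size; guard against division by zero for customers with no orders.
        "avg_order_value": row["total_spent"] / orders if orders > 0 else 0.0,
        # Binary flag derived from a raw count.
        "is_repeat_customer": 1 if orders > 1 else 0,
    }


features = [engineer_features(r, as_of=date(2024, 3, 31)) for r in raw_customers]
for f in features:
    print(f)
```

None of these derived columns exist in the raw data; each encodes a piece of business understanding (how recently and how much a customer buys) in a form that a predictive model can use directly.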
Because feature engineering is so important, Spark provides algorithms for working with features, divided into three major groups:
- Feature...