Computing the relative importance of features

Are all features equally important? In this case, we used 13 input features, and they all contributed to the model. However, an important question is: how do we know which features are more important? Obviously, not all features contribute equally to the output, and if we want to discard some of them later, we need to know which ones matter less. scikit-learn makes this functionality available.

Getting ready

Let's calculate the relative importance of the features. Feature importance provides a measure that indicates the value of each feature in the construction of the model. The more an attribute is used to build the model, the greater its relative importance. This importance is calculated explicitly for each attribute in the dataset, allowing you to rank and compare attributes against each other. A fitted model exposes it through the feature_importances_ attribute.
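The steps that follow extend the housing.py script built in the earlier recipes of this chapter, where housing_data, dt_regressor (a decision tree regressor), and ab_regressor (an AdaBoost regressor) are already defined. If you want to try the idea in isolation first, a minimal, self-contained sketch might look like the following. It substitutes the California housing dataset for the Boston housing data used in the recipe (the Boston loader has been removed from recent scikit-learn releases), and the hyperparameters are illustrative rather than the book's exact values:

# Minimal, self-contained sketch; housing.py has its own setup and does not need these lines
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor

housing_data = fetch_california_housing()
X, y = housing_data.data, housing_data.target

dt_regressor = DecisionTreeRegressor(max_depth=4).fit(X, y)
ab_regressor = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                                 n_estimators=400, random_state=7).fit(X, y)

# Each fitted model exposes feature_importances_: non-negative values that sum to 1
for name, importance in zip(housing_data.feature_names, dt_regressor.feature_importances_):
    print(name, round(importance, 3))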

How to do it...

Let's see how to compute the relative importance of features:

  1. Let's see how to extract this. Add the following lines to housing.py:
DTFImp = dt_regressor.feature_importances_
DTFImp = 100.0 * (DTFImp / max(DTFImp))
index_sorted = np.flipud(np.argsort(DTFImp))
pos = np.arange(index_sorted.shape[0]) + 0.5

The regressor object exposes a feature_importances_ attribute that gives us the relative importance of each feature. To make the results comparable, the importance values have been normalized so that the largest value is 100. Then, we sorted the index values and reversed them so that they are arranged in descending order of importance. Finally, for display purposes, the positions of the labels on the x axis have been centered. A small worked example of the sorting step follows.
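Here is the sorting step with made-up importance values for three hypothetical features:

imp = np.array([0.2, 0.7, 0.1])      # made-up importances
imp = 100.0 * (imp / max(imp))       # roughly [28.6, 100.0, 14.3]
order = np.flipud(np.argsort(imp))   # [1, 0, 2]: most important first
print(imp[order])                    # roughly [100.0, 28.6, 14.3]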

  2. To visualize the results, we will plot the bar graph:
plt.figure()
plt.bar(pos, DTFImp[index_sorted], align='center')
plt.xticks(pos, housing_data.feature_names[index_sorted])
plt.ylabel('Relative Importance')
plt.title("Decision Tree regressor")
plt.show()
  3. We just take the values from the feature_importances_ attribute and scale them so that they range between 0 and 100. Let's see what we get for the decision tree-based regressor in the following output:

So, the decision tree regressor says that the most important feature is RM.

  4. Now, we carry out a similar procedure for the AdaBoost model:
ABFImp = ab_regressor.feature_importances_
ABFImp = 100.0 * (ABFImp / max(ABFImp))
index_sorted = np.flipud(np.argsort(ABFImp))
pos = np.arange(index_sorted.shape[0]) + 0.5

  5. To visualize the results, we will plot the bar graph:
plt.figure()
plt.bar(pos, ABFImp[index_sorted], align='center')
plt.xticks(pos, housing_data.feature_names[index_sorted])
plt.ylabel('Relative Importance')
plt.title("AdaBoost regressor")
plt.show()

Let's take a look at what AdaBoost has to say in the following output:

According to AdaBoost, the most important feature is LSTAT. In reality, if you build various regressors on this data, you will see that the most important feature is in fact LSTAT. This shows the advantage of using AdaBoost with a decision tree-based regressor.
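If you prefer to compare the two rankings as text rather than reading them off the bar charts, an optional addition to housing.py (reusing the variables defined above) could print them side by side:

# Optional: print both feature rankings for a direct comparison
dt_rank = np.flipud(np.argsort(DTFImp))
ab_rank = np.flipud(np.argsort(ABFImp))
print("Decision Tree ranking:", [housing_data.feature_names[i] for i in dt_rank])
print("AdaBoost ranking:", [housing_data.feature_names[i] for i in ab_rank])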

How it works...

Feature importance provides a measure that indicates the value of each feature in the construction of a model. The more an attribute is used to build a model, the greater its relative importance. In this recipe, the feature_importances_ attribute was used to extract the relative importance of the features from the model.

There's more...

Relative importance measures the utility of each feature in the construction of the decision trees. The more an attribute is used to make predictions within the trees, the greater its relative importance. This importance is calculated explicitly for each attribute in the dataset, allowing you to rank and compare attributes against each other.
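Keep in mind that impurity-based importances such as feature_importances_ are computed on the training data and can favor features with many distinct values. As a cross-check, scikit-learn (version 0.22 and later) also offers permutation importance. A minimal sketch, assuming a fitted dt_regressor and a feature matrix X with target y as in housing.py (adapt the variable names to your script), might look like this:

# Alternative measure: permutation importance (scikit-learn >= 0.22)
from sklearn.inspection import permutation_importance

result = permutation_importance(dt_regressor, X, y, n_repeats=10, random_state=7)
for i in np.flipud(np.argsort(result.importances_mean)):
    print(housing_data.feature_names[i], round(result.importances_mean[i], 3))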
