Preface
This book, titled Machine Learning Solutions, gives you a broad overview of the topic. As a reader, you will learn how to develop cutting-edge data science applications using various Machine Learning (ML) techniques. This book is a practical guide that can help you build and optimize your data science applications.
We learn things by doing them. Practical implementations of various Machine Learning techniques, tips and tricks, optimization techniques, and so on will enhance your understanding of ML and data science application development.
Now let me answer one of the most common questions I hear from my friends and colleagues about ML and data science application development. This question is what really inspired me to write this book, and it is important to me that all my readers understand why I am writing it. Let's find out what that question is!
The question is, "How can I achieve the best possible accuracy for a machine learning application?" The answer includes lots of things that people should take care of:
Understand the goal of the application really well. Why does your organization want to build this application?
List the expected output of the application and how this output helps the organization. This will clarify both the technical aspect and the business aspect of the application.
What kind of dataset do you have? Is there anything more you need in order to generate the required output?
Explore the dataset really well. Try to get an insight from the dataset.
Check whether the dataset has labels. If it is a labeled dataset, you can apply supervised algorithms; if it is not labeled, apply unsupervised algorithms. Also determine whether your problem statement is a regression problem or a classification problem.
Build a very simple baseline approach using simple ML techniques. Measure the accuracy.
Now you may think, "I haven't chosen the right algorithm and that is the reason the accuracy of the base line approach is not good." It's ok!
Try to list all the possible problems that you think your baseline approach has. Be honest about the problems.
Now solve the problems one by one and measure the accuracy after each change. If the accuracy is improving, move forward in that direction; otherwise, try other solutions that address the shortcomings of the baseline approach and improve the accuracy.
You can repeat the process a number of times. After every iteration, you will get a new and definite direction, which will lead you to the best possible solution as well as the best possible accuracy.
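The iterative process above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the chapters: the toy dataset, the function names (accuracy, fit_majority_baseline, fit_threshold_rule, evaluate), and the choice of a majority-class baseline with a single-threshold rule as the candidate improvement are all assumptions made for this example. It uses only the standard library so that it runs without any setup.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    matches = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return matches / len(y_true)


def fit_majority_baseline(train_y):
    """Baseline: always predict the most frequent training label."""
    majority_label = Counter(train_y).most_common(1)[0][0]
    return lambda x: majority_label


def fit_threshold_rule(train_x, train_y):
    """Candidate improvement: predict 1 when x >= t, with t chosen on the training set."""
    best_t, best_train_acc = None, -1.0
    for t in sorted(set(train_x)):
        train_acc = accuracy(train_y, [1 if x >= t else 0 for x in train_x])
        if train_acc > best_train_acc:
            best_t, best_train_acc = t, train_acc
    return lambda x: 1 if x >= best_t else 0


def evaluate(model, xs, ys):
    """Measure the accuracy of a model on held-out data."""
    return accuracy(ys, [model(x) for x in xs])


# Hypothetical toy data: one numeric feature and a binary label.
train_x = [1, 2, 3, 4, 5, 6, 7, 8]
train_y = [0, 0, 0, 0, 0, 1, 1, 1]
test_x = [2, 5, 6, 9]
test_y = [0, 0, 1, 1]

# Step 1: build the simplest possible baseline and measure its accuracy.
baseline = fit_majority_baseline(train_y)
baseline_acc = evaluate(baseline, test_x, test_y)
best_name, best_acc = "majority_baseline", baseline_acc

# Step 2: try candidate fixes one by one; keep a change only if accuracy improves.
candidates = {"threshold_rule": fit_threshold_rule(train_x, train_y)}
for name, model in candidates.items():
    acc = evaluate(model, test_x, test_y)
    if acc > best_acc:
        best_name, best_acc = name, acc

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"best approach: {best_name} (accuracy {best_acc:.2f})")
```

In a real project, the candidates dictionary would hold each fix you want to try for a listed shortcoming, and the comparison would normally be made on a separate validation set rather than on the final test data.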
I have covered all of these aspects in this book. The major goal is to show readers how to get a state-of-the-art result for their own data science problems using ML algorithms. To achieve that, we will use only the bare minimum of theory and many hands-on examples from different domains.
We will cover the analytics domain, NLP domain, and computer vision domain. These examples are all industry problems, and readers will learn how to get the best result. After reading this book, readers will be able to apply their new skills to any sort of industry problem to achieve the best possible result for their machine learning applications.
Who this book is for
A typical reader will have a basic to intermediate knowledge of undergraduate mathematics, such as probability, statistics, calculus, and linear algebra. No advanced mathematics is required, as the book will be mostly self-contained. Basic to intermediate knowledge of Machine Learning (ML) algorithms is required, but no advanced machine learning concepts are needed. A decent knowledge of Python is required too, as an introduction to Python is out of scope, but each procedure will be explained step by step so that it is reproducible.
This book is full of practical examples. It is for readers who want to know how to apply Machine Learning (ML) algorithms efficiently to real-life data science applications. The book starts with basic ML techniques that can be used to develop a baseline approach. After that, readers learn how to apply optimization techniques to each application in order to achieve a state-of-the-art result. For each application, I have specified the basic concepts and tips and tricks, along with the code.
What this book covers
Chapter 1, Credit Risk Modeling, builds a predictive analytics model to help us predict whether a customer will default on a loan. We will be using outlier detection, feature transformation, ensemble machine learning algorithms, and so on to get the best possible solution.
Chapter 2, Stock Market Price Prediction, builds a model to predict the stock index price based on a historical dataset. We will use neural networks to get the best possible solution.
Chapter 3, Customer Analytics, explores how to build customer segmentation so that marketing campaigns can be run optimally. Using various machine learning algorithms such as K-nearest neighbor, random forest, and so on, we can build the baseline approach. In order to get the best possible solution, we will be using ensemble machine learning algorithms.
Chapter 4, Recommendation Systems for E-commerce, builds a recommendation engine for an e-commerce platform. It can recommend similar books. We will be using concepts such as correlation, TF-IDF, and cosine similarity to build the application.
Chapter 5, Sentiment Analysis, generates sentiment scores for movie reviews. In order to get the best solution, we will be using recurrent neural networks and long short-term memory (LSTM) units.
Chapter 6, Job Recommendation Engine, is where we build our own dataset, which can be used to make a job recommendation engine. We will also use an already available dataset. We will be using basic statistical techniques to get the best possible solution.
Chapter 7, Text Summarization, covers an application that generates an extractive summary of a medical transcription. We will be using Python libraries for our baseline approach. After that, we will be using various vectorization and ranking techniques to get the summary of a medical document. We will also generate a summary of Amazon's product reviews.
Chapter 8, Developing Chatbots, develops a chatbot using the rule-based approach and deep learning-based approach. We will be using TensorFlow and Keras to build chatbots.
Chapter 9, Building a Real-Time Object Recognition App, teaches transfer learning. We learn about convolutional networks and YOLO (You Only Look Once) algorithms. We will be using pre-trained models to develop the application.
Chapter 10, Face Recognition and Face Emotion Recognition, covers an application to recognize human faces. During the second half of this chapter, we will be developing an application that can recognize facial expressions of humans. We will be using OpenCV, Keras, and TensorFlow to build this application.
Chapter 11, Building Gaming Bots, teaches reinforcement learning. Here, we will be using the gym or universe library to get the gaming environment. We'll first understand the Q-learning algorithm, and later on we will implement it to train our gaming bot. Here, we are building a bot for Atari games.
Appendix A, List of Cheat Sheets, shows cheat sheets for various Python libraries that we frequently use in data science applications.
Appendix B, Strategy for Winning Hackathons, describes a possible strategy for winning hackathons. I have also listed some useful resources that can help you keep your knowledge up to date.
To get the most out of this book
Basic to intermediate knowledge of mathematics, probability, statistics, and calculus is required.
Basic to intermediate knowledge of Machine Learning (ML) algorithms is also required.
Decent knowledge of Python is required.
While reading each chapter, please run the code so that you can understand the flow of the application. All the code is available on GitHub. The link is: https://github.com/jalajthanaki/Awesome_Machine_Learning_Solutions.
Links to the code are specified in the chapters. Installation instructions for each application are also available on GitHub.
You need a minimum of 8 GB of RAM to run the applications smoothly. If you can run the code on a GPU, that would be great; otherwise, you can use pre-trained models. You can download pre-trained models using the GitHub link or Google Drive link. The links are specified in the chapters.
Download the example code files
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at http://www.packtpub.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the on-screen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-Solutions. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."
A block of code is set as follows:
from __future__ import print_function
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
from __future__ import print_function
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Any command-line input or output is written as follows:
# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample /etc/asterisk/cdr_mysql.conf
Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this: "Select System info from the Administration panel."
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: Email feedback@packtpub.com, and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at questions@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.