What this book covers
Chapter 1, Introduction to Machine learning, will cover the basics of Machine learning and the overall Machine learning landscape. It will also define Machine learning in simple terms and introduce commonly used Machine learning jargon. This chapter forms the foundation for the rest of the book.
Chapter 2, Machine learning and Large-scale datasets, will explore what qualifies as a large dataset, the common characteristics of such datasets, problems of repetition, the reasons for the hyper-growth in data volumes, and approaches to handling big data.
Chapter 3, An Introduction to Hadoop's Architecture and Ecosystem, will cover Hadoop in depth, from its core frameworks to its ecosystem components. By the end of this chapter, readers will be able to set up Hadoop, run some MapReduce functions, and use one or more ecosystem components. They will also be able to run and manage a Hadoop environment and understand its command-line usage.
Chapter 4, Machine Learning Tools, Libraries, and Frameworks, will explain open source options for implementing Machine learning and cover the installation, implementation, and execution of libraries, tools, and frameworks such as Apache Mahout, Python, R, Julia, and Apache Spark's MLlib. Very importantly, we will cover the integration of these frameworks with the big data platform, Apache Hadoop.
Chapter 5, Decision Tree based learning, will explore a supervised learning technique, Decision trees, for solving classification and regression problems. We will cover methods for selecting attributes and for splitting and pruning the tree. Among the Decision tree algorithms, we will explore CART, C4.5, Random forests, and other advanced Decision tree techniques.
Chapter 6, Instance and Kernel methods based learning, will explore two families of learning methods, instance-based and kernel-based, and we will discover how they address classification and prediction requirements. Among instance-based learning methods, we will explore the Nearest Neighbor algorithm in detail. Similarly, among kernel-based methods, we will explore Support Vector Machines using real-world examples.
Chapter 7, Association Rules based learning, will explore association rule based learning methods and algorithms: Apriori and FP-growth. Using a common example, you will learn how to do frequent pattern mining with the Apriori and FP-growth algorithms, with a step-by-step walkthrough of each algorithm.
Chapter 8, Clustering based learning, will cover clustering based learning methods in the context of unsupervised learning. We will take a deep dive into the k-means clustering algorithm using an example and learn to implement it using Mahout, R, Python, Julia, and Spark.
Chapter 9, Bayesian learning, will explore Bayesian Machine learning. Additionally, we will cover the core concepts of statistics, from basic nomenclature to various distributions. We will cover Bayes' theorem in depth with examples to understand how to apply it to real-world problems.
Chapter 10, Regression based learning, will cover regression analysis-based Machine learning and, in particular, how to implement linear and logistic regression models using Mahout, R, Python, Julia, and Spark. Additionally, we will cover related statistical concepts such as variance, covariance, and ANOVA. We will also cover regression models in depth with examples to understand how to apply them to real-world problems.
Chapter 11, Deep learning, will cover the model of a biological neuron and explain how an artificial neuron mirrors its function. You will learn the core concepts of neural networks and understand how fully-connected layers work. We will also explore some key activation functions that are used in conjunction with matrix multiplication.
Chapter 12, Reinforcement learning, will explore a new learning technique called reinforcement learning. We will see how it differs from traditional supervised and unsupervised learning techniques. We will also explore the elements of a Markov Decision Process (MDP) and learn about them using an example.
Chapter 13, Ensemble learning, will cover the ensemble learning methods of Machine learning. Specifically, we will look at some supervised ensemble learning techniques with real-world examples. Finally, this chapter will have source-code examples for the gradient boosting algorithm using R, Python (scikit-learn), Julia, and Spark Machine learning tools, and for recommendation engines using Mahout libraries.
Chapter 14, New generation data architectures for Machine learning, will focus on the implementation aspects of Machine learning. We will understand what traditional analytics platforms are and how they fail to meet modern data requirements. You will also learn about the architectural drivers that promote new data architecture paradigms, such as Lambda architectures and polyglot persistence (multi-model database architecture), and you will learn how semantic architectures help achieve seamless data integration.