Practical Machine Learning with R

You're reading from Practical Machine Learning with R: Define, build, and evaluate machine learning models for real-world applications

Product type Paperback

Published in Aug 2019

Publisher Packt

ISBN-13 9781838550134

Length 416 pages

Edition 1st Edition

Languages R

Tools RStudio

Concepts Machine Learning

Authors (3):

Brindha Priyadarshini Jeyaraman

Ludvig Renbo Olsen

Monicah Wambugu

Table of Contents (8) Chapters

About the Book

1. An Introduction to Machine Learning

2. Data Cleaning and Pre-processing

3. Feature Engineering

4. Introduction to neuralnet and Evaluation Methods

5. Linear and Logistic Regression Models

6. Unsupervised Learning

Appendix

Handling Missing Values, Duplicates, and Outliers

In any dataset, we might have missing values, duplicate values, or outliers. We need to ensure that these are handled appropriately so that the data used by the model is clean.

Handling Missing Values

Missing values in a data frame can affect the model during the training process. Therefore, they need to be identified and handled during the pre-processing stage. They are represented as NA in a data frame. Using the example that follows, we will see how to identify a missing value in a dataset.

Using the is.na(), complete.cases(), and md.pattern() functions, we will identify the missing values.

The is.na() function, as the name suggests, returns TRUE for elements that are NA or, in numeric or complex vectors, NaN (Not a Number), and FALSE otherwise. The complete.cases() function works row-wise and returns TRUE for rows that contain no missing values. The md.pattern() function, from the mice package, gives a summary of the missing-data patterns.
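As a rough illustration of how these three functions behave, here is a minimal sketch on a small made-up data frame. The data frame `df` and its columns are hypothetical and are not the dataset used in the book's exercise. The call to md.pattern() assumes a version of the mice package that accepts the `plot` argument, and it is skipped if mice is not installed.

```r
# Hypothetical toy data frame; not the dataset used in the book's exercise.
df <- data.frame(
  age    = c(25, NA, 31, 47),
  income = c(52000, 61000, NaN, 58000),
  city   = c("Oslo", "Lima", NA, "Pune"),
  stringsAsFactors = FALSE
)

# is.na() works element-wise: TRUE where a value is NA
# (or NaN in a numeric column), FALSE otherwise.
missing_mask <- is.na(df)

# Count the missing values in each column.
missing_per_column <- colSums(missing_mask)

# complete.cases() works row-wise: TRUE where the row has no missing values.
complete_rows <- complete.cases(df)

# Keep only the fully observed rows.
df_complete <- df[complete_rows, ]

print(missing_per_column)
print(complete_rows)

# md.pattern() lives in the mice package, which is not part of base R,
# so it is only called here when mice is available.
if (requireNamespace("mice", quietly = TRUE)) {
  print(mice::md.pattern(df, plot = FALSE))
}
```

Note that is.na() answers "which cells are missing?", while complete.cases() answers "which rows are safe to keep?", so the two are typically used together: the first to inspect the data, the second to filter it.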

Exercise 12: Identifying the Missing Values

In the following example, we are adding...

Authors (3)

Brindha Priyadarshini Jeyaraman is a senior data scientist at AIDA Technologies. She has completed her M.Tech in knowledge engineering with a gold medal from the National University of Singapore. She has more than 10 years of work experience and she is an expert in understanding business problems, and designing and implementing solutions using machine learning. She has worked on several real data science projects in the insurance and finance domain.

Ludvig Renbo Olsen, BSc in Cognitive Science from Aarhus University, is the author of multiple R packages, such as groupdata2 and cvms. With 4 years of R and Python experience, including working as a machine learning researcher at the Danish startup UNSILO, he is passionate about creating tools and tutorials for students and scientists. Guided by Effective Altruism, he intends to positively impact the world through his career.

Monicah Wambugu is the lead Data Scientist at Loanbee, a financial technology company that offers micro-loans by leveraging data, machine learning, and analytics to perform alternative credit scoring. She is a graduate student in the Master of Information Management and Systems program at the UC Berkeley School of Information. Monicah is particularly interested in how data science and machine learning can be used to design products and applications that respond to the behavioral and socio-economic needs of target audiences.
