You're reading from The Statistics and Machine Learning with R Workshop: Unlock the power of efficient data science modeling with this hands-on guide

Product type Paperback

Published in Oct 2023

Publisher Packt

ISBN-13 9781803240305

Length 516 pages

Edition 1st Edition

Concepts

Data Science

Author (1):

Liu Peng

Data aggregation with dplyr

Data aggregation refers to a set of techniques that summarize a dataset and characterize it at a higher level. Unlike data transformation, which operates at the row level for both the input and the output, aggregation takes many rows as input and collapses them into fewer summary rows.
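The contrast can be seen in a minimal sketch, assuming the built-in iris dataset and the dplyr functions mutate() and summarise(). The names Sepal.Ratio and mean_sepal_length are chosen here for illustration only.

```r
library(dplyr)

# Row-level transformation: mutate() returns one output row per input row.
transformed <- iris %>%
  mutate(Sepal.Ratio = Sepal.Length / Sepal.Width)
nrow(transformed)  # same number of rows as iris

# Aggregation: summarise() collapses all rows into a single summary row.
aggregated <- iris %>%
  summarise(mean_sepal_length = mean(Sepal.Length))
nrow(aggregated)   # one summary row
```

The transformed data frame keeps every observation and gains a column, while the aggregated one keeps only the summary value.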

We have already encountered a few aggregation operations, such as calculating the mean of a column. This section covers some of the most widely used aggregation functions provided by dplyr. We will start with the count() function, which returns the number of observations (rows) for each category of the specified input column.

Counting observations using the count() function

The count() function automatically groups the dataset by the distinct values of the columns passed to it and returns the number of observations in each group. The input can be one or more columns of the dataset. Let’s go through an exercise and apply it to the iris dataset.
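As a quick preview of this behavior, here is a minimal sketch, assuming dplyr is loaded. The column long_sepal and its 5.8 threshold are hypothetical and are introduced only to give count() a second grouping column.

```r
library(dplyr)

# Count observations per category of a single column.
species_counts <- iris %>%
  count(Species)
species_counts  # one row per species, with the count in column n

# Count observations per combination of two columns.
# long_sepal is an illustrative flag, not part of the iris dataset.
combo_counts <- iris %>%
  mutate(long_sepal = Sepal.Length > 5.8) %>%
  count(Species, long_sepal)
combo_counts    # one row per observed (Species, long_sepal) pair
```

In both cases the counts are returned in a column named n by default, and they sum to the number of rows in the input.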