Cracking the Data Science Interview: Unlock insider tips from industry experts to master the data science field

Product type Paperback

Published in Feb 2024

Publisher Packt

ISBN-13 9781805120506

Length 404 pages

Edition 1st Edition

Languages

Python

Tools

Git

Concepts

Data Science

Authors (2):

Leondra R. Gonzalez

Aaren Stubberfield

Aggregating data with GROUP BY and HAVING

Aggregation is a concept with which you should already be familiar thanks to the discussion of Python using pandas in Chapter 3. Just like in Python, aggregation in SQL is about summarizing or grouping data in a way that makes it more useful, understandable, and manageable. GROUP BY and HAVING are two crucial components in SQL that help accomplish this.

The GROUP BY statement

Much like grouping in Python with pandas, the GROUP BY statement in SQL is used with aggregate functions (such as COUNT, SUM, AVG, MAX, and MIN) to group the result set by one or more columns, returning one summary row per group. Thus, using GROUP BY should be familiar to you! The syntax is as follows:

SELECT column1, column2, columnN, aggregate_function(columnX)
FROM table
GROUP BY column1, column2, columnN;
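To see the syntax in action, here is a minimal sketch that runs a GROUP BY query from Python using the standard library's sqlite3 module. The orders table, its columns (customer, region, amount), and the sample rows are hypothetical and exist only for illustration; other database engines may differ in small details.

```python
import sqlite3

# Hypothetical table and rows, invented purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ana", "East", 20.0),
        ("Ana", "East", 30.0),
        ("Ben", "West", 10.0),
        ("Cal", "West", 40.0),
        ("Cal", "West", 50.0),
    ],
)

# GROUP BY collapses the rows into one output row per distinct region.
# Each aggregate function is evaluated separately within each group.
rows = conn.execute(
    """
    SELECT region, COUNT(*), SUM(amount), AVG(amount)
    FROM orders
    GROUP BY region
    ORDER BY region;
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Five input rows become two output rows, one for each region. Note that the only non-aggregated column in the SELECT list, region, also appears in the GROUP BY clause; the ORDER BY is there only to make the output order predictable.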

Aggregate values are best managed by using aliases. An alias is a temporary name given to a calculated or aggregated field, or to a table, for the duration of a query. To assign one, use the AS keyword, like so: