
You're reading from Data Science with SQL Server Quick Start Guide: Integrate SQL Server with data science

Product type Paperback

Published in Aug 2018

Publisher Packt

ISBN-13 9781789537123

Length 206 pages

Edition 1st Edition

Languages SQL

Tools SQL Server

Concepts Data Science

Author (1):

Dejan Sarka


Chapter 1, Writing Queries with T-SQL, gives a brief overview of T-SQL queries. It introduces all of the important parts of the mighty SELECT statement and focuses on analytical queries.

Chapter 2, Introducing R, introduces the second language in this book, R. R has been supported in SQL Server since version 2016. In order to use it properly, you have to understand the language constructs and data structures.

Chapter 3, Getting Familiar with Python, gives an overview of the second most popular data science language, Python. As a general-purpose language, Python is probably even more popular than R overall, and lately it has been catching up with R in the data science field as well.

Chapter 4, Data Overview, deals with understanding data. You can use introductory statistics and basic graphs for this task. You will learn how to perform a data overview in all three languages used in this book.
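
As a rough illustration of what a data overview with introductory statistics can look like, here is a minimal Python sketch that uses only the standard library. The `incomes` list and the `overview` helper are hypothetical and are not taken from the book.

```python
import statistics

# Hypothetical sample: yearly incomes (in thousands) for eight customers.
incomes = [32, 45, 45, 51, 58, 60, 75, 120]


def overview(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = overview(incomes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the mean is higher than the median, which hints at a right-skewed distribution. That is the kind of observation a data overview is meant to surface.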

Chapter 5, Data Preparation, teaches you how to work with the data that you get from your business systems and from data warehouses, which is typically not suited for direct use in a machine learning project. You need to add derived variables, deal with outliers and missing values, and more.
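
The three preparation tasks named above can be sketched in a few lines of Python. This is only one possible approach under stated assumptions: missing incomes are replaced with the median, outliers are capped at 1.5 times the interquartile range beyond the quartiles, and the derived variable is a simple flag. The data and the `prepare` helper are hypothetical.

```python
import statistics

# Hypothetical raw records: (income in thousands, number of children).
# None marks a missing income; 400 is a suspiciously large value.
raw = [(40, 2), (45, 0), (50, 1), (None, 3), (52, 0),
       (55, 1), (58, 2), (60, 0), (400, 1)]


def prepare(records):
    """Impute missing incomes, cap outliers, and add derived variables."""
    known = [income for income, _ in records if income is not None]
    median = statistics.median(known)
    q1, _, q3 = statistics.quantiles(known, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    prepared = []
    for income, children in records:
        was_missing = income is None
        value = median if was_missing else income
        value = min(max(value, lower), upper)  # cap at the IQR fences
        prepared.append({
            "income": value,
            "children": children,
            "has_children": children > 0,       # derived variable
            "income_was_missing": was_missing,  # keep track of imputation
        })
    return prepared


clean = prepare(raw)
for row in clean:
    print(row)
```

Keeping an `income_was_missing` flag is a design choice: it lets a later model distinguish imputed values from observed ones.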

Chapter 6, Intermediate Statistics and Graphs, starts with the real analysis of the data. You can use intermediate-level statistical methods and graphs for the beginning of your advanced analytics journey.

Chapter 7, Unsupervised Machine Learning, explains the algorithms that do not use a target variable. It is like fishing in the mud: you try different techniques and see whether any meaningful information can be extracted from your data. The most common undirected techniques are clustering, dimensionality reduction, and affinity grouping, also known as basket analysis or association rules.
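
To make affinity grouping a little more concrete, the following Python sketch computes the two basic measures of an association rule, support and confidence, over a handful of hypothetical baskets. The basket contents and the function names are assumptions made for illustration. Real basket analysis would use a dedicated algorithm and far more data.

```python
# Hypothetical market baskets; each basket is a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in transactions if items <= basket)


def support(itemset, transactions):
    """Share of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(f"support(bread, milk) = {rule_support}")
print(f"confidence(bread -> milk) = {rule_confidence}")
```

No target variable is involved here: the measures are derived purely from which items occur together.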

Chapter 8, Supervised Machine Learning, deals with the algorithms that need a target variable. Some of the most important directed techniques include classification and estimation. Classification means examining a new case and assigning it to a predefined discrete class, for example, assigning keywords to articles or assigning customers to known segments. Estimation means trying to estimate the value of a continuous variable for a new case. You can, for example, estimate the number of children or the family income. This chapter also shows you how you can evaluate your machine learning models and use them for predictions.
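
The difference between classification and estimation, and the idea of evaluating a model on cases it has not seen, can be sketched with a small k-nearest-neighbors example in Python. The technique is chosen here only for brevity and is not necessarily the one the chapter uses. The training cases, the holdout cases, and all function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean

# Hypothetical training cases: (age, income) -> (segment, yearly spend).
train = [
    ((25, 30), "basic", 200),
    ((30, 35), "basic", 250),
    ((28, 32), "basic", 220),
    ((50, 90), "premium", 900),
    ((55, 95), "premium", 1000),
    ((48, 85), "premium", 850),
]


def nearest(case, k=3):
    """Return the k training cases closest to the new case (Euclidean distance)."""
    return sorted(train, key=lambda row: math.dist(row[0], case))[:k]


def classify(case, k=3):
    """Classification: majority vote over the neighbors' discrete classes."""
    votes = Counter(label for _, label, _ in nearest(case, k))
    return votes.most_common(1)[0][0]


def estimate(case, k=3):
    """Estimation: average of the neighbors' continuous target values."""
    return mean(value for _, _, value in nearest(case, k))


# Hypothetical holdout cases with known segments, used only for evaluation.
holdout = [((26, 29), "basic"), ((53, 88), "premium"),
           ((29, 36), "basic"), ((40, 62), "basic")]

correct = sum(1 for case, actual in holdout if classify(case) == actual)
accuracy = correct / len(holdout)

print("predicted segment:", classify((27, 31)))
print("estimated spend:", estimate((27, 31)))
print("holdout accuracy:", accuracy)
```

The last holdout case sits between the two groups and is misclassified, so the accuracy is below 1. Measuring performance on held-out cases rather than on the training data is what makes such an evaluation meaningful.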