Jupyter for Data Science

Exploratory analysis, statistical modeling, machine learning, and data visualization with Jupyter

Product type Paperback
Published in Oct 2017
Publisher Packt
ISBN-13 9781785880070
Length 242 pages
Edition 1st Edition
Author: Dan Toomey

Table of Contents (11 chapters)

Preface
1. Jupyter and Data Science
2. Working with Analytical Data on Jupyter
3. Data Visualization and Prediction
4. Data Mining and SQL Queries
5. R with Jupyter
6. Data Wrangling
7. Jupyter Dashboards
8. Statistical Modeling
9. Machine Learning Using Jupyter
10. Optimizing Jupyter Notebooks

What this book covers

Chapter 1, Jupyter and Data Science, covers the Jupyter user interface: the objects it works with and the actions it supports. We see what the display tells us about the data, which tools are available, and some real-life industry examples of R and Python coding. We also look at ways to share a notebook with other users and, correspondingly, how to protect it with different security mechanisms.

Chapter 2, Working with Analytical Data on Jupyter, covers using Python to scrape a website to gather data for analysis. Then we use functions from NumPy, pandas, and SciPy for in-depth computation on the results. The chapter goes further into pandas and explores manipulating data frames. Lastly, it shows examples of sorting and filtering data frames.
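
As a rough illustration of the sorting and filtering mentioned for Chapter 2, here is a minimal pandas sketch. The column names and values are invented for this example and are not taken from the book.

```python
import pandas as pd

# Hypothetical data, invented for illustration; not from the book.
df = pd.DataFrame(
    {
        "city": ["Boston", "Austin", "Denver", "Miami"],
        "population_k": [650, 960, 710, 440],
    }
)

# Sort rows by a column, largest value first.
by_population = df.sort_values("population_k", ascending=False)

# Filter rows with a boolean mask; the original row order is preserved.
large = df[df["population_k"] > 600]

print(by_population["city"].tolist())
print(large["city"].tolist())
```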

Chapter 3, Data Visualization and Prediction, demonstrates prediction models from Python and R under Jupyter. It then uses Matplotlib for data visualization and interactive plotting (under Python), and covers several graphing techniques available in Jupyter, including density maps with SciPy. We use histograms to visualize social data. Lastly, we generate a 3D plot in Jupyter.

Chapter 4, Data Mining and SQL Queries, covers Spark Context. We show examples of using Hadoop map/reduce and use SQL with Spark data. Then we combine data frames, operate on the resulting set, import JSON data, and manipulate it with Spark. Lastly, we look at using a pivot to gather information about a data frame.
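
The map/reduce idea mentioned for Chapter 4 can be sketched in plain Python, without Hadoop or Spark. This is only an illustration of the pattern under that assumption: the input lines and the helper names `map_line` and `merge_counts` are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical input; in a real job these lines would come from a distributed file.
lines = ["spark and hadoop", "hadoop map reduce", "spark sql"]


def map_line(line):
    """Map step: turn one line into per-word counts."""
    return Counter(line.split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Apply the map step to every line, then fold the partial results together.
word_counts = reduce(merge_counts, map(map_line, lines), Counter())

print(dict(word_counts))
```

Because the reduce step is associative, partial counts could be merged in any grouping, which is what lets frameworks distribute the work.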

Chapter 5, R with Jupyter, covers setting up R as one of the engines available for a notebook. Then we use some rudimentary R to analyze voter demographics for a presidential election and trends in college admissions. Finally, we look at using a predictive model to determine whether flights will be delayed.

Chapter 6, Data Wrangling, teaches reading in CSV files and performing some quick analysis of the data, including visualizations to help understand the data. Next, we consider some of the functions available in the dplyr package. We also use piping to more easily transfer the results of one operation into another operation. Lastly, we look into using the tidyr package to clean up or tidy up our data.

Chapter 7, Jupyter Dashboards, covers visualizing data graphically using glyphs to emphasize important aspects of the data. We use markdown to annotate a notebook page and Shiny to generate an interactive application. We show a way to host notebooks outside of Jupyter.

Chapter 8, Statistical Modeling, teaches converting a JSON file to a CSV file. We evaluate the Yelp cuisine review dataset, determining the top-rated and most-rated businesses. We use Python to perform a similar evaluation of Yelp business ratings, finding very similar distributions of the data.
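
A JSON-to-CSV conversion of the kind described for Chapter 8 might look like the following sketch, which uses only the Python standard library. It assumes the input holds one JSON object per line; the field names and records are invented, and `json_lines_to_csv` is a hypothetical helper, not the book's code.

```python
import csv
import io
import json

# Hypothetical records, one JSON object per line; field names are invented.
json_lines = """\
{"name": "Cafe A", "stars": 4.5, "review_count": 120}
{"name": "Diner B", "stars": 3.0, "review_count": 45}
"""


def json_lines_to_csv(text, fieldnames):
    """Parse newline-delimited JSON and return the records as CSV text."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


csv_text = json_lines_to_csv(json_lines, ["name", "stars", "review_count"])
print(csv_text)
```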

Chapter 9, Machine Learning Using Jupyter, covers several machine learning algorithms in both R and Python to compare and contrast. We use naive Bayes to determine how the data might be used. We apply nearest neighbor in a couple of different ways to see results. We also use decision trees to come up with an algorithm for predictions and a neural net to explain housing prices. Finally, we use a random forest algorithm to do the same.
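
To make the nearest neighbor idea from Chapter 9 concrete, here is a minimal k-nearest-neighbors classifier in plain Python. It is a sketch under stated assumptions rather than the book's implementation: the training points, the labels, and the `knn_predict` helper are all hypothetical, and it needs Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

# Hypothetical training points as (features, label); invented for illustration.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((5.0, 5.0), "large"),
    ((6.0, 5.5), "large"),
    ((5.5, 6.5), "large"),
]


def knn_predict(point, data, k=3):
    """Return the majority label among the k training points closest to point."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((1.2, 1.4), training))
print(knn_predict((5.2, 5.8), training))
```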

Chapter 10, Optimizing Jupyter Notebooks, covers deploying your notebook so that others can access it. It shows optimizations you can make to increase your notebook's performance. Then we look at securing the notebook and the mechanisms for sharing it.
