You're reading from Hands-On Data Science with the Command Line: Automate everyday data science tasks using command-line tools

Product type: Paperback

Published in: Jan 2019

Publisher: Packt

ISBN-13: 9781789132984

Length: 124 pages

Edition: 1st Edition

Languages: Python

Tools: UNIX

Concepts: Data Science

Authors (3):

Jason Morris

Raymond Page

Chris McCubbin


Table of Contents (8) Chapters

Preface

1. Data Science at the Command Line and Setting It Up

2. Essential Commands

3. Shell Workflows, and Data Acquisition and Massaging

4. Bash Functions and Data Visualization

5. Loops, Functions, and String Processing

6. SQL, Math, and Wrapping it up

7. Other Books You May Enjoy


Introduction to cut

Let's break the command down before you run it. The cut command extracts selected sections from each line of a file and prints them. The -d parameter sets the field delimiter, here a tab because we are working with a TSV (tab-separated values) file, and the -f parameter lists the fields we want to keep. Since product_title is the sixth field in our file, we start with that:

cut -d$'\t' -f 6,8,13,14 reviews.tsv | more

Unlike most programming languages, which index from 0, cut numbers fields starting at 1.
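To make the 1-based numbering concrete, here is a minimal sketch on a small made-up file. The name sample.tsv and its columns are hypothetical and are not part of the reviews dataset. It numbers the header columns so field positions can be looked up, then selects fields by those numbers.

```shell
# Build a tiny tab-separated file; the name and contents are made up for illustration.
printf 'id\ttitle\trating\n1\tWidget\t5\n2\tGadget\t3\n' > sample.tsv

# Number the header columns to find each field's position (numbering starts at 1).
header_positions=$(head -n 1 sample.tsv | tr '\t' '\n' | nl)
printf '%s\n' "$header_positions"

# A literal tab, equivalent to bash's $'\t' but written so it also works in plain sh.
tab=$(printf '\t')

# Select the second and third fields (title and rating).
selected=$(cut -d "$tab" -f 2,3 sample.tsv)
printf '%s\n' "$selected"
```

Running the same header-numbering pipeline on reviews.tsv is one way to confirm which numbers to pass to -f before cutting the full file.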

Let’s see the results:

Much better! Let's go ahead and save this as a new file:

cut -d$'\t' -f 6,8,13,14 reviews.tsv > stripped_reviews.tsv

The following is what you should see once you run the preceding command:

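Let's see how many times the word Packt shows up in this dataset. The -i flag makes the match case-insensitive, -o prints each match on its own line, and wc -l counts those lines:

grep -io Packt stripped_reviews.tsv | wc -l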

The following is what you should see once you run the preceding command:
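The choice of flags changes what gets counted. As a rough sketch on a made-up file (mentions.txt and its contents are hypothetical), the following compares three counts: matching lines, all words on matching lines, and individual occurrences of the word.

```shell
# Three made-up lines; "Packt" appears three times across two of them.
printf 'Packt book, great read\nno match here\npackt and Packt again\n' > mentions.txt

# Lines that contain the word (case-insensitive).
matching_lines=$(grep -ic packt mentions.txt)

# Every word on those matching lines, not just the matches themselves.
words_on_matching_lines=$(grep -i packt mentions.txt | wc -w | tr -d ' ')

# Individual occurrences: -o prints each match on its own line.
occurrences=$(grep -io packt mentions.txt | wc -l | tr -d ' ')

echo "lines=$matching_lines words=$words_on_matching_lines occurrences=$occurrences"
# prints: lines=2 words=8 occurrences=3
```

Piping grep into wc -w counts every word on the matching lines, so it overstates how often the search term itself appears.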

Let's...

Authors (3)

Morris

Jason Morris is a systems and research engineer with over 19 years of experience in system architecture, research engineering, and large data analysis. His primary focus is machine learning with TensorFlow, CUDA, and Apache Spark. Jason is also a speaker and a consultant for designing large-scale architectures, implementing best security practices on the cloud, creating near real-time image detection analytics with deep learning, and developing serverless architectures to aid in ETL. His most recent roles include solution architect, big data engineer, big data specialist, and instructor at Amazon Web Services. He is currently the Chief Technology Officer of Next Rev Technologies, and his favorite command-line program is netcat.


McCubbin

Chris McCubbin is a data scientist and software developer with 20 years of experience in developing complex systems and analytics. He co-founded the successful big data security startup Sqrrl, since acquired by Amazon. He has also developed smart swarming systems for drones, social network analysis systems in MapReduce, and big data security analytic platforms using the Apache projects Accumulo and Spark. He has used the Unix command line since his college days on IRIX platforms, and his favorite command-line program is find.


Page

Raymond Page is a computer engineer specializing in site reliability. His experience with embedded development engendered a passion for removing the pervasive bloat from web technologies and cloud computing. His favorite command is cat.
