"In the beginning... was the command line" Years ago, we didn't have fancy frameworks that handled our distributed computing for us, or applications that could read files intelligently and give us accurate results. If we did, it was very expensive or only worked for a small problem set, very few people had access to this technology, and it was mostly proprietary.
For newcomers to the world of data science, you might have used the command line for a small number of things. Maybe you moved a file from one place to another using mv, or read a file using cat. Or you might have never used the command line at all, or at least not for data science. In this book, we hope to show you a number of tools and ways you can perform some everyday tasks that you can do locally, without using today's buzzword framework.
We created this book for the folks who have little to no experience with the command line, and perform a lot of data extraction, modelling, parsing, and analyzing. This doesn't mean that if you do have a lot of command-line experience (a lot of DevOps and systems folks do), you shouldn't read this book. In fact, you might pick up a couple commands and techniques that you haven't used before.
In this chapter, we will cover the following topics:
- The history of the command line
- Language-focused shells
- Why use the command line?
We will also walk through the setup and configuration of the command line with the following operating systems:
- Windows 10
- Mac OS X
- Ubuntu Linux
If you are running a different operating system, we suggest obtaining an instance from a cloud provider or using the Docker container that's provided in this book.