Fundamentals of Data Wrangling
The relationship between humans and data is age-old. Because our brains can capture and store only a limited amount of information, we had to create ways to keep and organize data.
The earliest known attempt to record data goes back to around 19,000 BC (as stated in https://www.thinkautomation.com/histories/the-history-of-data/), when a bone stick is believed to have been used as a tally stick, with counts engraved on it. Since then, words, writing, numbers, and many other forms of data collection have been developed and refined.
In 1663, John Graunt performed one of the first recognized data analyses, studying births and deaths by gender in the city of London, England.
In 1928, Fritz Pfleumer received the patent for magnetic tape, a medium for recording sound that enabled other researchers to create many of the storage technologies still in use, such as hard disk drives.
Fast forward to the beginning of the computer age: in the 1970s, IBM researchers Raymond Boyce and Donald Chamberlin created the Structured Query Language (SQL) for accessing and modifying data held in databases. The language is still in use and, as a matter of fact, many data-wrangling concepts come from it. Operations such as SELECT (choosing columns), WHERE (filtering rows), GROUP BY (aggregating by group), and JOIN (combining tables) are heavily present in any work you want to perform with datasets. Therefore, a little knowledge of those basic commands might help you throughout this book, although it is not mandatory.
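For readers who have not seen those four SQL operations together, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the values in them are invented purely for illustration, and Python is used here only as a convenient way to run SQL, not as the tool this book relies on.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, made up for this example.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'London'), (2, 'Paris'), (3, 'London');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 40.0), (4, 3, 15.0);
""")

query = """
SELECT c.city, SUM(o.amount) AS total      -- SELECT: pick the columns to return
FROM orders AS o
JOIN customers AS c                        -- JOIN: combine the two tables
  ON c.id = o.customer_id
WHERE o.amount >= 10                       -- WHERE: keep only rows that match
GROUP BY c.city                            -- GROUP BY: one summary row per city
ORDER BY c.city
"""

rows = conn.execute(query).fetchall()
conn.close()

for city, total in rows:
    print(city, total)
```

The same four ideas (choosing columns, filtering rows, summarizing by group, and combining tables) reappear under different names in most data-wrangling tools.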
In this chapter, we will cover the following main topics:
- What is data wrangling?
- Why data wrangling?
- The key steps of data wrangling