Summary
At the beginning of this chapter, I posed a simple question: what's the catch of data science? Well, there is one. It isn't all fun, games, and modeling. There must be a price to our quest for ever smarter machines and algorithms. As we seek new and innovative ways to discover data trends, a beast lurks in the shadows. I'm not talking about the learning curve of mathematics or programming, nor am I referring to the surplus of data. The industrial age left us with an ongoing battle against pollution. The subsequent information age left behind a trail of big data. So, what dangers might the data age bring us?
The data age can lead to something much more sinister—the dehumanization of the individual through mass data.
More and more people are jumping headfirst into the field of data science, most with no prior experience in math or CS, which on the surface is great. The average data scientist has access to millions of dating profiles, tweets, online reviews, and much more to jumpstart their education.
However, if you jump into data science without proper exposure to theory or coding practices, and without respect for the domain you are working in, you risk oversimplifying the very phenomenon you are trying to model.
For example, let's say you want to automate your sales pipeline by building a simplistic program that scans a person's LinkedIn profile for a few very specific keywords:
keywords = ["Saas", "Sales", "Enterprise"]
Great, now you can scan LinkedIn quickly to find people who match your criteria. But what about the person who spells out "Software as a Service" instead of "Saas", or misspells "enterprise" (it happens to the best of us; I bet someone will find a typo in my book)? How will your model figure out that these people are also a good match? They should not be left behind just because a corner-cutting data scientist has overgeneralized people in such an easy way.
The programmer chose to simplify their search for another human by looking for three basic keywords, and ended up leaving a lot of opportunities on the table.
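To make the missed matches concrete, here is a minimal sketch of that keyword scan next to a slightly more forgiving one. Everything except the keywords list is an assumption made for illustration: the profile snippets are invented, naive_match and tolerant_match are hypothetical helper names, the synonym table is hand-written, and the 0.8 similarity cutoff is an arbitrary choice. The fuzzy comparison uses difflib from the Python standard library.

```python
import difflib

keywords = ["Saas", "Sales", "Enterprise"]

# Hypothetical profile snippets, invented purely for illustration.
profiles = {
    "profile_a": "Enterprise Sales lead for a Saas startup",
    "profile_b": "Sells Software as a Service to large companies",
    "profile_c": "Ten years of enterprize account management",
}

# Assumed, hand-written mapping from spelled-out phrases to keywords.
synonyms = {"software as a service": "Saas"}


def naive_match(text, keywords):
    """Return the keywords that appear as exact, case-sensitive substrings."""
    return [kw for kw in keywords if kw in text]


def tolerant_match(text, keywords, cutoff=0.8):
    """Return keywords matched via synonyms or close (fuzzy) spellings."""
    lowered = text.lower()
    found = set()

    # Spelled-out phrases count as a hit for the keyword they stand for.
    for phrase, kw in synonyms.items():
        if phrase in lowered:
            found.add(kw)

    # Compare each word against the lowercased keywords, allowing
    # small spelling differences (cutoff is an arbitrary assumption).
    lookup = {kw.lower(): kw for kw in keywords}
    for word in lowered.split():
        word = word.strip(".,;:()")
        close = difflib.get_close_matches(word, list(lookup), n=1, cutoff=cutoff)
        if close:
            found.add(lookup[close[0]])

    return sorted(found)


for name, text in profiles.items():
    print(name, naive_match(text, keywords), tolerant_match(text, keywords))
```

With these made-up profiles, the exact scan finds nothing in the second and third snippets, while the tolerant version picks up the spelled-out phrase and the misspelling. It is still a simplification of a person, of course; it just cuts slightly fewer corners.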
In the next chapter, we will explore the different types of data that exist in the world, ranging from free-form text to highly structured row/column files. We will also look at the mathematical operations that are allowed for different types of data, and at the insights we can deduce from the form the data comes in.