Principles of Data Science

Mathematical techniques and theory to succeed in data-driven industries

Product type: Paperback
Published: Dec 2016
Publisher: Packt
ISBN-13: 9781785887918
Length: 388 pages
Edition: 1st Edition
Author: Sinan Ozdemir
Table of Contents

Preface
1. How to Sound Like a Data Scientist
2. Types of Data
3. The Five Steps of Data Science
4. Basic Mathematics
5. Impossible or Improbable – A Gentle Introduction to Probability
6. Advanced Probability
7. Basic Statistics
8. Advanced Statistics
9. Communicating Data
10. How to Tell If Your Toaster Is Learning – Machine Learning Essentials
11. Predictions Don't Grow on Trees – or Do They?
12. Beyond the Essentials
13. Case Studies
Index

Summary

At the beginning of this chapter, I posed a simple question: what's the catch of data science? Well, there is one. It isn't all fun, games, and modeling. There must be a price to our quest for ever-smarter machines and algorithms. As we seek new and innovative ways to discover data trends, a beast lurks in the shadows. I'm not talking about the learning curve of mathematics or programming, nor am I referring to the surplus of data. The industrial age left us with an ongoing battle against pollution. The subsequent information age left behind a trail of big data. So, what dangers might the data age bring us?

The data age can lead to something much more sinister: the dehumanization of the individual through mass data.

More and more people are jumping headfirst into the field of data science, most with no prior experience in math or CS, which on the surface is great. Average data scientists have access to data from millions of dating profiles, tweets, online reviews, and much more to jumpstart their education.

However, if you jump into data science without proper exposure to theory or coding practices, and without respect for the domain you are working in, you risk oversimplifying the very phenomenon you are trying to model.

For example, let's say you want to automate your sales pipeline by building a simplistic program that scans a person's LinkedIn profile for very specific keywords.

keywords = ["Saas", "Sales", "Enterprise"]

Great, now you can scan LinkedIn quickly to find people who match your criteria. But what about the person who spells out "Software as a Service" instead of "Saas", or misspells "enterprise" (it happens to the best of us; I bet someone will find a typo in my book)? How will your model figure out that these people are also a good match? They should not be left behind just because a corner-cutting data scientist has overgeneralized people in such an easy way.

The programmer chose to simplify their search for another human by looking for three basic keywords, and ended up leaving a lot of opportunities on the table.
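
To make the failure concrete, here is a minimal sketch of the keyword scan described above, next to a slightly more forgiving version. The function names, the synonym map, the similarity cutoff, and the sample profile are all assumptions made up for illustration; they are not from the book, and the second matcher is only one possible way to soften the problem, not a fix for it.

```python
import difflib
import re

# Keyword list from the text above.
keywords = ["Saas", "Sales", "Enterprise"]

# Hypothetical expansions a domain expert might supply (an assumption, not from the book).
SYNONYMS = {"saas": ["software as a service"]}


def naive_match(profile_text):
    """Return the keywords that appear verbatim (case-sensitive) in the profile."""
    return [kw for kw in keywords if kw in profile_text]


def tolerant_match(profile_text, cutoff=0.8):
    """Return keywords found as whole words, via a synonym phrase, or via a close spelling."""
    text = profile_text.lower()
    words = re.findall(r"[a-z]+", text)
    matched = []
    for kw in keywords:
        key = kw.lower()
        if key in words:
            # Exact word match, ignoring case.
            matched.append(kw)
        elif any(phrase in text for phrase in SYNONYMS.get(key, [])):
            # A spelled-out phrase that means the same thing.
            matched.append(kw)
        elif difflib.get_close_matches(key, words, n=1, cutoff=cutoff):
            # A word spelled almost like the keyword (likely a typo).
            matched.append(kw)
    return matched


# Made-up profile text for illustration only.
profile = "Ten years selling Software as a Service to enterprize customers"

print(naive_match(profile))     # []
print(tolerant_match(profile))  # ['Saas', 'Enterprise']
```

Notice that even the more tolerant version never recognizes "selling" as evidence of sales experience. Patching the keyword list helps at the margins, but it does not replace an understanding of the domain and of the people being modeled.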

In the next chapter, we will explore the different types of data that exist in the world, ranging from free-form text to highly structured row/column files. We will also look at the mathematical operations that are allowed for different types of data, and deduce insights based on the form the data comes in.
