What kind of skills are required to become a data scientist?
In the industry, the reality is that data science is so new that companies do not yet have a well-defined career path for it. How do you get hired for a data scientist position? How many years of experience are required? What skills do you need to bring to the table? Math, statistics, machine learning, information technology, computer science, and what else?
Well, the answer is probably a little bit of everything plus one more critical skill: domain-specific expertise.
There is an ongoing debate about whether applying generic data science techniques to any dataset, without an intimate understanding of its meaning, leads to the desired business outcome. Many companies are leaning toward making sure data scientists have a substantial amount of domain expertise, the rationale being that without it you may unknowingly introduce bias at any step, such as when filling the gaps in the data cleansing phase or during the feature selection process, and ultimately build models that may well fit a given dataset but still end up being worthless. Imagine a data scientist with no chemistry background studying unwanted molecule interactions for a pharmaceutical company developing new drugs. This is also probably why we're seeing a proliferation of statistics courses specialized in a particular domain, such as biostatistics for biology, or supply chain analytics for analyzing operations management related to supply chains, and so on.
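To make the gap-filling risk concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen for illustration: the readings, the detection limit of 1.0, and the assumption that the instrument only fails to report values below that limit. The half-the-detection-limit substitution is just one common convention, not a recommendation.

```python
import statistics

# Hypothetical lab readings; None marks a value the instrument could not report.
# Assumption (domain knowledge): the instrument only fails below its detection
# limit, so every missing reading is actually a small number.
DETECTION_LIMIT = 1.0
readings = [4.2, None, 3.8, None, 5.1, 4.5, None, 4.0]

observed = [r for r in readings if r is not None]

# Generic approach: fill each gap with the mean of the observed values.
generic_fill = statistics.mean(observed)
generic = [r if r is not None else generic_fill for r in readings]

# Domain-informed approach: fill each gap with half the detection limit.
informed_fill = DETECTION_LIMIT / 2
informed = [r if r is not None else informed_fill for r in readings]

generic_mean = statistics.mean(generic)
informed_mean = statistics.mean(informed)

print(f"mean after generic fill:         {generic_mean:.4f}")
print(f"mean after domain-informed fill: {informed_mean:.4f}")
```

Under these assumptions, the generic fill treats the gaps as typical readings and pulls the overall mean well above what the domain-informed fill produces. A model trained on the generically filled data would inherit that bias without any warning.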
To summarize, a data scientist should, in theory, be somewhat proficient in the following areas:
- Data engineering / information retrieval
- Computer science
- Math and statistics
- Machine learning
- Data visualization
- Business intelligence
- Domain-specific expertise
Note
If you are thinking about acquiring these skills but don't have the time to attend traditional classes, I strongly recommend using online courses.
I particularly recommend this Coursera course: https://www.coursera.org/learn/data-science-course.
Drew Conway's classic Venn diagram provides an excellent visualization of what data science is and why data scientists are a bit of a unicorn:
By now, I hope it has become pretty clear that the perfect data scientist who fits the preceding description is more the exception than the norm and that, most often, the role involves multiple personas. Yes, that's right: the point I'm trying to make is that data science is a team sport, and this idea will be a recurring theme throughout this book.