Statistics for Data Science

You're reading from Statistics for Data Science: Leverage the power of statistics for Data Analysis, Classification, Regression, Machine Learning, and Neural Networks

Product type Paperback
Published in Nov 2017
Publisher Packt
ISBN-13 9781788290678
Length 286 pages
Edition 1st Edition
Author (1): James D. Miller

Table of Contents (13 chapters)

Preface
1. Transitioning from Data Developer to Data Scientist
2. Declaring the Objectives
3. A Developer's Approach to Data Cleaning
4. Data Mining and the Database Developer
5. Statistical Analysis for the Database Developer
6. Database Progression to Database Regression
7. Regularization for Database Improvement
8. Database Development and Assessment
9. Databases and Neural Networks
10. Boosting your Database
11. Database Classification using Support Vector Machines
12. Database Structures and Machine Learning

Transitioning to a data scientist

Let's start this section by taking a moment to state what I consider to be a few generally accepted facts about transitioning to a data scientist. We'll reaffirm these beliefs as we continue through this book:

  • Academia: Data scientists are not all from one academic background. They are not all computer science or statistics/mathematics majors. They do not all possess an advanced degree (in fact, you can use statistics and data science with a bachelor's degree or even less).
  • It's not magic-based: Data scientists identify insights from data by using machine learning and other accepted statistical methods, not magic.
  • They are not all tech or computer geeks: You don't need years of programming experience or expensive statistical software to be effective.
  • You don't need to be experienced to get started. You can start today, right now. (Well, you already did when you bought this book!)

Okay, having made the previous declarations, let's also be realistic. As always, there is an entry point for everything in life, and, to give credit where it is due, the more credentials you can acquire to begin with, the better off you will most likely be. Nonetheless (as we'll see later in this chapter), there is absolutely no valid reason why you cannot begin understanding, using, and being productive with data science and statistics immediately.

As with any profession, certifications and degrees carry weight that may open doors, while experience, as always, might be considered the best teacher. There are, however, no fake data scientists, only those who currently have more desire than practical experience.

If you are seriously interested in not only understanding statistics and data science but eventually working as a full-time data scientist, you should consider the following common themes (which you're likely to find in job postings for data scientists) as areas to focus on:

  • Education: Common fields of study are Mathematics and Statistics, followed by Computer Science and Engineering (also Economics and Operations Research). Once more, there is no strict requirement to have an advanced or even related degree. In addition, the idea of a degree or equivalent experience will typically apply here.
  • Technology: You will hear SAS and R (actually, you will hear quite a lot about R) as well as Python, Hadoop, and SQL mentioned as key or preferable for a data scientist to be comfortable with. Tools and technologies change all the time, however, so, as mentioned several times throughout this chapter, data developers can begin to be productive as soon as they understand the objectives of data science and various statistical methodologies, without having to learn a new tool or language.
Basic business skills such as Omniture, Google Analytics, SPSS, Excel, or any other Microsoft Office tool are assumed pretty much everywhere and don't really count as an advantage, but experience with programming languages (such as Java, Perl, or C++) or databases (such as MySQL, NoSQL, Oracle, and so on) does help!
  • Data: The ability to understand data and deal with the challenges specific to the various types of data, such as unstructured, machine-generated, and big data (including organizing and structuring large datasets).
Unstructured data is a key area of interest in statistics and for a data scientist. It is usually described as data that has no predefined model or is not organized in a predefined manner. Unstructured information is characteristically text-heavy but may also contain dates, numbers, and various other facts.
  • Intellectual curiosity: I love this one. It is perhaps best defined as a character trait that comes in handy (if it isn't required) if you want to be a data scientist. It means that you have a continuing need to know more than the basics, or want to go beyond the common knowledge about a topic (you don't need a degree on the wall for this!).
  • Business acumen: To be a data developer or a data scientist, you need a deep understanding of the industry you're working in, and you also need to know what business problems your organization needs to unravel. In terms of data science, being able to discern which problems are the most important to solve is critical, as is identifying new ways the business should be leveraging its data.
  • Communication skills: All companies look for individuals who can clearly and fluently translate their findings to a non-technical team, such as the marketing or sales departments. A data scientist must enable the business to make decisions by arming it with quantified insights, and must also understand the needs of non-technical colleagues in order to add value and be successful.

Let's move ahead

So, let's finish up this chapter with some casual (if not common sense) advice for the data developer who wants to learn statistics and transition into the world of data science.

The following are several resources you should consider for familiarizing yourself with the topic of statistics and data science:

  • Books: Still the best way to learn! You can get very practical and detailed information (with examples) and advice from books. It's great that you started with this book, but there is a staggering amount of written resources (growing all the time) just waiting for you to consume.
  • Google: I'm a big fan of doing internet research. You will be surprised at the quantity and quality of open source and otherwise free software libraries, utilities, models, sample data, white papers, blogs, and so on that you can find out there. A lot of it can be downloaded and used directly to educate yourself or even as part of an actual project or deliverable.
  • LinkedIn: A very large percentage of corporate and independent recruiters use social media, and most use LinkedIn. This is an opportunity to see what types of positions are in demand and exactly what skills and experiences they require. When you see something you don't recognize, do the research to educate yourself on the topic. In addition, LinkedIn has an enormous number of groups that focus on statistics and data science. Join them all! Network with the members--even ask them direct questions. For the most part, the community is happy to help you (even if it's only to show how much they know).
  • Volunteer: A great way to build skills, continue learning, and expand your statistics network is to volunteer. Check out http://www.datakind.org/get-involved. If you sign up to volunteer, they will review your skills and keep in touch about upcoming projects that fit your background or that you are interested in.
  • Internship: Experienced professionals may re-enlist as interns to test a new profession or break into a new industry (www.Wetfeet.com). Although perhaps unrealistic for anyone other than a recent college graduate, internships are available if you can afford to cut your pay (or even take no pay) for a period of time to gain some practical experience in statistics and data science. What might be more practical is interning within your own company as a data scientist apprentice role for a short period or for a particular project.
  • Side projects: This is one of my favorites. Look for opportunities within your organization where statistics may be in use, and ask to sit in on meetings or join calls on your own time. If that isn't possible, look for scenarios where statistics and data science might solve a problem or address an issue, and make it a pet project you work on in your spare time. These kinds of projects are low risk as there will be no deadlines, and if they don't work out at first, it's not the end of the world.
  • Data: Probably one of the easiest things you can do to help your transition into statistics and data science is to get your hands on more types of data, especially unstructured data and big data. Additionally, it's always helpful to explore data from other industries or applications.
  • Coursera and Kaggle: Coursera is an online website where you can take Massive Open Online Courses (MOOCs) for a fee and earn a certification, while Kaggle hosts data science contests where you can not only measure your abilities against other members as you transition but also get access to large, unstructured big data files that may be more like the ones you might use on an actual statistical project.
  • Diversify: Many companies are adopting new tools all the time, so to add credibility to your analytic skills, spend time acquiring knowledge of as many tools and technologies as you can, such as R, Python, SAS, Scala, (of course) SQL, and so on. Doing so will give you a significant advantage. In addition to those mainstream data science tools, you may want to investigate some of the up-and-comers such as Paxata, MATLAB, Trifacta, Google Cloud Prediction API, or Logical Glue.
  • Ask a recruiter: Taking the time to develop a relationship with a recruiter early in your transformation will provide many advantages. Above all, a trusted recruiter can pass on a list of skills that are currently in demand as well as which statistical practices are most popular. In addition, as you gain experience and confidence, a recruiter can help you focus or fine-tune your experiences towards specific opportunities that may be further out on the horizon, potentially giving you an advantage over other candidates.
  • Online videos: Check out webinars and how-to videos on YouTube. There are endless resources from both amateurs and professionals that you can view whenever your schedule allows.