Data Analysis with Python

You're reading from Data Analysis with Python: A Modern Approach

Product type Paperback
Published in Dec 2018
Publisher Packt
ISBN-13 9781789950069
Length 490 pages
Edition 1st Edition
Author: David Taieb

Table of Contents (14 chapters)

Preface
1. Programming and Data Science – A New Toolset
2. Python and Jupyter Notebooks to Power your Data Analysis
3. Accelerate your Data Analysis with Python Libraries
4. Publish your Data Analysis to the Web - the PixieApp Tool
5. Python and PixieDust Best Practices and Advanced Concepts
6. Analytics Study: AI and Image Recognition with TensorFlow
7. Analytics Study: NLP and Big Data with Twitter Sentiment Analysis
8. Analytics Study: Prediction - Financial Time Series Analysis and Forecasting
9. Analytics Study: Graph Algorithms - US Domestic Flight Data Analysis
10. The Future of Data Analysis and Where to Develop your Skills
A. PixieApp Quick-Reference
Other Books You May Enjoy
Index

Why am I writing this book?

As I'll explain in more detail in Chapter 1, Programming and Data Science – A New Toolset, I am first and foremost a developer with over 20 years' experience building software components of a diverse nature: frontend, backend, middleware, and so on. Looking back on this time, I realize how much getting the algorithms right always came first in my mind; data was always somebody else's problem. I rarely had to analyze it or extract insight from it. At best, I was designing the right data structure to load it in a way that would make my algorithm run more efficiently and the code more elegant and reusable.

However, as the artificial intelligence and data science revolution got under way, it became obvious to me that developers like myself needed to get involved, and so 7 years ago, in 2011, I jumped at the opportunity to become the lead architect for the IBM Watson core platform UI & Tooling. Of course, I don't pretend to have become an expert in machine learning or NLP; far from it. Learning through practice is not a substitute for a formal academic background.

However, a big part of what I want to demonstrate in this book is that, with the right tools and approach, someone equipped with the right mathematical foundations (I'm only talking about high-school level calculus concepts really) can quickly become a good practitioner in the field. A key ingredient of success is to simplify the different steps of building a data pipeline as much as possible: from acquiring, loading, and cleaning the data, to visualizing and exploring it, all the way to building and deploying machine learning models.

It was with an eye to furthering this idea of making data simple and accessible to a community beyond data scientists that, 3 years ago, I took on a leading role on the IBM Watson Data Platform team. The mission was to expand the community of developers working with data, with a special focus on education and activism on their behalf. During that time as the lead developer advocate, I started to talk openly about the need for developers and data scientists to collaborate better in solving complex data problems.

Note: During discussions at conferences and meetups, I would sometimes get into trouble with data scientists who would get upset because they interpreted my narrative as me saying that data scientists are not good software developers. I want to set the record straight, including with you, the data scientist reader, that this is far from the case.

The majority of data scientists are excellent software developers with a comprehensive knowledge of computer science concepts. However, their main objective is to solve complex data problems that require rapid, iterative experimentation to try new things, not to write elegant, reusable components.

But I didn't want to only talk the talk; I also wanted to walk the walk, so I started the PixieDust open source project as my humble contribution to solving this important problem. As the PixieDust work progressed nicely, the narrative became crisper and easier to understand, with concrete example applications that developers and data scientists alike could become excited about.

When I was presented with the opportunity to write a book about this story, I hesitated for a long time before embarking on this adventure, mainly for two reasons:

  • I have written extensively in blogs, articles, and tutorials about my experience as a data science practitioner with Jupyter Notebooks. I also have extensive experience as a speaker and workshop moderator at a variety of conferences. One good example is the keynote speech I gave at ODSC London in 2017 titled The Future of Data Science: Less Game of Thrones, More Alliances (https://odsc.com/training/portfolio/future-data-science-less-game-thrones-alliances). However, I had never written a book before and had no idea how big a commitment it would be, even though I was warned many times by friends who had authored books before.
  • I wanted this book to be inclusive and target equally the developer, the data scientist, and the line-of-business user, but I was struggling to find the right content and tone to achieve that goal.

In the end, the decision to embark on this adventure came pretty easily. Having worked on the PixieDust project for 2 years, I felt we had made terrific progress, with very interesting innovations that generated lots of interest in the open source community, and that writing a book would nicely complement our advocacy work on helping developers get involved in data science.

As a side note, for the reader who is thinking about writing a book and who has similar concerns, I can only advise on the first one, with a big "Yes, go for it." For sure, it is a big commitment that requires a substantial amount of sacrifice, but provided that you have a good story to tell with solid content, it is really worth the effort.
