Data wrangling with IPython
I found IPython to be the best way to learn Spark. It is also a very good choice for data scientists and data engineers who need to explore, model, and reason about data.
- The exploration step includes understanding the data, experimenting with multiple transformations, extracting features for aggregation and machine learning, and working out ETL strategies
- The modeling and reasoning steps (reasoning about the relationships and distributions among the variables) require fast iteration over the data and extracted features with different algorithms, experimenting with different parameters, and arriving at a set of ML algorithms for developing an analytics app
The IPython installation procedure for your system (it depends on the OS, CPU, and so on) is best described on the IPython site, at http://ipython.org/install.html and https://ipython.readthedocs.org/en/stable/install/install.html. The IPython command shell requires the Jupyter notebook system, and then the IPython libraries. Of course, you would also need to have...