Introduction
This book targets intermediate to advanced users who are familiar with Python, IPython, and scientific computing. In this chapter, we will briefly recap the fundamental tools we will be using throughout this book: IPython, the notebook, pandas, NumPy, and matplotlib.
In this introduction, we will give a broad overview of IPython and the Python scientific stack for high-performance computing and data science.
What is IPython?
IPython is an open source platform for interactive and parallel computing. It offers powerful interactive shells and a browser-based notebook. The notebook combines code, text, mathematical expressions, inline plots, interactive plots, and other rich media within a sharable web document. This platform provides an ideal framework for interactive scientific computing and data analysis. IPython has become essential to researchers, data scientists, and teachers.
IPython can be used with the Python programming language, but the platform also supports many other languages such as R, Julia, Haskell, or Ruby. The architecture of the project is indeed language-agnostic, consisting of messaging protocols and interactive clients (including the browser-based notebook). The clients are connected to kernels that implement the core interactive computing facilities. Therefore, the platform can be useful to technical and scientific communities that use languages other than Python.
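The separation between clients and kernels can be illustrated with a toy model. The following is only a sketch under stated assumptions, not IPython's actual implementation: the names ToyPythonKernel, make_request, and run_client are hypothetical, messages are plain dictionaries passed by a direct function call (the real platform sends them over a network transport), and the message layout is heavily simplified. Only the execute_request and execute_reply message type names are borrowed from the real messaging protocol.

```python
import io
import uuid
from contextlib import redirect_stdout


def make_request(code):
    """Build a minimal execute_request message (hypothetical, simplified layout)."""
    return {
        "header": {"msg_id": uuid.uuid4().hex, "msg_type": "execute_request"},
        "content": {"code": code},
    }


class ToyPythonKernel:
    """Stands in for a language kernel: receives messages, runs code, replies."""

    def __init__(self):
        self.namespace = {}  # user variables persist between requests

    def handle(self, request):
        buffer = io.StringIO()
        try:
            # Capture anything the code prints so it can travel back in the reply.
            with redirect_stdout(buffer):
                exec(request["content"]["code"], self.namespace)
            status = "ok"
        except Exception as error:
            status = "error"
            buffer.write(repr(error))
        return {
            "header": {"msg_id": uuid.uuid4().hex, "msg_type": "execute_reply"},
            "parent_header": request["header"],  # links the reply to its request
            "content": {"status": status, "output": buffer.getvalue()},
        }


def run_client(kernel, code):
    """A client only builds messages and reads replies; it never runs code itself."""
    reply = kernel.handle(make_request(code))
    return reply["content"]["status"], reply["content"]["output"]


kernel = ToyPythonKernel()
first = run_client(kernel, "x = 6 * 7")
second = run_client(kernel, "print(x)")  # state from the first request is kept
failed = run_client(kernel, "1 / 0")
```

Because the client only depends on the message format, the same run_client function would work unchanged with a kernel written for another language, which is the point of the language-agnostic design described above.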
In July 2014, Project Jupyter was announced by the IPython developers. This project will focus on the language-independent parts of IPython (including the notebook architecture), whereas the name IPython will be reserved for the Python kernel. In this book, for the sake of simplicity, we will just use the term IPython to refer to either the platform or the Python kernel.
A brief historical retrospective on Python as a scientific environment
Python is a high-level general-purpose language originally conceived by Guido van Rossum in the late 1980s (the name was inspired by the British comedy Monty Python's Flying Circus). This easy-to-use language is often used as a glue language, that is, as the basis of scripts that tie different software components together. In addition, Python comes with an extremely rich standard library (the "batteries included" philosophy), which covers string processing, Internet protocols, operating system interfaces, and many other domains.
In the late 1990s, Travis Oliphant and others started to build efficient tools to deal with numerical data in Python: Numeric, Numarray, and finally, NumPy. SciPy, which implements many numerical computing algorithms, was also created on top of NumPy. In the early 2000s, John Hunter created matplotlib to bring scientific graphics to Python. At the same time, Fernando Perez created IPython to improve interactivity and productivity in Python. All the fundamental tools were now in place to turn Python into a great open source high-performance framework for scientific computing and data analysis.
Note
It is worth noting that Python as a platform for scientific computing was built slowly, step-by-step, on top of a programming language that was not originally designed for this purpose. This fact might explain a few minor inconsistencies or weaknesses of the platform, which do not preclude it from being one of the most popular open frameworks for scientific computing at this time. (You can also refer to http://cyrille.rossant.net/whats-wrong-with-scientific-python/.)
Notable competing open source platforms for numerical computing and data analysis include R (which focuses on statistics) and Julia (a young, high-level language that focuses on high performance and parallel computing). We will see these two languages very briefly in this book, as they can be used from the IPython notebook.
In the late 2000s, Wes McKinney created pandas for the manipulation and analysis of numerical tables and time series. At the same time, the IPython developers started to work on a notebook client inspired by mathematical software such as Sage, Maple, and Mathematica. Finally, IPython 0.12, released in December 2011, introduced the HTML-based notebook that has now gone mainstream.
In 2013, the IPython team received a grant from the Sloan Foundation and a donation from Microsoft to support the development of the notebook. IPython 2.0, released in early 2014, brought many improvements and long-awaited features.
What's new in IPython 2.0?
Here is a short summary of the changes brought by IPython 2.0 (succeeding v1.1):
- The notebook comes with a new modal user interface:
  - In the edit mode, we can edit a cell by entering code or text.
  - In the command mode, we can edit the notebook by moving cells around, duplicating or deleting them, changing their types, and so on. In this mode, the keyboard is mapped to a set of shortcuts that let us perform notebook and cell actions efficiently.
- Notebook widgets are JavaScript-based GUI widgets that interact dynamically with Python objects. This major feature considerably expands the possibilities of the IPython notebook. Writing Python code in the notebook is no longer the only possible interaction with the kernel. JavaScript widgets and, more generally, any JavaScript-based interactive element, can now interact with the kernel in real time.
- We can now open notebooks in different subfolders with the dashboard, using the same server. A REST API maps local URIs to the filesystem.
- Notebooks are now signed to prevent untrusted code from executing when notebooks are opened.
- The dashboard now contains a Running tab with the list of running kernels.
- The tooltip now appears when pressing Shift + Tab instead of Tab.
- Notebooks can be run in an interactive session via %run notebook.ipynb.
- The %pylab magic is discouraged in favor of %matplotlib inline (to embed figures in the notebook) and import matplotlib.pyplot as plt. The main reason is that %pylab clutters the interactive namespace by importing a huge number of variables. Also, it might harm the reproducibility and reusability of notebooks.
- Python 2.6 and 3.2 are no longer supported. IPython now requires Python 2.7 or >= 3.3.
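The namespace clutter mentioned for %pylab can be illustrated with a small experiment. This is only a sketch: it assumes that the standard library's math module can stand in for the much larger pylab namespace, so that the example runs without NumPy or matplotlib installed, and the helper names_added_by is hypothetical. It compares the names created by a star import with those created by an explicit import.

```python
def names_added_by(statement):
    """Run one import statement in a fresh namespace and list the names it creates."""
    namespace = {}
    exec(statement, namespace)
    # exec() inserts __builtins__ itself; skip it and any other dunder names.
    return sorted(name for name in namespace if not name.startswith("__"))


# Assumption: the stdlib math module stands in for the pylab namespace.
star_names = names_added_by("from math import *")
explicit_names = names_added_by("import math")

print(len(star_names), "names from the star import")
print(explicit_names, "from the explicit import")
```

The star import fills the namespace with dozens of names of unclear origin, whereas the explicit import adds a single, clearly labeled one. In a notebook, the equivalent explicit style is the one recommended in the list above: %matplotlib inline followed by import matplotlib.pyplot as plt.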
Roadmap for IPython 3.0 and 4.0
IPython 3.0 and 4.0, planned for late 2014/early 2015, should facilitate the use of non-Python kernels and provide multiuser capabilities to the notebook.
References
Here are a few references:
- The Python webpage at www.python.org
- Python on Wikipedia at http://en.wikipedia.org/wiki/Python_%28programming_language%29
- Python's standard library at https://docs.python.org/2/library/
- Guido van Rossum on Wikipedia at http://en.wikipedia.org/wiki/Guido_van_Rossum
- Conversation with Guido van Rossum on the birth of Python available at www.artima.com/intv/pythonP.html
- History of scientific Python available at http://fr.slideshare.net/shoheihido/sci-pyhistory
- What's new in IPython 2.0 at http://ipython.org/ipython-doc/2/whatsnew/version2.0.html
- IPython on Wikipedia at http://en.wikipedia.org/wiki/IPython
- History of the IPython notebook at http://blog.fperez.org/2012/01/ipython-notebook-historical.html