Exploring categorical and numerical data in IPython
We will start our exploration in IPython by loading a text file into a DataFrame, calculating some summary statistics, and visualizing distributions. For this exercise, we'll use a set of movie ratings and metadata from the Internet Movie Database (http://www.imdb.com/) to investigate which factors might correlate with high ratings for films on this website. Such information might be helpful, for example, in developing a recommendation system based on this kind of user feedback.
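As a rough preview of that workflow, the sketch below loads delimited text into a pandas DataFrame and summarizes one numerical and one categorical column. The inline sample, its column names (title, year, rating, genre), and its values are hypothetical placeholders, not the actual IMDb file used in this chapter.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample standing in for the ratings file;
# the titles, columns, and values are invented for illustration only.
SAMPLE = (
    "title\tyear\trating\tgenre\n"
    "Film A\t1994\t8.1\tDrama\n"
    "Film B\t2001\t6.4\tComedy\n"
    "Film C\t2010\t7.3\tDrama\n"
    "Film D\t1987\t5.9\tAction\n"
)

# Load the delimited text into a DataFrame.
movies = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

# Summary statistics (count, mean, std, quartiles) for a numerical column.
rating_summary = movies["rating"].describe()

# Frequency table for a categorical column.
genre_counts = movies["genre"].value_counts()

# Mean rating within each category, a first look at possible correlates.
mean_by_genre = movies.groupby("genre")["rating"].mean()

print(rating_summary)
print(genre_counts)
print(mean_by_genre)
```

In a notebook session, calling movies["rating"].hist() would draw the distribution of ratings, assuming matplotlib is installed alongside pandas.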
Installing IPython notebook
To follow along with the examples, you will need a computer running Windows, Linux, or Mac OS X, and access to the Internet. There are several ways to install IPython. Since each of these resources includes its own installation guide, we summarize the available sources here and direct the reader to the relevant documentation for more in-depth instructions.
- For most users, a pre-bundled Python...