Netflix uses a variety of tools to do data analysis. One of the big ways that data scientists and engineers at Netflix interact with their data is through Jupyter notebooks. In addition to providing execution environments to users, Netflix invests in various parts of the Jupyter ecosystem and tooling. They are “reimagining what a notebook can be, who can use it, and what they can do with it.”
Netflix aims to provide personalized content to their 130 million viewers. For this every day more than 1 trillion events are written into a streaming ingestion pipeline. To support this, they’ve built an industry-leading data platform which is flexible, powerful, and complex. There are so many diverse users of this platform, such as analytics engineers, data engineers, and data scientists, requiring different sets of tools and languages. To help the platform scale, they wanted to minimize the number of tools and the solution to this was the open-source tool: Jupyter notebooks.
These are the functionalities provided by notebook that benefits Netflix’s data scientists and engineers:
The following are some of the use cases they use Jupyter notebooks for:
Data access: Notebooks were first introduced for workflows and their adoption grew among the data scientists. Seeing this, Netflix decided to leverage its versatility and architecture for general data access. Notebooks provide an user-friendly interface for interactively running code, exploring the outputs, and visualizing data all from a single cloud-based development environment.
Notebook Templates: They introduced parameterized notebooks, which allow the use of parameters in the code and take values as input at runtime. These templates help:
Scheduling notebooks: Next they are using notebooks for creating a unifying layer for scheduling workflows. Notebooks are used for interactive work and allows smooth move to scheduling that work to run recurrently. Many users create an entire workflow in a notebook and just copy/paste it into separate files for scheduling when they’re ready to deploy it.
Notebook infrastructure: The three fundamental components of the infrastructure are: storage, compute, and interface.
Netflix is planning to make investments in both the frontend and backend to improve the overall notebook experience. This year they are also sponsoring JupyterCon.
To read more about how Jupyter is offering value to Netflix read Netflix’s original post at Medium.
10 reasons why data scientists love Jupyter notebooks
What’s new in Jupyter Notebook 5.3.0
Netflix open sources Zuul 2 cloud gateway