Creating and running an example pipeline in Pachyderm Notebooks
In the previous section, we learned how to use Pachyderm Notebooks, create repositories, put data into them, and even create a simple pipeline. In this section, we will create a pipeline that performs sentiment analysis on a Twitter dataset.
Important note
The code described in this section can be found in the https://github.com/PacktPublishing/Reproducible-Data-Science-with-Pachyderm/blob/main/Chapter11-Using-Pachyderm-Notebooks/sentiment-pipeline.ipynb file.
We will use a modified version of the International Women's Day Tweets dataset from Kaggle, available at https://www.kaggle.com/michau96/international-womens-day-tweets. Our modified version includes only two columns: the tweet number (#) and the tweet text (text). The dataset includes 51,480 rows.
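As a rough illustration of what a two-column version of a dataset like this looks like, the following sketch reduces a CSV to just the # and text columns using Python's standard csv module. This is not necessarily how the modified dataset was produced. The sample rows, the extra user and likes columns, and the keep_number_and_text helper are all hypothetical placeholders, not taken from the Kaggle data.

```python
import csv
import io

# Hypothetical stand-in for the original CSV. The "user" and "likes"
# columns and both rows are placeholders, not real Kaggle data.
SAMPLE_CSV = (
    "#,text,user,likes\n"
    "0,placeholder tweet one,user_a,3\n"
    "1,placeholder tweet two,user_b,5\n"
)


def keep_number_and_text(source):
    """Read CSV rows from a file-like object, keeping only '#' and 'text'."""
    reader = csv.DictReader(source)
    return [{"#": row["#"], "text": row["text"]} for row in reader]


rows = keep_number_and_text(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["#"], row["text"])
```

With a real file, you would pass an open file handle (for example, one opened with newline="" as the csv module recommends) instead of the in-memory sample.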
Here is an extract of the first few rows of the dataset:
# &...