Data labeling using Snorkel
In this section, we will learn what Snorkel is and how to use it to label data programmatically in Python.
Labeling data is an important step in a data science project and is critical for training models that solve specific business problems.
In many real-world cases, training data has no labels, or only a small amount of labeled data is available. For example, in a housing dataset, historical prices may be unavailable for most houses in some neighborhoods. Similarly, in finance, not every transaction has an associated invoice number. Businesses need labeled historical data to train models and automate their processes using machine learning (ML) and artificial intelligence. However, obtaining it requires either outsourcing the labeling to expensive domain experts or waiting a long time for new labeled training data to become available.
This is where Snorkel comes into the...