Today the TensorFlow team announced the launch of TensorFlow Data Validation (TFDV), an open-source library that enables developers to understand, validate, and monitor their machine learning data at scale.
When building machine learning algorithms, a lot of attention is paid to improving their performance. However, if the input data is wrong, all this optimization effort goes to waste. Understanding and validating a small amount of data is easy, and you can even do it manually. In the real world, however, this is rarely the case: data in production is huge and often arrives continuously and in big chunks. This is why it is necessary to automate and scale the tasks of data analysis, validation, and monitoring.
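To make the idea of automated validation concrete, here is a minimal plain-Python sketch. It is not TFDV's API and does not reflect how TFDV is implemented; the helper names `infer_schema` and `find_anomalies`, the toy records, and the three anomaly labels are all assumptions made for illustration. It infers a simple schema (expected type and whether a column is always present) from training records, then flags records in new data that deviate from it.

```python
from collections import Counter


def infer_schema(rows):
    """Infer a toy schema: the most common type per column and whether it is always present.

    Hypothetical helper for illustration only; this is not a TFDV function.
    """
    schema = {}
    column_names = sorted({key for row in rows for key in row})
    for name in column_names:
        present = [row[name] for row in rows if row.get(name) is not None]
        type_counts = Counter(type(value).__name__ for value in present)
        schema[name] = {
            "type": type_counts.most_common(1)[0][0] if present else None,
            "required": len(present) == len(rows),
        }
    return schema


def find_anomalies(rows, schema):
    """Return (row_index, column, problem) tuples for records that deviate from the schema."""
    anomalies = []
    for index, row in enumerate(rows):
        # Columns that never appeared in the training data.
        for name in row:
            if name not in schema:
                anomalies.append((index, name, "unexpected column"))
        # Columns the schema expects: check presence and type.
        for name, spec in schema.items():
            value = row.get(name)
            if value is None:
                if spec["required"]:
                    anomalies.append((index, name, "missing value"))
            elif spec["type"] is not None and type(value).__name__ != spec["type"]:
                anomalies.append((index, name, "type mismatch"))
    return anomalies


# Toy data, invented for this example.
train = [{"age": 34, "country": "DE"}, {"age": 51, "country": "US"}]
serving = [{"age": "unknown", "country": "FR"}, {"country": "US", "zip": "10001"}]

schema = infer_schema(train)
anomalies = find_anomalies(serving, schema)
for anomaly in anomalies:
    print(anomaly)
```

Running such a check by hand works for a few records, but it illustrates why the same steps (computing statistics, inferring a schema, comparing new data against it) need to be automated and scaled for production data.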
TFDV is part of the TensorFlow Extended (TFX) platform, a TensorFlow-based general-purpose machine learning platform. It is already being used by Google every day to analyze and validate petabytes of data.
TFDV provides some of the following features:
To learn more about how it is used in production, read the official announcement by TensorFlow on Medium and check out TFDV’s GitHub repository.