
TensorFlow announces TensorFlow Data Validation (TFDV) to automate and scale data analysis, validation, and monitoring

  • 2 min read
  • 11 Sep 2018


Today the TensorFlow team announced the launch of TensorFlow Data Validation (TFDV), an open-source library that enables developers to understand, validate, and monitor their machine learning data at scale.

Why was TensorFlow Data Validation introduced?


When building machine learning algorithms, a lot of attention is paid to improving their performance. However, if the input data is wrong, all of this optimization effort goes to waste. Understanding and validating a small amount of data is easy, and you can even do it manually. In the real world, however, this is rarely the case: production data is huge and often arrives continuously and in big chunks. This is why it is necessary to automate and scale the tasks of data analysis, validation, and monitoring.

What are some features of TFDV?


TFDV is part of the TensorFlow Extended (TFX) platform, a TensorFlow-based general-purpose machine learning platform. It is already being used by Google every day to analyze and validate petabytes of data.

TFDV provides the following features, among others:

  • It can compute descriptive statistics that provide a quick overview of the data in terms of the features that are present and the shapes of their value distributions.
  • It includes tools such as Facets Overview, which provides a visualization of the computed statistics for easy browsing.
  • It can automatically generate a data schema that describes expectations about the data, such as required values, ranges, and vocabularies. Because writing a schema by hand is tedious for datasets with many features, TFDV can infer an initial version of the schema from the descriptive statistics.
  • A schema viewer lets you inspect the schema.
  • Anomaly detection identifies problems such as missing features, out-of-range values, or wrong feature types.
  • An anomalies viewer shows which features have anomalies and gives details that help you correct them.
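
The workflow behind these features (compute statistics, infer a schema, check new data against it) can be illustrated with a small plain-Python sketch. This is not TFDV's implementation or API: the helper names `compute_stats`, `infer_schema`, and `find_anomalies`, the dictionary layout, and the sample records are all assumptions made for illustration. In TFDV itself, the corresponding steps are exposed through functions such as `tfdv.generate_statistics_from_csv`, `tfdv.infer_schema`, and `tfdv.validate_statistics`, which operate on statistics and schema protocol buffers rather than Python dictionaries.

```python
from collections import Counter


def compute_stats(rows):
    """Collect per-feature counts, observed types, numeric range, and string values."""
    features = {}
    for row in rows:
        for name, value in row.items():
            f = features.setdefault(
                name,
                {"count": 0, "types": Counter(), "min": None, "max": None, "values": set()},
            )
            f["count"] += 1
            f["types"][type(value).__name__] += 1
            if isinstance(value, (int, float)):
                f["min"] = value if f["min"] is None else min(f["min"], value)
                f["max"] = value if f["max"] is None else max(f["max"], value)
            elif isinstance(value, str):
                f["values"].add(value)
    return {"num_rows": len(rows), "features": features}


def infer_schema(stats):
    """Derive an initial schema from the statistics; meant to be reviewed by hand."""
    schema = {}
    for name, f in stats["features"].items():
        kind = f["types"].most_common(1)[0][0]  # most frequently observed type
        entry = {"type": kind, "required": f["count"] == stats["num_rows"]}
        if kind in ("int", "float"):
            entry["range"] = (f["min"], f["max"])
        elif kind == "str":
            entry["vocabulary"] = set(f["values"])
        schema[name] = entry
    return schema


def find_anomalies(rows, schema):
    """Return {feature: sorted anomaly kinds} for records that violate the schema."""
    found = {}
    for row in rows:
        for name, spec in schema.items():
            if name not in row:
                if spec["required"]:
                    found.setdefault(name, set()).add("missing")
                continue
            value = row[name]
            if type(value).__name__ != spec["type"]:
                found.setdefault(name, set()).add("wrong type")
            elif "range" in spec and not spec["range"][0] <= value <= spec["range"][1]:
                found.setdefault(name, set()).add("out of range")
            elif "vocabulary" in spec and value not in spec["vocabulary"]:
                found.setdefault(name, set()).add("unexpected value")
    return {name: sorted(kinds) for name, kinds in found.items()}


# Hypothetical sample data: a "training" set and a later "serving" batch.
train = [
    {"age": 34, "country": "UK"},
    {"age": 51, "country": "US"},
    {"age": 27, "country": "UK"},
]
serving = [
    {"age": 140, "country": "UK"},   # age outside the range seen in training
    {"age": "40", "country": "FR"},  # age has the wrong type; country is new
    {"country": "US"},               # age is missing
]

schema = infer_schema(compute_stats(train))
anomalies = find_anomalies(serving, schema)
print(anomalies)
```

The inferred schema is deliberately strict: it treats the training range and vocabulary as the only valid values. That is why an automatically generated schema is best seen as a first draft to review and relax before it is used to validate new data.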


To learn more about how TFDV is used in production, read the official announcement by TensorFlow on Medium and check out TFDV’s GitHub repository.
