TensorFlow Extended for production
TFX is an end-to-end platform for deploying production machine learning pipelines. As part of the TensorFlow ecosystem, it provides a configuration framework and shared libraries to integrate the common components needed to define, launch, and monitor software based on ML models. TFX addresses many of the requirements and best practices of production software deployments, such as scalability, consistency, testability, and safety and security.
A TFX pipeline starts with ingesting your data, followed by data validation, feature engineering, training, and serving. Google has created libraries for each major phase of the pipeline, and there are frameworks for a wide range of deployment targets. TFX implements these phases as a series of ML pipeline components. All of this is made possible by horizontal layers for things like pipeline storage, configuration, and orchestration. These layers are important for managing and optimizing the pipelines and the applications that you run on them.
Before you can use TFX, you need to install it. TensorFlow Extended can be installed using the pip command:
pip install tfx
In the following sections, we will cover the fundamentals of TFX, its architecture, and the various libraries available within it.
TFX Pipelines
A TFX pipeline is a sequence of components that implement an ML pipeline designed specifically for scalable, high-performance ML tasks. It covers modeling, training, inference, and deployment to web or mobile targets. Each component in a TFX pipeline consists of three main elements: the Driver, the Executor, and the Publisher. The driver queries the metadata store and supplies the resulting metadata to the executor. The executor performs all the processing. The publisher accepts the results of the executor and saves them in the metadata store. As an ML software developer, you will need to write the code that runs in the executor, depending on the component class you are working with.

In a TFX pipeline, a unit of data, called an artifact, is passed between components. Normally, a component has at least one input artifact and one output artifact. Every artifact has associated metadata that defines its type and properties. The artifact type defines the ontology of artifacts in the entire TFX system, while the artifact property specifies the ontology specific to an artifact type. Users have the option to extend the ontology globally or locally.
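To make the division of labor between driver, executor, and publisher more concrete, the following is a minimal sketch in plain Python. It does not use the TFX API: the Artifact and MetadataStore classes, the run_component function, and the compute_statistics executor are all hypothetical names invented for illustration, and the in-memory store is only a stand-in for a real metadata store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Artifact:
    """A unit of data passed between components, with metadata attached."""
    type_name: str                      # artifact type, e.g. "Examples"
    uri: str                            # where the payload is stored
    properties: Dict[str, str] = field(default_factory=dict)


class MetadataStore:
    """In-memory stand-in for a metadata store (hypothetical, not MLMD)."""

    def __init__(self) -> None:
        self._artifacts: List[Artifact] = []

    def query(self, type_name: str) -> List[Artifact]:
        return [a for a in self._artifacts if a.type_name == type_name]

    def publish(self, artifact: Artifact) -> None:
        self._artifacts.append(artifact)


def run_component(
    store: MetadataStore,
    input_type: str,
    executor: Callable[[Artifact], Artifact],
) -> Artifact:
    # Driver: look up the most recent input artifact in the metadata store.
    candidates = store.query(input_type)
    if not candidates:
        raise LookupError(f"no artifact of type {input_type!r} in the store")
    input_artifact = candidates[-1]

    # Executor: the user-written processing step.
    output_artifact = executor(input_artifact)

    # Publisher: record the result so downstream components can find it.
    store.publish(output_artifact)
    return output_artifact


def compute_statistics(examples: Artifact) -> Artifact:
    """Toy executor: pretends to compute statistics over an Examples artifact."""
    return Artifact(
        type_name="ExampleStatistics",
        uri=examples.uri + "/stats",
        properties={"source": examples.uri},
    )


store = MetadataStore()
store.publish(Artifact("Examples", "/tmp/pipeline/examples", {"split": "train"}))
stats = run_component(store, "Examples", compute_statistics)
print(stats.type_name, stats.uri)
```

In this sketch, only compute_statistics is component-specific. The lookup and bookkeeping around it stay the same for every component, which is why you normally only write executor code.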
TFX pipeline components
The following diagram shows the flow of data between different TFX components:

Flow of data between TFX components
All the images in the TFX section have been adapted from the TensorFlow Extended official guide: https://www.tensorflow.org/tfx/guide.
To begin with, we have ExampleGen, which ingests the input data and can also split the input dataset. The data then flows to StatisticsGen, which calculates the statistics of the dataset. Then comes SchemaGen, which examines the statistics and creates a data schema; then ExampleValidator, which looks for anomalies and missing values in the data; and Transform, which performs feature engineering on the dataset. The transformed dataset is then fed to the Trainer, which trains the model. The performance of the model is evaluated using Evaluator and ModelValidator. Finally, if all is well, the Pusher deploys the model on the serving infrastructure.
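The flow described above can be viewed as a dependency graph between components. The following sketch uses only the Python standard library (graphlib, available from Python 3.9), not the TFX API, to derive one valid execution order. The exact edges in the dependencies dictionary are an assumption based on the description above, not a definitive statement of how TFX wires its components.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of components whose outputs it consumes.
# These edges are an illustrative assumption drawn from the text above.
dependencies = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "ModelValidator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator", "ModelValidator"},
}

# static_order() yields every component after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())

for step, component in enumerate(execution_order, start=1):
    print(step, component)
```

Components that do not depend on each other, such as ExampleValidator and Transform, may appear in either order. An orchestrator is free to run such components in parallel.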
TFX libraries
TFX provides several Python packages that are used to create pipeline components. Quoting from the TensorFlow Extended User Guide (https://www.tensorflow.org/tfx/guide):
"These packages are the libraries which you will use to create the components of your pipelines so that your code can focus on the unique aspects of your pipeline."
The libraries included in TFX are:
- TensorFlow Data Validation (TFDV) is a library for analyzing and validating machine learning data
- TensorFlow Transform (TFT) is a library for preprocessing data with TensorFlow
- TensorFlow is used for training models with TFX
- TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models
- TensorFlow Metadata (TFMD) provides standard representations for metadata that are useful when training machine learning models with TensorFlow
- ML Metadata (MLMD) is a library for recording and retrieving metadata associated with ML developers and data scientists' workflows
The following diagram demonstrates the relationship between TFX libraries and pipeline components:

Figure 7: Relationships between TFX libraries and pipeline components, visualized
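As a rough textual summary of these relationships, the sketch below pairs some of the pipeline components with the library each one mainly relies on. The pairings follow our reading of the TFX guide and are meant as an illustrative summary, not an exhaustive list. The COMPONENT_LIBRARY dictionary and the components_using helper are hypothetical names used only for this example.

```python
from typing import Dict, List

# Which library each component mainly builds on (illustrative summary).
COMPONENT_LIBRARY: Dict[str, str] = {
    "StatisticsGen": "TensorFlow Data Validation",
    "SchemaGen": "TensorFlow Data Validation",
    "ExampleValidator": "TensorFlow Data Validation",
    "Transform": "TensorFlow Transform",
    "Trainer": "TensorFlow",
    "Evaluator": "TensorFlow Model Analysis",
}


def components_using(library: str) -> List[str]:
    """Return the components backed by the given library, sorted by name."""
    return sorted(
        component
        for component, lib in COMPONENT_LIBRARY.items()
        if lib == library
    )


print(components_using("TensorFlow Data Validation"))
```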
TFX uses the open source Apache Beam to implement data-parallel pipelines. Optionally, TFX supports Apache Airflow and Kubeflow for easy configuration, operation, monitoring, and maintenance of the ML pipeline. Once the model is developed and trained, you can use TFX to deploy it to one or more deployment targets, where it will receive inference requests. TFX supports three classes of deployment targets: TensorFlow Serving (which exposes REST and gRPC interfaces), TensorFlow.js (for browser applications), and TensorFlow Lite (for native mobile and IoT applications). Trained models that have been exported as SavedModels can be deployed to any or all of these deployment targets.
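To illustrate what an inference request to TensorFlow Serving's REST interface looks like, the sketch below builds (but does not send) a predict request using only the Python standard library. The host localhost, the model name my_model, and the input values are placeholders assumed for this example. Port 8501 is the REST port commonly used by TensorFlow Serving, but your deployment may differ.

```python
import json
import urllib.request
from typing import Any, List


def build_predict_request(
    host: str,
    model_name: str,
    instances: List[Any],
    port: int = 8501,
) -> urllib.request.Request:
    """Build a POST request for the TensorFlow Serving REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # The row format wraps the inputs in an "instances" list.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, model name, and inputs; nothing is sent over the network.
request = build_predict_request("localhost", "my_model", [[1.0, 2.0, 3.0]])
print(request.full_url)
```

With a model server running, passing the request to urllib.request.urlopen would send it, and the JSON response would normally carry the model outputs under a "predictions" key.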