Using Arrow with the standard tools for ML
As the Python ecosystem of ML tools and utilities has exploded, several frameworks have become the de facto standards for building training and inference pipelines. The most popular of these are PyTorch and TensorFlow, both of which integrate with Hugging Face and with various systems built on top of them. Both are open source libraries: TensorFlow is released under the Apache License 2.0, and PyTorch under the BSD 3-Clause license.
The primary data structure used in both TensorFlow and PyTorch is the tensor, or n-dimensional array. ML models are generally made up of multiple layers of computation, where each layer takes a tensor as input and produces a tensor as output for the next layer. Simply put, you can describe tensors as follows (depicted in Figure 9.9):
- A one-dimensional tensor is generally referred to as a vector – that is, [1, 2, 3, 4]
- A two-dimensional...
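To make the idea of tensor dimensions concrete, here is a minimal sketch that uses plain nested Python lists as stand-ins for framework tensors. That substitution is an assumption made to keep the example dependency-free; in practice, you would create these values with PyTorch or TensorFlow. The `shape` and `dense_layer` helpers and the values in `matrix` are hypothetical and exist only for illustration.

```python
# Nested lists stand in for tensors here (an assumption for illustration);
# real pipelines would use PyTorch or TensorFlow tensor objects instead.


def shape(tensor):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    # Each level of list nesting contributes one dimension.
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)


def dense_layer(weights, inputs):
    """Hypothetical layer: multiply a 2-D weight tensor by a 1-D input tensor."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


vector = [1, 2, 3, 4]                    # one-dimensional tensor (a vector)
matrix = [[1, 0, 0, 0], [0, 1, 0, 0]]    # two-dimensional tensor, illustrative values

# A layer takes a tensor as input and hands a tensor to the next layer.
output = dense_layer(matrix, vector)

print(shape(vector))   # number of dimensions is len(shape(vector))
print(shape(matrix))
print(output)
```

The number of dimensions of a tensor is simply the length of its shape, so the vector has one dimension and the nested list has two. The output of `dense_layer` is itself a one-dimensional tensor that could feed a further layer.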