Using Arrow with Machine Learning Workflows
We just covered how to use Arrow Database Connectivity (ADBC), which provides a highly efficient way to interact with a multitude of data sources. In this chapter, we’ll dip into one way to use that data: machine learning (ML). ML is more than a buzzword; it is frequently used for pattern recognition, data-driven decision-making, and generative artificial intelligence (GenAI) systems. It might be a controversial opinion, but at their core, ML workflows are just a specialized form of a standard data pipeline. As a result, wherever there’s data processing, there’s an opportunity for Arrow to be extremely useful!
Whether you’re doing feature engineering, model training, preprocessing, or something else entirely, many of the most common tools and utilities offer interoperability with Arrow. Some of those tools even use Arrow under the hood.
We’re going to cover the following topics in this chapter:
-
...