More GPU, more speed!
Since the standard utilities and tools for handling ML workflows all operate on tensors, there’s a need for a stable, in-memory data structure that can be used for interoperability between those frameworks. The catch is that any protocol for sharing this data also needs to be able to describe which device the memory is allocated on. In the Python data community, one such standard that has been widely adopted is DLPack.
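To make this concrete, here’s a minimal sketch of a zero-copy exchange through DLPack, assuming recent versions of NumPy (1.22+) and PyTorch (1.10+), both of which implement the protocol:

```python
import numpy as np
import torch

# Create a tensor in PyTorch and hand it to NumPy via DLPack.
# No data is copied; both objects view the same memory.
t = torch.arange(6, dtype=torch.float32)
arr = np.from_dlpack(t)  # requires NumPy >= 1.22, PyTorch >= 1.10

# Mutating through one view is visible through the other,
# which confirms the exchange was zero-copy.
t[0] = 42.0
print(arr[0])  # 42.0

# __dlpack_device__ reports where the memory lives as a
# (device type, device id) pair; this is how a consumer learns
# whether the data is on the CPU or a GPU before touching it.
print(t.__dlpack_device__())  # (1, 0) -> kDLCPU, device 0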
A note about GPUs
While we’ve mentioned GPUs before, I wanted to note something just in case you aren’t as familiar with why GPUs improve performance for these workflows. Because GPUs are specialized hardware, they are significantly faster and more efficient at specific types of data transformations and computations, particularly tensor operations and vector math. The difficulty has always been the complexity of writing code for them, along with the cost of frequently copying data between CPU and GPU memory. Thus, by allowing frameworks to hand tensors to one another in place, on whatever device the data already lives, a standard like DLPack avoids those copies and lets the GPU do what it does best.
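As an illustration of why keeping data on the device matters, the following sketch assumes a CUDA-capable machine with CuPy (v10+) and a CUDA build of PyTorch installed; DLPack lets the two libraries share the same GPU buffer without ever staging it through host memory:

```python
import cupy as cp
import torch

# Allocate directly on the GPU with CuPy.
gpu_arr = cp.arange(1_000_000, dtype=cp.float32)

# Hand the same GPU buffer to PyTorch via DLPack; the data never
# leaves device memory, so there is no CPU round trip to pay for.
gpu_tensor = torch.from_dlpack(gpu_arr)

print(gpu_tensor.device)            # cuda:0
print(gpu_arr.__dlpack_device__())  # device type 2 (kDLCUDA), device id 0
```

Because the exchange is only a handoff of ownership metadata, it costs effectively nothing regardless of how large the tensor is; the expensive part, moving bytes across the PCIe bus, never happens.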