What about non-CPU device data?
Toward the end of Chapter 3, Format and Memory Handling, I brought up the topic of using Arrow with GPUs and other non-CPU devices. This topic is increasingly important as analytical pre-processing workflows try to keep up with the demand for the data that machine learning models need. Data scientists commonly use several libraries for GPU-based analytics. The following are just a few examples:
- Numba: An open source just-in-time (JIT) compiler that translates a subset of Python and NumPy code into low-level machine code, with options to parallelize Python code on CPUs and GPUs.
- XGBoost: An open source library providing optimized distributed gradient boosting algorithms that also run on GPUs.
- PyTorch: An open source machine learning library typically used for computer vision and natural language processing, which can also run on NVIDIA GPUs for better performance...