Training a model using Ray
Ray is an open-source execution framework for scaling Python workloads across machines (https://www.ray.io). Ray supports the following Python workloads:
- Deep learning (DL) model training implemented with PyTorch or TensorFlow (TF)
- Hyperparameter tuning via Ray Tune (https://docs.ray.io/en/latest/tune/index.html)
- Reinforcement learning (RL) via RLlib (https://docs.ray.io/en/latest/rllib/index.html), an open source library for RL
- Data processing leveraging Ray Datasets (https://docs.ray.io/en/latest/data/dataset.html)
- Model serving via Ray Serve (https://docs.ray.io/en/latest/serve/index.html)
- A general Python application leveraging Ray Core (https://docs.ray.io/en/latest/ray-core/walkthrough.html)
The key advantage of Ray comes from the simplicity of its cluster definition; you can define a cluster with machines of different types and from various sources. For example, Ray allows you to build instance fleets (clusters based on a wide variety...