Having a beefy desktop machine with a GPU and an Ubuntu build is great for prototyping and research, but when it comes time to getting your model into production, and to actually making the day-to-day predictions required by your use case, you need compute resources that are highly available and scalable. What does that actually mean?
Imagine you've taken our Convolutional Neural Network (CNN) example, tweaked the model and trained it on your own data, and created a simple REST API frontend to call the model. You want to build a little business around providing clients with a service whereby they pay some money, get an API key, and can submit an image to an endpoint and get a reply stating what that image contains. Image recognition as a service! Does this sound good?
How would we make sure our service is always available and fast? After all...