AI specialists are often asked to build, present, and explain models. Yet even when a solution could be commercially impactful, productionizing a proof of concept (POC) for live decisioning, so that the insight can actually be acted on, is often a bigger struggle than building the models in the first place. Once we've trained a model, analyzed it to verify that it performs to the expected standard, and communicated the results to stakeholders, we want to make it available to serve predictions on new data. This imposes requirements such as latency (for real-time applications) and throughput (for serving a large volume of customers). Often, a model is deployed as part of a microservice, such as an inference server.
In this recipe, we'll build a small inference server from scratch and focus on the technical challenges of bringing AI into production.
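As a first orientation, here is a minimal sketch of what such an inference server can look like, using Flask to wrap a pre-trained scikit-learn estimator. The model file name (model.joblib), the /predict route, and the JSON payload shape are illustrative assumptions, not fixed conventions:

```python
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once at startup rather than per request,
# so request latency is not dominated by deserialization.
model = joblib.load("model.joblib")  # hypothetical path to a pickled estimator

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload such as {"features": [[5.1, 3.5, 1.4, 0.2]]}.
    payload = request.get_json(force=True)
    features = np.asarray(payload["features"])
    predictions = model.predict(features)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # Flask's built-in server is for development only; a production
    # deployment would typically sit behind a WSGI server such as gunicorn.
    app.run(host="0.0.0.0", port=5000)
```

With the server running, a client could request predictions with, for example, curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"features": [[5.1, 3.5, 1.4, 0.2]]}'. Loading the model once at startup and keeping the endpoint stateless are the two design choices that speak directly to the latency and throughput requirements mentioned above.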