As we saw previously, the hardware used for inference is crucial for speed. From slowest to fastest, the main options are:
- CPU: While slower, it is often the cheapest option.
- GPU: Faster but more expensive. Many smartphones have integrated GPUs that can be used for real-time applications.
- Specialized hardware: For instance, Google's TPU (for servers), Apple's Neural Engine (on mobile devices), or NVIDIA Jetson (for embedded and edge devices). These are chips designed specifically for running deep learning operations.
If speed is crucial for your application, use the fastest hardware available and adapt your code to target it, as shown in the sketch below.
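As a minimal sketch of what adapting your code can look like, assuming PyTorch is the framework in use, the snippet below selects the fastest available backend at runtime and moves both the model and its inputs there. The `torch.nn.Linear` model is a hypothetical stand-in for any `torch.nn.Module`:

```python
import torch

# Pick the fastest available backend: a CUDA GPU first, then the
# Apple-silicon GPU via Metal Performance Shaders (MPS), then CPU
# as the fallback.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Hypothetical model: any torch.nn.Module is moved the same way.
model = torch.nn.Linear(128, 10)
model = model.to(device).eval()

# Inputs must live on the same device as the model.
x = torch.randn(1, 128, device=device)
with torch.no_grad():
    logits = model(x)

print(f"Ran inference on: {device}")
```

Writing the device selection once like this keeps the rest of the inference code device-agnostic, so the same script runs on a laptop CPU, a workstation GPU, or an Apple-silicon machine without changes.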