Choosing the right deep learning framework is important for achieving the speed and model size you need. There are several things to consider: the overhead added by the library, whether it supports GPU acceleration, whether you need training or inference only, and in which frameworks existing solutions have already been implemented.
Keep in mind that you don't always need GPU acceleration. Sometimes SIMD instructions or the Accelerate framework are more than enough to run neural network inference in real time.
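As a rough illustration of what CPU-only inference can look like, here is a minimal sketch of a single fully connected layer computed with Accelerate's vDSP routines. The function name, weight layout, and sizes are assumptions for the example, not part of any particular library:

```swift
import Accelerate

// Hypothetical dense layer forward pass: y = relu(W * x + b), computed
// entirely on the CPU with vDSP. Weights are stored row-major,
// outputCount x inputCount.
func denseLayerForward(weights: [Float],
                       bias: [Float],
                       input: [Float],
                       inputCount: Int,
                       outputCount: Int) -> [Float] {
    // W * x using the single-precision matrix multiply (result is outputCount x 1).
    var product = [Float](repeating: 0, count: outputCount)
    vDSP_mmul(weights, 1,
              input, 1,
              &product, 1,
              vDSP_Length(outputCount),  // rows of the result
              vDSP_Length(1),            // columns of the result
              vDSP_Length(inputCount))   // shared dimension
    // Add the bias, then apply ReLU by zero-filling everything below 0.
    let withBias = vDSP.add(product, bias)
    return vDSP.threshold(withBias, to: 0, with: .zeroFill)
}
```

For small models, stacking a few layers like this and calling them per frame is often fast enough for real-time use without ever touching the GPU.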
You should also consider whether the computation will happen on the client side, on the server side, or be split between the two. Try to benchmark with an extreme number of records, and run the tests on a range of devices.
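A simple on-device benchmark can look like the sketch below: run the same prediction over many records and report the average latency per record. The `predict` closure is a hypothetical stand-in for whatever inference API you end up using:

```swift
import Foundation

// Rough latency benchmark: feed every record through the model once and
// return the average time per record in milliseconds.
func benchmark(records: [[Float]], predict: ([Float]) -> [Float]) -> Double {
    let start = DispatchTime.now()
    for record in records {
        _ = predict(record)
    }
    let elapsedNanoseconds = Double(DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds)
    return elapsedNanoseconds / Double(records.count) / 1_000_000
}
```

Running the same benchmark on your oldest supported device and on your newest one usually tells you quickly whether client-side inference is viable or whether part of the work should move to the server.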