To run models on NVIDIA GPUs, two libraries are required: CUDA and cuDNN. TensorFlow natively takes advantage of the speed-up these libraries provide.
To run operations on the GPU, the tensorflow-gpu package must be installed. Moreover, the CUDA version that tensorflow-gpu was built against must match the CUDA version installed on the machine.
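As a quick sanity check, the following minimal sketch (assuming TensorFlow 2.x, where the tf.config API is available) verifies whether TensorFlow was built with CUDA support and can see a GPU. An empty device list usually points to a mismatch between the installed CUDA/cuDNN versions and the TensorFlow build:

```python
import tensorflow as tf

# True if this TensorFlow build was compiled with CUDA support.
print("Built with CUDA:", tf.test.is_built_with_cuda())

# GPUs visible to TensorFlow; an empty list often indicates a
# CUDA/cuDNN version mismatch or a missing driver.
gpus = tf.config.list_physical_devices('GPU')
print("GPUs detected:", gpus)
```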
Some modern GPUs offer 16-bit floating point (FP16) instructions. The idea is to use reduced-precision floats (16 bits instead of the commonly used 32 bits) to speed up inference without significantly degrading the output quality. Not all GPUs are compatible with FP16.
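One common way to exploit FP16 in TensorFlow is through the Keras mixed-precision policy. The sketch below assumes TensorFlow 2.4 or later, where tf.keras.mixed_precision.set_global_policy is available; with this policy, computations run in FP16 while variables are kept in FP32 for numerical stability:

```python
import tensorflow as tf

# Run computations in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    # Keep the final layer in float32 so the outputs stay full precision.
    tf.keras.layers.Dense(10, dtype='float32'),
])

print(model.layers[0].compute_dtype)  # float16 (computation dtype)
print(model.layers[0].dtype)          # float32 (variable dtype)
```

On GPUs without FP16 support, this policy may run without errors but yield little or no speed-up, so it is worth benchmarking before enabling it in production.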