Benchmarking for speed and memory
Comparing only the classification performance of large models on a specific task or benchmark is no longer sufficient. We must also consider the computational cost of a particular model in a given environment (random-access memory (RAM), CPU, and GPU), in terms of both memory usage and speed. The two main values to measure are the computational cost of training and that of deploying to production for inference. Two classes of the transformers library, PyTorchBenchmark and TensorFlowBenchmark, make it possible to benchmark models for both PyTorch and TensorFlow.
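As a minimal sketch of how these classes are used, consider the following benchmark run for the PyTorch backend (the bert-base-uncased checkpoint and the batch sizes and sequence lengths here are illustrative choices, not prescribed values):

from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

# Configure which models and input shapes to benchmark;
# by default, both speed and memory are measured for inference
args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],  # any model identifier works here
    batch_sizes=[1, 8],
    sequence_lengths=[8, 128],
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()  # prints speed and memory result tables

TensorFlowBenchmark is driven analogously through TensorFlowBenchmarkArguments for TensorFlow models.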
Before we start our experiment, we need to check our GPU capabilities by executing the following code:
import torch
print(f"The GPU total memory is \
{torch.cuda.get_device_properties(0).total_memory/(1024**3)} GB")

The output is as follows:

The GPU total memory is 2.89 GB
This output was obtained from an NVIDIA GeForce...
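If you want to confirm which device your own output comes from, PyTorch's standard CUDA API can report the device name directly, as in this short snippet:

import torch

# Print the name of the first CUDA device, e.g. the GeForce model above
print(torch.cuda.get_device_name(0))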