Benchmarking for speed and memory
Comparing only the classification performance of large models on a specific task or benchmark is no longer sufficient. We must also account for the computational cost of a given model in a given environment (Random-Access Memory (RAM), CPU, GPU), in terms of both memory usage and speed. The two main costs to measure are those of training and of deploying to production for inference. Two classes of the Transformers library, PyTorchBenchmark and TensorFlowBenchmark, make it possible to benchmark models for both PyTorch and TensorFlow.
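As a minimal sketch of how these classes are used for PyTorch (the model name, batch sizes, and sequence lengths below are illustrative choices, not defaults):

```python
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

# Illustrative settings: the model, batch sizes, and sequence
# lengths here are assumptions chosen for demonstration.
args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[1],
    sequence_lengths=[8, 32],
    inference=True,   # measure inference rather than training
    memory=True,      # track memory usage as well as speed
)
benchmark = PyTorchBenchmark(args)
# results = benchmark.run()  # downloads the model and runs the measurements
```

Calling `run()` then reports inference time and memory usage for each combination of batch size and sequence length; the TensorFlowBenchmark class follows the same pattern with TensorFlowBenchmarkArguments.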
Before we start our experiment, we need to check our GPU capabilities with the following execution:
>>> import torch
>>> print(f"The GPU total memory is {torch.cuda.get_device_properties(0).total_memory /(1024**3)} GB")
The GPU total memory is 2.94921875 GB
The output was obtained from an NVIDIA GeForce GTX 1050 with 3 Gigabytes (GB) of memory. We need more powerful resources...
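Before benchmarking, it is also useful to guard against machines without a CUDA device at all. A small sketch of such a check, assuming a CPU fallback (the 3 GB threshold below is an illustrative assumption, not a library default):

```python
import torch

# Pick a device based on availability and report its memory.
# The 3 GB threshold is an illustrative assumption for this sketch.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / (1024 ** 3)
    device = "cuda" if total_gb >= 3 else "cpu"
    print(f"{props.name}: {total_gb:.2f} GB -> using {device}")
else:
    device = "cpu"
    print("No CUDA device found -> using cpu")
```

This way the same benchmarking script can run on a workstation without a GPU, albeit much more slowly.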