Advanced techniques to reduce GPU memory
Now, let’s imagine that you are pretty far into your project. You’ve already identified your dataset, your use case, and your base model. You’ve tested this at a small scale on SageMaker, such as by using 1% of your data on the smallest version of your model, and it seems to be working well. You’ve used scaling laws, or have seen from another example, that a larger model would help you increase your accuracy, and you’re confident that you have enough data to justify that larger model. You’ve scaled your model up enough that it needs at least two GPUs, and you’ve successfully tested this on AWS.
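To make the scaling-law step concrete, here is a minimal back-of-the-envelope sketch of that kind of check. It assumes the parametric loss fit and fitted constants from Hoffmann et al. (2022), the so-called Chinchilla paper; the model and token counts are purely illustrative, not a recommendation:

```python
# A minimal sketch of a scaling-law check, assuming the parametric
# loss fit from Hoffmann et al. (2022) ("Chinchilla"). The fitted
# constants below come from that paper; the model and dataset sizes
# are hypothetical examples.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss L(N, D) = E + A/N^alpha + B/D^beta."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Compare a small and a large model on the same 300B-token dataset.
for n_params in (1.3e9, 13e9):
    loss = chinchilla_loss(n_params, 300e9)
    print(f"{n_params / 1e9:>5.1f}B params -> predicted loss {loss:.3f}")

# Rough data-sufficiency check: Chinchilla suggests roughly 20 tokens
# per parameter, so a 13B model wants on the order of 260B tokens.
```

At a fixed data budget, the predicted loss drops as the parameter count grows, which is exactly the kind of evidence that can justify moving to a model too large to fit on a single GPU.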
If you haven’t hit all of those stages, then frankly I’d recommend you simply skip to the next section. Here, we’re about to seriously dive into very complex, detailed, and niche topics in the cutting-edge space of model parallelism. If you aren’t ready for them, such as through...