Handling failed Batch loads
An Azure Batch job can fail due to four types of errors:
- Pool errors
- Node errors
- Job errors
- Task errors
Let's look at some of the common errors in each group and ways to handle them.
Pool errors
Pool errors are mostly caused by infrastructure, quota, or timeout issues. Here are some common pool errors:
- Insufficient quota: If your Batch account does not have enough quota, pool creation can fail. The mitigation is to request a quota increase. You can check the quota limits here: https://docs.microsoft.com/en-us/azure/batch/batch-quota-limit.
- Insufficient resources in your VNet: If your virtual network (VNet) doesn't have enough resources, such as available IP addresses, Network Security Groups (NSGs), VMs, and so on, the pool creation process may fail. The mitigation is to look for these errors and request higher resource allocation or move to a different VNet that has enough...
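As a rough illustration of the mitigations described for pool errors, the following Python sketch maps a pool error code to a suggested action. It is a minimal sketch under stated assumptions, not the Batch SDK's API: ResizeError, MITIGATIONS, DEFAULT_ADVICE, and suggest_mitigations are hypothetical names, and the error code strings are placeholders chosen for illustration. Check the codes that your own pool actually reports before relying on any of them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ResizeError:
    """Simplified stand-in for an error reported against a pool (hypothetical type)."""
    code: str
    message: str


# Assumed, illustrative error codes mapped to the mitigations discussed in the text.
MITIGATIONS = {
    "AccountCoreQuotaReached": "Request a quota increase for the Batch account.",
    "InsufficientVNetResources": (
        "Request more VNet resources (for example, IP addresses) "
        "or move the pool to a VNet with enough capacity."
    ),
}

# Fallback advice for codes that are not in the mapping.
DEFAULT_ADVICE = "Inspect the error message and the pool configuration."


def suggest_mitigations(errors: List[ResizeError]) -> List[Tuple[str, str]]:
    """Return (error code, suggested mitigation) pairs, one per reported error."""
    return [(err.code, MITIGATIONS.get(err.code, DEFAULT_ADVICE)) for err in errors]


if __name__ == "__main__":
    reported = [
        ResizeError("AccountCoreQuotaReached", "Core quota exhausted."),
        ResizeError("UnknownCode", "Something else went wrong."),
    ]
    for code, advice in suggest_mitigations(reported):
        print(f"{code}: {advice}")
```

In practice, the list of errors would come from whatever the pool reports after a failed creation or resize; the lookup table keeps the mitigation logic in one place so that new error codes can be added without changing the function.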