Choosing data storage solutions for Azure Machine Learning
When running ML experiments or training scripts on your local development machine, you often don't think about managing your datasets. You probably store your training data on a local hard drive, an external storage device, or a file share. In that case, accessing the data for experimentation or training is not a problem, and you don't have to worry about data location, access permissions, maximum throughput, parallel access, storage and egress costs, data versioning, and so on.
However, as soon as you start training an ML model on remote compute targets, such as a VM in the cloud or within Azure Machine Learning, you must make sure that all your executables can access the training data efficiently. This is even more relevant if you collaborate with other people who also need to access the data in parallel for experimentation, labeling, and training from multiple environments and multiple machines. And if you...