Managing large genomics data on AWS
Beyond their sheer size, genomics datasets pose other management challenges: discoverability, accessibility, availability, and the need for a storage system that supports scalable data processing while keeping critical data safe. The responsible and secure sharing of genomic and health data is key to accelerating research and improving human health, and it is a stated objective of the Global Alliance for Genomics and Health (GA4GH). This approach requires two things: a deep technical understanding of the domain, and access to compute and storage resources. You can also find many genomics datasets hosted by AWS on the Registry of Open Data on AWS (https://registry.opendata.aws/).
Before you can begin processing a genomics dataset with cloud services, you need to make sure that it is transferred to and stored on the AWS cloud. For storing data, we recommend using Amazon Simple Storage Service...
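Because genomics files are often very large, transfers to object storage are typically split into parts. The sketch below is not taken from the text; it only illustrates how you might plan such a split before uploading. The function name plan_multipart_upload, the 64 MiB default, and the two limits (10,000 parts per upload, 5 MiB minimum part size) are assumptions made for illustration, so check them against the current AWS documentation before relying on them.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed multipart upload limits; verify against current AWS documentation.
MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * MIB


def plan_multipart_upload(object_size, preferred_part_size=64 * MIB):
    """Return (part_size, part_count) in bytes/parts for an object of object_size bytes.

    Hypothetical helper: it picks the preferred part size when that keeps the
    upload within MAX_PARTS, and otherwise grows the part size just enough.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")

    # Smallest part size (integer ceiling division) that fits within MAX_PARTS.
    required = (object_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(preferred_part_size, required, MIN_PART_SIZE)

    # Round up to a whole MiB so part boundaries are easy to reason about.
    part_size = ((part_size + MIB - 1) // MIB) * MIB

    part_count = (object_size + part_size - 1) // part_size
    return part_size, part_count


if __name__ == "__main__":
    for size in (100 * GIB, 1024 * GIB):
        part_size, part_count = plan_multipart_upload(size)
        print(f"{size / GIB:.0f} GiB -> {part_count} parts of {part_size // MIB} MiB")
```

A planner like this only decides the part size; the transfer itself would still go through an SDK or command-line tool of your choice, many of which can pick a part size automatically.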