AWS Storage Gateway is a service that provides a series of solutions to expand your storage infrastructure into the AWS cloud for purposes such as data migration, file shares, backup, and archiving. It uses standard protocols to access AWS storage services such as Amazon Simple Storage Service (S3), Amazon S3 Glacier, Amazon Elastic Block Store (EBS) snapshots, and Amazon FSx.
There are three different flavors of Storage Gateway as listed here:
- File Gateway
- Volume Gateway
- Tape Gateway
The following section dives into the details of each.
File Gateway
File Gateway now comes in two distinct types: S3 File Gateway and FSx File Gateway.
S3 File Gateway
Initially the only available type of file gateway when AWS Storage Gateway launched, S3 File Gateway allows you to store files on S3 transparently accessible from your on-premises environment through the Network File System (NFS) and Server Message Block (SMB) protocols. S3 File Gateway does a one-to-one mapping of your files to S3 objects and stores the file metadata (for example, Portable Operating System Interface (POSIX) file access control lists (ACLs)) in the S3 object metadata. The files are written synchronously to the file gateway local cache before being copied over to S3 asynchronously.
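The write path described above (a synchronous write to the local cache followed by an asynchronous copy to S3, with one object per file) can be pictured with a small toy model. This is only a sketch to illustrate the idea, not how the gateway is implemented: the class name, the in-memory dictionaries standing in for the cache and the bucket, and the explicit upload call are all assumptions made for the example.

```python
from collections import deque


class WriteBackGateway:
    """Toy model of a file gateway: a local cache plus an upload queue."""

    def __init__(self):
        self.local_cache = {}   # file path -> (data, metadata), written synchronously
        self.pending = deque()  # paths waiting to be copied to the bucket
        self.bucket = {}        # stands in for S3: object key -> (data, metadata)

    def write(self, path, data, metadata=None):
        # The client's write completes as soon as the data is in the local cache.
        self.local_cache[path] = (data, dict(metadata or {}))
        if path not in self.pending:
            self.pending.append(path)

    def upload_pending(self):
        # A real gateway does this in the background; here it is called explicitly.
        uploaded = []
        while self.pending:
            path = self.pending.popleft()
            data, metadata = self.local_cache[path]
            key = path.lstrip("/")  # one file maps to exactly one object
            self.bucket[key] = (data, metadata)
            uploaded.append(key)
        return uploaded


gateway = WriteBackGateway()
gateway.write("/reports/q1.csv", b"a,b\n", {"mode": "0644"})
print("in bucket before upload:", "reports/q1.csv" in gateway.bucket)
print("uploaded:", gateway.upload_pending())
```

The point of the model is that a file can be readable on-premises before its object exists on S3, which is worth remembering when other AWS services consume the bucket.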
Concretely, S3 File Gateway comes either as a preset hardware appliance or as a software appliance that you deploy in your on-premises environment. The software appliance consists of a virtual machine (VM) that can run on a VMware Elastic Sky X (ESX), Microsoft Hyper-V, or Linux Kernel-based Virtual Machine (KVM) hypervisor (but also on Amazon EC2 instances, should you need to).
See the following diagram for an illustration of how S3 File Gateway works:
Figure 2.7: Amazon S3 File Gateway
Once deployed and configured, your on-premises servers can use it like any other file share through the NFS and SMB protocols. Several factors affect the performance of your gateway, but the key ones are CPU, local disk size, and network capacity.
The CPU resources and network capacity available to the appliance directly influence the amount of data the gateway can process in parallel. The local disk size assigned to the file gateway determines the cache size (on the hardware appliance, this is constrained by the amount of physical storage available, so it is best to think it through before ordering the appliance). Size the cache so that it can hold your most frequently accessed files, which then benefit from low-latency access. On the software appliance, you can always add more cache capacity (additional storage volumes) later if you realize that your cache is undersized.
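As a rough illustration of the cache-sizing reasoning above, the following sketch ranks files by access frequency and sums the sizes of the hottest ones. The function name, the `hot_fraction` and `headroom` parameters, and the sample figures are assumptions chosen for the example; they are not AWS sizing guidance.

```python
import math


def estimate_cache_size_gib(files, hot_fraction=0.2, headroom=1.5):
    """Estimate the cache size (GiB) needed to hold the most frequently accessed files.

    files: list of (size_gib, accesses_per_day) tuples.
    hot_fraction: share of files treated as frequently accessed (assumption).
    headroom: multiplier leaving room for new writes not yet uploaded (assumption).
    """
    if not files:
        return 0.0
    # Most frequently accessed files first.
    ranked = sorted(files, key=lambda f: f[1], reverse=True)
    hot_count = max(1, math.ceil(len(ranked) * hot_fraction))
    hot_size = sum(size for size, _ in ranked[:hot_count])
    return hot_size * headroom


# Hypothetical inventory: (size in GiB, accesses per day)
inventory = [(10, 100), (5, 50), (200, 1), (300, 0), (1, 80)]
print(estimate_cache_size_gib(inventory, hot_fraction=0.4))
```

In practice, you would feed such an estimate with real access statistics from your file servers and revisit it once the gateway is in use.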
In terms of security, it remains your responsibility to control and manage access to the S3 bucket(s) sitting behind the gateway and to follow best practices. Therefore, remember to set up the right permissions (Identity and Access Management (IAM) role identity-based policies and/or S3 bucket policies) so that they follow a least-privilege approach.
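To make the least-privilege idea more concrete, here is a minimal sketch that builds an identity-based policy document scoped to a single bucket. The bucket name and the function name are hypothetical, and the list of actions is an illustrative subset only; check the AWS documentation for the exact set of permissions the gateway's IAM role requires.

```python
import json


def build_gateway_bucket_policy(bucket_name):
    """Return an IAM policy document limited to one bucket and its objects."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "BucketLevelAccess",
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Sid": "ObjectLevelAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# "example-gateway-bucket" is a placeholder name.
print(json.dumps(build_gateway_bucket_policy("example-gateway-bucket"), indent=2))
```

Note that the policy names one bucket explicitly and uses no wildcard actions, so the role cannot reach other buckets in the account.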
Because the files are ultimately stored as objects on S3, you also have the freedom to use the rich set of capabilities Amazon S3 provides to manage their life cycle, such as life cycle policies, versioning, cross-region replication rules, and so on.
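As an illustration of such a life cycle policy, the sketch below builds a rule set in the shape expected by the S3 lifecycle configuration API (for example, as the `LifecycleConfiguration` argument of the boto3 `put_bucket_lifecycle_configuration` call). The rule ID, the function name, and the day counts are assumptions made for the example, and the noncurrent-version rule only has an effect when versioning is enabled on the bucket.

```python
def build_lifecycle_configuration(ia_after_days=30, expire_noncurrent_after_days=90):
    """Build a lifecycle configuration for the bucket behind a file gateway."""
    if ia_after_days < 30:
        # S3 only transitions objects to STANDARD_IA once they are at least 30 days old.
        raise ValueError("ia_after_days must be at least 30")
    return {
        "Rules": [
            {
                "ID": "gateway-files-tiering",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},       # empty prefix: applies to every object
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"}
                ],
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": expire_noncurrent_after_days
                },
            }
        ]
    }


print(build_lifecycle_configuration())
```

The example deliberately stays with an infrequent-access storage class; before moving gateway-managed objects to an archive class, verify how the gateway behaves when a client reads such a file.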
Finally, back up your file gateway storage. AWS Backup integrates with AWS Storage Gateway, so you can back up your file gateway storage to AWS. AWS Backup stores the gateway backup on Amazon S3 as EBS snapshots that can later be restored either on-premises or on AWS.
FSx File Gateway
Amazon FSx File Gateway is a more recent addition to the AWS Storage Gateway family. It provides access to Amazon FSx for Windows File Server file shares on AWS from your on-premises environment. The idea is very similar to S3 File Gateway: you access the data on AWS through either a physical hardware appliance or a software appliance that you deploy on-premises on a VMware ESX, Microsoft Hyper-V, or Linux KVM hypervisor (but also on Amazon EC2 instances, should you need to).
There are a few major differences from the S3 File Gateway service as outlined below:
- Your files, managed through FSx File Gateway, will be available through the SMB protocol only (you cannot use NFS).
- You need to have previously deployed an Amazon FSx for Windows File Server filesystem in your AWS environment.
- You must have access via VPN or DX from your on-premises environment to that Amazon FSx for Windows File Server filesystem on your AWS environment.
For the above reasons, the use cases for each gateway type are also slightly different.
You would use Amazon S3 File Gateway when you want to access files you have stored on S3 from on-premises or want to make files you store on-premises available on S3 for further processing on AWS. In this case, you can then leverage all the services AWS provides to run all sorts of data analytics, including machine learning capabilities, to analyze the data on S3.
You would instead use Amazon FSx File Gateway when you want to move on-premises network file shares accessed through the SMB protocol to the cloud and keep accessing them seamlessly from your on-premises environment. Think of cases such as user home directories, team file shares, and so on.
See the following diagram for an illustration of how Amazon FSx File Gateway works:
Figure 2.8: FSx File Gateway
Amazon FSx File Gateway is integrated with AWS Backup, so you can also manage and automate backups centrally, as with Amazon S3 File Gateway. Additionally, you can activate Microsoft Windows shadow copies, Microsoft's snapshotting technology, on your Amazon FSx for Windows File Server filesystem to allow users to easily view and restore files and folders on your file shares from a snapshot.
Volume Gateway
Volume Gateway allows you to create storage volumes on S3 that offer a block storage interface accessible from your on-premises environment through the standard Internet Small Computer Systems Interface (iSCSI) protocol.
Concretely, Volume Gateway comes either as a preset hardware appliance or as a software appliance that you deploy in your on-premises environment. The software appliance consists of a VM that can run either on VMware ESX, Microsoft Hyper-V, or a Linux KVM hypervisor (but also on Amazon EC2 instances, should you need to).
You have the choice between two operation modes for your volume gateway: either you cache a portion of the data locally on the gateway (cached volumes), or you keep a full copy of the volume there (stored volumes).
With cached volumes, as illustrated in Figure 2.9, you can reduce the amount of storage you need on-premises by limiting it to store the most frequently accessed data. In this scenario, Volume Gateway stores all your data on storage volumes on Amazon S3 and retains only the most recently accessed data on your local cache storage on-premises for low-latency access. You can additionally take incremental backups, also known as snapshots, of your storage volumes in Amazon S3. These snapshots are also stored in Amazon S3 as Amazon EBS snapshots. If you need to recover your data after an incident, these snapshots can be restored to a storage volume on your gateway.
Alternatively, for cases such as application migration to the cloud or DR in the cloud, you can create a new Amazon EBS volume from one of your EBS snapshots (provided the snapshot is not larger than 16 tebibytes (TiB)) and then attach it to an Amazon EC2 instance.
See the following diagram for an illustration of how Volume Gateway works with cached volumes:
Figure 2.9: Volume Gateway (cached volumes)
With stored volumes, as illustrated in Figure 2.10, you retain your data entirely on-premises for low-latency access. In this case, Volume Gateway makes use of your local storage for storing your entire set of data and creates a backup copy of your volumes to Amazon S3 to provide durable offsite backup. The backup copy is performed asynchronously through Amazon EBS snapshots on Amazon S3:
Figure 2.10: Volume Gateway (stored volumes)
Volume Gateway can serve multiple use cases, such as the following:
- Hybrid cloud storage for file services (expandable cloud storage for on-premises file servers)
- Backup and DR (offsite durable storage with DR capability in the cloud)
- Application data migration (application ready to start in the cloud with a copy of the data)
Now, you may be wondering how to choose between cached volumes and stored volumes. They serve slightly different use cases. On the one hand, cached volumes let you keep your most frequently accessed data on-premises for low-latency access while storing everything else (that is, colder data) on Amazon S3. Thus, they keep the storage hardware you need on-premises to a minimum. They are a great solution when only a limited portion of your overall data is frequently accessed and when reducing your on-premises storage footprint and related costs is important to you, for instance, when you need to expand your overall storage capacity but don’t want to do so on-premises. You must also be able to accept occasional longer data access times, which occur when the requested data is not in the local cache.
On the other hand, stored volumes keep your entire dataset on-premises in local storage for low-latency access. They are particularly suited to cases where longer data access times cannot be tolerated and where the focus is less on reducing your on-premises storage footprint or costs than on improving the durability of your data and providing an additional option for DR in the cloud.
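The trade-off between the two modes can be summarized as a small decision helper. This is a simplified sketch of the reasoning above, not an official selection rule; the function name, its parameters, and the idea of reducing the decision to two questions are assumptions made for illustration.

```python
def choose_volume_mode(total_tib, hot_tib, local_capacity_tib, tolerates_slow_reads):
    """Suggest 'cached' or 'stored' volumes for a dataset.

    total_tib: size of the full dataset.
    hot_tib: size of the frequently accessed portion.
    local_capacity_tib: storage you are willing to provide on-premises.
    tolerates_slow_reads: True if occasional reads served from S3 are acceptable.
    """
    if not tolerates_slow_reads:
        # Stored volumes keep the whole dataset locally, so it must all fit.
        if local_capacity_tib < total_tib:
            raise ValueError("stored volumes need local capacity for the full dataset")
        return "stored"
    # Cached volumes only need room for the frequently accessed data.
    if local_capacity_tib < hot_tib:
        raise ValueError("local cache is too small for the frequently accessed data")
    return "cached"


# A 100 TiB dataset with 10 TiB of hot data and 20 TiB of local disk.
print(choose_volume_mode(100, 10, 20, tolerates_slow_reads=True))
# A 10 TiB dataset that must always be served locally.
print(choose_volume_mode(10, 10, 16, tolerates_slow_reads=False))
```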
Tape Gateway
Tape Gateway offers a virtual tape library (VTL) service backed by storage on Amazon S3 and accessible on-premises through the standard iSCSI protocol.
Concretely, Tape Gateway comes either as a preset hardware appliance or as a software appliance that you deploy in your on-premises environment. The software appliance consists of a VM that can run either on VMware ESX, Microsoft Hyper-V, or a Linux KVM hypervisor (but also on Amazon EC2 instances, should you need to).
As illustrated in the following diagram, Tape Gateway provides a VTL infrastructure that scales seamlessly, without the burden of having to operate or maintain the tape infrastructure on-premises. It integrates with the most popular backup solutions on the market, so chances are high that you can keep using your existing backup application. The major difference from your previous physical tape solution or VTL solution is that Tape Gateway stores your virtual tapes in the cloud on Amazon S3. When your backup application sends data to the tape gateway, the data is first stored locally on the gateway and then copied over to the virtual tapes on Amazon S3 asynchronously:
Figure 2.11: Tape gateway
Just as with any VTL solution, Tape Gateway exposes the concepts of a tape drive and a media changer. Both the tape drive and the media changer are available to your backup application as iSCSI devices.
Tape Gateway also lets you archive your tapes. When your backup application instructs Tape Gateway to archive a tape, the tape is moved to a lower-cost storage tier using Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.
Additional Considerations
To wrap up what was just covered, AWS Storage Gateway offers three different types of gateways to enable a hybrid storage architecture across your on-premises infrastructure and your AWS environment. You leverage each of these three types depending on the use case at hand: File Gateway when setting up a hybrid file server infrastructure, Volume Gateway when expanding your block storage infrastructure to the cloud, and Tape Gateway for replacing your physical tape infrastructure with virtual tapes on AWS.
The following section will take you through a few additional considerations to better plan the actual implementation of such a hybrid storage infrastructure.
Resiliency
The gateway, whether a hardware or a software appliance, is by default a single point of failure (SPOF). So, what are your options for dealing with a failure, for instance, when a component crashes or stops responding, whether the cause lies with the appliance, the hypervisor, the network, or something else?
In the case of a software appliance that you deploy on VMware ESXi, you have an option to enable high availability (HA) using VMware HA. AWS Storage Gateway provides a series of application health checks that VMware HA can interpret to automatically recover your storage gateway when the health-check thresholds you specify are breached. That will cater to most failure cases.
This option is most useful when organizations cannot tolerate a long interruption of service or any data loss.
Quotas
As with any other AWS service, AWS Storage Gateway is bound by certain quotas. These quotas can be soft or hard limits constraining the service. Different quotas apply depending on the flavor of storage gateway that you implement. Here is an indication of the main quotas for each type, but remember to check the AWS documentation for the latest figures:
- File Gateway quotas concern the maximum number of file shares per gateway (10), the maximum size of an individual file in the share (5 TB), and the maximum path length (1,024 bytes). Note that one file share maps exactly to one Amazon S3 bucket. Adding more file shares will add more S3 buckets to your AWS environment, so you also need to make sure you will not exceed your Amazon S3 quotas.
- Volume Gateway quotas concern the maximum size of a volume (32 TiB for cached volumes; 16 TiB for stored volumes), the maximum number of volumes per gateway (32), and the maximum size of all volumes per gateway (1,024 TiB for cached volumes; 512 TiB for stored volumes).
- Tape Gateway quotas concern the minimum and maximum sizes of a virtual tape (100 gibibytes (GiB) to 5 TiB), the maximum number of virtual tapes per virtual tape library (1,500), and the total size of all tapes in a library (1 pebibyte (PiB)).
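When planning a deployment, it can help to check a design against these quotas before creating anything. The following sketch does so for Volume Gateway, using the figures listed above; the dictionary layout and the function name are assumptions made for the example, and the figures should be refreshed from the AWS documentation before relying on them.

```python
# Volume Gateway quotas as listed in this section (all sizes in TiB).
VOLUME_QUOTAS = {
    "cached": {"max_volume_tib": 32, "max_volumes": 32, "max_total_tib": 1024},
    "stored": {"max_volume_tib": 16, "max_volumes": 32, "max_total_tib": 512},
}


def check_volume_plan(mode, volume_sizes_tib):
    """Return a list of quota violations for a planned set of volumes on one gateway."""
    quota = VOLUME_QUOTAS[mode]
    problems = []
    if len(volume_sizes_tib) > quota["max_volumes"]:
        problems.append(
            f"{len(volume_sizes_tib)} volumes exceed the limit of {quota['max_volumes']}"
        )
    for index, size in enumerate(volume_sizes_tib):
        if size > quota["max_volume_tib"]:
            problems.append(
                f"volume {index} is {size} TiB, above the {quota['max_volume_tib']} TiB limit"
            )
    total = sum(volume_sizes_tib)
    if total > quota["max_total_tib"]:
        problems.append(
            f"total of {total} TiB exceeds the {quota['max_total_tib']} TiB limit"
        )
    return problems


# A stored-volume plan with one oversized volume.
for problem in check_volume_plan("stored", [8, 16, 20]):
    print(problem)
```

An empty list means the plan fits on a single gateway; otherwise, the plan needs smaller volumes or additional gateways.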
This concludes the first half of this chapter, which focused on the creation of a hybrid infrastructure across on-premises infrastructure and AWS. In the second half of this chapter, you will investigate how to enhance communication first between your private environment on AWS and AWS services or third-party services offered on AWS, and secondly, within the realm of your AWS environment.
The following sections will describe how you can improve communication between your private environment on AWS and AWS services or third-party services offered on AWS.