
How to Optimize PUT Requests

  • 8 min read
  • 04 Sep 2015


In this article, Naoya Hashimoto, author of the book Amazon S3 Cookbook, explains how to optimize PUT requests. Multipart upload is effective here because it aggregates throughput by parallelizing PUT requests and uploading a large object in parts. It is recommended that each part be between 25 and 50 MB on higher-bandwidth networks and around 10 MB on mobile networks.


Amazon S3 is a highly scalable, reliable, and low-latency data storage service offered at a very low cost, designed for mission-critical and primary data storage. It provides the Amazon S3 APIs to simplify your programming tasks.

S3 performance optimization involves several factors: for example, choosing a region to reduce latency, considering the object key naming scheme, and optimizing the PUT and GET operations.

Multipart upload is a three-step process: first you initiate the upload, then you upload the object parts, and finally, after all the parts are uploaded, you complete the multipart upload (a minimal sketch of these three steps follows the list below). The following methods are currently supported to upload objects with multipart upload:

  • AWS SDK for Android
  • AWS SDK for iOS
  • AWS SDK for Java
  • AWS SDK for JavaScript
  • AWS SDK for PHP
  • AWS SDK for Python
  • AWS SDK for Ruby
  • AWS SDK for .NET
  • REST API
  • AWS CLI
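
The following is a minimal sketch of those three steps using the low-level AWS SDK for JavaScript calls createMultipartUpload, uploadPart, and completeMultipartUpload. The bucket name, key, and single-part layout are placeholders for illustration only; the s3 module used later in this article performs these steps for you.

var fs = require('fs');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

var bucket = 'your-bucket-name'; // placeholder
var key = '300mb.dmp';           // placeholder

// Step 1: initiate the multipart upload and obtain an UploadId.
s3.createMultipartUpload({ Bucket: bucket, Key: key }, function(err, upload) {
  if (err) { return console.error(err); }

  // Step 2: upload the parts (each at least 5 MB except the last);
  // only one part is shown here to keep the sketch short.
  s3.uploadPart({
    Bucket: bucket,
    Key: key,
    UploadId: upload.UploadId,
    PartNumber: 1,
    Body: fs.readFileSync(key)
  }, function(err, part) {
    if (err) { return console.error(err); }

    // Step 3: complete the upload by passing the ETag of every part.
    s3.completeMultipartUpload({
      Bucket: bucket,
      Key: key,
      UploadId: upload.UploadId,
      MultipartUpload: { Parts: [{ ETag: part.ETag, PartNumber: 1 }] }
    }, function(err, done) {
      if (err) { return console.error(err); }
      console.log("## Completed multipart upload:", done.Location);
    });
  });
});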

In order to try multipart upload and see how much it aggregates throughput, we use the AWS SDK for Node.js and the s3 module, installed via npm (the package manager for Node.js).

AWS CLI also supports multipart upload. When you use the AWS CLI s3 or s3api subcommand to upload an object, the object is automatically uploaded via multipart requests.
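
For example, assuming a recent AWS CLI, the multipart behavior of the s3 subcommand can be tuned through the s3 settings in ~/.aws/config (the values below are illustrative, not recommendations):

[default]
s3 =
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
  max_concurrent_requests = 20

$ aws s3 cp 300mb.dmp s3://your-bucket-name/

With these settings, any file larger than multipart_threshold is split into parts of multipart_chunksize and uploaded with up to max_concurrent_requests parallel requests.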

Getting ready

You need to complete the following set up in advance:

  • Sign up on AWS and be able to access S3 with your IAM credentials
  • Install and set up AWS CLI in your PC or use Amazon Linux AMI
  • Install Node.js

It is recommended that you run the benchmark from your local PC, or, if you use an EC2 instance, that you launch the instance and create the S3 bucket in different regions. For example, if you launch an instance in the Asia Pacific (Tokyo) region, create the S3 bucket in the US Standard region. The reason is that the latency between EC2 and S3 within the same region is very low, so it is hard to see the difference.

How to do it…

We upload a 300 MB file to an S3 bucket over HTTP in two ways, once with multipart upload and once without, and compare the time taken. To clearly see how the performance differs, I launched an instance and created an S3 bucket in different regions as follows:

  • EC2 instance: Asia Pacific Tokyo Region (ap-northeast-1)
  • S3 bucket: US Standard region (us-east-1)

First, we install the s3 Node.js module via npm, create a dummy file, upload the object into a bucket using a sample Node.js script with multipart upload enabled (the default), and then do the same with multipart upload effectively disabled, so that we can see how multipart upload performs. Now, let's move on to the instructions:

  1. Install s3 via the npm command:
    $ cd aws-nodejs-sample/
    $ npm install s3
  2. Create a 300 MB dummy file:
    $ file=300mb.dmp
    $ dd if=/dev/zero of=${file} bs=10M count=30
  3. Save the following script as s3_upload.js:
    // Load the SDK
    var AWS = require('aws-sdk');
    var s3 = require('s3');
    var conf = require('./conf');

    // Load parameters
    var client = s3.createClient({
      maxAsyncS3: conf.maxAsyncS3,
      s3RetryCount: conf.s3RetryCount,
      s3RetryDelay: conf.s3RetryDelay,
      multipartUploadThreshold: conf.multipartUploadThreshold,
      multipartUploadSize: conf.multipartUploadSize,
    });

    var params = {
      localFile: conf.localFile,
      s3Params: {
        Bucket: conf.Bucket,
        Key: conf.localFile,
      },
    };

    // Upload objects
    console.log("## s3 Parameters");
    console.log(conf);

    console.log("## Begin uploading.");
    var uploader = client.uploadFile(params);
    uploader.on('error', function(err) {
      console.error("Unable to upload:", err.stack);
    });
    uploader.on('progress', function() {
      console.log("Progress", uploader.progressMd5Amount, uploader.progressAmount, uploader.progressTotal);
    });
    uploader.on('end', function() {
      console.log("## Finished uploading.");
    });
  4. Create a configuration file and save it as conf.js in the same directory as s3_upload.js:
    exports.maxAsyncS3 = 20;       // default value
    exports.s3RetryCount = 3;       // default value
    exports.s3RetryDelay = 1000;   // default value
    exports.multipartUploadThreshold = 20971520;   // default value
    exports.multipartUploadSize = 15728640;         // default value
    exports.Bucket = "your-bucket-name";
    exports.localFile = "300mb.dmp";
    exports.Key = "300mb.dmp";
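
With these defaults, the 300 MB file (314,572,800 bytes) exceeds the 20 MB multipartUploadThreshold, so it is uploaded via multipart requests: at a multipartUploadSize of 15,728,640 bytes (15 MB), it is split into 314,572,800 / 15,728,640 = 20 parts, which the client uploads with up to maxAsyncS3 = 20 concurrent requests.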

How it works…

First of all, let's try uploading a 300 MB object using multipart upload, and then upload the same file without using multipart upload.

You can upload an object and see how long it takes by typing the following command:

$ time node s3_upload.js
## s3 Parameters
{ maxAsyncS3: 20,
s3RetryCount: 3,
s3RetryDelay: 1000,
multipartUploadThreshold: 20971520,
multipartUploadSize: 15728640,
localFile: './300mb.dmp',
Bucket: 'bucket-sample-us-east-1',
Key: './300mb.dmp' }
## Begin uploading.
Progress 0 16384 314572800
Progress 0 32768 314572800
…
Progress 0 314572800 314572800
Progress 0 314572800 314572800
## Finished uploading.

real 0m16.111s
user 0m4.164s
sys 0m0.884s

As it took about 16 seconds to upload the object, the transfer rate was 18.75 MB/sec.
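
(The dummy file is 314,572,800 bytes, that is, 300 MB, so 300 MB / 16 seconds = 18.75 MB/sec.)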

Then, let's change the following parameters in the configuration (conf.js) as follows and see the result. With these settings, the 300 MB object is uploaded through only one concurrent request, and the multipart upload threshold (2,097,152,000 bytes, roughly 2 GB) is larger than the file, so the object is uploaded without multipart upload:

exports.maxAsyncS3 = 1;
exports.s3RetryCount = 3;       // default value
exports.s3RetryDelay = 1000;   // default value
exports.multipartUploadThreshold = 2097152000;
exports.multipartUploadSize = 15728640;         // default value
exports.Bucket = "your-bucket-name";
exports.localFile = "300mb.dmp";
exports.Key = "300mb.dmp";

Let's see the result after changing the parameters in the configuration (conf.js):

$ time node s3_upload.js
## s3 Parameters
…
## Begin uploading.
Progress 0 16384 314572800
…
Progress 0 314572800 314572800
## Finished uploading.

real   0m41.887s
user   0m4.196s
sys     0m0.728s

As it took about 42 seconds to upload the object, the transfer rate was 7.14 MB/sec.

Now, let's quickly check each parameter, and then get to the conclusion:

  • maxAsyncS3 defines the maximum number of simultaneous requests that the S3 client opens to Amazon S3. The default value is 20.
  • s3RetryCount defines the number of retries when a request fails. The default value is 3.
  • s3RetryDelay is how many milliseconds the S3 client waits before retrying when a request fails. The default value is 1000.
  • multipartUploadThreshold defines the size above which objects are uploaded via multipart requests. An object is uploaded via a multipart request if it is larger than the size you specify. The default value is 20 MB, the minimum is 5 MB, and the maximum is 5 GB.
  • multipartUploadSize defines the size of each part when uploading via a multipart request. The default value is 15 MB, the minimum is 5 MB, and the maximum is 5 GB.
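
For readability, here is a quick conversion of the byte values used in conf.js into megabytes (plain Node.js, nothing specific to the s3 module):

// Express the conf.js byte values in MB (1 MB = 1,048,576 bytes)
var MB = 1024 * 1024;
console.log(20971520 / MB);    // 20   -> default multipartUploadThreshold (20 MB)
console.log(15728640 / MB);    // 15   -> default multipartUploadSize (15 MB)
console.log(2097152000 / MB);  // 2000 -> the threshold used earlier, large enough that the 300 MB file is uploaded without multipart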

The following table shows the speed test results with different parameters:

Parameter                | Test 1     | Test 2   | Test 3   | Test 4   | Test 5
maxAsyncS3               | 1          | 20       | 20       | 40       | 30
s3RetryCount             | 3          | 3        | 3        | 3        | 3
s3RetryDelay             | 1000       | 1000     | 1000     | 1000     | 1000
multipartUploadThreshold | 2097152000 | 20971520 | 20971520 | 20971520 | 20971520
multipartUploadSize      | 15728640   | 15728640 | 31457280 | 15728640 | 10728640
Time (seconds)           | 41.88      | 16.11    | 17.41    | 16.37    | 9.68
Transfer rate (MB/sec)   | 7.51       | 19.53    | 18.07    | 19.22    | 32.50

In conclusion, multipart upload is effective for optimizing the PUT operation because it aggregates throughput. However, you need to consider the following:

  • Benchmark your scenario and evaluate the retry count, retry delay, number of parts, and multipart upload size based on the network your application runs on.

There's more…

Multipart upload specification

There are limits to using multipart upload. The following table shows the specifications of multipart upload:

Item                                                                              | Specification
Maximum object size                                                               | 5 TB
Maximum number of parts per upload                                                | 10,000
Part numbers                                                                      | 1 to 10,000 (inclusive)
Part size                                                                         | 5 MB to 5 GB; the last part can be smaller than 5 MB
Maximum number of parts returned for a list parts request                         | 1,000
Maximum number of multipart uploads returned in a list multipart uploads request  | 1,000

Multipart upload and charging

If you initiate a multipart upload and then abort it, Amazon S3 deletes the upload artifacts and any parts you have uploaded, and you are not charged for them. However, you are charged for all storage, bandwidth, and requests for the multipart upload and the associated parts of an object once the upload is completed. The point is that you are charged when a multipart upload is completed (not aborted).
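
If an upload is neither completed nor aborted, its parts remain in the bucket until you abort it. Assuming the AWS CLI s3api subcommands, you can find and abort such uploads as follows (the bucket, key, and upload ID are placeholders):

$ aws s3api list-multipart-uploads --bucket your-bucket-name
$ aws s3api abort-multipart-upload --bucket your-bucket-name --key 300mb.dmp --upload-id <upload-id>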

See also

  • Multipart Upload Overview https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
  • AWS SDK for Node.js http://docs.aws.amazon.com/AWSJavaScriptSDK/guide/node-intro.htm
  • Node.js S3 package npm https://www.npmjs.com/package/s3
  • Amazon Simple Storage Service: Introduction to Amazon S3 http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
  • (PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014 http://www.slideshare.net/AmazonWebServices/pfc403-maximizing-amazon-s3-performance-aws-reinvent-2014
  • AWS re:Invent 2014 | (PFC403) Maximizing Amazon S3 Performance https://www.youtube.com/watch?v=_FHRzq7eHQc

Summary

In this article, we learned how to optimize PUT requests by uploading a large object in parts with multipart upload.
