




















































In this article by Naoya Hashimoto, author of the book Amazon S3 Cookbook, we learn how to optimize PUT requests. Using multipart upload is effective because it aggregates throughput by parallelizing PUT requests and uploading a large object in parts. It is recommended that each part be between 25 and 50 MB on high-bandwidth networks and around 10 MB on mobile networks.
Amazon S3 is a highly-scalable, reliable, and low-latency data storage service at a very low cost, designed for mission-critical and primary data storage. It provides the Amazon S3 APIs to simplify your programming tasks.
S3 performance optimization involves several factors: for example, choosing a region to reduce latency, considering the object key naming scheme, and optimizing the PUT and GET operations.
Multipart upload is a three-step process: you initiate the upload, then upload the object parts, and finally, after all the parts are uploaded, complete the multipart upload. Multipart upload is supported by the AWS SDKs, the low-level REST API, and the AWS CLI.
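To make these three steps concrete, here is a minimal sketch using the low-level multipart API of the AWS SDK for JavaScript; the bucket name, key, and 15 MB part size are placeholder values, and the s3 module used later in this article wraps these calls for you:
// multipart_sketch.js: a minimal sketch of the three multipart upload steps
// with the low-level API of the AWS SDK for JavaScript (aws-sdk).
// The bucket name, key, and part size below are placeholder values.
var fs = require('fs');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

var bucket = 'your-bucket-name';
var key = '300mb.dmp';
var partSize = 15728640; // 15 MB per part (the minimum part size is 5 MB)

// Step 1: initiate the multipart upload and receive an upload ID.
s3.createMultipartUpload({ Bucket: bucket, Key: key }, function(err, mpu) {
  if (err) throw err;
  // For simplicity the whole file is read into memory; a real uploader
  // would stream each part instead.
  var buffer = fs.readFileSync(key);
  var partCount = Math.ceil(buffer.length / partSize);
  var parts = [];
  var finished = 0;

  // Step 2: upload the parts; they can be sent in parallel.
  for (var i = 0; i < partCount; i++) {
    (function(partNumber) {
      s3.uploadPart({
        Bucket: bucket,
        Key: key,
        UploadId: mpu.UploadId,
        PartNumber: partNumber,
        Body: buffer.slice((partNumber - 1) * partSize, partNumber * partSize)
      }, function(err, data) {
        if (err) throw err;
        parts[partNumber - 1] = { ETag: data.ETag, PartNumber: partNumber };
        if (++finished < partCount) return;
        // Step 3: complete the upload once every part has been uploaded.
        s3.completeMultipartUpload({
          Bucket: bucket,
          Key: key,
          UploadId: mpu.UploadId,
          MultipartUpload: { Parts: parts }
        }, function(err) {
          if (err) throw err;
          console.log('Multipart upload completed.');
        });
      });
    })(i + 1);
  }
});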
In order to try multipart upload and see how much it aggregates throughput, we use the AWS SDK for Node.js and the s3 module, installed via npm (the package manager for Node.js).
The AWS CLI also supports multipart upload. When you use the AWS CLI s3 subcommand to upload a large object, it is automatically uploaded via multipart requests, while the s3api subcommand exposes the individual low-level multipart operations.
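For example, a plain copy with the s3 subcommand is enough to get multipart behavior for a large file, and the threshold and part size can be tuned through the CLI configuration (the bucket name below is a placeholder):
$ aws s3 cp 300mb.dmp s3://your-bucket-name/300mb.dmp
$ aws configure set default.s3.multipart_threshold 20MB
$ aws configure set default.s3.multipart_chunksize 15MB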
You need to set up the AWS SDK for Node.js and your AWS credentials in advance. It is recommended that you run the benchmark from your local PC or, if you use an EC2 instance, that you launch the instance and create the S3 bucket in different regions. For example, if you launch an instance in the Asia Pacific Tokyo region, you should create the S3 bucket in the US Standard region. The reason is that the latency between EC2 and S3 within the same region is very low, and it is hard to see the difference.
We upload a 300 MB file to an S3 bucket over HTTP in two ways, once with multipart upload and once without, and compare the time taken. To see the difference clearly, I launched an EC2 instance in the Asia Pacific Tokyo region and created an S3 bucket in the US Standard region.
First, we install the s3 Node.js module via npm, create a dummy file, and upload the object into a bucket using a sample Node.js script with multipart upload enabled, and then do the same without multipart upload, so that we can see how multipart upload improves the operation. Now, let's move on to the instructions:
$ cd aws-nodejs-sample/
$ npm install s3
$ file=300mb.dmp
$ dd if=/dev/zero of=${file} bs=10M count=30
The following script (s3_upload.js) uploads the object using the s3 module:
// Load the AWS SDK and the s3 module
var AWS = require('aws-sdk');
var s3 = require('s3');
var conf = require('./conf');
// Load parameters
var client = s3.createClient({
maxAsyncS3: conf.maxAsyncS3,
s3RetryCount: conf.s3RetryCount,
s3RetryDelay: conf.s3RetryDelay,
multipartUploadThreshold: conf.multipartUploadThreshold,
multipartUploadSize: conf.multipartUploadSize,
});
var params = {
localFile: conf.localFile,
s3Params: {
Bucket: conf.Bucket,
Key: conf.localFile,
},
};
// upload objects
console.log("## s3 Parameters");
console.log(conf);
console.log("## Begin uploading.");
var uploader = client.uploadFile(params);
uploader.on('error', function(err) {
console.error("Unable to upload:", err.stack);
});
uploader.on('progress', function() {
console.log("Progress", uploader.progressMd5Amount,uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
console.log("## Finished uploading.");
});
The parameters are defined in the configuration file (conf.js) as follows:
exports.maxAsyncS3 = 20; // default value
exports.s3RetryCount = 3; // default value
exports.s3RetryDelay = 1000; // default value
exports.multipartUploadThreshold = 20971520; // default value
exports.multipartUploadSize = 15728640; // default value
exports.Bucket = "your-bucket-name";
exports.localFile = "300mb.dmp";
exports.Key = "300mb.dmp";
First of all, let's try uploading a 300 MB object using multipart upload, and then upload the same file without using multipart upload.
You can upload an object and see how long it takes by typing the following command:
$ time node s3_upload.js
## s3 Parameters
{ maxAsyncS3: 20,
s3RetryCount: 3,
s3RetryDelay: 1000,
multipartUploadThreshold: 20971520,
multipartUploadSize: 15728640,
localFile: './300mb.dmp',
Bucket: 'bucket-sample-us-east-1',
Key: './300mb.dmp' }
## Begin uploading.
Progress 0 16384 314572800
Progress 0 32768 314572800
…
Progress 0 314572800 314572800
Progress 0 314572800 314572800
## Finished uploading.
real 0m16.111s
user 0m4.164s
sys 0m0.884s
As it took about 16 seconds to upload the object, the transfer rate was 18.75 MB/sec.
Then, let's change the following parameters in the configuration (conf.js) and see the result. With maxAsyncS3 set to 1 and multipartUploadThreshold raised above the object size, the 300 MB object is uploaded through only one S3 client and without multipart upload:
exports.maxAsyncS3 = 1;
exports.multipartUploadThreshold = 2097152000;
The whole configuration file (conf.js) now looks as follows:
exports.maxAsyncS3 = 1;
exports.s3RetryCount = 3; // default value
exports.s3RetryDelay = 1000; // default value
exports.multipartUploadThreshold = 2097152000;
exports.multipartUploadSize = 15728640; // default value
exports.Bucket = "your-bucket-name";
exports.localFile = "300mb.dmp";
exports.Key = "300mb.dmp";
Let's see the result after changing the parameters in the configuration (conf.js):
$ time node s3_upload.js
## s3 Parameters
…
## Begin uploading.
Progress 0 16384 314572800
…
Progress 0 314572800 314572800
## Finished uploading.
real 0m41.887s
user 0m4.196s
sys 0m0.728s
As it took about 42 seconds to upload the object, the transfer rate was 7.14 MB/sec.
Now, let's quickly check each parameter, and then get to the conclusion. These are the options of the s3 module that we set in the configuration file:
- maxAsyncS3: the maximum number of simultaneous requests that the S3 client makes (the default is 20)
- s3RetryCount: the number of times a failed S3 operation is retried (the default is 3)
- s3RetryDelay: the number of milliseconds to wait between retries (the default is 1,000)
- multipartUploadThreshold: the file size, in bytes, at which multipart upload is used instead of a single PUT (the default is 20,971,520, that is, 20 MB)
- multipartUploadSize: the size, in bytes, of each part in a multipart upload (the default is 15,728,640, that is, 15 MB)
The following table shows the speed test scores with different parameters:
| Parameter                | Run 1      | Run 2    | Run 3    | Run 4    | Run 5    |
|--------------------------|------------|----------|----------|----------|----------|
| maxAsyncS3               | 1          | 20       | 20       | 40       | 30       |
| s3RetryCount             | 3          | 3        | 3        | 3        | 3        |
| s3RetryDelay             | 1000       | 1000     | 1000     | 1000     | 1000     |
| multipartUploadThreshold | 2097152000 | 20971520 | 20971520 | 20971520 | 20971520 |
| multipartUploadSize      | 15728640   | 15728640 | 31457280 | 15728640 | 10728640 |
| Time (seconds)           | 41.88      | 16.11    | 17.41    | 16.37    | 9.68     |
| Transfer rate (MB/sec)   | 7.51       | 19.53    | 18.07    | 19.22    | 32.50    |
In conclusion, multipart upload is effective for optimizing the PUT operation and aggregating throughput. However, you need to consider that there are limits to using multipart upload. The following table shows the specifications of multipart upload:
| Item                                                                             | Specification                                          |
|----------------------------------------------------------------------------------|--------------------------------------------------------|
| Maximum object size                                                              | 5 TB                                                   |
| Maximum number of parts per upload                                               | 10,000                                                 |
| Part numbers                                                                     | 1 to 10,000 (inclusive)                                |
| Part size                                                                        | 5 MB to 5 GB; the last part can be smaller than 5 MB   |
| Maximum number of parts returned for a list parts request                        | 1,000                                                  |
| Maximum number of multipart uploads returned in a list multipart uploads request | 1,000                                                  |
If you initiate a multipart upload and then abort it, Amazon S3 deletes the upload artifacts and any parts you have uploaded, and you are no longer billed for them. However, you are billed for all storage, bandwidth, and requests for the multipart upload and its associated parts until the upload is completed or aborted. The point is that an incomplete multipart upload keeps incurring charges until it is either completed or aborted.
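Because of this, it can help to find and abort stale incomplete uploads. The following is a minimal sketch using the AWS SDK for JavaScript; the bucket name is a placeholder:
// abort_incomplete_uploads.js: list incomplete multipart uploads in a bucket
// and abort them so that their parts no longer accrue storage charges.
// The bucket name below is a placeholder.
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
var bucket = 'your-bucket-name';

s3.listMultipartUploads({ Bucket: bucket }, function(err, data) {
  if (err) throw err;
  (data.Uploads || []).forEach(function(upload) {
    // Abort each incomplete upload by its key and upload ID.
    s3.abortMultipartUpload({
      Bucket: bucket,
      Key: upload.Key,
      UploadId: upload.UploadId
    }, function(err) {
      if (err) throw err;
      console.log('Aborted incomplete upload for', upload.Key);
    });
  });
});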
In this article, we learned how to optimize PUT requests by uploading a large object in parts with multipart upload.