AWS S3 Performance Tip: Using DataMan™ to Increase Concurrency Leads to 500 MBps Upload Speeds

Maximizing Upload Speeds to S3 and Glacier

In a previous blog post we described the huge benefit of using multipart uploads when transferring data to cloud storage. Today we’re revisiting that benchmarking with Big Data in mind, which means transferring large numbers of files of varying sizes, efficiently and easily. So what did we find?

Concurrency & Part-size Matter

We saw significant speed increases by adjusting file-part sizes and concurrency:


Figure 1. Data Upload speed in Megabyte/s for different part sizes and part concurrency.

We were able to heavily utilize the available bandwidth by optimally adjusting not only the file-part sizes of the multipart uploads, but also the number of concurrent transfers. By transmitting many files at the same time and in the correct number of parts, we maximized the transfer rate to S3, achieving roughly 500 MBps.

This is why we have built both of those capabilities into our data transfer and workflow tool, DataMan™, the tool we used to run these benchmarks.

[For more information about DataMan™’s Multi-threading Performance, Transfer Monitoring, and Reporting & Visualization, Click Here.]

Benchmarking Methodology

First, we measured the results of using different part sizes and concurrency when uploading a single 6.99 GB file to S3. The results are shown in Figure 1 above. Concurrency, in this case, refers to the number of fixed-size parts uploaded to S3 in parallel; these parts can come from one or more multipart uploads. Since we used a single file in this benchmark, all of the parts came from the same upload.

We used the following part sizes, all powers of two:

  • 16 MB (16777216 bytes),
  • 32 MB (33554432 bytes), and
  • 64 MB (67108864 bytes)

The concurrency was doubled as well, starting with 4 and ending at 128. To more closely emulate the real world, we chose an odd file size. That ensured that the resulting number of parts wouldn’t correspond exactly to the number of available uploaders.

The tests were executed on an EC2 instance in the us-east-1 region and uploaded data to a bucket in the same region.
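If you want to run a similar sweep yourself without DataMan, here is a minimal sketch using boto3’s TransferConfig, which exposes the same two knobs: part size and part concurrency. The bucket name, file name, and object keys are hypothetical, and this is a generic illustration rather than DataMan’s implementation.

```python
import os
import time

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

BUCKET = "my-benchmark-bucket"        # hypothetical bucket name
FILE_PATH = "testfile-6.99GB.bin"     # hypothetical local test file

def timed_upload(part_size_mb, concurrency):
    """Upload FILE_PATH once and return the observed rate in MB/s."""
    part_bytes = part_size_mb * 1024 * 1024
    config = TransferConfig(
        multipart_threshold=part_bytes,   # force a multipart upload for this file
        multipart_chunksize=part_bytes,   # part size under test
        max_concurrency=concurrency,      # parts uploaded in parallel
        use_threads=True,
    )
    start = time.time()
    key = f"bench/{part_size_mb}MB-{concurrency}x"
    s3.upload_file(FILE_PATH, BUCKET, key, Config=config)
    elapsed = time.time() - start
    return os.path.getsize(FILE_PATH) / (1024 * 1024) / elapsed

# Sweep the same grid as Figure 1: 16/32/64 MB parts, concurrency 4 through 128.
for part_size in (16, 32, 64):
    for conc in (4, 8, 16, 32, 64, 128):
        print(part_size, "MB parts,", conc, "uploaders:",
              round(timed_upload(part_size, conc), 1), "MB/s")
```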

As shown in the graph, the upload rate increased from about 100 MBps to 500 MBps as the level of concurrency was increased. As we added uploaders, the number of connections to S3 grew, improving the transfer rate. For every part size the upload speed maxed out at 64 concurrent uploaders; pushing on to 128 brought no further gain, because at that point network congestion causes packet loss, forcing retransmissions that slow the transfer rate back down.

And it’s not just part concurrency that matters; part size also affects transfer rates. As illustrated in Figure 1, the transfer rate improved as we increased the part size from 16 MB to 64 MB. When concurrency is low, part size has little impact on bandwidth, since the amount of data that can be pushed over a single HTTP connection is the limiting factor. At higher concurrency, however, small part sizes reduce the transfer rate because of the overhead of creating a connection and negotiating a handshake for each part. Choosing a very large part size can also lower transfer rates: when the transmission of an oversized part fails, more data has to be re-sent, dragging down the average transfer rate.

Hence the part size must be chosen carefully based on the network characteristics. Smaller part sizes are ideal on congested networks, where frequent packet loss can cause an entire part upload to fail, while congestion-free networks benefit from the reduced overhead of larger part sizes. Our network was congestion-free, which was reflected in our results.
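As a rough illustration of that guideline (and not DataMan’s actual logic), one could pick a part size from an observed part-failure rate. The thresholds below are assumptions chosen only to make the example concrete:

```python
# Hypothetical heuristic, not DataMan's implementation: map an observed
# part-upload failure rate onto the 16/32/64 MB part sizes benchmarked above.
def choose_part_size_mb(part_failure_rate: float) -> int:
    """Return a multipart part size in MB for a given fraction of failed part uploads."""
    if part_failure_rate > 0.05:    # lossy, congested link: keep re-transfers small
        return 16
    if part_failure_rate > 0.01:    # mildly lossy link
        return 32
    return 64                       # clean link: fewer connections, less handshake overhead
```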

Effects of Part Concurrency During Multiple File Transfers:

Here’s the graph of upload speed vs. part concurrency when uploading one thousand 4 MB files at the same time:

Figure 2. Data Upload speed in Megabyte/s when uploading 1000 4 MB files vs concurrency.
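For files this small, each object is effectively a single part, so the concurrency in Figure 2 comes from uploading many files at once. A minimal sketch of that pattern with a thread pool and boto3 might look like the following; the bucket and file names are hypothetical, and this is not DataMan’s code.

```python
from concurrent.futures import ThreadPoolExecutor

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")                  # boto3 clients are safe to share across threads
BUCKET = "my-benchmark-bucket"           # hypothetical bucket
files = [f"data/file-{i:04d}.bin" for i in range(1000)]   # hypothetical 4 MB files

# Each 4 MB file fits in a single part, so per-file multipart threading is disabled;
# all of the concurrency comes from the pool of workers.
single_part = TransferConfig(use_threads=False)

def upload_one(path):
    s3.upload_file(path, BUCKET, path, Config=single_part)

# The worker count plays the role of the concurrency axis in Figure 2.
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(upload_one, files))
```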

Effects of Part Concurrency for Very Large File Transfers:

Figure 3. Data Upload speed in Megabyte/s when uploading a single 90 GB file vs concurrency.

For this test, we used DataMan to transfer a 90 GB file using a part size of 64 MB, and achieved transfer rates of 300 MBps*. In the graph in Figure 3, concurrency varies from 4 to 128 parts. The speed increased consistently up to 64 parts and then stayed constant. This is because, at higher concurrency, some data is retransmitted due to packet loss, and the likelihood of repeated retransmission is higher for large files. Kept to an appropriate level, however, increased concurrency offered a significant benefit to upload speeds in our benchmarks.

* This rate is lower than what we achieved with the 6.99 GB file because, in every test, network errors caused retransmission of data, resulting in lower average speeds.
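For reference, the very-large-file configuration can also be approximated with boto3: 64 MB parts, a fixed concurrency, and retries enabled so that a failed part is re-sent rather than failing the whole 90 GB transfer. The retry settings, bucket, and file names here are assumptions for the sketch, not DataMan’s configuration.

```python
# Sketch of the very-large-file run: 64 MB parts with retries enabled so that
# a lost part is retransmitted instead of aborting the whole upload.
import boto3
from botocore.config import Config
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10, "mode": "standard"}))

transfer_cfg = TransferConfig(
    multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts, as in the benchmark
    max_concurrency=64,                    # roughly where Figure 3 plateaus
)
s3.upload_file("bigfile-90GB.bin", "my-benchmark-bucket", "bench/bigfile-90GB",
               Config=transfer_cfg)
```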

[Try it out for yourself with this DataMan™ trial, click here.]

Optimizing Transfers in DataMan

Based on all of the data we’ve gathered over years of transferring files into and out of the Cloud – and summarized in the benchmarks we’ve shown above – we’ve found the following guidelines maximize bandwidth usage:

  • Optimizing the sizes of the file parts, whether they are part of a large file or an entire small file
  • Optimizing the number of parts transferred concurrently

Tuning these two parameters achieves the best possible transfer speeds to the Cloud. And this is exactly how Cycle Computing’s DataMan™ data workflow software provides a massive performance improvement when uploading data to Amazon S3 and Glacier.

DataMan optimizes for both of these by intelligently deciding when to use a multipart transfer for a large object and when to transfer the entire file as a single part. And while this is easy to say, it’s not easy to do. Tune the values incorrectly and transfers will fail more often, delaying uploads and dragging down the overall transfer rate.
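With boto3, the equivalent decision point is the multipart threshold: files below it are sent as a single PUT, and files above it are split into parts. The 64 MB cutoff below is an assumption for illustration only, not DataMan’s internal heuristic.

```python
# Illustrative threshold-based decision (not DataMan's actual tuning):
# objects at or below 64 MB go up in one request, larger ones as 64 MB parts.
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # single-part upload below this size
    multipart_chunksize=64 * 1024 * 1024,  # part size for larger objects
    max_concurrency=32,                    # parts uploaded in parallel
)
```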

Benchmark Apparatus:

The benchmark was performed using a c3.8xlarge Amazon Linux EC2 instance in the us-east-1 region. The c3.8xlarge instance was chosen because it provides a sufficient number of cores to support the required parallelism, supports Enhanced Networking (SR-IOV), and provides SSD storage. The SR-IOV networking provides high-speed, low-latency communication and enables DataMan to use all of the available bandwidth between the instance and S3. The SSD storage significantly improves disk I/O efficiency.

The test file was 6.99 GB in size for the first test, 4 MB for the 1000-file test, and 90 GB for the very large file test. The DataMan filesystem walker performs a filter-based evaluation to determine whether each file needs to be uploaded. It memory-maps those files, then queues parts of each file to be picked up by the uploaders to S3 or Glacier.
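DataMan’s walker itself isn’t public, but the hand-off described above (filter, memory-map, queue parts for uploader threads) can be sketched in a few lines. The part size, queue depth, and filter below are assumptions made only for the illustration.

```python
# Hypothetical sketch of the walker/uploader hand-off described above:
# filter each file, memory-map it, and queue fixed-size part descriptors
# for uploader threads to consume.
import mmap
import os
import queue

PART_SIZE = 64 * 1024 * 1024
part_queue = queue.Queue(maxsize=256)   # bounded so the walker can't outrun the uploaders

def enqueue_parts(path, should_upload=lambda p: True):
    """Filter a file, memory-map it, and queue (path, mmap, offset, length) parts."""
    size = os.path.getsize(path)
    if not should_upload(path) or size == 0:
        return
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    for offset in range(0, size, PART_SIZE):
        length = min(PART_SIZE, size - offset)
        part_queue.put((path, mapped, offset, length))   # consumed by uploader threads
```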

If this helps…

We have lots of ways to learn more, and you can even try it out for yourself now by clicking here. You can also learn more about DataMan™’s features here.
