AWS S3 Performance Tip: Using DataMan™ to Increase Concurrency Leads to 500 MB/s Upload Speed

Maximizing Upload Speeds to S3 and Glacier

In a previous blog we described the huge benefit of using multipart uploads when transferring data to cloud storage. Today we're revisiting that benchmarking with big data in mind, which means transferring large numbers of files, of varying sizes, efficiently and easily. So what did we find?

Concurrency & Part Size Matter

We saw significant speed increases by adjusting file-part sizes and concurrency:

Figure 1. Data upload speed in MB/s for different part sizes and part concurrency.

We were able to make heavy use of the available bandwidth by optimally adjusting not only the file-part sizes of the multipart uploads, but also the number of concurrent transfers. By transmitting many files at the same time, each split into the right number of parts, we maximized the transfer rate to S3, achieving roughly 500 MB/s. This is why we have built both of those capabilities into our data transfer and workflow tool, DataMan™, the tool we used to run these benchmarks.

[For more information about DataMan™'s Multi-threading Performance, Transfer Monitoring, and Reporting & Visualization, Click Here.]

Benchmarking Methodology

First we measured the results of using different part sizes and concurrency when uploading a single 6.99 GB file to S3. The results are shown in Figure 1 above. Concurrency here refers to the number of fixed-size parts uploaded to S3 in parallel; these parts can come from one or more multipart uploads. Since we used a single file in this benchmark, all parts came from the same upload. We used the following part sizes, all powers of 2: 16 MB (16777216 bytes),...
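To make the kind of tuning described above concrete, here is a minimal sketch using boto3's TransferConfig, which exposes the two knobs our benchmark varies: multipart part size and the number of parts uploaded in parallel. This is an illustration, not DataMan™'s implementation; the bucket name, object key, file path, and the 16 MB / 10-thread settings are placeholder values to experiment with.

```python
# Minimal sketch (not DataMan™'s implementation): tuning multipart part size
# and part concurrency for an S3 upload with boto3.
import boto3
from boto3.s3.transfer import TransferConfig

MB = 1024 * 1024

# Example settings: 16 MB parts, up to 10 parts in flight at once.
config = TransferConfig(
    multipart_threshold=16 * MB,   # use multipart upload above this size
    multipart_chunksize=16 * MB,   # part size (the part-size axis in Figure 1)
    max_concurrency=10,            # parallel part uploads (the concurrency axis)
    use_threads=True,
)

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/data/big_file.bin",    # placeholder local path
    Bucket="my-benchmark-bucket",     # placeholder bucket
    Key="uploads/big_file.bin",       # placeholder key
    Config=config,
)
```

With settings like these, the uploader splits the file into 16 MB parts and keeps up to 10 part uploads in flight at a time; it is exactly this combination of part size and in-flight parts that determines the throughput shown in Figure 1.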