Cloud providers offer newer, better GPUs

Maximum performance of new cloud GPUs over time.

Ever since there’s been a public cloud, people have been interested in running jobs on public cloud graphics processing units (GPUs). Amazon Web Services (AWS) became the first to offer this as an option when they announced their first GPU instance type, the cg1, six years ago. GPUs offer considerable performance improvements for some of the most demanding computational workloads. Originally designed to improve the performance of 3D rendering for games, GPUs found a use in big compute because they can apply the same operation across a set of data rapidly, using far more cores than traditional central processing units (CPUs). Workloads that can use a GPU can see performance improve by 10 to 100 times.

Two years later, AWS announced an upgraded GPU instance type: the g2 family. AWS does not publish exact capacity or usage numbers, but it’s reasonable to believe that the cg1 instances were sufficiently successful from a business perspective to justify adding the g2s. GPUs are not cheap, so cloud providers won’t keep spending money on them without a return. We know that some of our customers were quick to make use of GPU clusters in CycleCloud.

But there was a segment of the market that still wasn’t being served. The GPUs in the cg1 and g2 instance families were great for so-called “single precision” floating point operations, but had poor performance for “double precision” operations. Single precision is faster and is sufficient for many calculations, particularly graphics rendering and other visualization needs. Computation that requires a higher degree of numerical precision, particularly if exponential calculations are made, needs double precision. On the GPUs that were available, double precision was slow enough that using the CPU instead was more cost efficient for some codes.
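To make the precision difference concrete, here is a minimal sketch in Python that adds 0.1 one million times, once rounding every intermediate result to single precision and once in double precision. The helper names `to_float32` and `accumulate` are made up for this illustration, and the rounding is emulated on the CPU with the standard `struct` module, so it shows the numerical effect only, not GPU behavior or speed.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add `step` to a running total `count` times.

    With single=True, the step and every intermediate total are rounded
    to single precision, emulating a float32 accumulator.
    """
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_float32(total)
    return total


single_total = accumulate(0.1, 1_000_000, single=True)
double_total = accumulate(0.1, 1_000_000, single=False)

# The exact answer is 100000; compare how far each accumulator drifts.
print(f"single precision: {single_total!r} (error {abs(single_total - 100000.0):.6g})")
print(f"double precision: {double_total!r} (error {abs(double_total - 100000.0):.6g})")
```

The single precision total drifts visibly from the exact answer while the double precision total stays very close to it. Long-running accumulations like this are one reason some simulation codes cannot simply switch to single precision.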

That changed this summer when both AWS and Microsoft Azure announced offerings that feature NVIDIA’s Tesla K80 GPU. The K80 provides more than an order of magnitude greater double precision performance than the GRID K520 in the g2 instances, allowing the highest-end compute users to begin taking advantage of the benefits of cloud computing.

| Card | Single precision GFLOPS | Double precision GFLOPS |
|---|---|---|
| Tesla M2050 (AWS cg1) | 1030.4 | 515.2 |
| GRID K520 (AWS g2) | 2457.6 | ~40* |
| Tesla M60 (Azure NV) | 7365–9650 | 230.1–301.6 |
| Tesla K80 (AWS p2, Azure NC) | 5591–8736 | 1864–2912 |

* estimated based on performance statements from AWS
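One way to read the table is to compare each card's double precision throughput to its single precision throughput. The short sketch below does that arithmetic using the low end of each range from the table; the K520 double precision figure is the estimate noted above, and the names `cards` and `ratios` are just labels chosen for this example.

```python
# (single precision GFLOPS, double precision GFLOPS), low end of each range in the table
cards = {
    "Tesla M2050 (AWS cg1)": (1030.4, 515.2),
    "GRID K520 (AWS g2)": (2457.6, 40.0),  # double precision value is an estimate
    "Tesla M60 (Azure NV)": (7365.0, 230.1),
    "Tesla K80 (AWS p2, Azure NC)": (5591.0, 1864.0),
}

# Fraction of single precision throughput that remains in double precision.
ratios = {name: double / single for name, (single, double) in cards.items()}

for name, ratio in ratios.items():
    print(f"{name}: double precision runs at {ratio:.1%} of single precision")

# How much faster the K80 is than the K520 at double precision (low-end figures).
k80_vs_k520 = cards["Tesla K80 (AWS p2, Azure NC)"][1] / cards["GRID K520 (AWS g2)"][1]
print(f"K80 vs. K520 double precision: about {k80_vs_k520:.0f}x")
```

On these figures the M2050 keeps half of its single precision rate and the K80 about a third, while the K520 and M60 fall to a few percent. That gap is why the visualization-oriented cards were a poor fit for double precision codes.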

Gartner identified GPU computing as one of the top 10 strategic trends for 2016, and the recent investment by the two largest cloud service providers reinforces that. So why would you choose to rent GPU time from a cloud provider instead of running GPUs in-house?

Cost is often the largest factor: both the capital expense of purchasing the hardware and the operational expense of providing the large amounts of power (and thus cooling) that GPUs require. Even a small GPU cluster can overwhelm a datacenter that isn’t prepared for the extra load.

Cost considerations become even more important if utilization is low or bursty. As with CPUs, having more than you need is a waste, but having fewer than you need holds back your work. Renting what you need for only as long as you need it from a cloud service provider is a great way to make your cost match your usage. If you’re still experimenting with GPUs, this becomes even more appealing because you don’t need to make a large capital expenditure only to discover that your code doesn’t get enough benefit.
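The rent-versus-buy reasoning above can be sketched as a simple break-even calculation. Every number in this example is a hypothetical placeholder, not a real price from any vendor or cloud provider, and the function name `break_even_hours` is invented for the illustration. Substitute your own hardware quote, power and cooling estimate, and instance price.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def break_even_hours(capex: float, lifetime_years: float,
                     annual_opex: float, cloud_hourly_rate: float) -> float:
    """Hours of use per year at which owning costs the same as renting.

    Owning is modeled as a fixed yearly cost (purchase price spread over the
    hardware lifetime, plus power and cooling); renting is paid per hour used.
    """
    owned_cost_per_year = capex / lifetime_years + annual_opex
    return owned_cost_per_year / cloud_hourly_rate


# Hypothetical inputs, for illustration only.
hours = break_even_hours(
    capex=12_000.0,         # assumed purchase price of one GPU server
    lifetime_years=3.0,     # assumed useful life
    annual_opex=2_000.0,    # assumed yearly power and cooling
    cloud_hourly_rate=1.0,  # assumed price of a comparable cloud instance
)
utilization = hours / HOURS_PER_YEAR

print(f"Break-even at {hours:.0f} hours per year ({utilization:.0%} utilization)")
```

With these made-up inputs, owning only pays off if the hardware is busy for more than about two thirds of the year. Below that, or when demand comes in bursts, renting matches cost to usage more closely.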

We’re excited about the capabilities these new GPU offerings provide for research and simulation. If you are, too, contact us to get started.
