HGST buys 70,000-core cloud HPC Cluster, breaks record, returns it 8 hours later


By Jason Stowe, CEO

Today we have a very special workload to talk about.

HGST, a Western Digital company, fully embraces the philosophy of using a right-sized cluster to solve each problem, with a plot twist: they return the cluster once they're done innovating with it.

In this case, David Hinz, Global Director of Cloud Computing Engineering at HGST, will talk about this extensively during session BDT311 at AWS re:Invent, Friday at 10:30 a.m. He will describe a number of workloads run as part of HGST's constant push to innovate and build superior drives to hold the world's information, but some workloads are larger than others…


Technical Computing: The New Enterprise Workload

The folks at HGST are doing truly innovative work in technology, in part by bringing agility to engineering's technical computing workloads. Technical computing, including simulation and analytics, HPC and Big Data, is the new workload that every enterprise has to manage.

One of HGST's engineering workloads seeks to find an optimal advanced drive-head design. In layman's terms, this workload runs 1 million simulations of designs based on 22 different design parameters across 3 drive media, using an in-house, specially built simulator. On HGST's internal cluster, the workload takes approximately 30 days to complete.
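HGST's simulator, parameter names, and sampling scheme are not public, so the following is only a hypothetical sketch of how roughly 1 million simulation jobs over 22 design parameters and 3 drive media might be enumerated for submission to a scheduler. Every name and count below (other than the 22/3/1M figures from the post) is an assumption for illustration.

```python
import random

# Hypothetical illustration: HGST's actual simulator and sampling scheme
# are proprietary. This sketches enumerating ~1 million simulation jobs
# over 22 design parameters and 3 drive media.

NUM_PARAMS = 22                               # design parameters per candidate
MEDIA = ["medium_a", "medium_b", "medium_c"]  # 3 drive media (names hypothetical)
DESIGNS_PER_MEDIUM = 333_334                  # 3 x 333,334 = ~1M simulations total

def random_design(rng):
    """One candidate design: a normalized value for each of the 22 parameters."""
    return tuple(rng.random() for _ in range(NUM_PARAMS))

def generate_jobs(seed=42):
    """Yield (medium, design) pairs -- one per simulation job."""
    rng = random.Random(seed)
    for medium in MEDIA:
        for _ in range(DESIGNS_PER_MEDIUM):
            yield (medium, random_design(rng))

first = next(generate_jobs())
print(first[0], len(first[1]))   # medium_a 22
```

Each yielded pair would become one independent simulation job, which is what makes a sweep like this embarrassingly parallel and a natural fit for a large, short-lived cluster.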


World’s Largest Fortune 500 Cloud Cluster Run

First, we found out about this workload this past Wednesday, and our software ran it at scale this past weekend!

To solve this problem, our CycleCloud software created a compute environment out of AWS Spot Instances. Over 50,000 Intel Ivy Bridge cores of Spot capacity were available within the first 23 minutes, across three regions (us-east-1, us-west-1, us-west-2). At peak, the cluster had 70,908 Ivy Bridge cores, with an rPeak performance of 729 TeraFLOPS, greater than the rPeak of the #63 system on the Top500 supercomputer list. We named this cluster "Gojira" in honor of its scale and power.


The run finished in 8 hours instead of 30 days, with an infrastructure cost of $5,594.
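The headline numbers imply a speedup and a unit cost worth spelling out. A quick back-of-envelope check (note the per-core-hour figure is a lower bound on the true rate, since the cluster ramped up rather than holding 70,908 cores for the full 8 hours):

```python
# Back-of-envelope check of the headline numbers in this post.
in_house_hours = 30 * 24      # ~30 days on the internal cluster
cloud_hours = 8               # wall-clock time on the Spot cluster
cost_usd = 5594               # total infrastructure cost

speedup = in_house_hours / cloud_hours
print(f"speedup: {speedup:.0f}x")                        # 90x

# Pretend all 70,908 cores ran the full 8 hours; the real cluster
# ramped up, so actual core-hours were somewhat fewer.
peak_cores = 70_908
core_hours = peak_cores * cloud_hours
print(f"core-hours (at peak): {core_hours:,}")           # 567,264
print(f"cost per core-hour:   ${cost_usd / core_hours:.4f}")  # ~$0.0099
```

In other words, a roughly 90x reduction in time to result at about a penny per core-hour.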

And HGST can spin this cluster back up to run the workload again whenever they need to!

Our CycleServer software, with SubmitOnce, moved the jobs across the different regions. Here's what it looked like while running:


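SubmitOnce's internal placement logic is proprietary and not described in this post, but the general idea of spreading jobs across regions can be illustrated with a toy greedy policy: send each job to the region currently reporting the most idle cores. This is purely a hypothetical sketch, not CycleServer's actual algorithm.

```python
# Hypothetical sketch only: CycleServer's SubmitOnce placement logic is
# proprietary. This toy greedy policy routes each job to the region
# currently reporting the most idle cores.

idle_cores = {"us-east-1": 30_000, "us-west-1": 12_000, "us-west-2": 18_000}

def route_job(idle):
    """Pick the region with the most idle cores and claim one of them."""
    region = max(idle, key=idle.get)
    idle[region] -= 1
    return region

placements = [route_job(idle_cores) for _ in range(5)]
print(placements)   # all five land in us-east-1 while it has the most headroom
```

A real system also has to weigh Spot pricing, data locality, and instance-launch latency, which is why a single submission point that hides those decisions from the user is valuable.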
We also used Chef, together with CycleCloud's reporting, to monitor configuration on these nodes:


Feel free to come by our Booth #125 at AWS re:Invent or Booth #1529 at SC14 with more questions about this cluster!


Business Results: The importance of Scale and Timing

This run was all about acceleration at scale and faster, repeatable time to result. The engineering run can be repeated at will in the future to help HGST reach an optimal design faster.

Scale and timing enable our customers to achieve better business and research results. Better Answers. Faster.

Specifically, this run had a few firsts:


And the scale and timing matter for a few reasons:



Yes, we built a Supercomputer out of Cloud

If you’re interested in hearing more about this run, please stop by and see us at:

AWS re:Invent Booth #125

Supercomputing Booth #1529

We'll have more technical goodness and information about bursting to the Cloud for easy access to cluster computing.


How does this impact you? Better Answers. Faster.

Now comes the question: are you a researcher, engineer, quant, or developer at a software company who wants your workloads to run well in the Cloud?

If you’re interested in running HPC at any scale (most of our users are from 64 to 6,400 cores), or are interested in CycleCloud, please reach out to us here or sales -at- cyclecomputing.com or call 888.292.5320.

Meanwhile, we’ll be here helping more Fortune 500s, start-ups, and public research folks run important workloads at scale!

(if you’re interested in joining cycle, send your resume to jobs -at- cyclecomputing.com!)
