Creating A 2048-Core HPC Cluster in Minutes on AWS for a $525 job

World, meet Okami. Okami, meet World

We do a lot of HPC work in the cloud at very large scales, and today we’d like to introduce you to a decent-sized HPC cluster we recently worked on: let’s call it ‘Okami’. Okami has a number of components familiar to those who have worked with internal HPC environments: 2048 cores, shared storage, and a scheduling system. Had Okami been born in 2005 rather than 2010, he’d have ranked among the 20 largest computers of that time.

But the similarities between Okami and internal clusters end there.

First, Okami was provisioned, from start to finish, by CycleCloud in under 30 minutes!

And more importantly: when calculations were done, the nodes were shut down, and the user paid only $525 to access this 2048 core cluster!

As many of our readers know, we built CycleCloud in 2007 and it was the first system to automate the process of creating complete compute clusters in virtual infrastructure. It is the easiest and fastest way to deploy traditional HPC clusters in the Cloud.

Creating HPC environments in EC2 without security is not burdensome, but doing it properly takes more work. CycleCloud automates:

  • provisioning cluster nodes with dependencies,
  • setting up the scheduling correctly/securely,
  • patching/maintaining OS images,
  • setting up encryption,
  • managing the encryption keys,
  • administering cluster users,
  • tracking audit information,
  • deploying/optimizing shared file systems,
  • deploying applications,
  • scaling appropriately based upon load,
  • connecting to your license management software, and
  • keeping on top of all the latest and greatest Cloud infrastructure and features.

So when a very large life science research organization asked us to create a 2048-core cluster in EC2 to make their calculations scale, we said, “No problem!”

Migrating Workflows to EC2

With their internal cluster already utilized to its full potential, the client was looking to burst more than 2000 cores to the cloud in the most economical way possible. We helped migrate, configure and test their workflow in EC2, which took comparatively little time.

We spun up a Torque cluster using CycleCloud and, in a matter of minutes, our client was able to upload their job data and run jobs. This provided the ideal rapid build and test environment to expand their workflow to the cloud. In the course of a day we were able to guide the client through the cloud migration and create a pre-configured machine image, customized for their research needs, and ready for scale and performance tests.

Benchmark to Save Dollars

Cycle’s client was excited by the recent release of the EC2 Cluster Compute Instance machine type, as it more closely matches what the client had in their own data center. The modern hardware does come at a premium, though, and for low-I/O, high-CPU jobs the high-speed machine interconnects included in that cost go largely unused. So we all thought, let’s test this out.

Since 2007 we have devised benchmarking strategies to help discover the most cost-effective Amazon machine instances for client workloads. In this case, we present data from three tests: Cluster Compute Instances (cc1.4xlarge) configured to execute 8 jobs concurrently, Cluster Compute Instances configured to execute 16 jobs concurrently (to test hyperthreading performance), and High-CPU Extra Large instances (c1.xlarge) configured to run 8 jobs concurrently. The benchmark tests compute pi to several thousand digits to simulate a low-I/O, high-CPU workload.

The first chart shows that cc1.4xlarge instances configured to take advantage of hyperthreading provide the highest throughput for CPU-heavy calculations. At first glance, it might look like this is the best way to run these jobs on this cluster. But let’s take a look at the impact of the cost differences between these nodes.

The second chart plots throughput relative to cost, and reveals something very interesting. The most cost-effective configuration for this type of workload is the classic High-CPU Extra Large instance. Even though CC1 nodes are the new hotness, on a cost-performance basis you get almost 25% more results per dollar spent using the older High-CPU Extra Large instances!
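
For readers who want a feel for the workload, a low-I/O, high-CPU job like the pi benchmark described above can be sketched in a few lines of Python. This is only an illustration, not the benchmark code we ran: it uses Machin's formula with plain integer arithmetic, and the names arctan_inv and pi_digits are made up for the example.

```python
def arctan_inv(x, unity):
    """Return arctan(1/x) scaled by `unity`, via the Taylor series in integers."""
    term = unity // x          # first term: 1/x
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared     # next odd power of 1/x
        total += sign * (term // divisor)
        divisor += 2
        sign = -sign
    return total


def pi_digits(digits):
    """Return pi as a digit string: '3' followed by `digits` decimal places."""
    guard = 10                 # extra digits to absorb truncation error
    unity = 10 ** (digits + guard)
    # Machin's formula: pi = 4 * (4*arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)


if __name__ == "__main__":
    # Pure CPU work with no file or network I/O, like the benchmark jobs.
    print(pi_digits(5000)[:32])
```

Running several copies of a job like this at once (8 or 16 per machine in the tests above) keeps every core busy while barely touching disk or network.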

Our analysis gave the user a complete picture of the costs involved in running their workloads in the EC2 cloud. The beauty of CycleCloud is that researchers can choose CC1 to get the fastest results regardless of cost, or High-CPU Extra Large to get the most results per dollar. The analysis allowed the client to make an informed decision to move ahead with clusters based on the High-CPU Extra Large instance and avoid a costly 256-instance Cluster Compute test, a significant savings for them.
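
The comparison behind that decision is simple arithmetic: divide throughput by hourly price and pick the largest value. Here is a minimal sketch of it. The function names and the numbers are placeholders chosen for illustration, not our measured throughput or Amazon's prices.

```python
def results_per_dollar(jobs_per_hour, price_per_hour):
    """Throughput relative to cost: completed jobs per dollar spent."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return jobs_per_hour / price_per_hour


def most_cost_effective(configs):
    """Return the name of the config with the most results per dollar.

    `configs` maps a name to a (jobs_per_hour, price_per_hour) tuple.
    """
    return max(configs, key=lambda name: results_per_dollar(*configs[name]))


# Placeholder values only: config_a is faster, config_b is cheaper.
example = {
    "config_a": (120.0, 2.00),   # hypothetical: higher raw throughput
    "config_b": (75.0, 1.00),    # hypothetical: lower throughput, lower price
}

if __name__ == "__main__":
    for name, (jobs, price) in example.items():
        print(name, results_per_dollar(jobs, price))
    print("best value:", most_cost_effective(example))
```

With these made-up inputs the faster configuration wins on raw throughput but loses on results per dollar, which is the same shape of result the second chart showed.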

Automatic Scaling Saves Money

Cycle’s unique management system helps maximize the throughput and minimize the costs of calculations by automatically scaling your cloud cluster up or down based on the size of your job queue. With machine images configured and tested, our client was ready to put this scaling system to an extreme test.

The following graphs plot the load and CPU use from their cluster as the system scales up, runs the client’s simulations, and then spins back down. The client configured CycleCloud to provide a maximum of 256 machine instances and Torque to allow the jobs to run for two hours before terminating them.

In the first plot you can see machine instances being spun up in 64-instance chunks. The cluster was pegged at just under 100% CPU usage for the duration of the run and then dropped to almost zero when the run completed. Shortly afterwards the instances were terminated. We call this graph The Hat.
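
The scaling behavior described above (grow toward demand in 64-instance chunks, never exceed 256 instances, drop to zero when the work is gone) can be sketched as follows. This is a simplified illustration under our own assumptions, not CycleCloud's actual logic: the function names, the 8-slots-per-instance default, and the "scale straight down to the target" rule are all hypothetical.

```python
import math


def target_instances(outstanding_jobs, slots_per_instance=8, max_instances=256):
    """Instances needed to run all queued or running jobs, capped at the maximum."""
    if outstanding_jobs <= 0:
        return 0
    needed = math.ceil(outstanding_jobs / slots_per_instance)
    return min(needed, max_instances)


def next_instance_count(current, outstanding_jobs, chunk=64,
                        slots_per_instance=8, max_instances=256):
    """One scaling step: grow by at most `chunk`, shrink directly to the target."""
    target = target_instances(outstanding_jobs, slots_per_instance, max_instances)
    if target > current:
        return min(current + chunk, target)
    return target


if __name__ == "__main__":
    # Scale up for 2048 outstanding jobs, then down once the queue drains.
    count = 0
    for jobs in (2048, 2048, 2048, 2048, 0):
        count = next_instance_count(count, jobs)
        print(jobs, "jobs ->", count, "instances")
```

Under these assumptions the instance count steps up through 64, 128, 192 and 256 and then falls to zero, which traces the outline of The Hat.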

At its peak, this cluster had 2048 cores running simulations across 256 instances. The client was able to rent a cluster of 2048 cores, with a combined total of 1.7 TB of RAM, for about USD$525. When the work ended, so did their compute costs: the machines were turned off before another CPU hour was billed by EC2.
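
Here is a back-of-the-envelope check of those totals. The 256 instances and 8 job slots per instance come from the text above; the 7 GB of RAM per instance, the $0.68 hourly price, and the three billed hours are assumptions on our part for illustration, so treat the output as a sanity check rather than an invoice.

```python
INSTANCES = 256              # from the text: maximum cluster size
CORES_PER_INSTANCE = 8       # from the text: 8 concurrent jobs per c1.xlarge
RAM_GB_PER_INSTANCE = 7      # assumption: memory per c1.xlarge instance
PRICE_PER_HOUR = 0.68        # assumption: on-demand USD per instance-hour
BILLED_HOURS = 3             # assumption: spin-up plus a two-hour run


def cluster_summary(instances, cores, ram_gb, price, hours):
    """Total cores, total RAM in GB, and total cost in USD for the run."""
    return {
        "cores": instances * cores,
        "ram_gb": instances * ram_gb,
        "cost_usd": round(instances * price * hours, 2),
    }


if __name__ == "__main__":
    summary = cluster_summary(INSTANCES, CORES_PER_INSTANCE,
                              RAM_GB_PER_INSTANCE, PRICE_PER_HOUR, BILLED_HOURS)
    print(summary)
```

Under these assumptions the totals come to 2048 cores, 1792 GB of RAM, and roughly $522, which is in line with the figures quoted above.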

What’s next?

So that’s the life story of Okami, who will be resurrected when needed to save time-to-result in the future.

The fact that we can spin up a 2048-core cluster in minutes, run the jobs, and shut it off is a testament to CycleCloud and AWS. But that was yesterday, and tomorrow we’ll be ready for the next big job. Do you have a GPU cluster or other large-scale HPC environment you need quickly?

Let us know your thoughts and if you have a large cluster you’d like to cruise with. Visit us at to learn more.
