Back to the Future: 1.21 petaFLOPS(RPeak), 156,000-core CycleCloud HPC runs 264 years of Materials Science

Hi everyone, it's Jason, and I have an exciting environment to tell you about.

If I had asked Cycle engineers to create a massive HPC cluster to supply 1.21 PetaFLOPS of compute power, their reaction would be something like this: 


Thankfully, instead, we asked them to just put together another BigCompute cluster to help advance materials science. A lot easier pill to swallow, isn't it?


Materials Science, but Back in Time

How do you use Utility HPC to advance materials science, you ask? Well, first, we started working with an amazing researcher, Professor Mark Thompson, who earned his PhD in Chemistry from Caltech and now does research on organic semiconductors, specifically organic photovoltaic solar cells.

The challenge in solar power has always been efficiency. Humanity has to find a material that can turn photons from the sun into electricity with as little waste as possible. The more efficient the material, the more electricity from each solar panel, and the more viable the material. The number of potential solar panel materials is practically limitless, but that actually makes it that much more difficult to find the best material among all the possibilities.


As Professor Thompson puts it, "If the 20th century was the century of silicon materials, the 21st will be all organic. The question is how to find the right material without spending the entire 21st century looking for it." Hear, hear!

Designing a new material that might be well-suited to converting sunshine into electricity is a difficult challenge. First, for any possible material, just figuring out how to synthesize it, purify it, and then analyze it, typically takes a year of grad student time and hundreds of thousands of dollars in equipment, chemicals, and labor for that one molecule.

Instead, Dr. Thompson can now simulate the properties of a solar panel material using Schrödinger's Materials Science software suite, a set of highly advanced quantum chemistry tools that can analyze a compound in code without having to synthesize it. Even cooler, instead of a year's worth of time and hundreds of thousands of dollars in expenses, Schrödinger simulation tools running on Cycle / AWS Spot Instances do the job for a mere $0.16 per molecule in infrastructure.

Now the search for valuable solar materials actually means looking at tens of thousands of materials. So on November 4th, Dr. Thompson's 205,000-molecule workload used the world's largest cloud HPC cluster to run 264 compute-years of computation.


Yes, we built an HPC Cluster out of a DeLorean (Cloud)

Great Scott, we're going to need a lot of compute! For this big workload, a 156,314-core CycleCloud behemoth spanning 8 AWS regions and totaling 1.21 petaFLOPS (RPeak, not RMax) of aggregate compute power crunched through 264 compute-years of simulation across 205,000 materials in only 18 hours.

Thanks to Cycle's software and Amazon's Spot Instances, a supercomputing environment that would have cost roughly $68M to buy outright ran 2.3 million hours, approximately 264 compute-years, of materials science simulation in only 18 hours, for just $33,000, or $0.16 per molecule.
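The headline numbers are internally consistent, and it's worth seeing how they fit together. A quick back-of-envelope check, using only the figures quoted in this post:

```python
# Sanity-check the run's headline figures (all inputs taken from the post).
core_hours = 2_300_000   # total compute delivered across the run
wall_hours = 18          # wall-clock duration of the run
molecules = 205_000      # materials simulated
total_cost = 33_000      # USD, paid via Spot Instances

compute_years = core_hours / (24 * 365)   # ~263; the post rounds to 264
cost_per_molecule = total_cost / molecules
avg_cores_busy = core_hours / wall_hours  # average concurrency over the run

print(f"{compute_years:,.0f} compute-years")
print(f"${cost_per_molecule:.2f} per molecule")
print(f"{avg_cores_busy:,.0f} cores busy on average")
```

The average-concurrency figure comes out a bit below the 156,314-core peak, which is expected: the cluster ramps up and drains down rather than running flat-out for all 18 hours.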


To deploy this cluster, our software automated the bidding, acquiring, testing, and assembly of this large environment, plus distributing the data and the workload. We did have some familiar software in the mix, like Opscode's Chef… but there was also something new!


Task distribution, code-named Jupiter

In order to reliably move workload tasks between different cloud computing regions on AWS, we needed to build software with low overhead that would be resilient to failure and able to scale to massive sizes. And of course, instead of figuring out how to distribute work amongst fixed infrastructure, it needed to be able to build the infrastructure to solve the workload. What better code name than Jupiter for such a cloud HPC scheduler?

Traditional supercomputers use tools like SLURM that are good at allocating large blocks of cores to run a smaller number of MPI jobs, for example. They are batch-oriented and lack a service-oriented architecture. We needed something that supported millions of cores running tens of millions of tasks. Jupiter was designed to do just this, while still working well with batch schedulers like Condor, GridEngine, PBS, etc.
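To make the contrast concrete, here is a minimal sketch of the pull-based, failure-tolerant task distribution pattern a scheduler like this relies on. This is our illustration of the general technique, not Jupiter's actual code or API: workers pull tasks from a shared queue, and a task whose worker dies (say, a Spot Instance getting terminated) is simply re-queued for another worker.

```python
# Sketch of pull-based task distribution with retry on worker failure.
# Hypothetical names (TaskPool, run_task); not Jupiter's real interface.
import queue
import threading

class TaskPool:
    """Workers pull tasks; a failed task is re-queued up to max_retries times."""

    def __init__(self, tasks, max_retries=3):
        self.todo = queue.Queue()
        for t in tasks:
            self.todo.put((t, 0))          # (task, attempts so far)
        self.done = []                     # (task, result) pairs
        self.failed = []                   # tasks that exhausted retries
        self.lock = threading.Lock()
        self.max_retries = max_retries

    def _worker(self, run_task):
        while True:
            task, attempts = self.todo.get()
            try:
                result = run_task(task)    # e.g. one quantum-chemistry job
                with self.lock:
                    self.done.append((task, result))
            except Exception:
                # Worker failure (e.g. Spot termination): hand the task back.
                if attempts + 1 < self.max_retries:
                    self.todo.put((task, attempts + 1))
                else:
                    with self.lock:
                        self.failed.append(task)
            finally:
                self.todo.task_done()

    def run(self, run_task, n_workers=8):
        for _ in range(n_workers):
            threading.Thread(target=self._worker, args=(run_task,),
                             daemon=True).start()
        self.todo.join()                   # returns when every task settled
        return self.done
```

Because workers pull rather than being assigned work up front, the pool's size can change mid-run, which is exactly the property you need when the "machines" are Spot Instances that can appear and vanish at any time.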

More specifically:

[Screenshot: Jupiter scheduler details]

We'll have more on this scheduler soon, and we're excited about its impact! Click here if you're interested in using Jupiter.


Visit us at AWS re:Invent and SC13

If you're interested in hearing more about this run, please stop by and see us at:

AWS re:Invent Booth #1112

Supercomputing Booth #3610

We'll have more technical goodness and information about bursting to the Cloud for Utility HPC.


Welcome back to the future! BigCompute back in time

So here we are, at that place where scientists, quants, and engineers can do what we want them to do: ask the right questions, without regard for the infrastructure they have available. In this case we used:

[Screenshot: technologies used in this run]

Now comes the question: are you a researcher, or a software company that wants your software to run well in the Cloud? If you're interested in running HPC at any scale (most of our users run 40-4,000 cores), or are interested in Jupiter or CycleCloud, please reach out to us here or at utilityHPC -at-

Meanwhile, we'll be helping researchers like Mark Thompson get computing done!
