Improving ALS research with Google Cloud, Schrödinger, and Cycle Computing

Today we published a case study describing how the use of Google Cloud enabled one professor to do work she never thought possible. May Khanna, Assistant Professor of Pharmacology at the University of Arizona, studies pharmacological treatments for pain, and her research focuses on using protein binding to develop possible treatments. Using our CycleCloud™ software to manage a 5,000-core Google Cloud Preemptible VM cluster running Schrödinger® Glide™, she ran 20,000 core-hours of docking computations in four hours for $192, thanks to the simple, consistent pricing of GCP’s Preemptible VMs. The n1-highcpu-16 instances she used have 16 virtual cores and 14.4 gigabytes of RAM, so they’re well suited to this kind of compute-heavy workload.

For this project, Professor Khanna wanted to analyze a protein associated with amyotrophic lateral sclerosis, also known as “ALS” or “Lou Gehrig’s disease”. ALS has no known cure and causes pain and eventual death; an estimated 20,000 people in the United States are living with the disease. Protein binding simulation is compute-intensive, even under the constraints researchers often apply to achieve results in a reasonable time. For example, proteins are often simulated in isolation, and the binding sites are restricted to a set of known or expected active locations on the protein. With those constraints, Professor Khanna was only able to simulate 50,000 compounds, which yielded a grand total of four possible hits. She was about to give up on the project when she approached Cycle Computing. Using her Google Cloud cluster, she was able to simulate binding of a million compounds in just a...
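To put those figures in context, here is a minimal back-of-the-envelope sketch, using only the numbers quoted above (not billing data), of what the run works out to per core-hour:

```python
# Back-of-the-envelope check of the run described above, using only the
# figures quoted in the post.
cores = 5_000          # preemptible VM cluster size, in cores
wall_hours = 4         # elapsed wall-clock time of the docking run
total_cost_usd = 192.0 # reported total cost

core_hours = cores * wall_hours                   # 20,000 core-hours
cost_per_core_hour = total_cost_usd / core_hours

print(f"{core_hours:,} core-hours at ${cost_per_core_hour:.4f} per core-hour")
# -> 20,000 core-hours at $0.0096 per core-hour
```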
Simulating Hyperloop pods on Microsoft Azure

Earlier today, we published a case study and press release about some work we did with the HyperXite team from the University of California, Irvine and their efforts in the Hyperloop competition. The team leveraged CycleCloud to run ANSYS Fluent™ on Microsoft Azure Big Compute and complete their iterations in 48 hours, enabling them to get results fast enough to adjust the design, rerun the simulations, and converge on a final solution. All for less than $600 in simulation costs. This was a case where the cloud enabled them to do something they could not have done any other way.

As a bit of background, Elon Musk’s SpaceX started the Hyperloop project as a way to accelerate development of a fast, safe, low-power, and cheap method of transporting people and freight. HyperXite was one of 27 teams that competed recently. Nima Mohseni, the team’s simulation lead, used the popular computational fluid dynamics software ANSYS Fluent™ to model the pod. Key areas the team modeled related to their braking approach. Through simulation, they showed that they could brake using magnetic force alone, removing the need for mechanical brakes. This reduced weight, increased efficiency, and improved the overall design, which was recognized with a Pod Technical Excellence award last year.

Using the CycleCloud software suite, the HyperXite team created an Open Grid Scheduler cluster leveraging Azure’s memory-optimized instances in the East US region. Each instance has 16 cores based on the 2.4 GHz Intel...
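For readers curious what that setup looks like in practice, here is a minimal sketch (not the HyperXite team’s actual scripts) of handing a batch Fluent run to an Open Grid Scheduler queue from Python; the parallel environment name and journal file are placeholders you would adapt to your own cluster:

```python
# A hypothetical helper for queueing a batch-mode ANSYS Fluent run on an
# Open Grid Scheduler cluster (e.g. one provisioned by CycleCloud).
# The "mpi" parallel environment and the journal file name are placeholders.
import subprocess
import textwrap

def submit_fluent_job(journal: str, slots: int, job_name: str = "hyperxite-cfd") -> None:
    """Pipe a job script to qsub that runs Fluent across `slots` cores."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #$ -N {job_name}
        #$ -cwd
        #$ -pe mpi {slots}
        # 3-D double-precision Fluent, no GUI, driven by a journal file
        fluent 3ddp -g -t{slots} -i {journal}
        """)
    subprocess.run(["qsub"], input=script, text=True, check=True)

if __name__ == "__main__":
    submit_fluent_job("pod_braking.jou", slots=64)
```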

LAMMPS scaling on Azure InfiniBand

While public clouds have gained a reputation as strong performers and a good fit for batch and throughput-based workloads, we often still hear that clouds don’t work for “real” or “at scale” high performance computing applications. That’s not necessarily true, however, as Microsoft Azure has continued its rollout of InfiniBand-enabled virtual machines. InfiniBand is the most common interconnect among TOP500 supercomputers, and Microsoft has deployed the powerful and stable iteration known as “FDR” InfiniBand. Best of all, these exceptionally high levels of interconnect performance are now available to everyone on Azure’s new H-series and N-series virtual machines.

To see how well Azure’s InfiniBand works, we benchmarked LAMMPS, an open source molecular dynamics simulation package developed by Sandia National Laboratories. LAMMPS is widely used across government, academia, and industry, and is frequently a computational tool of choice for some of the most advanced science and engineering teams. LAMMPS relies heavily on MPI to achieve sustained high performance on real-world workloads, and can scale to many hundreds of thousands of CPU cores.

Armed with H16r virtual machines, we ran the standard Lennard-Jones (“LJ”) liquid benchmark and tested two scenarios: “weak scaling”, in which every core simulated 32,000 atoms no matter how many cores were utilized, and “strong scaling”, which used a fixed problem size of 512,000 atoms with an increasing number of cores. Both scenarios simulated 1,000 time steps. We performed no “data dumps” (i.e. intermediate output to disk) in order to isolate solver performance, and ran 30 test jobs per data point to obtain statistically meaningful averages. In summary, the results were impressive...
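For anyone who wants to run this kind of analysis on their own timings, here is a small sketch (with made-up placeholder numbers, not our measured results) of how weak- and strong-scaling efficiency are typically computed from wall-clock times:

```python
# Illustrative scaling-efficiency calculations; the timings below are
# placeholders, not the benchmark results discussed in the post.

def weak_scaling_efficiency(t_base: float, t_n: float) -> float:
    """Weak scaling: work per core is fixed, so the ideal runtime is constant."""
    return t_base / t_n

def strong_scaling_efficiency(t_base: float, t_n: float,
                              cores_base: int, cores_n: int) -> float:
    """Strong scaling: total work is fixed, so ideal speedup is cores_n / cores_base."""
    speedup = t_base / t_n
    return speedup / (cores_n / cores_base)

# Example: a 16-core reference run compared against a 256-core run.
print(f"weak:   {weak_scaling_efficiency(100.0, 108.0):.2f}")           # ~0.93
print(f"strong: {strong_scaling_efficiency(100.0, 7.1, 16, 256):.2f}")  # ~0.88
```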

Expand your realm of possibility with Big Data and HPC – Seminar

I recently had the chance to present at a seminar titled “Expand your realm of possibility with Big Data and HPC”, sponsored by DellEMC. Attendees from around the northeast gathered to discuss their current and future Big Data and HPC needs. The dialogue among the attendees is always the part that gets you thinking the most. It is interesting to see how the real-world definition of “HPC” is getting broader and broader. Everyone discussed how the workloads in their environments are no longer just classic large-core-count, math-heavy runs. They all see more and more data analysis users, more and more high-throughput applications, and more and more workflows that mix all of these types to get to the answers they are looking for. My presentation on Approaches for Cloud HPC addressed this by highlighting a number of the use cases we have seen, talking through lessons learned, and offering ideas on how to get started.

It was a great event with a mix of presenters. Jason Banfelder discussed how HPC advances science at Rockefeller University, Al Rittaco from Becker College highlighted the work going on at the Massachusetts Green High Performance Computing Center (MGHPCC), and the DellEMC team presented on the Big Data and HPC options they provide. There was no doubt that as our ability to generate more and more data grows, our need to actually process that data and find insights is growing even...

LISA 16 Cloud HPC BoF

Earlier this month, I attended the 30th USENIX LISA Conference in Boston, MA. In addition to my role co-chairing the Invited Talks tracks, I hosted a Birds of a Feather (BoF) session on cloud HPC. A group of 15 people joined to discuss the state of their cloud HPC efforts. Some attendees were actively using cloud resources, whether public cloud or private cloud (i.e. OpenStack). The rest knew that they would be looking at cloud soon, so they wanted to hear what current practitioners had to say.

Among those who haven’t started using public cloud for HPC, the greatest concern was data movement. One attendee mentioned he had 63 petabytes of data on his shared filesystem. That’s clearly not a trivial amount of data to push around, even over a very fast connection. For these kinds of scenarios, internal infrastructure (whether fixed or cloud-like) is often the best fit. However, if each job only needs a small subset of that data, those jobs could be candidates for moving to the public cloud.

The general agreement was that there are two main drivers for cloud-based HPC resources. The first is the ability to use correctly-sized resources. This can mean the right number of instances or even the right-sized instances. On a related note, the other main driver was the ability to isolate users from each other in a user-transparent manner. Giving users a dedicated sandbox simplifies administration and allows the IT organization to spend more time helping the user community make maximum use of the resources. A major driver, especially for the academic and...

Dell HPC Community event at SC ’16

The Saturday before Supercomputing ’16, I had the pleasure of participating in the Dell HPC Community event. As you might expect from Dell’s partnership with us, the cloud was a big focus this year. What I didn’t expect, however, was just how much it had everyone’s attention. Jay Boisseau, Chief HPC Technology Strategist at Dell, started the event by saying that interest in cloud technologies had dominated customer input over the last six months. And Jim Ganthier, Sr. VP of Validated Solutions and HPC, was unequivocal about how Dell intends to respond: “We are going to make Cloud HPC a core Dell offering and competency.”

The event unfolded unlike any prior Dell gathering I’d attended. Instead of customers (or Dell) talking about PowerEdge servers, the latest CPUs and GPUs, or anything around Linpack, the presentations all spoke to organizations trying new approaches to serving scientists and engineers with the help of cloud technologies. A common motivation emerged as well: the urgency to meet the demands of increasingly heterogeneous and data-driven workloads in more nimble and collaborative ways.

Two of the more interesting cloud-related presentations were from Shawn Strande (San Diego Supercomputer Center) and Tommy Minyard (Texas Advanced Computing Center). They talked about their efforts to run various cloud technologies on locally hosted cyberinfrastructure programs funded by the National Science Foundation. Each indicated that cloudifying (my made-up word, not theirs) was putting meaningfully better control in the hands of researchers from their states and the nationwide XSEDE network, especially with the rise of NSF programs that see data streaming in from geographically distributed sensor and instrument networks in enormous volumes. But...