Cloud in a broader HPC world

Cloud is bringing “HPC” into a broader world. We just don’t always see it that way.

Cloud computing for large workloads continues to evolve. One of the most striking changes is that this technology is enabling people who never think of themselves as “HPC people” to do work they never thought possible. Part of the reason is that the definition of “HPC” has always centered on the hardware side of computation. What we see now, though, is a broader use of the techniques and technologies of HPC to help people use computation to predict rather than simply report.

Historically, most analytic computation has focused on validating and reporting what we already know: capturing transactions, reporting on activities, validating designs, checking our math. Classic examples include accounting workloads, human resources systems, inventory management, and the like.

With HPC-like environments (large compute, network, and storage capacity) now easily accessible to anyone, more and more groups are using data and simulation to predict future events, outcomes, or reactions, whether of people, materials, chemicals, or medicines. This ability to use computation to predict rather than merely validate is a fundamental shift in how many people are starting to leverage HPC.

The availability of additional compute power lets users make computation part of the design process, exploring multiple options to develop new approaches rather than just validating completed designs. Consider, for example, the search for useful small-molecule targets in drug development. Historically, researchers would look for potential targets by starting with a limited, pre-selected subset of materials. That limit stemmed from a lack of sufficient computation to analyze a larger set within a reasonable amount of time; the chemists were shrinking their exploration space to fit the confines of the compute they had. Now we are starting to see researchers comb through much larger sets and find potential targets that might never have been considered before, because they can scale their resources to fit the needs of the workload.
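
To make the pattern concrete, here is a minimal Python sketch of that kind of exhaustive screen: every candidate in a large library is scored independently and the best hits are kept. The library and the score_binding_affinity() function are hypothetical placeholders, not a real docking pipeline; the point is only that each evaluation is independent, so the work scales out to however many cores or cloud nodes happen to be available.

```python
# Minimal sketch: score every candidate in a large library in parallel,
# then keep the top hits. The scoring function and the library are
# hypothetical stand-ins for a real docking or ML-based pipeline.
from concurrent.futures import ProcessPoolExecutor


def score_binding_affinity(candidate: str) -> float:
    # Placeholder for an expensive physics- or ML-based scoring step.
    return float(len(candidate))  # hypothetical stand-in score


def screen(candidates: list[str], top_n: int = 100) -> list[tuple[str, float]]:
    # Each candidate is independent, so the sweep parallelizes trivially
    # across local cores (or, with a different executor, cloud nodes).
    with ProcessPoolExecutor() as pool:
        scores = pool.map(score_binding_affinity, candidates, chunksize=10_000)
        ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    library = [f"molecule-{i}" for i in range(1_000_000)]  # hypothetical library
    hits = screen(library)
    print(hits[:5])
```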

Another example is jet engine design. Airplane manufacturers want to reduce the noise of jet engines; noisy engines annoy passengers and people who live near airports, but it is hard to reduce noise without reducing power. Through computer simulation, engineers were able to try numerous design options for the back end of the engine, where the gases leave. The simulations showed that a serrated back end had a large impact. Engineers didn’t initially suspect this choice would be the right answer, but it proved to be the best candidate.
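
In the same spirit, here is a small Python sketch of a design sweep: generate many candidate back-end geometries, keep the ones that still meet a thrust constraint, and pick the quietest. The predicted_noise() and predicted_thrust() functions are made-up placeholders standing in for real CFD and acoustic simulations, which is exactly the kind of heavy, repeatable work that benefits from elastic compute.

```python
# Minimal sketch of a design-space sweep: evaluate many candidate nozzle
# geometries and select the quietest one that still meets a thrust floor.
# Both "models" below are fabricated placeholders, not real physics.
import itertools


def predicted_noise(serration_count: int, serration_depth: float) -> float:
    return 100.0 - 0.5 * serration_count - 10.0 * serration_depth  # placeholder


def predicted_thrust(serration_count: int, serration_depth: float) -> float:
    return 120.0 - 0.1 * serration_count - 2.0 * serration_depth  # placeholder


# Candidate geometries: number of serrations x serration depth (arbitrary units).
designs = itertools.product(range(0, 25), [d / 10 for d in range(0, 6)])

# Keep only designs that still meet the (hypothetical) thrust requirement.
feasible = [(n, d) for n, d in designs if predicted_thrust(n, d) >= 115.0]

# Pick the quietest feasible design.
best = min(feasible, key=lambda nd: predicted_noise(*nd))
print("quietest feasible design (serrations, depth):", best)
```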

HPC people often define HPC by the hardware aspects of what they have built, run, or designed. The industry talks in terms of node counts, petabytes, latency, and throughput; the admins talk about petascale or exascale, data center square footage, and bandwidth to the outside world. This approach provides an easy set of common metrics for describing HPC systems, letting us compare and contrast them. The Top500 list is a great example. While it serves as a useful tool for understanding the raw capacity of a system under a limited set of use cases, it has also become a badge of honor within much of the community. You will often overhear people at conferences describing how large their systems are by saying they sit at position X on the Top500. While interesting, that really doesn’t say anything about the work being done or the science moving forward.

This shorthand way of explaining HPC does the community, and the work we accomplish, a real disservice. More than the hardware we consume, the actual work that gets done because of that hardware is what makes HPC interesting, exciting, and increasingly critical. And with the availability of cloud-based HPC, counting nodes, petabytes, or bandwidth matters less, because anyone can get access to resources at the scale they need.

This move to a “post-hardware” view of HPC lets people think beyond the limitations of what they are running on and focus on the problem they are trying to solve. Experiments can be designed to fit the science being done rather than the platform they happen to run on. By focusing on the problem instead of the underlying tool, we can find new solutions to all kinds of interesting problems.

But there is a bigger issue, too. The availability of cloud-based HPC means more and more users are running HPC workloads without even knowing it. They don’t think of what they are doing as HPC, because they don’t care about the hardware. They don’t care about utilization and job queues, because they don’t have to worry about them. With cloud-based HPC, they can get the resources they need, in the size and flavor they want, when they want them. In fact, the whole idea of a “shared cluster” could go away: everyone can have their own cluster, sized to the job at hand, whenever they need it.

These are the people who will truly drive “HPC” forward in the coming years. They are not people with computer science, electrical engineering, or other classic technical degrees; they come from the design, marketing, sales, or production side of the world and are using these technologies to approach their work in a new way. And they don’t care about the hardware.
