Dell HPC Community event at SC ’16

The Saturday before Supercomputing ‘16, I had the pleasure of participating in the Dell HPC Community event. As you might expect from Dell’s partnership with us, the cloud was a big focus this year. What I didn’t expect, however, was just how much it had everyone’s attention. Jay Boisseau, Chief HPC Technology Strategist at Dell, started the event off by saying that interest in cloud technologies dominated customer input over the last six months. And Jim Ganthier, Sr. VP of Validated Solutions and HPC, was unequivocal about how Dell intends to respond: “We are going to make Cloud HPC a core Dell offering and competency.”

The event unfolded unlike any prior Dell gathering I’d attended. Instead of customers (or Dell) talking about PowerEdge servers, the latest CPUs and GPUs, or anything around Linpack, the presentations all spoke to organizations trying new approaches to serving scientists and engineers with the help of cloud technologies. A common motivation emerged as well: urgency to meet the demands of increasingly heterogeneous and data-driven workloads in more nimble and collaborative ways.

Two of the more interesting cloud-related presentations were from Shawn Strande (San Diego Supercomputer Center) and Tommy Minyard (Texas Advanced Computing Center). They talked about their efforts to run various cloud technologies on locally hosted cyberinfrastructure programs funded by the National Science Foundation. Each indicated that cloudifying (my fake-word, not theirs) was putting meaningfully better control in the hands of researchers from their state and the nationwide XSEDE network, especially with the rise of NSF programs that see data streaming in at enormous volumes from geographically distributed sensor and instrument networks. But...

Cycle Computing (Booth 3621) and SC16

It is that time of year again. Supercomputing 16 is coming up Nov 13 – 18 in Salt Lake City. It should be a great event this year, with lots of sessions and discussion plus a full exhibitor floor.

From what we have seen, cloud will continue to be one of the hot topics at SC16. Quite a few sessions and a number of companies are talking cloud this year. As an industry, we have moved beyond discussions of “should I?” and on to topics like MPI optimization for the cloud, GPU options, and clusters as a service. All of these show the growing maturity of HPC in the cloud.

Of course, Cycle will be there again, as we have been for years. We are one of the original companies focused on making our customers successful with Big Compute and Big Data workloads in the cloud, something we have been doing for 10 years now.

Ready to learn how you too can be successful in using public cloud for the kinds of workloads that make SC16 such an interesting conference? Reach out to us through our website form or by email and schedule a time where we can sit down with you.

Come see us at Booth 3621. Looking forward to a great week! Follow us on Twitter for updates throughout the conference....

The question isn’t cost, it’s value

Addison Snell recently wrote an article for The Next Platform called “The three great lies of cloud computing.” Snell points out that the marketing around cloud computing doesn’t always match reality. As someone who does marketing for cloud computing software, I just want to go on the record as saying Addison is absolutely… right.

We’ve spent a lot of time on this blog, at conferences, etc. talking about the benefits of using public cloud services for big compute and big data. But we believe that a one-size-fits-all solution is never the right size. Public cloud services can be sized to fit many needs, but not every need.

If there’s one area where Addison’s article falls short, it’s that he only considers the raw dollar amount when talking about cost. Raw dollar amount is important, of course, but it’s not the whole story. As I said in response to a question at HTCondor Week 2016, it’s all about the value that cloud resources provide, not the cost. If you spend twice as much to run a workload in the cloud, but you get three times the value (e.g. due to faster time-to-results, or the ability to run simulations at a finer resolution thanks to the added capacity), that’s a net win.

Customers often find value in the additional capacity or flexibility the cloud can offer: adding more compute without having to plan datacenter space, or trying out new hardware by renting instead of making a large capital investment. Another part of the value discussion is the total value of your entire HPC environment: the mix of cloud plus internal resources. Many...

Running MPI applications in Amazon EC2

Despite significant improvements over the years, the same criticisms still color people’s opinions of using cloud environments for high performance computing (HPC). One of the most common things to hear when talking about using Amazon’s Elastic Compute Cloud (EC2) for HPC is: “Sure, Amazon will work fine for pleasantly parallel workloads, but it won’t work for MPI (Message Passing Interface) applications.” While that statement is true for very large MPI workloads, we have seen comparable performance up to 256 cores for most workloads, and even up to 1,024 cores for certain workloads that aren’t as tightly coupled. Achieving that performance just requires careful selection of MPI versions and EC2 compute nodes, along with a little network tuning.

Note: While it is possible to run MPI applications in Windows on EC2, these recommendations focus on Linux.

Enhanced Networking

The most important factor in running an MPI workload in EC2 is using an instance type that supports Enhanced Networking (SR-IOV). With a traditional virtualized network interface, the hypervisor has to route packets to specific guest VMs and copy those packets into the VM’s memory so the guest can process the data. SR-IOV reduces network latency to the guest OS by making the physical NIC directly available to the VM, essentially bypassing the hypervisor. Fortunately, all of Amazon’s compute-optimized C3 and C4 instance types support SR-IOV as long as they’re launched in a Virtual Private Cloud (VPC). For specific instructions on enabling SR-IOV on Linux instances, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Use of Placement Groups

Another important factor in running MPI workloads on EC2 is the use of placement groups. When instances are launched...
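As a rough sketch of what that launch workflow might look like from the AWS CLI (the placement group name, AMI ID, subnet, and key pair below are placeholders, not values from this post, and exact flags can vary by CLI version):

```shell
# Hypothetical example: names and IDs are placeholders -- substitute your own.

# 1. Create a cluster placement group so instances land on the same
#    low-latency network segment.
aws ec2 create-placement-group \
    --group-name mpi-cluster-pg \
    --strategy cluster

# 2. Launch compute-optimized instances into that placement group,
#    inside a VPC subnet (required for SR-IOV / Enhanced Networking).
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type c4.8xlarge \
    --count 16 \
    --placement "GroupName=mpi-cluster-pg" \
    --subnet-id subnet-xxxxxxxx \
    --key-name my-keypair

# 3. Verify Enhanced Networking from inside a running instance: on C3/C4
#    instances of this era, SR-IOV binds the Intel ixgbevf driver to the NIC.
ethtool -i eth0 | grep driver    # look for: driver: ixgbevf
```

The `ethtool` check is the quickest sanity test: if the interface reports a paravirtual driver instead of `ixgbevf`, SR-IOV is not active and MPI latency will suffer accordingly.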
HPC on Cloud: Results from the Record-breaking 156,000 Core MegaRun, and Insights to On-demand Clusters for Manufacturing Production Workloads

In case you missed this great session at the AWS re:Invent 2014 conference, Cycle CEO Jason Stowe hosted HGST (a Western Digital company) and the University of Southern California (USC) to talk about HPC on the cloud. See below for the full session video and abstract.

BDT311 – HPC on AWS: Results from the Record-breaking 156,000 Core MegaRun, and Insights to On-demand Clusters for Manufacturing Production Workloads

Not only did the 156,000+ core run (nicknamed the MegaRun) on Amazon EC2 break industry records for size, scale, and power, but it also delivered real-world results. USC ran the high-performance computing job in the cloud to evaluate over 220,000 compounds and build a better organic solar cell. In this session, USC’s Patrick Saris provides an update on the six promising compounds he found, which he is now synthesizing in laboratories for his clean energy project. Saris discusses the implementation of, and lessons learned from, running a cluster in eight AWS regions worldwide, with highlights from Cycle Computing’s project Jupiter, a low-overhead cloud scheduler and workload manager.

This session also looks at how the MegaRun was made financially achievable using the Amazon EC2 Spot Instance market, including an in-depth discussion of leveraging Spot Instances to reduce costs and maximize value while maintaining the flexibility and agility that AWS is known for. After a year of production workloads on AWS, HGST (a Western Digital company) has zeroed in on how to create on-demand clusters to maximize value on AWS. HGST’s David Hinz outlines the company’s successes in addressing changes in operations, culture, and behavior on the way to this new vision of on-demand clusters. In addition, the session provides insights...