Using Tags for Tracking Cloud Resources

When Amazon announced that the limit on tags per resource had been increased from 10 to 50, a great cheer went up from the masses. In the same way that you’ll always own slightly more stuff than you can fit in your house, many cloud users find they want more tags than are available.

Tags are custom labels that can be assigned to cloud resources. While they have no functional effect (except in Google Cloud Platform; see below), they can be very powerful for reporting and automation. For example, some customers have a single corporate account and tag resources by department, user, project, et cetera for chargeback. Some customers also use tags in automated tools. For example, you can tag instances with a “backup” attribute and have a script that polls those instances to create snapshots of permanent volumes on a daily basis. Or perhaps you have an account for testing and you don’t want users to accidentally leave instances running forever; you can automatically terminate long-running instances that don’t have a “keepalive” tag set.

In Amazon Elastic Compute Cloud (EC2) and Microsoft Azure, tags are key-value pairs. CycleCloud supports adding tags to instances and volumes with a simple syntax:

      tags.Application = my application
      tags.CustomValue = 57
      tags.Custom Text = Hello world

Tags in Google Cloud Platform

The term “tag” has a different meaning in Google Cloud Platform (GCP). There, a “tag” is an attribute placed on an instance that is used to apply network or firewall settings. Other resources do not have tags. CycleCloud supports adding tags to GCP instances...
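The “backup” tag workflow described above boils down to a filtering step over instance metadata. Here is a minimal sketch of that logic; it models instance records as plain dicts (a hypothetical stand-in for whatever your cloud API actually returns) rather than calling a real cloud SDK:

```python
def instances_to_snapshot(instances):
    """Return the instances whose tags request a nightly backup.

    `instances` is a list of dicts with a "tags" key mapping tag names
    to values -- a hypothetical shape, not a real cloud API response.
    """
    return [
        inst for inst in instances
        if inst.get("tags", {}).get("backup", "").lower() == "true"
    ]

# Example inventory: only web-1 has opted in to nightly snapshots.
fleet = [
    {"name": "web-1", "tags": {"backup": "true", "Application": "my application"}},
    {"name": "scratch-7", "tags": {}},
]

for inst in instances_to_snapshot(fleet):
    # A real script would snapshot this instance's permanent volumes here.
    print(inst["name"])
```

A daily cron job would fetch the live instance list, run it through a filter like this, and snapshot the selected instances’ volumes.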

See the HPC in Cloud Educational Series videos

As interest in HPC in the Cloud grows, Cycle Computing, in conjunction with Amazon Web Services, Avere, Google, and Microsoft Azure, presented the HPC in the Cloud Educational Series at SC15 this year. This series delivered a set of in-depth discussions on key topics around leveraging public clouds for large computation and big data analytics.

The series was well received, and we have made the presentations available for viewing. If you saw the series at SC, you may want another chance to review the material. If you missed it, this is a great chance to hear from the experts. Please take a look and let us know what you think.

The series is available here and includes the following:

Why wouldn’t you use public cloud for HPC – Rob Futrick, Cycle Computing — Rob Futrick, CTO of Cycle Computing, discusses the benefits and challenges of moving big data and HPC workloads to the public cloud.

HPC Cloud Data Mgmt – Jeff Layton, Amazon Web Services — Jeff Layton, HPC Principal Architect at Amazon Web Services, explains concepts and options around using storage in the AWS Cloud.

Microsoft Azure for Engineering Analysis and Simulation – Tejas Karmarkar, Microsoft Azure — Tejas Karmarkar, Senior Program Manager, Microsoft Azure, presents techniques for doing engineering analysis and simulation within the Microsoft Azure cloud.

Broad Institute use of Pre-Emptible VMs – Marcos Novaes, Google Cloud Platform – As part of the HPC in...

Running MPI applications in Amazon EC2

Despite significant improvements over the years, the same criticisms still color people’s opinions of using cloud environments for high performance computing (HPC). One of the most common things to hear when talking about using Amazon’s Elastic Compute Cloud (EC2) for HPC is, “Sure, Amazon will work fine for pleasantly parallel workloads, but it won’t work for MPI (Message Passing Interface) applications.” While that statement is true for very large MPI workloads, we have seen comparable performance up to 256 cores for most workloads, and even up to 1024 cores for certain workloads that aren’t as tightly coupled. Achieving that performance just requires careful selection of MPI versions and EC2 compute nodes, along with a little network tuning.

Note: While it is possible to run MPI applications in Windows on EC2, these recommendations focus on Linux.

Enhanced Networking

The most important factor in running an MPI workload in EC2 is using an instance type that supports Enhanced Networking (SR-IOV). With a traditional virtualized network interface, the hypervisor has to route packets to specific guest VMs and copy those packets into the VM’s memory so that the guest can process the data. SR-IOV helps reduce network latency by making the physical NIC directly available to the VM, essentially circumventing the hypervisor. Fortunately, all of Amazon’s compute-optimized C3 and C4 instance types support SR-IOV as long as they’re launched in a Virtual Private Cloud (VPC). For specific instructions on enabling SR-IOV on Linux instances, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Use of Placement Groups

Another important factor in running MPI workloads on EC2 is the use of placement groups. When instances are launched...
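The two recommendations above (an SR-IOV-capable instance family, launched into a VPC and a placement group) can be expressed as a small pre-flight check. This is a hedged sketch, not real AWS SDK code: the helper and the returned parameter dict are hypothetical, the instance-family list simply mirrors the C3/C4 guidance in the text, and a real deployment should consult the AWS documentation and an actual EC2 client instead.

```python
# Instance families the article calls out as supporting Enhanced
# Networking (SR-IOV). Assumption for illustration; check AWS docs
# for the current list before relying on it.
SRIOV_FAMILIES = {"c3", "c4"}

def build_mpi_launch_params(instance_type, placement_group, subnet_id):
    """Validate and assemble launch parameters for an MPI cluster node,
    enforcing the recommendations from the text: an SR-IOV-capable
    instance type, a placement group, and a VPC subnet."""
    family = instance_type.split(".")[0]
    if family not in SRIOV_FAMILIES:
        raise ValueError(f"{instance_type} is not in an SR-IOV-capable family")
    if not placement_group:
        raise ValueError("MPI nodes should be launched into a placement group")
    if not subnet_id:
        raise ValueError("SR-IOV requires launching into a VPC subnet")
    # The dict shape loosely follows EC2 run-instances parameters,
    # but is a hypothetical sketch, not a verified API payload.
    return {
        "InstanceType": instance_type,
        "Placement": {"GroupName": placement_group},
        "SubnetId": subnet_id,
        "SriovNetSupport": "simple",
    }

params = build_mpi_launch_params("c4.8xlarge", "mpi-cluster", "subnet-1234abcd")
print(params["Placement"]["GroupName"])
```

Catching a misconfigured node type before launch is cheaper than discovering high MPI latency after the cluster is up.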
HPC on Cloud: Results from the Record-breaking 156,000 Core MegaRun, and Insights to On-demand Clusters for Manufacturing Production Workloads

In case you missed this great session at the AWS re:Invent 2014 conference, Cycle CEO Jason Stowe hosted HGST (a Western Digital company) and the University of Southern California (USC) to talk about HPC on the Cloud. See below for the full session video and abstract.

BDT311 – HPC on AWS: Results from the Record-breaking 156,000 Core MegaRun, and Insights to On-demand Clusters for Manufacturing Production Workloads

Not only did the 156,000+ core run (nicknamed the MegaRun) on Amazon EC2 break industry records for size, scale, and power, but it also delivered real-world results. The University of Southern California (USC) ran the high-performance computing job in the Cloud to evaluate over 220,000 compounds and build a better organic solar cell. In this session, USC’s Patrick Saris provides an update on the six promising compounds he found, which he is now synthesizing in laboratories for his clean energy project. Saris discusses the implementation of, and lessons learned from, running a cluster in eight AWS regions worldwide, with highlights from Cycle Computing’s project Jupiter, a low-overhead cloud scheduler and workload manager. This session also looks at how the MegaRun was financially achievable using the Amazon EC2 Spot Instance market, including an in-depth discussion on leveraging Amazon EC2 Spot Instances to reduce costs and maximize value while maintaining the flexibility and agility that AWS is known for.

After a year of production workloads on AWS, HGST (a Western Digital company) has zeroed in on understanding how to create on-demand clusters to maximize value on AWS. HGST’s David Hinz outlines the company’s successes in addressing its changes in operations, culture, and behavior under this new vision of on-demand clusters. In addition, the session will provide insights...
Video: Jason Stowe Outlines the New Enterprise Workload of Cloud HPC in AWS re:Invent 2014 Interview

Recorded live from the re:Invent 2014 (read our re:Invent blog) show floor in Las Vegas, Cycle Computing CEO Jason Stowe is interviewed by The Cube. In this in-depth conversation, Stowe covers everything from the Enterprise-speed Cloud cluster Cycle enabled for HGST (a Western Digital company) in November 2014 (read about the record-setting Enterprise Cloud HPC run here), to the company’s record-setting 156,000 core run conducted for researchers at the University of Southern California (USC). But none of this is done merely to set records and build bigger Cloud clusters, Stowe explains – it’s all in the spirit of better arming researchers and engineers with the powerful tools they need to do their jobs better.

“The idea that you can just borrow 10-20,000 cores and give [those cores] back when you’re done is just crazy. This all leads to enabling researchers & engineers to ask the right question – at a scale [beyond] their own internal clusters, to allow them to get better answers, faster.” – Jason Stowe, Cycle Computing CEO

Click below to watch and listen to the full...