Lessons learned building a 4096-core Cloud HPC Supercomputer for $418/hr

The Challenge: 4096-core Cluster

Back in December 2010, we discussed running a 2048-core cluster using CycleCloud, which was in effect renting a circa-2005 Top 20 supercomputer for two hours. After that run, a client gave us a use case that required us to push the boundary even further with CycleCloud. The challenge at hand was running a large workflow on a 4096-core cluster. Could our software stand up a 4096-core cluster and resolve the issues in getting it running?

Cycle engineers accepted the challenge and built a new cluster we’ll call “Oni”. The mission of CycleCloud is to make running large computational clusters in the cloud as easy as possible, and a lot of work must happen behind the scenes to provision clusters both at this scale and on demand. What kinds of issues did we run into as we prepared to scale the CycleCloud service from building a 2048-core cluster up to a whopping 4096-core Oni cluster?

This post covers four of these questions:

Can we get 4096 cores from EC2 reliably?
Can the configuration management software keep up?
Can the scheduler scale?
How much does a 4096-core cluster cost on CycleCloud?

Question 1: Can We Get 4096 Cores from EC2 Reliably?

We needed 512 c1.xlarge instances (each with 8 virtual cores) in EC2’s us-east region for this workload. This is a lot of instances! First, we requested that our client’s EC2 instance limit be increased. This is a manual process, but Cycle Computing has a great relationship with AWS and we secured the limit increase without issue. However, an increased instance...

HowTo: Save a $Million on HPC for a Fortune 100 Bank

In any large, modern organization there exists a considerable deployment of desktop-based compute power. Those bland, beige boxes used to piece together slide presentations, surf the web, and send out reminders about cake in the lunch room are turned on at 8am and off at 5pm, left to collect dust after hours. Especially with modern virtual desktop initiatives (VDI), thin clients running Linux sit idle after hours, despite the compute value they hold.

Fortune 100 Bank Harvesting Cycles

Today we want to show you how big financial services companies use desktops of any type to perform high-throughput pricing and risk calculations. The example we want to draw on is from a Fortune 100 company, let’s call them ExampleBank, that runs a constant stream of moderate-data, heavy-CPU computations on their dedicated grid. As an alternative to dedicated server resources, running jobs on desktops was estimated to save them millions in server equipment, power and other operating costs, and London/UK data center space, thanks to open source software that carries no license costs! Cycle engineers worked with their desktop management IT team to deploy Condor on thousands of their desktops, all managed by our CycleServer product. Once deployed, Condor falls under the control of CycleServer, and job execution policies are crafted to allow latent desktop cycles to be used for quantitative finance jobs.

Configuring Condor

Condor is a highly flexible job execution engine that can fit very comfortably into a desktop compute environment, offering up spare cycles to grid jobs when the desktop machine is not being used for its primary role. Our...
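To make “job execution policies” concrete, here is a minimal sketch of a desktop-harvesting policy in Condor configuration syntax. The idle threshold and load ceilings are illustrative assumptions, not ExampleBank’s actual values:

    # Illustrative desktop-harvesting policy (assumed thresholds).
    MINUTE = 60
    # Load caused by anything other than Condor jobs:
    NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
    # Start a job only after 15 minutes of keyboard/mouse idle time
    # on an otherwise unloaded machine.
    START = (KeyboardIdle > 15 * $(MINUTE)) && ($(NonCondorLoadAvg) < 0.3)
    # Suspend the job as soon as the user returns or load climbs.
    SUSPEND = (KeyboardIdle < $(MINUTE)) || ($(NonCondorLoadAvg) > 0.5)
    # Resume after the desktop has been idle again for 5 minutes.
    CONTINUE = (KeyboardIdle > 5 * $(MINUTE)) && ($(NonCondorLoadAvg) < 0.3)

With a policy along these lines, the desktop feels untouched during the workday while its evenings and weekends go to the pricing grid.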

Benchmarks for the brand new Cluster GPU Instance on Amazon EC2

A Couple More Nails in the Coffin of the Private Compute Cluster

Update: We’re getting an overwhelming response to this entry. If you have questions, come to booth #4638 at Supercomputing 2010.

Cycle Computing has been in the business of provisioning large-scale computing environments within clouds such as Amazon EC2 for quite some time. In parallel, we have also built, supported, and integrated internal computing environments for Fortune 100s, universities, government labs, and SMBs, with clusters of all shapes and sizes. Through work with clients including JPMorgan Chase, Pfizer, Lockheed Martin, and Purdue University, among others, we have developed a keen sense for which use cases are most appropriate for internal or external computing. More and more we see the performance lines blurring between internal clusters and the cloud. This is good news for end users who want the flexibility to consume resources both internally and externally.

During the past few years it has been no secret that EC2 has been the best cloud provider for massive-scale but loosely connected scientific computing environments. Thankfully, many workflows we have encountered have performed well within the EC2 boundaries, specifically those that take advantage of pleasantly parallel, high-throughput computing. Still, the AWS approach to virtualization and available hardware has made it difficult to run workloads that require high-bandwidth or low-latency communication within a collection of distinct worker nodes. Many of the AWS machines used CPU technology that, while respectable, was not up to par with the current generation of chip architectures. The result? Certain use cases simply were not a good fit for EC2 and were easily beaten...

Make the Most of Your AWS Instances: Using open-source Condor to Harvest Cycles, Part 2

How To – Harvest Cycles From Your AWS App Servers, Part 2

In Part 1 of this series I introduced you to AmazingWebVideo Inc. They’re a successful, Amazon EC2-based application provider who wants to get more out of their rented processors. Specifically, they want to harvest unused compute cycles from various application servers in between bursts of end-user traffic. We introduced them to Condor in Part 1 and helped them move three classes of background processing jobs from a simple queuing system to Condor in preparation for cycle harvesting. Now let’s take a look at how Condor, installed on their application servers, can help them accomplish this goal.

In our existing Condor pool, our machines are configured to run jobs at all times. Since the only processing load those machines experience comes directly from running Condor jobs, this setup is fine. But our application servers won’t be running under Condor’s control. Condor needs to pay attention to load outside of Condor’s control and only run jobs when this load is suitably low. We’ll use Condor’s START attribute and ClassAd technology to write an expression that controls when these machines should run jobs. But first let’s decide how we want the jobs to run on these machines. There is a whole spectrum of choice here, and it helps to think about it in advance of writing your run-time policies in Condor configuration files.

Policy Time

There are four state changes around which we need to develop policy: “When can Condor run jobs on this machine?”; “When should Condor suspend jobs it may be running?”; “When should Condor resume running suspended jobs?”; and “When should...
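To make the policy discussion concrete, here is a minimal sketch of the kind of expressions we will be writing for a headless app server. The load thresholds are assumptions for illustration, not the final policy developed in this post:

    # Sketch: run jobs on an app server only when non-Condor load is low.
    # LoadAvg and CondorLoadAvg are standard machine ClassAd attributes.
    NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
    BackgroundLoad = 0.3
    HighLoad = 0.5
    # When can Condor run jobs on this machine?
    START = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
    # When should Condor suspend jobs it may be running?
    SUSPEND = ($(NonCondorLoadAvg) >= $(HighLoad))
    # When should Condor resume running suspended jobs?
    CONTINUE = ($(NonCondorLoadAvg) <= $(BackgroundLoad))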

Make the Most of Your AWS Instances: Using open-source Condor to Harvest Cycles, Part 1

How To – Harvest Cycles From Your App Servers, Part 1

It’s a common problem: you run a successful, cloud-based application business in Amazon’s EC2 cloud with bursty traffic. In order to handle the bursts you have to keep a minimum number of EC2 application servers up and running. Wouldn’t it be nice if you could do something with these servers between handling the bursty requests? After all, you’re paying for that time, and there are thumbnails to generate, analytics to calculate, and batch applications to run.

Enter Condor. Condor is a high-throughput distributed computing environment from the University of Wisconsin, Madison (http://cs.wisc.edu/condor/) that can be configured to steal unused cycles from your application servers when they aren’t serving your main business applications to your customers. Condor provides advanced job scheduling, quota management, policy configuration, support for virtual machine based workloads, and integration with all the popular operating systems in use today. And it’s free. In the next three posts I’m going to show you how to use Condor to harness the wasted compute power on your application servers and how Cycle Computing’s CycleServer can help make this process simple and manageable.

The Setup

Throughout this series of posts I’m going to talk about a fictitious web application company: AmazingWebVideo Inc. They offer video hosting services and their business has been growing rapidly over the past twelve months. They already run all of their web application components in Amazon’s EC2 cloud, but the nature of their business still requires that they keep a base number of web app servers constantly running to handle the start of any bursts...
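For a flavor of what a Condor job looks like before we dive in, here is a minimal submit description file for a batch of thumbnail jobs. The executable and file names are hypothetical:

    # Hypothetical submit description file for 10 thumbnail jobs.
    universe   = vanilla
    executable = generate_thumbnails.sh
    arguments  = $(Process)
    output     = thumbs_$(Process).out
    error      = thumbs_$(Process).err
    log        = thumbs.log
    queue 10

Handing this file to condor_submit queues ten instances, which Condor then places on whatever machines its policy allows.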

Multiple Condor Schedulers on a Single Host

Cyclers have run multiple schedds per host since 2004/2005, when Jason did a Disney movie with Condor, running 12 schedds per submit server with Condor 6.6/6.7 and using software to load-balance jobs between the schedulers. Given the interest in this area, we thought we could help explain how to do it in detail.

When scheduling jobs at large scale in Condor, it can become useful to run more than one condor_schedd daemon simultaneously on the same server. On modern, multi-core architectures, this technique can bring several improvements: scheduler bottleneck avoidance, improved job startup times, improved condor_q query times, improved job submission times, and enhanced overall throughput. Today, our guys wrote up both the new-school (Condor 7.4 or later) and the old-school (Condor 7.2 or older) ways of implementing multiple schedulers. Since 2006, CycleServer has done load-based job distribution between multiple schedulers, so that won’t be covered. This post will show how to set up multiple schedulers on a single host and name the schedds in question. Hope this helps:...
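As a preview of the new-school approach, here is a minimal sketch of defining a second schedd via a local name. The name schedd2 and the directory layout are our illustrative choices, and the spool directory must exist before the daemon starts:

    # Sketch: a second condor_schedd, locally named schedd2 (Condor 7.4+).
    SCHEDD2 = $(SCHEDD)
    SCHEDD2_ARGS = -local-name schedd2
    # Parameters scoped to the second schedd through its local name:
    SCHEDD.SCHEDD2.SCHEDD_NAME = schedd2
    SCHEDD.SCHEDD2.SCHEDD_LOG = $(LOG)/SchedLog.schedd2
    SCHEDD.SCHEDD2.SPOOL = $(SPOOL)/schedd2
    # Have condor_master start both schedds.
    DAEMON_LIST = $(DAEMON_LIST), SCHEDD2

Jobs can then be aimed at the second scheduler with condor_submit -name, and its queue inspected with condor_q -name.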