Running MPI applications in Amazon EC2

Despite significant improvements over the years, the same criticisms still color people’s opinion of using cloud environments for high performance computing (HPC). One of the most common things to hear when talking about using Amazon’s Elastic Compute Cloud (EC2) for HPC is “Sure, Amazon will work fine for pleasantly parallel workloads, but it won’t work for MPI (Message Passing Interface) applications.” While that statement is true for very large MPI workloads, we have seen comparable performance up to 256 cores for most workloads, and up to 1024 cores for certain workloads that aren’t as tightly coupled. Achieving that performance just requires careful selection of MPI versions and EC2 compute nodes, along with a little network tuning. Note: while it is possible to run MPI applications on Windows in EC2, these recommendations focus on Linux.

Enhanced Networking
The most important factor in running an MPI workload in EC2 is using an instance type that supports Enhanced Networking (SR-IOV). With a traditional virtualized network interface, the hypervisor has to route packets to specific guest VMs and copy those packets into each VM’s memory before the guest can process the data. SR-IOV reduces network latency to the guest OS by exposing the physical NIC directly to the VM, essentially bypassing the hypervisor. Fortunately, all of Amazon’s compute-optimized C3 and C4 instance types support SR-IOV as long as they’re launched in a Virtual Private Cloud (VPC). For specific instructions on enabling SR-IOV on Linux instances, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Use of Placement Groups
Another important factor in running MPI workloads on EC2 is the use of placement groups. When instances are launched... read more
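For readers who script their cluster setup, here is a minimal sketch of the two settings discussed above, written with Python and boto3 (which the post itself does not use). The instance ID and placement group name are hypothetical placeholders.

```python
# Illustrative sketch only: create a cluster placement group and confirm
# SR-IOV (enhanced networking) on an instance using boto3. Instance ID and
# group name below are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# The "cluster" strategy packs instances onto the same low-latency network
# segment, which is what tightly coupled MPI jobs want.
ec2.create_placement_group(GroupName="mpi-cluster-pg", Strategy="cluster")

# Check whether SR-IOV (enhanced networking) is enabled on a running instance.
attr = ec2.describe_instance_attribute(
    InstanceId="i-0123456789abcdef0",  # hypothetical instance ID
    Attribute="sriovNetSupport",
)
print(attr.get("SriovNetSupport", {}).get("Value", "not enabled"))
```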

BioIT World – Time to Test Cloud & to Beat Those Computational Traffic Jams

This year, unlike any previous BioIT World Conference and Expo, attendees have the opportunity to try out some of our push-button cloud clusters live on the showroom floor. Because many of our customers have paved the way in demanding that their Life Science applications be deployed on cloud, much of the heavy lifting has already been done for those ready to start using cloud. By stopping by the Cycle Computing booth #360, you can discuss Scientific Computing & BigData on the cloud with one of our solutions architects, and see Cycle Computing’s leading tools running in a Cloud Computing Test Drive. Some of the Life Science workloads & applications Cycle Computing has helped customers deploy on cloud include:

Computational Chemistry
Big Data workloads
Intel Lustre File System
Internet of Things data collection and processing
Genomics
PK / PD
Proteomics
Clinical Trial Simulation

We hope you’ll also stop by to discuss Cloud Cost optimization, including Spot Instances, and multi-data center workloads.

Keep Calm & Compute On
We also will have a limited number of KEEP CALM & COMPUTE ON t-shirts we’ll be handing out to those who take a short survey in our booth. Finally – don’t miss our CEO in his two conference speaking sessions:

Conference session – Wed., April 22, 5 p.m.
Creating Customized Research Computing Environments on Cloud, While Addressing Needs for Faster Data Transfer, and a High Performance Parallel File System
Jason Stowe, CEO, Cycle Computing
Cloud provides researchers the ability to create customized computing environments for drug design and life sciences. But with that flexibility come challenges. This session will review successful enterprise & start-up use cases to highlight how people... read more

OpenSSL Vulnerabilities Announced, Cycle Computing Products Unaffected

On March 19, 2015, the OpenSSL developers announced a set of releases to address some high-priority security issues. Because OpenSSL is an important part of secure communication on the Internet, Cycle Computing has paid close attention to these vulnerabilities. Cycle Computing products (including CycleCloud and DataMan) do not rely on these libraries, and are therefore unaffected. The OpenSSH package used to provide remote login access to Linux instances does make use of OpenSSL; however, the announced vulnerabilities do not appear to affect SSH. Customers who run other software on cloud instances that makes use of OpenSSL (for example, the Apache web server) should update the OpenSSL package as soon as it becomes available from the operating system vendor. Amazon has indicated that services provided by Amazon Web Services are not impacted or have mitigation plans in... read more
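One quick, illustrative way to see which OpenSSL build your tooling is linked against (not part of the original advisory) is Python’s standard ssl module:

```python
# Illustrative check: report the OpenSSL version Python is linked against,
# which can help confirm a patched library is in place after an update.
import ssl

print(ssl.OPENSSL_VERSION)          # human-readable build string
print(ssl.OPENSSL_VERSION_NUMBER)   # numeric form, handy for comparisons
```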

Extending Your Data Center to the Cloud for Faster Time to Market

There is a great event at Harvard University this Thursday, the Equinix Evolution event, which still has some seats available. Attending CIOs and Directors of IT will be gathering to discuss how dramatically and rapidly cloud is extending the power and capability of their data centers. Among others, David Neitz, the CIO of CDM Smith, will be giving a great talk on his organization’s evolution in the data center and cloud, as will our CEO Jason Stowe, on how evolved cloud technical computing workloads can accelerate time to market and optimize cost. There will be a lot of great discussion at this event, so if you’re in the Boston area, reserve a spot here:
Date: Thurs., March 19, 2015
Time: 11:30 a.m. – 5:45 p.m.
Location: Harvard University, The Loeb House
RSVP Here
If you can’t attend, this recent article in CIO Review, titled How CIOs Can Be the Corporate Hero in Accelerating Time to Market with Cloud Cluster Computing, outlines some of the concepts that will be presented. Hope to see you... read more

HGST buys 70,000-core cloud HPC Cluster, breaks record, returns it 8 hours later

By Jason Stowe, CEO
Today we have a very special workload to talk about. HGST, a Western Digital company, fully embraces the philosophy of using the right-sized cluster to solve a problem, but in a plot twist, they return the cluster once they’re done innovating with it. In this case, during a Friday 10:30 a.m. session (BDT311) at AWS re:Invent, David Hinz, Global Director, Cloud Computing Engineering at HGST, will talk about this extensively. He will describe a number of workloads that are run as part of HGST’s constant push to innovate and build superior drives to hold the world’s information, but some workloads are larger than others…

Technical Computing: The New Enterprise Workload
The folks at HGST are doing truly innovative work in technology, in part by enabling agility for technical computing workloads for engineering. Technical computing, including simulation and analytics, HPC and BigData, is the new workload that every enterprise has to manage. One of HGST’s engineering workloads seeks to find an optimal advanced drive head design. In layman’s terms, this workload runs 1 million simulations for designs based upon 22 different design parameters across 3 drive media. Running these simulations using an in-house, specially built simulator takes approximately 30 days to complete on an internal cluster.

World’s Largest Fortune 500 Cloud Cluster Run
First, we found out about this workload this past Wednesday, and our software ran it at scale this past weekend! To solve this problem, our CycleCloud software created a compute environment out of AWS Spot Instances. Over 50,000 Intel Ivy Bridge cores of... read more
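As a purely illustrative sketch of how a design sweep like the one described above can be enumerated for submission (this is not HGST’s simulator or Cycle’s actual scheduler integration), the parameter names and values below are hypothetical:

```python
# Illustrative only: enumerate a design sweep over a few hypothetical
# parameters and 3 media types, producing one job definition per combination.
import itertools

design_space = {
    "head_width_nm": [20, 25, 30],   # hypothetical parameter values
    "coil_turns": [4, 6, 8],
    "shield_gap_nm": [10, 15],
}
media_types = ["media_a", "media_b", "media_c"]

jobs = []
for media in media_types:
    for combo in itertools.product(*design_space.values()):
        params = dict(zip(design_space.keys(), combo))
        params["media"] = media
        jobs.append(params)

print(f"{len(jobs)} simulation jobs to queue")  # 3 media x 18 combos = 54 here
```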

A BRIEF HISTORY OF SUPERCOMPUTING

Infographic Timeline
The Evolution of the Supercomputer
Computers arose from the need to perform calculations at a pace faster than is possible by hand. Once that problem was solved, the race was on to pit computers against themselves and meet ever-increasing demands for processing power. At first, the race was all about improving raw calculation speed and capabilities. Then, the challenge of solving more difficult problems led to improvements in programming models and software. Eventually, supercomputers were born, enabling scientists and engineers to solve highly complex technical problems. Whereas once supercomputers were simply mammoth machines full of expensive processors, supercomputing today takes advantage of improvements in processor and network technology. Clusters, and now even clusters on the cloud, pool the power of thousands of commodity off-the-shelf (COTS) microprocessors into machines that are among the fastest in the world. Understanding how we got here requires a look back at the evolution of computing.

Early computing
The earliest computers were mechanical and electro-mechanical devices, but the first high-speed computers used tube technology. Tubes were then replaced by transistors to create more reliable, general-purpose computers. The need for increased ease-of-use and the ability to solve a broader set of problems led to breakthroughs in programming models and languages, and eventually, to third-party application software solutions. By 1954, IBM offered the IBM 650, the first mass-produced computer. FORTRAN, an important language for numeric or computational programs, was developed at this time by IBM’s John Backus. In the early 1960s, general-purpose computers appeared from several suppliers. The next step was to design systems to support parallel operations, in which calculations are performed... read more

Schrödinger Materials Science Partnership a Sign of Things to Come

At Cycle Computing, we've had a long, successful working relationship with the software company Schrödinger. It's always fun when you're #winning – and not the Charlie Sheen way. Rather, together we've been breaking records, advancing meaningful science – and proving out Cloud computing as the future of high performance computing (HPC) for some time now. It’s about time we formalized the partnership. The work we have been doing with Schrödinger is exciting not only because of what we’ve done – but more importantly because of what it represents as far as capability for the future: enabling better science – and breaking records.

Enabling better science
The thing we at Cycle Computing are most proud of is that through partners like the Schrödinger Materials Science group, we’re enabling better science. Greater access to computing power is the key. Things like fighting cancer or developing clean energy products are worthwhile causes that our joint technology is advancing. While there are a lot of factors that have been driving us beyond the tipping point in Cloud HPC adoption, I believe there are three legs to the stool making it all happen:

Powerful cloud infrastructure (AWS)
Highly accurate and trusted simulation software (Schrödinger)
Orchestration software to enable the software to run on the Cloud (Cycle Computing)

Setting Records
Sure, we’ve been beating our chests lately – and for good reason. The engineering teams at Schrödinger and Cycle Computing have proven out the capabilities and scale of what’s possible on the Cloud. In fact, it was really exciting to see Amazon Web Services (AWS) CTO Werner Vogels mention us and the MegaRun at the November re:Invent Keynote address:... read more

Intel Cloud Technology Announcement Features Novartis

Intel is a master gardener. When you step back and consider all the computing innovation the company has done to foster and create IT ecosystems over the years, the company’s massive green thumb for technology is undeniable! And therein lies the reason we at Cycle Computing are very excited to see the attention Intel is putting on Cloud HPC. Even more exciting, work we’ve done with a customer, Novartis Institutes for Biomedical Research, is featured in a recent Intel announcement about this effort. Intel announced its Cloud Technology Program to help businesses identify and leverage top-performing technologies for cloud applications. Pretty cool. It seems Intel has moved above ecosystems, and is now focused on clouds in the atmosphere too. As we know, when clouds get seeded – rain follows! Cycle Computing has generated a lot of attention lately due to some of our exciting Cloud HPC records, the most recent being The MegaRun, a 156,000+ core Cloud computing run spanning 18 hours and five continents (all 8 AWS regions), for research being conducted at the University of Southern California (USC). BUT it’s important to understand that most of Cycle Computing’s customers are doing more modest Cloud runs, on a daily basis. Novartis is one of those customers, and is focused on accelerating science and research in its fight against disease. We’ve been working with them for some time. They use both traditional HPC resources as well as the Cloud. This is how the Intel Cloud Technology Program announcement references Novartis: For example, Novartis Institutes for Biomedical Research performed an extensive analysis of instances to find that choosing a premium high-performing one would provide... read more

HPC History Quiz Still Seeks Perfect Score

The window is still open – to recognize a superstar who can demonstrate their knowledge of high performance computing (HPC) history. Despite 100+ entries, we have yet to receive a perfect score on our Test Your Knowledge of HPC History Quiz. Is that because you haven’t YET taken the quiz? Maybe. If you think you are the ONE who can pull the sword out of the stone – it’s time to step forward. I said earlier there have been no perfect scores, but that’s not exactly true. I took the quiz and aced it. But – I wrote the quiz, so that doesn’t count. I also see where a few people took the quiz twice – which is completely fine – but your first score is the only one that counts. The Quiz is still live now. You can take it by visiting http://cyclecomputing.com/hpc-quiz now – and you should. Redeem our industry – make us proud. We’ll shut down the quiz Dec. 31, 2013. This quiz is part of a broader Celebration of Supercomputing History campaign that Cycle Computing unveiled at the SC13 Conference in Denver this November. At the event we handed out limited-edition – and what I think turned out to be really cool – History of Supercomputing timeline posters. If you missed picking up a copy – stay tuned, we’ll make an announcement soon that offers you a way to check out some of the significant milestones in HPC History. In the meantime – with 2013 winding down, taking this short 20-question quiz is a perfect way to spend your time appreciating where we’ve come from, and... read more

Cycle Computing Featured in AWS Webinar Dec. 12: Accelerate Manufacturing Design Innovation with Cloud-Based HPC

We have a very exciting webinar coming up this Thursday, Dec. 12, featuring a great user speaker and real customer use cases that will overview Cloud Computing & Utility HPC and how to accelerate Manufacturing Design. In the one-hour webinar, we’ll review how Cloud computing and Utility HPC are being used today as a competitive advantage with CAD/CAM and electronic design automation (EDA), with the ability to spin up clusters running common industry applications. Real-world manufacturing use cases will be discussed, and we’re honored to have a Cycle Computing customer from HGST, a Western Digital company, overview his experience using HPC in the Cloud. We’ll also be showcasing details on our record-breaking MegaRun – the 156,000+ core run we received so much publicity for back in November. RSVP for the webinar here: http://bit.ly/1bUojrd

This webinar follows a busy November for Cycle Computing! Two of the biggest events of the year for Cloud computing and Utility HPC were held back to back: AWS re:Invent in Las Vegas, and then SC13 in Denver. One of the highlights of these events was the first PetaFLOPS-scale cloud cluster, our record-setting MegaRun Cloud computing run – a Utility HPC milestone – and the World’s Largest and Fastest Cloud Computing Run. This run was highlighted in AWS CTO Dr. Werner Vogels' Keynote Address. Below are some details:

Real-world Science
Cutting-edge Clean Energy Research thanks to Schrödinger Materials Science tools (https://www.schrodinger.com/materials/)
Simulated ~205,000 organic semiconductors to determine efficiency as solar materials

MegaRun Stats
156,000+ cores
Measured at 1.21 PetaFLOPS peak throughput (not RMax)
264 years of computing in 18 hours
Worldwide cloud run: Across all... read more

Back to the Future: 1.21 petaFLOPS(RPeak), 156,000-core CycleCloud HPC runs 264 years of Materials Science

Hi everyone, it's Jason, and I have an exciting environment to tell you about. If I had asked Cycle engineers to create a massive HPC cluster to supply 1.21 PetaFLOPS of compute power, their reaction would be something like this: Thankfully, instead, we asked them to just put together another BigCompute cluster to help advance materials science. A lot easier pill to swallow, isn't it?

Materials Science, but Back in Time
How do you use Utility HPC to advance materials science, you ask? Well, first, we started working with an amazing researcher, Professor Mark Thompson, who got his PhD in Chemistry from Caltech, and now does research on organic semiconductors, specifically organic photovoltaic solar cells. The challenge in solar power has always been efficiency. Humanity has to find a material that can turn photons from the sun into electricity with as little waste as possible. The more efficient the material, the more electricity from each solar panel, and the more viable the material. The total number of potential solar panel materials is limitless. But that actually makes it that much more difficult to find the best material out of all the possibilities. As Professor Thompson puts it, "If the 20th century was the century of silicon materials, the 21st will be all organic. The question is how to find the right material without spending the entire 21st century looking for it." Hear, hear! Designing a new material that might be well-suited to converting sunshine into electricity is a difficult challenge. First, for any possible material, just figuring out how to synthesize it, purify it, and then analyze it typically takes... read more

Recognizing HPC Invention: Cycle Computing & Customers Receive 4 HPCWire Readers Choice Nominations

With so much going on, from technical sessions, to new product launches from vendors, and showcases of scientific and engineering achievement – it’s sometimes nice to step back and recognize the amazing work that’s happened over the past year. And this is exactly what the HPC Wire Readers Choice Awards (http://bit.ly/votecycle2013) are all about.

read more

Big Conferences, Big Sciences, Big Awards

What a blur the past two months have been – in a good way. Industry conferences were in full bloom this April and May, so the Cycle team took to the road. Here are just a few of the highlights from our travels… We kicked off April with the Bio-IT World Conference & Expo in Boston. As always, it was an impressive conference in terms of the caliber of the audience and content. I had the honor of chairing the “Risk and Strategy for Pharma in the Cloud” track, as well as giving a session that highlighted real-world experiences from our Life Sciences customers. The tremendous momentum and impact Utility HPC is having in Life Sciences were further validated when Bio-IT World named our customer & partner, Schrödinger, as their Best Practices Grand Prize Winner in IT Infrastructure. Congratulations, Schrödinger! Next stop: the Big Apple for AWS Summit NYC. Hearing a shout-out from Werner Vogels during his keynote was a great start to the day. And a big thanks to Stephen Elliott, Sr. Product Manager, EC2, for inviting us to speak during his informative talk on “Optimizing Your AWS Applications and Usage to Reduce Costs.” It was great to share the success we’ve had leveraging spot instances in a wide range of use cases, including the 10,600-server cluster, or $44M worth of compute infrastructure, used to run 39 years of science for $4,372! During the NY Summit, we also announced Cycle’s newest product, DataManager, which facilitates the storage and transfer of data. Customer response has been extremely positive, and we’re proud to report that a Top... read more

Announcing ZeroCompute™: Eco-conscious, Instantaneous Utility Supercomputing

Today’s researchers are shackled by a lack of access to compute power. Novel advances in modern science and engineering demand higher-performing systems to solve increasingly difficult problems. Cycle is pleased to announce our new intelligent orchestration software to meet these challenges, called “ZeroCompute™”. ZeroCompute solves access to remote high performance cloud computing environments with a patent-pending technology designed to accelerate access to HPC and BigData systems. This approach can simultaneously manage science that might normally require brontobyte datasets and five hundred billion concurrent jobs. After years of building Utility HPC & Utility Supercomputing software, Cycle’s engineers have discovered that the fastest way to execute any algorithm is to just not run it at all. Although seemingly obvious, by simply not running the science on any cores, we remove the computational challenges and data transfer bottlenecks of today’s BigCompute and BigData workloads. Because ZeroCompute “completes” the floating-point work in the nanosecond it takes the software to decide not to run anything, the system exceeds a peak floating-point performance of over one billion petaflops, at a cost of $0.00 per flop. By combining ZeroCompute with a nice hammock and an adult beverage, all researchers can now compute with ease. “This is earth-shattering technology! For years we have been extremely worried about how can we possibly compute against such large datasets,” says Dr. James Cuff, Cycle’s Chief Technology Officer. “Now with ZeroCompute, we just don’t have to worry. It’s a relief that we can kick back, and allow ZeroCompute to not run the workload for us. It’s so easy anyone can do this, simply by doing nothing!” We... read more

Servers Are Not Houseplants: Servers Are Wheat!

Once upon a time, in the early evolution of computing, it was a time of great opulence, and all was well with the kingdom. This was a time when every computer server had a title, and they were loved and anthropomorphized by the humans. The elder humans would give them mawkish little names and take tender loving care of each and every one of them, talking to them, hugging them, feeding them individually and treating them with great care, understanding the uniqueness of each and every one. Ah, history! It teaches us so many things! These primitives were not alone; even today the British Royalty have been known to talk to their house plants! Hands up: how many of us have named our servers? We come up with all sorts of names – much inspiration has come from Star Wars, names of planets, names of beverages and other popular motifs. Our friend Peter Cheslock started a lovely Twitter thread, which also included replies from Pinterest. Think about it: Pinterest’s entire business is about being social! Even their catchphrase is “A few (million) of your favorite things!“, but still these folks don’t name their servers; a UUID is just dandy! So why is this? Why don’t we give cutesy names to our servers any more? Our answer here at Cycle is: Scale has changed everything! The ubiquity of cloud computing and utility supercomputing clearly changes all of this. The amount of science we need to execute on a daily basis changes this. We don’t have time to nourish and look after individual computer servers, or houseplants for that matter! Accordingly: Servers... read more

Enterprise HPC in the Cloud: Fortune 500 Use Cases

Cycle Computing invited some of its customers to AWS re:Invent to talk about the amazing science, analytics and discovery they are executing at scale in concert with Cycle technology platforms. Our segment was called “Enterprise HPC in the Cloud: Fortune 500 Use Cases“. First up, Cycle Computing CEO Jason Stowe gives a wonderful overview of why Utility Supercomputing really matters for each and every one of us! However, we also thought it would be wonderful to share our customers’ activities with you so you may see what leading players in the field of Utility Supercomputing are doing on a day-to-day basis. There are four amazing talks below, each doing an excellent job of highlighting the challenges and solutions for its domain. The talks describe the use of utility supercomputers to solve real-world problems in Life Science, Finance, Manufacturing, Energy, and Insurance. The latest and greatest in advanced workloads are discussed that are helping push these industries and technologies forward through the insights gained from big data and high performance computing, including a sneak peek into what they see as coming next. First up, Kurt Prenger and Taylor Hamilton of J&J High Performance Computing:

Challenges: Disparate internal clusters, lack of resources for large problem execution, and home-grown systems pain.
Solutions: Cycle Computing SubmitOnce – jobs are routed to internal and external clusters via configurable intelligence based on cluster loads, data size and estimated runtimes. Single entry point for users without having to understand complex backend systems.

Next up: David Chang, Assistant Vice President of Pacific Life:

Challenges: Increasing complexity in product design with... read more
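To make the routing idea described for J&J more concrete, here is a purely illustrative policy sketch in Python. It is not Cycle’s actual SubmitOnce implementation; the thresholds and field names are hypothetical.

```python
# Purely illustrative routing policy in the spirit described above (route by
# cluster load, data size, and estimated runtime); not Cycle's SubmitOnce logic.
def choose_cluster(job, internal_queue_depth, internal_free_cores):
    """Return 'internal' or 'cloud' for a job dict with size/runtime hints."""
    large_data = job["input_gb"] > 500          # big data sets stay near storage
    long_running = job["est_runtime_hr"] > 24   # long jobs tolerate cloud spin-up
    internal_busy = internal_queue_depth > 2 * internal_free_cores

    if large_data and not internal_busy:
        return "internal"
    if long_running or internal_busy:
        return "cloud"
    return "internal"

# Example decision for a hypothetical job.
print(choose_cluster({"input_gb": 40, "est_runtime_hr": 36}, 5000, 800))  # -> cloud
```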

Built to scale: 10,600-instance CycleCloud cluster, 39 core-years of science: $4,362!

Here is the story of a 10,600-instance (i.e., multi-core server) HPC cluster created in 2 hours with CycleCloud on Amazon EC2, with one Chef 11 server and one purpose in mind: to accelerate life science research relating to a cancer target for a Big 10 Pharmaceutical company. Our story begins… First, when we got a call from our friends at Opscode about scale-testing Chef, we had just the workload in mind. As it happened, one of our life science clients was running a very large scale run against a cancer target. And let us tell you, knowing that hardcore science is being done with the infrastructure below is a very satisfying thing: [Image: AWS Console output] That’s right, 10,598 server instances running real science! But we’re getting ahead of ourselves… Unfortunately, we’re a bit limited in what parts of the science we can talk about, other than to say we’ve done a large-scale computational chemistry run to simulate millions of compounds that may interact with a protein associated with a form of cancer. We estimated this would take about 341,700 compute-hours. This is very cool science, science that would take months to years on available internal capacity! More on that later… Thankfully, our software has been doing a lot of utility supercomputing for clients, and as we mentioned last week, because of this we’re hiring. So to tackle this problem, we decided to build software to create a CycleCloud utility supercomputer from 10,600 cloud instances, each of which was a multi-core machine! This makes this cluster the largest server-count cloud HPC environment that we know about, or has... read more
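A quick back-of-the-envelope check on the figures quoted above, assuming the 341,700 hours are core-hours of computation:

```python
# ~341,700 compute-hours is roughly the "39 core-years of science" in the title.
compute_hours = 341_700
hours_per_year = 24 * 365.25
print(compute_hours / hours_per_year)  # ~38.98 core-years
```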

Right People, Right Place, Right Time – Cycle Hiring to meet explosive demand

Dear HPC, HTC, and Cloud communities, we'd like your attention, please, for a moment. Cycle has had the good fortune to do some amazing things over the years, but now we've hit a serious growth curve, so we're delighted to say: We're hiring. And not in a small way… every month, every quarter, from here on out. At Cycle we believe that many of the world’s scientific, engineering and finance minds are shackled by a lack of access to compute power. Our products connect people to computing resources at any scale, when it’s needed most. Many Fortune 500s, start-ups and public research customers use our software to run mission-critical science, risk management, and engineering, and now we're growing our team of game-changers. Cloud HPC/HTC and utility supercomputing are here, and we're leading that charge:

We have a full product set, and we're rolling
We're hiring, in many functions: engineering, sales, business development, operations, etc.
We're helping customers do amazing work; check out the coverage in Wired, the NY Times, BusinessWeek, or the Wall Street Journal.

So if you're an overachiever, if you want to be at a company helping the world fight disease, design better/safer products, or ensure returns for retirement funds that are decades in the making, then we want you. You're honest, you’re smart, you get things done, and you want to join a team at this 100% employee-owned software company. So check out our current job listings, and send an e-mail to jointheteam -at- cyclecomputing.com with the job title in the subject and your resume. Come interview, show us your strengths in engineering, sales, business development, sales... read more

Fortune 500s discuss Cloud HPC, Utility Supercomputing @ Cycle’s re:Invent session

As many of you know, at Cycle we think that giving every researcher, scientist, engineer, and mathematician access to the compute power they need, exactly when they need it, will enable humanity to achieve what it's truly capable of. So we organized five Fortune 500 Cycle customers of ours to talk at AWS re:Invent at 1pm Wednesday the 28th about Cloud, HPC, and utility supercomputing. Whether it's building safer medical devices, managing risk, helping quantify genomes at scale, protecting hard-earned retirement funds, or finding medicines to cure disease, they'll be talking about how they use the Cloud to do it! At 1pm tomorrow (Wednesday), come to "Enterprise HPC in the Cloud: Fortune 500 use cases" in room 3401A to see:

Hartford Life
Johnson & Johnson
Life Technologies
Novartis
Pacific Life

If you can't make the session, come to Cycle's Booth #220, and we can talk more... read more

BigData, meet BigCompute: 1 Million Hours, 78 TB of genomic data analysis, in 1 week

It seems like every day at Cycle, we get to help people do amazing work, but this week is a little different. This week we wrapped up our involvement in the amazing work by Victor Ruotti of the Morgridge Institute for Research, winner of the inaugural Cycle Computing BigScience Challenge. In the name of improving the indexing of gene expression in differentiated stem cells, Cycle's utility supercomputing software just finished orchestrating the first publicly disclosed 1,000,000+ core-hour HPC analysis on the Cloud. Yes, that’s 1 Million hours, or over a ComputeCentury™ of work, on a total of 78 TB of genomic data, in a week, for $116/hr! To put this 115 years of computing into context, the word ‘computer,’ meaning an electronic calculation device, was first used in 1897. So if you had started this run on a one-core computer when the term was first used, and kept it running through World War I, Jazz, the roaring 20’s, the Great Depression, WWII, Big Bands, the start of Rock’n’Roll, the Cold War, the Space Race, the Vietnam War, Disco, the 80s, grunge, techno, hip hop, reality TV, and up to Gangnam Style, Victor’s analysis would be finishing now, sometime in 2012. Now that’s a lot of compute. Below, we're going to explain the details of the analysis, and how it was executed, but if you're short on time, please skip to why this is important.

Cycle Computing BigScience Challenge Overview
About a year ago we were very excited to announce the Challenge, a contest aimed at breaking the computation limits for any researchers working to answer questions that will help humanity.... read more
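A quick sanity check of the framing above, using only the figures quoted in the post:

```python
# One million core-hours on a single core is on the order of 114-115 years,
# i.e. roughly 1897 through 2012; spread over one week it keeps thousands of
# cores busy.
core_hours = 1_000_000
print(core_hours / (24 * 365.25))   # ~114.1 years on one core
print(core_hours / (7 * 24))        # ~5952 cores kept busy for one week
```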

Free Grill v1.5 and Grill Recipe Challenge announced!

We’re pleased to announce the release of Grill version 1.5 for Opscode Chef! And it’s FREE! We’ve added a few features that came in useful during our last Utility Supercomputing run, code-named Naga, including:

New customizable reports
Backporting the Chef alert system to the CycleServer core
Greatly improved performance for large data sets
Hosts are now removed after a certain amount of inactivity

But we have a dirty secret: we don’t have a publicly available cookbook to install it. We know, we know. We wrote a Chef report visualization tool that can handle several thousand nodes all converging at once, but we don’t have an installer cookbook. Well, we do actually have an installer, but it’s tightly coupled to CycleCloud and we just haven’t had time to make a rock-solid, generally useful cookbook. So we’d like to announce the Grill Recipe Challenge. Write an awesome Grill cookbook and we’ll send you a Cycle Computing mug and t-shirt. Here are the simple rules:

Submit your cookbook to grill-recipe@cyclecomputing.com
Release your cookbook under the Apache license.
We’ll accept entries until 11:59 PM PST on June 12th, 2012.
We’ll pick the best one and send the winner a sweet Cycle mug and t-shirt (our decisions are final! 😉)

So if you need to explore Chef converge data, download Grill today, kick the tires, write an awesome cookbook, and win fabulous prizes…... read more

CycleCloud Achieves Ludicrous Speed! (Utility Supercomputing with 50,000-cores)

Update: Since publishing this blog entry, our 50,000-core CycleCloud utility supercomputer has gotten great coverage from BusinessWeek, TheRegister, the NY Times, the Wall Street Journal’s CIO Report, Ars Technica, TheVerge, among many others. And now it would run for $750/hr with the AWS spot pricing as of 6/22/2012! Click here to contact us for more information… By now, we've shown that our software is capable of spinning up cloud computing environments that run at massive scale and produce real scientific results. After some of our previous efforts, we realized we were onto something with the CycleCloud Cloud HPC and Utility Supercomputing concept. However, even we underestimated the scales researchers would want to use and the scope of the research this would impact. Among the requests were some from a leader in computational chemistry research, Schrödinger. In collaboration with Nimbus Discovery, they needed to virtually screen 21 million molecule conformations, more than ever before, against one possible cancer target using their leading docking application, Glide. And they wanted to do it using a higher-accuracy mode early on in the process, which wasn’t possible before because it is so compute-intensive! This is exactly what we did with our latest 50,000-core utility supercomputer that CycleCloud provisioned on Amazon Web Services, code-named Naga. And Schrödinger/Nimbus got useful results they wouldn't have seen without utility supercomputing. We will describe how we accomplished this below, and in future articles and blog posts. From a scale perspective, the most revolutionary concept implemented for Naga was scaling out all the components of an HPC environment. In our previous megaclusters, we performed a great deal of optimization... read more

Astounding Science: the CycleCloud BigScience Winners

“If we did all the things we are capable of, we would literally astound ourselves.” — Thomas A. Edison
It is with this quote in mind that we get to do something wonderful. Today, we will honor the winner of the CycleCloud BigScience Challenge 2011.

What is the CycleCloud BigScience Challenge?
Last October, we defined "utility supercomputing" and challenged researchers to break out of a common habit: limiting the questions they asked to the size of their local compute environment. Instead we asked them to propose big questions whose answers can move humanity forward. We offered the winner use of “utility supercomputing”, providing resources at the scale of the Top 500 supercomputing list, to run their BigScience for a few hours, then turn it off. With Cycle offering $10,000 in time, and Amazon adding another $2,500, the Winner will have the equivalent of 10 hours on a 30,000-core CycleCloud cluster. We announced the Finalists at Supercomputing 2011, and had a group of industry luminaries agree to be judges, including Kevin Davies of Bio-IT World, Matt Wood of AWS, and Peter Shenkin of Schrödinger. Many thanks to all of you for your help.

The Finalists
And then we saw the Finalists' presentations. The experience was inspiring. The finalists all sought to tackle BigScience problems that only utility supercomputing could help with. And it was awesome! So it is with great pleasure that I would like to recognize the Finalists that presented:

Alan Aspuru-Guzik & Johannes Hachmann, Harvard Clean Energy Project
Jesus Izaguirre, University of Notre Dame
Victor Ruotti, Morgridge Institute for Research
Martin Steinegger, TU Munich ROSTLAB... read more

CycleCloud BigScience Challenge Finalists = +5 for humanity

So we announced the finalists of the CycleCloud BigScience Challenge 2011 at Supercomputing 2011 in Seattle last night. Finalists were selected based on their proposal’s long-term benefit to humanity, originality, creativity and suitability to run on CycleCloud clusters. We're excited to announce that in addition to the $10,000 Grand Prize and $500 for Finalists, the awesome folks over at Amazon Web Services are adding their own $2,500 for the Grand Prize and $1,000 per Finalist, bringing our totals to $12,500 for the BigScience award and $1,500 per Finalist. With $12,500 we will be able to do about 10 hours on a 30,000-core environment, or 30 hours on 10,000 cores. The finalists will be judged by me and a panel of industry luminaries, including Kevin Davies, editor-in-chief, Bio-IT World, Matt Wood, technology evangelist for Amazon Web Services, and Peter S. Shenkin, vice president, Schrödinger. Thanks to Kevin, Matt, and Peter for helping judge the winner. Picking Finalists from all the entrants we got was hard enough. We're going to have our hands full picking a winner from all this great research! So without further ado, below are the Finalists for the inaugural CycleCloud BigScience Challenge:

Alan Aspuru-Guzik, Harvard Clean Energy Project, professor in the department of chemistry and chemical biology, and Johannes Hachmann, postdoctoral fellow: Hachmann and Aspuru-Guzik wish to conduct computational screening and design of novel materials for the next generation of organic photovoltaics (OPVs). The goal is to facilitate creating the next generation of photovoltaic cells.

Jesus Izaguirre, University of Notre Dame, associate professor of computer science and engineering and concurrent associate professor of applied and computational mathematics... read more
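The two prize scenarios quoted above describe the same amount of compute; a quick worked check:

```python
# Both options work out to 300,000 core-hours, roughly $0.04 per core-hour
# of the $12,500 combined prize.
prize_dollars = 12_500
core_hours_a = 10 * 30_000   # 10 hours on 30,000 cores
core_hours_b = 30 * 10_000   # 30 hours on 10,000 cores
assert core_hours_a == core_hours_b == 300_000
print(prize_dollars / core_hours_a)  # ~$0.0417 per core-hour
```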

CycleCloud BigScience Challenge Finalists Announced at SC11 Amazon Booth #6202 on Monday at 8PM

We got a great response to the CycleCloud BigScience Challenge 2011, and wanted to thank all of the entrants. The entries included calculations to fight Alzheimer's, fight Parkinson's, help understand stem cell differentiation, and improve photovoltaics to make greener energy. It's going to be hard to pick Finalists! We are also excited that the gracious folks at Amazon Web Services are adding to the prize money that Cycle is providing. Amazon's contribution will help us get even more BigScience done in shorter periods of time using utility supercomputing. We'll announce the Finalists, the final Judges, and more details about the additional prizes, at 8pm Monday, November 14th, at SC11 at the Amazon Booth #6202. Additional information will be available at Cycle Computing's Booth #443 throughout the show and on the BigScience Challenge Website. Thanks again to the entrants. We're very excited about these scientists using utility supercomputing to perform research that will benefit humanity. We'll see you at... read more

Mad Scientist could win CycleCloud BigScience Challenge…

Just kidding, he's just a potential finalist! 😉 As some of you may know, Cycle wants to help scientists answer big research questions that might help humanity by donating compute time using our utility supercomputing software. But in the overwhelming response we've gotten to the CycleCloud BigScience Challenge we announced last week, we repeatedly get the question, "What kind of research benefits humanity?" And the answer isn't Dr. Evil researching "sharks with frickin' laser beams"! Let's highlight a couple of the entries already received that might move us forward: There is the researcher doing quantum mechanics simulations for materials science to improve solar panel efficiency that might help "electrify 2.5 Billion people" with greener energy. Or the computational biologist who wants to use meta-genomics analysis to create a knowledgebase indexing system for stem cells and their derivatives, helping us "speed development of personalized cell-based therapies". Very exciting! Maybe you analyze public government data to provide clarity. Or you research science that might help in the race to treat Alzheimer's, Cancer and Diabetes. Or you're simulating ways to more efficiently distribute food in places that need it. There are plenty of utility supercomputing applications ahead of us that could benefit humanity, and now's your chance to start. Remember, entries are due November 7th. So come join us. There are just four questions between you and the equivalent of 8 hours on a 30,000-core cluster. So submit early & submit often, and let's change the speed that BigScience gets done! Jason Stowe, CEO, Cycle... read more

CycleCloud BigScience Challenge 2011

I’m planning on offering the equivalent of 8 hours on a ~30,000-core cluster ($10,000 in free CycleCloud time) to help researchers answer questions that will help humanity. But before we get there, let’s talk about why: Recently, Cycle got significant press coverage for using CycleCloud to create a ~30,000 core cluster on Amazon EC2 for a Top 5 Pharma to run research. It hit Ars Technica, then Slashdot, then Engadget (which had a great depiction of Nekomata, btw), then Wired's CloudLine … unreal. (Update: Now Forbes and Wired too!) In reading all the comments and questions, like "Would this run high-res Crysis?" or "How much capacity does Amazon have?", a thought went through my mind: I became concerned. Concerned because these shouldn't be the questions we're asking. I worried that in all this glitter, we would miss what is truly gold: that this type of computing can speed up scientific research and solve problems we’d traditionally never dream of tackling. So I'm writing this to introduce a new concept to you, and implore you to think about how to move the human race forward through science and research. To start, let's answer a question implied in many comments: Why is this important? For years cloud computing has been about paying for what you use, and accessing the compute power you need, when you needed it. The problem is, today, researchers are in the long-term habit of sizing their questions to the compute cluster they have, rather than the other way around. This isn’t the way we should work. We should provision compute at the scale the questions need. We're... read more

New CycleCloud HPC Cluster Is a Triple Threat: 30000 cores, $1279/Hour, & Grill monitoring GUI for Chef

Update: Wow, we've gotten tremendous feedback from this run on Ars Technica, Wired, and others, and man has it been a busy few days. We did have many people ask a question that we wanted to clarify:
Q: How long would the run take in-house vs. in CycleCloud?
A: The clients indicated the workload would never have happened in-house because it would have used everything they had for week(s). The in-CycleCloud run time was 7-8 hours.

In more ways than one, the Nekomata cluster is three times as impressive as our last public mega-cluster. A few months ago, we released details of the Tanuki cluster, a 10,000-core behemoth launched within AWS with the click of a button. Since then, we have been launching large clusters regularly for a variety of industries. We kept our eye open for a workload large enough to push us to the next level of scale. It didn’t take very long. We have now launched a cluster 3 times the size of Tanuki, or 30,000 cores, which cost $1279/hour to operate, for a Top 5 Pharma. It performed genuine scientific work — in this case molecular modeling — and a ton of it. The complexity of this environment did not necessarily scale linearly with the cores. In fact, we had to implement a triad of features within CycleCloud to make it a reality:
1) MultiRegion support: To achieve the mind-boggling core count of this cluster, we launched in three distinct AWS regions simultaneously, including Europe.
2) Massive Spot instance support: This was a requirement given the potential savings at this scale by going through the spot... read more
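Rough cost arithmetic from the figures quoted above (30,000 cores at $1279/hour, 7-8 hour run):

```python
# The whole workload lands around $9,000-$10,300 total, or roughly 4 cents
# per core-hour at the quoted hourly rate.
cores, rate = 30_000, 1279
for hours in (7, 8):
    total = rate * hours
    print(hours, "hours:", f"${total:,}", f"(${rate / cores:.4f}/core-hr)")
```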

Fast and Cheap, pick two: Real data for Multi-threaded S3 Transfers

Gentlemen, start your uploads! They're free now, but how fast can we do them? Lately we’ve been working with clients solving big scientific problems with Big Data (Next Generation Sequencing analysis is one example), so we’ve been working hard to transfer large files into and out of the cloud as efficiently as possible. We’re optimizing two costs here: money and time. Lucky for us, Amazon Web Services continues to drive down the costs of data transfer. We were excited to see that all data transfer into AWS will be free as of July 1st! They’re also reducing the cost to transfer data out of AWS. Less money, more science, yes! We still need to optimize for time, however. The scalability of the Elastic Compute Cloud (EC2) means we can throw as many cores at a scientific problem as we can afford in a very short time. But what if our input or result data is so large that the time to transfer it far outweighs the time to analyze it? Our previous work has shown that file transfers often do not fill the pipe to capacity, and are often limited by disk I/O and other factors. Therefore, we can speed transfers by using multiple threads to fill the pipe. That earlier work involved moving data directly to a file system using rsync. But since that time, we’ve begun to rely upon the Simple Storage Service (S3) as both a staging area and long-term storage solution for input and result data. S3’s availability and scalability are far superior to even striped Elastic Block Store volumes running on... read more
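A modern, illustrative equivalent of the multi-threaded transfer idea described above (the original work predates boto3) uses boto3’s transfer manager, which splits a large file into parts and uploads them concurrently. The bucket, key, and file names below are hypothetical.

```python
# Illustrative sketch: multi-threaded, multipart S3 upload via boto3.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
    max_concurrency=16,                     # parallel threads filling the pipe
)

s3.upload_file(
    Filename="sample_lane1.fastq.gz",       # hypothetical sequencing file
    Bucket="example-genomics-staging",      # hypothetical bucket
    Key="runs/2011-06/sample_lane1.fastq.gz",
    Config=config,
)
```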

Why Baking Your Cluster AMI Limits the Menu: DevOps for HPC clusters

You may have read our last blog post about Tanuki, the 10,000-core HPC supercomputer we built to predict protein-protein interactions. We’re back to tell you a little bit about how we provisioned the 1250 c1.xlarge instances that made up Tanuki. In fact, it’s the same technology that builds all of our CycleCloud instances, whether you select a single-node, stand-alone Grid Engine cluster or a super kaiju Condor cluster like Tanuki. But before we get into how we do things today, let’s talk about where we’ve been and what we’ve learned.

Pre-Built Custom Images: Basic Cloud Cluster “Hello World”
It seems everyone’s first foray into building HPC clusters in a public virtual cloud (like Amazon’s EC2) involves baking a specialized image (AMI) complete with all the tools required to handle the workload. The most basic architecture includes software for pulling work from a queue or central storage location (e.g., Amazon’s SQS or S3), running the work, and pushing the results back. If you’re feeling especially clever, you may even use a scheduler like Condor, SGE, or Torque. This first cluster comes up fast, but like all first attempts, it probably has some bugs in it. Maybe you need to fix libraries to support your application, add an encrypted file system, or tweak your scheduler configuration. Whatever the case, at some point you’ll need to make changes to it. If you’ve got just one cluster with a handful of nodes, making these changes manually can be done, but it’s a pain. Alternatively, you can make your changes, bake new images, and restart your cluster with the new images. This is... read more
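A minimal sketch of the “pull work from a queue, run it, push results back” pattern described above, using SQS and S3 via boto3. The queue URL, bucket, and the run_simulation helper are hypothetical placeholders, not the actual Tanuki tooling.

```python
# Illustrative worker loop: receive a task from SQS, process it, upload the
# result to S3, then delete the message so it is not re-delivered.
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"  # hypothetical
RESULT_BUCKET = "example-results-bucket"                                   # hypothetical

def run_simulation(task_body):
    """Hypothetical stand-in for the real workload; returns a result file path."""
    output_path = "/tmp/result.out"
    with open(output_path, "w") as fh:
        fh.write(f"processed: {task_body}\n")
    return output_path

while True:  # a real worker would also handle shutdown signals
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        result_file = run_simulation(msg["Body"])
        s3.upload_file(result_file, RESULT_BUCKET, f"results/{msg['MessageId']}.out")
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```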

Single click starts a 10,000-core CycleCloud cluster for $1060/hr

Update: This cluster received great coverage, including Amazon CTO Werner Vogels' kind tweet, customer commentary on this Life Science cloud HPC project, & results from our EC2 HPC Cluster. Meet our latest CycleCloud cluster type, Tanuki. Created with the push of a button, he weighs in at a hefty 10,000 cores. Yes, you read that right. 10,000 cores. Tanuki approximates #114 on the last 2010 Top 500 supercomputer list in size, and costs $1060/hr to operate, including all AWS and CycleCloud charges, with no up-front costs. Yes, you read that right. 10,000 cores costs $1060/hr. Here are some statistics on the cluster:

Scientific Need = 80,000 compute hours
Cluster Scale = 10,000 cores, 1250 servers
Run-time = 8 hours
User effort to start = Push a button
Provisioning Time = First 2,000 cores in 15 minutes, all cores in 45 minutes
Upfront investment = $0
Total Cost (IaaS & CycleCloud) = $1060/hr

This historic supercomputer, built completely in the cloud, drew its first breath minutes after the push of a button. Tanuki started operations through a completely automated launch using our CycleCloud℠ service. It ran for 8 hours before the job workflow ended and the cluster was shut down. The 8-hour run-time across 10,000 cores yielded a treasure trove of scientific results for one of our large life science clients. The ability to run a cluster of this size for $1060/hr, including AWS and CycleCloud charges, is mind-boggling, even to those of us that have been in the cloud HPC business for a while. When Tanuki was first mentioned within Cycle, its scale was thrown out partly as a... read more
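The statistics above check out against each other; a quick worked confirmation:

```python
# 80,000 compute hours spread over 10,000 cores is the quoted 8-hour run,
# and 8 hours at $1060/hr comes to about $8,480 all-in.
compute_hours, cores, hourly_rate = 80_000, 10_000, 1060
run_hours = compute_hours / cores
print(run_hours)                 # 8.0 hours
print(run_hours * hourly_rate)   # ~$8,480 total for the run
```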

CondorWeek 2011 T-Shirts

Every year it seems like the Condor community will run out of ideas for CondorWeek t-shirts, but we're happy to say that still hasn't happened! Check out the list of t-shirt ideas, and if you have a new idea, please comment! Suggestions so far for 2011:
1. I knew I had a problem when I found myself looking at my phone thinking, "I could run Condor on that!"
2. JPMC uses Condor, BearStearns didn't. hmmmm…
3. My Condor pool's so big, my pool has a pool (hdfs), and even my pool's pool is bigger than your pool…
4. Ray. Ray! The next time someone asks if you're running Condor, you say YES!
5. I don't often run 10,000 core clusters, but when I do, I use Condor. Stay compute-thirsty my friends.
6. "Look at your cluster, now back to me, now look at your cluster, now back to me. I'm Condor, the scheduler your cluster could schedule like."
7. That's no moon, that's Purdue's Condor cluster!
8. Condor: Slots that actually pay out!
9. Dr. Condormatch (Or How I Learned To Stop Worrying And Love my Cluster)
10. Condor: "ZKM" WTF?
11. There Can Be Only One condor_master
12. Condor: finding particles, curing disease, and encoding mp3's since 1988.
13. You want the slots? You can't handle the slots!
14. Kirk: Khhaaaaaaaannnnndooooorrrr!
EVERY VOTE COUNTS! If you like one of these over the others, please comment on our blog with your preferences. You don't have to use your name, but please use your e-mail so we can tell if we're using your preference/idea, and get your... read more

Lessons learned building a 4096-core Cloud HPC Supercomputer for $418/hr

The Challenge: 4096-core Cluster
Back in December 2010, we discussed running a 2048-core cluster using CycleCloud, which was in effect renting a circa-2005 Top 20 supercomputer for two hours. After that run, we were given a use case from a client that required us to push the boundary even further with CycleCloud. The challenge at hand was running a large workflow on a 4096-core cluster, but could our software start a 4096-core cluster and resolve the issues in getting it up and running? Cycle engineers accepted the challenge and built a new cluster we’ll call “Oni”. The mission of CycleCloud is to make running large computational clusters in the cloud as easy as possible. There is a lot of work that must happen behind the scenes to provision clusters both at this scale and on-demand. What kinds of issues did we run into as we prepared to scale out the CycleCloud service from building a 2048-core cluster up to a whopping 4096-core Oni cluster? This post covers these questions:

Can we get 4096 cores from EC2 reliably?
Can the configuration management software keep up?
Can the scheduler scale?
How much does a 4096-core cluster cost on CycleCloud?

Question 1: Can We Get 4096 Cores from EC2 Reliably?
We needed 512 c1.xlarge instances (each with 8 virtual cores) in EC2’s us-east region for this workload. This is a lot of instances! First, we requested that our client’s EC2 instance limit be increased. This is a manual process, but Cycle Computing has a great relationship with AWS and we secured the limit increase without issue. However, an increased instance... read more
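As an illustrative sketch of the "can we get 4096 cores reliably" question, the snippet below requests 512 c1.xlarge instances (8 virtual cores each, so 512 x 8 = 4096 cores) in batches so that one failed request does not sink the whole cluster. It is written against modern boto3, which postdates the original 2011 run; the AMI ID and batch size are hypothetical, and this is not CycleCloud's actual provisioning code.

```python
# Illustrative only: launch 512 c1.xlarge instances in retried batches.
import time
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")

TARGET_INSTANCES = 512          # 512 x 8 vCPUs = 4096 cores
BATCH_SIZE = 64                 # hypothetical batch size
launched = []

while len(launched) < TARGET_INSTANCES:   # real code would cap total retries
    want = min(BATCH_SIZE, TARGET_INSTANCES - len(launched))
    try:
        resp = ec2.run_instances(
            ImageId="ami-0123456789abcdef0",   # hypothetical AMI
            InstanceType="c1.xlarge",
            MinCount=1,                        # accept partial fulfillment
            MaxCount=want,
        )
        launched += [i["InstanceId"] for i in resp["Instances"]]
    except ClientError as err:
        print("request failed, backing off:", err)
        time.sleep(30)

print(len(launched), "instances launched")
```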

64 GPUs, $100, and a Dream: Practical GPU on EC2 Experience

When we last spoke about GPUs on our blog, it was during the SuperComputing 2010 conference, when AWS announced their new cg1.4xlarge instance type. The response to our benchmarks for the Amazon CG1 instance for SC 2010 was phenomenal. As a quick review, cg1.4xlarge is the typical AWS “Cluster Compute” instance extended with a pair of Nvidia M2050 GPUs, 22 GB of memory, and a 10Gbps Ethernet interconnect. Since we first published our Amazon GPU on CycleCloud benchmarks, the phone has been ringing off the hook at Cycle as we received interest in automatically creating clusters with shared file systems using CG1, high-memory, and high-CPU instance types. As an example, we’ve created a 32-node / 64-GPU cluster that ran molecular dynamics apps in 1 month instead of 5 months thanks to the Tesla GPUs. When combined with the 8TB filer, this particular cluster costs less than $100 per hour to operate, and took about 10-15 minutes to spin up initially. Given all this experience in automating clusters, we thought it was high time we shared some of what we found. First, we'll cover the whats and whys of GPU clusters on the cloud, then get into some data about how our experience has been, and cover our costs in detail.

Overview
As a quick background, Amazon’s EC2 offerings now include the cg1.4xlarge instance type, the typical Cluster Compute Instance (CCI) extended with a pair of Nvidia M2050 GPUs. Access to the GPUs is through the standard CUDA toolkit installed on top of a CentOS 5 release. From an application development perspective nothing is different; you write your applications... read more

HowTo: Save a $million on HPC for a Fortune100 Bank

In any large, modern organization there exists a considerable deployment of desktop-based compute power. Those bland, beige boxes used to piece together slide presentations, surf the web and send out reminders about cake in the lunch room are turned on at 8am and off at 5pm, left to collect dust after hours. Especially with modern virtual desktop initiatives (VDI), thin clients running Linux are left useless, despite the value they hold from a compute perspective.

Fortune 100 Bank Harvesting Cycles
Today we want to educate you about how big financial services companies use desktops of any type to perform high-throughput pricing and risk calculations. The example we want to leverage is from a Fortune 100 company, let's call them ExampleBank, that runs a constant stream of moderate-data and heavy-CPU computations on their dedicated grid. As an alternative to dedicated server resources, running jobs on desktops was estimated to save them millions in server equipment, power and other operation costs, and London/UK data center space, thanks to open source software that has no license costs associated with it! Cycle engineers worked with their desktop management IT team to deploy Condor on thousands of their desktops, all managed by our CycleServer product. Once deployed, Condor falls under the control of CycleServer, and job execution policies are crafted to allow latent desktop cycles to be used for quantitative finance jobs.

Configuring Condor
Condor is a highly flexible job execution engine that can fit very comfortably into a desktop compute environment, offering up spare cycles to grid jobs when the desktop machine is not being used for its primary role. Our... read more

Creating A 2048-Core HPC Cluster in Minutes on AWS for a $525 job

World, meet Okami. Okami, meet World. We do a lot of work, at very large scales, on HPC in the cloud, and today we’d like to introduce you to a decent-sized HPC cluster we recently worked on: let’s call it ‘Okami’. Okami has a number of components familiar to those who have worked with internal HPC environments: 2048 cores, shared storage, and a scheduling system. Had Okami been born in 2005 rather than 2010, he’d be in the Top 20 largest computers at that time. But the similarities between Okami and internal clusters end there. First, Okami was provisioned, from start to finish, by CycleCloud in under 30 minutes! And more importantly: when calculations were done, the nodes were shut down, and the user paid only $525 to access this 2048-core cluster! As many of our readers know, we built CycleCloud in 2007 and it was the first system to automate the process of creating complete compute clusters in virtual infrastructure. It is the easiest and fastest way to deploy traditional HPC clusters in the Cloud. Creating HPC environments without security in EC2 is not burdensome, but CycleCloud automates: provisioning cluster nodes with dependencies, setting up the scheduling correctly and securely, patching/maintaining OS images, setting up encryption, managing the encryption keys, administering cluster users, tracking audit information, deploying/optimizing shared file systems, application deployment, scaling appropriately based upon load, connecting to your license management software, and keeping on top of all the latest and greatest Cloud infrastructure and features. So when a very large life science research organization asked us to create a 2048-core cluster in EC2 to make... read more

Benchmarks for the brand new Cluster GPU Instance on Amazon EC2

A Couple More Nails in the Coffin of the Private Compute Cluster. Update: We're getting an overwhelming response to this entry; if you have questions, come see us at booth #4638 at Supercomputing 2010. Cycle Computing has been in the business of provisioning large-scale computing environments within clouds such as Amazon EC2 for quite some time. In parallel, we have also built, supported, and integrated internal computing environments for Fortune 100s, universities, government labs, and SMBs with clusters of all shapes and sizes. Through work with clients including JPMorgan Chase, Pfizer, Lockheed Martin, and Purdue University, among others, we have developed a keen sense for the use cases that are most appropriate for either internal or external computing. More and more, we see the lines between internal and cloud performance blurring overall. This is good news for end users who want the flexibility to consume resources both internally and externally. During the past few years it has been no secret that EC2 has been the best cloud provider for massive-scale but loosely coupled scientific computing environments. Thankfully, many workflows we have encountered have performed well within the EC2 boundaries, specifically those that take advantage of pleasantly parallel, high-throughput computing workflows. Still, the AWS approach to virtualization and available hardware has made it difficult to run workloads which required high-bandwidth or low-latency communication within a collection of distinct worker nodes. Many of the AWS machines used CPU technology that, while respectable, was not up to par with the current generation of chip architectures. The result? Certain use cases simply were not a good fit for EC2 and were easily beaten... read more

Make the Most of Your AWS Instances: Using open-source Condor to Harvest Cycles, Part 2

How To – Harvest Cycles From Your AWS App Servers, Part 2. In Part 1 of this series I introduced you to AmazingWebVideo Inc. They’re a successful, Amazon EC2-based application provider who wants to get more out of their rented processors. Specifically, they want to harvest unused compute cycles from various application servers in between bursts of end-user traffic. We introduced them to Condor in Part 1 and helped them move three classes of background processing jobs from a simple queuing system to Condor in preparation for cycle harvesting. Now let's take a look at how Condor, installed on their application servers, can help them accomplish this goal. In our existing Condor pool, our machines are set to always service jobs. Since the only processing load these machines experience comes directly from running Condor jobs, this setup is fine. But our application servers won’t be under Condor’s exclusive control. Condor needs to pay attention to load outside of its control and only run jobs when that load is suitably low. We’ll use Condor’s START attribute and ClassAd technology to write an expression that controls when these machines should run jobs. But first, let's decide how we want the jobs to run on these machines. There is a whole spectrum of choice here, and it helps to think about it in advance of writing your run-time policies in Condor configuration files. Policy Time There are four state changes around which we need to develop policy: “When can Condor run jobs on this machine?”; “When should Condor suspend jobs it may be running?”; “When should Condor resume running suspended jobs?”; and “When should... read more
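To make those four questions concrete, here is a minimal sketch of a load-based policy fragment for the app servers. The 0.5 load threshold and 30-minute eviction window are placeholder values chosen for illustration, not AmazingWebVideo's actual numbers:

    # Load that Condor itself is not responsible for generating.
    NonCondorLoad = (LoadAvg - CondorLoadAvg)
    HighLoad      = 0.5
    MINUTE        = 60

    WANT_SUSPEND  = True
    # 1. When can Condor run jobs on this machine?
    START    = ($(NonCondorLoad) < $(HighLoad))
    # 2. When should Condor suspend jobs it may be running?
    SUSPEND  = ($(NonCondorLoad) >= $(HighLoad))
    # 3. When should Condor resume running suspended jobs?
    CONTINUE = ($(NonCondorLoad) < $(HighLoad))
    # 4. When should Condor evict a job that has stayed suspended too long?
    PREEMPT  = (Activity == "Suspended") && ((CurrentTime - EnteredCurrentActivity) > 30 * $(MINUTE))

Suspending first, rather than killing jobs outright, lets short traffic spikes pass without losing work; PREEMPT then acts as the backstop when the machine stays busy for an extended period.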

Experience with Data Transfer into EC2

Maximizing data throughput: Multi-stream data transfer into Amazon EC2. It is common for cloud computing articles to talk at length about the abundant hardware resources the cloud can offer the modern researcher or analyst, but little is typically said about the back-end data store available with cloud computing. Before any research in the cloud can take place, data must be staged in a manner that is accessible to your cloud-based compute resources. That staging step becomes non-trivial if your data sets are large. The Amazon EC2 cloud provides large quantities of hardware suitable for high-speed, high-throughput scientific computing. Coupled with AWS storage and the Amazon S3 system, it makes a formidable platform for anyone looking to do large-scale scientific computing on quantities of file-based data. In this post we’ll focus on data ingress to EC2: download speeds out of EC2 are typically much higher, on both consumer-grade and enterprise-level Internet connections, and in the practical use of AWS services we typically see far more data uploaded than downloaded. Large data sets get transferred into EC2, working data stays on cloud-local storage, and summarized, compact results are brought back from the cloud. Let’s look at a few common cases for moving a large data set into AWS-hosted storage and explore the transfer rates, benefits, and drawbacks of each approach. Case #1: Consumer-Grade Internet Connection. It is trivial to saturate the upstream network pipe of a consumer-grade cable or DSL Internet connection using only a single transfer stream to EC2. With strong encryption on the ingress stream,... read more

Make the Most of Your AWS Instances: Using open-source Condor to Harvest Cycles, Part 1

How To – Harvest Cycles From Your App Servers, Part 1. It’s a common problem: you run a successful, cloud-based application business in Amazon’s EC2 cloud with bursty traffic. In order to handle the bursts you have to keep a minimum number of EC2 application servers up and running. Wouldn't it be nice if you could do something with those servers between bursts? After all, you’re paying for that time, and there are thumbnails to generate, analytics to calculate, and batch applications to run. Enter Condor. Condor is a high-throughput distributed computing environment from the University of Wisconsin-Madison (http://cs.wisc.edu/condor/) that can be configured to steal unused cycles from your application servers when they aren’t serving your main business applications to your customers. Condor provides advanced job scheduling, quota management, policy configuration, support for virtual machine-based workloads, and integration with all the popular operating systems in use today. And: it’s free. In the next three posts I’m going to show you how to use Condor to harness the wasted compute power on your application servers and how Cycle Computing’s CycleServer can help make this process simple and manageable. The Setup Throughout this series of posts I’m going to talk about a fictitious web application company: AmazingWebVideo Inc. They offer video hosting services and their business has been growing rapidly over the past twelve months. They already run all of their web application components in Amazon’s EC2 cloud, but the nature of their business still requires that they keep a base number of web app servers constantly running to handle the start of any bursts... read more

Multiple Condor Schedulers on a Single Host

Cyclers have run multiple schedds per host since 2004/2005, when Jason worked on a Disney movie with Condor, running 12 schedds per submit server with Condor 6.6/6.7 and using software to load-balance jobs between the schedulers. Given the interest in this area, we thought we could help explain how to do it in detail. When dealing with scheduling jobs at large scale in Condor, it can become useful to simultaneously run more than one condor_schedd daemon on the same server. On modern, multi-core architectures, this technique can bring about several improvements: scheduler bottleneck avoidance, improved job startup times, improved condor_q query times, improved job submission times, and enhanced overall throughput. Today, our guys wrote up both the new-school (Condor 7.4 or later) and the old-school (Condor 7.2 or older) ways of implementing multiple schedulers. Since 2006 CycleServer has done load-based job distribution between multiple schedulers, so that won't be covered here. This post will show how to set up multiple schedulers on a single host, and how to name the schedds in question. Hope this helps:... read more
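As a preview, here is a rough sketch of the new-school approach, which uses the -local-name mechanism available in Condor 7.4 and later to run a second schedd from the same binaries. The names and paths are illustrative, and the exact knobs are worth double-checking against the manual for your Condor version:

    # Define a second schedd that runs from the same binary as the default one.
    SCHEDD2      = $(SCHEDD)
    SCHEDD2_ARGS = -local-name schedd2
    DAEMON_LIST  = $(DAEMON_LIST) SCHEDD2

    # Give the second schedd its own name, spool directory, and log so it does
    # not collide with the default schedd (create the spool directory first).
    SCHEDD.SCHEDD2.SCHEDD_NAME = schedd2
    SCHEDD.SCHEDD2.SPOOL       = $(SPOOL)/schedd2
    SCHEDD.SCHEDD2.SCHEDD_LOG  = $(LOG)/SchedLog.schedd2

Once the master restarts with this configuration, jobs can be directed to the second scheduler with, for example, condor_submit -name schedd2@submit-host and queried with condor_q -name schedd2@submit-host, where submit-host stands in for the machine's actual host name.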

Community Feedback: CondorWeek 2010 T-Shirts and Mugs

Hello All, It's that time of year again: picking Condor t-shirt phrases for CondorWeek! As we've done for the past few years for CondorWeek, we'd like to get suggestions for new phrases from the Condor Community. Below are some of the suggestions from this year. If you have a new idea, please comment below this blog post (you don't have to use your real name, just give us your e-mail so we can let you know we're using it!). If we print yours, we'll give you a special surprise at CondorWeek or send it to you if you won't be in Madison this year. We're also interested in whether you'd prefer a mug to a t-shirt! Suggestions so far for 2010:
1. Condor: There's a knob for that…
2. Condor: "ZKM" WTF?
3. Green Computing for Grizzlies: HIBERNATE = (isWinter) && (ExcessFat > 200)
4. ~/2010_Condor_Odyssey $ condor_rm -all / "Todd, I'm afraid that's something I cannot allow to happen…"
5. JPMC uses Condor, BearStearns didn't. hmmmm…
6. PREEMPT = Spouse =?= "ELIN NORDEGREN"
7. My Condor pool's so big, my pool has a pool (hdfs), and even my pool's pool is bigger than your pool…
8. 2010 A Condor Odyssey: It's not HAL-9000, it's DiaGrid.
Oldies but goodies:
9. I am the condor_master.
10. Condor: an Evil Plot to control the world's computers.
Enjoy! EVERY VOTE COUNTS! If you like one of these over the others, please comment on our blog with your preferences. ALSO, IF YOU'D RATHER GET A MUG than a T-SHIRT, let us know that as well. Please comment! We look forward to seeing what everyone comes up... read more

Considerations for Financial HPC Applications in the Cloud

Julio Gomez had a great post in WallStreet & Technology about considerations for Building a Cloud Strategy that was created by one of the Innovation Councils, including the following:
1. Prepare to educate vendors
2. Liability and indemnification are a major disconnect
3. Nail down your authentication and federated identity capability
4. Identify lowest risk, lowest value areas for initial forays
5. Get your physical infrastructure organized
This is a great piece, and the points mentioned by the council are spot on. It is important to consider security and authentication management up front, as well as applications that have low risk or downside from a business perspective. As companies start deciding the types of workloads that make sense to move into the cloud, there are a few steps that can make tremendous sense in this environment. Based on my experience in this area with high performance computing (HPC), there are a few areas to consider to ensure that applications are suitable for the cloud. Most of them revolve around data and security. When looking at hedging, pricing, trading and other quantitative applications, I would recommend the standard steps for technology adoption: assessing entry points/applications, launching a proof-of-concept (POC) with an initial application, rolling that application into production, and planning wider adoption based upon lessons learned from the POC. In assessing potential HPC applications for cloud, I would recommend reviewing criteria including the authentication, risk factors, and regulatory requirements from the council meeting, plus the following:
Latency requirements – External environments don't lend themselves to low latency at this point in time relative to internal InfiniBand environments
Data volume and source – If the data... read more

Follow up on Life Science Leader

So a short while ago, I wrote an article on HPC in the Cloud for Life Science Leader, a widely-read monthly publication for life science executives. We had some nice responses, but today Google notified me that David Dooling posted a response to my article that makes a few points that are, IMHO, inaccurate. So, while my response is awaiting moderation on his page, I thought I'd post it here (sorry for the typo in my first comment, David): David, Good to meet you. I've read your blog before, and take issue with your arguments regarding the cloud. I have more posts on Cloud at my blog, http://blog.cyclecomputing.com, as well as the post below. This is an interesting area, and I'd love to correspond with you more at the e-mail at the bottom. You wrote: "No wonder he makes cloud computing sound so attractive. No mention of the IT expertise needed to get up and running on the cloud. No mention of the software engineering needed to ensure your programs run efficiently on the cloud." You are implying that to get running in the cloud, an end user must worry about the "IT expertise" and "software engineering" needed to get applications up and running. I believe this is a straw man, an incorrect assertion to begin with. One of the major benefits of virtualized infrastructure and service-oriented architectures is that they are repeatable and decouple the knowledge of building the service from the users consuming it. This means that one person, who creates the virtual machine images or the server code running the service, does need the expertise to get an... read more

Poster & Lightning Talk: BLAST Workflow Performance on EC2 @ Rocky Mountain Bioinformatics

We've done quite a bit of application engineering for running various life science workflows in Amazon EC2, and we're (finally) getting some of our analysis of running BLAST on Condor in EC2 up on the web, so I wanted to share. Back in 2008, we benchmarked the performance of BLAST workflows running on Condor on EC2, and gave a Poster & Lightning Talk at Rocky Mountain Bioinformatics. Results: We were able to get 1.9825x performance for every 2x the cores on a Condor cluster in EC2, roughly 99% parallel efficiency. One of the more interesting visualizations, done by one of our talented guys, Ian Alderman, shows why jobs run efficiently in high-throughput environments. Basically, the chart below shows the run times of the first (orange), second (yellow), third (green), etc. tasks running on the various processors in the cluster (the separate rows in the chart), with time running left to right. As you can see, high-throughput computing takes advantage of the fact that different tasks/jobs, in this case from the same user, take different amounts of time. As a result, over time processors balance out their usage and finish comparatively close together in run time: We will post more details about how our BLAST pipeline works in an upcoming post, but you can find the poster with more detail... read more

Life Science Leader: Cloudy Future for HPC in Life Sciences?

Earlier this week, Life Science Leader – a widely-read monthly publication for life science executives – posted an article I wrote titled "Is the Future Of High-Performance Computing For Life Sciences Cloudy?" It covers a number of different topics, including work we've done with Schrödinger on Amazon EC2 and the benefits of using Clusters as a Service on the Cloud. We think cloud-based HPC/HTC is going to change the way life science researchers get their work done, especially with applications like Schrödinger's Glide. Update: Since this came out it has also been covered by Matthew Dublin at GenomeWeb. Matthew has some other great articles, including an overview of acceleration techniques in bioinformatics and an article where Nature Biotech ponders the... read more

Re: Condor Cloud-Aware Capabilities

We had a bunch of interest in the last post we made, a correction to Stephen Wilson's blog from Sun indicating that the SGE 6.2u5 release, in December 2009, was the first cloud-aware scheduler, enabling scheduling to Amazon EC2 and scheduling of Hadoop jobs. Miha Ahronovitz posted some questions that I wanted to address specifically. First, Miha, thanks for your response about Condor's capabilities. From the questions asked and other feedback we've gotten since the first post, it seems folks may be interested in finding out more about Condor and what it can do, so I highly recommend attending CondorWeek, which the Condor Team hosts every year, or feel free to contact Cycle. CondorWeek will give you a broad view of all the cool areas where people use Condor. As I stated in the last blog, in full disclosure, we started using Condor 6 years ago, and Hadoop/SGE/PBS within the last 2 years. It may sound like this is a Condor-only vantage point, but we are a unique company in that we support these other schedulers, and truly use the proper tool for the client in various scenarios. Given Miha's questions, please allow me to provide answers about Condor's capabilities, and ask about an SGE feature I saw: "Can you use Condor to transform a private cloud from a cost center to a profit center? On-demand means billing, billing means making money." Yes. Condor was first able to manage internal VM-based infrastructure in the 6.9.4 release*, available on Sept. 5, 2007. Usage tracking for billing/chargeback has been handled by the Condor Accountant for 10-20 years... read more