Improving ALS research with Google Cloud, Schrödinger, and Cycle Computing

Today we published a case study describing how Google Cloud enabled one professor to do work she never thought possible. May Khanna, Assistant Professor of Pharmacology at the University of Arizona, studies pharmacological treatments for pain, with a particular focus on using protein binding to develop potential treatments. Using our CycleCloud™ software to manage a 5,000-core Google Cloud Preemptible VM cluster running Schrödinger® Glide™, she was able to pursue research at a scale that was previously out of reach. For this project, Professor Khanna wanted to analyze a protein associated with amyotrophic lateral sclerosis, also known as “ALS” or “Lou Gehrig’s disease”. ALS has no known cure and causes pain and eventual death for some 20,000 people in the United States every year.

read more

Simulating Hyperloop pods on Microsoft Azure

Earlier today, we published a case study and press release about some work we did with the HyperXite team from the University of California, Irvine and their efforts in the Hyperloop competition. The team leveraged CycleCloud to run ANSYS Fluent™ on Microsoft Azure Big Compute, completing their iterations in 48 hours – fast enough to adjust and modify the design, then rerun the simulations until they converged on a final solution. All for less than $600 in simulation costs. This was a case where Cloud enabled them to do something they could not have done any other way. As a bit of background, Elon Musk’s SpaceX started the Hyperloop project as a way to accelerate development of a fast, safe, low-power, and cheap method of transporting people and freight. HyperXite was one of 27 teams that competed recently. Nima Mohseni, the team’s simulation lead, used the popular computational fluid dynamics software ANSYS Fluent™ to model the pod. Key areas the team modeled related to their braking approach. Through simulation, they were able to show that they could brake using magnetic force alone, removing the need for mechanical brakes. This reduced weight, increased efficiency, and improved the overall design, which was recognized with a Pod Technical Excellence award last year. Using the CycleCloud software suite, the HyperXite team created an Open Grid Scheduler cluster leveraging Azure’s memory-optimized instances in the East US region. Each instance has 16 cores based on the 2.4 GHz Intel... read more

LAMMPS scaling on Azure InfiniBand

While public clouds have gained a reputation as strong performers and a good fit for batch and throughput-based workloads, we often still hear that clouds don’t work for “real” or “at scale” high performance computing applications. That’s not necessarily true, however, as Microsoft Azure has continued its rollout of InfiniBand-enabled virtual machines. InfiniBand is the most common interconnect among TOP500 supercomputers, and Microsoft has deployed the powerful and stable iteration known as “FDR” InfiniBand. Best of all, these exceptionally high levels of interconnect performance are now available to everyone on Azure’s new H-series and N-series virtual machines. To see how well Azure’s InfiniBand works, we benchmarked LAMMPS, an open source molecular dynamics simulation package developed by Sandia National Laboratories. LAMMPS is widely used across government, academia, and industry, and is frequently a computational tool of choice for some of the most advanced science and engineering teams. LAMMPS relies heavily on MPI to achieve sustained high performance on real-world workloads, and can scale to many hundreds of thousands of CPU cores. Armed with H16r virtual machines, we ran the Lennard-Jones (“LJ”) liquid benchmark in two scenarios: “weak scaling”, in which every core simulated 32,000 atoms no matter how many cores were utilized, and “strong scaling”, which used a fixed problem size of 512,000 atoms with an increasing number of cores. Both scenarios simulated 1,000 time steps. We performed no “data dumps” (i.e. intermediate output to disk) in order to isolate solver performance, and ran 30 test jobs per data point to obtain statistically meaningful averages. In summary, the results were impressive... read more
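The two scaling modes described above reduce to simple efficiency formulas; here is a minimal sketch, with all timings being hypothetical placeholders rather than measured Azure results:

```python
# Weak- vs strong-scaling efficiency, the two metrics used in scaling studies
# like the LAMMPS runs above. Timings here are illustrative, not measured.

def strong_scaling_efficiency(t_base, t_scaled, core_multiplier):
    """Fixed total problem size: ideal scaled time is t_base / core_multiplier."""
    return t_base / (core_multiplier * t_scaled)

def weak_scaling_efficiency(t_base, t_scaled):
    """Fixed work per core: ideal scaled time equals t_base."""
    return t_base / t_scaled

# Hypothetical: a fixed 512,000-atom job takes 100 s on 16 cores, 27 s on 64 cores.
eff_strong = strong_scaling_efficiency(100.0, 27.0, 64 / 16)
print(f"strong-scaling efficiency at 4x cores: {eff_strong:.2f}")  # → 0.93

# Hypothetical: per-core work held constant; 100 s at baseline, 125 s at scale.
eff_weak = weak_scaling_efficiency(100.0, 125.0)
print(f"weak-scaling efficiency: {eff_weak:.2f}")  # → 0.80
```

An efficiency near 1.0 at high core counts is what distinguishes a true low-latency interconnect from commodity networking on MPI-heavy codes.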

Cycle Computing Collaborates with ANSYS on its Enterprise Cloud HPC Offering

CycleCloud to provide orchestration and management for leading engineering simulation cloud offering New York, NY – (Marketwired – February 2, 2017) – Cycle Computing, the global leader in Big Compute and Cloud HPC orchestration, today announced that ANSYS has officially chosen its CycleCloud product to spearhead the orchestration and management behind the ANSYS® Enterprise Cloud™. ANSYS is the global leader in engineering simulation, bringing clarity and insight to its customers’ most complex design challenges. Many ANSYS customers require simulation workloads to be migrated to the cloud, as customers look to leverage dynamic cloud capacity to accelerate time to result, shorten product development cycles, and reduce costs. ANSYS Enterprise Cloud, an enterprise-level engineering simulation platform delivered on the Amazon Web Services (AWS) global platform using the CycleCloud software platform, enables this migration, including secure storage and data management, and access to interactive and batch execution resources that scale on demand within a virtual private cloud (VPC) for enterprise simulation. “Our collaboration with Cycle Computing enables the ANSYS Enterprise Cloud to meet the elastic capacity and security requirements of enterprise customers,” said Ray Milhem, vice president, Enterprise Solutions and Cloud, ANSYS. “CycleCloud has run some of the largest Cloud Big Compute and Cloud HPC projects in the world, and we are excited to bring their associated, proven software capability to our global customers with the ANSYS Enterprise Cloud.” CycleCloud will orchestrate cloud HPC clusters running ANSYS applications for the ANSYS Enterprise Cloud, optimizing AWS Spot instance usage and ensuring that appropriate resources are used for the right amount of time in the ANSYS... read more

New customer support portal

Our Customer Operations team has recently switched to using Freshdesk for managing customer interactions. This gives our customers one of the most-requested features: the ability for their managers to easily view tickets for the entire organization. In addition, tickets can be created, edited, and escalated via the portal at

read more

Cycle Computing accepted into Microsoft Accelerator

Last week, Microsoft announced Cycle Computing as one of 10 companies accepted into the latest group of its Accelerator program. The Microsoft Accelerator program is designed to help late-stage startups ramp up their whole organization: sales pipeline development, CEO coaching, finance, human resources, and marketing. Founded about 10 years ago, Cycle Computing has been growing strongly as use of cloud computing has grown. With cloud moving from “Why would you?” to “Why wouldn’t you?” Cycle Computing’s growth has been accelerating. Being part of this program will open even more doors.

read more

Cloud-Agnostic Glossary

No two cloud service providers are the same. This applies not only to the services they provide, but to what they call the services. At Cycle Computing, we spend a lot of time working with multiple cloud service providers; being able to abstract away small differences in providers is one of the compelling features of CycleCloud…

read more

CycleCloud Support for Elastic File System (EFS)

Last week, Amazon released the Elastic File System (EFS). EFS provides a scalable, POSIX-compliant filesystem for Amazon EC2 instances without having to run a file server. This means you can grow your storage as your usage increases instead of having to pre-provision disks. Instances mount EFS just as they would any traditional NFS volume….
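Because EFS presents itself as a standard NFSv4 share, mounting it from an instance is a one-liner; a hedged sketch follows, where the filesystem ID `fs-12345678`, the region, and the mount point are placeholders rather than real values:

```shell
# Placeholder values throughout: substitute your own EFS filesystem ID and region.
# EFS exposes an NFSv4.1 endpoint; mount it like any other NFS share.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

# Or persist the mount across reboots with an /etc/fstab entry:
# fs-12345678.efs.us-east-1.amazonaws.com:/  /mnt/efs  nfs4  nfsvers=4.1,_netdev  0  0
```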

read more

How Cloud Computing Is Helping to Conquer Cancer

Cycle Computing’s CTO, Rob Futrick, just published a great article about how The Cancer Genome Atlas (TCGA) is using Cycle Computing and cloud resources to help find markers and risk factors.  TCGA is a project within the National Institutes of Health (NIH) that uses genome sequencing and bioinformatics to catalog cancer-causing genetic mutations. You can read the full article here.  ... read more

What I learned from years in the cloud

As part of the HPC User Forum, Jason Stowe discussed some of the key benefits we have found after years of making customers successful with their HPC and large-scale computation in the cloud. In this presentation, he breaks it down into a number of key learnings. And, for the time-pressed, he touches on the summary early in the presentation. You can see a video of the talk here or watch below. ... read more

Great Financial Services cloud talk at HPC User Forum

Jeffrey Smart from AIG gave a great presentation at the HPC User Forum in Tucson, AZ, April 11-13. He described his experience moving some of AIG’s time-critical Risk Analysis workload from finite internal resources to the full flexibility of the cloud. His talk covers how they were able to expand their workflow and their offering to the business, along with some comments on the challenges of going cloud. You can see the video... read more

CycleCloud 5.5.6 adds “report_error” to simplify error reporting

At Cycle Computing, our support team knows that customers rely on our products for business-critical work. So we are continuously working to provide faster, better, and more accurate support for our customers. Our latest release includes a new feature designed to simplify issue reporting, by automating the capture and transfer of your log and environment information. This enables us to respond much more quickly – and with much less customer effort – when customers run into problems.
We’re proud to announce a new feature in CycleCloud 5.5.6: report_error.

read more

Jason Stowe, Cycle Computing CEO, presented at the 2016 Stanford HPC Conference

On February 24th, Jason helped to open the 2016 HPC Advisory Council event at Stanford University. Jason talked about how nine years of Cloud HPC experience has taught us a number of lessons about things like security, scale, workload planning, cost models, multi-user support, multi-cloud, and the difference between running one job versus multiple workflows in the cloud. InsideHPC captured the talk on video so that you can see it in its entirety, or just watch it below. ... read more

Cycle CTO Rob Futrick to detail how cloud computing enabled 15.6 years of computation based analysis on The Cancer Genome Atlas (TCGA) in six months

Chief Technology Officer, Rob Futrick, will address attendees at the Molecular Med Tri-Conference 2016, to be held March 6-11 at the Moscone North Convention Center in San Francisco.  Rob will address the Best Practices in Personalized and Translational Medicine course scheduled for Monday, March 7th.  His discussion titled, Fighting Cancer Using Cloud to Collaborate & Compute: Gene Fusion Detection Science Project, kicks off the five-pronged course at 8:10 am. Read more about his talk here.  ... read more

Running MPI applications in Amazon EC2

Despite significant improvements over the years, the same criticisms still color people’s opinion of using cloud environments for HPC. One of the most common things to hear when talking about using Amazon’s Elastic Compute Cloud (EC2) for HPC is “Sure, Amazon will work fine for pleasantly parallel workloads, but it won’t work for MPI….

read more

OpenSSL Vulnerabilities Announced, Cycle Computing Products Unaffected

On March 19, 2015, the OpenSSL developers announced a set of releases to address some high-priority security issues. Because OpenSSL is an important part of secure communication on the Internet, Cycle Computing has paid close attention to these vulnerabilities. Cycle Computing products (including CycleCloud and DataMan) do not rely on these libraries, and are therefore unaffected. ….

read more

The Quiet Revolution: How Cloud Cluster Computing Has Defined the New Speed & Agility of Business

Recently, Cycle Computing CEO Jason Stowe was featured alongside four other business leaders and innovators in The Quiet Revolutionaries, an online book published by Chef, a software automation platform company. The people featured in the book all share a vision for serving up a better customer experience, faster.

read more

HPC on Cloud: Results from the Record-breaking 156,000 Core MegaRun, and Insights to On-demand Clusters for Manufacturing Production Workloads

After a year of production workloads on AWS, HGST, a Western Digital Company, has zeroed in on how to create on-demand clusters that maximize value on AWS. HGST’s David Hinz outlines the company’s successes in adapting its operations, culture, and behavior to this new vision of on-demand clusters. In addition, the session will provide insights into leveraging Amazon EC2 Spot versus Reserved Instances to maximize value, while maintaining the flexibility and agility that AWS is known for.

read more

Video: Jason Stowe Outlines the New Enterprise Workload of Cloud HPC in AWS re:Invent 2014 Interview

Recorded live from the re:Invent 2014 (read our re:Invent blog) show floor in Las Vegas, Cycle Computing CEO Jason Stowe is interviewed by The Cube. In this in-depth conversation, Stowe covers everything from the latest Enterprise-scale Cloud cluster Cycle enabled for HGST (a Western Digital company) in November 2014 (read about the record-setting Enterprise Cloud HPC run here) to the company’s record-setting 156,000 core run conducted for researchers at the University of Southern California (USC). But all of this isn’t done to set records and build bigger Cloud clusters, Stowe outlines – it’s all in the spirit of better arming researchers & engineers with the powerful tools they need to help them do their jobs better. “The idea that you can just borrow 10-20,000 cores and give [those cores] back when you’re done is just crazy. This all leads to enabling researchers & engineers to ask the right question – at a scale [beyond] their own internal clusters, to allow them to get better answers, faster.” – Jason Stowe, Cycle Computing CEO Click below to watch and listen to the full... read more

HGST buys 70,000-core cloud HPC Cluster, breaks record, returns it 8 hours later

By Jason Stowe, CEO Today we have a very special workload to talk about. HGST, a Western Digital company, fully embraces the philosophy of using the right-sized cluster to solve a problem, but in a plot twist, they return the cluster once they’re done innovating with it. During a Friday 10:30 a.m. session (BDT311) at AWS re:Invent, David Hinz, Global Director, Cloud Computing Engineering at HGST, will talk about this extensively. He will describe a number of workloads that are run as part of HGST’s constant push to innovate and build superior drives to hold the world’s information, but some workloads are larger than others… Technical Computing: The New Enterprise Workload The folks at HGST are doing truly innovative work in technology, in part by enabling agility for technical computing workloads in engineering. Technical computing, including simulation and analytics, HPC and Big Data, is the new workload that every enterprise has to manage. One of HGST’s engineering workloads seeks to find an optimal advanced drive head design. In layman’s terms, this workload runs 1 million simulations of designs based on 22 different design parameters across 3 drive media. Running these simulations with an in-house, specially built simulator, the workload takes approximately 30 days to complete on an internal cluster. World’s Largest Fortune 500 Cloud Cluster Run First, we found out about this workload this past Wednesday, and our software ran it at scale this past weekend! To solve this problem, our CycleCloud software created a compute environment out of AWS Spot Instances. Over 50,000... read more

Cloud HPC Has Ecosystem & Tools In Place for Wide Market Adoption – Cycle Computing’s AWS re:Invent Preview

In the three years that AWS has hosted its re:Invent conference in Las Vegas, we’ve seen the Cloud HPC story go from early adopters telling us that it works, to last year’s record-breaking news around scale and performance with the MegaRun, to this year, where we’ll be demonstrating a massive shift in Cloud HPC adoption and showcasing the tools and maturing ecosystem that are enabling it.

read more

Students Surprise, Step Up as Next Generation HPC Professionals

In today’s HPC industry, we have a shortage of talented engineers, designers, and developers. One can only expect the issue to become worse as our top professionals retire. While these programs involving students are great – we need more of them. Some of these same students may be drawn toward developing social media and smartphone technologies simply because they are unaware of the role our industry plays in making the world better … in so many ways.

read more

#HPCMatters: Vote for Cancer Research, High-tech Hard Disk design, and PetaFLOP-Scale HPC Cloud Computing

HPC matters (#HPCMatters) – it’s why we do what we do. We are honored to be nominated seven times for the 2014 Annual HPCWire Readers’ Choice Awards this year. But the real credit goes to the people who make it matter: Novartis, HGST, and the University of Southern California. Check out the links detailing more of the work they performed and vote for them.

read more

Enterprise HPC Conference Takeaway: Finding the Best Answer is Most Important

It’s exciting to see a broad understanding and acknowledgement that the most important thing today is getting results and finding better answers – and this is exactly why I am so energized to be here at Cycle. While some people may call us Cloud HPC, cloud is not really the thing – and neither is HPC. Faster answers are what drive us, and today HPC and cloud happen to be an extremely promising path to a whole lot more of them.

read more

Cloud HPC – The CIO’s answer to more computing now

CIOs today are facing enormous pressure. With the combination of highly accurate simulation software and extremely powerful computing resources, the tools of today’s engineers, researchers, and quants are vastly different from the wind tunnels, test tubes, and spreadsheets of the not-so-distant past. As our ability to simulate in greater detail and with greater confidence has risen, so has the hunger for greater computing resources. High Performance Computing (HPC) has moved from the national laboratories and universities to Fortune 100 companies, and today HPC is not only a competitive advantage but a requirement to compete.

read more

Video: How Cloud HPC can deliver the computing needed to tackle the world’s hardest problems

What technology can advance rocket design, disease-fighting drugs, better computer equipment, and even climate change research at the same time? A simple answer: advanced software simulations using CAD, CAM, pharmaceutical modeling, and computational fluid dynamics powered by High Performance Computing (HPC). But digging deeper, we can see that Cloud HPC – the ability to tap into nearly limitless computing, or Utility HPC – is perhaps the biggest game-changer we’re seeing. In this presentation at ChefConf 2014, Cycle Computing CEO Jason Stowe outlines one of the biggest challenges facing us today, climate change, and suggests how Cloud HPC can help find a solution, including ideas around climate engineering and renewable energy. As proof points, Jason uses use cases from Cycle Computing customers, including HGST (a Western Digital Company), Aerospace Corporation, Novartis, and the University of Southern California. It’s clear that with these new tools that leverage both Cloud computing and HPC, researchers and designers can ask the right questions to help them find better answers, faster. This all delivers a more powerful future and a means of solving these really difficult problems.... read more

Novartis Taps Cloud HPC for Faster Drug Discovery & Better Science

Take Novartis Institutes for Biomedical Research, and the presentation they delivered at the 2014 AWS Summit in New York. In an important fight against cancer, the company leveraged the power and agility of Cloud computing to conduct 39 years of computing – in just 11 hours. They also created a computing system that would cost an estimated $44 million to build – all in the Cloud – for a cost of only $5,000.
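The headline figures imply a striking degree of parallelism; a quick back-of-the-envelope sketch, where the year-to-hour conversion is our own (assuming a 365.25-day year), not a figure from the presentation:

```python
# Back-of-the-envelope check on the headline numbers: 39 years of computing
# delivered in 11 wall-clock hours. The conversion below is ours, illustrative only.
HOURS_PER_YEAR = 365.25 * 24  # ~8,766 hours

core_hours = 39 * HOURS_PER_YEAR  # total work, expressed in core-hours
wall_hours = 11

effective_cores = core_hours / wall_hours
print(f"implied effective parallelism: ~{effective_cores:,.0f} cores")  # → ~31,079 cores
```

In other words, compressing 39 compute-years into 11 hours implies sustained access to roughly 30,000 cores at once, which is exactly the kind of burst capacity that is impractical to build in-house but routine to rent in the cloud.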

read more


Infographic: The Evolution of the Supercomputer

Computers arose from the need to perform calculations at a pace faster than is possible by hand. Once that problem was solved, the race was on to pit computers against themselves and meet ever-increasing demands for processing power. At first, the race was all about improving raw calculation speed and capabilities. Then, the challenge of solving more difficult problems led to improvements in programming models and software. Eventually, supercomputers were born, enabling scientists and engineers to solve highly complex technical problems. Whereas once supercomputers were simply mammoth machines full of expensive processors, supercomputing today takes advantage of improvements in processor and network technology. Clusters, and now even clusters on the cloud, pool the power of thousands of commodity off-the-shelf (COTS) microprocessors into machines that are among the fastest in the world. Understanding how we got here requires a look back at the evolution of computing. Early computing The earliest computers were mechanical and electro-mechanical devices, but the first high-speed computers used tube technology. Tubes were then replaced by transistors to create more reliable, general-purpose computers. The need for increased ease-of-use and the ability to solve a broader set of problems led to breakthroughs in programming models and languages, and eventually, to third-party application software solutions. By 1954, IBM offered the IBM 650, the first mass-produced computer. FORTRAN, an important language for numeric or computational programs, was developed at this time by IBM’s John Backus. In the early 1960s, general-purpose computers appeared from several suppliers. The next step was to design systems to support parallel operations, in which calculations are performed... read more

Bio-IT Recap: How Access to Cloud HPC Drives Better Science, Faster

The Better Science. Faster. theme comes from the fact that researchers & scientists can now ask the right questions by working with Cycle Computing. When their questions are no longer limited by the in-house computing resources they have, they can ask better questions, which lead to better answers … you see where this is going: Better Science. As far as Faster … nearly everything Cycle Computing helps enable is about doing things faster.

read more

HPC Quiz Winners Claim Bragging Rights, Showcase Prizes, Bask in Glory

As part of the HPC History poster, we put together a fun quiz based on historical facts and trivia. We had a lot of fun creating the quiz, and even more fun viewing the results. Before we can move on to our next project and celebrate more historical technology innovation and scientific breakthroughs, we need to recognize people who know the roots and history of HPC better than anyone.

read more

Schrödinger Materials Science Partnership a Sign of Things to Come

At Cycle Computing, we’ve had a long, successful working relationship with the software company Schrödinger. It’s always fun when you’re #winning – and not the Charlie Sheen way. Rather, together we’ve been breaking records, advancing meaningful science – and proving out Cloud computing as the future of high performance computing (HPC) for some time now. It’s about time we formalized the partnership. The things we have been doing with Schrödinger are not only exciting because of what we’ve done – but more importantly because of what they represent as far as capability for the future! Enabling better science – and breaking records. Enabling better science The thing we at Cycle Computing are most proud of is that through partners like the Schrödinger Materials Science group, we’re enabling better science. Greater access to computing power is the key. Things like fighting cancer or developing clean energy products are worthwhile causes that our joint technology is advancing. While there are a lot of factors that have been driving us beyond the tipping point in Cloud HPC adoption, I believe there are three legs to the stool making it all happen: Powerful cloud infrastructure (AWS) Highly accurate and trusted simulation software (Schrödinger) Orchestration software to enable the software to run on the Cloud (Cycle Computing) Setting Records Sure, we’ve been beating our chests lately – and for good reason. The engineering teams at Schrödinger and Cycle Computing have proven out the capabilities and scale of what’s possible on the Cloud. In fact it was really exciting to see Amazon Web Services (AWS) CTO Werner Vogels mention us and the MegaRun in the November re:Invent Keynote address:... read more

Intel Cloud Technology Announcement Features Novartis

Intel is a master gardener. When you step back and consider all the computing innovation the company has done to foster and create IT ecosystems over the years, its massive green thumb for technology is undeniable! And therein lies the reason we at Cycle Computing are very excited to see the attention Intel is putting on Cloud HPC. Even more exciting, work we’ve done with a customer, Novartis Institutes for Biomedical Research, is featured in a recent Intel announcement about this effort. Intel announced its Cloud Technology Program to help businesses identify and leverage top-performing technologies for cloud applications. Pretty cool. It seems Intel has moved above ecosystems and is now focused on clouds in the atmosphere too. As we know, when clouds get seeded – rain follows! Cycle Computing has generated a lot of attention lately due to some of our exciting Cloud HPC records, the most recent being The MegaRun: 156,000+ cores of Cloud computing over 18 hours, across five continents (all 8 AWS regions), for research being conducted at the University of Southern California (USC). BUT it’s important to understand that most of Cycle Computing’s customers are doing more modest Cloud runs on a daily basis. Novartis is one of those customers, focused on accelerating science and research in its fight against disease. We’ve been working with them for some time. They use both traditional HPC resources and the Cloud. This is how the Intel Cloud Technology Program announcement references Novartis: For example, Novartis Institutes for Biomedical Research performed an extensive analysis of instances to find that choosing a premium high-performing one would provide... read more

HPC History Quiz Still Seeks Perfect Score

The window is still open – to recognize a superstar who can demonstrate their knowledge of high performance computing (HPC) history. Despite 100+ entries, we have yet to receive a perfect score on our Test Your Knowledge of HPC History Quiz. Is that because you haven’t YET taken the quiz? Maybe. If you think you are the ONE who can pull the sword out of the stone – it’s time to step forward. I said earlier there have been no perfect scores, but that’s not exactly true. I took the quiz and aced it. But – I wrote the quiz, so that doesn’t count. I also see where a few people took the quiz twice – which is completely fine – but your first score is the only one that counts. The quiz is still live now. You can take it by visiting now – and you should. Redeem our industry – make us proud. We’ll shut down the quiz Dec. 31, 2013. This quiz is part of a broader Celebration of Supercomputing History campaign that Cycle Computing unveiled at the SC13 Conference in Denver this November. At the event we handed out limited-edition – and what I think turned out to be really cool – History of Supercomputing timeline posters. If you missed picking up a copy – stay tuned; we’ll make an announcement soon that offers you a way to check out some of the significant milestones in HPC History. In the meantime – with 2013 winding down, taking this short 20-question quiz is a perfect way to spend your time appreciating where we’ve come from, and... read more

Cycle Computing Featured in AWS Webinar Dec. 12: Accelerate Manufacturing Design Innovation with Cloud-Based HPC

We have a very exciting webinar coming up this Thursday, Dec. 12, featuring a great user speaker and real customer use cases that overview Cloud Computing & Utility HPC and how to accelerate manufacturing design. In the one-hour webinar, we’ll review how Cloud computing and Utility HPC are being used today as a competitive advantage in CAD/CAM and electronic design automation (EDA), with the ability to spin up clusters running common industry applications. Real-world manufacturing use cases will be discussed, and we’re honored to have a Cycle Computing customer from HGST, a Western Digital Company, overview his experience using HPC in the Cloud. We’ll also be showcasing details on our record-breaking MegaRun – the 156,000+ core run we received so much publicity for back in November. RSVP for the Webinar here: This webinar follows a busy November for Cycle Computing! Two of the biggest events of the year for Cloud computing and Utility HPC were held back to back: AWS re:Invent in Las Vegas, and then SC13 in Denver. One of the highlights of these events was the first PetaFLOPS-scale cloud cluster, our record-setting MegaRun – a Utility HPC milestone and the World’s Largest and Fastest Cloud Computing Run. This run was highlighted in AWS CTO Dr. Werner Vogels’ Keynote Address. Below are some details: Real-world Science Cutting-edge clean energy research thanks to Schrödinger Materials Science tools ( Simulated ~205,000 organic semiconductors to determine their efficiency as solar materials MegaRun Stats 156,000+ cores Measured at 1.21 PetaFLOPS peak throughput (not RMax) 264 years of computing in 18 hours Worldwide cloud run: Across all... read more

Back to the Future: 1.21 petaFLOPS(RPeak), 156,000-core CycleCloud HPC runs 264 years of Materials Science

Hi everyone, it's Jason, and I have an exciting environment to tell you about. If I had asked Cycle engineers to create a massive HPC cluster to supply 1.21 PetaFLOPS of compute power, their reaction would be something like this: Thankfully, instead, we asked them to just put together another BigCompute cluster to help advance materials science. A lot easier pill to swallow, isn't it? Materials Science, but Back in Time How do you use Utility HPC to advance materials science, you ask? Well, first, we started working with an amazing researcher, Professor Mark Thompson, who got his PhD in Chemistry from Caltech and now does research on organic semiconductors, specifically organic photovoltaic solar cells. The challenge in solar power has always been efficiency. Humanity has to find a material that can turn photons from the sun into electricity with as little waste as possible. The more efficient the material, the more electricity from each solar panel, and the more viable the material. The total number of potential solar panel materials is limitless, but that actually makes it that much more difficult to find the best material out of all the possibilities. As Professor Thompson puts it, "If the 20th century was the century of silicon materials, the 21st will be all organic. The question is how to find the right material without spending the entire 21st century looking for it." Hear, hear! Designing a new material that might be well-suited to converting sunshine into electricity is a difficult challenge. First, for any possible material, just figuring out how to synthesize it, purify it, and then analyze it typically takes... read more

Recognizing HPC Invention: Cycle Computing & Customers Receive 4 HPCWire Readers Choice Nominations

With so much going on, from technical sessions, to new product launches from vendors, to showcases of scientific and engineering achievement – it’s sometimes nice to step back and recognize the amazing work that’s happened over the past year. And this is exactly what the HPCwire Readers’ Choice Awards are all about.

read more

Big Conferences, Big Sciences, Big Awards

What a blur the past two months have been – in a good way. Industry conferences were in full bloom this April and May, so the Cycle team took to the road. Here are just a few of the highlights from our travels… We kicked off April with the Bio-IT World Conference & Expo in Boston. As always, it was an impressive conference in terms of the caliber of the audience and content. I had the honor of chairing the “Risk and Strategy for Pharma in the Cloud” track, as well as giving a session that highlighted real-world experiences from our Life Sciences customers. The tremendous momentum and impact Utility HPC is having in Life Sciences were further validated when Bio-IT World named our customer & partner, Schrödinger, as their Best Practices Grand Prize Winner in IT Infrastructure. Congratulations, Schrödinger! Next stop: the Big Apple for AWS Summit NYC. Hearing a shout-out from Werner Vogels during his keynote was a great start to the day. And a big thanks to Stephen Elliott, Sr. Product Manager, EC2, for inviting us to speak during his informative talk on “Optimizing Your AWS Applications and Usage to Reduce Costs.” It was great to share the success we’ve had leveraging spot instances in a wide range of use cases, including the 10,600-server cluster, or $44M worth of compute infrastructure, used to run 39 years of science for $4,372! During the NY Summit, we also announced Cycle’s newest product, DataManager, which facilitates the storage and transfer of data. Customer response has been extremely positive, and we’re proud to report that a Top... read more

Announcing ZeroCompute™: Eco-conscious, Instantaneous Utility Supercomputing

Today’s researchers are shackled by a lack of access to compute power. Novel advances in modern science and engineering demand higher-performing systems to solve increasingly difficult problems. Cycle is pleased to announce our new intelligent orchestration software to meet these challenges, called “ZeroCompute™”. ZeroCompute solves the problem of access to remote high performance cloud computing environments with a patent-pending technology designed to accelerate access to HPC and BigData systems. This approach can simultaneously manage science that might normally require brontobyte datasets and five hundred billion concurrent jobs. After years building Utility HPC & Utility Supercomputing software, Cycle’s engineers have discovered that the fastest way to execute any algorithm is to just not run it at all. Although seemingly obvious, by simply not running the science on any cores, we remove the computational challenges and data transfer bottlenecks of today’s BigCompute and BigData workloads. Because ZeroCompute “completes” the floating point operations in the nanosecond it takes the software to decide not to run anything, the system exceeds a peak floating-point performance of over one billion petaflops, at a cost of $0.00 per flop. By combining ZeroCompute with a nice hammock and an adult beverage, all researchers can now compute with ease. “This is earth-shattering technology! For years we have been extremely worried about how we can possibly compute against such large datasets,” says Dr. James Cuff, Cycle’s Chief Technology Officer. “Now with ZeroCompute, we just don’t have to worry. It’s a relief that we can kick back, and allow ZeroCompute to not run the workload for us. It’s so easy anyone can do this, simply by doing nothing!” We... read more

Servers Are Not Houseplants: Servers Are Wheat!

Once upon a time, in the early evolution of computing, there was a time of great opulence, and all was well with the kingdom. This was a time when every computer server had a title, and they were loved and anthropomorphized by the humans. The elder humans would give them mawkish little names and take tender loving care of each and every one of them, talking to them, hugging them, feeding them individually, and treating them with great care, understanding the uniqueness of each and every one. Ah, history! It teaches us so many things! These primitives were not alone; even today the British Royalty have been known to talk to their house plants! Hands up: how many of us have named our servers? We come up with all sorts of names; much inspiration has come from Star Wars, names of planets, names of beverages, and other popular motifs. Our friend Peter Cheslock started a lovely Twitter thread, which also included replies from Pinterest. Think about it: Pinterest’s entire business is about being social! Even their catchphrase is “A few (million) of your favorite things!“, but still these folks don’t name their servers; a UUID is just dandy! So why is this? Why don’t we give cutesy names to our servers any more? Our answer here at Cycle is: Scale has changed everything! The ubiquity of cloud computing and utility supercomputing clearly changes all of this. The amount of science we need to execute on a daily basis changes this. We don’t have time to nourish and look after individual computer servers, or houseplants for that matter! Accordingly: Servers... read more

Enterprise HPC in the Cloud: Fortune 500 Use Cases

Cycle Computing invited some of their customers to AWS re:Invent to talk about the amazing science, analytics, and discovery they are executing at scale in concert with Cycle technology platforms. Our segment was called “Enterprise HPC in the Cloud: Fortune 500 Use Cases“. First up, Jason Stowe, Cycle Computing CEO, gives a wonderful overview of why Utility Supercomputing really matters for each and every one of us! We also thought it would be wonderful to share our customers’ activities with you so you may see what leading players in the field of Utility Supercomputing are doing on a day-to-day basis. There are four amazing talks below; each does an excellent job of highlighting the challenges and solutions for its domain. The talks describe the use of utility supercomputers to solve real-world problems in Life Science, Finance, Manufacturing, Energy, and Insurance. The latest and greatest in advanced workloads are discussed that are helping push these industries and technologies forward through the insights gained from big data and high performance computing, including a sneak peek into what they see as coming next. First up, Kurt Prenger and Taylor Hamilton of J & J High Performance Computing. Challenges: Disparate internal clusters, lack of resources for large problem execution, and home-grown systems pain. Solutions: Cycle Computing SubmitOnce – jobs are routed to internal and external clusters via configurable intelligence based on cluster loads, data size, and estimated runtimes. A single entry point for users without having to understand complex backend systems. Next up: David Chang, Assistant Vice President of Pacific Life. Challenges: Increasing complexity in product design with... read more

Built to scale: 10,600-instance CycleCloud cluster, 39 core-years of science: $4,362!

Here is the story of a 10,600-instance (i.e., multi-core server) HPC cluster created in 2 hours with CycleCloud on Amazon EC2, with one Chef 11 server and one purpose in mind: to accelerate life science research relating to a cancer target for a Big 10 pharmaceutical company. Our story begins… First, when we got a call from our friends at Opscode about scale-testing Chef, we had just the workload in mind. As it happened, one of our life science clients was running a very large scale run against a cancer target. And let us tell you, knowing that hardcore science is being done with the infrastructure below is a very satisfying thing: AWS Console Output That’s right, 10,598 server instances running real science! But we’re getting ahead of ourselves… Unfortunately, we’re a bit limited in what parts of the science we can talk about, other than to say we’ve done a large-scale computational chemistry run to simulate millions of compounds that may interact with a protein associated with a form of cancer. We estimated this would take about 341,700 compute hours. This is very cool science, science that would take months to years on available internal capacity! More on that later… Thankfully, our software has been doing a lot of utility supercomputing for clients, and as we mentioned last week, because of this we’re hiring. So to tackle this problem, we decided to build software to create a CycleCloud utility supercomputer from 10,600 cloud instances, each of which was a multi-core machine! This makes this cluster the largest server-count cloud HPC environment that we know about, or has... read more

Right People, Right Place, Right Time – Cycle Hiring to meet explosive demand

Dear HPC, HTC, and Cloud communities, we'd like your attention, please, for a moment. Cycle has had the good fortune to do some amazing things over the years, but now we've hit a serious growth curve, so we're delighted to say: We're hiring. And not in a small way… Every month, every quarter, from here on out. At Cycle we believe that many of the world’s scientific, engineering, and finance minds are shackled by a lack of access to compute power. Our products connect people to computing resources at any scale, when it’s needed most. Many Fortune 500s, start-ups, and public research customers use our software to run mission-critical science, risk management, and engineering, and now we're growing our team of game-changers. Cloud HPC/HTC and utility supercomputing are here, and we're leading that charge: We have a full product set, and we're rolling. We're hiring in many functions: engineering, sales, business development, operations, and more. We're helping customers do amazing work; check out the coverage in Wired, the NY Times, BusinessWeek, or the Wall Street Journal. So if you're an overachiever, if you want to be at a company helping the world fight disease, design better/safer products, or ensure returns for retirement funds that are decades in the making, then we want you. You're honest, you’re smart, you get things done, and you want to join a team at this 100% employee-owned software company. So check out our current job listings, send an e-mail to jointheteam -at- with the job title in the subject, and your resume. Come interview, show us your strengths in engineering, sales, business development, sales... read more

Fortune 500s discuss Cloud HPC, Utility Supercomputing @ Cycle’s re:Invent session

As many of you know, at Cycle we think that giving every researcher, scientist, engineer, and mathematician access to the compute power they need, exactly when they need it, will enable humanity to achieve what it's truly capable of. So we organized five Fortune 500 Cycle customers of ours to talk at AWS re:Invent at 1pm Wednesday the 28th about Cloud, HPC, and utility supercomputing. Whether it's building safer medical devices, managing risk, quantifying genomes at scale, protecting hard-earned retirement funds, or finding medicines to cure disease, they'll be talking about how they use Cloud to do it! At 1pm tomorrow (Wednesday), come to "Enterprise HPC in the Cloud: Fortune 500 use cases" in room 3401A to see: HartfordLife, Johnson & Johnson, Life Technologies, Novartis, and PacificLife. If you can't make the session, come to Cycle's Booth #220, and we can talk more... read more

BigData, meet BigCompute: 1 Million Hours, 78 TB of genomic data analysis, in 1 week

It seems like every day at Cycle we get to help people do amazing work, but this week is a little different. This week we wrapped up our involvement in the amazing work by Victor Ruotti of the Morgridge Institute for Research, winner of the inaugural Cycle Computing BigScience Challenge. In the name of improving the indexing of gene expression in differentiated stem cells, Cycle's utility supercomputing software just finished orchestrating the first publicly disclosed 1,000,000+ core-hour HPC analysis on the Cloud. Yes, that’s 1 million hours, or over a ComputeCentury™ of work, on a total of 78 TB of genomic data, in a week, for $116/hr! To put this 115 years of computing into context, the word ‘computer,’ meaning an electronic calculation device, was first used in 1897. So if you had started this run on a one-core computer when the term was first used, and kept it running through World War I, Jazz, the Roaring ’20s, the Great Depression, WWII, Big Bands, the start of Rock’n’Roll, the Cold War, the Space Race, the Vietnam War, Disco, the 80s, grunge, techno, hip hop, reality TV, and up to Gangnam Style, Victor’s analysis would be finishing now, sometime in 2012. Now that’s a lot of compute. Below, we're going to explain the details of the analysis and how it was executed, but if you're short on time, please skip to why this is important. Cycle Computing BigScience Challenge Overview About a year ago we were very excited to announce the Challenge, a contest aimed at breaking the computation limits for any researchers working to answer questions that will help humanity.... read more
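A quick back-of-the-envelope check of the ComputeCentury figure (a sketch; the exact year count depends on how you round and whether you count leap years):

```python
# Back-of-the-envelope: how many single-core years is 1,000,000 core-hours?
HOURS_PER_YEAR = 24 * 365.25  # 8,766 hours, averaging in leap years

core_hours = 1_000_000
years = core_hours / HOURS_PER_YEAR
print(f"{core_hours:,} core-hours ≈ {years:.1f} years on a single core")
```

That lands at roughly 114 years, comfortably over a century of single-core computing delivered in a single week.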

Free Grill v1.5 and Grill Recipe Challenge announced!

We’re pleased to announce the release of Grill version 1.5 for Opscode Chef! And it’s FREE! We’ve added a few features that came in useful during our last Utility Supercomputing run, code-named Naga, including:
- New customizable reports
- Backported the Chef alert system to the CycleServer core
- Greatly improved performance for large data sets
- Hosts are now removed after a certain amount of inactivity
But we have a dirty secret: we don’t have a publicly available cookbook to install it. We know, we know. We wrote a Chef report visualization tool that can handle several thousand nodes all converging at once, but we don’t have an installer cookbook. Well, we do actually have an installer, but it’s tightly coupled to CycleCloud and we just haven’t had time to make a rock-solid, generally useful cookbook. So we’d like to announce the Grill Recipe Challenge. Write an awesome Grill cookbook and we’ll send you a Cycle Computing mug and T-shirt. Here are the simple rules:
- Submit your cookbook to
- Release your cookbook under the Apache license.
- We’ll accept entries until 11:59 PM PST on June 12th, 2012.
- We’ll pick the best one and send the winner a sweet Cycle mug and T-shirt (our decisions are final! 😉)
So if you need to explore Chef converge data, download Grill today, kick the tires, write an awesome cookbook, and win fabulous prizes…... read more

CycleCloud Achieves Ludicrous Speed! (Utility Supercomputing with 50,000-cores)

Update: Since publishing this blog entry, our 50,000-core CycleCloud utility supercomputer has gotten great coverage from BusinessWeek, The Register, the NY Times, the Wall Street Journal’s CIO Report, Ars Technica, and The Verge, among many others. And now it would run for $750/hr with the AWS spot pricing as of 6/22/2012! Click here to contact us for more information… By now, we've shown that our software is capable of spinning up cloud computing environments that run at massive scale and produce real scientific results. After some of our previous efforts, we realized we were onto something with the CycleCloud Cloud HPC and Utility Supercomputing concept. However, even we underestimated the scales researchers would want to use and the scope of the research this would impact. Among the requests were some from a leader in computational chemistry research, Schrödinger. In collaboration with Nimbus Discovery, they needed to virtually screen 21 million molecule conformations, more than ever before, against one possible cancer target using their leading docking application, Glide. And they wanted to do it using a higher-accuracy mode early in the process, which wasn’t possible before because it is so compute-intensive! This is exactly what we did with our latest 50,000-core utility supercomputer that CycleCloud provisioned on Amazon Web Services, code-named Naga. And Schrödinger/Nimbus got useful results they wouldn't have seen without utility supercomputing. We will describe how we accomplished this below, and in future articles and blog posts. From a scale perspective, the most revolutionary concept implemented for Naga was scaling out all the components of an HPC environment. In our previous megaclusters, we performed a great deal of optimization... read more

Astounding Science: the CycleCloud BigScience Winners

“If we did all the things we are capable of, we would literally astound ourselves.”  — Thomas A. Edison It is with this quote in mind that we get to do something wonderful. Today, we will honor the winner of the CycleCloud BigScience Challenge 2011. What is the CycleCloud BigScience Challenge? Last October, we defined "utility supercomputing" and challenged researchers to break out of a common habit: limiting the questions they ask to the size of their local compute environment. Instead we asked them to propose big questions whose answers can move humanity forward. We offered the winner use of “utility supercomputing”, providing resources at the scale of the Top500 supercomputing list, to run their BigScience for a few hours, then turn it off. With Cycle offering $10,000 in time, and Amazon adding another $2,500, the winner will have the equivalent of 10 hours on a 30,000-core CycleCloud cluster. We announced the Finalists at Supercomputing 2011, and had a group of industry luminaries agree to be judges, including Kevin Davies of Bio-IT World, Matt Wood of AWS, and Peter Shenkin of Schrödinger. Many thanks to all of you for your help. The Finalists And then we saw the Finalists' presentations. The experience was inspiring. The finalists all sought to tackle BigScience problems that only utility supercomputing could help with. And it was awesome! So it is with great pleasure that I recognize the Finalists who presented: Alan Aspuru-Guzik & Johannes Hachmann, Harvard Clean Energy Project; Jesus Izaguirre, University of Notre Dame; Victor Ruotti, Morgridge Institute for Research; Martin Steinegger, TU Munich ROSTLAB... read more

CycleCloud BigScience Challenge Finalists = +5 for humanity

So we announced the finalists of the CycleCloud BigScience Challenge 2011 at Supercomputing 2011 in Seattle last night. Finalists were selected based on their proposal’s long-term benefit to humanity, originality, creativity, and suitability to run on CycleCloud clusters. We're excited to announce that in addition to the $10,000 Grand Prize and $500 for Finalists, the awesome folks over at Amazon Web Services are adding their own $2,500 for the Grand Prize and $1,000 per Finalist, bringing our totals to $12,500 for the BigScience award and $1,500 per Finalist. With $12,500 we will be able to do about 10 hours on a 30,000-core environment, or 30 hours on 10,000 cores. The finalists will be judged by me and a panel of industry luminaries, including Kevin Davies, editor-in-chief, Bio-IT World; Matt Wood, technology evangelist for Amazon Web Services; and Peter S. Shenkin, vice president, Schrödinger. Thanks to Kevin, Matt, and Peter for helping judge the winner. Picking Finalists from all the entrants we got was hard enough. We're going to have our hands full picking a winner from all this great research! So without further ado, below are the Finalists for the inaugural CycleCloud BigScience Challenge: Alan Aspuru-Guzik, Harvard Clean Energy Project, professor in the department of chemistry and chemical biology, and Johannes Hachmann, postdoctoral fellow: Hachmann and Aspuru-Guzik wish to conduct computational screening and design of novel materials for the next generation of organic photovoltaics (OPVs). The goal is to facilitate creating the next generation of photovoltaic cells. Jesus Izaguirre, University of Notre Dame, associate professor of computer science and engineering and concurrent associate professor of applied and computational mathematics... read more

CycleCloud BigScience Challenge Finalists Announced at SC11 Amazon Booth #6202 on Monday at 8PM

We got a great response to the CycleCloud BigScience Challenge 2011, and wanted to thank all of the entrants. The entries included calculations to fight Alzheimer's, fight Parkinson's, help understand stem cell differentiation, and improve photovoltaics to make greener energy. It's going to be hard to pick Finalists! We are also excited that the gracious folks at Amazon Web Services are adding to the prize money that Cycle is providing. Amazon's contribution will help us get even more BigScience done in shorter periods of time using utility supercomputing. We'll announce the Finalists, the final Judges, and more details about the additional prizes, at 8pm Monday, November 14th, at SC11 at the Amazon Booth #6202. Additional information will be available at Cycle Computing's Booth #443 throughout the show and on the BigScience Challenge Website. Thanks again to the entrants. We're very excited about these scientists using utility supercomputing to perform research that will benefit humanity. We'll see you at... read more

Mad Scientist could win CycleCloud BigScience Challenge…

Just kidding, he's just a potential finalist! 😉 As some of you may know, Cycle wants to help scientists answer big research questions that might help humanity by donating compute time using our utility supercomputing software. But amid the overwhelming response we've gotten to the CycleCloud BigScience Challenge we announced last week, we repeatedly get the question, "What kind of research benefits humanity?" And the answer isn't Dr. Evil researching "sharks with frickin' laser beams"! Let's highlight a couple of the entries already received that might move us forward: There is the researcher doing quantum mechanics simulations for materials science to improve solar panel efficiency, which might help "electrify 2.5 Billion people" with greener energy. Or the computational biologist who wants to use meta-genomics analysis to create a knowledgebase indexing system for stem cells and their derivatives, helping us "speed development of personalized cell-based therapies". Very exciting! Maybe you analyze public government data to provide clarity. Or you research science that might help in the race to treat Alzheimer's, Cancer, and Diabetes. Or you're simulating ways to more efficiently distribute food in places that need it. There are plenty of utility supercomputing applications ahead of us that could benefit humanity, and now's your chance to start. Remember, entries are due November 7th. So come join us. There are just four questions between you and the equivalent of 8 hours on a 30,000-core cluster. So submit early & submit often, and let's change the speed at which BigScience gets done! Jason Stowe, CEO, Cycle... read more

CycleCloud BigScience Challenge 2011

I’m planning on offering the equivalent of 8 hours on a ~30,000-core cluster ($10,000 in free CycleCloud time) to help researchers answer questions that will help humanity. But before we get there, let’s talk about why: Recently, Cycle got significant press coverage for using CycleCloud to create a ~30,000 core cluster on Amazon EC2 for a Top 5 Pharma to run research. It hit Ars Technica, then Slashdot, then Engadget (which had a great depiction of Nekomata, btw), then Wired's CloudLine … unreal. (Update: Now Forbes and Wired too!) In reading all the comments and questions, like "Would this run high-res Crysis?" or "How much capacity does Amazon have?", a thought went through my mind: I became concerned. Concerned because these shouldn't be the questions we're asking. I worried that in all this glitter, we would miss what is truly gold: that this type of computing can speed up scientific research and solve problems we’d traditionally never dream of tackling. So I'm writing this to introduce a new concept to you, and implore you to think about how to move the human race forward through science and research. To start, let's answer a question implied in many comments: Why is this important? For years cloud computing has been about paying for what you use, and accessing the compute power you need, when you needed it. The problem is, today, researchers are in the long-term habit of sizing their questions to the compute cluster they have, rather than the other way around. This isn’t the way we should work. We should provision compute at the scale the questions need. We're... read more

New CycleCloud HPC Cluster Is a Triple Threat: 30,000 cores, $1,279/Hour, & Grill monitoring GUI for Chef

Update: Wow, we've gotten tremendous feedback from this run on Ars Technica, Wired, and others, and man has it been a busy few days. Many people did ask a question that we wanted to clarify:
Q: How long would the run take in-house vs. in CycleCloud?
A: The clients indicated the workload would never have happened in-house because it would have consumed everything they had for week(s). The in-CycleCloud run time was 7-8 hours.
=========================
In more ways than one, the Nekomata cluster is three times as impressive as our last public mega-cluster. A few months ago, we released details of the Tanuki cluster, a 10,000-core behemoth launched within AWS with the click of a button. Since then, we have been launching large clusters regularly for a variety of industries. We kept our eye open for a workload large enough to push us to the next level of scale. It didn’t take very long. We have now launched a cluster 3 times the size of Tanuki, or 30,000 cores, which cost $1,279/hour to operate for a Top 5 Pharma. It performed genuine scientific work — in this case molecular modeling — and a ton of it. The complexity of this environment did not necessarily scale linearly with the cores. In fact, we had to implement a triad of features within CycleCloud to make it a reality:
1) MultiRegion support: To achieve the mind-boggling core count of this cluster, we launched in three distinct AWS regions simultaneously, including Europe.
2) Massive Spot instance support: This was a requirement given the potential savings at this scale by going through the spot... read more

Fast and Cheap, pick two: Real data for Multi-threaded S3 Transfers

Gentlemen, start your uploads! They're free now, but how fast can we do them? Lately we’ve been working with clients solving big scientific problems with Big Data (Next Generation Sequencing analysis is one example), so we’ve been working hard to transfer large files into and out of the cloud as efficiently as possible. We’re optimizing two costs here: money and time. Lucky for us, Amazon Web Services continues to drive down the costs of data transfer. We were excited to see that all data transfer into AWS will be free as of July 1st! They’re also reducing the cost to transfer data out of AWS. Less money, more science, yes! We still need to optimize for time, however. The scalability of the Elastic Compute Cloud (EC2) means we can throw as many cores at a scientific problem as we can afford in a very short time. But what if our input or result data is so large that the time to transfer it far outweighs the time to analyze it? Our previous work has shown that file transfers often do not fill the pipe to capacity, and are often limited by disk I/O and other factors. Therefore, we can speed transfers by using multiple threads to fill the pipe. As shown above, this work involved moving data directly to a file system using rsync. But since that time, we’ve begun to rely upon the Simple Storage Service (S3) as both a staging area and long-term storage solution for input and result data. S3’s availability and scalability are far superior to even striped Elastic Block Store volumes running on... read more
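The multi-threaded idea can be sketched without any S3 specifics: split a large file into parts and push them from a thread pool so the pipe stays full. In this sketch, `upload_part` is a hypothetical stand-in for an S3 multipart part upload (the real call would go over HTTP, e.g. via a client library); it simply reads its byte range:

```python
import concurrent.futures
import os

PART_SIZE = 8 * 1024 * 1024  # 8 MiB per part

def upload_part(path, part_number, offset, size):
    """Stand-in for an S3 multipart part upload; reads one byte range.

    A real implementation would send `data` as part `part_number`.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size)
    return part_number, len(data)

def threaded_upload(path, max_workers=16):
    """Split `path` into PART_SIZE chunks and 'upload' them concurrently."""
    total = os.path.getsize(path)
    offsets = range(0, total, PART_SIZE)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_part, path, i + 1, off, min(PART_SIZE, total - off))
            for i, off in enumerate(offsets)
        ]
        results = [f.result() for f in futures]
    # Sorted part list, as you would send in a multipart "complete" call.
    return sorted(results)
```

Because each thread works an independent byte range, a slow disk read or a half-full TCP connection on one part doesn't stall the others, which is exactly the effect described above.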

Why Baking Your Cluster AMI Limits the Menu: DevOps for HPC clusters

You may have read our last blog post about Tanuki, the 10,000-core HPC supercomputer we built to predict protein-protein interactions. We’re back to tell you a little bit about how we provisioned the 1,250 c1.xlarge instances that made up Tanuki. In fact, it’s the same technology that builds all of our CycleCloud instances, whether you select a single-node stand-alone Grid Engine cluster or a super kaiju Condor cluster like Tanuki. But before we get into how we do things today, let’s talk about where we’ve been and what we’ve learned. Pre-Built Custom Images: Basic Cloud Cluster “Hello World” It seems everyone's first foray into building HPC clusters in a public virtual cloud (like Amazon’s EC2) involves baking a specialized image (AMI) complete with all the tools required to handle the workload. The most basic architecture includes software for pulling work from a queue or central storage location (e.g., Amazon’s SQS or S3), running the work, and pushing the results back. If you’re feeling especially clever, you may even use a scheduler like Condor, SGE, or Torque. This first cluster comes up fast, but like all first attempts, probably has some bugs in it. Maybe you need to fix libraries to support your application, add an encrypted file system, or tweak your scheduler configuration. Whatever the case, at some point you’ll need to make changes. If you’ve got just one cluster with a handful of nodes, making these changes manually can be done, but it’s a pain. Alternatively, you can make your changes, bake new images, and restart your cluster with the new images. This is... read more

Single click starts a 10,000-core CycleCloud cluster for $1060/hr

Update: This cluster received great coverage, including Amazon CTO Werner Vogels' kind tweet, customer commentary on this Life Science cloud HPC project, & results from our EC2 HPC Cluster. Meet our latest CycleCloud cluster type, Tanuki. Created with the push of a button, he weighs in at a hefty 10,000 cores. Yes, you read that right. 10,000 cores. Tanuki approximates #114 on the 2010 Top 500 supercomputer list in size, and cost $1,060/hr to operate, including all AWS and CycleCloud charges, with no upfront costs. Yes, you read that right. 10,000 cores costs $1,060/hr. Here are some statistics on the cluster:
Scientific Need = 80,000 compute hours
Cluster Scale = 10k cores, 1,250 servers
Run-time = 8 hours
User effort to start = Push a button
Provisioning Time = First 2,000 cores in 15 minutes, all cores in 45 minutes
Upfront investment = $0
Total Cost (IaaS & CycleCloud) = $1,060/hr
This historic supercomputer, built completely in the cloud, drew its first breath minutes after the push of a button. Tanuki started operations through a completely automated launch using our CycleCloud℠ service. It ran for 8 hours before the job workflow ended and the cluster was shut down. The 8-hour run-time across 10,000 cores yielded a treasure trove of scientific results for one of our large life science clients. The ability to run a cluster of this size for $1,060/hr, including AWS and CycleCloud charges, is mind-boggling, even to those of us who have been in the cloud HPC business for a while. When Tanuki was first mentioned within Cycle, its scale was thrown out partly as a... read more
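The economics quoted in the post reduce to a simple per-core-hour figure; a quick sanity check using those numbers:

```python
# Tanuki economics, using the figures quoted in the post.
cores = 10_000
cost_per_hour = 1060  # USD/hr, AWS + CycleCloud charges combined
run_hours = 8

core_hours = cores * run_hours            # total compute delivered
total_cost = cost_per_hour * run_hours    # cost of the whole run
per_core_hour = total_cost / core_hours   # effective $/core-hour
print(core_hours, total_cost, round(per_core_hour, 3))  # 80000 8480 0.106
```

So the 80,000 compute hours of science cost about $8,480 end to end, or roughly 10.6 cents per core-hour, with zero upfront investment.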

CondorWeek 2011 T-Shirts

Every year it seems like the Condor community will run out of ideas for CondorWeek t-shirts, but we're happy to say that still hasn't happened! Check out the list of t-shirt ideas below, and if you have a new idea, please comment! Suggestions so far for 2011:

1. I knew I had a problem when I found myself looking at my phone thinking, "I could run Condor on that!"
2. JPMC uses Condor, BearStearns didn't. hmmmm…
3. My Condor pool's so big, my pool has a pool (hdfs), and even my pool's pool is bigger than your pool…
4. Ray. Ray! The next time someone asks if you're running Condor, you say YES!
5. I don't often run 10,000 core clusters, but when I do, I use Condor. Stay compute-thirsty, my friends.
6. "Look at your cluster, now back to me, now look at your cluster, now back to me. I'm Condor, the scheduler your cluster could schedule like."
7. That's no moon, that's Purdue's Condor cluster!
8. Condor: Slots that actually pay out!
9. Dr. Condormatch (Or How I Learned To Stop Worrying And Love My Cluster)
10. Condor: "ZKM" WTF?
11. There Can Be Only One condor_master
12. Condor: finding particles, curing disease, and encoding mp3s since 1988.
13. You want the slots? You can't handle the slots!
14. Kirk: Khhaaaaaaaannnnndooooorrrr!

EVERY VOTE COUNTS! If you like one of these over the others, please comment on our blog with your preferences. You don't have to use your name, but please use your e-mail so we can tell if we're using your preference/idea, and get your... read more

Lessons learned building a 4096-core Cloud HPC Supercomputer for $418/hr

The Challenge: 4096-core Cluster Back in December 2010, we discussed running a 2048-core cluster using CycleCloud, which was in effect renting a circa-2005 Top 20 supercomputer for two hours. After that run, we were given a use case from a client that required us to push the boundary even further with CycleCloud. The challenge at hand was running a large workflow on a 4096-core cluster: could our software stand up a 4096-core cluster and resolve any issues in getting it running? Cycle engineers accepted the challenge and built a new cluster we’ll call “Oni”. The mission of CycleCloud is to make running large computational clusters in the cloud as easy as possible. There is a lot of work that must happen behind the scenes to provision clusters both at this scale and on-demand. What kinds of issues did we run into as we prepared to scale out the CycleCloud service from building a 2048-core cluster up to a whopping 4096-core Oni cluster? This post covers four questions: Can we get 4096 cores from EC2 reliably? Can the configuration management software keep up? Can the scheduler scale? How much does a 4096-core cluster cost on CycleCloud? Question 1: Can We Get 4096 Cores from EC2 Reliably? We needed 512 c1.xlarge instances (each with 8 virtual cores) in EC2’s us-east region for this workload. This is a lot of instances! First, we requested that our client’s EC2 instance limit be increased. This is a manual process, but Cycle Computing has a great relationship with AWS and we secured the limit increase without issue. However, an increased instance... read more

64 GPUs, $100, and a Dream: Practical GPU on EC2 Experience

When we last spoke about GPUs on our blog, it was during the SuperComputing 2010 conference, when AWS announced their new cg1.4xlarge instance type. The response to our benchmarks for the Amazon CG1 instance for SC 2010 was phenomenal. As a quick review, cg1.4xlarge is the typical AWS “Cluster Compute” instance extended with a pair of Nvidia M2050 GPUs, 22 GB of memory, and a 10 Gbps Ethernet interconnect. Since we first published our Amazon GPU on CycleCloud benchmarks, the phone has been ringing off the hook at Cycle with interest in automatically creating clusters with shared file systems using CG1, high-memory, and high-CPU instance types. As an example, we’ve created a 32-node / 64-GPU cluster that ran molecular dynamics apps in 1 month instead of 5 months, thanks to the Tesla GPUs. When combined with the 8 TB filer, this particular cluster costs less than $100 per hour to operate, and took about 10-15 minutes to spin up initially. Given all this experience in automating clusters, we thought it was high time we shared some of what we found. First, we'll cover the whats and whys of GPU clusters in the cloud, then get into some data about our experience, and cover our costs in detail. Overview As a quick background, Amazon’s EC2 offerings now include the cg1.4xlarge instance type, the typical Cluster Compute Instance (CCI) extended with a pair of Nvidia M2050 GPUs. Access to the GPUs is through the standard CUDA toolkit installed on top of a CentOS 5 release. From an application development perspective nothing is different; you write your applications... read more

HowTo: Save a $million on HPC for a Fortune100 Bank

In any large, modern organization there exists a considerable deployment of desktop-based compute power. Those bland, beige boxes used to piece together slide presentations, surf the web and send out reminders about cake in the lunch room are turned on at 8am and off at 5pm, left to collect dust after hours. Especially with modern virtual desktop initiatives (VDI), thin clients running Linux are left useless, despite the value they hold from a compute perspective. Fortune 100 Bank Harvesting Cycles Today we want to educate you about how big financial services companies use desktops of any type to perform high throughput pricing and risk calculations. The example we want to leverage is from a Fortune 100 company, let's call them ExampleBank, that runs a constant stream of moderate-data, heavy-CPU computations on their dedicated grid. As an alternative to dedicated server resources, running jobs on desktops was estimated to save them millions in server equipment, power and other operating costs, and London data center space, thanks to open source software that has no license costs associated with it! Cycle engineers worked with their desktop management IT team to deploy Condor on thousands of their desktops, all managed by our CycleServer product. Once deployed, Condor falls under control of CycleServer and job execution policies are crafted to allow latent desktop cycles to be used for quantitative finance jobs. Configuring Condor Condor is a highly flexible job execution engine that can fit very comfortably into a desktop compute environment, offering up spare cycles to grid jobs when the desktop machine is not being used for its primary role. Our... read more
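In Condor terms, a desktop-harvesting policy like ExampleBank's boils down to a few startd expressions. A minimal sketch with illustrative thresholds and macro names of our own choosing, not ExampleBank's actual settings:

```
# Illustrative desktop-harvesting policy (thresholds are hypothetical)
MINUTE        = 60
NonCondorLoad = (LoadAvg - CondorLoadAvg)
DesktopIsIdle = (KeyboardIdle > 10 * $(MINUTE)) && ($(NonCondorLoad) < 0.3)

START    = $(DesktopIsIdle)              # only claim truly idle desktops
SUSPEND  = (KeyboardIdle < $(MINUTE))    # user came back: pause the job
CONTINUE = $(DesktopIsIdle)              # user left again: resume
PREEMPT  = (Activity == "Suspended") && \
           ((CurrentTime - EnteredCurrentActivity) > 30 * $(MINUTE))
```

KeyboardIdle, LoadAvg, CondorLoadAvg, Activity, and EnteredCurrentActivity are standard machine ClassAd attributes; the thresholds and helper macros are ours.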

Creating A 2048-Core HPC Cluster in Minutes on AWS for a $525 job

World, meet Okami. Okami, meet World. We do a lot of work, at very large scales, on HPC in the cloud, and today we’d like to introduce you to a decent-sized HPC cluster we recently worked on: let’s call it ‘Okami’. Okami has a number of components familiar to those who have worked with internal HPC environments: 2048 cores, shared storage, and a scheduling system. Had Okami been born in 2005 rather than 2010, he’d be among the Top 20 largest computers of the time. But the similarities between Okami and internal clusters end there. First, Okami was provisioned, from start to finish, by CycleCloud in under 30 minutes! And more importantly: when calculations were done, the nodes were shut down, and the user paid only $525 to access this 2048-core cluster! As many of our readers know, we built CycleCloud in 2007 and it was the first system to automate the process of creating complete compute clusters on virtual infrastructure. It is the easiest and fastest way to deploy traditional HPC clusters in the Cloud. Creating an HPC environment in EC2 without security is not burdensome, but CycleCloud automates: provisioning cluster nodes with dependencies, setting up scheduling correctly and securely, patching/maintaining OS images, setting up encryption, managing the encryption keys, administering cluster users, tracking audit information, deploying/optimizing shared file systems, application deployment, scaling appropriately based upon load, connecting to your license management software, and keeping on top of all the latest and greatest Cloud infrastructure and features. So when a very large life science research organization asked us to create a 2048-core cluster in EC2 to make... read more

Benchmarks for the brand new Cluster GPU Instance on Amazon EC2

A Couple More Nails in the Coffin of the Private Compute Cluster Update: We're getting an overwhelming response to this entry; if you have questions, come to booth #4638 at Supercomputing 2010. Cycle Computing has been in the business of provisioning large-scale computing environments within clouds such as Amazon EC2 for quite some time. In parallel, we have also built, supported, and integrated internal computing environments for Fortune 100s, universities, government labs, and SMBs with clusters of all shapes and sizes. Through work with clients including JPMorgan Chase, Pfizer, Lockheed Martin, and Purdue University, among others, we have developed a keen sense for the use cases that are most appropriate for either internal or external computing. More and more, we see the lines blurring between internal and cloud performance. This is good news for end users who want the flexibility to consume resources both internally and externally. During the past few years it has been no secret that EC2 has been the best cloud provider for massive-scale but loosely connected scientific computing environments. Thankfully, many workflows we have encountered have performed well within the EC2 boundaries, specifically those that take advantage of pleasantly parallel, high-throughput computing workflows. Still, the AWS approach to virtualization and available hardware has made it difficult to run workloads that require high-bandwidth or low-latency communication within a collection of distinct worker nodes. Many of the AWS machines used CPU technology that, while respectable, was not up to par with the current generation of chip architectures. The result? Certain use cases simply were not a good fit for EC2 and were easily beaten... read more

Make the Most of Your AWS Instances: Using open-source Condor to Harvest Cycles, Part 2

How To – Harvest Cycles From Your AWS App Servers, Part 2 In Part 1 of this series I introduced you to AmazingWebVideo Inc. They’re a successful, Amazon EC2-based application provider who wants to get more out of their rented processors. Specifically, they want to harvest unused compute cycles from various application servers in between bursty, end-user traffic. We introduced them to Condor in Part 1 and helped them move three classes of background processing jobs from a simple queuing system to Condor in preparation for cycle harvesting. Now let’s take a look at how Condor, installed on their application servers, can help them accomplish this goal. In our existing Condor pool, our machines are set to always service jobs. Since the only processing load these machines experience comes directly from running Condor jobs, this setup is fine. But our application servers won’t be running under Condor’s control. Condor needs to pay attention to load outside of Condor’s control and only run jobs when this load is suitably low. We’ll use Condor’s START attribute and ClassAd technology to write an expression that controls when these machines should run jobs. But first let’s decide how we want the jobs to run on these machines. There is a whole spectrum of choice here, and it helps to think about it in advance of writing your run-time policies in Condor configuration files. Policy Time There are four state changes around which we need to develop policy: “When can Condor run jobs on this machine?”; “When should Condor suspend jobs it may be running?”; “When should Condor resume running suspended jobs?”; and “When should... read more
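The four policy questions above map onto four startd expressions. Here is a minimal sketch with illustrative load thresholds, not a tuned production policy; since app servers are headless, keyboard idle time plays no role:

```
# Harvest cycles only when load from outside Condor is low
# (all thresholds below are illustrative)
NonCondorLoad = (LoadAvg - CondorLoadAvg)

START    = $(NonCondorLoad) < 0.3    # machine is quiet: accept jobs
SUSPEND  = $(NonCondorLoad) > 1.0    # traffic burst: pause running jobs
CONTINUE = $(NonCondorLoad) < 0.3    # burst over: resume suspended jobs
PREEMPT  = (Activity == "Suspended") && \
           ((CurrentTime - EnteredCurrentActivity) > 20 * 60)  # give up after 20 min suspended
```

Subtracting CondorLoadAvg from LoadAvg is the standard trick for measuring only the load Condor itself did not create.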

Experience with Data Transfer into EC2

Maximizing data throughput: Multi-stream data transfer into Amazon EC2 It is common for cloud computing articles to talk at length about the abundant hardware resources the cloud can offer the modern researcher or analyst, but little is typically said about the back-end data store available with cloud computing. Before any research in the cloud can take place, data must be staged in a manner that is accessible to your cloud-based compute resources. If your data sets are large, this staging portion of your cloud use becomes non-trivial. The Amazon EC2 cloud provides large quantities of hardware suitable for high-speed, high-throughput scientific computing. Coupled with AWS storage and the Amazon S3 system, it makes a formidable platform for anyone looking to do large-scale scientific computing on quantities of file-based data. In this post we’ll focus on data ingress to EC2: download speeds out of EC2 are typically much higher, from both consumer-grade and enterprise-level Internet connections, and in practical use of AWS services we see far more upload than download. Large data sets get transferred into EC2, working data stays on cloud-local storage, and summarized, compact results are brought back from the cloud. Let’s look at a few common cases for moving a large data set into AWS-hosted storage and explore the transfer rates, benefits, and drawbacks of each approach. Case #1: Consumer Grade Internet Connection It is trivial to saturate the upstream network pipe of a consumer-grade cable or DSL Internet connection transferring with only a single stream to EC2. With strong encryption on the ingress stream,... read more

Make the Most of Your AWS Instances: Using open-source Condor to Harvest Cycles, Part 1

How To – Harvest Cycles From Your App Servers, Part 1 It’s a common problem: you run a successful, cloud-based application business in Amazon’s EC2 cloud with bursty traffic. In order to handle the bursts you have to keep a minimum number of EC2 application servers up and running. Wouldn’t it be nice if you could do something with these servers between handling the bursty requests? After all, you’re paying for that time, and there are thumbnails to generate, analytics to calculate, and batch applications to run. Enter Condor. Condor is a high throughput distributed computing environment from the University of Wisconsin-Madison that can be configured to steal unused cycles from your application servers when they aren’t serving your main business applications to your customers. Condor provides advanced job scheduling, quota management, policy configuration, support for virtual-machine-based workloads, and integration with all the popular operating systems in use today. And: it’s free. In the next three posts I’m going to show you how to use Condor to harness the wasted compute power on your application servers and how Cycle Computing’s CycleServer can help make this process simple and manageable. The Setup Throughout this series of posts I’m going to talk about a fictitious web application company: AmazingWebVideo Inc. They offer video hosting services and their business has been growing rapidly over the past twelve months. They already run all of their web application components in Amazon’s EC2 cloud, but the nature of their business still requires that they keep a base number of web app servers constantly running to handle the start of any bursts... read more

Multiple Condor Schedulers on a Single host

So Cyclers have run multiple schedds per host since 2004/2005, when Jason did a Disney movie with Condor, running 12 schedds per submit server with Condor 6.6/6.7 and using software to load-balance jobs between the schedulers. Given the interest in this area, we thought we could help explain how to do it in detail. When dealing with job scheduling at large scale in Condor, it can become useful to simultaneously run more than one condor_schedd daemon on the same server. On modern, multi-core architectures, this technique can bring several improvements: scheduler bottleneck avoidance, improved job startup times, improved condor_q query times, improved job submission times, and enhanced overall throughput. Today, our guys wrote up both the new-school (Condor 7.4 or later) and the old-school (Condor 7.2 or older) ways of implementing multiple schedulers. Since 2006, CycleServer has done load-based job distribution between multiple schedulers, so that won't be covered. This post will show how to set up multiple schedulers on a single host, and how to name the schedds in question. Hope this helps:... read more
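As a taste of the new-school approach, a second schedd is the same binary started with a local name, plus per-daemon config overrides. A rough sketch only; the daemon names and paths here are illustrative, and the exact local-name prefix syntax varies across Condor versions, so check the manual for yours:

```
# Run a second condor_schedd alongside the default one (7.4+ style sketch)
SCHEDD2      = $(SCHEDD)
SCHEDD2_ARGS = -local-name schedd2
SCHEDD.SCHEDD2.SCHEDD_NAME = schedd2@
SCHEDD.SCHEDD2.SPOOL       = $(SPOOL).schedd2
DAEMON_LIST  = $(DAEMON_LIST) SCHEDD2

# Target the second scheduler at submit time:
#   condor_submit -name schedd2@myhost.example.com job.sub
```

Each schedd needs its own spool directory; forgetting that is the classic mistake with this setup.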

Community Feedback: CondorWeek 2010 T-Shirts and Mugs

Hello All, It's that time of year again: picking Condor t-shirt phrases for CondorWeek! As we've done for the past few years for CondorWeek, we'd like to get suggestions for new phrases from the Condor Community. Below are some of the suggestions from this year. If you have a new idea, please comment below this blog post (you don't have to use your real name, just give us your e-mail so we can let you know we're using it!). If we print yours, we'll give you a special surprise at CondorWeek or send it to you if you won't be in Madison this year. We're also interested in whether you'd prefer a mug to a t-shirt! Suggestions so far for 2010:

1. Condor: There's a knob for that…
2. Condor: "ZKM" WTF?
3. Green Computing for Grizzlies: HIBERNATE = (isWinter) && (ExcessFat > 200)
4. ~/2010_Condor_Odyssey $ condor_rm -all
   Todd, I'm afraid that's something I cannot allow to happen…
5. JPMC uses Condor, BearStearns didn't. hmmmm…
6. PREEMPT = Spouse =?= "ELIN NORDEGREN"
7. My Condor pool's so big, my pool has a pool (hdfs), and even my pool's pool is bigger than your pool…
8. 2010 A Condor Odyssey: It's not HAL-9000, it's DiaGrid.

Oldies but goodies:

9. I am the condor_master.
10. Condor: an Evil Plot to control the world's computers.

Enjoy! EVERY VOTE COUNTS! If you like one of these over the others, please comment on our blog with your preferences. ALSO, IF YOU'D RATHER GET A MUG than a T-SHIRT, let us know that as well. Please comment! We look forward to seeing what everyone comes up with... read more

Considerations for Financial HPC Applications in the Cloud

Julio Gomez had a great post in WallStreet & Technology about considerations for Building a Cloud Strategy, created by one of the Innovation Councils, including the following:

1. Prepare to educate vendors
2. Liability and indemnification are a major disconnect
3. Nail down your authentication and federated identity capability
4. Identify lowest risk, lowest value areas for initial forays
5. Get your physical infrastructure organized

This is a great piece, and the points mentioned by the council are spot on. It is important to consider security and authentication management up front, as well as applications that have low risk or downside from a business perspective. As companies start deciding the types of workloads that make sense to move into the cloud, there are a few steps that can make tremendous sense in this environment. Based on my experience in this area with high performance computing (HPC), there are a few areas to consider to ensure that applications are suitable for the cloud. Most of them revolve around data and security. When looking at hedging, pricing, trading and other quantitative applications, I would recommend the standard steps for technology adoption:

1. Assessing entry points/applications
2. Launching a proof-of-concept (POC) with an initial application
3. Rolling that application into production
4. Planning wider adoption based upon lessons learned from the POC

In assessing potential HPC applications for cloud, I would recommend reviewing criteria including the authentication, risk factors, and regulatory requirements from the council meeting, plus the following:

Latency requirements – External environments don't lend themselves to low latency at this point in time relative to internal InfiniBand environments
Data volume and source – If the data... read more

Follow up on Life Science Leader

So a short while ago, I wrote an article on HPC in the Cloud for Life Science Leader, a widely-read monthly publication for life science executives. We had some nice responses, but today Google notified me that David Dooling posted a response to my article that makes a few points that are, IMHO, inaccurate. So while my response is awaiting moderation on his page, I thought I'd post it here (sorry for the typo in my first comment, David): David, Good to meet you. I've read your blog before, and take issue with your arguments regarding the cloud. I have more posts on Cloud at my blog, as well as the post below. This is an interesting area, and I'd love to correspond with you more at the e-mail at the bottom: > No wonder he makes cloud computing sound so attractive. No mention of the IT expertise needed to get up and running on the cloud. No mention of the software engineering needed to ensure your programs run efficiently on the cloud. You are implying that to get running in the cloud, an end user must worry about the "IT expertise" and "software engineering" needed to get applications up and running. I believe this is a straw man, an incorrect assertion to begin with. One of the major benefits of virtualized infrastructure and service-oriented architectures is that they are repeatable and decouple the knowledge of building the service from the users consuming it. This means that the one person who creates the virtual machine images or the server code running the service does need the expertise to get an... read more

Poster & Lightning Talk: BLAST Workflow Performance on EC2 @ Rocky Mountain Bioinformatics

We've done quite a bit of application engineering for running various life science workflows in Amazon EC2, and we're (finally) getting some of our analysis of running BLAST on Condor in EC2 up on the web, so I wanted to share. Back in 2008, we benchmarked the performance of BLAST workflows running on Condor on EC2, and gave a Poster & Lightning Talk at Rocky Mountain Bioinformatics. Results: we were able to get 1.9825x performance for every 2x the cores on a Condor cluster in EC2. One of the more interesting visualizations, done by one of our talented guys, Ian Alderman, shows why jobs run efficiently in high throughput environments. Basically, the chart below shows the run times of the first (orange), second (yellow), third (green), etc. tasks running on various processors in the cluster (the separate rows in the chart), with time running left to right. As you can see, high throughput computing takes advantage of the fact that different tasks/jobs, in this case from the same user, take different amounts of time. As a result, over time processors balance out their usage and finish comparatively close together in run-time: We will post more details about how our BLAST pipeline works in an upcoming post, but you can find the poster with more detail... read more

Life Science Leader: Cloudy Future for HPC in Life Sciences?

Early this week, Life Science Leader – a widely-read monthly publication for life science executives – posted an article I wrote, "Is the Future of High-Performance Computing for Life Sciences Cloudy?" It covers a number of topics, including work we've done with Schrödinger on Amazon EC2 and the benefits of using Clusters as a Service in the Cloud. We think that cloud-based HPC/HTC is going to change the way life science researchers get their work done, especially with applications like Schrödinger's Glide. Update: Since this came out it has also been covered by Matthew Dublin at Genome Web. Matthew has some other great articles, including an overview of acceleration techniques in bioinformatics and an article on Nature Biotech ponders the... read more

Re: Condor Cloud-Aware Capabilities

We had a bunch of interest in our last post, a correction to Stephen Wilson's blog from Sun, which claimed that the SGE 6.2u5 release, in December 2009, was the first cloud-aware scheduler by virtue of enabling scheduling to Amazon EC2 and scheduling of Hadoop jobs. Miha Ahronovitz posted some questions that I wanted to address specifically. First, Miha, thanks for your response about Condor's capabilities. Judging by the questions asked and other feedback we've gotten from the first post, folks may be interested in finding out more about Condor and what it can do, so I highly recommend attending CondorWeek, which the Condor Team hosts every year, or feel free to contact Cycle. CondorWeek will give you a broad spectrum of all the cool areas where people use Condor. As I stated in the last blog, in full disclosure, we started using Condor 6 years ago, and Hadoop/SGE/PBS within the last 2 years. It may sound like this is a Condor-only vantage point, but we are a unique company in that we support these other schedulers, and truly use the proper tool for the client in various scenarios. Given Miha's questions, please allow me to provide answers about Condor's capabilities, and ask about an SGE feature I saw: "Can you use Condor to transform a private cloud from a cost center into a profit center? On-demand means billing, billing means making money." Yes. Condor was first able to manage internal VM-based infrastructure in the 6.9.4 release*, available on Sept. 5, 2007. Usage tracking for billing/chargeback has been done by the Condor Accountant for 10-20 years... read more

Correction to Stephen Wilson’s Blog: “The World’s First Cloud-Aware Distributed Resource Manager”

As a computing specialist who works in the Condor, Hadoop, and SGE communities, I wanted to post a follow-up to Stephen Wilson's Jan. 14 blog post on SGE, as it is factually incorrect. At Cycle Computing, I started working with Condor users five years ago, and have used Hadoop and SGE with clients over the past two years, in life sciences, insurance, finance, energy and chip design. Condor is very flexible and powerful, but all schedulers have use cases they're great at. We're fans of computation management in general, but we were alarmed by Wilson's post about Grid Engine being the "World's First Cloud-Aware Distributed Resource Manager" because it supports machines running in EC2 through a VPN and can schedule Hadoop clusters. Simply put, it isn't first. Perhaps Wilson was misinformed about SGE being first with these features and will correct or update his post upon review of the following information. The Condor Scheduler has had Hadoop cluster scheduling since 2006, originally by Yahoo! using Condor in its Hadoop on Demand project. Condor has had Amazon EC2 scheduling since 2008. In 2007, CycleCloud offered Condor clusters as a service in the Amazon Cloud. Condor can even be used to do advanced, cost-based scheduling in Amazon EC2, as we discussed here. These dates are well published. Condor is a freely available resource manager from the University of Wisconsin-Madison, with production users over the past 20 years including many Fortune 500s and large companies like JP Morgan, Yahoo!, Fair Isaac, Altera and Hartford Insurance, who have also talked at Condor conferences. For reference, the timeline looks like this:

· 2006: Yahoo!, the major contributor behind Hadoop, used Condor to schedule Hadoop... read more
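For those curious what Condor's EC2 scheduling looks like in practice, here is a rough sketch of a grid-universe submit file. The AMI ID, key-file paths, and instance type are placeholders, and the exact submit-command names vary by Condor version (early releases used an "amazon" grid type), so check the manual for yours:

```
# Sketch: a Condor grid-universe job that boots an EC2 instance
# (all identifiers below are placeholders)
universe              = grid
grid_resource         = ec2 https://ec2.amazonaws.com/
ec2_access_key_id     = /home/user/.ec2/access_key
ec2_secret_access_key = /home/user/.ec2/secret_key
ec2_ami_id            = ami-00000000
ec2_instance_type     = m1.small
executable            = ec2_job_label   # for EC2 jobs this is a label, not a binary
log                   = ec2_job.log
queue
```

The work itself lives in the AMI; Condor's role is to start, track, and tear down the instance like any other job.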

Cycle on Condor, Hadoop and Clouds

In supporting large and small organizations using Condor, Hadoop, SGE/Torque, and the Cloud, Cycle encounters interesting tips, tricks, and other notes from the field that might be useful to all users of High Throughput and High Performance Computing on desktops, clusters, and clouds. So we thought it would be great to share what we've discovered, talk about the cool work we're doing in HPC, and get comments from the community as a whole. So here we are. Watch for new posts, and give your comments when you have... read more

The benefits of Accounting Groups and Group Quotas

Today, I’m going to post a text explanation of a presentation I gave at CondorWeek 2007. The topic was new features in Condor’s negotiation algorithm that should enable a common use case for organizations with demanding users. In most enterprises, or other large organizations with demanding users, the requirements for Group_Quota are straightforward:

- Guaranteed minimum quota
- Fast claiming of quota
- Avoid unnecessary preemption

This is because users in these environments have purchased compute capacity. They don’t mind sharing empty capacity, but they want what they pay for. And they want it as soon as they submit work. That is, now. Frequently, these jobs are running as Vanilla or Java Universe, which means no checkpointing, so we also need to avoid unnecessary preemptions. Condor uses three common mechanisms to allocate which jobs should be running on which resources:

- "Fair-share" user priority
- Machine RANK
- AccountingGroups and GROUP_QUOTA_*

Generally, there is a natural progression in the use of these features, starting from top to bottom as pool usage evolves.... read more
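The quota mechanics above are driven by negotiator configuration. A minimal sketch with hypothetical group names and slot counts:

```
# Hypothetical accounting groups with guaranteed minimum slot counts
GROUP_NAMES             = group_quant, group_risk
GROUP_QUOTA_group_quant = 100   # group_quant always gets at least 100 slots
GROUP_QUOTA_group_risk  = 50
GROUP_AUTOREGROUP       = True  # let groups borrow idle capacity beyond their quota

# Jobs join a group via their submit file:
#   +AccountingGroup = "group_quant.alice"
```

GROUP_AUTOREGROUP is what lets a group share empty capacity while still claiming its guaranteed minimum when it submits work.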