Next Steps for Cycle Computing: Joining Microsoft to bring new Big Compute and Cloud HPC capabilities to Azure

When Rachel, Rob, Doug, and I started Cycle twelve years ago on an $8,000 credit card bill, customers needed large up-front investments to access Big Compute. We set out to fix that: to accelerate the pace of innovation by changing the way the world accesses computing. Since then, our products have helped customers fight cancer and other diseases, design faster rockets, build better hard drives, create better solar panels, and manage risk for people's retirements.

We've had an amazing experience bootstrapping Cycle Computing without VC funding, building products that will manage 1 billion core-hours this year, growing 2.7x every 12 months, with a customer base that spends $50-100 million annually on cloud infrastructure. Today we couldn't be happier to announce that we're joining Microsoft to accelerate HPC cloud adoption.

To our existing customers: We couldn't have done any of this without you, and we're excited to continue supporting you and the amazing work you do. Together you form an impressive community of innovative users that spans Global 2000 Manufacturing, Big 5 Life Insurance, Big 10 Pharma & Biotech, Big 10 Media & Entertainment, Big 10 Financial Services & Hedge Funds, startups, and government agencies. We will continue to make CycleCloud the leading Big Compute and Cloud HPC software, but now even bigger and better than before.

To my fellow Cyclers: You've been an integral part of a special team of bright, hard-working people who engineer great products and get things done with integrity. Customers frequently say that they love working with you and the products you build. I couldn't be prouder. It has been an honor working alongside...

Monitoring cloud GPUs with CycleCloud

Graphics Processing Units (GPUs) provide a great boost for high performance computing, but they're expensive and take time to purchase and install. With our CycleCloud software, you can get immediate access to just the right amount of cloud GPU time from Microsoft Azure, Google Cloud, and Amazon Web Services. GPU-enabled instances in CycleCloud enjoy the same features that traditional compute instances do: cost control, monitoring, and dynamic scaling.

In our upcoming release, we've improved the monitoring experience, making it easier than ever to manage your cloud GPU instances. CycleCloud configures the monitoring automatically for GPU-enabled instances with drivers installed, so you don't need to do any of the setup yourself. When you click Show Detail on a cloud node in the CycleCloud interface, you can now see performance graphs and statistics alongside the other node information. When the node has GPUs, this includes GPU usage and memory. In addition, the detail window includes a Metrics tab, which shows all of the raw performance metrics reported by the Ganglia system monitoring platform.

If you're interested in learning more, stop by booth #530 at the GPU Technology Conference this week for a demo, or contact...
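
CycleCloud wires all of this up for you, but to make the mechanism concrete, here is a minimal sketch of the kind of collection it automates: polling each GPU with nvidia-smi and publishing the readings to the local Ganglia agent with gmetric. The metric names and the 60-second interval are illustrative assumptions, not CycleCloud's actual configuration.

```python
#!/usr/bin/env python
"""Illustrative sketch only: poll GPU utilization and memory with nvidia-smi
and publish each reading to Ganglia via gmetric. Metric names and the
60-second interval are assumptions, not CycleCloud's real configuration."""
import subprocess
import time

def read_gpus():
    """Return a list of (gpu_index, utilization_pct, memory_used_mib) tuples."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,utilization.gpu,memory.used",
        "--format=csv,noheader,nounits",
    ]).decode()
    readings = []
    for line in out.strip().splitlines():
        index, util, mem = [field.strip() for field in line.split(",")]
        readings.append((index, float(util), float(mem)))
    return readings

def publish(name, value, units):
    """Send one metric to the local Ganglia agent via the gmetric CLI."""
    subprocess.check_call([
        "gmetric", "--name", name, "--value", str(value),
        "--type", "float", "--units", units,
    ])

if __name__ == "__main__":
    while True:
        for index, util, mem in read_gpus():
            publish("gpu%s_util" % index, util, "%")
            publish("gpu%s_mem_used" % index, mem, "MiB")
        time.sleep(60)
```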

CycleCloud 6.5.3 released

Last week we pushed the button on the latest release of our CycleCloud software for managing cloud HPC and Big Compute workloads. This release has one feature in particular that many customers asked for: cost alerting. This new feature gives you the ability to easily set cost alerts on a per-cluster basis, in dollars per day or per month. It's a great way to manage consumption and ensure that users aren't blowing through budgets. After all, you want to give your users access to unlimited compute, but you don't want to give them an unlimited budget.

Clusters from any supported cloud service provider display an estimated compute cost along with the core-hour usage. Daily or monthly budgets are set from the cluster page and trigger alerts when the threshold is crossed. Because the appropriate action when a cluster goes over budget varies, the CycleCloud software does not take any automated enforcement action. We find most customers set the threshold to some percentage of their total budget so they get a heads-up before the budget is exceeded; the right percentage depends on the type of work and the size of the budget.

In addition to cost alerting, we've added more features to our Microsoft Azure support. CycleCloud now uses Azure Managed Disks and Images for virtual machines, simplifying storage management and improving performance. Azure instances will automatically use CycleCloud's standalone DNS configuration to improve the experience for Open Grid Scheduler users.

Current customers can download CycleCloud 6.5.3 from the Cycle Computing Portal. If you'd like to learn...
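
To make the threshold arithmetic concrete, here is a minimal sketch of the check involved, assuming a warning fires when a cluster's estimated spend reaches a configurable fraction of its budget. The function name and numbers are illustrative; this is not CycleCloud's internal API.

```python
def should_alert(estimated_cost, budget, warn_fraction=0.8):
    """Return True once estimated spend reaches the warning threshold.

    Illustrative only: in CycleCloud this is a per-cluster setting,
    not a function you call yourself.
    """
    return estimated_cost >= budget * warn_fraction

# Example: a cluster with a $5,000/month budget and a warning at 80%
# of budget alerts once estimated compute cost reaches $4,000.
if should_alert(estimated_cost=4100.0, budget=5000.0):
    print("Cluster is at or above 80% of its monthly budget")
```
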
Simulating Hyperloop pods on Microsoft Azure

Earlier today, we published a case study and press release about some work we did with HyperXite, a team from the University of California, Irvine, competing in the Hyperloop competition. The team leveraged CycleCloud to run ANSYS Fluent™ on Microsoft Azure Big Compute, completing their iterations in 48 hours. That turnaround let them get results fast enough to adjust the design and rerun the simulations until they converged on a final solution, all for less than $600 in simulation costs. This was a case where the cloud enabled them to do something they could not have done any other way.

As a bit of background, Elon Musk's SpaceX started the Hyperloop project as a way to accelerate development of a fast, safe, low-power, and cheap method of transporting people and freight. HyperXite was one of 27 teams that competed recently. Nima Mohseni, the team's simulation lead, used the popular computational fluid dynamics software ANSYS Fluent™ to model the pod. The key areas the team modeled were related to their braking approach. Through simulation, they were able to show that they could brake using magnetic force alone, removing the need for mechanical brakes. This reduced weight, increased efficiency, and improved the overall design, which was recognized with a Pod Technical Excellence award last year.

Using the CycleCloud software suite, the HyperXite team created an Open Grid Scheduler cluster leveraging Azure's memory-optimized instances in the East US region. Each instance has 16 cores based on the 2.4 GHz Intel...

LAMMPS scaling on Azure InfiniBand

While public clouds have gained a reputation as strong performers and a good fit for batch and throughput-based workloads, we often still hear that clouds don't work for "real" or "at scale" high performance computing applications. That's not necessarily true, however, as Microsoft Azure has continued its rollout of InfiniBand-enabled virtual machines. InfiniBand is the most common interconnect among TOP500 supercomputers, and Microsoft has deployed the powerful and stable iteration known as "FDR" InfiniBand. Best of all, these exceptionally high levels of interconnect performance are now available to everyone on Azure's new H-series and N-series virtual machines.

To see how well Azure's InfiniBand works, we benchmarked LAMMPS, an open source molecular dynamics simulation package developed by Sandia National Laboratories. LAMMPS is widely used across government, academia, and industry, and is frequently a computational tool of choice for some of the most advanced science and engineering teams. LAMMPS relies heavily on MPI to achieve sustained high performance on real-world workloads, and can scale to many hundreds of thousands of CPU cores.

Armed with H16r virtual machines, we ran the Lennard-Jones ("LJ") liquid benchmark in two scenarios: "weak scaling", in which every core simulated 32,000 atoms no matter how many cores were utilized, and "strong scaling", which used a fixed problem size of 512,000 atoms with an increasing number of cores. Both scenarios simulated 1,000 time steps. We performed no "data dumps" (i.e. intermediate output to disk) in order to isolate solver performance, and ran 30 test jobs per data point in order to obtain statistically meaningful averages.

In summary, the results were impressive...
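
For anyone who wants to reproduce a similar run, here is a rough sketch of how the two scenarios could be driven with the stock LAMMPS bench/in.lj input, which scales the problem through its x, y, and z replication variables (the base box holds 32,000 atoms). The lmp_mpi binary name, the core counts, and the replication factors are illustrative assumptions, not the exact commands from our benchmark.

```python
import subprocess

def factor3(n):
    """Split n into three integer factors (a, b, c) with a * b * c == n,
    preferring the most balanced split."""
    best = (n, 1, 1)
    for a in range(1, n + 1):
        if n % a:
            continue
        for b in range(1, n // a + 1):
            if (n // a) % b:
                continue
            c = n // (a * b)
            if max(a, b, c) < max(best):
                best = (a, b, c)
    return best

def run_lj(cores, replicate):
    """Launch the stock bench/in.lj LAMMPS input, replicated (x, y, z) times.

    The default in.lj box holds 32,000 atoms, so the total atom count is
    32,000 * x * y * z. "lmp_mpi" is an assumed binary name.
    """
    x, y, z = replicate
    subprocess.check_call([
        "mpirun", "-np", str(cores), "lmp_mpi", "-in", "in.lj",
        "-var", "x", str(x), "-var", "y", str(y), "-var", "z", str(z),
    ])

if __name__ == "__main__":
    for cores in (16, 32, 64, 128):
        # Weak scaling: 32,000 atoms per core, so replicate the box once per core.
        run_lj(cores, factor3(cores))
        # Strong scaling: fixed 512,000 atoms (16 base boxes) at every core count.
        run_lj(cores, factor3(16))
```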

Leap second #37 is coming!

Everybody always talks about needing more time. Well, this year you get it! Saturday night will be one second longer than normal. A leap second is being inserted to keep clocks in step with the Earth's slowing rotation. Beyond just adding a second to your day, your software needs to be ready as well. The addition of leap seconds in 2012 and 2015 means that many software systems are ready for it. This includes CycleCloud and the cloud service providers it works with.

Leap second handling

Here's how the cloud service providers handle the leap second:

Amazon Web Services: The additional second is spread over the 24-hour period from 12:00 UTC on December 31 through 12:00 UTC on January 1. Each "second" in that window is 1/86400 longer than normal.

Azure: In 2015, Azure inserted the leap second at midnight local time. The assumption is that it will do so again.

Google Cloud: The additional second is spread over the 20-hour period from 14:00 UTC on December 31 through 10:00 UTC on January 1.

How instances handle the leap second depends on their configuration. Generally speaking, Linux instances will use the NTP server pools and handle the change in the kernel. Windows instances on AWS will follow the AWS time adjustment above. Windows generally handles leap seconds by changing the clock at the next time update.

It's a leap year, too

In case one extra second of 2016 was not enough for you, remember that this year was a leap year as well. If your application considers the day of the year, you'll want to make sure it's...
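
As a quick sanity check, here is a minimal sketch using only the Python standard library: the first part confirms the arithmetic of a 24-hour smear (86,400 slightly longer seconds add up to exactly one extra second), and the second confirms that day-of-year values shift after February 29 in 2016. The specific dates are just examples.

```python
import calendar
from datetime import date

# A 24-hour smear stretches each of 86,400 seconds by 1/86,400,
# which accumulates to exactly one extra second over the window.
assert abs(86400 * (1.0 / 86400) - 1.0) < 1e-9

# 2016 is a leap year, so it has 366 days instead of 365.
assert calendar.isleap(2016)
assert (date(2017, 1, 1) - date(2016, 1, 1)).days == 366

# Day-of-year values shift after February 29 in a leap year:
# March 1 is day 61 in 2016 but day 60 in 2015.
print(date(2016, 3, 1).timetuple().tm_yday)  # 61
print(date(2015, 3, 1).timetuple().tm_yday)  # 60
```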