Monitoring cloud GPUs with CycleCloud

Graphics Processing Units (GPUs) provide a great boost for high performance computing, but they're expensive and take time to purchase and install. With our CycleCloud software, you can get immediate access to just the right amount of cloud GPU time from Microsoft Azure, Google Cloud, and Amazon Web Services. GPU-enabled instances in CycleCloud enjoy the same features that traditional compute instances do: cost control, monitoring, and dynamic scaling.

In our upcoming release, we've improved the monitoring experience, making it easier than ever to manage your cloud GPU instances. CycleCloud configures monitoring automatically for GPU-enabled instances with drivers installed; you don't need to do any of the setup yourself. When you click Show Detail on a cloud node in the CycleCloud interface, you can now see performance graphs and statistics alongside the other node information. When the node has GPUs, this includes GPU usage and memory. The detail window also includes a Metrics tab, which shows all of the raw performance metrics reported by the Ganglia system monitoring platform.

If you're interested in learning more, stop by booth #530 at the GPU Technology Conference this week for a demo, or contact...
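To give a feel for the kind of raw GPU metrics a monitoring agent collects, here's a minimal sketch (not CycleCloud's actual collector) that parses the CSV produced by `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`; the field layout is assumed from that query:

```python
def parse_gpu_metrics(csv_output):
    """Parse per-GPU utilization and memory figures from nvidia-smi
    CSV output (one line per GPU: util %, memory used MiB, total MiB)."""
    metrics = []
    for line in csv_output.strip().splitlines():
        util, mem_used, mem_total = (float(f) for f in line.split(","))
        metrics.append({
            "gpu_util_pct": util,
            "mem_used_mib": mem_used,
            "mem_total_mib": mem_total,
            # Derived metric: memory utilization as a percentage
            "mem_util_pct": 100.0 * mem_used / mem_total,
        })
    return metrics

# Example output from a hypothetical two-GPU node:
sample = "87, 10280, 16160\n12, 512, 16160"
for m in parse_gpu_metrics(sample):
    print(m)
```

A real agent would invoke `nvidia-smi` on a timer and ship each sample to the monitoring backend; the parsing step stays the same.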
How do I optimize cloud HTCondor jobs for cost?

The most important consideration is getting your work done. The second most important consideration is doing it without wasting money. In this post, we describe how you can minimize costs in an HTCondor environment on Amazon Web Services using CycleCloud™.

Our CycleCloud software is an orchestration platform for any workflow. It provides multi-user support, cost management, alerting, and automation to organizations that want to get better answers, faster. CycleCloud launches, configures, and monitors cloud resources and provides tools for managing data and workflows. With support for Microsoft Azure, Google Cloud, and Amazon Web Services, our customers use CycleCloud to power their cloud HPC and big compute workloads using HTCondor, PBS Pro, Hadoop, and other technologies.

So how can you use CycleCloud's features to get the most compute for your dollar? The HTCondor scheduler tracks the state of execution slots, including the time slots spend idle. This makes it easy to identify "wasted" time in a cloud environment, but acting on it is not as straightforward as it may seem. CycleCloud waits for a user-configurable length of time before considering whether an idle instance should be shut down, and only shuts a node down when it is within 5 minutes of the end of the billing hour.

"Shut the instances down sooner!" is an understandable first reaction, but it isn't necessarily beneficial. AWS bills for EC2 instances by the hour, so shutting an instance down early only reduces the appearance of idle time without lowering your bill. With any of the cloud service providers, keeping the minimum idle time too short will result in instance churn, especially with uneven work submission....
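The shutdown policy described above can be sketched as a simple decision function. The names and the 15-minute default threshold below are illustrative, not CycleCloud's actual implementation:

```python
BILLING_PERIOD = 3600  # AWS billed EC2 instances by the hour at the time

def should_terminate(idle_seconds, seconds_since_launch,
                     min_idle=900, window=300):
    """Terminate only if the node has been idle past the configurable
    threshold AND is within `window` seconds of a billing-hour boundary."""
    if idle_seconds < min_idle:
        return False  # not idle long enough to consider shutdown
    elapsed_in_period = seconds_since_launch % BILLING_PERIOD
    # Shutting down mid-hour wouldn't lower the bill, so wait for the
    # final minutes of the hour we've already paid for.
    return BILLING_PERIOD - elapsed_in_period <= window

print(should_terminate(1200, 3400))  # True: idle, 200s from the boundary
print(should_terminate(1200, 1800))  # False: mid-hour, time already paid for
print(should_terminate(60, 3400))    # False: below the idle threshold
```

Keeping `min_idle` reasonably long is what prevents the instance churn mentioned above when work arrives unevenly.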
Simulating Hyperloop pods on Microsoft Azure

Earlier today, we published a case study and press release about some work we did with the HyperXite team from the University of California, Irvine and their efforts in the Hyperloop competition. The team leveraged CycleCloud to run ANSYS Fluent™ on Microsoft Azure Big Compute to complete their iterations in 48 hours, enabling them to get results fast enough to adjust and modify the design, then rerun the simulations until they converged on a final solution. All for less than $600 in simulation costs. This was a case where the cloud enabled them to do something they could not have done any other way.

As a bit of background, Elon Musk's SpaceX started the Hyperloop project as a way to accelerate development of a fast, safe, low-power, and cheap method of transporting people and freight. HyperXite was one of 27 teams that competed recently. Nima Mohseni, the team's simulation lead, used the popular computational fluid dynamics software ANSYS Fluent™ to model the pod. The key areas the team modeled related to their braking approach. Through simulation, they were able to show that they could brake using magnetic force alone, removing the need for mechanical brakes. This reduced weight, increased efficiency, and improved the overall design, which was recognized with a Pod Technical Excellence award last year.

Using the CycleCloud software suite, the HyperXite team created an Open Grid Scheduler cluster leveraging Azure's memory-optimized instances in the East US region. Each instance has 16 cores based on the 2.4 GHz Intel...

CycleCloud 6 feature: MPI optimizations

This post is one of several in a series describing features introduced in CycleCloud 6, which we released on November 8.

Batch workloads have long been a natural fit for cloud environments. Tightly-coupled workflows (e.g. MPI jobs), however, are sensitive to bandwidth, latency, and abruptly-terminated instances. MPI workloads can certainly be run in the cloud, but they need guardrails. CycleCloud 6 adds several new features that make the cloud even better for MPI jobs.

MPI jobs can't make use of a subset of cores; they need all-or-nothing. CycleCloud now considers the minimum core count necessary for the job and sets the minimum request size accordingly. In other words, if the provider cannot fulfill the entire request, CycleCloud won't provision any nodes. Similarly, CycleCloud 6 adds support for Amazon's Launch Group feature, which provides all-or-nothing allocation for spot instances. This opens the spot market to MPI jobs, which can represent significant per-hour savings.

To address the latency concern, CycleCloud now dynamically creates AWS Placement Groups for MPI jobs. This places instances logically near one another, minimizing latency.

At SC16? Stop by booth #3621 for a...
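The all-or-nothing sizing described above boils down to a small calculation. This is an illustrative sketch, not CycleCloud's internals:

```python
import math

def mpi_provision_count(job_cores, cores_per_instance, fulfillable_instances):
    """Return the number of instances to provision for an MPI job:
    either the full set of nodes the job needs, or none at all."""
    needed = math.ceil(job_cores / cores_per_instance)
    if fulfillable_instances < needed:
        return 0  # a partial allocation is useless to an MPI job
    return needed

print(mpi_provision_count(64, 16, 10))  # 4: the full request can be met
print(mpi_provision_count(64, 16, 3))   # 0: can't get all 4 nodes, take none
```

Amazon's Launch Group feature applies the same principle to spot requests: the whole group is fulfilled together or not at all.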

Efficient use of entropy in cloud environments

Secure communication requires entropy: unpredictable input to the encryption algorithms that convert your message into what seems like a string of gibberish. Entropy is particularly important when generating keypairs, encrypting filesystems, and encrypting communication between processes. Computers use a variety of inputs to provide entropy: network jitter, keyboard and mouse input, purpose-built hardware, and so on. Frequently drawing from the pool of entropy can deplete it to the point where communications block waiting for sufficient entropy.

Generally speaking, entropy has two aspects: quality (i.e. how random is the value you get?) and the amount available. The quality can be improved by seeding the pool from a high-quality source of entropy; better seeds make better initialization vectors for the Linux Pseudo Random Number Generator (LinuxPRNG). The Ubuntu project offers a publicly-available entropy server. The quantity of entropy (i.e. the value of /proc/sys/kernel/random/entropy_avail) is replenished only gradually over time.

It is worth noting that virtual machines in the cloud are not quite "normal" computers with regard to entropy. Cloud instances lack many of the inputs a physical machine would have: they don't have keyboards and mice attached, and the hypervisor buffers away much of the random jitter of the internal hardware. Further, the Xen (Amazon Web Services), KVM (Google Cloud), and Hyper-V (Microsoft Azure) hypervisors virtualize hardware access to varying degrees, which can result in diminished entropy.

You need to be aware of the entropy available on your instances and how your code affects it. When writing code, it's important to minimize calls to /dev/random for entropy, as it blocks until sufficient entropy is available. /dev/urandom...
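You can check an instance's entropy estimate by reading the proc file mentioned above; the sketch also shows `os.urandom`, which draws from the non-blocking /dev/urandom pool:

```python
import os

def entropy_available(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's current entropy estimate in bits,
    or None on systems without this proc file."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None

print("entropy_avail:", entropy_available())

# os.urandom never blocks, unlike reads from /dev/random, so it is the
# safer default for application code on entropy-starved cloud instances.
token = os.urandom(16)
print("16 random bytes:", token.hex())
```

Polling `entropy_available()` before and after a burst of key generation is a quick way to see how your code drains the pool.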

Cloud providers offer newer, better GPUs

Ever since there's been a public cloud, people have been interested in running jobs on public cloud graphics processing units (GPUs). Amazon Web Services (AWS) became the first to offer this as an option when they announced their first GPU instance type, the cg1 family, six years ago. GPUs offer considerable performance improvements for some of the most demanding computational workloads. Originally designed to improve the performance of 3D rendering for games, GPUs found a use in big compute due to their ability to perform operations over a set of data rapidly and with a much greater core count than traditional central processing units (CPUs). Workloads that can use a GPU can see performance improvements of 10 to 100 times.

Two years later, AWS announced an upgraded GPU instance type: the g2 family. AWS does not publish exact capacity or usage numbers, but it's reasonable to believe that the cg1 instances were sufficiently successful from a business perspective to justify adding the g2s. GPUs are not cheap, so cloud providers won't keep spending money on them without a return. We know that some of our customers were quick to make use of GPU clusters in CycleCloud.

But there was a segment of the market that still wasn't being served. The GPUs in the cg1 and g2 instance families were great for so-called "single precision" floating point operations, but had poor performance for "double precision" operations. Single precision is faster, and is often sufficient for many calculations, particularly graphics rendering and other visualization needs. Computation that requires a higher degree of numerical precision, particularly if exponential calculations are made, needs double precision. The GPUs that...
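The single- versus double-precision distinction is easy to demonstrate: round-tripping a value through IEEE 754 single precision discards detail that double precision keeps. A small sketch:

```python
import struct

def to_float32(x):
    """Round-trip a Python float (double precision) through
    IEEE 754 single precision using struct packing."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 1.0e8
# Near 1e8 the spacing between representable single-precision values
# is 8, so adding 1 is lost entirely:
print(to_float32(x + 1) == to_float32(x))  # True: the +1 vanishes
# Double precision represents 100000001.0 exactly:
print((x + 1) == x)                        # False: the +1 survives
```

For a long iterative simulation, errors like that lost `+1` compound at every step, which is why precision-sensitive workloads need hardware with fast double-precision support.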