CycleCloud 6.5.3 released

Last week we pushed the button on the latest release of our CycleCloud software for managing cloud HPC and Big Compute workloads. This release has one particular feature that many customers asked for: Cost Alerting. This new feature gives you the ability to easily set cost alerts on a per-cluster basis, in dollars per day or per week. This gives you a great way to manage consumption and ensure that users aren’t blowing through budgets. After all, you want to give your users access to unlimited compute, but you don’t want to give them an unlimited budget.

Clusters from any supported cloud service provider display an estimated compute cost along with the core-hour usage. Daily or monthly budgets are set from the cluster page and trigger alerts when the threshold is crossed. Because the appropriate action when a cluster goes over budget varies, the CycleCloud software does not take any automated enforcement action. We find that most customers set the threshold to some percentage of the total budget to get a heads up before exceeding it. The right percentage is a function of the type of work and the size of the budget.

In addition to cost alerting, we’ve added features to our Microsoft Azure support. CycleCloud now uses Azure Managed Disks and Images for virtual machines, simplifying management of storage and improving performance. Azure instances will automatically use CycleCloud’s standalone DNS configuration to improve the experience for Open Grid Scheduler users.

Current customers can download CycleCloud 6.5.3 from the Cycle Computing Portal. If you’d like to learn...
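The threshold-as-a-fraction-of-budget advice above can be sketched in a few lines. This is an illustrative calculation only, not the CycleCloud product or its API; the function names and the 80% warning fraction are hypothetical choices:

```python
# Illustrative sketch of early-warning cost thresholds -- the function
# names and the 0.8 warning fraction are assumptions, not CycleCloud's.

def alert_threshold(total_budget: float, warn_fraction: float = 0.8) -> float:
    """Place the alert at a fraction of the budget so the alarm fires
    before the budget is actually exceeded."""
    return total_budget * warn_fraction

def should_alert(estimated_cost: float, threshold: float) -> bool:
    """Fire when the cluster's estimated cost crosses the threshold."""
    return estimated_cost >= threshold

threshold = alert_threshold(1000.0)          # $800 warning on a $1000/day budget
print(should_alert(750.0, threshold))        # still under the warning line
print(should_alert(820.0, threshold))        # over the warning line -> alert
```

The point of the fraction is exactly what the post describes: the alert is a heads up, so it should fire with enough budget left to react.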

Use AWS EBS Snapshots to speed instance setup

Time has always been money, but this is especially true when you rent your compute. Shortening the setup time of a cluster not only shortens the time-to-results, it saves money. We recently worked with a customer to cut about 20 minutes from the initial boot time of their compute instances by changing how they staged in their data.

This customer does cancer research, so the jobs rely on a large amount of reference data. The reference data was provided by a collaborator and stored in Amazon S3, but in a different region from the compute instances used by the researchers. Initially, each instance pulled the data from S3 using Cycle Computing’s pogo tool for cloud data transfer. pogo normally provides great performance, but because the traffic crossed the public internet, transfers were slow and sometimes overwhelmed the AWS NAT Gateway.

We looked at two options: mirroring the data to the desired region and using EBS Snapshots. We chose snapshots because the reference data was static, so the effort of creating a snapshot could be amortized over many computation runs. A snapshot takes a little time to create and copy, but each compute instance then starts much faster than it would if it downloaded the data from S3 on every boot. When execute nodes start, CycleCloud creates a volume from that Snapshot and attaches it to the instance. This gives each execute node a local copy of the reference data in a fraction of the time.

Note that disk snapshots are not the only way to avoid downloading reference data each time. Snapshots work well when the reference...
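The amortization argument above is easy to put numbers on. The 20-minute per-boot savings comes from the post; the one-time snapshot preparation time below is an assumed illustrative value, not a measurement from this customer:

```python
# Back-of-the-envelope amortization for a one-time snapshot setup.
# saved_minutes_per_boot (~20 min) is from the post; prep_minutes is
# an assumed figure for creating and cross-region copying the snapshot.
import math

def break_even_runs(prep_minutes: float, saved_minutes_per_boot: float) -> int:
    """Number of instance boots after which the one-time snapshot
    preparation has paid for itself in saved startup time."""
    return math.ceil(prep_minutes / saved_minutes_per_boot)

# Assume ~60 minutes to create and copy the snapshot:
print(break_even_runs(60, 20))  # pays off after 3 boots
```

Because the reference data was static, the snapshot was prepared once and then reused for every run, which is why this approach won out over per-boot S3 downloads.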

Leap second #37 is coming!

Everybody always talks about needing more time. Well, this year you get it! Saturday night will be one second longer than normal. A leap second is being inserted to keep clocks in step with the Earth’s slowing rotation. Beyond adding a second to your day, your software needs to be ready as well. The leap seconds added in 2012 and 2015 mean that many software systems are already prepared for it. This includes CycleCloud and the cloud service providers it works with.

Leap second handling

Here’s how the cloud service providers handle the leap second:

Amazon Web Services — The additional second is spread over the 24-hour period from 12:00 UTC on December 31 through 12:00 UTC on January 1. Each “second” will be longer by 1/86400 of a second.

Azure — In 2015, Azure inserted leap seconds at midnight local time. The assumption is that they will do this again.

Google Cloud — The additional second is spread over the 20-hour period from 14:00 UTC on December 31 through 10:00 UTC on January 1.

How instances handle the leap second depends on their configured behavior. Generally speaking, Linux instances will use the NTP server pools and handle the change in the kernel. Windows instances on AWS will follow the AWS time adjustment above. Windows generally handles leap seconds by correcting the clock at the next time update.

It’s a leap year, too

In case one extra second of 2016 was not enough for you, remember that this year was a leap year as well. If your application considers the day of the year, you’ll want to make sure it’s...
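The smearing arithmetic above is worth checking: stretching every second by 1/86400 of a second over a 24-hour window absorbs exactly one extra second. A quick sketch:

```python
# Verify the clock-smearing arithmetic described above.

# AWS spreads the leap second over 24 hours (86400 clock "seconds"),
# so each one is stretched by 1/86400 of a second:
aws_window_seconds = 24 * 60 * 60            # 86400
stretch_per_second = 1 / aws_window_seconds
extra = aws_window_seconds * stretch_per_second
print(extra)                                 # ~1.0: one full leap second absorbed

# Google's smear covers only 20 hours, so each second stretches more:
google_window_seconds = 20 * 60 * 60         # 72000
print(1 / google_window_seconds)             # about 1.4e-05 s per second
```

Either way, no single timestamp ever repeats or jumps, which is the whole point of smearing instead of inserting a literal 23:59:60.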

Efficient use of entropy in cloud environments

Secure communication requires entropy — unpredictable input to the encryption algorithms that convert your message into what seems like a string of gibberish. Entropy is particularly important when generating keypairs, encrypting filesystems, and encrypting communication between processes. Computers use a variety of inputs to provide entropy: network jitter, keyboard and mouse input, purpose-built hardware, and so on. Drawing frequently from the pool of entropy can deplete it to the point where communications block while waiting for sufficient entropy.

Generally speaking, entropy has two aspects: quality (i.e. how random is the value you get?) and the amount available. The quality of entropy can be improved by seeding it from a high-quality source. Higher-quality entropy makes better initialization vectors for the Linux Pseudo Random Number Generator (LinuxPRNG). The Ubuntu project offers a publicly-available entropy server. The quantity of entropy (i.e. the value of /proc/sys/kernel/random/entropy_avail) increases only as the kernel gathers new events over time, and decreases as entropy is consumed.

It is worth noting that virtual machines in the cloud are not quite “normal” computers with regard to entropy. Cloud instances lack many of the inputs a physical machine has: they have no keyboards or mice attached, and the hypervisor buffers away much of the random jitter of the underlying hardware. Further, the Xen (Amazon Web Services), KVM (Google Cloud), and Hyper-V (Microsoft Azure) hypervisors virtualize hardware access to varying degrees, which can result in diminished entropy.

You need to be aware of the entropy available on your instances and how your code affects it. When writing code, it’s important to minimize calls to /dev/random for entropy, as it blocks until sufficient entropy is available. /dev/urandom...
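The two things discussed above — watching the pool and avoiding the blocking interface — can both be done in a few lines. A minimal sketch, assuming a Linux instance (the /proc path does not exist elsewhere):

```python
# Check the kernel's entropy estimate and draw randomness without
# blocking. The /proc path is Linux-specific.
import os

def entropy_available() -> int:
    """Read the kernel's current entropy estimate in bits (Linux)."""
    with open("/proc/sys/kernel/random/entropy_avail") as f:
        return int(f.read().strip())

# os.urandom() draws from the kernel's non-blocking pool (/dev/urandom
# on Linux), so it never stalls waiting for entropy like /dev/random can:
key_material = os.urandom(32)
print(len(key_material))  # 32 bytes of randomness, returned immediately

if os.path.exists("/proc/sys/kernel/random/entropy_avail"):
    print(entropy_available())
```

Polling entropy_avail before and after a burst of crypto operations is a simple way to see how your own code draws down the pool on a cloud instance.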

Cycle Computing: The Cloud Startup that Just Keeps Kicking

You may have already seen the article that The Next Platform recently published about us, but if you haven’t, take a moment to read it. We are flattered to be recognized by The Next Platform folks. They are forward-thinking and have a long and respected history of covering big compute.

It has not always been easy, but we’ve gone from an $8,000 credit card bill to a self-sustaining business with an impressive customer list across a variety of industries. As CEO Jason Stowe said, “We’ve been fortunate to have the chance to build something wonderful on its own merits, and I could not be more proud of the way we have grown.”

As great as the past has been, the future is looking even better. It is exciting to see the market begin to realize what our employees and customers have known all along: utility access to computing is a major driver of innovation and research. Cloud-based big compute has been embraced in life sciences, manufacturing, financial services, academia, pharmaceuticals, and so much more. As big compute in the public cloud moves from a niche setup to a mainstream operation, we are excited to lead the field. It has been great watching our customers discover the value of the cloud, and we’re looking forward to helping even more organizations do the same. ...