Use AWS EBS Snapshots to speed instance setup

Time has always been money, but this is especially true when you rent your compute. Shortening the setup time of a cluster not only shortens the time-to-results, it also saves money. We recently worked with a customer to cut about 20 minutes from the initial boot time of their compute instances by changing how they staged in their data.

This customer does cancer research, so the jobs rely on a large amount of reference data. The reference data was provided by a collaborator and stored in Amazon S3, but in a different region from the compute instances used by the researchers. Initially, each instance pulled the data from S3 using Cycle Computing’s pogo tool for cloud data transfer. pogo normally provides great performance, but because this traffic crossed the public internet, transfers were slow and sometimes overwhelmed the AWS NAT Gateway.

We looked at two options: mirroring the data into the desired region and using EBS Snapshots. We chose snapshots because the reference data was static, so the effort of creating a snapshot is amortized over many computation runs. Creating and copying a snapshot takes a little time up front, but each compute instance then starts much faster than it would if it downloaded the data from S3 every time it booted. When execute nodes start, CycleCloud creates a volume from that snapshot and attaches it to the instance (the sketch below shows roughly what the underlying EC2 calls look like). This gives each execute node a local copy of the reference data in a fraction of the time. Note that disk snapshots are not the only way to avoid downloading reference data each time. Snapshots work well when the reference...
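For readers curious what this looks like under the hood, here is a minimal boto3 sketch of the EC2 calls involved. CycleCloud performs the equivalent steps for you; the snapshot ID, availability zone, instance ID, and device name below are placeholders, not values from this customer’s environment.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create a volume from the snapshot holding the reference data,
    # in the same availability zone as the execute node.
    volume = ec2.create_volume(
        SnapshotId="snap-0123456789abcdef0",   # placeholder snapshot ID
        AvailabilityZone="us-east-1a",
        VolumeType="gp2",
    )

    # Wait for the volume to become available, then attach it to the instance.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
    ec2.attach_volume(
        VolumeId=volume["VolumeId"],
        InstanceId="i-0123456789abcdef0",      # placeholder instance ID
        Device="/dev/xvdf",
    )

Once the volume is attached, the node only needs to mount the filesystem; no cross-region transfer happens at boot.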

Leap second #37 is coming!

Everybody always talks about needing more time. Well, this year you get it! Saturday night will be one second longer than normal. A leap second is being inserted to keep clocks in step with the Earth’s slowing rotation. Beyond just adding a second to your day, your software needs to be ready as well. The leap seconds added in 2012 and 2015 mean that many software systems are already prepared for this one. That includes CycleCloud and the cloud service providers it works with.

Leap second handling

Here’s how the cloud service providers handle the leap second:

Amazon Web Services: The additional second is spread over the 24-hour period from 12:00 UTC on December 31 through 12:00 UTC on January 1. Each “second” will be 1/86400 longer.

Azure: In 2015, Azure inserted leap seconds at midnight local time. The assumption is that it will do the same this time.

Google Cloud: The additional second is spread over the 20-hour period from 14:00 UTC on December 31 through 10:00 UTC on January 1.

How instances handle the leap second depends on their configured behavior. Generally speaking, Linux instances will use the NTP server pools and handle the change in the kernel. Windows instances on AWS will follow the AWS time adjustment above. Windows generally handles leap seconds by stepping the clock at its next time synchronization. (A small worked example of the AWS smear arithmetic appears at the end of this post.)

It’s a leap year, too

In case one extra second of 2016 was not enough for you, remember that this year was a leap year as well. If your application considers the day of the year, you’ll want to make sure it’s...
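To make the smear concrete, here is a small Python sketch (ours, not an official AWS tool) that computes how far a linearly smeared clock lags true UTC at any point in AWS’s 24-hour window; the window boundaries are taken from the description above.

    from datetime import datetime, timedelta, timezone

    SMEAR_START = datetime(2016, 12, 31, 12, 0, 0, tzinfo=timezone.utc)
    SMEAR_SECONDS = 86400   # length of the smear window in SI seconds
    EXTRA = 1.0             # total leap second being absorbed

    def smear_offset(now):
        """Return how many seconds the smeared clock lags true UTC at `now`."""
        elapsed = (now - SMEAR_START).total_seconds()
        # Clamp to the window: no offset before it, a full second after it.
        elapsed = max(0.0, min(elapsed, SMEAR_SECONDS))
        return EXTRA * elapsed / SMEAR_SECONDS

    # Halfway through the window the smeared clock is half a second behind UTC.
    print(smear_offset(SMEAR_START + timedelta(hours=12)))   # 0.5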

Efficient use of entropy in cloud environments

Secure communication requires entropy — unpredictable input to the encryption algorithms that convert your message into what seems like a string of gibberish. Entropy is particularly important when generating keypairs, encrypting filesystems, and encrypting communication between processes. Computers use a variety of inputs to provide entropy: network jitter, keyboard and mouse input, purpose-built hardware, and so on. Drawing from the entropy pool too frequently can drain it to the point where communications block while waiting for sufficient entropy.

Generally speaking, entropy has two aspects: quality (i.e. how random is the value you get?) and the amount available. The quality of entropy can be improved by seeding it from a high-quality source; higher quality entropy makes better initialization vectors for the Linux Pseudo Random Number Generator (LinuxPRNG). The Ubuntu project offers a publicly-available entropy server. The quantity of entropy (i.e. the value of /proc/sys/kernel/random/entropy_avail) only builds up gradually over time as the kernel gathers new inputs.

It is worth noting that virtual machines in the cloud are not quite “normal” computers with regard to entropy. Cloud instances lack many of the inputs that a physical machine would have, since they don’t have keyboards and mice attached, and the hypervisor buffers away much of the random jitter of the underlying hardware. Further, the Xen (Amazon Web Services), KVM (Google Cloud), and Hyper-V (Microsoft Azure) hypervisors virtualize hardware access to varying degrees, which can result in diminished entropy.

You need to be aware of the entropy available on your instances and how your code affects it. When writing code, it’s important to minimize calls to /dev/random for entropy, as it blocks until sufficient entropy is available. /dev/urandom...
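As a small illustration (this assumes a Linux instance with the standard kernel interfaces), the snippet below reads the kernel’s estimate of available entropy and then draws random bytes from the non-blocking interface; os.urandom() is backed by /dev/urandom, so it will not stall a busy instance the way a direct read of /dev/random can.

    import os

    # The kernel's current estimate of available entropy, in bits.
    with open("/proc/sys/kernel/random/entropy_avail") as f:
        print("entropy_avail:", f.read().strip())

    # 32 random bytes from the non-blocking pool, suitable for keys and nonces.
    key = os.urandom(32)
    print("key:", key.hex())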

Cycle Computing: The Cloud Startup that Just Keeps Kicking

You may have already seen the article that The Next Platform recently published about us, but if you haven’t, take a moment to read it. We are flattered to be recognized by the folks at The Next Platform. They are forward-thinking and have a long and respected history of covering big compute. It has not always been easy, but we’ve gone from an $8,000 credit card bill to a self-sustaining business with an impressive customer list across a variety of industries. As CEO Jason Stowe said, “We’ve been fortunate to have the chance to build something wonderful on its own merits, and I could not be more proud of the way we have grown.” As great as the past has been, the future is looking even better. It is exciting to see the market begin to realize what our employees and customers have known all along: utility access to computing is a major driver of innovation and research. Cloud-based big compute has been embraced in life sciences, manufacturing, financial services, academia, pharmaceuticals, and so much more. As big compute in the public cloud moves from a niche setup to a mainstream operation, we are excited to lead the field. It has been great watching our customers discover the value of the cloud, and we’re looking forward to helping even more organizations do the same. ...

Using Tags for Tracking Cloud Resources

When Amazon announced that the limit for tags on resources was increased from 10 to 50, a great cheer went up from the masses. In the same way that you’ll always own slightly more stuff than you can fit in your house, many cloud users find they want more tags than are available. Tags are custom labels that can be assigned to cloud resources. While they don’t have a functional effect (except in Google Cloud Platform, see below), they can be very powerful for reporting and automation. For example, some customers have a single corporate account and tag resources by department, user, project, et cetera for chargeback.

Some customers also use tags in automated tools. For example, you can tag instances with a “backup” attribute and have a script that polls those instances and creates snapshots of their permanent volumes on a daily basis (a sketch of this kind of script appears at the end of this post). Or perhaps you have an account for testing and you don’t want users to accidentally leave instances running forever. You can automatically terminate long-running instances that don’t have a “keepalive” tag set.

In Amazon Elastic Compute Cloud (EC2) and Microsoft Azure, tags are key-value pairs. CycleCloud supports adding tags to instances and volumes with a simple syntax:

      tags.Application = my application
      tags.CustomValue = 57
      tags.Custom Text = Hello world

Tags in Google Cloud Platform

The term “tag” has a different meaning in Google Cloud Platform. A “tag” is an attribute placed on an instance that is used to apply network or firewall settings. Other resources do not have tags. CycleCloud supports adding tags to GCP instances...
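Here is a hedged sketch of the backup automation mentioned above, written against the EC2 API with boto3; the “backup” tag name and region are illustrative, and this is not a Cycle Computing tool.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Find running instances that carry a "backup" tag.
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag-key", "Values": ["backup"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    # Snapshot every EBS volume attached to each tagged instance.
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                volume_id = mapping["Ebs"]["VolumeId"]
                ec2.create_snapshot(
                    VolumeId=volume_id,
                    Description="Daily backup of " + instance["InstanceId"],
                )

A script like this would typically run once a day from cron or a similar scheduler.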