CycleCloud 6 feature: Improved HealthCheck framework

This post is one of several in a series describing features introduced in CycleCloud 6, which we released on November 8.

CycleCloud’s HealthCheck system provides a mechanism for detecting and terminating bad instances. The definition of “bad” varies by use case, so rather than prescribing a fixed set of checks, HealthCheck provides a framework for custom scripts. It regularly runs customer-provided Python and shell scripts (Unix shell or Windows batch, as appropriate) on cloud instances.

New in CycleCloud 6, we’ve made it safer and easier for users to create custom health checks for their clusters. The HealthCheck system now requires an explicit return code (254) to terminate an instance. This removes the possibility of an error in the script itself causing CycleCloud to terminate an instance. A return code of 0 still indicates a healthy system, and any other exit code is logged to CycleCloud’s Event Log.
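To make the exit-code contract concrete, here is a minimal sketch of what a custom Python health check could look like. The check itself (free disk space), the 5% threshold, and the names `classify` and `main` are our own illustrative assumptions, not part of CycleCloud; only the meaning of the exit codes (0 for healthy, 254 to request termination, anything else logged) comes from the description above.

```python
import shutil
import sys

# Exit codes described in this post.
EXIT_HEALTHY = 0      # instance is healthy
EXIT_TERMINATE = 254  # explicit request to terminate the instance

# Hypothetical threshold for this example: unhealthy below 5% free disk.
MIN_FREE_FRACTION = 0.05


def classify(free_bytes, total_bytes, min_free_fraction=MIN_FREE_FRACTION):
    """Map disk usage numbers to a health check exit code."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if free_bytes / total_bytes < min_free_fraction:
        return EXIT_TERMINATE
    return EXIT_HEALTHY


def main(path="/"):
    """Check the filesystem containing `path` (hypothetical default)."""
    usage = shutil.disk_usage(path)
    return classify(usage.free, usage.total)


if __name__ == "__main__":
    # An uncaught exception makes Python exit with status 1, which is
    # neither 0 nor 254, so a bug here is logged instead of terminating
    # the instance.
    sys.exit(main())
```

Because termination happens only on an explicit 254, a typo or missing module in a script like this would show up in the Event Log instead of taking down a healthy instance.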

At SC16? Stop by booth #3621 for a demo!
