In any large, modern organization there is a considerable deployment of desktop compute power. Those bland, beige boxes used to piece together slide presentations, surf the web and send out reminders about cake in the lunch room are turned on at 8am and off at 5pm, then left to collect dust after hours. With modern virtual desktop infrastructure (VDI) initiatives in particular, thin clients running Linux sit idle despite the value they hold from a compute perspective.
Fortune 100 Bank Harvesting Cycles
Today we want to show you how big financial services companies use desktops of any type to perform high-throughput pricing and risk calculations. Our example is a Fortune 100 company, let's call them ExampleBank, that runs a constant stream of moderate-data, CPU-heavy computations on their dedicated grid. As an alternative to dedicated server resources, running jobs on desktops was estimated to save them millions in server equipment, power and other operating costs, and London/UK data center space, all thanks to open source software with no license costs!
Cycle engineers worked with their desktop management IT team to deploy Condor on thousands of their desktops, all managed by our CycleServer product. Once deployed, Condor falls under control of CycleServer and job execution policies are crafted to allow latent desktop cycles to be used for quantitative finance jobs.
Condor is a highly flexible job execution engine that fits very comfortably into a desktop compute environment, offering up spare cycles to grid jobs when the desktop machine is not being used for its primary role. Our client wanted a policy that would make machines available after hours and on weekends, but only if the machine wasn't performing computational or interactive work for its owner when the execution window opened. Condor tracks mouse and keyboard activity as well as CPU load not initiated by Condor, making it possible to craft execution policies that meet complex requirements such as this one.
The first step was to define a policy section that managed the execution windows based on time and day of the week. Two macros were created to simplify the final START expression. The first macro, WEEKDAY_CAN_START, tests whether the current time falls in the weekday execution window. The second macro, WEEKEND_CAN_START, tests whether the current time falls in the weekend execution window. Each macro also handles a window that wraps past midnight, such as the 18:00 to 08:00 weekday window. We combined them, together with a day-of-week check, in the RUNWINDOW_SCHEDULE_OBEYED macro for simplicity. If this macro is true, the job window is open.
# When, during a weekday, should Condor begin to run jobs and stop
# running jobs?
# 18:00 start time (=1080 minutes in to the day)
WEEKDAY_START_TIME = 1080
# 08:00 end time (=480 minutes in to the day)
WEEKDAY_END_TIME = 480

# When, during the weekend, should Condor begin to run jobs and stop
# running jobs?
# 00:00 start time (=0 minutes in to the day)
WEEKEND_START_TIME = 0
# 23:59 end time (=1439 minutes in to the day)
WEEKEND_END_TIME = 1439

# Boolean expression that returns true if it's a weekday and we're
# in the weekday start window.
WEEKDAY_CAN_START = ( \
    ( $(WEEKDAY_END_TIME) > $(WEEKDAY_START_TIME) && \
      ( ClockMin >= $(WEEKDAY_START_TIME) && \
        ClockMin <= $(WEEKDAY_END_TIME) ) ) || \
    ( $(WEEKDAY_END_TIME) < $(WEEKDAY_START_TIME) && \
      ( ClockMin >= $(WEEKDAY_START_TIME) || \
        ClockMin <= $(WEEKDAY_END_TIME) ) ) )

# Boolean expression that returns true if it's a weekend and we're
# in the weekend start window.
WEEKEND_CAN_START = ( \
    ( $(WEEKEND_END_TIME) > $(WEEKEND_START_TIME) && \
      ( ClockMin >= $(WEEKEND_START_TIME) && \
        ClockMin <= $(WEEKEND_END_TIME) ) ) || \
    ( $(WEEKEND_END_TIME) < $(WEEKEND_START_TIME) && \
      ( ClockMin >= $(WEEKEND_START_TIME) || \
        ClockMin <= $(WEEKEND_END_TIME) ) ) )

# This returns true if the schedule for running jobs is in a
# run window. Otherwise false.
RUNWINDOW_SCHEDULE_OBEYED = ( \
    ( (ClockDay > 0 && ClockDay < 6) && $(WEEKDAY_CAN_START) ) || \
    ( (ClockDay == 0 || ClockDay == 6) && $(WEEKEND_CAN_START) ) )
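To make the wrap-around logic easier to check, here is a minimal Python sketch that mirrors the window macros above. It is only a model for reasoning about the policy, not something Condor runs. The function names in_window and run_window_open are ours, chosen for illustration; clock_min and clock_day stand in for Condor's ClockMin (minutes since midnight) and ClockDay (0 = Sunday through 6 = Saturday) machine attributes.

```python
# Illustrative model of the run-window macros; not part of the Condor config.
WEEKDAY_START_TIME = 1080  # 18:00
WEEKDAY_END_TIME = 480     # 08:00
WEEKEND_START_TIME = 0     # 00:00
WEEKEND_END_TIME = 1439    # 23:59


def in_window(clock_min, start, end):
    """Mirror of the *_CAN_START macros for one start/end pair."""
    if end > start:
        # Ordinary window inside a single day.
        return start <= clock_min <= end
    if end < start:
        # Window wraps past midnight: late evening OR early morning.
        return clock_min >= start or clock_min <= end
    # start == end: neither branch of the original expression matches.
    return False


def run_window_open(clock_day, clock_min):
    """Mirror of RUNWINDOW_SCHEDULE_OBEYED."""
    if 0 < clock_day < 6:        # Monday through Friday
        return in_window(clock_min, WEEKDAY_START_TIME, WEEKDAY_END_TIME)
    if clock_day in (0, 6):      # Sunday or Saturday
        return in_window(clock_min, WEEKEND_START_TIME, WEEKEND_END_TIME)
    return False


if __name__ == "__main__":
    print(run_window_open(1, 19 * 60))  # Monday 19:00
    print(run_window_open(3, 12 * 60))  # Wednesday 12:00
    print(run_window_open(1, 7 * 60))   # Monday 07:00, early-morning side
```

Note that the early-morning half of the weekday window (00:00 to 08:00) applies to Monday as well, so a machine that ran jobs through Sunday keeps its window open until Monday at 08:00.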
With the time-of-day windows now being computed, we turned our attention to monitoring CPU load from outside Condor and user activity on the console, to ensure that the job window only opens when the machine is really not being used after hours. Configurable macros were set up to determine when the non-Condor CPU load was unacceptably high. This parameterized approach meant the IT staff at ExampleBank could easily tune the policy. The CPUIdle macro indicates when a machine has become quiet and the CPUBusy macro indicates when a machine has become loaded down with non-Condor work. A second set of macros treats the keyboard or console as busy if it has seen activity within the last hour.
# These are the load values that determine if non-Condor
# load is acceptably low enough to run jobs.
BackgroundLoad = 0.3
HighLoad = 0.5

# This is the non-Condor load average
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)

# These macros return true or false and answer load questions
# about this machine.
CPUIdle = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
CPUBusy = ($(NonCondorLoadAvg) >= $(HighLoad))

KeyboardBusy = (KeyboardIdle < 60*60)
ConsoleBusy = (ConsoleIdle < 60*60)
ConsoleNotBusy = ($(KeyboardBusy) == False && $(ConsoleBusy) == False)
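One detail worth noticing is that CPUIdle and CPUBusy are not opposites: between the two thresholds a machine is neither idle nor busy, which keeps the policy from flapping when load hovers around a single cutoff. The Python sketch below models that band along with the one-hour console check. The function names classify_load and console_not_busy are hypothetical, introduced only for illustration; the thresholds are copied from the config above.

```python
# Illustrative model of the load and console macros; not part of the Condor config.
BACKGROUND_LOAD = 0.3   # BackgroundLoad
HIGH_LOAD = 0.5         # HighLoad
ACTIVITY_WINDOW = 60 * 60  # seconds of inactivity required (one hour)


def classify_load(load_avg, condor_load_avg):
    """Return 'idle', 'busy', or 'in between' for the non-Condor load."""
    non_condor_load = load_avg - condor_load_avg
    if non_condor_load <= BACKGROUND_LOAD:
        return "idle"        # CPUIdle is true
    if non_condor_load >= HIGH_LOAD:
        return "busy"        # CPUBusy is true
    return "in between"      # neither macro is true


def console_not_busy(keyboard_idle, console_idle):
    """Mirror of ConsoleNotBusy: both idle timers must exceed the window."""
    keyboard_busy = keyboard_idle < ACTIVITY_WINDOW
    console_busy = console_idle < ACTIVITY_WINDOW
    return not keyboard_busy and not console_busy


if __name__ == "__main__":
    print(classify_load(1.1, 1.0))       # small owner load on top of one job
    print(classify_load(1.4, 1.0))       # inside the band
    print(console_not_busy(7200, 120))   # console touched two minutes ago
```

Under this model, a machine in the band will not start new jobs (CPUIdle is false) but will not suspend a running one either (CPUBusy is false).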
All three pieces were combined to define the START expression for the machine, which if true, indicates the machine is available to run jobs:
START = ($(ConsoleNotBusy)) && \
        ($(CPUIdle)) && \
        ($(RUNWINDOW_SCHEDULE_OBEYED) =?= True)
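The =?= operator in the START expression is the ClassAd "is identical to" comparison: it yields true only when the left side is exactly True, and false otherwise, including when the value is undefined. A rough Python analogy, assuming we model an undefined value as None (our convention for this sketch, not Condor's), looks like this; the function name can_start is hypothetical.

```python
# Illustrative model of the START expression; not part of the Condor config.
def can_start(console_not_busy, cpu_idle, run_window_obeyed):
    """Return True only when all three conditions hold.

    run_window_obeyed may be True, False, or None (modelling an
    undefined value). The identity check mirrors '=?= True': anything
    other than exactly True closes the window.
    """
    return bool(console_not_busy) and bool(cpu_idle) and (run_window_obeyed is True)


if __name__ == "__main__":
    print(can_start(True, True, True))
    print(can_start(True, True, None))   # undefined schedule: do not start
    print(can_start(False, True, True))  # someone is at the console
```

The practical effect is a fail-closed policy: if the schedule cannot be evaluated for some reason, the machine does not accept jobs.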
Once a job is running, there are three reasons the machine may stop being available:
1. The console may become busy in the form of mouse or keyboard activity;
2. The non-Condor CPU load may rise beyond the threshold set for acceptably low;
3. The time of day may pass into a period where jobs are not supposed to execute.
When reason 1 or 2 was encountered, ExampleBank wanted the running job to be suspended for up to one hour before being returned to the queue to find other CPU resources. For reason 3, the job was to be returned to the queue immediately when the time-of-day execution window closed, since it was known the job would have no chance to run on this machine again for many hours.
SUSPEND = ($(CPUBusy)) || ($(ConsoleBusy))
WANT_SUSPEND = SUSPEND
CONTINUE = ($(CPUBusy) =!= True) && ($(ConsoleNotBusy))
PREEMPT = ((Activity == "Suspended") && ($(ActivityTimer) > 3600)) || \
          (($(RUNWINDOW_SCHEDULE_OBEYED) =?= False))
The SUSPEND statement tells Condor to suspend a job if the CPU or console becomes busy. The CONTINUE statement tells Condor to resume running a suspended job once the CPU is no longer busy and both the keyboard and console have been quiet for an hour. The PREEMPT statement says to put the job back in the queue if it has been suspended for more than an hour or if the run window has closed.
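The intent of these expressions can be summarized in a small Python sketch. This is a deliberately simplified model of the policy as described, not of how the condor_startd actually evaluates it: the real daemon has more states and its own evaluation order. The function name next_action and its return values are hypothetical; the activity strings "Busy" and "Suspended" follow Condor's Activity attribute.

```python
# Illustrative model of the SUSPEND / CONTINUE / PREEMPT policy.
# Simplified: the real condor_startd state machine is richer than this.
SUSPEND_LIMIT = 3600  # seconds a job may stay suspended before preemption


def next_action(activity, activity_timer, cpu_busy, keyboard_busy,
                console_busy, window_open):
    """Return 'preempt', 'suspend', 'continue', or 'none'."""
    # PREEMPT: suspended for more than an hour, or the run window closed.
    if (activity == "Suspended" and activity_timer > SUSPEND_LIMIT) or not window_open:
        return "preempt"
    # SUSPEND: the owner's CPU load or console activity came back.
    if activity == "Busy" and (cpu_busy or console_busy):
        return "suspend"
    # CONTINUE: CPU quiet and ConsoleNotBusy (keyboard and console both idle).
    if (activity == "Suspended" and not cpu_busy
            and not keyboard_busy and not console_busy):
        return "continue"
    return "none"


if __name__ == "__main__":
    # Owner's load returns while a job is running.
    print(next_action("Busy", 0, True, False, False, True))
    # Machine quiet again ten minutes into the suspension.
    print(next_action("Suspended", 600, False, False, False, True))
    # Run window closes at 08:00 on a weekday.
    print(next_action("Busy", 0, False, False, False, False))
```

Checking the window first reflects the requirement that a closed run window sends the job back to the queue immediately rather than waiting out the one-hour suspension.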