We had a lot of interest in our last post, a correction to Stephen Wilson's blog at Sun, which indicated that the SGE 6.2u5 release, in December 2009, was the first cloud-aware scheduler to enable scheduling to Amazon EC2 and scheduling of Hadoop jobs.
Miha Ahronovitz posted some questions that I wanted to address specifically. First, Miha, thanks for your response about Condor's capabilities. From the questions asked and other feedback we've gotten on the first post, it seems like folks may be interested in finding out more about Condor and what it can do, so I highly recommend attending CondorWeek, which the Condor Team hosts every year, or feel free to contact Cycle. CondorWeek will give you a broad view of all the cool areas where people use Condor.
As I stated in the last blog, as full disclosure, we started using Condor 6 years ago, and Hadoop/SGE/PBS within the last 2 years. It may sound like this is a Condor-only vantage point, but we are a unique company in that we support these other schedulers, and truly use the proper tool for the client in various scenarios. Given Miha's questions, please allow me to provide answers about Condor's capabilities, and ask about an SGE feature I saw:
"Can you use Condor to transform a private cloud from in a cost center to a profit center? On-demand means billing, billing means making money."
Yes. Condor was first able to manage internal VM-based infrastructure in the 6.9.4 release*, available on Sept. 5, 2007. Usage Tracking for billing/chargeback has been done by the Condor Accountant for 10-20 years to track usage and provide information for the fair-share engine (as Condor was an early advocate of fair-sharing for high throughput computing). The Accountant, a part of Condor's job match-making daemon, tabulates Accumulated Usage in compute hours per User/AccountingGroup.
Additionally, Cycle's CycleServer management tool for Condor provides detailed analysis of usage for chargeback, and since 2005 has been used by many Fortune 100 companies, SMBs, government labs, and universities. Condor's Quill* is a system for storing all the actions of a grid in an RDBMS for tracking/billing/chargeback. It was first released in Sept 2005 in Condor 6.7.11, and is the Condor equivalent of SGE's ARCo (started in Dec 2006).
As you can see, Condor has been doing this a while…
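To make the chargeback idea concrete, here is a minimal sketch in Python of turning accumulated compute hours per accounting group into a bill. This is not Condor's Accountant, Quill, or CycleServer code: the usage records, the flat hourly rate, and the `chargeback` function are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical usage records: (accounting_group, user, compute_hours).
# Illustrative values only, not output from a real Condor pool.
USAGE = [
    ("chemistry", "alice", 120.0),
    ("chemistry", "bob", 30.5),
    ("finance", "carol", 400.0),
]

RATE_PER_HOUR = 0.10  # assumed internal rate, in dollars per compute hour


def chargeback(records, rate_per_hour):
    """Sum compute hours per accounting group and price them at a flat rate."""
    hours = defaultdict(float)
    for group, _user, compute_hours in records:
        hours[group] += compute_hours
    return {group: round(total * rate_per_hour, 2) for group, total in hours.items()}


if __name__ == "__main__":
    for group, cost in sorted(chargeback(USAGE, RATE_PER_HOUR).items()):
        print(f"{group}: ${cost:.2f}")
```

A real deployment would pull the hours from the scheduler's own accounting data rather than a hard-coded list, and would likely price different machine classes differently.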
"Can Condor manage a private cloud, based on customer's premises, to access Amazon only during peak demand?"
Yes. As mentioned earlier, Condor can perform scheduling of VMs across a pool of machines. It can also move jobs into the Amazon Cloud based upon business logic using the condor_job_router*, which was first released on July 15, 2008, about 1.5 years ago. Additionally, since 2007, we've used simple Condor scripts to look at the job queue, add nodes to a Condor pool from Amazon EC2 when jobs demand more compute, and separate Condor "Hawkeye"* scripts to terminate EC2 nodes when no further work is available to run on them.
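The grow-and-shrink logic behind scripts like these can be sketched in a few lines of Python. This is a sketch under stated assumptions, not our actual scripts: the function names, the `max_nodes` cap, and the idle limit are hypothetical, and the code only makes the decision; it does not query a Condor queue or call any EC2 API.

```python
def nodes_to_start(idle_jobs, running_nodes, max_nodes, jobs_per_node=1):
    """Return how many extra cloud nodes to request for the idle jobs.

    The result is capped so the pool never exceeds max_nodes (an assumed
    budget limit, not a Condor setting).
    """
    if idle_jobs <= 0:
        return 0
    wanted = -(-idle_jobs // jobs_per_node)  # ceiling division
    headroom = max(max_nodes - running_nodes, 0)
    return min(wanted, headroom)


def nodes_to_stop(node_idle_minutes, idle_limit_minutes=50):
    """Return the names of nodes idle for at least idle_limit_minutes."""
    return sorted(
        name
        for name, idle in node_idle_minutes.items()
        if idle >= idle_limit_minutes
    )


if __name__ == "__main__":
    # 10 idle jobs, 4 job slots per node, 2 nodes running, budget of 8 nodes.
    print(nodes_to_start(idle_jobs=10, running_nodes=2, max_nodes=8, jobs_per_node=4))
    # Hypothetical node names mapped to minutes without work.
    print(nodes_to_stop({"ec2-a": 60, "ec2-b": 5, "ec2-c": 50}))
```

Keeping the decision separate from the calls that actually start or terminate instances makes the policy easy to test and easy to change when the business logic does.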
"Can you provide billing content so the local organization makes money, not only Amazon.com?"
Yes, as you can see in the first response regarding internal chargeback.
"Can Condor recommend turning on and off power and create pools of on-demand hosts, saving power?"
Condor calls this Green Computing*, which Condor started providing features for in November 2008. Condor shuts down the nodes according to user-defined, fully-configurable policy expressions and re-powers them as needed. Other schedulers have innovated in this area as well, including PBS, SGE, and the like.
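As a rough illustration of what such a power policy decides, here is a small Python sketch. It is not Condor's configuration syntax (Condor expresses these policies as expressions in its own configuration); the `power_action` function, its parameters, and the 30-minute threshold are assumptions chosen only to show the shape of the decision.

```python
def power_action(is_powered_on, idle_minutes, queued_jobs, idle_threshold=30):
    """Decide what to do with one node under a simple, assumed power policy.

    Returns "power_off", "power_on", or "no_change".
    """
    # A running node with nothing to do for long enough can be shut down.
    if is_powered_on and queued_jobs == 0 and idle_minutes >= idle_threshold:
        return "power_off"
    # A sleeping node is woken up as soon as work is waiting.
    if not is_powered_on and queued_jobs > 0:
        return "power_on"
    return "no_change"


if __name__ == "__main__":
    print(power_action(is_powered_on=True, idle_minutes=45, queued_jobs=0))
    print(power_action(is_powered_on=False, idle_minutes=0, queued_jobs=12))
```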
With regard to SGE, obviously a lot of cool work is being done; it just wasn't first with the two features described in Stephen Wilson's blog at Sun. I saw mention of an interesting feature in 6.2u5, "topology-aware" scheduling, and as this is a pretty broad area, I'd love to hear more about how this feature works.
*Important note: All of these features, and the ability to use Condor's Hawkeye framework for custom scripting, are available in the free, pre-built binaries for Condor from the University of Wisconsin-Madison. Condor users don't have to build Condor from source because they want access to specific features without paying; Full-Featured Condor is