So a short while ago, I wrote an article on HPC in the Cloud for Life Science Leader, a widely-read monthly publication for life science executives.
We had some nice responses, but today Google notified me that David Dooling had posted a response to my article that, IMHO, makes a few inaccurate points. While my reply is awaiting moderation on his page, I thought I'd post it here (sorry for the typo in my first comment, David):
Good to meet you. I've read your blog before, and I take issue with your arguments regarding the cloud. I have more posts on the cloud at my blog, http://blog.cyclecomputing.com, as well as the response below. This is an interesting area, and I'd love to correspond with you more at the e-mail address at the bottom:
> No wonder he makes cloud computing sound so attractive. No mention of the
> IT expertise needed to get up and running on the cloud. No mention of the
> software engineering needed to ensure your programs run efficiently on
> the cloud.
You are implying that, to run in the cloud, an end user must personally worry about the "IT expertise" and "software engineering" needed to get applications up and running. I believe this is a straw man: the assertion is incorrect to begin with.
One of the major benefits of virtualized infrastructure and service-oriented architectures is that they are repeatable and decouple the knowledge needed to build a service from the users consuming it. This means that one person, who creates the virtual machine images or the server code running the service, does need the expertise to get an application running properly in the cloud. But once that engineering is done, a whole community of end users can benefit from the service without knowing the specifics of getting the application to scale.
For example, does everyone that uses GMail/Yahoo/Hotmail know every line of software code to make it run? Do they know every operational aspect of how to make mail scale to tens of thousands of processors across many data centers?
Definitely not, and the point is they don't have to. The same is true for high performance and high throughput computing. To give examples of free services that don't require end user software engineering or IT expertise to do bioinformatics/proteomics/etc.:
– The NIH website for BLAST has, for years, been running BLAST as a service so that researchers can use GUIs to run queries on parallel back-end infrastructure (see http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9606). Scientists need no complicated knowledge or software engineering to run BLAST as a service.
– Tools like ViPDAC have 2-minute tutorial videos on running proteomics on Amazon Web Services.
Lastly, because I recognize these benefits and think they are tremendously valuable as a way to let organizations focus on what they are good at and outsource the rest, like IT and software engineering, I have translated my passion for this area into a business that I've built over the past 5 years. But that doesn't make the point any less valid.
> It may not be apparent from his article, but a program that
> runs well on one or ten computers does not necessarily run well on
> hundreds of computers. In fact, he implies the exact opposite.
> For compute clusters as a service, the math is different: Having 40
> processors work for 100 hours costs the same as having 1,000
> processors run for 4 hours.
> It may cost the same under that scenario, but not everything scales
> linearly. In fact, most things don’t and that less-than-linear scaling
> actually ends up making it cost more to get a shorter turnaround.
This is also a straw man, and a deceptive one because it contains a kernel of truth. It is true that not all applications scale perfectly linearly to arbitrary size. For applications like genomic sequencing (Crossbow), or MPI apps like computational fluid dynamics, where there are various serial pieces to the computation (overhead, if nothing else), you don't get 1000x the performance from 1000x the processors. But there are many other applications where near-linear scaling is achievable, what Condor's Miron Livny calls "pleasantly parallel" problems: Monte Carlo molecular dynamics, BLAST searches with thousands of queries, risk analysis, proteomics runs with different analysis settings, etc.
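To make the two cases concrete, here is a rough sketch that uses Amdahl's law as a stand-in model for less-than-linear scaling. Amdahl's law is my choice of model for illustration, not something taken from either article, and the serial fractions below are made-up values. The 4,000 processor-hours of work comes from the quoted 40 processors x 100 hours example, assuming that figure reflects perfect scaling.

```python
def amdahl_speedup(processors: int, serial_fraction: float) -> float:
    """Speedup over a single processor under Amdahl's law."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def processor_hours(processors: int, serial_fraction: float,
                    single_cpu_hours: float) -> float:
    """Total processor-hours billed for one job on `processors` CPUs."""
    wall_hours = single_cpu_hours / amdahl_speedup(processors, serial_fraction)
    return processors * wall_hours


# Assumption: the job needs 4,000 hours on one processor
# (40 processors x 100 hours in the quoted example).
SINGLE_CPU_HOURS = 4000.0

# Hypothetical serial fractions: 0.0 models a "pleasantly parallel" job.
for serial_fraction in (0.0, 0.001, 0.01):
    for processors in (40, 1000):
        wall = SINGLE_CPU_HOURS / amdahl_speedup(processors, serial_fraction)
        cost = processor_hours(processors, serial_fraction, SINGLE_CPU_HOURS)
        print(f"serial={serial_fraction:<6} cpus={processors:<5} "
              f"wall={wall:8.1f} h  cost={cost:9.1f} cpu-h")
```

With a serial fraction of zero, the 40-processor and 1,000-processor runs cost the same number of processor-hours; as the serial fraction grows, the larger run costs more, which is the kernel of truth in the objection.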
In fact, regardless of whether a job scales linearly, most companies and research institutions don't have a one-cluster-to-one-user scenario. There are multiple users with multiple jobs each. What if you have 10 Crossbow users, each with 10 runs to do on various genomes? Those 100 runs are independent of one another, so with enough processors you can get up to 100x performance on the *workflow as a whole*.
And this is the problem most life science companies and research organizations face when they have multiple users. If 10 people are all submitting 10 jobs to an internal cluster, generally speaking, work can get done far faster with more resources, because the separate jobs from separate users can run alongside each other. Hence, if you're in the common case of having many users on your cluster/grid, then on the cloud the math is different: odds are that 25x the processors will run the workload nearly 25x faster.
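The multi-user argument can be sketched as simple arithmetic. This is a toy model under stated assumptions: every job takes the same time, jobs are fully independent, and a "slot" is whatever fixed set of processors one job needs. The 5 hours per job and the slot counts are hypothetical numbers chosen so that the ratio matches the 25x figure above.

```python
import math


def batch_wall_hours(num_jobs: int, slots: int, hours_per_job: float) -> float:
    """Wall-clock time to finish independent, equal-length jobs when
    `slots` of them can run at the same time."""
    waves = math.ceil(num_jobs / slots)  # rounds of jobs run back to back
    return waves * hours_per_job


USERS = 10
RUNS_PER_USER = 10
HOURS_PER_JOB = 5.0  # hypothetical duration of a single run

num_jobs = USERS * RUNS_PER_USER  # 100 independent runs

# 4 slots stands in for a small internal cluster; 100 slots is 25x that.
for slots in (1, 4, 100):
    hours = batch_wall_hours(num_jobs, slots, HOURS_PER_JOB)
    print(f"slots={slots:<4} wall={hours:6.1f} h")
```

No individual job runs any faster in this model; the whole batch finishes sooner only because more jobs run at once, which is why the result holds even for jobs that scale poorly on their own.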
This is an interesting area, and I'd love to correspond with you more off-line; my e-mail is js at Cycle Computing.