Poster & Lightning Talk: BLAST Workflow Performance on EC2 @ Rocky Mountain Bioinformatics

We've done quite a bit of application engineering to run various life science workflows in Amazon EC2, and we're (finally) putting some of our analysis of running BLAST on Condor in EC2 up on the web, so I wanted to share it.

Back in 2008, we benchmarked BLAST workflows running on Condor in EC2 and presented the results as a Poster & Lightning Talk at Rocky Mountain Bioinformatics.

Results: We got a 1.9825x speedup for every doubling of cores on a Condor cluster in EC2, or about 99% of ideal linear scaling. One of the more interesting visualizations, by one of our talented guys, Ian Alderman, shows why jobs run efficiently in high-throughput environments.
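To make the scaling number concrete, here is a small Python sketch of the arithmetic. Dividing the measured 1.9825x speedup by the ideal 2x gives the per-doubling efficiency. The helper names are hypothetical, and `compounded_speedup` assumes the same per-doubling factor holds across repeated doublings, which is an illustrative extrapolation rather than something the benchmark measured.

```python
def scaling_efficiency(speedup: float, core_ratio: float) -> float:
    """Fraction of ideal (linear) speedup achieved when cores grow by core_ratio."""
    return speedup / core_ratio


def compounded_speedup(per_doubling: float, doublings: int) -> float:
    """Speedup after repeated core doublings.

    Assumes the per-doubling factor stays constant; this is an
    illustrative assumption, not a measured result.
    """
    return per_doubling ** doublings


eff = scaling_efficiency(1.9825, 2.0)
print(f"per-doubling efficiency: {eff:.4%}")  # 99.1250%

for k in range(1, 5):
    ideal = 2 ** k
    actual = compounded_speedup(1.9825, k)
    print(f"{ideal:>3}x cores -> {actual:.3f}x speedup ({actual / ideal:.1%} of linear)")
```

Under that assumption, the small per-doubling loss compounds slowly, so efficiency stays high over the first few doublings.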

Basically, the chart below shows the run times of the first (orange), second (yellow), third (green), etc. tasks on each processor in the cluster (one row per processor), with time running left to right.

As you can see, high-throughput computing takes advantage of the fact that different tasks/jobs, in this case from the same user, take different amounts of time. Because each processor picks up the next queued job as soon as it finishes its current one, short and long jobs interleave, and over time the processors balance out their usage and finish comparatively close together:

[Figure: HighThroughput, run times of successive tasks on each processor over time]
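A toy model helps show why the rows end at roughly the same time. The Python sketch below assumes a single shared queue where each processor takes the next task the moment it goes idle. This is a simplified greedy list scheduler, not Condor's actual matchmaking, and the workload (400 tasks of 1 to 20 minutes on 16 processors) is made up for illustration. `timelines[p][i]` is the i-th task on processor p, mirroring the first/second/third colors in the chart.

```python
import heapq
import random


def schedule_greedy(durations, n_procs):
    """Assign tasks, in queue order, to whichever processor frees up first.

    Returns per-processor lists of (start, end) intervals and each
    processor's finish time. A toy model of high-throughput dispatch,
    not Condor's matchmaker.
    """
    # Min-heap of (time the processor becomes free, processor id).
    free_at = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(free_at)
    timelines = [[] for _ in range(n_procs)]
    for d in durations:
        t, p = heapq.heappop(free_at)       # earliest-available processor
        timelines[p].append((t, t + d))     # run the task right away
        heapq.heappush(free_at, (t + d, p))
    finish = [tl[-1][1] if tl else 0.0 for tl in timelines]
    return timelines, finish


rng = random.Random(42)
# Hypothetical workload: 400 tasks with run times between 1 and 20 minutes.
tasks = [rng.uniform(1, 20) for _ in range(400)]
timelines, finish = schedule_greedy(tasks, n_procs=16)
makespan = max(finish)

print(f"makespan: {makespan:.1f} min, earliest finish: {min(finish):.1f} min")
print(f"ideal (total work / 16 procs): {sum(tasks) / 16:.1f} min")
```

With this kind of greedy dispatch, and every processor receiving at least one task, the gap between the first and last processor to finish can never exceed the longest single task: when the last-finishing task started, every other processor was still busy. That is the same effect the chart shows, with varied job lengths averaging out across processors.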

We'll describe how our BLAST pipeline works in an upcoming post; in the meantime, you can find the poster, with more detail, here:

Blast_CycleCloud_RockyMountain_Poster
