Last week, Amazon released the Elastic File System (EFS) in the US East (Northern Virginia), US West (Oregon), and Europe (Ireland) regions. EFS provides a scalable, POSIX-compliant filesystem for Amazon EC2 instances without having to run a file server. This means you can grow your storage as your usage increases instead of having to pre-provision disks. Instances mount EFS just as they would any traditional NFS volume.
Of course, we know that you, our customers, will want to start testing workloads against EFS, so we’ve added support for it in the next CycleCloud release. Once the EFS file system is created through the AWS console, cluster instances can mount it with the configuration you’re already used to. For example, the configuration below mounts EFS file system fs-f00cf6b8 at /mnt/efs_test:

    [[[configuration cyclecloud.mounts.efs_test]]]
    type = efs
    filesystem_id = fs-f00cf6b8
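If you plan to benchmark several file systems, it can help to generate that mount section rather than type it by hand. The following is a minimal sketch, not part of CycleCloud: `render_efs_mount` is a hypothetical helper that simply reproduces the three lines shown above for a given mount name and file system ID. The `fs-` prefix check is an assumption based on the ID format in the example.

```python
def render_efs_mount(mount_name: str, filesystem_id: str) -> str:
    """Render a CycleCloud mount section for an EFS file system.

    Hypothetical helper: it only emits the section name, type, and
    filesystem_id keys that appear in the example configuration.
    """
    if not mount_name:
        raise ValueError("mount_name must not be empty")
    # Assumption: EFS IDs look like the example, i.e. they start with "fs-".
    if not filesystem_id.startswith("fs-"):
        raise ValueError(f"unexpected file system ID: {filesystem_id!r}")

    lines = [
        f"[[[configuration cyclecloud.mounts.{mount_name}]]]",
        "type = efs",
        f"filesystem_id = {filesystem_id}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the configuration from the example above.
    print(render_efs_mount("efs_test", "fs-f00cf6b8"))
```

The output can be pasted into the node or node array definition of a cluster template, alongside any other mounts you already configure.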
So what does EFS look like in the real world? We took an I/O-intensive genomics workload and ran it on a 16-instance cluster using four different configurations:
- c3.4xlarge filer using ephemeral storage
- c3.4xlarge filer using a 500 GB GP2 (solid state drive) volume
- c4.4xlarge filer using a 500 GB GP2 volume
- EFS (Basic)
Each job runs without competition on a c4.4xlarge instance and pulls 25 GB of reference genome data into memory. The code performs genomic alignment in batches and, at the end, writes approximately 1 GB of data per job back to the filer. The table below shows the average runtimes for the different filer configurations with as many as 16 such tasks simultaneously using the shared filer:
| Filer | Simultaneous Tasks | Average runtime (seconds) |
Based on job runtime alone, EFS seems like the wrong choice for this workload. We did try EFS PIOPS, which offers higher performance at scale, but the results of the small-scale tests were substantially the same. Keep in mind that EFS bandwidth scales with size as well: file systems larger than 1 TB can burst to 100 MB/s per TB of storage. Larger production deployments would likely see better performance than in our benchmarks above, and EFS will scale better than a traditional NFS server once you reach a thousand or more clients.
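To get a feel for how burst bandwidth grows with file system size, here is a rough sketch of the rule quoted above. The 100 MB/s per TB rate for file systems larger than 1 TB comes from this post; the flat 100 MB/s figure used for smaller file systems is an assumption that is not stated here, so check it against the current AWS documentation. The function name is made up for illustration.

```python
def efs_burst_mb_per_s(size_tb: float,
                       rate_per_tb: float = 100.0,
                       small_fs_burst: float = 100.0) -> float:
    """Estimate EFS burst throughput in MB/s for a file system of size_tb.

    rate_per_tb:    burst rate per TB for file systems over 1 TB (from the post).
    small_fs_burst: ASSUMED flat burst rate for file systems of 1 TB or less.
    """
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    if size_tb <= 1.0:
        return small_fs_burst
    return size_tb * rate_per_tb


if __name__ == "__main__":
    for size in (0.5, 1.0, 4.0, 10.0):
        print(f"{size:5.1f} TB -> {efs_burst_mb_per_s(size):7.1f} MB/s burst")
```

Under these assumptions, a small test file system like the one in our benchmark sits at the bottom of the curve, which is one reason a production-sized file system may behave differently.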
Of course, raw performance isn’t the only thing we all care about. Cost is often the most significant consideration. With EBS volumes, you pay for the amount of space provisioned, whether it is used or not. In effect, you’re paying for capacity you hope never to use, since filling the disk is generally something to avoid. With EFS, you pay only for the storage in use. And since no EC2 instance is required to serve as a filer, the overall cost is lower in most cases.
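The comparison is easy to run for your own numbers. The sketch below assumes a simple monthly model: an EBS-backed filer costs the provisioned capacity plus the instance hours, while EFS costs only the capacity in use. All prices in the example are placeholders rather than actual AWS rates, and the function names are invented for illustration. Substitute the current prices for your region.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month


def ebs_filer_monthly_cost(provisioned_gb: float,
                           ebs_price_per_gb_month: float,
                           instance_price_per_hour: float,
                           hours: float = HOURS_PER_MONTH) -> float:
    """Cost of a filer instance plus its EBS volume, billed on provisioned GB."""
    storage = provisioned_gb * ebs_price_per_gb_month
    compute = instance_price_per_hour * hours
    return storage + compute


def efs_monthly_cost(used_gb: float, efs_price_per_gb_month: float) -> float:
    """Cost of EFS, billed only on the GB actually stored."""
    return used_gb * efs_price_per_gb_month


if __name__ == "__main__":
    # PLACEHOLDER prices and sizes, chosen only to illustrate the model.
    ebs = ebs_filer_monthly_cost(provisioned_gb=500,
                                 ebs_price_per_gb_month=0.10,
                                 instance_price_per_hour=0.80)
    efs = efs_monthly_cost(used_gb=200, efs_price_per_gb_month=0.30)
    print(f"EBS-backed filer: ${ebs:,.2f}/month")
    print(f"EFS:              ${efs:,.2f}/month")
```

Note that the per-GB price of EFS can be higher than that of EBS, so the result depends on how much of the provisioned volume you actually use and on whether the filer instance would be running anyway.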
Another consideration is the reduced complexity of not having a separate filer instance. Changing the storage on a filer often means taking it offline, whereas EFS grows dynamically as you add data. In addition, you avoid the things that can go wrong with a server you run yourself, such as kernel panics and memory leaks.
One thing is clear: cloud services mean you have a wide range of storage options to meet the specific needs of your application. With CycleCloud, it’s easy to benchmark different configurations and find what works best for you.