JuiceFS is a high-performance distributed file system designed for cloud environments. Amazon EFS is easy to use and scales for a wide range of applications. Amazon FSx for Lustre is a high-performance file system geared toward fast, large-scale data workloads.
In this article, we evaluate the performance and cost of these three mainstream file systems through a series of tests to help users quickly understand these products.
01 Test environment
Tests were conducted on an AWS c5.4xlarge instance (16 cores, 32 GB RAM, 10 Gbps network bandwidth), covering AWS EFS, the JuiceFS Cloud Service (in US West 2), and AWS FSx for Lustre (1.2 TB, 1,200 MB/s configuration).
JuiceFS client testing: 4 threads were used to test both large files (1,024 MiB files, read/written in 1 MiB blocks) and small files (128 KiB files, 100 file reads/writes).
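The setup above can be approximated with fio. This is an illustrative sketch, not the exact commands used in the test; the mount point `/jfs` is hypothetical, and parameters mirror the description above (4 threads, 1 MiB blocks on 1,024 MiB files; 100 files of 128 KiB each for the small-file case).

```shell
# Large-file sequential write: 4 jobs, 1 MiB block size, 1,024 MiB per file
fio --name=big-write --directory=/jfs --rw=write --bs=1M --size=1024M \
    --numjobs=4 --group_reporting

# Small-file sequential read: 4 jobs, 100 files of 128 KiB each per job
fio --name=small-read --directory=/jfs --rw=read --bs=128k --filesize=128k \
    --nrfiles=100 --numjobs=4 --group_reporting
```

Run the write variant first so the read test has data to read; `--group_reporting` aggregates throughput across the 4 jobs, matching the per-scenario numbers reported below.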
02 Performance Testing and Use Cases
Performance tests are primarily used to evaluate sequential read and write performance for large and small files:
- Sequential reading and writing of large files: Critical for applications that require sustained throughput, such as data processing and training for large language models (LLMs), gene sequencing, big data analytics, video streaming, and data backup;
- Sequential reading and writing of small files: Simulates common workloads in computer vision data processing and model training, media processing, scientific computing, and other areas.
The following table shows the test results for throughput (in MiB/s):
| Test scenario | EFS | JuiceFS | FSx for Lustre |
|---|---|---|---|
| Large file write | 475 | 1,116 | 594 |
| Large file read | 568 | 1,016 | 590 |
| Small file write | 29 | 8 | 297 |
| Small file read | 104 | 160 | 274 |
- Large files: JuiceFS is the fastest at reading and writing large files.
- Small files: FSx for Lustre performs best at reading and writing small files.
The parameters for mounting JuiceFS are as follows:
- buffer-size=1024
- max-upload=200
- max-download=200
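As a sketch, these parameters would be passed to the JuiceFS client at mount time roughly as follows. The volume name `myvolume` and mount point `/jfs` are placeholders; flag names follow the article, so verify them against your client version (for example, the community edition spells the upload flag `--max-uploads`).

```shell
juicefs mount \
    --buffer-size=1024 \
    --max-upload=200 \
    --max-download=200 \
    myvolume /jfs
```

A larger `--buffer-size` gives the client more read-ahead and write-back buffer memory, which is what sustains the high sequential throughput shown above.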
03 Cost comparison
The following table lists the rates (in $) for these products:
| Cost item | EFS | JuiceFS | FSx for Lustre |
|---|---|---|---|
| Storage | $0.30/GB-month | $0.02/GB-month + S3 $0.023/GB-month | $0.60/GB-month |
| Write access | $0.06/GB; $0.10/GB cross-AZ | Free in same region + S3 API $0.005/1,000 requests | Free in same AZ; $0.10/GB cross-AZ |
| Read access | $0.03/GB; $0.10/GB cross-AZ | Free in same region + S3 API $0.0004/1,000 requests | Free in same AZ; $0.10/GB cross-AZ |
- AWS EFS charges based on storage and access, which is relatively costly for large-scale use;
- JuiceFS, even counting both the JuiceFS and S3 charges, is the less costly option, especially when data is accessed within the same region;
- FSx for Lustre offers high performance, but it has the highest storage costs.
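To make the storage rates concrete, here is a small calculation of the monthly storage cost for a hypothetical 10 TB dataset, using only the per-GB-month rates from the table above (access and request charges excluded):

```python
# Monthly storage cost for a hypothetical 10 TB (10,240 GB) dataset,
# using the per-GB-month rates from the cost table above.
# Access and request charges are excluded for simplicity.
SIZE_GB = 10 * 1024

rates = {
    "EFS": 0.30,              # $/GB-month
    "JuiceFS": 0.02 + 0.023,  # JuiceFS fee + S3 storage
    "FSx for Lustre": 0.60,
}

for name, rate in rates.items():
    print(f"{name}: ${SIZE_GB * rate:,.2f}/month")
# EFS: $3,072.00/month
# JuiceFS: $440.32/month
# FSx for Lustre: $6,144.00/month
```

Even with S3 storage included, JuiceFS comes out to roughly one seventh of the EFS figure and one fourteenth of the FSx for Lustre figure for pure storage.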
One hidden cost to pay special attention to is data transfer across availability zones (AZs). For AWS EFS and FSx for Lustre, transfers across AZs incur additional charges. JuiceFS avoids this cost because S3 data transfer within the same region is free.
04 POSIX Compatibility Comparison
In AI applications, where data-processing pipelines are more complex, POSIX compatibility becomes an important factor for teams to consider. We therefore conclude this comparison with a detailed look at the POSIX compatibility of these three file systems. In our earlier blog post comparing the POSIX compatibility of seven cloud file systems, JuiceFS passed all 8,832 tests, while EFS failed 1,895 tests, a failure rate of 21%.
That earlier test did not include FSx for Lustre, so we tested it for POSIX compatibility following the same methodology. FSx for Lustre failed 16 tests: 14 were chown failures and 2 were utimensat failures.
In addition, JuiceFS supports POSIX ACLs, recycle bins, subdirectory mounts, subdirectory quotas, transparent compression of data, encryption of data in transit and at rest, and cross-region replication.
05 Summary
JuiceFS demonstrates significant performance benefits in sequential read/write scenarios, outperforming AWS EFS and matching AWS FSx for Lustre, while costing less. For random read/write operations, network file systems are generally inefficient; in those cases, the best performance comes from keeping frequently accessed data local and batching remote operations where possible.
Overall, JuiceFS offers a balance of performance and cost advantages, and is a good choice in scenarios that require large sequential read/write workloads.
I hope this has been helpful. If you have any other questions, feel free to join the JuiceFS community and discuss with everyone.