Alluxio is a virtual distributed file system that lets applications access files and objects in external storage systems such as S3 or HDFS through a unified file system namespace and a single API. Scaling the capacity of the Alluxio metadata service is vital for two reasons:
- Alluxio provides a single namespace in which multiple storage systems can be mounted, so the size of Alluxio's namespace must match the combined size of all mounted storage systems.
- Object storage is growing in popularity, and object stores often hold many more small files than file systems such as HDFS.
In Alluxio 1.x, the metadata service is limited in practice to around 200 million files. Scaling further would cause garbage collection issues due to the limited JVM heap size of the Alluxio master process. Even at that scale, storing 200 million files requires a large JVM heap (around 200 GB) on the single machine running the Alluxio master.
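
To make the scale concrete, the two figures quoted above imply roughly 1 KB of heap per file. The snippet below is only a back-of-the-envelope sketch: it assumes decimal gigabytes and a heap cost that grows linearly with file count, and the 1 billion file target is a hypothetical value chosen for illustration, not a number from Alluxio.

```python
# Back-of-the-envelope estimate derived from the figures quoted in the text.
FILES = 200_000_000        # ~200 million files (from the text)
HEAP_BYTES = 200 * 10**9   # ~200 GB of JVM heap (from the text); decimal GB assumed


def bytes_per_file(heap_bytes: int, files: int) -> float:
    """Average heap bytes consumed per file's metadata."""
    return heap_bytes / files


def heap_needed_gb(files: int, per_file_bytes: float) -> float:
    """Heap in decimal GB for `files` entries, assuming a fixed per-file cost."""
    return files * per_file_bytes / 10**9


per_file = bytes_per_file(HEAP_BYTES, FILES)
print(f"~{per_file:.0f} bytes of heap per file")

# Hypothetical target: 1 billion files, assuming the cost scales linearly.
target_files = 1_000_000_000
print(f"{target_files:,} files -> ~{heap_needed_gb(target_files, per_file):.0f} GB of heap")
```

Under these assumptions, a namespace five times larger would need about five times the heap on one machine, which is why an on-heap design stops scaling well before the billion-file range.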