Comparison of distributed file systems
In computing, a distributed file system (DFS) or network file system is any file system that allows access from multiple hosts to files shared via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.
Distributed file systems differ in their performance, mutability of content, handling of concurrent writes, handling of permanent or temporary loss of nodes or storage, and their policy of storing content.
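Loss of nodes or storage is usually handled in one of two ways. A system can keep full copies of the data (replication), or it can split the data into fragments plus parity (erasure coding). The Efficient Redundancy column in the tables records which approach each system uses. The Python sketch below shows the storage arithmetic behind that choice. It assumes each copy or fragment lives on an independent node, and that the erasure code is maximum distance separable, meaning any k of n fragments suffice, as with Reed-Solomon. The helper names `replication` and `erasure_code` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme: data_units of payload stored as total_units on disk."""
    name: str
    data_units: int   # units of user data per stripe (k)
    total_units: int  # units actually stored per stripe (n)

    @property
    def overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return self.total_units / self.data_units

    @property
    def tolerated_losses(self) -> int:
        # Any (n - k) units may be lost: for r-way replication one surviving
        # copy suffices (k = 1); for an MDS code, any k of the n fragments.
        return self.total_units - self.data_units


def replication(copies: int) -> Scheme:
    # Hypothetical helper: r full copies of every unit.
    return Scheme(f"{copies}-way replication", 1, copies)


def erasure_code(k: int, m: int) -> Scheme:
    # Hypothetical helper: k data fragments plus m parity fragments (assumed MDS).
    return Scheme(f"EC({k}+{m})", k, k + m)


if __name__ == "__main__":
    for s in (replication(3), erasure_code(4, 2), erasure_code(8, 3)):
        print(f"{s.name:20} overhead={s.overhead:.2f}x  survives {s.tolerated_losses} lost units")
```

Under these assumptions, 3-way replication and a 4+2 code both survive two lost units. The difference is cost: the code stores 1.5 bytes per byte of user data instead of 3. The trade-off is repair work, because rebuilding a lost fragment requires reading k surviving fragments rather than a single copy.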
Locally managed
FOSS
Name | Written in | License | Access API | High availability | Shards | Efficient Redundancy | Redundancy Granularity | Initial release year | Memory requirements (GB) |
---|---|---|---|---|---|---|---|---|---|
Alluxio (Virtual Distributed File System) | Java | Apache License 2.0 | HDFS, FUSE, HTTP/REST, S3 | hot standby | No | Replication[1] | File[2] | 2013 | |
Ceph | C++ | LGPL | librados (C, C++, Python, Ruby), S3, Swift, FUSE | Yes | Yes | Pluggable erasure codes[3] | Pool[4] | 2010 | 1 per TB of storage |
Coda | C | GPL | C | Yes | Yes | Replication | Volume[5] | 1987 | |
GlusterFS | C | GPLv3 | libglusterfs, FUSE, NFS, SMB, Swift, libgfapi | Yes | Yes | Reed-Solomon[6] | Volume[7] | 2005 | |
MooseFS | C | GPLv2 | POSIX, FUSE | master | No | Replication[8] | File[9] | 2008 | |
Quantcast File System | C | Apache License 2.0 | C++ client, FUSE (C++ server: MetaServer and ChunkServer are both in C++) | master | No | Reed-Solomon[10] | File[11] | 2012 | |
IPFS | Go | Apache License 2.0 or MIT | HTTP gateway, FUSE, Go client, JavaScript client, command line tool | Yes | with IPFS Cluster | Replication[12] | Block[13] | 2015[14] | |
Kertish-DFS | Go | GPLv3 | HTTP (REST), CLI, C# Client, Go Client | Yes | | Replication | | 2020 | |
LizardFS | C++ | GPLv3 | POSIX, FUSE, NFS-Ganesha, Ceph FSAL (via libcephfs) | master | No | Reed-Solomon[15] | File[16] | 2013 | |
Lustre | C | GPLv2 | POSIX, NFS-Ganesha, NFS, SMB | Yes | Yes | No redundancy[17] | No redundancy[18] | 2003 | |
MinIO | Go | Apache License 2.0 | AWS S3 API | Yes | Yes | Reed-Solomon[19] | Object[20] | 2014 | |
OpenAFS | C | IBM Public License | Virtual file system, Installable File System | | | Replication | Volume[21] | 2000[22] | |
OpenIO[23] | C | AGPLv3 / LGPLv3 | Native (Python, C, Java), HTTP/REST, S3, Swift, FUSE (POSIX, NFS, SMB, FTP) | Yes | | Pluggable erasure codes[24] | Object[25] | 2015 | 0.5 |
RozoFS | C, Python | GPLv2 | FUSE, SMB, NFS, key/value | Yes | | Mojette[26] | Volume[27] | 2011[28] | |
SeaweedFS | Go, Java | Apache License 2.0 | HTTP (REST), POSIX, FUSE, S3, HDFS | replicated filer store | | Reed-Solomon[29] | Volume[30] | 2015 | |
Tahoe-LAFS | Python | GNU GPL[31] | HTTP (browser or CLI), SFTP, FTP, FUSE via SSHFS, pyfilesystem | | | Reed-Solomon[32] | File[33] | 2007 | |
HDFS | Java | Apache License 2.0 | Java and C client, HTTP, FUSE[34] | transparent master failover | No | Reed-Solomon[35] | File[36] | 2005 | |
XtreemFS | Java, C++ | BSD License | libxtreemfs (Java, C++), FUSE | | | Replication[37] | File[38] | 2009 | |
Ori[39] | C, C++ | MIT | libori, FUSE | | | Replication | Filesystem[40] | 2012 | |
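The table lists Reed-Solomon for GlusterFS, Quantcast File System, LizardFS, MinIO, SeaweedFS, Tahoe-LAFS and HDFS. It generalizes a simpler idea: store k data fragments plus parity from which lost fragments can be recomputed. The Python sketch below uses a single XOR parity fragment, as in RAID 5, so it tolerates exactly one lost fragment. It is not Reed-Solomon, which computes m parity fragments using finite-field arithmetic and can therefore survive m losses. The function names `encode`, `recover` and `decode` are hypothetical and are not taken from any of the listed systems.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length fragments.
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal, zero-padded fragments plus one XOR parity fragment."""
    size = max(1, -(-len(data) // k))  # ceiling division; at least one byte
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost fragment")
    repaired = list(fragments)
    if missing:
        # XOR of all surviving fragments (data and parity) equals the lost one.
        survivors = [f for f in fragments if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(fragments: list[bytes], k: int, length: int) -> bytes:
    # Concatenate the k data fragments and drop the zero padding.
    return b"".join(fragments[:k])[:length]


if __name__ == "__main__":
    payload = b"distributed file systems"
    frags: list[bytes | None] = list(encode(payload, k=4))
    frags[2] = None  # simulate losing the node that held data fragment 2
    print(decode(recover(frags), k=4, length=len(payload)))
```

Losing the parity fragment itself is handled by the same path. The XOR of the surviving data fragments simply recomputes the parity.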
Proprietary
Name | Written in | License | Access API |
---|---|---|---|
BeeGFS | C / C++ | FRAUNHOFER FS (FhGFS) EULA,[41] GPLv2 client | POSIX |
ObjectiveFS[42] | C | Proprietary | POSIX, FUSE |
Spectrum Scale (GPFS) | C, C++ | Proprietary | POSIX, NFS, SMB, Swift, S3, HDFS |
MapR-FS | C, C++ | Proprietary | POSIX, NFS, FUSE, S3, HDFS, CLI |
PanFS | C, C++ | Proprietary | DirectFlow, POSIX, NFS, SMB/CIFS, HTTP, CLI |
Infinit[43] | C++ | Proprietary (to be open sourced)[44] | FUSE, Installable File System, NFS/SMB, POSIX, CLI, SDK (libinfinit) |
Isilon OneFS | C/C++ | Proprietary | POSIX, NFS, SMB/CIFS, HDFS, HTTP, FTP, SWIFT Object, CLI, REST API |
Scality | C | Proprietary | FUSE, NFS, REST, AWS S3 |
Quobyte | Java, C++ | Proprietary | POSIX, FUSE, NFS, SMB/CIFS, HDFS, AWS S3, TensorFlow Plugin, CLI, REST API |
Remote access
Name | Run by | Access API |
---|---|---|
Amazon S3 | Amazon.com | HTTP (REST/SOAP) |
Google Cloud Storage | Google | HTTP (REST) |
SWIFT (part of OpenStack) | Rackspace, Hewlett-Packard, others | HTTP (REST) |
Microsoft Azure | Microsoft | HTTP (REST) |
IBM Cloud Object Storage | IBM (formerly Cleversafe)[45] | HTTP (REST) |
Comparison
Researchers have published a functional and experimental analysis of several distributed file systems, including HDFS, Ceph, GlusterFS, Lustre and an old (1.6.x) version of MooseFS. The study dates from 2013, so much of its information is outdated. For example, MooseFS had no high availability for its metadata server at that time.[46]
Cloud-based remote distributed storage services from major vendors have different APIs and different consistency models.[47]
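The toy Python model below shows what a weaker consistency model means in practice. In an eventually consistent store, a write is acknowledged after one replica applies it. A read served by another replica can then return stale data until propagation runs. The model rests on several assumptions: the class `ToyReplicatedStore` and its `sync()` step are hypothetical. It does not describe the actual guarantees of Amazon S3, Google Cloud Storage or Azure, which each vendor documents separately and which have changed over time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyReplicatedStore:
    """Toy eventually consistent key/value store with several replicas.

    A write is acknowledged once replica 0 has applied it; the other replicas
    catch up only when sync() runs. Illustrative only, not any vendor's protocol.
    """

    num_replicas: int = 3
    replicas: list[dict[str, bytes]] = field(default_factory=list, init=False)
    pending: list[tuple[str, bytes]] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        self.replicas = [{} for _ in range(self.num_replicas)]

    def put(self, key: str, value: bytes) -> None:
        # Apply on replica 0 and queue the write for later propagation.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def get(self, key: str, replica: int) -> bytes | None:
        # Reads are served by whichever replica the client reaches; one that has
        # not yet received a write returns stale (here: missing) data.
        return self.replicas[replica].get(key)

    def sync(self) -> None:
        # Anti-entropy pass: replay queued writes, in order, on every other replica.
        for key, value in self.pending:
            for store in self.replicas[1:]:
                store[key] = value
        self.pending.clear()


if __name__ == "__main__":
    s = ToyReplicatedStore()
    s.put("report.csv", b"v1")
    print(s.get("report.csv", replica=2))  # stale read before propagation
    s.sync()
    print(s.get("report.csv", replica=2))  # replica 2 now sees the write
```

A strongly consistent service would instead guarantee that any read issued after a successful write returns that write, regardless of which replica serves it.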
References
- "Caching: Managing Data Replication in Alluxio".
- "Caching: Managing Data Replication in Alluxio".
- "Erasure Code Profiles".
- "Pools".
- Satyanarayanan, Mahadev; Kistler, James J.; Kumar, Puneet; Okasaki, Maria E.; Siegel, Ellen H.; Steere, David C. "Coda: A Highly Available File System for a Distributed Workstation Environment" (PDF).
- "Erasure coding implementation".
- "Setting up GlusterFS Volumes".
- Erasure coding is only available in the proprietary version 4.x: "[feature] erasure-coding #8".
- "mfsgoal(1)".
- "The Quantcast File System" (PDF).
- "qfs/src/cc/tools/cptoqfs_main.cc".
- Erasure coding plans: "Reed-Solomon layer over IPFS #196"; "Erasure Coding Layer #6".
- "CLI Commands: ipfs bitswap wantlist".
- "Why The Internet Needs IPFS Before It's Too Late".
- "Configuring Replication Modes".
- "Configuring Replication Modes: Set and show the goal of a file/directory".
- "Lustre Operations Manual: What a Lustre File System Is (and What It Isn't)". Reed-Solomon in progress: "LU-10911 FLR2: Erasure coding".
- "Lustre Operations Manual: What a Lustre File System Is (and What It Isn't)". File-level redundancy plan: "File Level Redundancy Solution Architecture".
- "MinIO Erasure Code Quickstart Guide".
- "MinIO Storage Class Quickstart Guide".
- "Replicating Volumes (Creating Read-only Volumes)".
- https://www.openafs.org/release/openafs-1.0.html
- "OpenIO SDS Documentation". docs.openio.io.
- "Erasure Coding".
- "Declare Storage Policies".
- "About RozoFS: Mojette Transform".
- "Setting up RozoFS: Exportd Configuration File".
- "Initial commit".
- "Erasure Coding for warm storage".
- "Replication".
- "About Tahoe-LAFS".
- "zfec -- a fast C implementation of Reed-Solomon erasure coding".
- "Tahoe-LAFS Architecture: File Encoding".
- "MountableHDFS".
- "HDFS-7285 Erasure Coding Support inside HDFS".
- "Apache Hadoop: setrep".
- "Under the Hood: File Replication".
- "Quickstart: Replicate A File".
- "Ori: A Secure Distributed File System".
- Mashtizadeh, Ali Jose; Bittau, Andrea; Huang, Yifeng Frank; Mazières, David. "Replication, History, and Grafting in the Ori File System" (PDF).
- "FRAUNHOFER FS (FhGFS) END USER LICENSE AGREEMENT". Fraunhofer Society. 2012-02-22.
- "ObjectiveFS official website".
- "The Infinit Storage Platform".
- "Infinit's Open Source Projects".
- "IBM Plans to Acquire Cleversafe for Object Storage in Cloud". www-03.ibm.com. 2015-10-05. Retrieved 2019-05-06.
- Séguin, Cyril; Depardon, Benjamin; Le Mahec, Gaël. "Analysis of Six Distributed File Systems" (PDF). HAL.
- "Data Consistency Models of Public Cloud Storage Services: Amazon S3, Google Cloud Storage and Windows Azure Storage". SysTutorials. Retrieved 19 June 2017.