CRUSH
Ceph is a highly distributed data storage system designed not just to store massive amounts of data, but also to guarantee reliability, scalability, and performance. Instead of constraining I/O to fixed-size blocks as traditional file systems do, it provides a simple interface for reading and writing data in variable-sized units called objects. These objects are then replicated and distributed throughout the cluster to provide effective fault tolerance and parallel access.
Distributing objects arbitrarily throughout the cluster improves aggregate write performance but complicates read operations. How should we allocate objects in the cluster? How do we keep the distribution uniform as the cluster topology changes? What should we do when drives die or hosts fail? If we relocate data, how will clients learn the new location? In the past, these issues have been addressed by storing an allocation list of all objects that is updated when objects are...