Chapter 1, Planning for Ceph, covers the fundamentals of how Ceph works, its architecture, and some good use cases. It also discusses the steps that one should take to plan before implementing Ceph, including setting design goals, building a proof of concept, and designing the infrastructure.
Chapter 2, Deploying Ceph, is a no-nonsense, step-by-step instructional chapter on how to set up a Ceph cluster. It covers the ceph-deploy tool for testing and then goes on to cover Ansible. A section on change management is also included, explaining why it is essential for the stability of large Ceph clusters. This chapter also serves the purpose of providing you with a common platform you can use for examples later in the book.
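As a taste of what the chapter walks through, a minimal ceph-deploy test deployment looks roughly like the following sketch. The hostnames and the device path are placeholders, and the exact syntax varies between ceph-deploy releases:

    ceph-deploy new mon1                          # generate an initial cluster configuration
    ceph-deploy install mon1 osd1 osd2 osd3       # install the Ceph packages on the nodes
    ceph-deploy mon create-initial                # bootstrap the monitor(s)
    ceph-deploy osd create --data /dev/sdb osd1   # create an OSD on a spare disk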
Chapter 3, BlueStore, explains why Ceph has to be able to provide atomic operations around data and metadata, and how filestore was built to provide these guarantees on top of standard filesystems, along with the problems of that approach. The chapter then introduces BlueStore, explaining how it works and the problems it solves. This includes its components and how they interact with different types of storage devices. We will also have an overview of key-value stores, including RocksDB, which is used by BlueStore. Some of the BlueStore settings and how they interact with different hardware configurations will also be discussed.
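For orientation, creating a BlueStore OSD is typically done with the ceph-volume tool. The device paths below are examples only, and in practice you would usually point --block.db at a partition or logical volume on the faster device rather than the whole disk:

    # Simple case: data, RocksDB metadata, and WAL all on one device
    ceph-volume lvm create --bluestore --data /dev/sdb

    # Place the RocksDB metadata on a faster (for example, NVMe) device
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1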
Chapter 4, Erasure Coding for Better Storage Efficiency, covers how erasure coding works and how it's implemented in Ceph, including explanations of RADOS pool parameters and erasure coding profiles. A reference to the changes in the Kraken release will highlight the introduction of partial-overwrite support on erasure-coded pools (previously only appends were possible), which allows RBDs to function directly on erasure-coded pools. Performance considerations will also be explained. This will include references to BlueStore, as it is required for sufficient performance.
Finally, we have step-by-step instructions on actually setting up erasure coding on a pool, which sysadmins can use as a practical reference.
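The core of that setup, sketched here with example profile and pool names, is creating an erasure-code profile, creating a pool that uses it, and (on a BlueStore-backed cluster) enabling overwrites so that RBD can use the pool:

    ceph osd erasure-code-profile set example-profile k=4 m=2   # 4 data + 2 coding chunks
    ceph osd pool create ecpool 64 64 erasure example-profile   # pool with 64 placement groups
    ceph osd pool set ecpool allow_ec_overwrites true           # required for RBD on EC pools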
Chapter 5, Developing with Librados, explains how Librados is used to build applications that can interact directly with a Ceph cluster. It then moves on to several examples of using Librados in different languages to give you an idea of how it can be used, including for atomic transactions.
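As a preview, here is a minimal sketch in Python, one of the languages the chapter uses. The pool name 'rbd' and the object name are examples, and it assumes a standard ceph.conf and keyring are present on the client:

    import rados

    # Connect to the cluster using the default configuration file
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on an existing pool and perform an atomic whole-object write
    ioctx = cluster.open_ioctx('rbd')
    ioctx.write_full('hello_object', b'hello world')
    print(ioctx.read('hello_object'))

    ioctx.close()
    cluster.shutdown()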
Chapter 6, Distributed Computation with Ceph RADOS Classes, discusses the benefits of moving processing directly into the OSD to effectively perform distributed computing. It then covers how to get started with RADOS classes by building simple ones with Lua, and then how to build your own C++ RADOS class into the Ceph source tree and benchmark processing performed on the OSD against the same processing performed on the client.
Chapter 7, Monitoring Ceph, starts with a description of why monitoring is important and discusses the difference between alerting and monitoring. The chapter will then cover how to obtain performance counters from all the Ceph components and explain what some of the key counters mean and how to convert them into usable values.
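As an example of where those counters come from, each Ceph daemon exposes them through its admin socket; the commands below are run on the host where the daemon lives:

    ceph daemon osd.0 perf dump     # dump the current counter values as JSON
    ceph daemon osd.0 perf schema   # describe what each counter represents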
Chapter 8, Tiering with Ceph, explains how RADOS tiering works in Ceph, where it should be used, and its pitfalls. It takes you step-by-step through configuring tiering on a Ceph cluster and finally covers the tuning options to extract the best performance for tiering. An example using Graphite will demonstrate the value of being able to manipulate captured data to provide more meaningful output in graph form.
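The basic wiring of a cache tier, using example pool names, looks like the following sketch; the chapter covers the tuning options, such as target sizes and flush thresholds, in depth:

    ceph osd tier add base-pool cache-pool              # attach the cache pool to the base pool
    ceph osd tier cache-mode cache-pool writeback       # cache both reads and writes
    ceph osd tier set-overlay base-pool cache-pool      # route client I/O through the cache
    ceph osd pool set cache-pool hit_set_type bloom     # required so Ceph can track object hits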
Chapter 9, Tuning Ceph, starts with a brief overview of how to tune Ceph and the operating system, including the basic principle of not trying to tune something that is not a bottleneck. It also covers the areas you may wish to tune and how to gauge the success of tuning attempts. Finally, it shows you how to benchmark Ceph and take baseline measurements so that any results achieved are meaningful. We'll discuss different tools and how benchmark results might relate to real-life performance.
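For instance, a simple throughput baseline can be taken with the rados bench tool against a test pool (the pool name here is an example):

    rados bench -p testpool 60 write --no-cleanup   # 60 seconds of 4 MB object writes
    rados bench -p testpool 60 seq                  # sequential reads of the objects just written
    rados -p testpool cleanup                       # remove the benchmark objects afterwards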
Chapter 10, Troubleshooting, explains that although Ceph is largely autonomous in taking care of itself and recovering from failure scenarios, in some cases human intervention is required. We'll look at common errors and failure scenarios and how to troubleshoot them to bring Ceph back to full health.
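The usual starting point for any of those scenarios is Ceph's own status output, which the chapter refers back to throughout:

    ceph status          # overall cluster state, including degraded or misplaced objects
    ceph health detail   # a per-problem breakdown of anything that isn't HEALTH_OK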
Chapter 11, Disaster Recovery, covers situations where Ceph is in such a state that there is a complete loss of service or data loss has occurred. In these scenarios, less familiar recovery techniques are required to restore access to the cluster and, hopefully, recover data. This chapter arms you with the knowledge to attempt recovery.