Ceph is largely autonomous in taking care of itself and recovering from failure scenarios, but in some cases human intervention is required. This chapter looks at common errors and failure scenarios and shows how to troubleshoot them and bring Ceph back to working order.
In this chapter, we will cover the following topics:
- How to correctly repair inconsistent objects
- How to solve problems with peering
- How to deal with near_full and too_full OSDs
- How to investigate errors using Ceph logging
- How to investigate poor performance
- How to investigate PGs in a down state