By default, Ceph will warn us when OSD utilization approaches 85% (the nearfull threshold), and it will stop accepting write I/O to the OSD when it reaches 95% (the full threshold). If, for some reason, an OSD completely fills up to 100%, it is likely to crash and will refuse to come back online. An OSD that is above the 85% warning level will also refuse to participate in backfilling, so the recovery of the cluster may be impacted while OSDs are in a near-full state.
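If you want to confirm which thresholds your cluster is actually running with, the ratios are stored in the OSD map. The following is a minimal check, assuming a reasonably recent release (Luminous or later) where the set-*-ratio commands are available; output formatting varies between versions:

```
# Show the nearfull, backfillfull, and full ratios currently set in the OSD map
ceph osd dump | grep ratio

# On Luminous and later, the thresholds can be adjusted cluster-wide;
# for example, to set the default values discussed above:
ceph osd set-nearfull-ratio 0.85
ceph osd set-full-ratio 0.95
```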
Before covering the troubleshooting steps around full OSDs, it is highly recommended that you monitor the capacity utilization of your OSDs, as described in Chapter 8, Monitoring Ceph. This will give you advance warning as OSDs approach the nearfull warning threshold.
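Alongside the monitoring described in Chapter 8, a quick manual check of per-OSD utilization can be done from any admin node. These are standard Ceph CLI commands; the exact columns shown will depend on your release:

```
# Per-OSD utilization, weight, and PG count, laid out by CRUSH hierarchy
ceph osd df tree

# Overall raw and per-pool usage
ceph df

# If the cluster is already warning, this lists the specific nearfull/full OSDs
ceph health detail
```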
If you find yourself in a situation where your cluster is above the near-full warning state, you have two options:
- Add some more OSDs
- Delete some data
However, in the real world, both...