The most important reason to monitor Ceph is to ensure that the cluster is running in a healthy state. If it is not, whether because of a failed disk or for some other reason, the chance of losing service or data increases. Although Ceph recovers automatically from a wide variety of failure scenarios, it is still essential to know what is going on and when manual intervention is required.
Monitoring isn't just about detecting failures; tracking other metrics, such as used disk space, is just as essential as knowing when a disk has failed. If your Ceph cluster fills up, it will stop accepting I/O requests and will not be able to recover from future OSD failures.
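As a rough illustration of the kind of capacity check a monitoring system might run, the following minimal sketch classifies a usage figure against two thresholds. The function name `capacity_status` and the threshold values of 0.85 and 0.95 are assumptions made for this example; check the ratios configured on your own cluster, and note that Ceph evaluates fullness per OSD, so a cluster-wide average can look healthy while an individual OSD is close to full. The byte counts are passed in directly, so the sketch makes no assumption about how they are collected.

```python
# Illustrative thresholds only; these are assumptions for the example,
# not values read from a running cluster.
NEARFULL_RATIO = 0.85
FULL_RATIO = 0.95


def capacity_status(used_bytes, total_bytes,
                    nearfull=NEARFULL_RATIO, full=FULL_RATIO):
    """Classify usage as 'ok', 'nearfull', or 'full'.

    used_bytes and total_bytes can describe a whole cluster or a
    single OSD; the caller decides what to measure.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    if ratio >= full:
        return "full"
    if ratio >= nearfull:
        return "nearfull"
    return "ok"


if __name__ == "__main__":
    tib = 1024 ** 4
    # Hypothetical numbers: 90 TiB used out of 100 TiB raw capacity.
    print(capacity_status(90 * tib, 100 * tib))
```

Alerting at the lower threshold rather than the higher one leaves time to add capacity or remove data before the cluster stops accepting I/O.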
Finally, monitoring both the operating systems and Ceph's performance metrics can help you spot performance issues or identify tuning opportunities...