Reliability is a measure of the confidence that a system will operate without failure: the higher the probability of failure, the lower the reliability.
Reliability is measured using several metrics:
- Mean time between failures (MTBF): Total uptime divided by the number of failures
- Mean time to repair (MTTR): The average time it takes the team to fix a failure and bring the system back online
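
The two metrics above can be computed directly from operational records. The following is a minimal sketch, assuming uptime and repair durations are tracked in hours; the function names `mtbf` and `mttr` and the sample figures are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mtbf(total_uptime_hours: float, failure_count: int) -> float:
    """Mean time between failures: total uptime divided by the number of failures."""
    if failure_count <= 0:
        # With no recorded failures the ratio is undefined.
        raise ValueError("failure_count must be a positive integer")
    return total_uptime_hours / failure_count


def mttr(repair_durations_hours: Sequence[float]) -> float:
    """Mean time to repair: average time taken to fix a failure and restore service."""
    if not repair_durations_hours:
        raise ValueError("at least one repair duration is required")
    return sum(repair_durations_hours) / len(repair_durations_hours)


# Illustrative (made-up) figures: 720 hours of uptime with 3 failures,
# which took 1, 2, and 3 hours to repair respectively.
print(mtbf(720.0, 3))           # 240.0
print(mttr([1.0, 2.0, 3.0]))    # 2.0
```

Both functions raise an error on empty input rather than returning zero or infinity, since a system with no recorded failures has no meaningful MTBF or MTTR for the period observed.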