Survival Analysis: Hard Drive Reliability Sample
Abstract
Survival analysis is the branch of statistics that studies the expected duration of time until an event occurs. The number of events can be one or more. This project reviews the nonparametric Kaplan-Meier and Nelson-Aalen estimators and the semiparametric Cox proportional hazards model. These techniques are applied to the hard drive data sets published by Backblaze. This application of survival analysis is called failure-time analysis. The goal is to estimate the survival probabilities of the hard disks using the data collected by Backblaze in 2019. From the raw data, we create the new variables that the survival models require. The main package used for this exercise is survival. Because of the large number of files, the project also uses the data.table package.
Summary
In total, 131 448 hard disks were observed during 2019, of which 2 211 failed.
The observations are left truncated and right censored. The distribution of age and study time by failure status is:
The density of failures increases up to the first year of operation. There are also few observations near the end of the expected life of the hard drives.
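To make "left truncated and right censored" concrete, the sketch below collapses daily observations into one record per drive: the age at which the drive entered the study, the age at which it was last seen, and whether it failed. It is a minimal illustration in Python, not the project's R code. The row layout and the helper name `to_intervals` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical daily rows: (serial_number, age_in_days, failure_flag).
# This layout is an assumption for illustration, not the project's schema.
rows = [
    ("A", 400, 0), ("A", 401, 0), ("A", 402, 1),  # fails while under observation
    ("B", 10, 0), ("B", 11, 0), ("B", 12, 0),     # still running when the study ends
]

def to_intervals(rows):
    """Collapse daily rows into one (entry_age, exit_age, event) tuple per drive."""
    by_drive = defaultdict(list)
    for serial, age, failed in rows:
        by_drive[serial].append((age, failed))
    intervals = {}
    for serial, obs in by_drive.items():
        ages = [age for age, _ in obs]
        event = max(failed for _, failed in obs)  # 1 if the drive ever failed
        # entry_age > 0 means the drive was already in service when first seen
        # (left truncation); event == 0 means it outlived the study window
        # (right censoring).
        intervals[serial] = (min(ages), max(ages), event)
    return intervals

intervals = to_intervals(rows)
```

Drive A enters the study at age 400 and fails at age 402. Drive B enters at age 10 and is censored at age 12.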
We applied two techniques to estimate the survival probabilities:
- Kaplan-Meier
- Nelson-Aalen
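The two estimators can be sketched in a few lines. The Python function below is a minimal illustration of both, with delayed entry to account for left truncation. It is not the project's implementation, which relies on the R survival package. The function name `km_and_na` and the sample intervals are assumptions for this example, and every failed drive is assumed to have entry_age < exit_age.

```python
import math

def km_and_na(intervals):
    """Return [(age, km_survival, na_cumulative_hazard)] at each failure age.

    intervals: list of (entry_age, exit_age, event), with event 1 = failure.
    A drive is at risk at age t when entry_age < t <= exit_age (delayed entry).
    """
    failure_ages = sorted({exit_ for _, exit_, event in intervals if event == 1})
    surv, cumhaz, out = 1.0, 0.0, []
    for t in failure_ages:
        at_risk = sum(1 for entry, exit_, _ in intervals if entry < t <= exit_)
        deaths = sum(1 for _, exit_, event in intervals if exit_ == t and event == 1)
        surv *= 1.0 - deaths / at_risk   # Kaplan-Meier product-limit step
        cumhaz += deaths / at_risk       # Nelson-Aalen cumulative hazard step
        out.append((t, surv, cumhaz))
    return out

# Hypothetical (entry_age, exit_age, event) records, not Backblaze data.
sample = [(0, 5, 1), (0, 8, 0), (3, 8, 1), (6, 10, 0)]
result = km_and_na(sample)
for age, km_surv, na_hazard in result:
    # exp(-cumulative hazard) is the survival estimate derived from Nelson-Aalen
    print(age, km_surv, math.exp(-na_hazard))
```

When few drives fail relative to the number at risk at each age, 1 - d/n is close to exp(-d/n), so the two survival curves nearly coincide.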
Both methods give similar values, as the following figure shows:
As a result, the survival probabilities are:
Therefore, we can conclude that a hard disk has more than a 90% probability of reaching its estimated useful life of 1825 days (five years).
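Reading a survival probability at a given age, such as 1825 days, amounts to evaluating a step function. The Python sketch below shows one way to do that lookup. The helper `survival_at` and the curve values are hypothetical and are not the project's estimates.

```python
from bisect import bisect_right

def survival_at(curve, t):
    """Evaluate a right-continuous step survival curve at time t.

    curve: list of (time, survival) pairs sorted by time, such as the output
    of a Kaplan-Meier fit. Before the first event time the survival is 1.0.
    """
    times = [time for time, _ in curve]
    i = bisect_right(times, t)  # number of event times that are <= t
    return 1.0 if i == 0 else curve[i - 1][1]

# Hypothetical curve for illustration only, NOT the estimated probabilities.
curve = [(100, 0.99), (800, 0.97), (1800, 0.93), (2000, 0.91)]
p_useful_life = survival_at(curve, 1825)
```

The curve stays at the value set by the last failure age at or before the requested age, so the lookup at 1825 days returns the step that began at 1800 days.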
In addition, using the simPH package, we can simulate the relative hazard from a Cox regression, comparing each age with the median hard disk age of 497 days.
As a result, hard disks are more likely to fail during their first 497 days. The simulated relative hazards for ages below the median are greater than one. This means that these younger hard disks are more likely to fail at a given point in time than hard disks that have worked for 497 days.
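The relative hazard against a reference age can be illustrated with a short calculation. The Python sketch below assumes a Cox model with a single linear age term and one fixed coefficient, whereas simPH simulates many coefficient draws. The value of `beta` is hypothetical and was not estimated from the Backblaze data.

```python
import math

MEDIAN_AGE_DAYS = 497  # median age reported in the text

def relative_hazard(age_days, beta, reference=MEDIAN_AGE_DAYS):
    """Hazard at age_days relative to the reference age.

    Assumes a Cox model with one linear age term, so the ratio is
    exp(beta * (age_days - reference)).
    """
    return math.exp(beta * (age_days - reference))

beta = -0.001  # hypothetical coefficient, NOT fitted to the Backblaze data

younger = relative_hazard(100, beta)     # below the median age
at_median = relative_hazard(497, beta)   # equal to the reference
older = relative_hazard(1000, beta)      # above the median age
```

With a negative coefficient, ages below the reference give a ratio above one and ages above it give a ratio below one, which matches the pattern described in the text.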