Kernel density estimation
To explain kernel density estimation (KDE), let us first generate some one-dimensional data and build a histogram. A histogram is a simple, if coarse, way to see the underlying probability distribution of the data.
The following R session generates the data and a histogram:
> data <- rnorm(1000, mean=25, sd=5)
> data.1 <- rnorm(1000, mean=10, sd=2)
> data <- c(data, data.1)
> hist(data)
> hist(data, plot = FALSE)
$breaks
[1]  0  5 10 15 20 25 30 35 40 45

$counts
[1]   8 489 531 130 361 324 134  22   1

$density
[1] 0.0008 0.0489 0.0531 0.0130 0.0361 0.0324 0.0134 0.0022 0.0001

$mids
[1]  2.5  7.5 12.5 17.5 22.5 27.5 32.5 37.5 42.5

$xname
[1] "data"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
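The $density values in the output are the $counts rescaled by the sample size and the bin width, so that the bar areas sum to 1. A minimal sketch of that relationship follows. It rebuilds the same two-component sample with a fixed seed (the seed and the names h, widths, manual_density and kde are assumptions added for illustration, so the counts will differ from the session above), and ends by calling R's built-in density() to get the smooth estimate that KDE produces.

```r
# Sketch only: the seed is an assumption added for reproducibility;
# it is not part of the original session.
set.seed(42)

# Same construction as in the session: two normal samples, combined.
data <- c(rnorm(1000, mean = 25, sd = 5),
          rnorm(1000, mean = 10, sd = 2))

# Compute the histogram without drawing it.
h <- hist(data, plot = FALSE)

# Bin widths are the gaps between consecutive breaks.
widths <- diff(h$breaks)

# Density = count / (sample size * bin width).
manual_density <- h$counts / (length(data) * widths)

# The hand-computed values should agree with what hist() reports,
# and the bar areas should sum to 1.
print(all.equal(manual_density, h$density))
print(sum(h$density * widths))

# Kernel density estimate with the default Gaussian kernel and bandwidth.
kde <- density(data)
print(kde$bw)
```

To compare the two estimates visually, one option is hist(data, freq = FALSE) followed by lines(kde), which draws the smooth curve over the density-scaled bars.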
This code creates two artificial datasets and combines them. Both are drawn from normal distributions: the first has a mean of 25 and a standard deviation of 5; the second has a mean of 10 and a standard deviation of 2. If we recall from basic statistics...