We just simulated the positive cases. Now, let's set up some similar code to simulate the non-diabetes patients (outcome=0).
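Before running this on the real data, it can help to see the core simulation step on a toy example. The sketch below is only illustrative: toy_mu, toy_sigma, and toy_sim are hypothetical names with made-up values, not taken from the PimaIndians data. It assumes the MASS package is installed. With empirical = TRUE, mvrnorm treats mu and Sigma as the empirical mean and covariance, so the simulated sample reproduces them (up to floating-point error) rather than only approximating them.

```r
library(MASS)  # provides mvrnorm()

set.seed(123)

# Hypothetical toy inputs (illustrative values only)
toy_mu <- c(x1 = 10, x2 = 50)
toy_sigma <- matrix(c(4, 3,
                      3, 9), ncol = 2)

# Draw 100 rows; empirical = TRUE forces the sample moments to match the inputs
toy_sim <- mvrnorm(n = 100, mu = toy_mu, Sigma = toy_sigma, empirical = TRUE)

# Compare the sample moments with the inputs
colMeans(toy_sim)
cov(toy_sim)
```

Setting empirical = FALSE instead would give a sample whose mean and covariance only approximate the inputs, which is the usual behavior of random draws.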
For the negative cases, we also multiply sample.bin by -1. That way, every positive sample.bin value corresponds to a positive case and every negative sample.bin value corresponds to a negative case:
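set.seed(123)
# Number of bins for the negative cases (n2 is assumed to be defined earlier)
nbins2 <- base::round(n2/400, 0)
# Correlation and covariance of the eight predictors among the negative cases
correlationMatrix <- cor(PimaIndians[PimaIndians$diabetes == 'neg', 1:8])
covarianceMatrix <- stats::cov(PimaIndians[PimaIndians$diabetes == 'neg', 1:8])
# Simulate n2 negative cases, replicate each row 2000 times, and convert to a SparkR DataFrame
out_sd2 <- as.DataFrame(data.frame(
  # Negative bin IDs mark the negative cases
  sample.bin = base::sample(1:nbins2, n2, replace = TRUE) * (-1),
  outcome = 0,
  MASS::mvrnorm(n2, mu = means.neg, Sigma = matrix(covarianceMatrix, ncol = 8), empirical = TRUE)
)[rep(1:n2, times = 2000), ])
nrow(out_sd2)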
The output...