In bioinformatics, the statistical analysis of datasets of varied size and composition is a frequent task. R is a powerful statistical language with abundant options for all sorts of tasks. In this chapter, we will focus on some useful but less often discussed methods that, while not complete analyses in themselves, can be powerful additions to the analyses you likely do quite often. We'll look at recipes for simulating datasets and machine learning methods for class prediction and dimensionality reduction.
The following recipes will be covered in this chapter:
- Correcting p-values to account for multiple hypotheses
- Generating a simulated dataset to represent a background
- Learning groupings within data and classifying with kNN
- Predicting classes with random forests...