Predicting chemical biodegradation
In this section, we are going to use R's e1071 package to try out the models we've discussed on a real-world dataset. As our first example, we have chosen the QSAR biodegradation dataset, which can be found at https://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation. The dataset contains 41 numerical variables that describe the molecular composition and properties of 1,055 chemicals. The modeling task is to predict whether a particular chemical will be biodegradable based on these properties. Example properties are the percentages of carbon, nitrogen, and oxygen atoms, as well as the number of heavy atoms in the molecule. These features are highly specialized and numerous, so we won't list them all here; the complete list and further details of the quantities involved can be found on the website. For now, we've downloaded the data into a bdf data frame:
> bdf <- read.table("biodeg.csv", sep = ";", quote = "\"")
> head(bdf...
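Before fitting any models, the raw data frame needs a little preparation. The following is a minimal sketch of one way to do it, resting on assumptions that the text above does not state. It assumes biodeg.csv has no header row, so read.table() names the columns V1 to V42. It also assumes the first 41 columns are the molecular descriptors and the 42nd holds the class label, coded as RB (ready biodegradable) or NRB (not ready biodegradable). The helper prepare_biodeg() is hypothetical, and the 80/20 train/test split is an arbitrary choice.

```r
# Hypothetical helper: prepare the QSAR biodegradation data for classification.
# Assumptions (not from the original text): 42 columns, with columns 1-41 holding
# the descriptors and column 42 holding the label "RB" or "NRB".
prepare_biodeg <- function(df, train_fraction = 0.8, seed = 1) {
  stopifnot(ncol(df) == 42)

  # Give the label column a readable name and make it a factor, so that
  # e1071::svm() treats the problem as classification rather than regression.
  names(df)[42] <- "biodegradable"
  df$biodegradable <- factor(df$biodegradable, levels = c("NRB", "RB"))
  # Any label other than "RB"/"NRB" would have become NA; fail loudly if so.
  stopifnot(!anyNA(df$biodegradable))

  # Hold out a test set so performance is measured on unseen chemicals.
  # Note: this sets the global random seed for reproducibility.
  set.seed(seed)
  n <- nrow(df)
  train_idx <- sample(n, size = floor(train_fraction * n))
  list(train = df[train_idx, ], test = df[-train_idx, ])
}

# Example usage (requires biodeg.csv and the e1071 package):
# bdf <- read.table("biodeg.csv", sep = ";", quote = "\"")
# parts <- prepare_biodeg(bdf)
# library(e1071)
# model <- svm(biodegradable ~ ., data = parts$train, kernel = "radial")
# mean(predict(model, parts$test) == parts$test$biodegradable)
```

Converting the label to a factor is the key step: e1071's svm() picks its default type from the response, using C-classification for a factor and eps-regression otherwise.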