Getting the data
The first step in our data analysis pipeline is to get the dataset. We have already cleaned the data and given the attributes meaningful names; you can check this by opening the german_credit_dataset.csv file. You can also get the original dataset from its source, the Department of Statistics, University of Munich, at the following URL: http://www.statistik.lmu.de/service/datenarchiv/kredit/kredit_e.html.
Download the data, start R in the directory that contains the data file, and run the following commands to get a feel for the data we will be dealing with in the following sections:
> # load the data into a data frame
> credit.df <- read.csv("german_credit_dataset.csv", header = TRUE, sep = ",")
> # class should be data.frame
> class(credit.df)
[1] "data.frame"
>
> # get a quick peek at the data
> head(credit.df)
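If read.csv stops with an error, the usual cause is that R was not started in the directory that holds the data file. The following is a minimal sketch of a more defensive load, written as a script rather than console input. The helper name load_csv_checked is hypothetical and is not part of the original code; it only wraps the same read.csv call with a check on the file path, then reports the dimensions and column types when the file is present.

```r
# Sketch only: load_csv_checked is a hypothetical helper, not from the original text.
# It wraps the same read.csv call and fails early with a readable message.
load_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  df <- read.csv(path, header = TRUE, sep = ",")
  stopifnot(is.data.frame(df))
  df
}

data_file <- "german_credit_dataset.csv"

# Only inspect the data if the file is actually in the working directory.
if (file.exists(data_file)) {
  credit.df <- load_csv_checked(data_file)
  print(dim(credit.df))  # number of rows and number of columns
  str(credit.df)         # column names, types, and the first few values
}
```

Running str alongside head is a design choice rather than a requirement: head shows the first rows, while str lists every column with its type, which makes it easier to spot attributes that were read in with an unexpected type.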
The following figure shows the first six rows of the data. Each column indicates...