Overview: A first R session
Now that we have R and RStudio installed, we can start our first R session from within RStudio. It is good practice to use an RStudio project for all your data analysis with R, for reasons we will encounter later in this book.
We create an R project using the menu Project | New Project. Choose New Directory and name the project Abalone.
Note
In this session, we download and manipulate the abalone file. This file will be used in examples throughout the book.
Abalones are a very common type of edible sea snail (sometimes called sea ear) occurring in waters around the world. The data in the file used in this book was compiled and published by Warwick J. Nash, Tracy L. Sellers, Simon R. Talbot, Andrew J. Cawthorn, and Wes B. Ford in 1994 [Sea Fisheries Division Technical Report No. 48 (ISSN 1034-3288)]. It was generously donated to the UCI Machine Learning Repository in 1995.
If you are a beginner in R programming, the RStudio menus give you easy access to many R commands. When you click on a menu item, RStudio generates and executes the corresponding R commands in the console window. It is good (and reproducible!) practice to put your R code in script files as much as possible, but for now we will use some menu commands.
RStudio (and R) can import text files from disk as well as over the Internet, as shown in the following example:
Select Workspace | Import Dataset | From Web URL.
Type (or paste) the following URL: http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data.
RStudio downloads the file and shows the Import Dataset dialog:
The top left-hand side shows the name (abalone) of the resulting data.frame. On the bottom left-hand side are the settings for reading the data file that RStudio deduced from the data file. You can alter these; however, in this example they are fine. On the top right-hand side RStudio shows the first 25 lines of the data file. On the bottom right-hand side it shows the first 25 records of the resulting data.frame. Click on the Import button.
RStudio imports the data and creates a data.frame with the name abalone using the R command read.table and the options that you have set in the Import Dataset dialog. It also automatically runs View(abalone), which shows the data we just imported. Notice that the Workspace panel on the right-hand side now contains the variable abalone. Also, notice that the column names of the data are missing, so we need to add them.
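For readers who prefer the console, the import can be approximated by calling read.table directly. The sketch below is an assumption about the kind of call involved, not the exact command RStudio generates. To stay self-contained it parses three made-up rows in the same comma-separated, header-less layout instead of downloading the file; the real call with the URL is shown as a comment.

```r
# Stand-in for the downloaded file: three made-up rows in the same
# comma-separated, header-less, nine-column layout (not real measurements).
sample_lines <- c(
  "M,0.45,0.36,0.10,0.51,0.22,0.10,0.15,15",
  "F,0.53,0.42,0.14,0.68,0.26,0.14,0.21,9",
  "I,0.33,0.26,0.08,0.21,0.09,0.04,0.06,7"
)

# header = FALSE tells read.table that the first line is data, not names.
abalone <- read.table(text = sample_lines, sep = ",", header = FALSE)

# With the real file, the same call would take the URL instead of `text`:
# abalone <- read.table(
#   "http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
#   sep = ",", header = FALSE
# )

# Because the file has no header row, R falls back to default column names.
print(names(abalone))
str(abalone)
```

The default names printed by the last lines are what "missing column names" looks like in practice.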
In the console panel we type the following:
This sets the correct names for the data set and stores the data in your project directory, so you don't have to download it again. This data file is part of your compendium.
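The console commands themselves are not reproduced above. A minimal sketch of what they might look like follows. Only Sex and Length are named in this chapter; the remaining column names are assumptions based on the attribute list in the UCI dataset description, and the file name abalone.csv is likewise an assumption. If no abalone data.frame exists yet, the sketch creates a small made-up stand-in so that it still runs.

```r
# Use the imported data if it is there; otherwise fall back to made-up rows
# in the same nine-column layout (not real measurements).
if (!exists("abalone")) {
  abalone <- read.table(
    text = c(
      "M,0.45,0.36,0.10,0.51,0.22,0.10,0.15,15",
      "F,0.53,0.42,0.14,0.68,0.26,0.14,0.21,9",
      "I,0.33,0.26,0.08,0.21,0.09,0.04,0.06,7"
    ),
    sep = ",", header = FALSE
  )
}

# Assign descriptive column names (assumed, apart from Sex and Length).
names(abalone) <- c(
  "Sex", "Length", "Diameter", "Height",
  "Whole.weight", "Shucked.weight", "Viscera.weight", "Shell.weight",
  "Rings"
)

# Store the data in the working directory, which is the project directory
# when an RStudio project is open. row.names = FALSE avoids an extra column.
write.csv(abalone, file = "abalone.csv", row.names = FALSE)
```

Writing without row names keeps the stored file identical in shape to the data.frame, so reading it back with read.csv gives the same nine columns.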
We will start our first data analysis within RStudio with an R script.
Follow the next few steps in order to start the data analysis:
Create a new R script by navigating to File | New | R script (Ctrl+Shift+N or Command+Shift+N) and type the following:
These commands load the data, calculate the gender frequencies in the data, and plot a box plot of Length by Sex for abalone.
Save your R script as abalone.R using File | Save (Ctrl+S or Command+S).
Execute your R script with Ctrl+Shift+Enter or Command+Shift+Return.
Et voila! We have run a small R script from within RStudio. Notice that the panel on the bottom right-hand side shows the plot that we have created.
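The script itself is not reproduced in the steps above. A minimal sketch of what abalone.R might contain is given here, assuming the data was stored as abalone.csv in the project directory with columns named Sex and Length (the file name is an assumption). If that file is missing, the sketch writes a few made-up rows to a temporary file so that it still runs.

```r
# abalone.R: load the data, count records per Sex, plot Length by Sex.

csv_file <- "abalone.csv"
if (!file.exists(csv_file)) {
  # Fallback with made-up values, only so the sketch runs on its own.
  csv_file <- tempfile(fileext = ".csv")
  writeLines(
    c("Sex,Length", "M,0.45", "F,0.53", "I,0.33", "M,0.50", "F,0.61"),
    csv_file
  )
}

# Load the data stored earlier.
abalone <- read.csv(csv_file)

# Frequency of each value in the Sex column.
sex_counts <- table(abalone$Sex)
print(sex_counts)

# Box plot of Length, one box per value of Sex.
boxplot(Length ~ Sex, data = abalone, xlab = "Sex", ylab = "Length")
```

The formula Length ~ Sex reads as "Length grouped by Sex", which is why a single call produces one box per category.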
But we can do better than that. If you did not follow the previous instructions to install knitr, now is the time to do it after all. You may also install it by typing install.packages("knitr") in the console.
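If you are unsure whether knitr is already installed, a small helper can check first and install only when needed. This is a sketch; ensure_package is a hypothetical helper name, not part of R or RStudio.

```r
# Hypothetical helper: install a package only if it is not available yet.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can be loaded now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Only attempt the installation in an interactive session such as the
# RStudio console, where a CRAN mirror can be chosen if necessary.
if (interactive()) {
  ensure_package("knitr")
}
```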
Choose File | Compile Notebook.
Close the Abalone project with Project | Close Project. Choose Save.
We now have a new, empty RStudio session.
Open your newly created Abalone project by navigating to Project | Recent Projects | Abalone.
Your environment is restored, including all the commands that you typed, thanks to R and RStudio.
Besides the standard keyboard shortcuts that you probably use every day (cut, copy, paste, and undo), RStudio supports many keyboard shortcuts specifically for R code editing, execution, and more. Although you are unlikely to learn or use all of them, it is useful to get used to at least a few. We will highlight a few of the most useful keyboard shortcuts in every chapter.
If you run into trouble with RStudio, there are several ways to get help online.
The developers of RStudio have proven to be very responsive on the help forum at http://support.rstudio.org/. There are many people using R and RStudio, so chances are that someone has already posted the same question somewhere and had it answered. So, before posting a question, do the following:
Take a look at the troubleshooting guide on RStudio's support page.
Search the FAQs and the forum to see whether your question has been answered before.
Google your question. It may have been answered on another Q&A forum, such as Stack Exchange.
When you post a question, it helps a lot to include a small example that reproduces your problem. Also, you may want to attach the output of R's sessionInfo() command to show in what context the problem occurred. Finally, it can be helpful if you attach RStudio's log file. You can find the folder where it is stored by opening Help | Diagnostics | Show log files. If RStudio fails to start, you can find it in the following folder:
What if I uninstall RStudio?
Although you may find it hard to believe, this is no problem at all. Each RStudio project is just a folder, containing your scripts, reports, and data in their original form. Additionally, there is an .Rproj file that holds some session information for RStudio and possibly an .RData file. So even if you wish to uninstall RStudio, your work is as accessible as before. You can still re-open your last-closed R session by starting the default Rgui and opening the .RData file in that folder. Scripts are stored as simple text files.
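As a small illustration of why the workspace file keeps your work accessible without RStudio, the sketch below saves an object to an .RData file and loads it back using plain R functions only. The temporary folder and the object name are hypothetical stand-ins for a real project folder and its contents.

```r
# A temporary folder stands in for a project folder (hypothetical).
project_dir <- tempfile("abalone-project-")
dir.create(project_dir)
rdata_file <- file.path(project_dir, ".RData")

# A made-up object representing work done in the session.
abalone_note <- "kept across sessions"

# save() stores the named objects; save.image() would store the whole workspace.
save(abalone_note, file = rdata_file)

# Simulate a fresh session by removing the object, then restore it.
rm(abalone_note)
load(rdata_file)
print(abalone_note)
```

Because load() is part of base R, the same file can be opened from Rgui or from R started in a terminal.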
It is important to note that RStudio does not alter the storage format of your data in any way. In contrast, many proprietary products force you to import your data and store it in some binary format that cannot be opened with other products.
The paper Statistical Analyses and Reproducible Research by Robert Gentleman and Duncan Temple Lang offers a thorough description of methods for reproducible research. It can be downloaded for free from http://biostats.bepress.com/bioconductor/paper2/.
There are many books for learning about R, a lot of which are dedicated to specific subjects. Two recent books that discuss R in general and have quickly gained popularity are R in a Nutshell by Joseph Adler, 2010, O'Reilly, and The Art of R Programming by Norman Matloff, 2011, No Starch Press, Inc. The former discusses R as a language as well as many statistical features, while the latter thoroughly discusses R as a programming language. Two books focusing on general statistics with R are worth mentioning here as well. The first is Introductory Statistics with R (2nd ed. 2008, Springer) by Peter Dalgaard. The second is Introductory Probability and Statistics Using R by G. Jay Kerns. The latter book is developed as an open source project and can be downloaded from http://ipsur.org/.
To keep up to date with what happens in the R community, we highly recommend frequent visits to Tal Galili's r-bloggers.com. This website collects a large number of R-related blogs in a convenient newspaper-like layout. Subscribing with an RSS reader for smartphone or PC is also possible.
In this chapter we emphasized the importance of making your analyses reproducible and introduced the concepts of reproducible research and the compendium. We also showed how to install R and RStudio in several environments. RStudio supports the concept of a compendium through projects, and if you followed the first session carefully, you have learned to read, alter, and store a simple CSV file, perform some simple analyses, make a simple plot, and automatically generate an HTML report that you can share with your coworkers.
In the next chapter we will take a deeper dive into writing scripts with RStudio.