R has many data types and data structures. The following topics summarize the main ones that you will use when building Shiny applications.
Dataframes, lists, arrays, and matrices
Dataframes have several important features, which make them useful for data analysis:
- They are rectangular: typically, cases (for example, the days in one month) run down the rows and variables (page views, unique visitors, or referrers) run along the columns.
- A mix of data types is supported. A typical dataframe might include variables containing dates, numbers (integers or floats), and text.
- R provides extensive built-in subsetting and extraction functionality for selecting rows and variables within a dataframe.
- Many functions take a data argument, which makes it simple to pass a dataframe into a function and work with only the relevant variables and cases. The result is cleaner and simpler code.
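As a minimal sketch of these features, the following builds a small analytics dataframe and passes it to lm() through its data argument, so the model formula can refer to columns by name. The column names Date, pageViews, uniqueVisitors, and bounceRate, and all of the values, are hypothetical and invented for illustration; they are not the book's analyticsData.

```r
# Hypothetical example data: ten days of made-up web analytics figures
analyticsData <- data.frame(
  Date = seq(as.Date("2024-01-01"), by = "day", length.out = 10),
  pageViews = c(120L, 135L, 98L, 150L, 162L, 140L, 110L, 125L, 170L, 155L),
  uniqueVisitors = c(80L, 90L, 70L, 101L, 110L, 95L, 75L, 85L, 118L, 104L),
  bounceRate = c(0.45, 0.41, 0.52, 0.38, 0.36, 0.40, 0.49, 0.44, 0.33, 0.37)
)

# Rectangular: one row per day (case), one column per variable
dim(analyticsData)

# The data argument lets the formula name columns directly,
# instead of writing analyticsData$pageViews and so on
model <- lm(pageViews ~ uniqueVisitors, data = analyticsData)
coef(model)
```

The same data argument pattern appears in many modeling and plotting functions, such as aggregate() and xtabs().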
We can inspect the first few rows of the dataframe using the head(analyticsData) command. The following screenshot shows the output of this command:
As you can see, there are four variables within the dataframe: one contains dates, two contain integers, and one contains numeric (decimal) values. Variable types in R are covered in more detail in the following paragraphs.
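The same check can be made in code rather than by eye. This sketch rebuilds a hypothetical analyticsData with one date column, two integer columns, and one numeric column (the column names and values are assumptions, not the book's data), then uses head(), sapply() with class(), and str() to confirm each variable's type.

```r
# Hypothetical data with the same mix of types described above
analyticsData <- data.frame(
  Date = seq(as.Date("2024-01-01"), by = "day", length.out = 10),
  pageViews = c(120L, 135L, 98L, 150L, 162L, 140L, 110L, 125L, 170L, 155L),
  uniqueVisitors = c(80L, 90L, 70L, 101L, 110L, 95L, 75L, 85L, 118L, 104L),
  bounceRate = c(0.45, 0.41, 0.52, 0.38, 0.36, 0.40, 0.49, 0.44, 0.33, 0.37)
)

# head() shows the first six rows by default
head(analyticsData)

# Class of each column, returned as a named character vector
columnClasses <- sapply(analyticsData, class)
print(columnClasses)

# str() gives a compact summary of the types and first few values
str(analyticsData)
```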
Variables can be extracted from dataframes very simply using the $ operator, as in analyticsData$pageViews.
Variables can also be extracted using [], as in analyticsData[, "pageViews"].
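A short sketch comparing the two forms follows. It uses a small hypothetical dataframe (the values and the uniqueVisitors column are invented) and shows that $, [] with a column name, and [] with a column position all return the same vector. The drop = FALSE argument, a standard option of R's [ operator, is included to show how to keep a one-column dataframe instead of a vector.

```r
# Hypothetical data frame with made-up values
analyticsData <- data.frame(
  Date = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  pageViews = c(120L, 135L, 98L, 150L, 162L),
  uniqueVisitors = c(80L, 90L, 70L, 101L, 110L)
)

# Extract a single variable with the $ operator
viaDollar <- analyticsData$pageViews

# The same variable with []; the empty slot before the comma means "all rows"
viaBrackets <- analyticsData[, "pageViews"]

# Columns can also be given by position; pageViews is column 2 here
viaPosition <- analyticsData[, 2]

# All three give the same integer vector
identical(viaDollar, viaBrackets)
identical(viaDollar, viaPosition)

# drop = FALSE keeps the result as a one-column data frame
oneColumn <- analyticsData[, "pageViews", drop = FALSE]
```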
Note the comma with nothing before it, which indicates that all rows are required. In general, dataframes can be accessed using dataObject[x, y], with x being the number(s) or name(s) of the rows required and y being the number(s) or name(s) of the columns required. For example, the first 10 rows of the pageViews column can be returned with analyticsData[1:10, "pageViews"].
Leaving the position before the comma blank returns all rows, and leaving the position after the comma blank returns all variables. For example, analyticsData[1:3, ] returns the first three rows of all variables.
The following screenshot shows the output of this command:
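The general dataObject[x, y] pattern can be sketched with a hypothetical 12-day dataframe (all values invented). It selects rows by number and columns by name, leaves one position blank to keep everything in that dimension, and combines several rows with several columns.

```r
# Hypothetical data: twelve days of made-up figures
analyticsData <- data.frame(
  Date = seq(as.Date("2024-01-01"), by = "day", length.out = 12),
  pageViews = c(120L, 135L, 98L, 150L, 162L, 140L, 110L, 125L, 170L, 155L, 101L, 143L),
  uniqueVisitors = c(80L, 90L, 70L, 101L, 110L, 95L, 75L, 85L, 118L, 104L, 68L, 99L)
)

# Rows 1 to 10 of the pageViews column (a single column drops to a vector)
first10 <- analyticsData[1:10, "pageViews"]

# Rows 1 to 3 of every variable (blank after the comma)
first3 <- analyticsData[1:3, ]
print(first3)

# Rows 1 and 5, columns chosen by name
someRowsAndCols <- analyticsData[c(1, 5), c("Date", "pageViews")]
```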
Dataframes are a special type of list. Lists can hold many different types of data, including other lists. As with many data types in R, their elements can be named, which helps to make code easy to understand. Let's make a list of the options for dinner, with drink quantities expressed in milliliters.
In the following example, note also the use of the c() function, which combines its comma-separated arguments into a vector (or into a list, if any of the arguments is a list). R picks an appropriate class for the result: character for vectors that contain strings, numeric for those that contain only numbers, logical for Boolean values, and so on:
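A sketch of both ideas follows. The object name dinnerList, its element names, and the food and drink values are hypothetical, since the original example's exact contents are not reproduced here; drink quantities are in milliliters, as in the text.

```r
# c() combines values; R picks a class that can hold all of them
class(c("Potatoes", "Carrots"))   # "character"
class(c(250, 330))                # "numeric"
class(c(TRUE, FALSE))             # "logical"
class(c(1, "a"))                  # mixed types are coerced: "character"
class(c(list(a = 1), b = 2))      # a list argument gives a list: "list"

# A named list whose elements have different classes (hypothetical dinner options)
dinnerList <- list(
  vegetables = c("Potatoes", "Carrots", "Peas"),
  main = "Roast chicken",
  drinksMl = c(wine = 175, water = 500)
)
str(dinnerList)
```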
Note
Code is indented throughout for readability only; the console will not indent code for you as you type it in.
Indexing lists is similar to indexing dataframes (which are, after all, just a special kind of list). Lists can be indexed by number, as shown in the following command:
This returns a list. To return the element itself, with its own class, use [[]]:
In this case a numeric vector is returned. Lists can also be indexed by name:
Note that this also returns a list.
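The difference between single and double brackets can be sketched with a hypothetical dinnerList (names and values invented). Single brackets, by number or by name, return a shorter list; double brackets return the element itself.

```r
# Hypothetical list; the third element is a named numeric vector (milliliters)
dinnerList <- list(
  vegetables = c("Potatoes", "Carrots", "Peas"),
  main = "Roast chicken",
  drinksMl = c(wine = 175, water = 500)
)

# Single brackets by number: a list containing the third element
byNumber <- dinnerList[3]
class(byNumber)        # "list"

# Double brackets by number: the element itself
byNumberInner <- dinnerList[[3]]
class(byNumberInner)   # "numeric"

# By name, single brackets still return a list...
byName <- dinnerList["drinksMl"]

# ...while double brackets (or $) return the numeric vector
byNameInner <- dinnerList[["drinksMl"]]
```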
Matrices and arrays, which unlike dataframes hold only one type of data, also use square brackets for indexing. For example, analyticsMatrix[, 3:6] returns all rows of the third to sixth columns, analyticsMatrix[1, 3] returns just the first row of the third column, and analyticsArray[1, 2, ] returns the first row of the second column across every layer of the third dimension.
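The indexing expressions above can be tried on small stand-in objects. In this sketch analyticsMatrix and analyticsArray are hypothetical and simply hold the integers 1 to 24 and 1 to 48, filled column by column, so the selected positions are easy to check. The last line illustrates the one-type rule: binding a string column onto the integer matrix converts every entry to character.

```r
# Hypothetical 4 x 6 integer matrix holding 1:24, filled column by column
analyticsMatrix <- matrix(1:24, nrow = 4, ncol = 6)

cols3to6 <- analyticsMatrix[, 3:6]   # all rows, columns 3 to 6: a 4 x 4 matrix
cell <- analyticsMatrix[1, 3]        # row 1, column 3: 9

# Hypothetical 4 x 6 x 2 array holding 1:48
analyticsArray <- array(1:48, dim = c(4, 6, 2))
acrossLayers <- analyticsArray[1, 2, ]   # row 1, column 2, both layers: 5 and 29

# Matrices hold a single type: adding a string column makes everything character
mixed <- cbind(analyticsMatrix, "a")
typeof(mixed)                        # "character"
```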
R has many special object types that are designed to make data analysis easier. Functions in R can be polymorphic; that is, they can respond to different data types in different ways in order to produce the output that the user wants. For example, the plot() function responds to a wide variety of data types and objects, including one-dimensional vectors (each value plotted in sequence on the y axis), two-column matrices (producing a scatterplot), and specialized statistical objects such as regression models and time series, for which it produces plots specialized for those purposes.
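Much of this behavior comes from generic functions that dispatch on an object's class (R's S3 system). To avoid opening a graphics device, this sketch swaps in summary(), a generic that works the same way as plot(), and calls it on a numeric vector, a factor, and a fitted model. It then defines a small generic of its own; the function describe() and the class "pageCount" are hypothetical names used only for illustration.

```r
x <- c(120, 135, 98, 150)
f <- factor(c("direct", "search", "search", "social"))
fit <- lm(dist ~ speed, data = cars)  # cars is a dataset shipped with R

summary(x)    # numeric summary: quartiles and mean
summary(f)    # counts per level
summary(fit)  # regression table, produced by the summary.lm method

# A hypothetical generic: UseMethod() picks a method from the class of obj
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) paste("an object of class", class(obj)[1])
describe.pageCount <- function(obj, ...) paste(sum(obj), "page views in total")

views <- structure(c(120, 135, 98), class = "pageCount")
describe(views)  # dispatches to describe.pageCount
describe(1:3)    # no integer method, so describe.default is used
```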
As with the rest of this introduction, don't worry if you haven't written functions before, or if you don't understand object concepts and aren't sure what all this means. You can produce great applications without understanding these things, but as you do more with R you will want to learn more about how R works and how experts write R code. This introduction is designed to give you a jumping-off point for learning how to get the best out of R (and Shiny).