Getting started with R
This section will help you get started with setting up your analysis and development environment and also acquaint you with the syntax, data structures, constructs, and other important concepts related to the R programming language. Feel free to skim through this section if you consider yourself to be a master of R! We will be mainly focusing our attention on the following topics:
- Environment setup
- Data types
- Data structures
- Functions
- Controlling code flow
- Advanced operations
- Visualizing data
- Next steps
We will be explaining each construct or concept with hands-on examples and code so that it is easier to understand and you can also learn by doing. Before we dive into further details, let us briefly get to know more about R. R is actually a scripting language but is used extensively for statistical modeling and analysis. The roots of R lie in the S language which was a statistical programming language developed by AT&T. R is a community-driven language and has grown by leaps and bounds over the years. It now has a vast arsenal of tools, frameworks, and packages for processing, analyzing, and visualizing any type of data. Because it's open source, the community posts constant improvements to the base R language, and it introduces extremely powerful R packages capable of performing complex analyzes and visualizations.
R and Python are perhaps the two most popular languages to be used for statistical analysis; and R is often preferred by statisticians, mathematicians, and data scientists because it has more capabilities related to statistical modeling, learning, and algorithms. R is maintained by the Comprehensive R Archival Network (CRAN) and includes all the latest and past versions, binaries, and source code for R, and its packages for different operating systems. Capabilities also exist to connect and interface R with other frameworks including big data frameworks such as Hadoop and Spark, computing platforms, and languages such as Python, Matlab, SPSS, and data interfaces to any possible source such as social media platforms, news portals, the Internet of Things based device data, web traffic data, and so on.
Environment setup
We will be discussing the necessary steps for setting up a proper analysis environment by installing the necessary dependencies around the R ecosystem and also the necessary code snippets, functions, and modules which we will be using across all the chapters. You can refer to any code snippet being used across any chapter from the code files which will be provided for each chapter along with this book. Besides that, you can also access our GitHub repository https://github.com/dipanjanS/learning-social-media-analytics-with-r for necessary code modules, snippets and functions which will be used in the book and adopt them for your own analyzes!
The R language is free and open-source as we mentioned earlier, and is available for all major operating systems. At the time of writing this book, the latest version of R is 3.3.1 (code named Bug in Your Hair
) and is available for downloading at https://www.r-project.org/. This link includes detailed steps, but the direct download page can be accessed at https://cloud.r-project.org/ if you are interested. Download the necessary binary distribution based on your operating system of choice and run the executable setup following the necessary instructions for the Windows platform. If you are using Unix or any *nix like environment, you can install it directly from the terminal too if needed.
Once R is installed, you can fire up the R interpreter directly. This has a graphical user interface (GUI) containing an editor where you can write your code and then execute it. We recommend using an Integrated Development Environment (IDE) instead which eases development and helps maintain code in a more structured way. Besides this you can also use it for other capabilities like generating R markdown documents, R notebooks and Shiny Web Applications. We recommend using RStudio which provides a user-friendly interface for working with R. You can download and install it from https://www.rstudio.com/products/rstudio/download3/ which contains installers for various operating systems.
Once installed, you can start RStudio and use R directly from the IDE itself. It usually contains a code editor window at the top and the R interactive interpreter in the bottom. The interactive interpreter is often called a Read-Evaluate-Print-Loop (REPL). The interpreter asks for input, evaluates and instantly returns the output if any in the interpreter window itself. The interface usually shows the >
symbol when waiting for any input and often shows the +
symbol in the prompt when you enter code which spans multiple lines. Anything in R is usually a vector and outputs are usually returned with square brackets preceding it, like [1]
indicating the output is a vector of size one. Comments are used to describe functions or sections of code. We can specify comments by using the #
symbol followed by text. A sample execution in the R interpreter is shown in the following code for convenience:
> 10 + 5 [1] 15 > c(1,2,3,4) [1] 1 2 3 4 > 5 == 5 [1] TRUE > fruit = 'apple' > if (fruit == 'apple'){ + print('Apple') + }else{ + print('Orange') + } [1] "Apple"
You can see various operations being performed in the R interpreter in the preceding code snippet, including some conditional evaluations and basic arithmetic. We will now delve deeper into the various constructs of R.