R Data Mining

You're reading from R Data Mining: Implement data mining techniques through practical use cases and real-world datasets

Product type Paperback
Published in Nov 2017
Publisher Packt
ISBN-13 9781787124462
Length 442 pages
Edition 1st Edition
Author: Andrea Cirillo

Possible alternatives to write and run R code

We have already discussed two ways of executing R code:

  • Employing your OS terminal
  • Employing the development environment that comes with the R base installation

The first option can be quite convenient for experienced R users. Its advantages show most clearly when carrying out complex analytical workflows, such as those requiring:

  • The sequential execution of scripts written in different languages
  • Filesystem manipulation, such as creating, moving, or deleting files and folders
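As a minimal sketch of such a workflow, the following hypothetical script (the name `pipeline.R` is an assumption) could be launched from the terminal with `Rscript pipeline.R`. It creates a working folder, writes and reads a file, and hands one step over to a Python interpreter through `system2()`, assuming `python3` is on the PATH; the Python step is simply skipped otherwise.

```r
# pipeline.R (hypothetical name): run it from the OS terminal with
#   Rscript pipeline.R

# Filesystem manipulation: create a scratch folder inside the session's
# temporary directory, so nothing outside it is touched
work_dir <- file.path(tempdir(), "pipeline_demo")
dir.create(work_dir, showWarnings = FALSE, recursive = TRUE)

# Write an input file from the built-in mtcars dataset
input_file <- file.path(work_dir, "input.csv")
write.csv(mtcars, input_file, row.names = TRUE)

# A step in another language: call the Python interpreter, assuming it
# is installed as python3 and available on the PATH
python <- Sys.which("python3")
if (nzchar(python)) {
  system2(python, c("-c", shQuote("print('Python step completed')")))
} else {
  message("python3 not found: skipping the Python step")
}

# Back in R: read the file and save a small summary next to it
data_in <- read.csv(input_file, row.names = 1)
summary_file <- file.path(work_dir, "summary.txt")
writeLines(sprintf("rows: %d, mean mpg: %.2f", nrow(data_in), mean(data_in$mpg)),
           summary_file)

# List the files produced by the pipeline
print(list.files(work_dir))
```

Keeping every file under `tempdir()` makes the sketch safe to run anywhere; in a real project you would point `work_dir` at your own project folder.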

As for the second alternative, we have already discussed its shortcomings compared with its direct competitor. It is therefore time to take a closer look at that competitor, which is what we will do in the following paragraphs before writing some more R code.

Two disclaimers are needed:

  • We are not considering plain text editors here, that is, software that does not include an R console and additional code execution utilities. Instead, we focus on integrated development environments (IDEs), since they provide a more user-friendly and comprehensive experience for someone adopting a new language.
  • We are not aiming for completeness, only for the tools most often cited in R community discussions and events. Better platforms may well exist, but they have not yet gained comparable momentum.

The alternative platforms we are going to introduce here are:

  • RStudio
  • Jupyter Notebook
  • Visual Studio

RStudio (all OSs)

RStudio is a very well-known IDE within the R community. It is freely available at https://www.rstudio.com. The main reasons for its popularity are probably its exclusive focus on R, which sets it apart from the other two alternatives discussed below, and its tight integration with some of the R community's most beloved packages.

RStudio includes all the basic features we saw in the development environment that ships with the R base installation, plus many useful additional components designed to ease coding and make the development process more effective. Among them, we should point out:

  • A filesystem browser to explore and interact with the content of the directory you are working with
  • A file import wizard to facilitate the import of datasets
  • A plot pane to visualize and interact with the plots produced by code execution
  • An environment explorer to visualize and interact with values and the data produced by code execution
  • A spreadsheet-like data viewer to visualize the datasets produced by code execution

All of this is enhanced by features such as code autocompletion, inline help for functions, and splittable windows for multi-monitor users, as seen in the following screenshot:

A final word must be said about integration with the R community's most beloved add-on packages. RStudio offers additional controls and predefined shortcuts to fully integrate packages such as:

  • rmarkdown package for combining Markdown text with R code (more on this in Chapter 13, Sharing Your Stories with Your Stakeholders through R Markdown)
  • dplyr package for data manipulation (more on this in Chapter 2, A First Primer on Data Mining - Analysing Your Bank Account Data)
  • shiny package for web application development with R (more on this in Chapter 13, Sharing Your Stories with Your Stakeholders through R Markdown)
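To see the environment explorer and the data viewer described earlier at work, you could run a few lines like the following in RStudio's console. The objects created appear in the Environment pane; `ls()` and `str()` print comparable information in the console, and `View()` opens a data frame in the spreadsheet-like viewer. The subset of the built-in `mtcars` dataset is purely illustrative.

```r
# Keep only the four-cylinder cars from the built-in mtcars dataset
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "hp", "wt")]
avg_mpg <- mean(four_cyl$mpg)

# Both objects are now listed in RStudio's Environment pane;
# ls() and str() show comparable information in the console
print(ls())
str(four_cyl)

# In RStudio, View() opens the data frame in the spreadsheet-like viewer.
# The interactive() check skips it when the code runs through Rscript.
if (interactive()) View(four_cyl)
```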

The Jupyter Notebook (all OSs)

The Jupyter Notebook was originally born as a Python extension to enable interactive data analysis and a fully reproducible workflow. The idea behind it is to keep both the code and its output (plots and tables) within the same document. This allows both the developer and subsequent readers, such as a customer, to follow the logical flow of the analysis and gradually arrive at the results.

Compared to RStudio, Jupyter does not have a filesystem browser, nor an environment browser. Nevertheless, it is a very good alternative, especially when working on analyses which need to be shared.

Since it originated as a Python extension, Jupyter is itself written in Python. This means that you will need to install Python as well as R to run it. Instructions on how to install Jupyter can be found in the Jupyter documentation at http://jupyter.readthedocs.io/en/latest/install.html.

After installing Jupyter, you will need to add a specific component, namely a kernel, to execute R code in the notebook. Instructions on how to install the kernel can be found on the component's home page at https://irkernel.github.io.
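As a sketch of this kernel set-up step, the IRkernel documentation describes installing the package from CRAN and then registering it with `IRkernel::installspec()`. The lines below, meant to be run once in an R session started from a terminal, wrap those two calls in a check that the `jupyter` command is on the PATH. The CRAN mirror URL is an assumption; any mirror will do.

```r
# Register R as a Jupyter kernel. Run once, preferably from an R session
# started in a terminal, after Jupyter itself has been installed.
jupyter <- Sys.which("jupyter")

if (nzchar(jupyter)) {
  # Install IRkernel from CRAN only if it is not already available
  if (!requireNamespace("IRkernel", quietly = TRUE)) {
    install.packages("IRkernel", repos = "https://cloud.r-project.org")
  }
  # Make the kernel visible to Jupyter for the current user
  IRkernel::installspec()
  # List the installed kernels: an entry named "ir" should now appear
  system2(jupyter, c("kernelspec", "list"))
} else {
  message("jupyter not found on the PATH: install Jupyter first")
}
```

Once the kernel is registered, R should be offered as a kernel when creating a new notebook.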

Visual Studio (Windows users only)

Visual Studio is a popular development tool, used primarily for Visual Basic and C++ development. Following Microsoft's recent interest in the R language, this IDE has been extended with the R Tools for Visual Studio (RTVS) extension.

This extension adds all the features commonly expected of an R IDE to a well-established platform such as Visual Studio. Its main limitation at the moment is availability, as it runs only on computers with the Windows OS.

Visual Studio is also available for free, at least in its Community Edition. Further details and installation guides are available at https://www.visualstudio.com/vs/rtvs.
