Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering Data analysis with R

You're reading from   Mastering Data analysis with R Gain sharp insights into your data and solve real-world data science problems with R—from data munging to modeling and visualization

Arrow left icon
Product type Paperback
Published in Sep 2015
Publisher Packt
ISBN-13 9781783982028
Length 396 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Gergely Daróczi Gergely Daróczi
Author Profile Icon Gergely Daróczi
Gergely Daróczi
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Hello, Data! 2. Getting Data from the Web FREE CHAPTER 3. Filtering and Summarizing Data 4. Restructuring Data 5. Building Models (authored by Renata Nemeth and Gergely Toth) 6. Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth) 7. Unstructured Data 8. Polishing Data 9. From Big to Small Data 10. Classification and Clustering 11. Social Network Analysis of the R Ecosystem 12. Analyzing Time-series 13. Data Around Us 14. Analyzing the R Community A. References Index

Chapter 1. Hello, Data!

Most projects in R start with loading at least some data into the running R session. As R supports a variety of file formats and database backend, there are several ways to do so. In this chapter, we will not deal with basic data structures, which are already familiar to you, but will concentrate on the performance issue of loading larger datasets and dealing with special file formats.

Note

For a quick overview on the standard tools and to refresh your knowledge on importing general data, please see Chapter 7 of the official An Introduction to R manual of CRAN at http://cran.r-project.org/doc/manuals/R-intro.html#Reading-data-from-files or Rob Kabacoff's Quick-R site, which offers keywords and cheat-sheets for most general tasks in R at http://www.statmethods.net/input/importingdata.html. For further materials, please see the References section in the Appendix.

Although R has its own (serialized) binary RData and rds file formats, which are extremely convenient to use for all R users as these also store R object meta-information in an efficient way, most of the time we have to deal with other input formats—provided by our employer or client.

One of the most popular data file formats is flat files, which are simple text files in which the values are separated by white-space, the pipe character, commas, or more often by semi-colon in Europe. This chapter will discuss several options R has to offer to load these kinds of documents, and we will benchmark which of these is the most efficient approach to import larger files.

Sometimes we are only interested in a subset of a dataset; thus, there is no need to load all the data from the sources. In such cases, database backend can provide the best performance, where the data is stored in a structured way preloaded on our system, so we can query any subset of that with simple and efficient commands. The second section of this chapter will focus on the three most popular databases (MySQL, PostgreSQL, and Oracle Database), and how to interact with those in R.

Besides some other helper tools and a quick overview on other database backend, we will also discuss how to load Excel spreadsheets into R—without the need to previously convert those to text files in Excel or Open/LibreOffice.

Of course this chapter is not just about data file formats, database connections, and such boring internals. But please bear in mind that data analytics always starts with loading data. This is unavoidable, so that our computer and statistical environment know the structure of the data before doing some real analytics.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at R$50/month. Cancel anytime