
R Web Scraping Quick Start Guide: Techniques and tools to crawl and scrape data from websites

Product type: Paperback
Published: October 2018
Publisher: Packt
ISBN-13: 9781789138733
Length: 114 pages
Edition: 1st Edition

Concepts: Data Mining

Author: Olgun Aydin

Chapter 1, Introduction to Web Scraping, introduces web scraping techniques, which are becoming increasingly popular now that data has become as valuable as oil in the 21st century. In this chapter, you can find detailed information about web scraping technologies. We also give an overview of some of the key languages used in web scraping, such as XPath and regex, and look into some web scraping libraries for R, such as rvest and RSelenium.

Chapter 2, Working with the XML Path Language and the Regular Expression Language, looks at the rules of XPath, which selects nodes in an HTML or XML document, and of regular expressions (regex), which match patterns in text. Both are important to know when scraping a web page. In this chapter, you can find useful information about these languages and also get a chance to write XPath and regex rules from scratch.

Chapter 3, Web Scraping with rvest, covers the rvest library. Scraping a web page with R is straightforward thanks to rvest, which was developed by Hadley Wickham. In this chapter, you can find tips and tricks for using the library and learn how to write an R script from scratch that uses rvest to scrape a web page.

Chapter 4, Web Scraping with RSelenium, explores RSelenium, which provides R bindings for Selenium. Selenium is a browser automation technology built primarily for testing web applications, but it's also useful for scraping web pages. In this chapter, you can find an overview of Selenium and learn how to scrape a web page using the RSelenium library.

Chapter 5, Storing Data and Creating Cronjobs, deals with the matter of storage. After collecting data, you need to store the dataset somewhere; a cloud-based solution, such as AWS RDS or EC2, Google Cloud Platform, or Microsoft Azure, is a good option. Also, if you would like to schedule data collection, you can create a cronjob to do so. In this chapter, you can find an overview of databases and cloud platforms, and you'll also learn how to connect to databases and schedule cronjobs using R.