R Web Scraping Quick Start Guide

You're reading from R Web Scraping Quick Start Guide: Techniques and tools to crawl and scrape data from websites

Product type Paperback
Published in Oct 2018
Publisher Packt
ISBN-13 9781789138733
Length 114 pages
Edition 1st Edition
Author (1):
Olgun Aydin

What this book covers

Chapter 1, Introduction to Web Scraping, introduces web scraping techniques, which are becoming more and more popular, since data is as valuable as oil in the 21st century. The chapter gives detailed information about web scraping technologies, surveys some of the key languages for web scraping, such as XPath and regular expressions (regex), and looks at web scraping libraries for R, such as rvest and RSelenium.

Chapter 2, Working with the XML Path Language and the Regular Expression Language, looks at XPath and regex rules, which are important to know when scraping a web page. The chapter provides useful information about both languages and gives you a chance to write XPath and regex rules from scratch.

Chapter 3, Web Scraping with rvest, covers the rvest library, which was developed by Hadley Wickham and makes scraping a web page with R straightforward. The chapter offers tips and tricks for the library and shows how to write an R script from scratch that scrapes a web page with rvest.

Chapter 4, Web Scraping with RSelenium, explores RSelenium, an R interface to Selenium. Selenium is a technology for browser testing, but it is also useful for scraping web pages. The chapter gives an overview of Selenium and shows how to scrape a web page using the RSelenium library.

Chapter 5, Storing Data and Creating Cronjobs, deals with the matter of storage. After collecting data, you need to store the dataset somewhere, ideally in a cloud-based solution such as AWS RDS, EC2, Google Cloud Platform, or Microsoft Azure. If you would like to collect data on a schedule, you can create a cron job to do so. The chapter gives an overview of databases and cloud platforms, and shows how to connect to databases and schedule cron jobs using R.
