In this section, we will focus on web scraping and how to implement it using the rvest package.
Web scraping is the process of extracting unstructured data from web pages and converting it into a structured format that can be easily accessed and used. We will use R to scrape data on the most popular feature films from the IMDb website.
Follow these steps to get data into R using the rvest package:
- Install the rvest package. This step is required because rvest is not part of the base R distribution:
> install.packages('rvest')
package 'rvest' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
    C:\Users\Radhika\AppData\Local\Temp\RtmpMvNUA5\downloaded_packages
- Load the installed package into the current R session:
> library(rvest)
- Let's start web scraping...
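As a first, self-contained illustration of turning unstructured HTML into structured data, the sketch below parses a small inline HTML string instead of a live page. The markup, the class names (`film`, `title`, `year`), and the film entries are hypothetical placeholders and are not taken from IMDb. The calls `read_html()`, `html_nodes()`, and `html_text()` are rvest functions; in rvest 1.0.0 and later, `html_elements()` is the preferred name for `html_nodes()`.

```r
library(rvest)

# Hypothetical inline HTML standing in for a downloaded page (not IMDb markup).
page <- read_html('
  <html><body>
    <div class="film"><span class="title">Film A</span><span class="year">2016</span></div>
    <div class="film"><span class="title">Film B</span><span class="year">2015</span></div>
  </body></html>')

# Select nodes with CSS selectors and extract their text.
titles <- html_text(html_nodes(page, ".film .title"), trim = TRUE)
years  <- as.integer(html_text(html_nodes(page, ".film .year"), trim = TRUE))

# Combine the extracted vectors into a structured data frame.
films <- data.frame(title = titles, year = years, stringsAsFactors = FALSE)
print(films)
```

The same pattern should carry over to a real page by passing a URL to `read_html()` and replacing the selectors with ones that match that page's markup.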