Case study – working with the Stack Overflow dataset
This section walks through an exercise for practicing data transformation, aggregation, and merging techniques on the public Stack Overflow dataset, which contains a set of tables related to technical questions and answers posted on the Stack Overflow platform. The raw data has been uploaded to the accompanying GitHub repository of this book. We will download it directly from the GitHub source link using the readr package, another tidyverse offering that provides an easy, fast, and friendly way to read a wide range of data sources, including those from the web.
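Before turning to the real files, it may help to see read_csv() on a tiny input. The following is a minimal sketch, not part of the exercise data: the csv_text and df_demo names and the three-row table are made up for illustration, and the literal-data wrapper I() assumes readr 2.0 or later. The same call accepts a local path or a URL in place of the literal text.

```r
library(readr)

# Hypothetical inline CSV standing in for a remote file;
# I() tells readr to treat the string as literal data, not a path
csv_text = I("id,score\n1,10\n2,25\n3,7\n")

# read_csv() guesses column types and returns a tibble;
# show_col_types = FALSE silences the column specification message
df_demo = read_csv(csv_text, show_col_types = FALSE)

print(df_demo)
```

Replacing csv_text with a quoted URL would read the file straight from the web, which is the approach used in the exercise that follows.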
Exercise 2.11 – working with the Stack Overflow dataset
Let’s begin this exercise:
- Download three data sources on questions, tags, and their mapping table from GitHub:
library(readr)
df_questions = read_csv("https://raw.githubusercontent.com/PacktPublishing/The-Statistics-and-Machine-Learning-with-R-Workshop...