Data Science and StackExchange
Data science is not just an industry buzzword but a genuine field of study, encompassing both academic research and industry-level applications of its concepts. The Data Science Stack Exchange site (https://datascience.stackexchange.com/) is a forum where users from different backgrounds and levels of expertise ask questions and discuss a wide range of topics related to data science, machine learning, advanced analytics, and so on.
For this use case, we will primarily use the Posts.xml file from this site to analyze the data and uncover insights. We will load the XML with the same utility introduced in the previous section, and then perform a couple of preprocessing steps, such as date-time cleanup, to get the dataset into usable form. The following snippet performs this cleanup and also converts the Tags attribute into usable form:
PostsDF <- loadXMLToDataFrame(paste0(path,"Posts.xml")) #...
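The snippet above elides the actual cleanup code. As a rough illustration only, the following base-R sketch shows one way the two steps could look. It assumes that PostsDF holds character columns named CreationDate and Tags, with timestamps such as 2014-05-13T23:58:30.457 and tags written as <machine-learning><python>, which is how the Stack Exchange data dump has encoded them (the format may differ between dump versions). The helper clean_posts() and the toy data are hypothetical and are not the utility from the previous section.

```r
# Hypothetical helper (not the book's code): a minimal cleanup sketch,
# assuming character columns CreationDate and Tags in the data dump format.
clean_posts <- function(df) {
  # Parse ISO-8601-style timestamps; on input, %OS also reads
  # fractional seconds such as "30.457"
  df$CreationDate <- as.POSIXct(df$CreationDate,
                                format = "%Y-%m-%dT%H:%M:%OS",
                                tz = "UTC")
  # Assumed tag format "<a><b>": strip the outer "<" and ">", then turn
  # each "><" separator into a space. NA tags (e.g. answers) stay NA.
  tags <- gsub("^<|>$", "", df$Tags)
  df$Tags <- gsub("><", " ", tags, fixed = TRUE)
  df
}

# Toy input for illustration only, not real site data
posts <- data.frame(
  CreationDate = c("2014-05-13T23:58:30.457", "2014-05-14T08:15:00.000"),
  Tags = c("<machine-learning><python>", NA),
  stringsAsFactors = FALSE
)
cleaned <- clean_posts(posts)
print(cleaned$Tags)  # "machine-learning python" NA
```

Keeping the tags as one space-separated string is a deliberate simplification; strsplit(cleaned$Tags, " ") would turn them into a list of character vectors if per-tag counts are needed later.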