Demographics and data science
Social networks exist for and by their user base. StackExchange relies on a large user base with a diverse set of skills. In this use case, let us try to understand the demographic dynamics of https://datascience.stackexchange.com/.
We begin by loading the user-related data from the dumps. As discussed earlier, this information is available in the Users.XML file. We use the same loadXMLToDataFrame utility function to get the required DataFrame. We then get some quick details from the DataFrame, such as the number of users, the average age, reputation scores, and so on. The following snippet gets us started:
# Total users (rows) and attributes (columns)
> dim(UsersDF)
[1] 19237    14

# Maximum reputation score
> max(as.numeric(UsersDF[!is.na(UsersDF$Reputation),'Reputation']))
[1] 5305

# Average age of users on Data Science Stack Exchange
> mean(as.numeric(UsersDF[!is.na(UsersDF$Age),'Age']))
[1] 30.83677
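The same kind of summary can be written more compactly by converting each attribute once and passing na.rm = TRUE to the summary functions. The following is a minimal sketch on a small made-up data frame: toyUsersDF and all of its values are illustrative assumptions, not taken from the dump. The attributes are stored as character strings, which is how values parsed from XML commonly arrive.

```r
# Toy stand-in for UsersDF; the values are invented for illustration only
toyUsersDF <- data.frame(
  Id         = c("1", "2", "3", "4"),
  Reputation = c("101", "1", "250", "36"),
  Age        = c("34", NA, "27", "41"),
  stringsAsFactors = FALSE
)

# Number of users (rows) and attributes (columns)
toyDims <- dim(toyUsersDF)

# Convert once, then summarise; na.rm = TRUE skips missing values
reputation <- as.numeric(toyUsersDF$Reputation)
age        <- as.numeric(toyUsersDF$Age)

avgReputation <- mean(reputation, na.rm = TRUE)
maxReputation <- max(reputation, na.rm = TRUE)
avgAge        <- mean(age, na.rm = TRUE)
```

Keeping the converted vectors in their own variables avoids repeating the subsetting expression for every statistic and makes it explicit whether a mean or a maximum is being reported.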
Note
Readers should check data types for each of the attributes...
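As a hedged illustration of why such a check matters, the sketch below inspects column classes on a small invented data frame (toyUsersDF and its values are assumptions for illustration; the actual classes in UsersDF depend on how the XML was loaded). It also shows a common pitfall: if a column happens to be a factor, as.numeric returns the internal level codes rather than the values.

```r
# Toy data frame; column names mirror the dump, values are invented
toyUsersDF <- data.frame(
  Reputation = factor(c("101", "1", "250")),
  Age        = c("34", NA, "27"),
  stringsAsFactors = FALSE
)

# Inspect the class of every column
columnClasses <- sapply(toyUsersDF, class)
print(columnClasses)

# Pitfall: as.numeric() on a factor returns internal level codes, not values
levelCodes <- as.numeric(toyUsersDF$Reputation)

# Convert through character to recover the actual numbers
reputation <- as.numeric(as.character(toyUsersDF$Reputation))
```

Running str(toyUsersDF) gives a similar overview in a single call, including a preview of the first few values in each column.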