Getting the data
A couple of small datasets of Facebook network data are available on the Internet. None of them is particularly large or complete, but they do give us a reasonable snapshot of part of Facebook's network. Because the Facebook graph is a private data source, this partial view is probably the best that we can hope for.
We'll get the data from the Stanford Large Network Dataset Collection (http://snap.stanford.edu/data/). This contains a number of network datasets, from Facebook and Twitter to road networks and citation networks. To do this, we'll download the facebook.tar.gz file from http://snap.stanford.edu/data/egonets-Facebook.html. Once it's on your computer, you can extract it. When I put it into the folder with my source code, it created a directory named facebook.
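If you would rather extract the archive from a script than by hand, the step can be sketched as follows. This is a minimal sketch in Python, assuming facebook.tar.gz has already been downloaded; the helper name extract_archive is made up for illustration and is not part of any library.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir.

    Returns the sorted top-level names the archive contains
    (hypothetical helper, for illustration only).
    """
    dest = Path(dest_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    # Keep only the first path component of each member name.
    return sorted({Path(name).parts[0] for name in names if Path(name).parts})
```

Calling extract_archive("facebook.tar.gz") from the folder that holds the download should report the single top-level directory described above.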
The directory contains 10 sets of files. Each group is based on one primary vertex (user), and each contains five files. For vertex 0, these files would be as follows:
0.edges: This contains the vertices that the primary one links to.
0.circles: This contains the groupings that the user has created for his or her friends.
0.feat: This contains the features of the vertices that the user is adjacent to and ones that are listed in 0.edges.
0.egofeat: This contains the primary user's features.
0.featnames: This contains the names of the features described in 0.feat and 0.egofeat. For Facebook, these values have been anonymized.
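To check that the extracted directory matches this layout, one option is to group the files by their primary vertex. The sketch below is in Python and assumes that every filename has the form <vertex ID>.<extension>, as in the list above; group_by_vertex is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import Path


def group_by_vertex(data_dir):
    """Map each primary vertex ID to the sorted file extensions found for it.

    Assumes names such as 0.edges or 0.feat; other files are ignored.
    """
    groups = defaultdict(list)
    for path in Path(data_dir).iterdir():
        # The stem is the part before the extension, i.e. the vertex ID.
        if path.is_file() and path.stem.isdigit():
            groups[int(path.stem)].append(path.suffix.lstrip("."))
    return {vertex: sorted(exts) for vertex, exts in sorted(groups.items())}
```

If the description above holds, group_by_vertex("facebook") should return 10 keys, each mapped to the same five extensions.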
For these purposes, we'll just use the *.edges files.
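As a rough illustration of working with only the *.edges files, the following Python sketch collects their contents into one set of vertex pairs. It assumes that each line holds two whitespace-separated integer vertex IDs; that format is an assumption here and should be checked against the actual files. The name read_edges is hypothetical.

```python
from pathlib import Path


def read_edges(data_dir):
    """Collect (source, target) integer pairs from every *.edges file in data_dir.

    Assumes one edge per line, written as two whitespace-separated integers.
    """
    edges = set()
    for path in sorted(Path(data_dir).glob("*.edges")):
        with path.open() as handle:
            for line in handle:
                fields = line.split()
                if len(fields) != 2:
                    continue  # skip blank or malformed lines
                edges.add((int(fields[0]), int(fields[1])))
    return edges
```

Using a set means that an edge listed in more than one file is stored only once. The pairs are kept as written, so (1, 2) and (2, 1) are treated as distinct.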
Now let's turn our attention to the data in the files and what they represent.