Mastering Social Media Mining with R

You're reading from Mastering Social Media Mining with R: Extract valuable data from your social media sites and make better business decisions using R

Product type Paperback
Published in Sep 2015
ISBN-13 9781784396312
Length 248 pages
Edition 1st Edition
Table of Contents

Preface
1. Fundamentals of Mining
2. Mining Opinions, Exploring Trends, and More with Twitter
3. Find Friends on Facebook
4. Finding Popular Photos on Instagram
5. Let's Build Software with GitHub
6. More Social Media Websites
Index

The generic process of social media mining

Any data mining activity follows a few generic steps to draw useful insights from the data. Since social media is the central theme of this book, let's discuss these steps using example data from Twitter:

  • Getting authentication from the social website
  • Data visualization
  • Cleaning and preprocessing
  • Data modeling using standard algorithms such as opinion mining, clustering, anomaly/spam detection, correlations and segmentations, recommendations
  • Result visualization
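
As a rough illustration of the cleaning and preprocessing step in the list above, here is a minimal sketch. It is written in Python with the standard library only, purely for brevity; the function name clean_tweet, the sample tweet, and the specific cleaning rules are assumptions made for this example, not a prescribed recipe.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop remaining punctuation
    return text.split()                        # tokenize on whitespace

# Hypothetical sample tweet, for illustration only
sample = "Loving the new #rstats release! Thanks @some_user http://example.com/x"
print(clean_tweet(sample))
```

The resulting token list is the kind of input that the later modeling and visualization steps would typically work on.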

Getting authentication from the social website – OAuth 2.0

Most social media websites provide API access to their data. To do the mining, we (as a third party) need some mechanism to get access to the users' data available on these websites. The problem is that users will not share their credentials with anyone, for obvious security reasons. This is where OAuth comes into the picture. According to its home page (http://oauth.net/), OAuth can be defined as follows:

An open protocol to allow secure authorization in a simple and standard method from web, mobile and desktop applications.

To understand it better, let's take the example of Instagram, where a user can allow a printing service access to their private photographs stored on Instagram's server without sharing their credentials with the printing service. Instead, the user authenticates directly with Instagram, which issues the printing service delegation-specific permissions. The user here is the primary owner of the resource and the printing service is the third-party client. Social media websites such as Instagram, Twitter, and Facebook allow various applications to access user data for purposes such as advertisements or recommendations. Similarly, almost all cab service applications access the user's location.

Here's a diagram illustrating the concept:

[Figure: Getting authentication from the social website – OAuth 2.0]

OAuth 2.0 provides various methods by which different levels of authorization over various resources can reliably be granted to the requesting client application. One of the most frequently used and most important use cases is authorizing one web server or application to access data held on another web server.

The following image shows the authentication process:

[Figure: the OAuth 2.0 authentication process]

Let's look at the various steps involved:

  1. The user opens the client web app and clicks a button such as Login via Twitter (or Login via LinkedIn or Login via Facebook).
  2. This takes the user to the authenticating app, which authenticates the user. The user is then asked to allow the client app access to his/her resources, that is, the profile data. The user needs to accept this request to go to the next step.
  3. The authenticating app then redirects the user to a redirect link that the client app has provided to it. Usually, the redirect link is supplied when the client app is registered with the authenticating app; at the same time, the authenticating app issues client credentials to the client app.
  4. Following the redirect link, the user's browser contacts the client app, which receives the authorization code in the redirect request parameters. The client app then connects to the authenticating app and exchanges this code for an access token.
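
The steps above can be sketched from the client app's side as follows. This is a minimal illustration in Python using only the standard library, and it builds the requests without sending them. All URLs, credentials, and function names are hypothetical placeholders; the parameter names (response_type, client_id, redirect_uri, scope, state, grant_type, code) follow the OAuth 2.0 authorization code grant, and the state check is an extra safeguard not described in the steps.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical placeholders: a real service publishes its own endpoints
# and issues the client credentials when the client app is registered.
AUTHORIZE_URL = "https://auth.example.com/oauth/authorize"
TOKEN_URL = "https://auth.example.com/oauth/token"
CLIENT_ID = "my-client-id"
CLIENT_SECRET = "my-client-secret"
REDIRECT_URI = "https://client.example.com/callback"

def build_authorization_url(state):
    """Steps 1-2: URL the user is sent to in order to log in and grant access."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "profile",
        "state": state,  # random value echoed back to the redirect link
    }
    return AUTHORIZE_URL + "?" + urlencode(params)

def extract_code(callback_url, expected_state):
    """Steps 3-4: read the authorization code from the redirect parameters."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged redirect")
    return query["code"][0]

def build_token_request(code):
    """Step 4: form fields the client app would POST to TOKEN_URL."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    }

state = secrets.token_urlsafe(16)
print(build_authorization_url(state))

# Simulate the redirect that the authenticating app would trigger
callback = REDIRECT_URI + "?" + urlencode({"code": "abc123", "state": state})
code = extract_code(callback, state)
print(TOKEN_URL, build_token_request(code))
```

In a real client, the token request would be sent over HTTPS and the response would carry the access token.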

Depending on the network, the access granted by an access token can be constrained not only in terms of the information it covers but also in the lifetime of the token itself. Once the client app obtains an access token, it can send the token to the respective social media service, such as Facebook, LinkedIn, or Twitter, to access the resources on its servers that belong to the users who granted permission.
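
To make the two constraints mentioned above concrete (scope and lifetime), here is a minimal sketch in Python. The token fields, function names, and values are assumptions for illustration, and it assumes the service accepts the token as a bearer token in the Authorization header, which is common but not universal.

```python
import time

def make_token(access_token, expires_in, scope, issued_at=None):
    """Store a token together with its scope and absolute expiry time."""
    issued_at = time.time() if issued_at is None else issued_at
    return {
        "access_token": access_token,
        "scope": scope.split(),               # what the token may access
        "expires_at": issued_at + expires_in, # when it stops being valid
    }

def is_expired(token, now=None):
    """Return True once the token's lifetime has elapsed."""
    now = time.time() if now is None else now
    return now >= token["expires_at"]

def auth_header(token, now=None):
    """Build the request header, refusing to use an expired token."""
    if is_expired(token, now):
        raise RuntimeError("access token expired; obtain a new one")
    return {"Authorization": "Bearer " + token["access_token"]}

# Hypothetical token valid for one hour from t=1000
token = make_token("example-token", expires_in=3600,
                   scope="profile photos", issued_at=1000.0)
print(auth_header(token, now=1500.0))
print(is_expired(token, now=5000.0))
```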

Differences between OAuth and OAuth 2.0

Here are some of the major differences:

  • More flows in OAuth 2.0 to permit improved support for non-browser based apps
  • OAuth 2.0 does not need the client app to implement cryptography; it relies on HTTPS to protect requests
  • OAuth 2.0 offers much less complicated signatures
  • OAuth 2.0 can issue short-lived access tokens, which limits the damage if a token is leaked
  • OAuth 2.0 has a clearer segregation of roles concerning the server responsible for handling user authorization and the server handling OAuth requests
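
The difference in client-side cryptography can be illustrated with a simplified sketch. The first function below approximates the HMAC-SHA1 request signing used by the original OAuth (1.0a); the second shows how an OAuth 2.0 client typically just attaches a bearer token over HTTPS. This is Python, standard library only; the function names, URL, and credentials are hypothetical, and nonce/timestamp generation and header formatting are omitted.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def pct(value):
    """Percent-encode everything except unreserved characters."""
    return quote(str(value), safe="-._~")

def oauth1_signature(method, url, params, consumer_secret, token_secret=""):
    """Simplified OAuth 1.0a-style HMAC-SHA1 signature of one request."""
    # Encode the parameters, then sort them, then join as key=value pairs
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), pct(url), pct(normalized)])
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def oauth2_bearer_header(access_token):
    """OAuth 2.0 style: no per-request signature, just the token."""
    return {"Authorization": "Bearer " + access_token}

# Hypothetical request parameters and secrets
params = {"oauth_consumer_key": "key", "oauth_nonce": "n1", "oauth_timestamp": "1"}
print(oauth1_signature("GET", "https://api.example.com/resource", params, "secret"))
print(oauth2_bearer_header("example-token"))
```

Every signed request in the first style has to be recomputed by the client, which is the burden the list above says OAuth 2.0 removes.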

Data visualization R packages

A number of R packages are available for visualizing text data. Depending on the available data and the objective, these libraries provide options ranging from simple clusters of words to visualizations in line with semantic analysis or topic modeling of the corpus. They provide a means to better understand text data. In this book, we'll use the following libraries:

The simple word cloud

One of the simplest and most frequently used visualizations is the simple word cloud. The basic intent of using a word cloud is to visualize the weights of the words present. The "wordcloud" R library helps the user understand the weight of a word/term with respect to the tf-idf matrix. The size and color of each word in the plot reflect its weight. Here's an example of one such simple word cloud based on the corpus created from tweets:

[Figure: The simple word cloud]
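
The weighting idea behind such a word cloud can be sketched as follows. This is a minimal Python illustration (standard library only) of one common tf-idf variant, summed per term across documents; it is not the computation performed by the wordcloud package itself, and the function name and the toy corpus are assumptions for this example.

```python
import math
from collections import Counter

def tfidf_weights(documents):
    """Return each term's tf-idf weight summed over tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count documents containing each term

    weights = Counter()
    for doc in documents:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(doc).items():
            tf = count / len(doc)                   # term frequency
            idf = math.log(n_docs / doc_freq[term]) # inverse document frequency
            weights[term] += tf * idf
    return weights

# Toy corpus of already-tokenized tweets (hypothetical)
tweets = [
    ["r", "is", "great", "for", "mining"],
    ["mining", "tweets", "with", "r"],
    ["great", "tweets", "great", "plots"],
]
for term, weight in tfidf_weights(tweets).most_common(5):
    print(f"{term}: {weight:.3f}")
```

A plotting library would then map larger weights to larger font sizes, so terms that are frequent in a few documents stand out while terms that appear everywhere shrink toward zero.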

Sentiment analysis Wordcloud

There are R packages that can generate a word cloud similar to the preceding figure, along with the sentiment each word represents. Such plots are one step ahead of the basic word cloud because they let the user understand what kinds of sentiment are present and why particular documents (collections of tweets) are of a particular nature (joy, sadness, disgust, love, and so on). Timothy Jurka developed one such package, which we are going to use. The two main functions of this package are as follows:

  • classify_emotion: As the name suggests, this procedure helps the user understand the type of sentiment that is present. It also clusters the words present in the query based on the sentiment and the level of emotion that each word carries. A voting-based classification is one of the algorithms used in this procedure. The Naive Bayes algorithm is also used for more enhanced results. The training dataset used for these algorithms is from Carlo Strapparava and Alessandro Valitutti. Here's a sample output:
    [Figure: sample output of classify_emotion]
  • classify_polarity: This procedure indicates the overall polarity of the emotions (positive or negative). It is, in a way, an extension of the preceding procedure. The training data used here comes from Janyce Wiebe's subjectivity lexicon.
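
The idea of voting-based classification against a lexicon can be sketched as follows. This is a toy Python illustration and not the implementation of the package described above: the two lexicons are tiny hypothetical stand-ins for the real training data, and the function names are invented for this example.

```python
from collections import Counter

# Tiny hypothetical lexicons, for illustration only
EMOTION_LEXICON = {
    "happy": "joy", "love": "joy", "great": "joy",
    "sad": "sadness", "cry": "sadness",
    "gross": "disgust", "awful": "disgust",
}
POLARITY_LEXICON = {
    "happy": 1, "love": 1, "great": 1,
    "sad": -1, "cry": -1, "gross": -1, "awful": -1,
}

def emotion_by_votes(tokens):
    """Each known word casts one vote for its emotion; the most votes wins."""
    votes = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]

def polarity_by_score(tokens):
    """Sum +1/-1 word scores and report the sign of the total."""
    score = sum(POLARITY_LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweet = ["i", "love", "this", "great", "but", "sad", "ending"]
print(emotion_by_votes(tweet), polarity_by_score(tweet))
```

A Naive Bayes classifier replaces the one-word-one-vote rule with probabilities estimated from the training data, which is why it tends to give more refined results.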

The most commonly used visualization tool for Facebook data is Gephi. The key difference between Facebook and Twitter is the richness of a user's profile and the social connections one shares on Facebook. Gephi helps users visualize both of these distinctions in a very pleasant way. It enables a user to understand the impact one Facebook profile has, or could have, over the network. Gephi is a highly customizable and user-friendly tool. We'll discuss this in Chapter 3, Find Friends on Facebook. As a working example, here's the graph representation of a social network of two friends.

[Figure: graph representation of a social network of two friends]
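
One simple way to quantify the impact a profile has over the network is degree centrality, sketched below in Python with the standard library. This is an assumption chosen for illustration and is independent of Gephi; the friendship list and function names are hypothetical.

```python
from collections import defaultdict

def build_graph(friendships):
    """Build an undirected friendship graph as a dict of neighbor sets."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Fraction of the other profiles each profile is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def mutual_friends(graph, a, b):
    """Profiles connected to both a and b."""
    return graph.get(a, set()) & graph.get(b, set())

# Hypothetical friendships between four profiles
friendships = [("ann", "bob"), ("ann", "cy"), ("bob", "cy"), ("ann", "dee")]
graph = build_graph(friendships)
print(degree_centrality(graph))
print(mutual_friends(graph, "ann", "bob"))
```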

Many more R packages are available to visualize most social media data. For more information, refer to the following links:
