Mastering Social Media Mining with R

You're reading from Mastering Social Media Mining with R: Extract valuable data from your social media sites and make better business decisions using R

Product type Paperback
Published in Sep 2015
ISBN-13 9781784396312
Length 248 pages
Edition 1st Edition

Table of Contents (8 chapters)

Preface
1. Fundamentals of Mining
2. Mining Opinions, Exploring Trends, and More with Twitter
3. Find Friends on Facebook
4. Finding Popular Photos on Instagram
5. Let's Build Software with GitHub
6. More Social Media Websites
Index

Challenges for social media mining

Social media mining is still in its infancy, and its practitioners are learning and developing new approaches. It draws on many fields, such as statistics, machine learning, information retrieval, pattern recognition, and bioinformatics, and these parent fields are not without challenges of their own. The sheer amount of data generated daily is staggering, but current techniques, built on the fundamental concepts, theories, and algorithms of these fields, allow for novel data mining solutions and scalable computational models.

In social media theory, people are considered the basic building blocks of a world built on the foundation that social media provides. Measuring the interactions between these building blocks and other entities, such as sites, networks, and content, leads to discoveries about human nature. The knowledge gained from these measurements constitutes the soul of the social worlds. Finding insights in this data, where social relationships play a critical role, can be termed the mining of social media data. This problem faces not only the basic data mining challenges but also those that arise from the social-relationship aspect. Some of the important challenges are listed here:

  • Big Data: Should we use the tastes of a friend of a friend of the person of interest, who studied at a particular college and whose hometown is a particular city, to recommend something to the person of interest? In some applications this might be overkill, and in others this information could lead to a very small but differentiating performance increase. The data available from social media can be very deep. However, using all of it can lead to a problem called overfitting, which is well known in machine learning. Using multiple sources of data can complicate matters in a similar fashion.
  • Sufficiency: Should we restrict ourselves to the person of interest's alma mater and hometown when recommending something, and not use the tastes of his/her friends? Common sense says this is not correct and that we may be missing out on something. This problem is commonly known as underfitting. It can also arise because most social media networks restrict the amount of information that can be accessed in a given time frame, so sometimes the data is not sufficient to find patterns or generate recommendations.
  • Noise removal error: Preprocessing steps are almost always required in any data mining application. These steps not only make the application run faster on the cleaned data but also improve overall accuracy. Because of the clutter present in most social data, a large amount of noise is always expected, but removing it effectively is a very tricky business. You can always end up losing some information while trying to remove the noise. Noise is by definition subjective, and useful signal can easily be mistaken for it; hence, this step can end up introducing more error into pattern recognition.
  • Evaluation dilemma: Because of the sheer size of social media data, it is often not feasible to obtain a properly annotated dataset to train a supervised machine-learning algorithm. Without proper ground truth data, there is no way to judge the accuracy of any off-the-shelf classification algorithm. Since accuracy cannot be measured without ground truth, only a clustering (unsupervised machine learning) algorithm can be applied. The problem is that such algorithms rely heavily on domain expertise.
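
The noise removal trade-off described in the list above can be made concrete with a small sketch. It is written in Python purely for illustration and is not taken from the book. The sample posts, the function names `clean_light` and `clean_aggressive`, and the regular expressions are all hypothetical assumptions. The sketch compares a light cleaning pass with an aggressive one, to show how a stricter definition of "noise" can also discard emoticons and hashtag markers that may carry sentiment.

```python
import re

# Hypothetical sample posts, invented for this sketch (not real data).
POSTS = [
    "Loving the new phone!!! :) #happy http://example.com/a",
    "@support my order never arrived :( #fail",
    "RT @friend: best concert ever <3",
]

# Assumed, deliberately simple patterns; real pipelines need more care.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
NON_ALPHA = re.compile(r"[^a-z\s]")


def clean_light(text):
    """Remove URLs and user mentions only; keep hashtags and emoticons."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def clean_aggressive(text):
    """Lowercase and keep letters only; this also strips emoticons and '#'."""
    text = clean_light(text).lower()
    text = NON_ALPHA.sub(" ", text)
    return " ".join(text.split())


# Print both versions side by side to see what each pass throws away.
for post in POSTS:
    print(repr(clean_light(post)), "->", repr(clean_aggressive(post)))
```

In the second post, the aggressive pass keeps the words but loses the ":(" emoticon, which is arguably the clearest sentiment cue in the text. Whether that counts as noise depends on the application, which is the subjectivity the bullet points out.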