You're reading from Python Social Media Analytics Analyze and visualize data from Twitter, YouTube, GitHub, and more

Product type Paperback

Published in Jul 2017

Publisher Packt

ISBN-13 9781787121485

Length 312 pages

Edition 1st Edition

Languages

Python

Tools

GitHub

Concepts

Data Analysis

Authors (3):

Baihaqi Siregar

Siddhartha Chatterjee

Michal Krystyanczuk

Table of Contents (10) Chapters

Preface

1. Introduction to the Latest Social Media Landscape and Importance FREE CHAPTER

2. Harnessing Social Data - Connecting, Capturing, and Cleaning

3. Uncovering Brand Activity, Popularity, and Emotions on Facebook

4. Analyzing Twitter Using Sentiment Analysis and Entity Recognition

5. Campaigns and Consumer Reaction Analytics on YouTube – Structured and Unstructured

6. The Next Great Technology – Trends Mining on GitHub

7. Scraping and Extracting Conversational Topics on Internet Forums

8. Demystifying Pinterest through Network Analysis of Users Interests

9. Social Data Analytics at Scale – Spark and Amazon Web Services

Keywords

First, we generate word clouds of the most frequent keywords in brand posts and in consumer comments across the whole dataset.
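The keyword counts behind such a word cloud can be sketched with the standard library alone. This is a minimal illustration, not the book's code: the sample `comments`, the `STOP_WORDS` set, and the `keyword_frequencies` helper are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample data; the book's dataset is not shown in this excerpt.
comments = [
    "Love the new phone, great camera",
    "Great phone but the battery is weak",
    "Camera quality is great",
]

# A tiny illustrative stop-word list (an assumption, not the book's list).
STOP_WORDS = {"the", "is", "but", "a", "an", "and"}


def keyword_frequencies(texts, stop_words=STOP_WORDS):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


freqs = keyword_frequencies(comments)
# Show the three most frequent keywords with their counts.
print(freqs.most_common(3))
```

A frequency mapping of this shape is what a word cloud renderer typically takes as input; if the third-party `wordcloud` package is installed, its `WordCloud.generate_from_frequencies` method accepts such a mapping.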

In the following screenshot, you can see the most frequent keywords in brand posts:

In the following screenshot, you can see the most frequent keywords used in comments:

The keywords are clearly polluted by many comments on political and religious issues. Because we don't want our analysis to focus on these topics, we'll create a filtering method that removes the irrelevant words.

We define a list of keywords associated with comments that we consider noise in a global variable, CLEANING_LST. The list can also be saved in a file and loaded into the variable:

CLEANING_LST = ['gulf','d','ban','persic' ...]
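One way to apply such a list, and to load it from a file, might look like the sketch below. It assumes the filter drops individual noise words from a tokenized comment and that the file holds one keyword per line; the helper names `load_cleaning_list` and `remove_noise_words` and the file name are hypothetical, and only the four list entries visible above are used because the rest of the list is elided.

```python
import os
import tempfile

# Only the four entries visible in the excerpt; the full list is elided there.
CLEANING_LST = ['gulf', 'd', 'ban', 'persic']


def load_cleaning_list(path):
    """Read one noise keyword per line, ignoring blank lines (assumed layout)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip().lower() for line in fh if line.strip()]


def remove_noise_words(tokens, noise=CLEANING_LST):
    """Return the tokens that are not in the noise list (case-insensitive)."""
    noise_set = set(noise)
    return [t for t in tokens if t.lower() not in noise_set]


# Round trip: save the list to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaning_list.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(CLEANING_LST) + "\n")
    loaded = load_cleaning_list(path)

cleaned = remove_noise_words(["Gulf", "great", "phone", "ban"], noise=loaded)
print(cleaned)
```

Keeping the list in a plain text file makes it easy to extend between runs without touching the code.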

Cleaning irrelevant words is an iterative process and you can add any other...

Authors (3)

Siddhartha Chatterjee

Siddhartha Chatterjee is an experienced data scientist with a strong focus on machine learning and big data applied to digital (e-commerce and CRM) and social media analytics. From 2007 to 2012, he worked with companies such as IBM, Cognizant Technologies, and Technicolor Research and Innovation. He completed a Pan-European Masters in Data Mining and Knowledge Management at Ecole Polytechnique of the University of Nantes and University of Eastern Piedmont, Italy. Since 2012, he has worked at OgilvyOne Worldwide, a leading global customer engagement agency in Paris, as a lead data scientist and set up the social media analytics and predictive analytics offering. From 2014 to 2016, he was a senior data scientist and head of semantic data at Publicis, France. During his time at Ogilvy and Publicis, he worked on international projects for brands such as Nestle, AXA, BNP Paribas, McDonald's, Orange, Netflix, and others. Currently, Siddhartha is serving as head of data and analytics at Groupe Aéroport des Paris.

Baihaqi Siregar

Michal Krystyanczuk

Michal Krystyanczuk is the co-founder of The Data Strategy, a start-up company based in Paris that builds artificial intelligence technologies to provide consumer insights from unstructured data. Previously, he worked as a data scientist in the financial sector, using machine learning and big data techniques for tasks such as pattern recognition in financial markets, credit scoring, and hedging strategy optimization. He specializes in social media analysis for brands using advanced natural language processing and machine learning algorithms. He has managed semantic data projects for global brands, such as Mulberry, BNP Paribas, Groupe SEB, Publicis, Chipotle, and others. He is an enthusiast of cognitive computing and information retrieval from different types of data, such as text, image, and video.
