Python Social Media Analytics

You're reading from Python Social Media Analytics: Analyze and visualize data from Twitter, YouTube, GitHub, and more

Product type: Paperback
Published in: Jul 2017
Publisher: Packt
ISBN-13: 9781787121485
Length: 312 pages
Edition: 1st Edition
Languages: Python
Tools: GitHub
Concepts: Data Analysis

Authors (3):

Baihaqi Siregar

Siddhartha Chatterjee

Michal Krystyanczuk

Table of Contents (10 chapters)

Preface

1. Introduction to the Latest Social Media Landscape and Importance

2. Harnessing Social Data - Connecting, Capturing, and Cleaning

3. Uncovering Brand Activity, Popularity, and Emotions on Facebook

4. Analyzing Twitter Using Sentiment Analysis and Entity Recognition

5. Campaigns and Consumer Reaction Analytics on YouTube – Structured and Unstructured

6. The Next Great Technology – Trends Mining on GitHub

7. Scraping and Extracting Conversational Topics on Internet Forums

8. Demystifying Pinterest through Network Analysis of Users Interests

9. Social Data Analytics at Scale – Spark and Amazon Web Services

Data processing

In the previous step, we structured the raw data, which is now ready for further analysis. Our objective is to analyze two types of data:

Textual data in the description variable
Numerical data in the other variables

Each of them requires a different pre-processing technique. Let's take a look at each type in detail.

Textual data

For the first kind, we have to create a new variable that contains a cleaned string. We will do this in three steps, all of which were presented in previous chapters:

Selecting English descriptions
Tokenization
Stopwords removal
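
As a rough illustration of the last two steps, tokenization and stopword removal can be sketched with the standard library alone. This is a minimal sketch and not the chapter's own code: the function names, the regular expression, and the tiny STOPWORDS set are assumptions made for the example, and a real analysis would use a much fuller stopword list.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into runs of the letters a-z.

    Digits and punctuation act as separators, so "python3" becomes "python".
    """
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def clean_description(text):
    """Build the cleaned string from the tokens that survive filtering."""
    return " ".join(remove_stopwords(tokenize(text)))


print(clean_description("A fast library for the analysis of social media data"))
# prints: fast library analysis social media data
```

The cleaned string is what would be stored in the new variable; keeping the token list instead is equally possible if later steps count words directly.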

Because we work only with English data, we should remove all descriptions written in other languages. The main reason is that each language requires its own processing and analysis flow. If we kept descriptions in Russian or Chinese, the data would be very noisy and we would not be able to interpret it. As a consequence, our analysis covers trends in the English-speaking world only.
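
To make the filtering idea concrete, here is a crude stand-in for language selection. It is an assumption of this sketch, not the method used in the chapter: it only checks which script the letters belong to, so it separates Russian or Chinese text from English but would let French or Spanish through. A dedicated language-detection library would be the usual choice in practice. The function name and the 0.9 threshold are hypothetical.

```python
def looks_english(text, threshold=0.9):
    """Script-based heuristic: share of alphabetic characters that are ASCII.

    This does not detect English as such; it only rejects text written
    mostly in non-Latin scripts (an assumption made for this sketch).
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all, so there is nothing to analyze.
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= threshold


# Made-up sample descriptions for illustration only.
descriptions = [
    "A toolkit for deep learning",
    "Библиотека для анализа данных",
    "机器学习工具包",
]

english_only = [d for d in descriptions if looks_english(d)]
print(english_only)
# prints: ['A toolkit for deep learning']
```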

Firstly, we remove all the empty strings...
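
Dropping empty descriptions might look like the following sketch. The record layout and the sample values are made up for illustration, and treating missing values and whitespace-only strings as empty is an assumption of this example rather than something the text states.

```python
# Hypothetical records; only the "description" field name comes from the text.
repos = [
    {"name": "repo-a", "description": "A toolkit for deep learning"},
    {"name": "repo-b", "description": ""},
    {"name": "repo-c", "description": None},
    {"name": "repo-d", "description": "   "},
]


def has_description(repo):
    """True when the description is a string with non-whitespace content."""
    text = repo.get("description")
    return isinstance(text, str) and text.strip() != ""


non_empty = [repo for repo in repos if has_description(repo)]
print([repo["name"] for repo in non_empty])
# prints: ['repo-a']
```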

Authors (3)

Baihaqi Siregar

Siddhartha Chatterjee

Siddhartha Chatterjee is an experienced data scientist with a strong focus in the area of machine learning and big data applied to digital (e-commerce and CRM) and social media analytics. From 2007 to 2012, he worked with companies such as IBM, Cognizant Technologies, and Technicolor Research and Innovation. He completed a Pan-European Masters in Data Mining and Knowledge Management at Ecole Polytechnique of the University of Nantes and University of Eastern Piedmont, Italy. Since 2012, he has worked at OgilvyOne Worldwide, a leading global customer engagement agency in Paris, as a lead data scientist and set up the social media analytics and predictive analytics offering. From 2014 to 2016, he was a senior data scientist and head of semantic data at Publicis, France. During his time at Ogilvy and Publicis, he worked on international projects for brands such as Nestle, AXA, BNP Paribas, McDonald's, Orange, Netflix, and others. Currently, Siddhartha is serving as head of data and analytics at Groupe Aéroport des Paris.

Michal Krystyanczuk

Michal Krystyanczuk is the co-founder of The Data Strategy, a start-up company based in Paris that builds artificial intelligence technologies to provide consumer insights from unstructured data. Previously, he worked as a data scientist in the financial sector using machine learning and big data techniques for tasks such as pattern recognition on financial markets, credit scoring, and hedging strategies optimization. He specializes in social media analysis for brands using advanced natural language processing and machine learning algorithms. He has managed semantic data projects for global brands, such as Mulberry, BNP Paribas, Groupe SEB, Publicis, Chipotle, and others. He is an enthusiast of cognitive computing and information retrieval from different types of data, such as text, image, and video.
