Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Python Social Media Analytics
Python Social Media Analytics

Python Social Media Analytics: Analyze and visualize data from Twitter, YouTube, GitHub, and more

Arrow left icon
Profile Icon Siddhartha Chatterjee Profile Icon Baihaqi Siregar Profile Icon Michal Krystyanczuk
Arrow right icon
$54.99
Full star icon Full star icon Full star icon Full star icon Empty star icon 4 (2 Ratings)
Paperback Jul 2017 312 pages 1st Edition
eBook
$9.99 $43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m
Arrow left icon
Profile Icon Siddhartha Chatterjee Profile Icon Baihaqi Siregar Profile Icon Michal Krystyanczuk
Arrow right icon
$54.99
Full star icon Full star icon Full star icon Full star icon Empty star icon 4 (2 Ratings)
Paperback Jul 2017 312 pages 1st Edition
eBook
$9.99 $43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m
eBook
$9.99 $43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Python Social Media Analytics

Introduction to the Latest Social Media Landscape and Importance

Have you seen the movie The Social Network? If you have not, it could be a good idea to see it before you read this book. If you have, you may have seen the success story around Mark Zuckerberg and his company Facebook. This was possible due to power of the platform in connecting, enabling, sharing, and impacting the lives of almost two billion people on this planet.

The earliest social networks existed as far back as 1995; such as Yahoo (Geocities), theglobe.com, and tripod.com. These platforms were mainly to facilitate interaction among people through chat rooms. It was only at the end of the 90s that user profiles became the in thing in social networking platforms, allowing information about people to be discoverable, and therefore, providing a choice to make friends or not. Those embracing this new methodology were Makeoutclub, Friendster, SixDegrees.com, and so on.

MySpace, LinkedIn, and Orkut were thereafter created, and the social networks were on the verge of becoming mainstream. However, the biggest impact happened with the creation of Facebook in 2004; a total game changer for people's lives, business, and the world. The sophistication and the ease of using the platform made it into mainstream media for individuals and companies to advertise and sell their ideas and products. Hence, we are in the age of social media that has changed the way the world functions.

Since the last few years, there have been new entrants in the social media, which are essentially of different interaction models as compared to Facebook, LinkedIn, or Twitter. These are Pinterest, Instagram, Tinder, and others. Interesting example is Pinterest, which unlike Facebook, is not centered around people but is centered around interests and/or topics. It's essentially able to structure people based on their interest around these topics. CEO of Pinterest describes it as a catalog of ideas. Forums which are not considered as regular social networks, such as Facebook, Twitter, and others, are also very important social platforms. Unlike in Twitter or Facebook, forum users are often anonymous in nature, which enables them to make in-depth conversations with communities. Other non-typical social networks are video sharing platforms, such as YouTube and Dailymotion. They are non-typical because they are centered around the user-generated content, and the social nature is generated by the sharing of these content on various social networks and also the discussion it generates around the user commentaries. Social media is gradually changing from being platform centric to focusing more on experiences and features. In the future, we'll see more and more traditional content providers and services becoming social in nature through sharing and conversations. The term social media today includes not just social networks but every service that's social in nature with a wide audience.

To understand the importance of social media, it's interesting to look at the statistics of these platforms. It's estimated that out of around 3.4 billion internet users, 2.3 billion of them are active social media users. This is a staggering number, reinforcing the enormous importance of social media. In terms of users of individual social media platforms, Facebook leads the way with almost 1.6 billion active users. You must have heard the adage that if Facebook were a country, it would be second largest one after China and ahead of India. Other social platforms linked to Facebook are also benefiting from this user base, such as WhatsApp, hosting 1 billion users on its chat application, and Instagram, with 400 million on its image sharing social network.

Among other platforms, Tumblr and Twitter lead the way with 550 million and 320 million active users respectively. LinkedIn, the world's most popular professional social media has 100 million active users. Pinterest, which is a subject of a later chapter, also has 100 million active users. Seina and Weibo, the equivalents of Facebook and Twitter in China, alone host 222 million active users. In terms of growth and engagement, Facebook is still the fastest growing social media, way ahead of the rest. If we look at engagement, millennials (age group 18-34) spend close to 100 minutes on average per person per month on Facebook. The number is way lower for others. Among user-generated content and sharing platforms, YouTube is a leader with 300 hours of video uploaded every minute and 3.25 billion hours of video watched every month.

In this chapter, we will cover the following topics:

  • Social graph
  • Introduction to the latest social media landscape and importance
  • What does social data mean in the modern world?
  • Tools and their specificities to mine the social web (Python, APIs, and machine learning)

Introducing social graph

A social graph is created through this widespread interaction and exchange of information on social media. A social graph is a massive network that illustrates the relations between individuals and information on the internet. Facebook owns the largest social graph of relations of 1.5 billion users. Every social media has its own social graph. The nature of social graph and its utility can be of various types, based on the types of relations described as follows. We will show a concrete example of a piece of the social graph and how to analyze it.

  • User graph: This is a network that shows the relationships between users or individuals connected to each other.
  • Content graph: As there are billions of content being uploaded on social media, there is a relationship existing between different types of content (text, images, videos, or multimedia). These relations could be based on semantic sense around those content, or in-bond or out-bond links between them, like that of the Google's page rank.
  • Interest graph: The interest graph takes the original graphs a step further, where individuals on the social media or the internet are not related based on their mere links, like being added as a friend or followed on Twitter, but on their mutual interests. This has a huge advantage over standard social graph, in the sense that it leads to finding communities of people with similar interests. Even if these people have no interaction or know each other personally, there is an inherent link based on their interests and passions.

Notion of influence

This massive growth and interaction on the social web is leading the way to understand these individuals. Like in a society there are influencers, the same phenomenon is getting replicated on the social web. There are people who have more influence over other users. The process of finding influencers and calculating influence is becoming an important science. If you have used a service called Klout, you'll know what we are talking about. Klout gives a 'social influence score' based on your social media activities. There are questions about the relevance of such scores, but that's only because the influence of a person is a very relative topic. In fact, in our view, no one is an influencer while everyone's an influencer. This can sound very confusing but what we are trying to say is that influence is relative. Someone who is an influencer to you may not be an influencer to another. If you need admission of your child to a school, the principal of the school is an influencer, but if you are seeking admission to a university, the same principal is not an influencer to you. This confusion makes the topic super exciting; trying to understand human dynamics and then figuring out who influences whom and how. Merely having thousands of followers on Twitter doesn't make one an influencer but the influencer of his or her followers and the way they are influenced to take action, sure does. Our book will not get into detailed aspects of influence but it's important to keep in mind this notion while trying to understand social media analytics.

Social impacts

Social media is already having a profound influence on both society and business. The societal impact has been both psychological and behavioral. Various events, crises, and issues in the world have received a boost because of the use of social media by millions of people. Stating a few examples would be that of the Arab Spring and the refugee crisis. In environmental crisis, such as earthquakes, social media like Twitter has proved in accelerating information and action because of its immediacy of dissemination and spread.

Platforms on platform

Social media companies like Facebook started presenting their technology as a platform, where programmers could build further tools to give rise to more social experiences, such as games, contests, and quizzes, which in turn gave rise to social interactions and experiences beyond mere conversational interaction. Today, there is a range of tools that allows one to build over the platforms. Another application of this is to gather intelligence through the data collected from these platforms. Twitter shares a lot of its data around the usage of its platform with programmers and companies. Similarly, most of the popular social networks have started sharing their data with developers and data warehousing companies. Sharing their data serves revenue growth, and is also a very interesting source for researchers and marketers to learn about people and the world.

Delving into social data

The data acquired from social media is called social data. Social data exists in many forms.

The types of social media data can be information around the users of social networks, like name, city, interests, and so on. These types of data that are numeric or quantifiable are known as structured data.

However, since social media are platforms for expression, a lot of the data is in the form of texts, images, videos, and such. These sources are rich in information, but not as direct to analyze as structured data described earlier. These types of data are known as unstructured data.

The process of applying rigorous methods to make sense of the social data is called social data analytics. In the book, we will go into great depth in social data analytics to demonstrate how we can extract valuable sense and information from these really interesting sources of social data. Since there are almost no restrictions on social media, there are lot of meaningless accounts, content, and interactions. So, the data coming out of these streams is quite noisy and polluted. Hence, a lot of effort is required to separate the information from the noise. Once the data is cleaned and we are focused on the most important and interesting aspects, we then require various statistical and algorithmic methods to make sense out of the filtered data and draw meaningful conclusions.

Understanding semantics

A concept important to understand when handling unstructured data is semantics. Dictionaries define the term as the branch of linguistics and logic concerned with meaning.

It is a concept that comes from linguistic science and philosophy, to deal with the study and research of meaning. These meanings are uncovered by understanding the relationship between words, phrases, and all types of symbols. From a social media point of view, symbol could be the popular emoticons, which are not exactly formal language but they signify emotions. These symbols can be extended to images and videos, where patterns in their content can be used to extract meanings. In the later chapters, we will show few techniques that can help you to get meaning out of textual data. Extracting meaning or sense from images and videos is out of scope for the book. Semantic technology is very central to effectively analyzing unstructured social data.

For effectively extracting sense out of social data, semantic technologies have underlying artificial intelligence or machine learning algorithms. These algorithms allow you to find patterns in the data, which are then humanly interpreted. That's why social data analytics is so exciting, as it brings together knowledge from the fields of semantics and machine learning, and then binds it with sociology for business or other objectives.

Defining the semantic web

The growth of the internet has given rise to platforms like websites, portals, search engines, social media, and so on. All of these have created a massive collection of content and documents. Google and other search engines have helped to organize these documents and make them accessible to everyday users. So, today we are able to search our questions and select websites or pages that are linked to the answer. Even social media content is more and more accessible via search engines. You may find a tweet that you created two years back suddenly showing on a Google result. The problem of organization of web content is almost a solved problem. However, wouldn't it be more exciting if you asked a question on Google, Bing, or another search engine and it directly gave you the answer, just like a friend with the required knowledge? This is exactly what the future web would look like and would do. Already, for example, if you put the query on Google about what's the temperature in Paris? or who is the wife of Barack Obama?, it gives you the right answer. The ability of Google to do this is inherently semantic technology with natural language processing and machine learning. Algorithms that Google has behind its search engine creates links between queries and content by understanding relation between words, phrases, and actual answers.

However, today only a fixed number of questions can be answered, as there is big risk of inferring wrong answers on multiple questions. The future of the internet will be an extension to the World Wide Web, which is the semantic web. The term was coined by the creator of the World Wide Web, Tim Berners-Lee. The semantic web is a complex concept on a simple idea of connecting entities (URLs, pages, and content) on the web through relations, but the underlying implementation is difficult at scale, due to the sheer volume of entities present on the internet. New markup languages called Resource Description Framework (RDF) and Web Ontology Language (OWL) will be used to create these links between pages and content, based on relations. These new languages will allow creators of content to add meaning to their documents, which machines could process for reasoning or inference purposes, allowing automating many tasks on the web. Our book is not about explaining the underlying concepts of the semantic web, but just for your knowledge about where the web is heading and to appreciate the needs to mine the web more intelligently, as you'll learn in the later chapters.

Exploring social data applications

Now that you know where the future of the web is heading, let's shift our focus back to our discussion on the purpose of analyzing the social web data. We have discussed about the nature of social media and the social data, structured and unstructured, but you must be curious as to how this is used in the real world. In our view, restricting the application of social data analytics to certain fields or sectors is not entirely fair. Social data analytics leads you to the learning or discovery of facts or knowledge. If acquiring knowledge can't be restricted to a few fields, neither can be social media analytics. However, there are some fields that are prospering more from this science, such as marketing, advertising, and research communities. Social media data is being integrated more and more in existing digital services to provide a much more personalized experience through recommendations. You must have seen that most online services allow you to register using your social profiles along with added information. When you do so, the service is able to mine your social data and recommend products or catalogs aligned with your interests. Entertainment services like Spotify and Netflix, or e-commerce ones like Amazon, eBay, and others, are able to propose personalized recommendations based on social data analytics and other data sources. More traditional companies selling consumer products derive value from social data in their marketing of products and brands. People use social networks as a means to both connect with companies and to express about their products and services. Hence, there is a huge amount of data on the social web that contains customer preferences and complaints about companies. This is an example of unstructured-social data, since it's mostly textual or images in format. Companies are analyzing this data to understand how consumers feel and use their services or campaigns, and then are using this intelligence to integrate it in their marketing and communications.

A similar approach has been applied in political campaigns to understand the opinion of people on various political issues. Analysts and data scientists have gone as far as trying to predict election results using sentiments of people about the concerned politicians. There are certainly many data scientists using social media data to predict the results of Clinton and Trump elections. There have been attempts to predict the stock market using social data but this has not been very successful, as financial data is highly sensitive and volatile and so is social data, and combining the two is still a big challenge.

In the later chapters, you'll see how we can analyze the Facebook page of a brand to understand their relation with their consumers. In Chapter 7, Scraping and Extracting Conversational Topics on Internet Forums about analyzing forums, you'll see how we are able to understand deeper conversations regarding certain subjects. Building recommendation engines is beyond the scope of the book, but you'll know enough about social data in order to integrate it for your recommender system projects.

Now that you know enough about social media, social data, and their applications, we will dive into the methods to get on top of social data. Among the many techniques used to analyze social data, machine learning is one of the most effective ones.

Understanding the process

Once you are familiar with the topic of social media data, let us proceed to the next phase. The first step is to understand the process involved in exploitation of data present on social networks. A proper execution of the process, with attention to small details, is the key to good results. In many computer science domains, a small error in code will lead to a visible or at least correctable dysfunction, but in data science, it will produce entirely wrong results, which in turn will lead to incorrect conclusions.

The very first step of data analysis is always problem definition. Understanding the problem is crucial for choosing the right data sources and the methods of analysis. It also helps to realize what kind of information and conclusions we can infer from the data and what is impossible to derive. This part is very often underestimated while it is key to successful data analysis.

Any question that we try to answer in a data science project has to be very precise. Some people tend to ask very generic questions, such as I want to find trends on Twitter. This is not a correct problem definition and an analysis based on such statement can fail in finding relevant trends. By a naive analysis, we can get repeating Twitter ads and content generated by bots. Moreover, it raises more questions than it answers. In order to approach the problem correctly, we have to ask in the first step: what is a trend? what is an interesting trend for us? and what is the time scope? Once we answer these questions, we can break up the problem in multiple sub problems: I'm looking for the most frequent consumer reactions about my brand on Twitter in English over the last week and I want to know if they were positive or negative. Such a problem definition will lead to a relevant, valuable analysis with insightful conclusions.

The next part of the process consists of getting the right data according to the defined problem. Many social media platforms allow users to collect a lot of information in an automatized way via APIs (Application Programming Interfaces), which is the easiest way to complete the task. However, other platforms, such as forums or blogs, usually require a customized programming approach (scraping), which will be explained in later chapters.

Once the data is stored in a database, we perform the cleaning. This step requires a precise understanding of the project's goals. In many cases, it will involve very basic tasks such as duplicates removal, for example, retweets on Twitter, or more sophisticated such as spam detection to remove irrelevant comments, language detection to perform linguistic analysis, or other statistical or machine learning approaches that can help to produce a clean dataset.

When the data is ready to be analyzed, we have to choose what kind of analysis and structure the data accordingly. If our goal is to understand the sense of the conversations, then it only requires a simple list of verbatims (textual data), but if we aim to perform analysis on different variables, like number of likes, dates, number of shares, and so on, the data should be combined in a structure such as data frame, where each row corresponds to an observation and each column to a variable.

The choice of the analysis method depends on the objectives of the study and the type of data. It may require statistical or machine learning approach, or a specific approach to time series. Different approaches will be explained on the examples of Facebook, Twitter, YouTube, GitHub, Pinterest, and Forum data, subsequently in the book.

Once the analysis is done, it's time to infer conclusions. We can derive conclusions based on the outputs from the models, but one of the most useful tools is visualization technique. Data and output can be presented in many different ways, starting from charts, plots, and diagrams through more complex 2D charts, to multidimensional visualizations. These techniques are shown in example chapters as well as the reasoning process to infer insightful conclusions.

Once the process is clear enough, we can start setting up the programming environment.

Working environment

The choice of the right tools is decisive for the smooth execution of the process. There are some important parts of the environment which facilitate data manipulation and algorithm implementation, but above all, they make the data science reproducible. We have selected the main components, which are widely used by data scientists all over the world, related to programming language, integrated development environment, and version control system.

Defining Python

Python is one of the most common programming languages among data scientists, along with R. The main advantage of Python is its flexibility and simplicity. It makes the data analysis and manipulation easy by offering a lot of packages. It shows great performance in analyzing unstructured textual data and has a very good ecosystem of tools and packages for this purpose.

For the purposes of the book, we have chosen Python 3.5.2. It is the most up-to-date version, which implements many improvements compared to Python 2.7. The main advantage in text analysis is an automatic management of Unicode variables. Python 2.7 is still widely used by programmers and data scientists due to a big choice of external libraries, documentation, and online resources. However, the new version has already reached a sufficient level of compatibility with packages, and on top of it, offers multiple new features.

We will use the pip command tool for installation of all libraries and dependencies.

Selecting an IDE

The choice of IDE (Integrated Development Environment) is mostly a matter of personal preferences. The most common choices are PyCharm, Sublime Text, or Atom. PyCharm is a very complete development environment while Sublime Text is a lightweight text editor which allows us to install additional plugins. Its main advantage is the fact that the user can choose the tools that he needs for his development. Atom is the newest software of all three, similar to Sublime Text. It was developed by GitHub and integrates by default the git version control system. In the book, we will use Sublime Text as the simplest and the easiest solution to start data analysis and development.

Illustrating Git

A version control system is one of the main elements of programming process. It helps to manage different versions of the code over time and reverse the changes in case of errors or wrong approach. The most widespread and easy-to-use version control system is Git and we will use it in our examples.

Getting the data

Data harvesting is the entry point for any social media analysis project. There are two main ways to collect data for an analysis: by connecting to APIs or by crawling and scraping the social networks. It is crucial to understand how the data was collected in order to be aware of the bias that might introduced. Different harvesting techniques will require customized approaches to data preprocessing and analysis workflow, which we will explain in further chapters.

Defining API

A widely used term, API (Application Programming Interface) is defined as a set of instructions and standards to access a web based software application. But what does it mean in real life? Firstly, APIs allow users to send a request for a particular resource, such as Facebook or Twitter , and receive some data in response. It is worth noting that all API providers fix some limitations on the quantity or type of data which users can obtain. APIs give access data processing resources, such as AlchemyAPI that receives in a request verbatim (textual data) and sends in response all results of the analysis, such as nouns, verbs, entities, and so on. In our case, the APIs are used either to get data from social networks or to execute some complex processing on them.

In order to access and manipulate APIs, we have to install the urllib2 library:

pip3 install urllib2

In some cases, if you fail to perform the installation. You can also try using the request library, which is compatible with Python 2.x and 3.x.

pip3 install request

Scraping and crawling

Scraping (or web scraping) is a technique to extract information from websites. When we do not have access to APIs, we can only retrieve visible information from HTML generated on a web page. In order to perform the task, we need a scraper that is able to extract information that we need and structure it in a predefined format. The next step is to build a crawler—a tool to follow links on a website and extract the information from all sub pages. When we decide to build a scraping strategy, we have to take into consideration the terms and conditions, as some websites do not allow scraping.

Python offers very useful tools to create scrapers and crawlers, such as beautifulsoup and scrapy.

pip3 install bs4, scrapy 

Analyzing the data

In this section, we briefly introduce the main techniques which lie behind the social media analysis process and bring intelligence to the data. We also present how to deal with reasonably big amount of data using our development environment. However, it is worth noting that the problem of scaling and dealing with massive data will be analyzed in Chapter 9, Social Data Analytics at Scale - Spark and Amazon Web Services.

Brief introduction to machine learning

The recent growth in the volume of data created by mobile devices and social networks has dramatically impacted the need for high performance computation and new methods of analysis. Historically, large quantities of data (big data) were analyzed by statistical approaches which were based on sampling and inductive reasoning to derive knowledge from data. A more recent development of artificial intelligence, and more specifically, machine learning, enabled not only the ability to deal with large volume of data, but it brought a tremendous value to businesses and consumers by extracting valuable insights and hidden patterns.

Machine learning is not new. In 1959, Arthur Samuel defined machine learning as:

Field of study that gives computers ability to learn without being specifically programmed for it.

Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that allow to This approach is similar to a person who increases his knowledge on a subject by reading more and more books on the subject. There are three main approaches in machine learning: supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning assumes that we know what the outputs are of each data point. For example, we learn that a car that costs $80,000, which has an electric engine and acceleration of 0-100 km/h in 3 seconds, is called Tesla; another car, which costs $40,000, has a diesel engine, and acceleration of 0-100 km/h in 9.2 seconds, is called Toyota; and so on. Then, when we look for the name of a car which costs $35,000, has acceleration of 0-100 km/h in 9.8 seconds, and has a diesel engine, it is most probably Toyota and not Tesla.

Unsupervised learning is used when we do not know the outputs. In the case of cars, we only have technical specifications: acceleration, price, engine type. Then we cluster the data points into different groups (clusters) of similar cars. In our case, we will have the clusters with similar price and engine types. Then, we understand similarities and differences between the cars.

The third type of machine learning is reinforcement learning, which is used more in artificial intelligence applications. It consists of devising an algorithm that learns how to behave based on a system of rewards. This kind of learning is similar to the natural human learning process. It can be used in teaching an algorithm how to play chess. In the first step, we define the environment—the chess board and all possible moves. Then the algorithm starts by making random moves and earns positive or negative rewards. When a reward is positive, it means that the move was successful, and when it is negative, it means that it has to avoid such moves in the future. After thousands of games, it finishes by knowing all the best sequences of moves.

In real-life applications, many hybrid approaches are widely used, based on available data and the complexity of problems.

Techniques for social media analysis

Machine learning is a basic tool to add intelligence and extract valuable insights from social media data. There exist other widespread concepts that are used for social media analysis: Text Analytics, Natural Language Processing, and Graph Mining.

The first notion allows to retrieve non trivial information from textual data, such as brands or people names, relationships between words, extraction of phone numbers, URLs, hashtags, and so on. Natural Language Processing is more extensive and aims at finding the meaning of the text by analyzing text structure, semantics, and concepts among others.

Social networks can also be represented by graph structures. The last mining technique enables the structural analysis of such networks. These methods help in discovering relationships, paths, connections and clusters of people, brands, topics, and so on, in social networks.

The applications of all the techniques will be presented in following chapters.

Setting up data structure libraries

In our analysis, we will use some libraries that enable flexible data structures, such as pandas and sframe. The advantage of sframe over pandas is that it helps to deal with very big datasets which do not fit RAM memory. We will also use a pymongo library to pull collected data from MongoDB, as shown in the following code:

pip3 install pandas, sframe, pymongo 

All necessary machine learning libraries will be presented in corresponding chapters.

Visualizing the data

Visualization is one of the most important parts of data science process. It helps in the initial steps of analysis, for example, to choose the right method or algorithm according to the structure of the data, but essentially to present in a simple way the obtained results. Python ecosystem proposes many interesting libraries, such us plotly, matplotlib, and seaborn, among others, as follows. In the following chapters, we will focus on three of them.

pip3 install plotly, matplotlib, seaborn 

Getting started with the toolset

Once you set up the whole environment, you can create your first project. If you use Linux or macOS machine, you can open a terminal and go to your working directory. Then, use the following command to create your project directory:

mkdir myproject 

On Windows machines, you can create the directory in the usual way without terminal.

At the same time, we initialize an empty repository in Git (in terminal on Linux or macOS, or in Git bash on Windows):

cd myproject 
git init

Then, you can open your directory in Sublime Text 2 using its GUI (Graphical User Interface) and create your first Python file. Now it's time to start working on a real project.

Summary

The avalanche of social network data is a result of communication platforms being developed for the last two decades. These are the platforms that evolved from chat rooms to personal information sharing and finally, social and professional networks. Among many, Facebook, Twitter, Instagram, Pinterest, and LinkedIn have emerged as the modern day social media. These platforms collectively have reach of more than a billion individuals across the world, sharing their activities and interaction with each other. Sharing of their data by these media through APIs and other technologies has given rise to a new field called social media analytics. This has multiple applications, such as in marketing, personalized recommendations, research, and societal. modern data science techniques such as machine learning and text mining are widely used for these applications. Python is one of the most programming languages used for these techniques. However, manipulating the unstructured-data from social networks requires a lot of precise processing and preparation before coming to the most interesting bits.

In the next chapter, we will see the way this data from social networks can be harnessed, processed, and prepared to make a sandbox for interesting analysis and applications in the subsequent chapters.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Acquire data from various social media platforms such as Facebook, Twitter, YouTube, GitHub, and more
  • Analyze and extract actionable insights from your social data using various Python tools
  • A highly practical guide to conducting efficient social media analytics at scale

Description

Social Media platforms such as Facebook, Twitter, Forums, Pinterest, and YouTube have become part of everyday life in a big way. However, these complex and noisy data streams pose a potent challenge to everyone when it comes to harnessing them properly and benefiting from them. This book will introduce you to the concept of social media analytics, and how you can leverage its capabilities to empower your business. Right from acquiring data from various social networking sources such as Twitter, Facebook, YouTube, Pinterest, and social forums, you will see how to clean data and make it ready for analytical operations using various Python APIs. This book explains how to structure the clean data obtained and store in MongoDB using PyMongo. You will also perform web scraping and visualize data using Scrappy and Beautifulsoup. Finally, you will be introduced to different techniques to perform analytics at scale for your social data on the cloud, using Python and Spark. By the end of this book, you will be able to utilize the power of Python to gain valuable insights from social media data and use them to enhance your business processes.

Who is this book for?

If you are a programmer or a data analyst familiar with the Python programming language and want to perform analyses of your social data to acquire valuable business insights, this book is for you. The book does not assume any prior knowledge of any data analysis tool or process.

What you will learn

  • Understand the basics of social media mining
  • Use PyMongo to clean, store, and access data in MongoDB
  • Understand user reactions and emotion detection on Facebook
  • Perform Twitter sentiment analysis and entity recognition using Python
  • Analyze video and campaign performance on YouTube
  • Mine popular trends on GitHub and predict the next big technology
  • Extract conversational topics on public internet forums
  • Analyze user interests on Pinterest
  • Perform large-scale social media analytics on the cloud
Estimated delivery fee Deliver to Russia

Economy delivery 10 - 13 business days

$6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jul 28, 2017
Length: 312 pages
Edition : 1st
Language : English
ISBN-13 : 9781787121485
Category :
Languages :
Concepts :
Tools :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to Russia

Economy delivery 10 - 13 business days

$6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Publication date : Jul 28, 2017
Length: 312 pages
Edition : 1st
Language : English
ISBN-13 : 9781787121485
Category :
Languages :
Concepts :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 142.97
Learning Data Mining with Python
$48.99
Python Web Scraping
$38.99
Python Social Media Analytics
$54.99
Total $ 142.97 Stars icon
Banner background image

Table of Contents

9 Chapters
Introduction to the Latest Social Media Landscape and Importance Chevron down icon Chevron up icon
Harnessing Social Data - Connecting, Capturing, and Cleaning Chevron down icon Chevron up icon
Uncovering Brand Activity, Popularity, and Emotions on Facebook Chevron down icon Chevron up icon
Analyzing Twitter Using Sentiment Analysis and Entity Recognition Chevron down icon Chevron up icon
Campaigns and Consumer Reaction Analytics on YouTube – Structured and Unstructured Chevron down icon Chevron up icon
The Next Great Technology – Trends Mining on GitHub Chevron down icon Chevron up icon
Scraping and Extracting Conversational Topics on Internet Forums Chevron down icon Chevron up icon
Demystifying Pinterest through Network Analysis of Users Interests Chevron down icon Chevron up icon
Social Data Analytics at Scale – Spark and Amazon Web Services Chevron down icon Chevron up icon

Customer reviews

Rating distribution
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
(2 Ratings)
5 star 50%
4 star 0%
3 star 50%
2 star 0%
1 star 0%
Neha Aug 14, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
A really good book for a fresher who wants to study social meadia analytics.
Amazon Verified review Amazon
casatishum Dec 12, 2018
Full star icon Full star icon Full star icon Empty star icon Empty star icon 3
some codes no longer works since social media tend to change fast. on top of that, there're also errors in code, which makes it confusing and frustrating to follow. not recommending.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela