Why NLP in a network analysis book?
Most of you probably bought this book to learn applied social network analysis using Python. So why am I explaining NLP? Here’s why: if you know your way around NLP and are comfortable extracting data from text, you have a powerful way to create network data and investigate the relationships between the things mentioned in a text. Here is an example from my favorite book, Alice’s Adventures in Wonderland by Lewis Carroll.
What can we observe from these words? What characters or places are mentioned? We can see that the Dormouse is telling a story about three sisters named Elsie, Lacie, and Tillie and that they lived at the bottom of a well. If you allow yourself to think in terms of relationships, you will see that these relationships exist:
- Dormouse -> three sisters (he either knows them or knows a story about them)
- Dormouse -> Elsie
- Dormouse -> Lacie
- Dormouse -> Tillie
- Elsie -> bottom of a well
- Lacie -> bottom of a well
- Tillie -> bottom of a well
It’s also very likely that the three sisters all know each other, so additional relationships emerge:
- Elsie -> Lacie
- Elsie -> Tillie
- Lacie -> Elsie
- Lacie -> Tillie
- Tillie -> Elsie
- Tillie -> Lacie
Our minds build these relationship maps so effectively that we don’t even realize that we are doing it. The moment I read that the three were sisters, I drew a mental image that the three knew each other.
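The relationships listed above can be written down as an edge list, which is the raw material for a network. The following is a minimal sketch using only the Python standard library; the variable names are my own, and it leaves out the group-level "three sisters" node to keep things small. A graph library could load the same list of pairs.

```python
from itertools import permutations

# Edges read off the Dormouse passage, written as (source, target) pairs.
sisters = ["Elsie", "Lacie", "Tillie"]

# The Dormouse knows (or tells a story about) each sister.
edges = [("Dormouse", sister) for sister in sisters]

# Each sister lived at the bottom of a well.
edges += [(sister, "bottom of a well") for sister in sisters]

# Assumption: the sisters all know each other, in both directions.
edges += list(permutations(sisters, 2))

# Build a simple adjacency mapping: node -> set of nodes it points to.
adjacency = {}
for source, target in edges:
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set())  # make sure every node appears

for node in sorted(adjacency):
    print(node, "->", sorted(adjacency[node]))
```

Even this tiny example shows the pattern used throughout network analysis: identify the entities, write the relationships as pairs, and let code assemble the structure.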
Let’s try another example from a current news story: Ocasio-Cortez doubles down on Manchin criticism (CNN, June 2021: https://edition.cnn.com/videos/politics/2021/06/13/alexandria-ocasio-cortez-joe-manchin-criticism-sot-sotu-vpx.cnn).
Who is mentioned, and what is their relationship? What can we learn from this short text?
- Rep. Alexandria Ocasio-Cortez is talking about Sen. Joe Manchin
- Both are Democrats
- Sen. Joe Manchin does not support a House voting rights bill
- Rep. Alexandria Ocasio-Cortez claims that Sen. Joe Manchin is being influenced by the legislation’s reforms
- Rep. Alexandria Ocasio-Cortez claims that Sen. Joe Manchin is being influenced by “dark money” political donations
- There may be a relationship between Sen. Joe Manchin and “dark money” political donors
We can see that even a small amount of text has a lot of information embedded.
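One way to capture observations like these is as (subject, relation, object) triples. The sketch below is only an illustration: the relation labels are hypothetical names I chose, the triples are typed in by hand rather than extracted by any NLP tool, and only some of the observations are encoded.

```python
from collections import Counter

# Relationships read off the bullet list, as (subject, relation, object) triples.
# The relation labels are hypothetical names chosen for this sketch.
triples = [
    ("Alexandria Ocasio-Cortez", "talks_about", "Joe Manchin"),
    ("Alexandria Ocasio-Cortez", "member_of", "Democratic Party"),
    ("Joe Manchin", "member_of", "Democratic Party"),
    ("Joe Manchin", "does_not_support", "House voting rights bill"),
    ("Joe Manchin", "possibly_linked_to", "dark money donors"),
]

# Count how many relationships each entity takes part in.
mentions = Counter()
for subject, _relation, obj in triples:
    mentions[subject] += 1
    mentions[obj] += 1

for entity, count in mentions.most_common():
    print(entity, count)
```

Counting how often each entity appears in a relationship is a crude first hint at who sits at the center of the story.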
If you get stuck trying to figure out relationships in text, consider the “W” questions (and How). I learned in college creative writing classes to use them to explain what is going on in a story:
- Who: Who is involved? Who is telling the story?
- What: What is being talked about? What is happening?
- When: When does this take place? What time of the day is it?
- Where: Where is this taking place? What location is being described?
- Why: Why is this important?
- How: How is the thing being done?
If you ask these questions, you will notice relationships between the things mentioned, which is foundational for building and analyzing networks. If you can do this, you can identify relationships in text. If you can identify relationships in text, you can use that knowledge to build social networks. If you can build social networks, you can analyze relationships, detect importance, detect weaknesses, and use this knowledge to gain a profound understanding of whatever you are analyzing. You can also use this knowledge to attack dark networks (crime, terrorism, and so on) or to protect people, places, and infrastructure. These aren’t just insights. They are actionable insights, the best kind.
That is the point of this book. Marrying NLP with social network analysis and data science is a powerful way to gain a new perspective. If you can scrape or otherwise obtain the data you need, you can gain deep knowledge of how things relate and why.
That is why this chapter aims to explain very simply what NLP is, how to use it, and what it can be used for. But before that, let’s get into the history for a bit, as that is often left out of NLP books.