Every day, we produce a huge amount of text data, in both structured and unstructured plain-text form, through media such as Facebook, Twitter, blog posts, and scientific research articles. In financial markets, public sentiment plays a vital role, and you can mine that sentiment by analyzing text data obtained from various sources. In this chapter, you will learn recipes for working with unstructured text data. The chapter covers the following recipes:
- Extracting unstructured text data from a plain web page
- Extracting text data from an HTML page
- Extracting text data from an HTML page using the XML library
- Extracting text data from PubMed
- Importing unstructured text data from a plain text file
- Importing plain text data from a PDF file
- Pre-processing text data for topic modeling and sentiment analysis
- Creating a word cloud to explore...