Parsing your data
In this section, we will talk about what parsing is and some common ways it is used. Sometimes you will receive data in a format that is not readily usable. Whether you are pulling data from a website, working with JSON files, or handling big chunks of text, you will often need to parse your data before you can analyze it. Which parser you use depends on what you need to parse (HTML, JSON, plain text, and so on), but the general idea is always the same: you break a single large piece of data into several smaller pieces that can be easily identified and processed.
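As a minimal sketch of that idea, the Python snippet below parses a single JSON string into a dictionary whose individual fields can then be accessed one by one. The record itself is made up for illustration; it stands in for data you might read from a file or receive from a web API.

```python
import json

# One large piece of data: a single JSON string (hypothetical example record).
raw = '{"name": "Ada", "city": "London", "languages": ["English", "French"]}'

# json.loads parses the string into a Python dict,
# breaking it into smaller pieces we can identify and process.
record = json.loads(raw)

print(record["name"])          # the "name" field on its own
print(record["languages"][0])  # first item of the "languages" list

# Each key/value pair is now a separate, usable piece of data.
for key, value in record.items():
    print(key, "->", value)
```

The same pattern applies to other formats: only the parser changes (for example, an HTML parser for web pages), while the goal of turning one blob into labeled pieces stays the same.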
Natural Language Processing (NLP) is a field of data analytics that specializes in analyzing, you guessed it, language. Whether the language is spoken or written, NLP tries to translate everyday speech into actionable data. Parsing is necessary for even basic NLP.
Important note
In the context of NLP, this kind of parsing is called tokenization because it breaks the text up into words, and each word becomes its own object, or token.
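To make tokenization concrete, here is a deliberately simple sketch using only Python's built-in re module. The tokenize function is a hypothetical helper written for this illustration: it lowercases the text and treats runs of letters, digits, and apostrophes as words, dropping other punctuation. It is not how any particular NLP library tokenizes text.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (simplified, regex-based sketch).

    Runs of letters, digits, and apostrophes count as one token;
    everything else (spaces, commas, question marks) is discarded.
    """
    return re.findall(r"[a-z0-9']+", text.lower())

sentence = "Parsing is necessary for even basic NLP, isn't it?"
tokens = tokenize(sentence)
print(tokens)
# ['parsing', 'is', 'necessary', 'for', 'even', 'basic', 'nlp', "isn't", 'it']
```

Each string in the resulting list is one token that can be counted, filtered, or tagged on its own. Dedicated libraries such as NLTK (`nltk.word_tokenize`) and spaCy ship more careful tokenizers that handle contractions, hyphens, and punctuation tokens that this sketch ignores.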
Let’s consider an example...