Using regular expressions
In this recipe, we will use regular expressions to find email addresses and URLs in text. Regular expressions are special character sequences that define search patterns and can be created and used via the Python re package. We will use a job descriptions dataset and write two regular expressions, one for emails and one for URLs.
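As a rough illustration of the idea, the following minimal sketch compiles two patterns with the re package and applies them to a made-up sentence. The pattern definitions, the variable names, and the sample text are assumptions chosen for this example. They are deliberately simple and are not necessarily the expressions the recipe uses.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real-world
# email and URL matching usually needs more careful expressions.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_PATTERN = re.compile(r"https?://[^\s<>]+")

# Hypothetical sample text standing in for one job description.
sample_text = (
    "Send your CV to jobs@example.com or visit "
    "https://www.example.com/careers for details."
)

# findall returns every non-overlapping match as a list of strings.
emails = EMAIL_PATTERN.findall(sample_text)
urls = URL_PATTERN.findall(sample_text)

print(emails)
print(urls)
```

Compiling the patterns once with re.compile keeps them reusable when the same search is applied to many rows of text.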
Getting ready
Download the job descriptions dataset here: https://www.kaggle.com/andrewmvd/data-scientist-jobs. It is also available in the book’s GitHub repository at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/data/DataScientist.csv. Save it into the /data folder.
The notebook is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter05/5.1_regex.ipynb.
How to do it…
We will read the data from the CSV file into a pandas DataFrame and will use the Python re package to create regular...