Using regular expressions
In this recipe, we will use regular expressions to find email addresses and URLs in text. Regular expressions are special character sequences that define search patterns. We will use a job descriptions dataset and write two regular expressions, one for emails and one for URLs.
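As a quick illustration of a search pattern, here is a minimal sketch using Python's built-in re module. The sample sentence and the pattern for four-digit numbers are made up for this example and are not part of the recipe itself.

```python
import re

# Made-up sample text, used only for illustration.
text = "Posted in 2019, updated in 2021."

# \b marks a word boundary and \d{4} matches exactly four digits,
# so this pattern finds standalone four-digit numbers.
year_pattern = re.compile(r"\b\d{4}\b")

years = year_pattern.findall(text)
print(years)  # ['2019', '2021']
```

The pattern describes the shape of the text we are looking for rather than a fixed string, which is what makes regular expressions suitable for finding emails and URLs.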
Getting ready
We will need the pandas package to handle the data. If you haven't installed it yet, install it like so:
pip install pandas
Download the job descriptions dataset from https://www.kaggle.com/andrewmvd/data-scientist-jobs.
A very handy tool for debugging regular expressions is https://regex101.com/. You enter a regular expression and a test string, and it shows the resulting matches along with the steps the regular expression engine took to find them.
How to do it…
We will read the data from the CSV file into a pandas DataFrame and will use the Python re module to create regular expressions and search...
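The overall flow can be sketched as follows, under several assumptions. The in-memory CSV is a hypothetical stand-in for the downloaded file, the column name "Job Description" is assumed and may not match the real dataset, and the email pattern is deliberately simplified rather than the one this recipe builds.

```python
import io
import re

import pandas as pd

# Hypothetical stand-in for the downloaded CSV file; the real file
# and its column names may differ.
csv_text = io.StringIO(
    "Job Title,Job Description\n"
    "Data Scientist,Send your CV to jobs@example.com today.\n"
    "Analyst,No contact details listed.\n"
)
df = pd.read_csv(csv_text)

# Simplified email pattern (an assumption for illustration only):
# a local part, an @ sign, and a domain containing at least one dot.
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Collect every match from every job description.
emails = []
for description in df["Job Description"]:
    emails.extend(email_pattern.findall(description))

print(emails)  # ['jobs@example.com']
```

With the real dataset, you would pass the path of the downloaded CSV file to pd.read_csv instead of the in-memory text.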