Working with structured data
Now that we have explored some of the basics of NLP, let's dive into some more complex use cases that are common in the biotech and life sciences fields. When working with text-based data, it is much more common to work with larger datasets than with single strings. More often than not, we want these datasets to contain scientific data on a particular research topic. Let's go ahead and learn how to retrieve scientific data using Python.
Searching for scientific articles
To programmatically retrieve scientific publication data using Python, we can make use of the pymed library, a Python wrapper around the PubMed API (https://pubmed.ncbi.nlm.nih.gov/). Let's go ahead and build a sample dataset:
- First, let's import our libraries and instantiate a new PubMed object:
    from pymed import PubMed
    pubmed = PubMed()
- Next, we will need to define and run our query. Let's go ahead...
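As a rough illustration of the steps above, the following sketch wraps the query in a small function and flattens the results into plain dictionaries. It assumes pymed's query method and the toDict method on returned articles; the function names fetch_articles and to_records, the tool name, the email address, and the chosen fields are hypothetical placeholders rather than anything prescribed by the text.

```python
def fetch_articles(query, max_results=10, email="you@example.org"):
    """Query PubMed through pymed and return the results as plain dicts."""
    # Imported lazily so that to_records() below works without pymed installed.
    from pymed import PubMed

    # The tool name and email identify the client to NCBI; both are placeholders.
    pubmed = PubMed(tool="ExampleTool", email=email)

    # query() returns an iterator of article objects (requires network access).
    results = pubmed.query(query, max_results=max_results)
    return [article.toDict() for article in results]


def to_records(article_dicts, fields=("pubmed_id", "title", "abstract")):
    """Keep only the selected fields, filling any missing field with None."""
    return [{field: d.get(field) for field in fields} for d in article_dicts]
```

Calling fetch_articles with a search term of interest and passing the result to to_records would give a list of uniform dictionaries, which is a convenient shape for loading into a DataFrame later. Keeping the network call separate from the reshaping step also makes the latter easy to check offline.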