Extracting subjects and objects of the sentence
Sometimes we need to find the subject and direct object of a sentence, which is easy to do with the spaCy package.
Getting ready
We will use spaCy's dependency tags to find subjects and objects. The code uses the spaCy engine to parse the sentence. The subject function then loops through the tokens and, if a token's dependency tag contains subj, returns the span that covers that token's subtree as a Span object. There are several subject tags, including nsubj for regular subjects and nsubjpass for subjects of passive sentences, so we want to match both.
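To see why a substring test on subj covers both cases, the check can be tried on plain label strings. This is a minimal sketch that does not call spaCy at all; the list is a hand-picked sample of labels used by spaCy's English pipelines, not the full label set.

```python
# A hand-picked sample of dependency labels (assumption: not the full label set).
labels = ["nsubj", "nsubjpass", "csubj", "csubjpass", "dobj", "pobj", "dative"]

# The same substring test that the subject function applies to token.dep_.
subject_labels = [label for label in labels if "subj" in label]
print(subject_labels)
```

The test keeps every subject-like label, including clausal subjects such as csubj, and drops the object and dative labels.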
How to do it…
We will use the subtree attribute of tokens to find the complete noun chunk that is the subject or direct object of the verb (see the Getting the dependency parse recipe). We will define functions to find the subject, direct object, dative phrase, and prepositional phrases:
- Run the file and language utility notebooks:
    %run -i "../util/file_utils.ipynb"
    %run -i "../util/lang_utils.ipynb"
- We will use two functions to find the subject and the direct object of the sentence. Each one loops through the tokens and returns the span covering the subtree of the first token whose dependency tag contains subj or dobj, respectively. Here is the subject function. There are several subject dependency tags, including nsubj and nsubjpass (for the subject of a passive sentence), so we look for the most general pattern. If the sentence has no subject, the function returns None:

    def get_subject_phrase(doc):
        for token in doc:
            # Matches nsubj, nsubjpass, and other subject tags
            if "subj" in token.dep_:
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
        return None
- Here is the direct object function. It works similarly to get_subject_phrase but looks for the dobj dependency tag instead of a tag that contains subj. If the sentence does not have a direct object, it will return None:

    def get_object_phrase(doc):
        for token in doc:
            if "dobj" in token.dep_:
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
        return None
- Assign a list of sentences to a variable, loop through them, and use the preceding functions to print out their subjects and objects:
    sentences = [
        "The big black cat stared at the small dog.",
        "Jane watched her brother in the evenings.",
        "Laura gave Sam a very interesting book."
    ]
    for sentence in sentences:
        doc = small_model(sentence)
        subject_phrase = get_subject_phrase(doc)
        object_phrase = get_object_phrase(doc)
        print(sentence)
        print("\tSubject:", subject_phrase)
        print("\tDirect object:", object_phrase)
The result will be as follows. For the sentence "The big black cat stared at the small dog", the subject is "the big black cat" and there is no direct object ("the small dog" is the object of the preposition "at"), so None is printed out. For the sentence "Jane watched her brother in the evenings", the subject is "Jane" and the direct object is "her brother". In the sentence "Laura gave Sam a very interesting book", the subject is "Laura" and the direct object is "a very interesting book":

    The big black cat stared at the small dog.
        Subject: The big black cat
        Direct object: None
    Jane watched her brother in the evenings.
        Subject: Jane
        Direct object: her brother
    Laura gave Sam a very interesting book.
        Subject: Laura
        Direct object: a very interesting book
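The start and end of a subtree can also be read from a token's left_edge and right_edge attributes, which avoids building a list of the subtree tokens. The sketch below is an alternative, not the recipe's own method: get_phrase_by_edges is a hypothetical helper, and the Doc is built by hand with assumed heads and dependency labels so that the example runs without a trained model. In the recipe, the parse would come from small_model instead.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def get_phrase_by_edges(doc, dep_substring):
    """Return the span of the first token whose tag contains dep_substring."""
    for token in doc:
        if dep_substring in token.dep_:
            # left_edge and right_edge are the first and last tokens of the subtree
            return doc[token.left_edge.i : token.right_edge.i + 1]
    return None


# Hand-built parse (assumed heads and labels, written out for illustration).
words = ["Jane", "watched", "her", "brother", "in", "the", "evenings", "."]
heads = [1, 1, 3, 1, 1, 6, 4, 1]  # absolute index of each token's head
deps = ["nsubj", "ROOT", "poss", "dobj", "prep", "det", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

subject = get_phrase_by_edges(doc, "subj")
direct_object = get_phrase_by_edges(doc, "dobj")
print("Subject:", subject)
print("Direct object:", direct_object)
```

Both versions return the same Span; the edge attributes simply skip the intermediate list.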
There’s more…
We can look for other objects, for example, the dative objects of verbs such as give and objects of prepositional phrases. The functions will look very similar, with the main difference being the dependency tags: dative for the dative object function, and pobj for the prepositional object function. The prepositional object function will return a list since there can be more than one prepositional phrase in a sentence:
- The dative object function checks the tokens for the dative tag. It returns None if there are no dative objects:

    def get_dative_phrase(doc):
        for token in doc:
            if "dative" in token.dep_:
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
        return None
- We can also combine the subject, object, and dative functions into one with an argument that specifies which phrase to look for. Because the function matches by substring, pass dobj rather than obj for direct objects, since obj would also match pobj:

    def get_phrase(doc, phrase):
        # phrase is one of "subj", "dobj", "dative"
        for token in doc:
            if phrase in token.dep_:
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
        return None
- Let us now define a sentence with a dative object and run the function for all three types of phrases:

    sentence = "Laura gave Sam a very interesting book."
    doc = small_model(sentence)
    subject_phrase = get_phrase(doc, "subj")
    object_phrase = get_phrase(doc, "dobj")
    dative_phrase = get_phrase(doc, "dative")
    print(sentence)
    print("\tSubject:", subject_phrase)
    print("\tDirect object:", object_phrase)
    print("\tDative object:", dative_phrase)
The result will be as follows. The dative object is "Sam":

    Laura gave Sam a very interesting book.
        Subject: Laura
        Direct object: a very interesting book
        Dative object: Sam
- Here is the prepositional object function. It returns a list of objects of prepositions, which will be empty if there are none:
    def get_prepositional_phrase_objs(doc):
        prep_spans = []
        for token in doc:
            if "pobj" in token.dep_:
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                prep_spans.append(doc[start:end])
        return prep_spans
- Let's define a list of sentences and run the two functions on them. We pass dobj for the direct object so that objects of prepositions, tagged pobj, are not matched by mistake:

    sentences = [
        "The big black cat stared at the small dog.",
        "Jane watched her brother in the evenings."
    ]
    for sentence in sentences:
        doc = small_model(sentence)
        subject_phrase = get_phrase(doc, "subj")
        object_phrase = get_phrase(doc, "dobj")
        dative_phrase = get_phrase(doc, "dative")
        prepositional_phrase_objs = get_prepositional_phrase_objs(doc)
        print(sentence)
        print("\tSubject:", subject_phrase)
        print("\tDirect object:", object_phrase)
        print("\tPrepositional phrases:", prepositional_phrase_objs)

The result will be as follows. The first sentence has no direct object, so None is printed for it:

    The big black cat stared at the small dog.
        Subject: The big black cat
        Direct object: None
        Prepositional phrases: [the small dog]
    Jane watched her brother in the evenings.
        Subject: Jane
        Direct object: her brother
        Prepositional phrases: [the evenings]
There is one prepositional phrase in each sentence. In the sentence "The big black cat stared at the small dog", it is "at the small dog", and in the sentence "Jane watched her brother in the evenings", it is "in the evenings".
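A substring test on the dependency tag is convenient for subjects, but it is easy to over-match with objects: the pattern obj is contained in both dobj and pobj, so an object of a preposition can be reported as a direct object. One way to avoid this, sketched below, is to compare against an explicit set of tags. The helper get_phrase_exact is hypothetical, and the Doc is built by hand with assumed heads and labels so that the example does not depend on a trained model.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def get_phrase_exact(doc, dep_labels):
    """Return the subtree span of the first token whose tag is in dep_labels."""
    for token in doc:
        if token.dep_ in dep_labels:
            subtree = list(token.subtree)
            return doc[subtree[0].i : subtree[-1].i + 1]
    return None


# Hand-built parse (assumed heads and labels, written out for illustration).
words = ["The", "big", "black", "cat", "stared", "at", "the", "small", "dog", "."]
heads = [3, 3, 3, 4, 4, 4, 8, 8, 5, 4]  # absolute index of each token's head
deps = ["det", "amod", "amod", "nsubj", "ROOT", "prep",
        "det", "amod", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# The substring test picks up the prepositional object...
substring_hits = [token.text for token in doc if "obj" in token.dep_]
# ...while an exact match on dobj finds nothing in this sentence.
direct_object = get_phrase_exact(doc, {"dobj"})
subject = get_phrase_exact(doc, {"nsubj", "nsubjpass"})
print(substring_hits, direct_object, subject)
```

Passing a set keeps the convenience of one function for several phrase types while making each match explicit.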
It is left as an exercise for you to find the actual prepositional phrases with prepositions intact instead of just the noun phrases that are dependent on these prepositions.
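As a starting point for that exercise, one possible approach is to anchor on the preposition itself, which carries the prep tag, and take its subtree rather than the subtree of its object. This is only a sketch under stated assumptions: get_prepositional_phrases is a hypothetical name, and the Doc is built by hand with assumed heads and labels instead of being parsed by small_model.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def get_prepositional_phrases(doc):
    """Return one span per preposition, covering the preposition and its object."""
    prep_spans = []
    for token in doc:
        if token.dep_ == "prep":
            subtree = list(token.subtree)
            prep_spans.append(doc[subtree[0].i : subtree[-1].i + 1])
    return prep_spans


# Hand-built parse (assumed heads and labels, written out for illustration).
words = ["Jane", "watched", "her", "brother", "in", "the", "evenings", "."]
heads = [1, 1, 3, 1, 1, 6, 4, 1]  # absolute index of each token's head
deps = ["nsubj", "ROOT", "poss", "dobj", "prep", "det", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

phrases = get_prepositional_phrases(doc)
print([phrase.text for phrase in phrases])
```

If one prepositional phrase is nested inside another, this version returns both the outer and the inner span; whether that is desirable depends on the use case.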