EntityRuler
While covering Matcher, we saw that we can extract named entities with Matcher by using the ENT_TYPE attribute. Recall from the previous chapter that ENT_TYPE is a linguistic attribute that refers to the entity type of the token, such as person, place, or organization. Let's see an example:
pattern = [{"ENT_TYPE": "PERSON"}]
matcher.add("personEnt", [pattern])
doc = nlp("Bill Gates visited Berlin.")
matches = matcher(doc)
for mid, start, end in matches:
    print(start, end, doc[start:end])
...
0 1 Bill
1 2 Gates
Again, we created a Matcher object called matcher and called it on the Doc object, doc. The result is two separate matches, Bill and Gates, rather than the full entity, Bill Gates. This happens because Matcher works at the token level: the pattern describes a single token, so every token whose entity type is PERSON is reported as its own match. If you want to get the full entity rather than the individual tokens, you can do this:
pattern = [{"ENT_TYPE": "PERSON"...
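As a self-contained illustration of the difference between per-token matches and a whole-entity match, here is a minimal sketch. It rests on several assumptions that are not part of the text above: it uses a blank English pipeline and annotates the PERSON entity by hand (so no trained model is needed), it wraps the matching in a hypothetical helper called match_texts, and it combines the "OP": "+" quantifier with the greedy="LONGEST" option of Matcher.add (available in spaCy 3) as one possible way to keep only the longest span. It is not necessarily the approach the truncated pattern above takes.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

# Assumption: a blank pipeline has a tokenizer but no NER component, so the
# PERSON entity is set by hand to keep the sketch runnable without a model.
nlp = spacy.blank("en")
doc = nlp("Bill Gates visited Berlin.")
doc.set_ents([Span(doc, 0, 2, label="PERSON")])


def match_texts(pattern, greedy=None):
    """Hypothetical helper: run one pattern over doc, return (start, end, text)."""
    matcher = Matcher(nlp.vocab)
    matcher.add("personEnt", [pattern], greedy=greedy)
    return [(start, end, doc[start:end].text) for _, start, end in matcher(doc)]


# One token per match: each PERSON token is reported on its own.
per_token = match_texts([{"ENT_TYPE": "PERSON"}])

# One or more consecutive PERSON tokens, keeping only the longest match.
whole_span = match_texts([{"ENT_TYPE": "PERSON", "OP": "+"}], greedy="LONGEST")

print(per_token)
print(whole_span)
```

With the hand-set entity, the first list should contain the two single-token matches seen earlier (Bill and Gates), while the second should contain one match covering Bill Gates. Without greedy="LONGEST", the "+" quantifier would also report the shorter overlapping matches.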