More spaCy features
Most NLP development is token- and span-oriented; that is, it processes tags, dependency relations, the tokens themselves, and phrases. Most of the time, we eliminate small words and words that carry little meaning, we process URLs differently, and so on. What we do sometimes depends on the token shape (for example, the token is a short word or looks like a URL string) or on more semantic features (for example, the token is an article or a conjunction). In this section, we will see these token features with examples. We'll start with features related to the token shape:
doc = nlp("Hello, hi!")
doc[0].lower_
'hello'
token.lower_ returns the token in lowercase. The return value is a Unicode string, and this feature is equivalent to token.text.lower().
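As a minimal sketch of that equivalence, the snippet below compares lower_ with token.text.lower() for every token in the same sentence. It assumes a blank English pipeline created with spacy.blank("en") rather than whatever model the nlp object above was loaded from; lower_ is a lexical attribute, so the tokenizer alone should be enough. The names lowered and matches are illustrative only.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) stands in for the
# nlp object used in the text; no trained model is downloaded or loaded.
nlp = spacy.blank("en")
doc = nlp("Hello, hi!")

# Pair each token's original text with its lowercase form.
lowered = [(token.text, token.lower_) for token in doc]
print(lowered)

# Check that lower_ agrees with Python's str.lower() on the token text.
matches = all(token.lower_ == token.text.lower() for token in doc)
print(matches)
```

Punctuation tokens such as the comma pass through unchanged, because lowercasing has no effect on them.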
is_lower and is_upper are similar to their Python string method counterparts, islower() and isupper(). is_lower returns True if all the characters are lowercase, while is_upper does the...