More spaCy Features
Let's delve into more spaCy features.
Most NLP development is token and span oriented; that is, it processes tags, dependency relations, the tokens themselves, and phrases. Most of the time, we eliminate short words and words that carry little meaning, handle URLs differently, and so on. What we do sometimes depends on the token's shape (for example, the token is a short word or looks like a URL string) or on more semantic features (for example, the token is an article or a conjunction). In this section, we'll look at these token features with examples. We'll start with features related to the token shape:
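import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: any loaded English pipeline works here
doc = nlp("Hello, hi!")
print(doc[0].lower_)  # prints: hello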
The token.lower_ attribute returns the token's text in lowercase. The return value is a Unicode string, and it is equivalent to token.text.lower().
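To check this equivalence across a whole sentence, here is a minimal sketch. It assumes a blank English pipeline created with spacy.blank("en"), which only tokenizes and needs no trained model download; the variable names lowered and same are illustrative and do not come from the lesson. It also shows that token.lower (without the trailing underscore) holds the integer ID of the same string in the vocabulary.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because
# lowercasing only needs the tokenizer, not a trained model.
nlp = spacy.blank("en")
doc = nlp("Hello, hi!")

# Lowercase form of every token, punctuation included.
lowered = [token.lower_ for token in doc]

# Compare token.lower_ with token.text.lower() for each token.
same = all(token.lower_ == token.text.lower() for token in doc)

# token.lower is the integer hash of the lowercase string;
# the vocabulary's string store maps it back to the text.
first = doc[0]
roundtrip = nlp.vocab.strings[first.lower]

print(lowered)    # ['hello', ',', 'hi', '!']
print(same)       # True
print(roundtrip)  # hello
```

Attributes that end with an underscore return strings, while their counterparts without the underscore return integer IDs, so lower_ is usually the one you want when printing or comparing text.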
...