Part-of-speech (PoS) tagging is the process of labeling words in a text according to their word types, such as nouns, adjectives, adverbs, verbs, prepositions, conjunctions, pronouns, interjections, etc.
Let's try to understand how PoS tagging works through an example sentence, "I work at Educative":
In this example, "I" is labeled as a personal pronoun (PRP), "work" as a non-third-person singular present verb (VBP), "at" as a preposition (IN), and "Educative" as a singular noun (NN).
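To make the tag abbreviations easier to read, here is a minimal sketch of a lookup table for a few Penn Treebank tags. The names `PENN_TAGS` and `describe` are made up for illustration, the table covers only a handful of tags, and the tagged pairs are copied by hand from the example rather than produced by a tagger.

```python
# Descriptions for a handful of Penn Treebank tags (not the full tag set).
PENN_TAGS = {
    "PRP": "personal pronoun",
    "VB": "verb, base form",
    "VBP": "verb, non-third-person singular present",
    "VBZ": "verb, third-person singular present",
    "VBG": "verb, gerund or present participle",
    "IN": "preposition or subordinating conjunction",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
}


def describe(tagged):
    """Return (word, tag, description) triples for (word, tag) pairs."""
    return [(word, tag, PENN_TAGS.get(tag, "unknown tag")) for word, tag in tagged]


# Hand-copied from the example sentence; no tagger is run here.
example = [("I", "PRP"), ("work", "VBP"), ("at", "IN"), ("Educative", "NN")]
for word, tag, meaning in describe(example):
    print(f"{word:<10}{tag:<5}{meaning}")
```

Tags missing from the table fall back to "unknown tag" instead of raising an error, which keeps the sketch usable on tagger output that contains tags not listed here.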
PoS tagging can be implemented using the nltk library. We need to follow these steps to implement PoS tagging:
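We first need to import the relevant libraries. The tokenizer and tagger models must also be available locally; if one is missing, NLTK raises a LookupError that names the resource to fetch with nltk.download(). The imports look like this: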
import nltk
from nltk import word_tokenize
Next, we provide the text to be labeled as input and tokenize it. The word_tokenize() function in nltk splits the text into separate words and punctuation marks. We can do this using the following code snippet:
text = "I love reading Educative Answers."
tokens = nltk.word_tokenize(text)
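To get a feel for what tokenization produces without installing anything, the sketch below uses a simple regular expression in place of NLTK. This is a swapped-in approximation, not the algorithm behind word_tokenize(), and the helper name `simple_tokenize` is hypothetical. It illustrates one point in particular: punctuation becomes a token of its own.

```python
import re


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks.

    A rough stand-in for a real tokenizer; it does not handle contractions,
    abbreviations, or other cases that NLTK's tokenizer treats specially.
    """
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenize("I love reading Educative Answers.")
print(tokens)
```

For plain sentences like this one, the result has the same shape as a tokenizer's output: a flat list of strings that can be passed on to a tagger.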
In this step, we label the tokens with tags using the pos_tag() function. The following snippet demonstrates this step:
print("Parts of Speech:", nltk.pos_tag(tokens))
After this step, a list of (token, tag) tuples is printed. The sentence-final period is tagged as its own token, and the exact tags can vary with the tagger model in use:
Parts of Speech: [('I', 'PRP'), ('love', 'VB'), ('reading', 'VB'), ('Educative', 'NN'), ('Answers', 'NNS'), ('.', '.')]
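Once the tags are available, a common next step is to group tokens by tag. The sketch below assumes a list in the same (word, tag) shape as the output above; the pairs are written by hand here, and with NLTK installed they would come from nltk.pos_tag(tokens) instead. The function name `group_by_tag` is illustrative only.

```python
from collections import defaultdict

# Hand-written (word, tag) pairs in the same shape that pos_tag() returns.
tagged = [("I", "PRP"), ("love", "VB"), ("reading", "VB"),
          ("Educative", "NN"), ("Answers", "NNS")]


def group_by_tag(pairs):
    """Map each tag to the list of words that carry it, in input order."""
    by_tag = defaultdict(list)
    for word, tag in pairs:
        by_tag[tag].append(word)
    return dict(by_tag)


groups = group_by_tag(tagged)
print(groups)
```

Grouping like this makes it easy to pull out, for example, all noun tokens before passing them to a later processing stage.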
PoS tagging is used in the following areas:
Named entity recognition (NER)
Sentiment analysis
Word-sense disambiguation
Question answering
Hence, PoS tagging is an integral part of NLP. It is especially useful for telling apart different senses of the same word, such as "book" used as a noun versus "book" used as a verb.
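As a rough illustration of how a tag can help separate word senses, the sketch below looks up a definition by word and coarse part of speech. The sense inventory `SENSES` and the helpers `coarse_pos` and `pick_sense` are hypothetical, and the tags are supplied by hand; a real system would take them from a tagger and use a much larger lexical resource.

```python
# Hypothetical mini sense inventory keyed by (lowercased word, coarse PoS).
SENSES = {
    ("book", "noun"): "a written or printed work",
    ("book", "verb"): "to reserve something in advance",
}


def coarse_pos(tag):
    """Collapse a Penn Treebank tag into a coarse class."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    return "other"


def pick_sense(word, tag):
    """Return the recorded sense for a word given its PoS tag, if any."""
    return SENSES.get((word.lower(), coarse_pos(tag)), "no sense recorded")


# Hand-written tags for illustration; a tagger would supply these in practice.
print(pick_sense("book", "NN"))
print(pick_sense("book", "VB"))
```

The tag alone only narrows the choice to senses with the matching part of speech; words with several noun or verb senses still need further context to resolve.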