Contractions are combinations of words that are shortened by dropping letters and replacing them with apostrophes.
There are two main reasons why we should deal with contractions in NLP:
A computer doesn't recognize that the contractions are abbreviations for a combination of words. Hence, it recognizes "I'm" and "I am" as two different terms with different meanings.
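Contractions increase the dimensionality of the text representation (for example, a document-term matrix), because a contraction and its expanded form are counted as separate terms.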
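Both reasons can be seen in a minimal sketch that uses only the Python standard library. The sample sentences and the whitespace tokenizer are assumptions made for illustration; a real pipeline would use a proper tokenizer.

```python
# Two sentences with the same meaning, one written with a contraction.
docs = ["I'm fine", "I am fine"]


def vocabulary(documents):
    """Collect the set of lowercase, whitespace-separated tokens."""
    vocab = set()
    for doc in documents:
        vocab.update(doc.lower().split())
    return vocab


vocab = vocabulary(docs)
# "i'm" is stored as its own term, unrelated to "i" and "am".
print(sorted(vocab))

# After expanding the contraction by hand, both documents share the same terms.
expanded_docs = ["I am fine", "I am fine"]
expanded_vocab = vocabulary(expanded_docs)
print(len(vocab), len(expanded_vocab))
```

The unexpanded vocabulary carries one extra term for the same meaning, and that extra term becomes an extra column in any bag-of-words representation.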
We can use Python's contractions library to expand contractions. It can be installed with the following command:
pip install contractions
The following code snippet demonstrates how to expand the contractions:
import contractions

text = '''Hello mom! Yes, I'm fine. How're you? No, I didn't have lunch. I'm about to go.
Are you coming next weekend? I've been missing you.'''

expanded_text = []
for word in text.split():
    expanded_text.append(contractions.fix(word))

expanded_text = ' '.join(expanded_text)
print('Input : ' + text)
print('\n')
print('Output: ' + expanded_text)
Lines 7–8: We loop over the words, expand each one with contractions.fix(), and append the result to the expanded_text list.
Line 10: We join the words in expanded_text into a single string, separated by spaces (' ').
The contractions library is very easy to use. However, on closer inspection, some contractions can stand for more than one word combination. Consider the following example:
"ain't": "am not / are not / is not / has not / have not"
The contractions library doesn't handle this ambiguity. For the example above, the package always expands "ain't" to "are not."
This is demonstrated in the code below:
import contractions

text = '''I ain't doing that.'''

expanded_text = []
for word in text.split():
    expanded_text.append(contractions.fix(word))

expanded_text = ' '.join(expanded_text)
print('Input : ' + text)
print('\n')
print('Output: ' + expanded_text)
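This behavior is what we would expect from any one-to-one lookup. The sketch below is not the contractions library's implementation. It's a simplified stand-in with a hypothetical three-entry table (TOY_MAP) and a hypothetical helper (expand_fixed), meant only to show why a fixed mapping can't choose between several expansions.

```python
import re

# Hypothetical, deliberately tiny lookup table (an assumption for illustration).
TOY_MAP = {
    "i'm": "I am",
    "didn't": "did not",
    "ain't": "are not",  # one fixed choice, whatever the subject is
}

# Match any key of the table as a whole word, ignoring case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in TOY_MAP) + r")\b",
    flags=re.IGNORECASE,
)


def expand_fixed(text):
    """Replace each known contraction with its single stored expansion."""
    return PATTERN.sub(lambda match: TOY_MAP[match.group(0).lower()], text)


print(expand_fixed("I ain't doing that."))
print(expand_fixed("She ain't doing that."))
```

Both sentences receive the same expansion, although the first calls for "am not" and the second for "is not". Resolving this requires looking at the surrounding words, which a plain lookup table doesn't do.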
pycontractions library

We can also use the pycontractions library to expand contractions. It works in the following way:
Case 1: If a contraction corresponds to only one sequence of words, pycontractions replaces the contraction with that word sequence.
Case 2: If a contraction corresponds to several possible expansions, pycontractions produces all the possible expanded texts and passes each one through a grammar checker. The grammatically incorrect options are discarded, and the best remaining candidate is selected.
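The two cases can be sketched as follows. This is not pycontractions' code: the rule table (RULES), the helper names, and especially toy_error_count, a crude subject-verb agreement check standing in for a real grammar checker, are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical rule table: each contraction maps to one or more expansions.
RULES = {
    "i'm": ["I am"],                           # Case 1: a single expansion
    "ain't": ["am not", "are not", "is not"],  # Case 2: ambiguous
}

# Toy stand-in for a grammar checker: the verb form each pronoun expects.
AGREEMENT = {
    "i": "am", "you": "are", "we": "are", "they": "are",
    "he": "is", "she": "is", "it": "is",
}


def candidates(text):
    """Generate every text obtained by picking one expansion per contraction."""
    options = [RULES.get(word.lower(), [word]) for word in text.split()]
    return [" ".join(choice) for choice in product(*options)]


def toy_error_count(text):
    """Count adjacent pronoun/verb pairs that disagree, such as 'she are'."""
    words = text.lower().split()
    errors = 0
    for subject, verb in zip(words, words[1:]):
        expected = AGREEMENT.get(subject)
        if expected and verb in {"am", "are", "is"} and verb != expected:
            errors += 1
    return errors


def expand(text):
    """Keep the candidate with the fewest toy grammar errors."""
    return min(candidates(text), key=toy_error_count)


print(expand("She ain't coming"))
print(expand("I'm sure they ain't ready"))
```

For simplicity, the sketch splits on whitespace, so a contraction followed directly by punctuation would not be matched. The point is only the shape of the approach: generate every candidate, score each one, and keep the best.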
Because it takes the grammar of the surrounding text into account, pycontractions is generally more accurate than Python's contractions library on ambiguous contractions.