Solution Explanations: Advanced Text Preprocessing

Review solution explanations for the code challenges on advanced text preprocessing.

Solution 1: Part-of-speech tagging

Here’s the solution:

import pandas as pd
import nltk
import string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
nltk.download('averaged_perceptron_tagger', quiet=True)

feedback_df = pd.read_csv('feedback.csv')
feedback_df['tokens'] = feedback_df['feedback'].apply(lambda text: word_tokenize(text.lower()))
stop_words = set(stopwords.words('english'))
feedback_df['tokens'] = feedback_df['tokens'].apply(lambda tokens: [token for token in tokens if token not in stop_words])
feedback_df['tokens'] = feedback_df['tokens'].apply(lambda tokens: [token for token in tokens if token not in string.punctuation])
feedback_df['pos_tags'] = feedback_df['tokens'].apply(nltk.pos_tag)
print(feedback_df['pos_tags'])

Let’s walk through the solution:

  • Line 9: We convert the text in the feedback column to lowercase, tokenize it using the word_tokenize function, and save the result as a new tokens column.

  • Line 10: We build a set of English stopwords from the list returned by stopwords.words('english').

  • Lines 11–12: ...
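
The two filters in the solution code can be illustrated on a single sentence. The following is a minimal sketch, assuming a hand-written token list and a tiny stand-in stopword set (both hypothetical) so that it runs without any NLTK data; the solution itself uses word_tokenize and stopwords.words('english').

```python
import string

# Hypothetical tokens, roughly what word_tokenize returns for the lowercased
# sentence "the delivery was fast, but the box arrived damaged!"
tokens = ["the", "delivery", "was", "fast", ",", "but",
          "the", "box", "arrived", "damaged", "!"]

# Stand-in stopword set (assumption); the solution uses NLTK's English list.
stop_words = {"the", "was", "but"}

# Step 1: drop stopwords, as in the first filter of the solution.
without_stopwords = [token for token in tokens if token not in stop_words]

# Step 2: drop punctuation tokens, as in the second filter of the solution.
filtered_tokens = [token for token in without_stopwords
                   if token not in string.punctuation]

print(filtered_tokens)
# ['delivery', 'fast', 'box', 'arrived', 'damaged']

# string.punctuation is a str, so `in` performs a substring check: a
# multi-character token such as "..." is not a substring and is kept.
ellipsis_kept = "..." not in string.punctuation
print(ellipsis_kept)  # True
```

In the solution, the filtered token list is then passed to nltk.pos_tag, which returns a list of (token, tag) tuples. Because the punctuation check is a substring test against string.punctuation, multi-character punctuation tokens may survive this filter.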
