Calculate tf-idf with quanteda

Learn to calculate tf-idf with quanteda to identify important words in documents for improved text analysis.

tf-idf with quanteda

Term frequency-inverse document frequency (tf-idf, https://www.educative.io/answers/what-is-tf-idf) is a weighting used to identify the words that are most distinctive of a document within a collection. A word scores high when it occurs often in one document but in few of the others. The quanteda package computes this weighting for a document-feature matrix with the dfm_tfidf() function.
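To make the weighting concrete, here is a minimal sketch on three toy sentences. The sentences are invented for illustration and are not part of the lesson's data. The sketch assumes the defaults of dfm_tfidf(), which multiply the raw term count by log10(N / document frequency), and checks the result against a manual calculation.

```r
library(quanteda, quietly = TRUE)

# Three toy documents (hypothetical, for illustration only)
txt <- c(d1 = "the cat sat on the mat",
         d2 = "the dog sat on the log",
         d3 = "the cat chased the dog")

# Raw term counts per document
counts <- dfm(tokens(txt))

# tf-idf with the default scheme: count * log10(N / document frequency)
weighted <- dfm_tfidf(counts)

# The same weighting computed by hand
idf <- log10(ndoc(counts) / docfreq(counts))
manual <- sweep(as.matrix(counts), 2, idf, "*")

# TRUE if the manual calculation matches dfm_tfidf()
isTRUE(all.equal(as.matrix(weighted), manual, check.attributes = FALSE))
```

Because "the" occurs in every document, its inverse document frequency is log10(3 / 3) = 0, so its weight is zero no matter how often it appears. Words confined to a single document, such as "mat", keep a positive weight.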

The following code reads a set of text files and builds a tf-idf-weighted document-feature matrix from them:

# install.packages("quanteda")
# install.packages("readtext")
library(quanteda, quietly = TRUE)
library(readtext)

# Read the text files, tokenize, drop stopwords, and weight the dfm by tf-idf
tf_idf <- readtext(file = "data/mws*txt", docvarsfrom = "filenames") |>
  corpus() |>
  tokens(remove_numbers = TRUE, remove_punct = TRUE) |>
  tokens_remove(pattern = stopwords("english")) |>
  tokens_tolower() |>
  dfm() |>
  dfm_tfidf()

# Show the first document's features, highest tf-idf first
tf_idf[1, order(tf_idf[1, ], decreasing = TRUE)]

Here’s an ...