Searching for Matching Documents with tf-idf
Learn about tf-idf and how it is calculated and used in information retrieval, search engines, and other NLP applications.
Playing a game with documents
There is a common children’s game called “I Spy.” A group sits in a circle, and the leader says, “I spy, with my little eye, something blue.” Everyone else then tries to guess what the leader is looking at. Is it the blue telephone? Or perhaps the blue couch?
Natural language processing is often similar to this game. Given a query, which may be a single word or a whole document, we have to determine the best-matching document from a list of documents. An internet search works this way, and spam filtering relies on a similar kind of matching.
There are many strategies for this type of search. One of the most common is called term frequency-inverse document frequency, or tf-idf.
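To make the matching idea concrete, here is a minimal sketch of tf-idf ranking in plain Python. It assumes one common weighting variant (term frequency as count divided by document length, and inverse document frequency as the natural log of the number of documents over the number of documents containing the term); other variants exist, and the exact formula used elsewhere may differ. The function names `tokenize` and `tf_idf_scores` and the sample documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch);
    # real systems usually also strip punctuation and normalize words.
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document by summing tf-idf over the query terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in counts:
                continue  # a term absent from this document contributes nothing
            tf = counts[term] / len(tokens)    # term frequency
            idf = math.log(n_docs / df[term])  # inverse document frequency
            score += tf * idf
        scores.append(score)
    return scores


documents = [
    "the blue telephone is ringing",
    "the blue couch is comfortable",
    "the red couch is new",
]
scores = tf_idf_scores("blue telephone", documents)

# Pick the index of the highest-scoring document.
best = max(range(len(documents)), key=scores.__getitem__)
print(documents[best])
```

In this sketch, "telephone" appears in only one document, so it carries more weight than "blue", which appears in two. That is why the first document ranks above the second, and the third, which shares no query terms, scores zero.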
Note: TF–IDF, TF*IDF, ...