Document-Term Matrix

Learn how the document-term matrix serves as a commonly used data structure in natural language processing.

A document-term matrix (DTM) is fairly simple to understand. It is a matrix whose rows, columns, and cells have the following meanings:

  • Each row represents a document. In our case, there will be one row for Frankenstein and a second row for The Last Man.

  • Each column represents a term. In this case, terms are words, although they can be sentences, lines, paragraphs, or n-grams (more on these in a later lesson).

  • Each cell in the matrix contains the frequency of the term in the document, that is, how often the term in that column occurs in the document in that row.
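
To make the row, column, and cell layout concrete, here is a minimal sketch in plain Python. It is not the lesson's own code: the two short strings are hypothetical stand-ins rather than the real text of the novels, the tokenize helper is an assumed, simplistic word splitter, and each cell holds a raw count.

```python
from collections import Counter
import re

# Hypothetical stand-in snippets, NOT the real text of the novels.
documents = {
    "Frankenstein": "the monster saw the light",
    "The Last Man": "the last man saw the sea",
}

def tokenize(text):
    """Lowercase the text and split it into word terms (assumed, simplistic rule)."""
    return re.findall(r"[a-z']+", text.lower())

# Count how often each term occurs in each document.
counts = {title: Counter(tokenize(text)) for title, text in documents.items()}

# Columns: the sorted vocabulary across all documents.
terms = sorted(set().union(*counts.values()))

# Rows: one per document. Each cell is the term's count in that document
# (a Counter returns 0 for terms the document does not contain).
dtm = [[counts[title][term] for term in terms] for title in documents]

print(terms)
for title, row in zip(documents, dtm):
    print(title, row)
```

With these stand-in snippets, the matrix has two rows (one per document) and seven columns (one per distinct word). A term that is missing from a document simply gets a 0 in that cell.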

On the other hand, a term-document matrix (TDM) is a data structure that is essentially the ...