How to perform extractive summarization of text in Python

If you are building an application that performs natural language processing on text data, you may need to generate a summary of that text. This is especially helpful when you have a long piece of text and want to fetch only the important sentences in order to understand it.

What is extractive summarization?

Extractive summarization is a type of summarization in which an article is summarized by selecting a subset of its sentences that retain the most important points. With this approach, the summary never contains wording other than what is already present in the original article.
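
To make the idea concrete, here is a minimal sketch of one common way to do extractive summarization: score each sentence by how frequent its words are in the whole text, then keep the top-scoring sentences in their original order. This is only an illustration and is not how the summarizer package works internally. The function names, the tiny stop-word list, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use much larger lists).
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "it", "of", "on", "the", "to", "was"}


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text, count=2):
    """Return the `count` highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    word_freq = Counter(w for s in sentences for w in tokenize(s))
    # Score each sentence by the summed frequency of its words.
    scores = [sum(word_freq[w] for w in tokenize(s)) for s in sentences]
    # Pick the indices of the best sentences, then restore the original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]


sample = ("Cats sleep most of the day. Cats hunt mice at night. "
          "The weather was mild. Mice fear cats.")
result = extractive_summary(sample, count=2)
print(result)
```

Every sentence in the result is copied verbatim from the input, which is what makes the summary extractive. Note that summing word frequencies favors longer sentences; dividing the score by the sentence length is one possible refinement.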

We will use a package named summarizer to generate summarized content in just one line of code!

Let’s first install the package by running:

pip install summarizer

This package depends on nltk, one of the most widely used Python libraries for natural language processing, so you need to install it as well.

Install it by running:

pip install nltk

We will be using the summarize() function from this package. Let’s take a look at the details of this function.

Parameters

The summarize() function accepts the following parameters:

  • title: The title of your text article. It is used to determine what the article is about and which words are potentially the most important.
  • text: The complete text data of your article.
  • count: An optional parameter with a default value of 5. It denotes the number of sentences that you want returned in the summary.

Return value

The summarize() function returns a list of the most important sentences. This can be treated as the summarized content for your text data.
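
Because the return value is a list, printing it directly shows Python's list syntax rather than readable prose. The sketch below shows one way to turn such a list into a paragraph or a bulleted block. The helper format_summary() and the example list are hypothetical and are not part of the summarizer package; the list simply stands in for what summarize() is described as returning.

```python
def format_summary(sentences, as_bullets=False):
    """Turn a list of summary sentences into a single printable string."""
    # Drop empty entries and trim stray whitespace around each sentence.
    cleaned = [s.strip() for s in sentences if s and s.strip()]
    if as_bullets:
        return "\n".join(f"- {s}" for s in cleaned)
    return " ".join(cleaned)


# Stand-in for the list that summarize() is described as returning (made-up data).
example_summary = ["First key sentence.", "  Second key sentence. ", ""]

print(format_summary(example_summary))
print(format_summary(example_summary, as_bullets=True))
```

Joining with spaces gives a compact paragraph, while the bulleted form keeps each extracted sentence visually separate.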

Code

Now, since we know all the details, let’s move on to the code.

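main.py

from summarizer import summarize
# Read the article text from data.txt.
with open("./data.txt") as f:
    text = f.read()
text = " ".join(text.split())  # collapse runs of whitespace into single spaces
# Title of the article, used to identify the important words.
title = "Microsoft Launches Intelligent Cloud Hub To Upskill Students In AI & Cloud Technologies"
# Show the original text.
print("Original Text: \n")
print(text)
# Generate the extractive summary (a list of sentences).
summary = summarize(title, text)
# Show the summary.
print("\nSummarized Text: \n")
print(summary)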

Explanation

  • In line 1, we import the summarize() function from the summarizer package.
  • In lines 3 to 5, we load our text data from a file and collapse any runs of whitespace into single spaces.
  • In line 7, we define the title for our text data.
  • In lines 9 and 10, we print the original text data.
  • In line 12, we use the summarize() function to generate the summary for our text data.
  • In lines 14 and 15, we print the generated summary.
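
The program also cleans up the text with " ".join(text.split()) right after reading the file. Calling split() with no arguments splits on any run of whitespace (spaces, tabs, newlines) and discards leading and trailing whitespace, so joining the pieces with a single space leaves the text on one line. The short sketch below demonstrates this on a made-up string.

```python
# Made-up input with double spaces, blank lines, a tab, and trailing whitespace.
raw = "Microsoft  launches\n\na new\thub.  \n"

# split() with no arguments splits on any run of whitespace;
# joining with " " leaves exactly one space between words.
normalized = " ".join(raw.split())

print(repr(normalized))
```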

In this way, a single call to summarize() is enough to generate an extractive summary of any text data using summarizer in Python.
