Building an Extractive Text Summarizer with a GUI Using Python and NLTK

Kenneth Idanwekhai
Jun 30, 2023


from pixabay

You have probably seen applications that condense long texts into short ones. When I was studying history at university, I was always looking for ways to summarize extremely long texts so that I could read them quickly. This process is called text summarization: producing a short summary of a large source text without losing its important details. It is a powerful natural language processing (NLP) technique that condenses a text while preserving its key information.

One of the most important advantages of NLP text summarizers is that they speed up knowledge extraction and information retrieval. Researchers, journalists, and content curators can use them to sift quickly through large volumes of text and locate the relevant information. Content creators can use them to review and condense previously published articles, blog posts, or reports and gain a thorough grasp of a subject, which helps them write thoughtful, original material for their target audience while avoiding plagiarism. Teachers can also use NLP summarization tools to assess their students' comprehension of texts and pinpoint areas that need extra explanation or discussion.

Types of Text Summarizers

There are two main types of text summarizers: extractive and abstractive.

1. Extractive: Extractive summarizers shorten the original material by selecting its most important sentences or phrases and combining them. The summary is built from text taken verbatim from the source.

2. Abstractive: Abstractive summarizers go beyond extracting phrases from the source text. They try to understand the text's content and context and then generate a summary in their own words, paraphrasing the original material to convey its meaning.
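To make the extractive idea concrete, here is a minimal sketch of one common approach, word-frequency scoring: each sentence is scored by how often its non-stopword words appear across the whole text, and the top-scoring sentences are returned in their original order. This is an illustration of the technique, not the code used later in this tutorial. The regex sentence splitter and the short STOPWORDS set are assumptions standing in for NLTK's sent_tokenize and stopwords corpus, chosen so the sketch runs without downloading any NLTK data.

```python
import re
from collections import Counter

# Small illustrative stopword set (assumption; NLTK's stopwords corpus is far larger).
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were", "of",
    "to", "in", "on", "for", "with", "it", "this", "that", "as", "by", "be",
}

def split_sentences(text):
    """Naive splitter (stand-in for sent_tokenize): break after . ! ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def words_of(sentence):
    """Lowercased alphabetic words; punctuation and digits are ignored."""
    return re.findall(r"[a-z]+", sentence.lower())

def extractive_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    # Count how often each content word occurs in the whole text.
    freq = Counter(w for s in sentences for w in words_of(s) if w not in STOPWORDS)
    if not freq:
        return " ".join(sentences[:num_sentences])
    # Score each sentence by the summed frequency of its content words.
    scores = {
        i: sum(freq[w] for w in words_of(s) if w not in STOPWORDS)
        for i, s in enumerate(sentences)
    }
    # Keep the highest-scoring sentences, then restore their original order.
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:num_sentences])
    return " ".join(sentences[i] for i in top)

sample = ("Cats are popular pets. Cats sleep a lot. Dogs bark. "
          "Many people keep cats and dogs as pets.")
print(extractive_summary(sample, num_sentences=2))
# Cats are popular pets. Many people keep cats and dogs as pets.
```

Summing raw frequencies favors longer sentences; dividing each score by the sentence's word count is a common tweak if that bias is unwanted.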

For this project, I will be using the extractive method. In this tutorial, we will explore how to build a simple text summarizer using Python and the Natural Language Toolkit (NLTK). We will go through the code step by step, explaining its functionality and providing examples along the way.

Step 1: Installing and importing the required packages

The first step is to install and import the necessary libraries. The code relies on Jupyter features (the ! shell command and ipywidgets), so open a Jupyter notebook and run the following code:

!pip install nltk ipywidgets
import nltk
from nltk.corpus import stopwords, wordnet  # wordnet is used for synonym lookup
from nltk.tokenize import word_tokenize, sent_tokenize
import ipywidgets as widgets
from IPython.display import display

This section installs NLTK and imports the modules the app needs: nltk for natural language processing, stopwords and wordnet from nltk.corpus (WordNet is used for synonym retrieval), word_tokenize and sent_tokenize from nltk.tokenize for word and sentence tokenization, ipywidgets for creating the GUI widgets, and IPython.display for displaying them. If some of these terms are unfamiliar, I have written an article explaining basic NLP concepts.

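nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer tables required by word_tokenize in newer NLTK releases
nltk.download('wordnet')

These lines download the NLTK resources needed for tokenization (punkt, plus punkt_tab on newer NLTK versions) and the WordNet lexical database.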

def paraphrase_text(b):
    # b is the Button instance that was clicked (passed in by on_click)
    text = input_text.value.strip()

This line defines the paraphrase_text function, which runs when the “Paraphrase” button is clicked; ipywidgets passes the clicked button to it as the argument b. The function retrieves the text typed into the input_text widget and removes any leading or trailing whitespace.

    words = word_tokenize(text)
    paraphrased_text = []

These lines tokenize the input text into words using the word_tokenize function and initialize an empty list, paraphrased_text, to store the paraphrased words.
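word_tokenize does more than split on spaces: it separates punctuation into its own tokens, so a later lookup sees "time" rather than "time,". The helper below, rough_word_tokenize, is a hypothetical and much simpler regex approximation of that behavior (not NLTK's Treebank-based rules), included only to show the difference from a plain str.split().

```python
import re

def rough_word_tokenize(text):
    """Rough stand-in for nltk.word_tokenize (illustration only).

    Runs of letters, digits, apostrophes or hyphens form a word; every other
    non-space character becomes its own token.
    """
    return re.findall(r"[\w'-]+|[^\w\s]", text)

sentence = "Summaries save time, don't they?"
print(sentence.split())
# ['Summaries', 'save', 'time,', "don't", 'they?']
print(rough_word_tokenize(sentence))
# ['Summaries', 'save', 'time', ',', "don't", 'they', '?']
```

Unlike this sketch, NLTK's word_tokenize also splits contractions, turning "don't" into "do" and "n't".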

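    for word in words:
        synonyms = wordnet.synsets(word)
        if synonyms:
            # First lemma of the first synset; WordNet joins multi-word lemmas with "_"
            paraphrased_word = synonyms[0].lemmas()[0].name().replace('_', ' ')
            paraphrased_text.append(paraphrased_word)
        else:
            # No WordNet entry (e.g. punctuation): keep the original token
            paraphrased_text.append(word)

These lines iterate over each word in the input text and look up its synsets (groups of synonymous words) with WordNet's synsets function. If any synsets exist, the first lemma of the first synset becomes the paraphrased word, with underscores in multi-word lemma names replaced by spaces; otherwise, the original word is kept. The paraphrased words are collected in the paraphrased_text list. Because the first lemma is often the word itself, many words come back unchanged.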
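The loop is a lookup-with-fallback: use the first synonym if one exists, otherwise keep the word. The sketch below reproduces that logic against a small hand-written SYNONYMS dictionary, a hypothetical stand-in for WordNet (its entries are made up for illustration), so it runs without any NLTK data. Because WordNet lemma names join multi-word expressions with underscores (for example, ice_cream), the sketch converts them to spaces before output.

```python
# Hypothetical stand-in for WordNet: the first candidate plays the role of
# "first lemma of the first synset". Entries are illustrative only.
SYNONYMS = {
    "quick": ["speedy", "fast"],
    "dessert": ["ice_cream", "sweet"],
}

def first_synonym(word, synonyms=SYNONYMS):
    """Return the first known synonym of word, or the word itself as a fallback."""
    candidates = synonyms.get(word.lower())
    if candidates:
        # Multi-word lemma names use underscores; turn them into spaces.
        return candidates[0].replace("_", " ")
    return word

def paraphrase_tokens(tokens, synonyms=SYNONYMS):
    """Apply first_synonym to every token, keeping unknown tokens as-is."""
    return [first_synonym(t, synonyms) for t in tokens]

print(paraphrase_tokens(["A", "quick", "dessert", "!"]))
# ['A', 'speedy', 'ice cream', '!']
```

Passing the lookup table as a parameter keeps the substitution logic independent of WordNet, which makes it easy to test or to swap in a different synonym source.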

    paraphrased_sentence = ' '.join(paraphrased_text)
    output_text.value = paraphrased_sentence

These lines join the paraphrased words together using ' '.join() to form a paraphrased sentence. The generated paraphrased sentence is assigned to the value attribute of the output_text widget, which updates the displayed paraphrased text in the GUI.
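Joining tokens with single spaces leaves a gap before every punctuation mark ("Hello , world !"). NLTK provides TreebankWordDetokenizer in nltk.tokenize.treebank for undoing word_tokenize; the dependency-free sketch below shows the basic idea with one regular expression, under the simplifying assumption that only common closing punctuation needs to be reattached. The function name simple_detokenize is hypothetical.

```python
import re

def simple_detokenize(tokens):
    """Join tokens with spaces, then drop the space before closing punctuation.

    Deliberately small: quotes, brackets and contractions are not handled.
    """
    text = " ".join(tokens)
    return re.sub(r"\s+([.,!?;:])", r"\1", text)

tokens = ["Hello", ",", "world", "!"]
print(" ".join(tokens))           # Hello , world !
print(simple_detokenize(tokens))  # Hello, world!
```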

input_text = widgets.Textarea(
    value='',
    placeholder='Enter text to paraphrase...',
    layout={'height': '200px'}
)
display(input_text)

These lines create the input_text Textarea widget, which is used to enter the text to be paraphrased. It is initially empty and shows placeholder text. The widget is displayed in the notebook with a height of 200 pixels.

paraphrase_button = widgets.Button(description='Paraphrase')
paraphrase_button.on_click(paraphrase_text)
display(paraphrase_button)

These lines create the “Paraphrase” button widget using widgets.Button and associate the paraphrase_text function with the on_click event of the button. When the button is clicked, it triggers the paraphrasing process. The button is displayed in the notebook.
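on_click follows the observer (callback) pattern: the button stores the functions registered with it and calls each one with the button instance when it is clicked, which is why paraphrase_text takes a parameter b. The FakeButton class below is a hypothetical, GUI-free stand-in written only to illustrate that flow; it is not part of ipywidgets.

```python
class FakeButton:
    """Minimal stand-in for a clickable widget (illustration only)."""

    def __init__(self, description):
        self.description = description
        self._handlers = []

    def on_click(self, handler):
        # Register a callback; it will receive this button when clicked.
        self._handlers.append(handler)

    def click(self):
        # Simulate a user click by invoking every registered callback.
        for handler in self._handlers:
            handler(self)

clicked = []

def handle(b):
    # b is the button that was clicked, mirroring paraphrase_text(b).
    clicked.append(b.description)

button = FakeButton("Paraphrase")
button.on_click(handle)
button.click()
print(clicked)  # ['Paraphrase']
```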

output_text = widgets.Textarea(
    value='',
    placeholder='Paraphrased text will appear here...',
    layout={'height': '200px'},
    disabled=True
)
display(output_text)

These lines create the output_text Textarea widget, which displays the generated paraphrased text. It is initially empty and shows placeholder text. The widget is disabled, so the user cannot edit its content. It is displayed in the notebook with a height of 200 pixels.

The final product looks like this:

Now let's try the app with a long-form text to see how it performs.

I entered this text:

And it was summarized as follows:

Congratulations! You have successfully built a text summarizer with a GUI using Python, NLTK, and ipywidgets. This version is meant for demonstration, so there is plenty of room for improvement, and other packages can produce better results. The GUI-based approach lets users enter text and obtain a summary with a single click. Feel free to experiment with different texts and customize the GUI to suit your needs.
