Conversational Summarization with Natural Language Processing

A Real-World Use Case for NLP, Leveraging T5, PyTorch Lightning and AWS SageMaker

Natural language processing (NLP) is the technology we use to get computers to interact with text data. Popular applications include chat bots, language translation and grammar correction. Traditionally, it’s a field dominated by word-counting techniques like Bag-of-Words (BOW) and Term Frequency-Inverse Document Frequency (TF-IDF), where the goal is to make inferences based on the words present in the data. However, with recent breakthroughs in deep learning, we’re seeing the possibilities in NLP explode.

The Self-Attention Mechanism

Google’s 2017 paper “Attention Is All You Need,” arguably the most influential NLP paper published in the last five years, introduced the “self-attention mechanism.” This mechanism offered a simple solution to the problems the top NLP machine learning models were facing. At that time, the state-of-the-art NLP models were sequence-to-sequence (seq2seq) Recurrent Neural Networks (RNNs), which performed well on short text strings but struggled mightily on long blocks of text. They were also very costly to train. Self-attention replaced recurrence: it is cheaper to compute and extends the length of text that can be processed.

Possibly the biggest benefit of self-attention is how easily it scales. Multiple self-attention heads can be run in parallel and computed at once in what’s known as the “multi-headed attention mechanism,” the core of present-day machine learning in NLP.
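
To make the mechanism concrete, below is a minimal sketch of scaled dot-product self-attention in PyTorch. The tensor shapes and projection matrices are purely illustrative; in practice this logic lives inside optimized layers such as torch.nn.MultiheadAttention, shown at the end.

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model) token embeddings
    # w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    q = x @ w_q                               # queries
    k = x @ w_k                               # keys
    v = x @ w_v                               # values
    d_k = q.size(-1)
    # Every token attends to every other token in a single matrix multiply,
    # so there is no recurrence and long sequences are processed in parallel.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Multi-headed attention runs several such projections in parallel and
# concatenates the results; PyTorch ships it as a built-in layer.
mha = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(2, 10, 512)                   # (batch, seq_len, d_model)
out, attn_weights = mha(x, x, x)              # self-attention: query = key = value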

BERT And Pals

Perhaps the most famous deep learning model ever released, Bidirectional Encoder Representations from Transformers (BERT) is built from stacked transformer encoder blocks, each of which combines multi-headed attention layers with a feed-forward network. The release of BERT started a revolution in NLP, similar to the one deep learning sparked in computer vision in the 2010s. BERT showed the possible applications of transformer models in almost every NLP task, including classification, named-entity recognition and sentiment analysis.

Since BERT, we’ve seen an unending supply of new transformer models, with each one surpassing the previous on at least one NLP task. In this article, we’ll focus on building a conversation summarization model using T5. I highly encourage you to continue researching fascinating models like RoBERTa from Facebook AI and GPT-3 from OpenAI.
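
As a quick preview of where we’re headed, the Hugging Face transformers library exposes pretrained T5 checkpoints behind a one-line summarization pipeline. The checkpoint name and the toy dialogue below are purely illustrative, not part of the final model.

from transformers import pipeline

# "t5-small" is the smallest public T5 checkpoint; larger ones (t5-base,
# t5-large) summarize better at the cost of memory and speed.
summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")

dialogue = (
    "Anna: Are we still on for lunch tomorrow? "
    "Ben: Yes! Noon at the usual place. "
    "Anna: Perfect, see you then."
)
print(summarizer(dialogue, max_length=30, min_length=5)[0]["summary_text"])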

Summarizing Conversations Walkthrough

Today, everyone is communicating more than ever. Between social media, text messages and email, it can be difficult to keep up and filter out the noise. To remedy this, I’ll use T5 to create a model that summarizes conversations.

I’ll do so by leveraging the following technologies (a sketch of how they fit together on SageMaker follows the list):

  • AWS SageMaker

  • AWS S3

  • AWS CloudWatch

  • PyTorch

  • PyTorch Lightning

  • HuggingFace/Transformers
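
For context on how these pieces fit together, here’s a rough sketch of launching the training job with the SageMaker Python SDK. The entry-point script, S3 path, IAM role, instance type and framework versions are all placeholders or assumptions you’d adjust for your own account.

from sagemaker.pytorch import PyTorch

# train.py is a hypothetical entry point containing the PyTorch Lightning
# training loop; SageMaker runs it inside a managed PyTorch container.
estimator = PyTorch(
    entry_point="train.py",
    source_dir="src",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",    # single-GPU instance (assumption)
    framework_version="1.8.0",        # PyTorch version (assumption)
    py_version="py36",
    hyperparameters={"epochs": 3, "batch_size": 4},
)

# Training data is read from S3, and the job's stdout/stderr is streamed
# to CloudWatch Logs automatically.
estimator.fit({"train": "s3://my-bucket/samsum/train/"})  # placeholder bucket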


Preparing The Dataset

Obtaining a proper dataset is one of the hardest tasks in modern NLP, as it often involves very time-intensive manual labeling and, in our case, summarization by hand. Fortunately, the models I’ll use thrive on “transfer learning”: the process of training a model on one dataset and then training it again on another. The second round of training is often referred to as “fine-tuning.”
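
In practice, transfer learning here just means starting from a published T5 checkpoint and continuing training on our own summarization data. Here’s a minimal sketch with the transformers library (the checkpoint name and the toy example are illustrative):

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load weights pre-trained on a large general-purpose corpus.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Fine-tuning: the same weights keep updating, now on dialogue/summary pairs.
inputs = tokenizer(
    "summarize: Anna: Lunch at noon tomorrow? Ben: Works for me.",
    return_tensors="pt",
)
labels = tokenizer(
    "Anna and Ben will have lunch at noon tomorrow.",
    return_tensors="pt",
).input_ids
loss = model(**inputs, labels=labels).loss  # cross-entropy loss to backpropagate
loss.backward()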

To begin, I’ll first train my model on the SAMSum corpus, a dataset of more than 16,000 conversations between two people that have already been summarized by humans.
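
The corpus is also available through the Hugging Face datasets library, which is the easiest way to pull it into a training script. The dataset identifier below reflects how it has been hosted on the Hugging Face Hub, and the 7z-compressed archive may require installing the py7zr package.

from datasets import load_dataset

# Download the SAMSum corpus from the Hugging Face Hub.
samsum = load_dataset("samsum")

print(samsum)              # train / validation / test splits
print(samsum["train"][0])  # one dialogue paired with its human-written summary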

Listing 1: Example Data Point

{
    'id': '13818513',