Jul 29, 2021

Question generation using NLP

“Question generation using NLP by QuestGen.AI”, by Ramsri Goutham, CTO of QuestGen.AI

QuestGen is an open-source library for generating questions automatically from text. The idea came from the need for a tool that automates the assessment process and helps teachers in their job. From an article or other text, the tool can generate multiple-choice questions (MCQs), true-or-false questions, and FAQs, and it also supports paraphrasing and question answering. Ramsri showed a use case on generating multiple-choice questions with T5 Transformers.

Given an article, for instance, the process follows these steps:

-Summarize the text extractively;

-Identify key sentences/concepts;

-Identify keywords from sentences;

-Form multiple-choice questions;

-Generate distractors.
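
As a rough illustration of the first steps in the list above, here is a minimal sketch of extractive summarization and keyword picking. It scores sentences by plain word frequency, which is a simple stand-in chosen for this example and not necessarily what QuestGen does. The function names, the tiny stopword list, and the sample text are all assumptions made for the sketch.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real pipelines use fuller lists).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and",
             "it", "on", "for", "that", "this", "with", "as", "by"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average corpus frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


def top_keywords(sentences, k=3):
    """Most frequent content words across the selected sentences."""
    freq = Counter(w for s in sentences for w in content_words(s))
    return [w for w, _ in freq.most_common(k)]


text = ("Transformers are neural networks. Transformers use attention. "
        "The weather was nice yesterday.")
summary = summarize(text, max_sentences=2)
keywords = top_keywords(summary, k=1)
```

The keywords picked here would then serve as candidate answers for the question-forming step.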

The T5 Transformer model reframes all NLP tasks into a text-to-text format, where the input and output are always text strings. BERT-style models, by contrast, output either a class label or a span of the input. T5 is an encoder-decoder Transformer model, so given an input text it learns to generate an output text. More precisely, given a context and an answer, it generates a question suited to both.

The model is trained on SQuAD (Stanford Question Answering Dataset), a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answers to the questions are the keywords extracted from the context, and several Python keyword extraction libraries can be used to obtain them.
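
To make the text-to-text framing concrete, here is a minimal sketch that flattens SQuAD-style JSON into (input text, target text) pairs for question generation. The nested `data` / `paragraphs` / `qas` / `answers` layout follows the published SQuAD format. The `context: ... answer: ...` prompt layout, the function name, and the sample record are assumptions for illustration, not something confirmed by the talk.

```python
def squad_to_text_pairs(squad):
    """Flatten SQuAD-style JSON into (input_text, target_text) string pairs.

    The input holds the context and the answer; the target is the question,
    so the model learns to generate a question from context plus answer.
    """
    pairs = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                if not qa["answers"]:
                    continue  # skip questions without a reference answer
                answer = qa["answers"][0]["text"]
                # Assumed prompt layout; any consistent layout would do.
                source = f"context: {context} answer: {answer}"
                pairs.append((source, qa["question"]))
    return pairs


# Invented sample record in SQuAD layout (not taken from the real dataset).
sample = {
    "data": [{
        "title": "Eiffel_Tower",
        "paragraphs": [{
            "context": "The Eiffel Tower is in Paris.",
            "qas": [{
                "id": "example-1",
                "question": "Where is the Eiffel Tower?",
                "answers": [{"text": "Paris", "answer_start": 23}],
            }],
        }],
    }]
}

for source, target in squad_to_text_pairs(sample):
    print(source, "->", target)
```

Both sides of each pair are plain strings, which is what lets a single encoder-decoder model handle the task without a task-specific output head.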

The next step is to determine the correct contextual meaning of a keyword in its sentence, so that suitable distractors (wrong answer choices) can be generated for that meaning. Several algorithms can be used to generate distractors; Ramsri showed WordNet and Sense2Vec.

WordNet is a large lexical database of English that captures broader relationships between words. Sense2Vec captures contextual information about a word and uses it to generate similar words.
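
One common way to get distractors from a WordNet-style hierarchy is to take "siblings" of the keyword: other words that share the same parent concept (hypernym) for the chosen sense. The sketch below shows that idea on a tiny hand-made taxonomy. The data, the sense labels, and the function name are invented for illustration and are not WordNet content. With NLTK's WordNet interface, the corresponding lookups would go through `synset.hypernyms()` and `hypernym.hyponyms()`.

```python
# Toy stand-in for a lexical database (invented data, not real WordNet entries).
# Maps (word, sense label) to its parent concept.
HYPERNYM_OF = {
    ("bat", "animal"): "mammal",
    ("bat", "sports"): "sports_equipment",
    ("lion", "animal"): "big_cat",
}

# Maps each parent concept to its child words.
HYPONYMS_OF = {
    "mammal": ["bat", "whale", "mouse", "rabbit"],
    "sports_equipment": ["bat", "racket", "club", "stick"],
    "big_cat": ["lion", "tiger", "leopard", "jaguar"],
}


def distractors(word, sense, limit=3):
    """Return up to `limit` sibling words sharing the keyword's hypernym."""
    hypernym = HYPERNYM_OF.get((word, sense))
    if hypernym is None:
        return []  # unknown word or sense: no distractors available
    siblings = [w for w in HYPONYMS_OF.get(hypernym, []) if w != word]
    return siblings[:limit]


# The same keyword yields different distractors depending on its sense.
print(distractors("bat", "sports"))
print(distractors("bat", "animal"))
```

This is why picking the right sense first matters: distractors drawn from the wrong sense would be obviously off-topic and make the question too easy.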