How To Build Your Own Chatbot Using Deep Learning

A comprehensive step-by-step guide to implementing an intelligent chatbot solution

If you are interested in developing chatbots, you will find that there are many powerful bot development frameworks, tools, and platforms that you can use to implement intelligent chatbot solutions. But how about developing a simple, intelligent chatbot from scratch using deep learning, rather than relying on a bot development framework or any other platform? In this tutorial, you will learn how to develop an end-to-end, domain-specific intelligent chatbot solution using deep learning with Keras.


Concept

Before jumping into the coding section, we first need to understand some design concepts. Since we are going to develop a deep learning based model, we need data to train it. However, because this is a simple chatbot, we are not going to gather or download a large dataset. We can just create our own dataset to train the model. To create this dataset, we need to decide which intents we are going to train on. An “intent” is the intention of the user interacting with a chatbot, or the intention behind each message that the chatbot receives from a particular user. Intents vary from one chatbot solution to another, depending on the domain the chatbot is built for. It is therefore important to choose intents that are relevant to the domain you are going to work with.

Then why do we need to define these intents? That’s a very important point to understand. In order to answer questions, search a domain knowledge base, and perform the various other tasks needed to keep a conversation going, your chatbot needs to understand what users say or what they intend to do. That’s why it needs to identify the intent behind each user message. So how can you make your chatbot understand intents, so that users feel it knows what they want and it provides accurate responses? The strategy here is to define different intents, write training samples for each of them, and train the chatbot model with those samples as the training data (X) and the intents as the training categories (Y).
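As a minimal sketch of that strategy, the snippet below flattens a tiny intent definition into parallel lists of samples (X) and labels (Y). The two-intent dictionary is a made-up illustration, not the dataset used later in this tutorial.

```python
# Hypothetical two-intent dataset, used only to illustrate the X/Y split.
intents = {
    "greeting": ["Hi", "Hello", "Is anyone there?"],
    "goodbye": ["Bye", "See you later"],
}

X = []  # training samples: the example user messages
Y = []  # training categories: the intent each message belongs to

for tag, patterns in intents.items():
    for pattern in patterns:
        X.append(pattern)
        Y.append(tag)

for message, tag in zip(X, Y):
    print(f"{message!r} -> {tag}")
```

Every message appears once in X, and the entry at the same position in Y names its intent. The real dataset is prepared in the same way further down.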


Implementation

Required Packages

The required Python packages are listed below, with the versions I used during development:

tensorflow==2.3.1
nltk==3.5
colorama==0.4.3
numpy==1.18.5
scikit_learn==0.23.2
Flask==1.1.2

Define Intents

I will define a few simple intents, a set of messages that correspond to each intent, and some responses mapped to each intent category. All of this goes into a JSON file named “intents.json”, as follows.


{"intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hey", "Is anyone there?", "Hello", "Hay"],
     "responses": ["Hello", "Hi", "Hi there"]
    },
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later", "Goodbye"],
     "responses": ["See you later", "Have a nice day", "Bye! Come back again"]
    },
    {"tag": "thanks",
     "patterns": ["Thanks", "Thank you", "That's helpful", "Thanks for the help"],
     "responses": ["Happy to help!", "Any time!", "My pleasure", "You're most welcome!"]
    },
    {"tag": "about",
     "patterns": ["Who are you?", "What are you?", "Who you are?"],
     "responses": ["I'm Joana, your bot assistant", "I'm Joana, an Artificial Intelligent bot"]
    },
    {"tag": "name",
     "patterns": ["what is your name", "what should I call you", "whats your name?"],
     "responses": ["You can call me Joana.", "I'm Joana!", "Just call me as Joana"]
    },
    {"tag": "help",
     "patterns": ["Could you help me?", "give me a hand please", "Can you help?", "What can you do for me?", "I need a support", "I need a help", "support me please"],
     "responses": ["Tell me how I can assist you", "Tell me your problem to assist you", "Yes Sure, How can I support you"]
    },
    {"tag": "createaccount",
     "patterns": ["I need to create a new account", "how to open a new account", "I want to create an account", "can you create an account for me", "how to open a new account"],
     "responses": ["You can just easily create a new account from our web site", "Just go to our web site and follow the guidelines to create a new account"]
    },
    {"tag": "complaint",
     "patterns": ["have a complaint", "I want to raise a complaint", "there is a complaint about a service"],
     "responses": ["Please provide us your complaint in order to assist you", "Please mention your complaint, we will reach you and sorry for any inconvenience caused"]
    }
]
}
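Because both training and the response lookup rely on this structure, it can help to check the file before using it. The following is a small sketch of such a check. The function name “validate_intents” and the specific rules it applies are assumptions made for illustration, not part of the original code.

```python
import json


def validate_intents(data):
    """Return a list of human-readable problems found in an intents dict."""
    problems = []
    seen_tags = set()
    for position, intent in enumerate(data.get("intents", [])):
        tag = intent.get("tag")
        if not tag:
            problems.append(f"intent #{position} has no tag")
            continue
        if tag in seen_tags:
            problems.append(f"duplicate tag: {tag}")
        seen_tags.add(tag)
        # Every intent needs at least one training pattern and one response.
        for key in ("patterns", "responses"):
            if not intent.get(key):
                problems.append(f"{tag}: '{key}' is missing or empty")
    return problems


# Small inline sample so the sketch runs without intents.json on disk.
sample = json.loads("""
{"intents": [
    {"tag": "greeting", "patterns": ["Hi", "Hello"], "responses": ["Hello"]},
    {"tag": "goodbye", "patterns": [], "responses": ["Bye"]}
]}
""")

print(validate_intents(sample))
```

To check the real file, load it with “json.load” and pass the resulting dictionary to the function. An empty list means no problems were found.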

Data Preparation

First, we need to import all the required packages.


import json
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder


Now we load the JSON file and extract the required data.

with open('intents.json') as file:
    data = json.load(file)

training_sentences = []  # sample messages from every intent
training_labels = []     # intent tag for each sample message
labels = []              # unique intent tags
responses = []           # response lists, one per intent

for intent in data['intents']:
    for pattern in intent['patterns']:
        training_sentences.append(pattern)
        training_labels.append(intent['tag'])
    responses.append(intent['responses'])

    if intent['tag'] not in labels:
        labels.append(intent['tag'])

num_classes = len(labels)

The “training_sentences” variable holds all the training data (the sample messages in each intent category), and the “training_labels” variable holds the target label corresponding to each training sample.

Then we use the “LabelEncoder” class provided by scikit-learn to convert the target labels into a form the model can understand.


lbl_encoder=LabelEncoder()
lbl_encoder.fit(training_labels)
training_labels=lbl_encoder.transform(training_labels)
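To make the encoding step concrete, here is a plain-Python sketch of what it amounts to for string labels: each distinct tag is mapped to an integer given by its position in the sorted list of unique tags. This mirrors the behaviour of “LabelEncoder” but is not its actual implementation, and the names “raw_labels”, “classes” and “tag_to_id” are illustrative only.

```python
# Plain-Python illustration of label encoding (not scikit-learn's code).
raw_labels = ["greeting", "greeting", "goodbye", "thanks", "goodbye"]

# Sorted unique labels play the role of LabelEncoder's classes_ attribute.
classes = sorted(set(raw_labels))
tag_to_id = {tag: index for index, tag in enumerate(classes)}

encoded = [tag_to_id[tag] for tag in raw_labels]   # like transform
decoded = [classes[index] for index in encoded]    # like inverse_transform

print(classes)
print(encoded)
print(decoded == raw_labels)
```

The reverse mapping is what the chat function later relies on to turn the model's predicted class index back into an intent tag.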

Next, we vectorize our text corpus using the “Tokenizer” class, which also lets us limit the vocabulary size to a defined number. By default, this class removes most punctuation, turns each text into a space-separated sequence of words, and splits those sequences into lists of tokens, which are then indexed (vectorized). We can also set “oov_token”, a placeholder for out-of-vocabulary words (tokens) encountered at inference time.


vocab_size=1000
embedding_dim=16
max_len=20
oov_token="<OOV>"

tokenizer=Tokenizer(num_words=vocab_size, oov_token=oov_token)
tokenizer.fit_on_texts(training_sentences)
word_index=tokenizer.word_index
sequences=tokenizer.texts_to_sequences(training_sentences)
padded_sequences=pad_sequences(sequences, truncating='post', maxlen=max_len)

The “pad_sequences” function makes all the training text sequences the same length.
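The following plain-Python sketch approximates what tokenizing and padding do to a sentence. It is a simplified illustration, not the Keras implementation: the regular expression is a rough stand-in for the Tokenizer's own filtering, the vocabulary limit is not modelled, and the helper names (“simple_tokens”, “build_word_index”, “to_padded”) are invented for this example.

```python
import re
from collections import Counter

OOV = "<OOV>"


def simple_tokens(text):
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_index(sentences):
    counts = Counter(tok for s in sentences for tok in simple_tokens(s))
    # Index 0 is reserved for padding and index 1 for the OOV token;
    # the remaining words are numbered from most to least frequent.
    word_index = {OOV: 1}
    for word, _ in counts.most_common():
        word_index[word] = len(word_index) + 1
    return word_index


def to_padded(text, word_index, max_len):
    ids = [word_index.get(tok, word_index[OOV]) for tok in simple_tokens(text)]
    ids = ids[:max_len]                        # truncating='post': drop the tail
    return [0] * (max_len - len(ids)) + ids    # default padding: zeros in front


sentences = ["Hi", "Is anyone there?", "Hi there"]
word_index = build_word_index(sentences)
print(word_index)
print(to_padded("Is anybody there?", word_index, max_len=5))
```

The word “anybody” never appeared in the training sentences, so it is mapped to the OOV index instead of being dropped. Short inputs are filled with zeros at the front until they reach “max_len”.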

Model Training

Let’s define the neural network architecture for the proposed model. For that, we use the “Sequential” model class of Keras.


model=Sequential()
model.add(Embedding(vocab_size, embedding_dim, input_length=max_len))
model.add(GlobalAveragePooling1D())
model.add(Dense(16, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', 
        optimizer='adam', metrics=['accuracy'])
model.summary()

The “model.summary()” call prints the resulting architecture layer by layer.
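For reference, the number of trainable parameters in this architecture can be worked out by hand. The short calculation below is a sketch that assumes the hyperparameters defined earlier (a vocabulary size of 1000 and an embedding dimension of 16) and the eight intents in “intents.json”. The helper name “dense_params” is only for illustration.

```python
vocab_size = 1000
embedding_dim = 16
num_classes = 8  # assumes the eight tags defined in intents.json


def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units


layer_params = {
    "embedding": vocab_size * embedding_dim,
    "global_average_pooling": 0,  # averaging has nothing to learn
    "dense_1": dense_params(embedding_dim, 16),
    "dense_2": dense_params(16, 16),
    "output": dense_params(16, num_classes),
}

for name, count in layer_params.items():
    print(f"{name:<24}{count:>8}")
print(f"{'total':<24}{sum(layer_params.values()):>8}")
```

The pooling layer averages the embeddings of all tokens in a message into a single 16-dimensional vector, which is why the first dense layer sees 16 inputs regardless of the sequence length.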

Now we are ready to train our model. We can simply call the “fit” method with the training data and labels.

epochs=500
history=model.fit(padded_sequences, np.array(training_labels), epochs=epochs)

After training, it is a good idea to save everything needed at inference time: the trained model, the fitted tokenizer object, and the fitted label encoder object.


import pickle

# to save the trained model
model.save("chat_model")

# to save the fitted tokenizer
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# to save the fitted label encoder
with open('label_encoder.pickle', 'wb') as enc_file:
    pickle.dump(lbl_encoder, enc_file, protocol=pickle.HIGHEST_PROTOCOL)

Inference

Okay! Now it’s time to check how our model performs. 😊 We are going to implement a chat function to engage with a real user. When a new user message is received, the chatbot converts it into a padded sequence and passes it to the trained model, which outputs a confidence score for each intent category. The message is then assigned to the intent with the highest confidence score, and a response is picked at random from that intent.
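The selection step can be sketched in isolation as follows. The scores here are made up rather than produced by the trained model. The “threshold” and “fallback” parameters are illustrative additions: the chat function in this tutorial always answers with the top-scoring intent, whereas this sketch also shows one possible way to decline when the model is unsure.

```python
import random


def pick_response(scores, class_names, responses_by_tag, threshold=0.5,
                  fallback="Sorry, I did not understand that."):
    """Choose a reply for the intent with the highest score.

    `threshold` and `fallback` are hypothetical extras, not part of the
    tutorial's chat function.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None, fallback
    tag = class_names[best]
    return tag, random.choice(responses_by_tag[tag])


# Made-up softmax outputs for three intents.
class_names = ["goodbye", "greeting", "thanks"]
responses_by_tag = {
    "goodbye": ["See you later"],
    "greeting": ["Hello", "Hi there"],
    "thanks": ["Happy to help!"],
}

print(pick_response([0.05, 0.90, 0.05], class_names, responses_by_tag))
print(pick_response([0.34, 0.33, 0.33], class_names, responses_by_tag))
```

The full chat loop, using the saved model, tokenizer and label encoder, looks like this: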


import json
import pickle
import random

import numpy as np
from tensorflow import keras

import colorama
from colorama import Fore, Style

colorama.init()

with open("intents.json") as file:
    data = json.load(file)


def chat():
    # load trained model
    model = keras.models.load_model('chat_model')

    # load tokenizer object
    with open('tokenizer.pickle', 'rb') as handle:
        tokenizer = pickle.load(handle)

    # load label encoder object
    with open('label_encoder.pickle', 'rb') as enc:
        lbl_encoder = pickle.load(enc)

    # parameters
    max_len = 20

    while True:
        print(Fore.LIGHTBLUE_EX + "User: " + Style.RESET_ALL, end="")
        inp = input()
        if inp.lower() == "quit":
            break

        # convert the message to a padded sequence, as done for training
        sequence = tokenizer.texts_to_sequences([inp])
        padded = keras.preprocessing.sequence.pad_sequences(
            sequence, truncating='post', maxlen=max_len)
        result = model.predict(padded)

        # inverse_transform returns an array, so take its only element
        tag = lbl_encoder.inverse_transform([np.argmax(result)])[0]

        # reply with a random response from the predicted intent
        for i in data['intents']:
            if i['tag'] == tag:
                print(Fore.GREEN + "ChatBot:" + Style.RESET_ALL,
                      random.choice(i['responses']))


print(Fore.YELLOW + "Start messaging with the bot (type quit to stop)!" + Style.RESET_ALL)
chat()

If training went well, the bot should answer each message with a response from the matching intent.

Integration With Chat Applications

You can also integrate your trained chatbot model with any other chat application to make it more effective with real-world users.

I have already developed an application using Flask and integrated this trained chatbot model into it.
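The author's Flask application is not shown here, so the following is only a minimal sketch of how such an integration might look. The “/chat” endpoint, the JSON field names, and the helpers “build_reply”, “create_app” and “fake_predict_tag” are all assumptions made for illustration. In a real deployment, “predict_tag” would wrap the saved model, tokenizer and label encoder.

```python
import random


def build_reply(message, predict_tag, intents):
    """Map a user message to a response using a tag-prediction callable."""
    tag = predict_tag(message)
    for intent in intents:
        if intent["tag"] == tag:
            return {"tag": tag, "reply": random.choice(intent["responses"])}
    return {"tag": None, "reply": "Sorry, I did not understand that."}


def create_app(predict_tag, intents):
    # Imported here so build_reply can be used and tested without Flask.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/chat", methods=["POST"])  # hypothetical endpoint name
    def chat():
        payload = request.get_json(silent=True) or {}
        message = payload.get("message", "")
        return jsonify(build_reply(message, predict_tag, intents))

    return app


# Stand-in predictor so the sketch runs without the trained model.
def fake_predict_tag(message):
    return "greeting" if "hi" in message.lower() else "unknown"


demo_intents = [{"tag": "greeting", "responses": ["Hello"]}]
print(build_reply("Hi", fake_predict_tag, demo_intents))
```

Keeping the reply logic in a plain function separates it from the web layer, so the same function could back a command-line loop or a different web framework. Calling “create_app(...)” and then “run()” on the returned app would start Flask's development server.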



Final Thoughts

We discussed how to develop a chatbot model from scratch using deep learning and how to use it to engage with real users. With these steps, anyone can implement their own chatbot for any domain.

As further improvements, you can try the following to enhance performance and add features.

  • Use more data to train: You can add more data to the training dataset. A large dataset with a good number of intents can lead to a more powerful chatbot solution.

  • Apply different NLP techniques: You can add more NLP components to your chatbot, such as NER (Named Entity Recognition), to give it more features. With an NER model alongside your chatbot, you can pick out entities that appear in user messages and use them later in the conversation. You can also add a sentiment analysis model to identify the sentiment behind user messages, which adds some extra color to your chatbot.

  • Try different neural network architectures: You can also try different neural network architectures with different hyperparameters.

  • Add emojis: You can also take emojis into account when building your models.


Source: paper.li
