Understanding chatbots and how to build one simple chatbot in Python

Understanding chatbots and how to build one simple chatbot in Python

A chatbot is an artificial intelligence powered piece of software in a device, application, web site or alternative networks that try to complete consumer’s needs and then assist them to perform a selected task. Now a days almost every company has a chatbot deployed to interact with the users.


Chatbots are often used in many departments, businesses and every environment. They are artificial narrow intelligence (ANI). Chatbots only do a restricted quantity of task i.e. as per their design. However, these Chatbots make our lives easier and convenient. The trend of Chatbots is growing rapidly between businesses and entrepreneurs, and are willing to bring chatbots to their sites. You might also produce it yourself using Python.


How do chatbots work?

There are broadly two variants of chatbotsRule-Based and Self learning.

  1. In a Rule-based approach, a bot answers questions based on some rules on that it is trained on. The rules outlined could be very easy to very complicated. The bots will handle easy queries but fail to manage complicated ones.
  2. The Self learning bots are those that use some Machine Learning-based approaches and are positively a lot of economical than rule-based bots. These bots may be of additional two types: Retrieval based or Generative.
    1. In retrieval-based models, Chatbot uses the message and context of conversation for selecting the best response from a predefined list of bot messages.
    2. Generative bots can generate the answers and not always reply with one of the answers from a set of answers. This makes them more intelligent as they take word by word from the query and generates the answers.


Building a chatbot using Python


The field of study that focuses on the interactions between human language and computers is called Natural Language Processing. NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. However, if you are new to NLP, you can read Natural Language Processing in Python.



NLTK (Natural Language Toolkit) is a leading platform for building Python programs to work with human language data. It provides easy-to-use lexical resources such as WordNet, along with a suite of text processing libraries.


Importing necessary libraries

import nltk 

import numpy as np 

import random 

import string # to process standard python strings


Copy the content in text file named ‘chatbot.txt’, read in the text file and convert the entire file content into a list of sentences and a list of words for further pre-processing.


f=open('chatbot.txt','r',errors = 'ignore')


raw=raw.lower()# converts to lowercase

nltk.download('punkt') # first-time use only

nltk.download('wordnet') # first-time use only

sent_tokens = nltk.sent_tokenize(raw)# converts to list of sentences 

word_tokens = nltk.word_tokenize(raw)# converts to list of words


Pre-processing the raw text

We shall now define a function called LemTokens which will take as input the tokens and return normalized tokens.


lemmer = nltk.stem.WordNetLemmatizer()

#WordNet is a semantically-oriented dictionary of English included in NLTK.

def LemTokens(tokens):     

return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation) 

def LemNormalize(text):     

return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))


Keyword matching

Define a function for greeting by bot i.e. if user’s input is greeting, the bot shall return a greeting response.

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up","hey",)

GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):

for word in sentence.split():

if word.lower() in GREETING_INPUTS:

return random.choice(GREETING_RESPONSES)


Generate responses

To generate a response from our bot for input queries, the concept of document similarity is used. Therefore, we start by importing necessary modules.

From scikit learn library, import the TFidf vector to convert a collection of raw documents to a matrix of TF-IDF features

from sklearn.feature_extraction.text import TfidfVectorizer

Also, import cosine similarity module from scikit learn library

from sklearn.metrics.pairwise import cosine_similarity

This will be used to find the similarity between words entered by the user and therefore the words within the corpus. This can be the simplest possible implementation of a chatbot.

Define a function response that searches the user’s vocalization for one or more known keywords and returns one of several possible responses. If it doesn’t find the input matching any of the keywords, it returns a response: “I’m sorry! I don’t understand you”


def response(user_response):



TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')

tfidf = TfidfVec.fit_transform(sent_tokens)

vals = cosine_similarity(tfidf[-1], tfidf)


flat = vals.flatten()


req_tfidf = flat[-2]


robo_response=robo_response+"I am sorry! I don't understand you"

return robo_response

else:  robo_response = robo_response+sent_tokens[idx]

return robo_response


I have tried to explain in simple steps how you can build your own chatbot using NLTK and of course it’s not an intelligent one.

I hope you guys have enjoyed reading.

Happy Learning!!!

Saad A.

I am Computer Science graduated developer with experience of 1+ years in this domain. I have great skills of coding at front-end as well as back-end. I can build a product from scratch to production level, and i also have experience of working as a team player with great teams.

Saad A. | Java Developer

What our users are discussing about Chatbot