gunthercox / chatterbot

ChatterBot is a machine learning, conversational dialog engine for creating chat bots

Home Page: https://chatterbot.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

chatterbot machine-learning chatbot python conversation language bot


chatterbot's Issues

Python 3 and python-oauth2

Right now the tests are triggering an error on Travis because you are using python-oauth2, which is a library for OAuth 1 that does not support Python 3.

You may want to look into oauthlib for generating the tokens, and requests-oauthlib for sending signed HTTP requests. It supports both OAuth 1 and OAuth 2.

Corpus for conversation training

It would be useful to have a corpus available to use for training ChatterBot instances. Here are a few possible sources to investigate.

via http://opendata.stackexchange.com/questions/5589/movie-script-database/5593

I'm also considering passing off the task of training to a Training class of which there can be different types that can handle data in different formats: csv, json, plain text, etc.
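A minimal sketch of the Training class idea, assuming one small class per input format. The class names (`Trainer`, `PlainTextTrainer`, `JsonTrainer`, `CsvTrainer`) and the `parse` method are hypothetical and not part of ChatterBot; each one only turns raw text into a list of statements that could then be passed on for training.

```python
import csv
import io
import json


class Trainer:
    """Hypothetical base class: subclasses turn raw text into statements."""

    def parse(self, raw):
        raise NotImplementedError


class PlainTextTrainer(Trainer):
    def parse(self, raw):
        # One statement per non-empty line.
        return [line.strip() for line in raw.splitlines() if line.strip()]


class JsonTrainer(Trainer):
    def parse(self, raw):
        # Assumes the document is a JSON array of strings.
        return [str(item) for item in json.loads(raw)]


class CsvTrainer(Trainer):
    def parse(self, raw):
        # Assumes the statement text is the first column of each row.
        return [row[0] for row in csv.reader(io.StringIO(raw)) if row]


conversation = PlainTextTrainer().parse("Hello\n\nHi there\n")
```

Keeping the parsing separate from the bot would let new formats be added without touching the training logic itself.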

Add chat bot name to the constructor

Google hangouts adapter

It would be really cool to have an IO adapter for ChatterBot that makes it possible for a chatbot instance to communicate through the Google Hangouts API.

It may be useful to look at https://github.com/hangoutsbot/hangoutsbot

Pull requests welcomed if anyone is interested.

Pip install not finding 'requirements.txt'

I'm getting the error "No such file or directory: 'requirements.txt'" when running pip install chatterbot. I think this might be because you don't include 'requirements.txt' in your MANIFEST.in file.

An apostrophe is showing up at the end of bot generated text.

Support additional data on response object

Serialized response lists should be a list of JSON objects so that additional data can be added in the future.

Current format:

["Response text", 4]

New format:

{
    "text": "Response text",
    "occurrence": 4
}
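Existing databases would need their entries converted. A rough sketch of that conversion, assuming the two formats shown above; the helper name `upgrade_response` is made up for illustration.

```python
def upgrade_response(entry):
    """Convert a legacy [text, occurrence] pair into the dictionary format."""
    if isinstance(entry, dict):
        return entry  # already in the new format
    text, occurrence = entry
    return {"text": text, "occurrence": occurrence}


legacy = [["Response text", 4], ["Another response", 1]]
upgraded = [upgrade_response(entry) for entry in legacy]
```

Passing dictionaries through unchanged means the conversion can be run more than once on the same data without harm.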

Markov chain based response algorithm

ChatterBot allows alternate statement selection algorithms to be passed into its constructor. The default selection algorithm is engram, which looks for the closest existing match to a statement in the database and then returns a known response to that statement.

At the moment there is a placeholder file in chatterbot/algorithms/markov.py that is intended to be created as an option for selecting statements.

The markov algorithm would retrieve a list of statements that can be recognized as matches to the input text. A Markov chain based algorithm can then be used to build a new response based on the collection of matching statements.

Ideally, a method to validate the grammatical correctness of the newly created statement could also be created. It may be useful to look into NLTK to determine this validation.
I'm open to suggestions on how to determine the number of words required to satisfy the markov algorithm.
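As a starting point for discussion, here is a minimal first-order (one word of context) Markov chain sketch. It is not the contents of chatterbot/algorithms/markov.py; the function names, the start and end markers, and the sample statements are all assumptions, and it does no grammar validation.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"


def build_chain(statements):
    """Map each word to the list of words that followed it in the statements."""
    chain = defaultdict(list)
    for statement in statements:
        words = [START] + statement.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=20):
    """Walk the chain from START until END or until max_words is reached.

    Assumes the chain was built from at least one statement.
    """
    words = []
    current = START
    while len(words) < max_words:
        current = rng.choice(chain[current])
        if current == END:
            break
        words.append(current)
    return " ".join(words)


# Stand-ins for the statements matched against the input text.
matches = ["I am doing well", "I am good", "doing well thank you"]
chain = build_chain(matches)
response = generate(chain, random.Random(0))
```

Using more than one word of context would give more coherent output but needs more matching statements, which is the trade-off behind the question about how many words are required.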

corpus data dne?

When I try to train using the example in the readme, I get this error:

code:

#!/usr/bin/env python
from chatterbot import ChatBot

bot = ChatBot("Terminal",
    storage_adapter="chatterbot.adapters.storage.JsonDatabaseAdapter",
    logic_adapter="chatterbot.adapters.logic.ClosestMatchAdapter",
    io_adapter="chatterbot.adapters.io.TerminalAdapter",
    database="database.db")

bot.train("chatterbot.corpus.english.greetings")

output:

[pons@australis-aurora code]$ ./chatting.py 
Traceback (most recent call last):
  File "./chatting.py", line 10, in <module>
    bot.train("chatterbot.corpus.english.greetings")
  File "/usr/lib64/python2.7/site-packages/chatterbot/chatterbot.py", line 152, in train
    self.trainer.train_from_corpora(corpora)
  File "/usr/lib64/python2.7/site-packages/chatterbot/training.py", line 34, in train_from_corpora
    corpus_data = self.corpus.load_corpus(corpus)
  File "/usr/lib64/python2.7/site-packages/chatterbot/corpus/corpus.py", line 55, in load_corpus
    corpus = self.read_corpus(corpus_path)
  File "/usr/lib64/python2.7/site-packages/chatterbot/corpus/corpus.py", line 30, in read_corpus
    with open(file_name) as data_file:    
IOError: [Errno 2] No such file or directory: '/usr/lib64/python2.7/site-packages/chatterbot/corpus/data/english/greetings'

is something just not installing?

Bot starts running slowly after several hundred responses

On my computer the bot delays a few seconds between inputs when I train it with 1000 example lines. It gets worse the more training data there is. Is there any way to speed up the performance, or is this just the nature of the beast?

Chatterbot slows down on large databases

My plan was to make an IRC bot that would learn from logs and then sit in the channel.

I fed it about 25k lines of logs from a large log file, and then I decided to test it out.

I asked it something and waited for it to respond. It was using about 40% CPU, and after an hour it still had not responded.

Also, I am on PyPy, so the default json library is already very fast.

IRC adapter

It would be interesting to have an IO adapter for ChatterBot that makes it possible for an instance of the program to communicate through an IRC client. I don't have a lot of experience with IRC, but this would be a great addition and a pull request would be welcomed if anyone is ever interested.

Use occurrence count of statements as a weighting factor to determine response text

ChatterBot currently keeps count of the number of times it receives a particular statement as input. This count is not being used anywhere at the moment; however, it would be useful to use it as a weighting factor to determine what response should be returned. Statements that occur frequently should be returned at a roughly equal frequency.

Training should be modified to increment this count so that desirable responses can be reinforced.
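One possible way to use the count as a weight is weighted random selection, sketched below. The candidate dictionaries with `text` and `occurrence` keys and the function name `choose_response` are assumptions made for the example, not the current storage format.

```python
import random


def choose_response(candidates, rng=random):
    """Pick one response text, weighted by how often each has occurred.

    At least one candidate must have an occurrence count above zero.
    """
    texts = [candidate["text"] for candidate in candidates]
    weights = [candidate["occurrence"] for candidate in candidates]
    return rng.choices(texts, weights=weights, k=1)[0]


candidates = [
    {"text": "Hello", "occurrence": 9},
    {"text": "Hey", "occurrence": 1},
]
reply = choose_response(candidates)
```

With these counts "Hello" would be returned roughly nine times out of ten, so frequent statements come back about as often as they were seen.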

Closest Meaning Logic Adapter improvements

The Closest Meaning Logic Adapter currently only checks the path similarity between the first synset for each word (synset1[0].path_similarity(synset2[0])). This heuristic might be more accurate if it selected the pair of synsets for the two words with the maximum similarity (that is, the shortest path).

Because this operation might greatly increase the amount of time required to process each result, it may be useful to look into the possibility of caching logical evaluations made by this adapter on the statement object when it is saved to the database. Then, the check would only need to be processed if the value of the overall synonymous meaning between two statements had not already been evaluated.
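A rough sketch of both ideas follows: take the best score over every synset pair, and cache the result per word pair. The cache here is a plain in-memory dictionary rather than a value stored on the statement object, and the names `max_pair_similarity` and `CachedComparison` are hypothetical. The lookup and similarity functions are passed in so the example runs without WordNet data; with NLTK they could be `wordnet.synsets` and `lambda a, b: a.path_similarity(b)`.

```python
from itertools import product


def max_pair_similarity(synsets1, synsets2, similarity):
    """Return the best similarity over every pair of synsets."""
    scores = [similarity(a, b) for a, b in product(synsets1, synsets2)]
    # path_similarity can return None when no path exists, so drop those.
    scores = [score for score in scores if score is not None]
    return max(scores, default=0.0)


class CachedComparison:
    """Cache word-pair results so each pair is only evaluated once."""

    def __init__(self, lookup, similarity):
        self.lookup = lookup          # word -> list of synsets
        self.similarity = similarity  # (synset, synset) -> float or None
        self.cache = {}

    def compare(self, word1, word2):
        # Assumes the similarity measure is symmetric.
        key = tuple(sorted((word1, word2)))
        if key not in self.cache:
            self.cache[key] = max_pair_similarity(
                self.lookup(word1), self.lookup(word2), self.similarity
            )
        return self.cache[key]


# Toy stand-ins for synsets: numbers, compared by distance.
toy_synsets = {"dog": [1, 5], "cat": [9, 5]}
comparison = CachedComparison(
    toy_synsets.get, lambda a, b: 1.0 / (1 + abs(a - b))
)
best = comparison.compare("dog", "cat")
```

In the toy data the first synsets of each word are far apart, so comparing only the first pair would score low, while the all-pairs search finds the exact match between the second synsets.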

Turing Test

I would like to allow this program to begin to address the Turing test concept.

A few common ways that users from a study attempted to determine if an entity was a computer or a person included the following:

Ask name
Ask gender
Notice repeated information
Ask questions repeatedly to see if different answers are given
Try to get the entity to contradict itself
Ask math questions
Length of time taken to respond

I do not plan to immediately address the last two bullets regarding math and speed. The math questions inherently suggest that the chat bot should get some math problems wrong in order to seem more human. My view on this is that it is more useful to have a bot that is good at math, which could just as easily be a human. Also, the amount of time taken to respond is not critical at this point but could be easily added if needed.

A great selection of Turing test questions: http://greatbird.com/turing/

Additional response algorithms

These are additional methods for returning responses from ChatterBot. These methods will increase the accuracy of any possible output that the program can provide. For example, questions about a specific subject will have to be processed much differently than simple greetings. Some algorithms that would be useful to implement may include the following:

find_a_name() A method designed to determine the most likely answer to a question regarding a name of a person, place, etc.
Example input: "Who was the president of the United States in 1953?"
Example output: "Dwight D. Eisenhower"
evaluate_mathematically() A method which checks an input value for any references to mathematical operations. If they exist, it will attempt to return a solution. This method should be able to evaluate both words and mathematical characters.
Example input: "What is 2 * the square root of 4?"
Example output: "4"
custom_methods() This would allow a user to ask the bot to run a specific command, for instance entering "Do I have any new messages?" could check several services for new notifications. This would be based on loading in a reference to a third party method upon initialization of the program. (Accomplished through the addition of the adapter system)

small_truths() This idea suggests that by knowing a collection of true facts about everyday items, the program will be able to determine the truth of more complex statements. For instance, note the statement "can a can can cans". As a human we are able to determine that a can is a metal container which cannot engage in the act of canning because it is a non-mechanized inanimate object. A computer on the other hand would find this statement more challenging to decipher. Microsoft Word 2010 detects this statement as incorrect http://imgur.com/crDA3eh.
~~When the program receives an input item, it will process the input using each of the different algorithms and saving the value returned from each.~~ This will be addressed in #85

Compare each answer to known questions and answers. There will need to be a way to calculate how successful an algorithm is at generating the appropriate response. The result of this calculation will be used to determine the most appropriate response to return. To do this, the result of each algorithm which was run on the input statement will be compared to training text statements which most closely resemble the initial question. The closest matching response should be the result which has the closest matching question.

Future optimization
Further optimization may be possible by performing preliminary checks to determine if certain algorithms do not need to be run when evaluating an input item. An example of this might be that an algorithm designed to extract and interpret mathematical operations in an input statement will not need to be executed if the input contains no numeric values or mathematical characters.
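A small sketch of that preliminary check, combined with a restricted arithmetic evaluator. This is only an illustration: it handles digits and the four basic operators, does not understand words such as "square root", and the helper names `might_contain_math` and `evaluate` are assumptions (only `evaluate_mathematically` is named above).

```python
import ast
import operator
import re

OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def might_contain_math(text):
    """Cheap preliminary check: skip the algorithm if there are no digits."""
    return re.search(r"\d", text) is not None


def evaluate(expression):
    """Evaluate basic arithmetic without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


def evaluate_mathematically(text):
    if not might_contain_math(text):
        return None  # nothing numeric, so the algorithm is not run
    # Take the first run of digits, operators, and parentheses.
    match = re.search(r"[\d(][\d\s.+\-*/()]*", text)
    if match is None:
        return None
    try:
        return evaluate(match.group().strip())
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None  # not a complete expression this sketch can handle
```

For example, `evaluate_mathematically("What is 2 * 3 + 4?")` evaluates the extracted expression `2 * 3 + 4`, while a plain greeting is rejected by the digit check before any parsing happens.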

MongoDB few questions

A few questions about what happens once MongoDB replaces the flat-file conversations:

Does everyone who uses this project need to install their own instance of MongoDB and run mongod.exe on their machine?
Will it replace flat files, or can we have both flat files and MongoDB, so that someone who doesn't have the database installed can still use flat-file conversations?
Do we still need to specify a log files directory?
How are we going to add default conversations to MongoDB for every new installation?

Create a method for returning richer result data

This would be a useful feature for connecting the chat bot to APIs that need its output.
It may be useful to change the class to include one method that returns just a string based on the input
and another method that returns richer data in the form of a dictionary.
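A hedged sketch of how the two methods could relate, with the string method built on top of the dictionary method. The class `ResponseBot`, the method name `get_response_data`, and the dictionary keys are hypothetical; the lookup table stands in for the real response selection.

```python
class ResponseBot:
    """Hypothetical bot that answers from a fixed lookup table."""

    def __init__(self, responses):
        self.responses = responses

    def get_response_data(self, text):
        # Richer result: the reply plus details an API client might want.
        return {
            "input": text,
            "text": self.responses.get(text, ""),
            "matched": text in self.responses,
        }

    def get_response(self, text):
        # Plain string result, derived from the richer data.
        return self.get_response_data(text)["text"]


bot = ResponseBot({"Hello": "Hi there"})
data = bot.get_response_data("Hello")
reply = bot.get_response("Hello")
```

Deriving the string from the dictionary keeps the two methods from drifting apart as more fields are added.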

ValueError from converstation.py, line 100, need more than 3 values to unpack

I have never used GitHub, so sorry if this is the wrong place or format for this.

It seems that chatterbot is creating conversation files that contain an empty line at the end, which converstation.py tries to read. My fix was just to check that the line is not empty:

        # Continue only if the file contains lines
        if lines:
            previous_statement = None
            for line in lines:
                if line:
                    user, date, text = line

How to make the chatbot remember?

Hi, I am using your chatbot, but it only uses the engrams. It does not save anything I say; it just follows the engrams conversation. How do I make it remember the conversation and create an engram, so I can train it through talking, like Cleverbot?

Basic usage : AttributeError: 'module' object has no attribute 'english'

In [5]: chatbot.train('chatterbot.corpus.english')

AttributeError Traceback (most recent call last)
in ()
----> 1 chatbot.train('chatterbot.corpus.english')

/usr/lib64/python2.7/site-packages/chatterbot/chatterbot.pyc in train(self, conversation, *args, **kwargs)
150
151 if corpora:
--> 152 self.trainer.train_from_corpora(corpora)
153 else:
154 self.trainer.train_from_list(conversation)

/usr/lib64/python2.7/site-packages/chatterbot/training.pyc in train_from_corpora(self, corpora)
31 def train_from_corpora(self, corpora):
32 for corpus in corpora:
---> 33 corpus_data = load_corpus(corpus)
34 for data in corpus_data:
35 for pair in data:

/usr/lib64/python2.7/site-packages/chatterbot/corpus/utils.pyc in load_corpus(corpus_path)
16 from types import ModuleType
17
---> 18 corpus = import_module(corpus_path)
19
20 if isinstance(corpus, ModuleType):

/usr/lib64/python2.7/site-packages/chatterbot/utils/module_loading.pyc in import_module(dotted_path)
12 module = importlib.import_module(module_path)
13
---> 14 return getattr(module, module_parts[-1])

AttributeError: 'module' object has no attribute 'english'

Conversation tracking

Currently, ChatterBot only responds to the last statement that was entered. For better conversations and more accurate responses it would be useful to track the last statements that were entered in a given conversation and use these details to determine what to say next.

Should the chatterbot's responses to the user be remembered as well? In the past these were stored in the database along with user inputs, however this caused issues with the chat bot learning the wrong output because it was recalling its own responses.

Research how the weight of a statement in a conversation changes as the conversation goes on.
Use statements from the current conversation to help determine the most appropriate response.
Statements decrease in weight as time increases.
Add set_persona method & set_user_persons() methods

It might be interesting to investigate the NLTK PositiveNaiveBayesClassifier for determining if a statement shares a subject with a list of past statements.
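For the idea that statements decrease in weight as time increases, one simple option is exponential decay, sketched here. The half-life value and the function name `statement_weights` are assumptions chosen for illustration; the right decay curve is exactly what the research item above would need to determine.

```python
def statement_weights(ages_in_seconds, half_life=60.0):
    """Weight each past statement by its age, halving every half_life seconds."""
    return [0.5 ** (age / half_life) for age in ages_in_seconds]


# Ages of the three most recent statements in a conversation.
weights = statement_weights([0, 60, 120])
```

A statement just entered gets full weight, one from a minute ago gets half, and one from two minutes ago gets a quarter.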

for any input bot gives output as ('bot', 'No possible replies could be determined.')

from chatterbot import Terminal
terminal = Terminal()
terminal.log_directory="D:/python_logs/chatterbot/conversation_engrams"
terminal.begin()

For any input I just get "No possible replies could be determined." How can I fix this?

hi
('bot', 'No possible replies could be determined.')
how r u ?
('bot', 'No possible replies could be determined.')

default database.db gives KeyError: 'in_response_to'

Hi, when I tried running ChatterBot with the default database, it exited with KeyError: 'in_response_to'

  File "terminal_example.py", line 36, in <module>
    bot_input = bot.get_response(user_input)
  File "/usr/local/lib/python2.7/dist-packages/ChatterBot-0.2.5-py2.7.egg/chatterbot/chatterbot.py", line 118, in get_response
    in_response_to__contains=closest_match.text
  File "/usr/local/lib/python2.7/dist-packages/ChatterBot-0.2.5-py2.7.egg/chatterbot/adapters/storage/jsondatabase.py", line 56, in filter
    if self._all_kwargs_match_values(kwargs, values):
  File "/usr/local/lib/python2.7/dist-packages/ChatterBot-0.2.5-py2.7.egg/chatterbot/adapters/storage/jsondatabase.py", line 37, in _all_kwargs_match_values
    if kwarguments[kwarg] not in values[kwarg_parts[0]]:
KeyError: 'in_response_to'

Need to create a contributing.md file

which dialogue algorithm is adopted in this project?

I am curious about which of the following features are available:

enriching or at least retaining a context during a dialogue
learning from humans
making prediction or/and deduction based on available knowledge

Another question: is it just a simple statement matcher in which the user's question and the trained question have to be exactly the same?

For instance, suppose i trained the bot with:
q: "where is the post office? "
a: "it is right behind you"

and user may ask like this:
q: "looking for the post office"

What will the chatbot give to the user?

How to stop a terminal chat conversation with the bot?

This might not be a bug but rather a question regarding the bot. Can you advise how to finish the chat with the bot from the terminal?

I mean, do we need to terminate the program, or is there another way, like saying bye or something like that?

Response list structure

Each statement object that ChatterBot stores contains an attribute for referencing each statement that it has been used to respond to.

In the future, the in_response_to list should become a dictionary of the response statements with a value of the number of times each statement has occurred. This should make selecting likely responses more accurate because we can see how common one response is to another, instead of just the number of times that response has occurred.
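A minimal sketch of the proposed structure, using `collections.Counter` as the dictionary of counts. The `Statement` class here is a simplified stand-in, and the names `add_response_to` and `most_likely_response` are hypothetical.

```python
from collections import Counter


class Statement:
    """Simplified stand-in for a stored statement."""

    def __init__(self, text):
        self.text = text
        # Maps the text of a prior statement to how many times
        # this statement was used to respond to it.
        self.in_response_to = Counter()

    def add_response_to(self, previous_text):
        self.in_response_to[previous_text] += 1


def most_likely_response(statements, input_text):
    """Return the statement most often used to respond to input_text."""
    best = max(statements, key=lambda s: s.in_response_to[input_text])
    return best if best.in_response_to[input_text] > 0 else None


good = Statement("I'm good.")
well = Statement("I'm doing well.")
for _ in range(3):
    good.add_response_to("Hi, how are you?")
well.add_response_to("Hi, how are you?")
well.add_response_to("How is it going?")

likely = most_likely_response([good, well], "Hi, how are you?")
```

Here "I'm good." wins for "Hi, how are you?" because the count is tracked per input statement, even though both candidate responses have been seen several times overall.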

How is the database created and how can I change it?

I am doing some research on chatting bots for studying purposes, but I have never done any Python programming before and coming from a Java background, the database creation is pure black magic to me. Can you please point me in a direction on how I can add 1 additional "column" to the list of values stored?

I need to add a "score_value" to the database so I can measure how many points a user made throughout a chat session with the bot.

Thanks.

Weight responses by identity

Hi - great project, I'm having fun playing around with it. (Built a small Shakespeare chat bot, feeding it Hamlet via nltk.corpus.gutenberg to train with.)

Would it be possible, when training the bot with a conversation between Person 1 and Person 2, to weight, say, Person 1's responses more heavily? That is, the bot would learn about context and human language, generally, from the dialogue, but it would more often mimic the sentence structure, grammar and vocab from Person 1. It would thus assume Person 1's "personality".

Additional training sources

These are sources that would be useful to get lots of text based information to train the chat bot with.

https://www.gutenberg.org
- Project Gutenberg offers a large quantity of public domain texts which can be downloaded for free. Books can be downloaded in plain text format which will be useful for training the chat bot.
https://dumps.wikimedia.org
- Wikipedia hosts backups of its database. Using Wikipedia articles as a source of information to train the chat bot would be incredibly valuable because such a diverse quantity of topics are covered.
  The challenge will be extracting articles from the database dump. It would be useful to have a utility program that could do this.

Add support for multi-person conversations

Currently the logs in which each conversation is recorded contain the name of the speaker. These details could be used to allow the program to have a conversation with multiple people simultaneously, in which each person can be replied to but the relevance of topics in the conversation still holds weight.

terminal.py does not work

Traceback (most recent call last):
  File "terminal.py", line 9, in <module>
    database="../database.db")
  File "/Users/Frank/Downloads/ChatterBot-master/chatterbot/chatterbot.py", line 32, in __init__
    self.io = IOAdapter()
TypeError: __init__() takes exactly 2 arguments (1 given)

When I try to run example/terminal.py, it gives me the above error.

Your welcome

Is "You're" intentionally misspelt to add realism? ;)

Use Semantic Folding Engine to compare text

Using Cortical.io's ( http://www.cortical.io ) Semantic Folding Engine, the chat bot might be able to get better performance when identifying the best response for a given input. I'm not sure how well this would work, but it is definitely something to consider.

I'm still researching this, so I'll hopefully have more details on this soon. Have you heard about this before @gunthercox ?

Importing Terminal fails

The command from chatterbot import Terminal gives me a cannot import name Terminal error. This also happens with TalkWithCleverBot and I suspect every import except for ChatBot.

AttributeError: 'ChatBot' object has no attribute 'get_response'

I am using Amazon Linux with Python 2.6.9 and installed ChatterBot using pip.

I ran the basic examples and got this error:

>>> from chatterbot import ChatBot
>>> chatbot = ChatBot("Ron Obvious")
>>> response = chatbot.get_response("Good morning!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'ChatBot' object has no attribute 'get_response'
>>> chatbot = ChatBot()
>>> response = chatbot.get_response("Good morning!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'ChatBot' object has no attribute 'get_response'
>>>

I tried the same on Windows with Python 2.7 and got the same issue. Is there any solution to this?

Chatterbot returns a random answer when asked the same question (similar to the one in the training set) multiple times

When the Chatterbot is asked a question (similar to the one present in the training set), it returns the proper answer only once. After that, it returns a random answer.

Terminal Example

probably a simple oversight, but trying to run your terminal example i get the following error:
AttributeError: 'ChatBot' object has no attribute 'get_input'

have successfully imported the module and 'dir'd around the objects a bit, but can't see a get_input method anywhere. do i need to explicitly import Terminal?
thanks!

Selecting a response for matching input statements

The issue is that an input statement can have several equivalent matches. It may be possible to use the history of a conversation to determine which of the possible responses is the most reasonable.

                       |‾‾‾‾ "I'm good."
"Hi, how are you?" ----|---- "I'm doing well."
                       |____ "Much better, thank you."

python-levenshtein as recommended requirement

bot throws errors for pure python parsing.

should python-levenshtein be a requirement when installed?

Filter responses by tone

It would be useful if various input statements could change the weight of which response gets selected based on the tone of the speech. As a result, if an aggressive or angry input statement is entered, a suitable response can be issued.

It may be necessary to use user responses to determine what sentiment of statement should be returned. It may not always be appropriate for ChatterBot to reply to a negative statement with another negative statement when a positive response would be acceptable.

ChatterBot issue on pypy?

I don't know if this is an issue on PyPy only, but this is what happens:

When I start a ChatBot() session and get a response it gives me
IndexError: list index out of range

This does not happen if I use my previous database.db file.

I thought chatbot() creates a database.db file if one does not exist? This does not seem to be happening here...

When a matching response cannot be found, prefer statements with no known responses

This will help build and enhance the knowledge base.

Stop importing api key from settings at the top of the file.

Currently, API keys are loaded from settings.py at the top of the file. This has caused some issues because the program doesn't always use an API.

A way to fix this might be to:

Change chat bot so that it takes the api key in the constructor
Add a method which allows an api to be enabled (this one might be more flexible as more apis are used)
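Both options could coexist, as in this rough sketch. The class name `Bot`, the `enable_api` and `api_enabled` methods, and the "default" label are all hypothetical; nothing is imported from a settings module, so a bot that never uses an API needs no key at all.

```python
class Bot:
    """Hypothetical bot that only holds API keys it was explicitly given."""

    def __init__(self, name, api_key=None):
        self.name = name
        self.apis = {}
        # Option 1: pass a single key to the constructor.
        if api_key is not None:
            self.enable_api("default", api_key)

    def enable_api(self, api_name, key):
        # Option 2: enable any number of APIs after construction.
        self.apis[api_name] = key

    def api_enabled(self, api_name):
        return api_name in self.apis


offline_bot = Bot("Offline")
online_bot = Bot("Online", api_key="example-key")
online_bot.enable_api("twitter", "another-example-key")
```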

SVO Logic Adapter

It turns out I made a minor mistake going with Pattern. Since it is not available on Python 3.x yet, we will need to change the logic adapter to use another SVO triplet identifier. We can use nlpnet, but that contains a fairly large dependency file which I am not sure is worth it. If that doesn't work, we could use a simple SVO triplet identifier I created using NLTK's built-in POS Tagger, but I am not confident in its ability to correctly identify the subject, verb, and object of complex sentences.

@gunthercox what course of action would you prefer?

some bug fixes

I am still in the process of reading the ChatterBot code and am slowly getting more familiar with it.

I found some bugs, for example in the detect_sentiment function of the Statement class in conversations.py:

def detect_sentiment():
        """
        A property that describes hows the 
        """

        if self.sentiment:
            return self.sentiment

        # Evaluate the sentiment of the statement
        #else:

I am not sure whether this function is used anywhere, but it is still missing self. The corrected version should be:

def detect_sentiment(self):
        """
        A property that describes hows the 
        """

        if self.sentiment:
            return self.sentiment

        # Evaluate the sentiment of the statement
        #else:

Similarly, something is wrong in get_sentiment, which iterates over self but does not take it as a parameter:

def get_sentiment(name):
        """
        Returns the average sentiment for a single user throughout a
        conversation.
        """
        sentiment = []
        for statement in self:
            if statement.name == name:
                sentiment.append(statement)

        return "" #TODO: return the average sentiment

Can I make the change and add it in a new branch, or is it best to raise a similar issue here?
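For reference, a sketch of what a completed get_sentiment might look like once self is added. It assumes each statement carries a numeric `sentiment` attribute (or None when not yet evaluated) and that the conversation is a list subclass; both are guesses, since the original code leaves the return value as a TODO.

```python
class Statement:
    """Simplified stand-in with the attributes the method relies on."""

    def __init__(self, name, text, sentiment=None):
        self.name = name
        self.text = text
        self.sentiment = sentiment  # assumed numeric, or None if unknown


class Conversation(list):
    def get_sentiment(self, name):
        """
        Returns the average sentiment for a single user throughout a
        conversation, or None if that user has no scored statements.
        """
        scores = [
            statement.sentiment
            for statement in self
            if statement.name == name and statement.sentiment is not None
        ]
        if not scores:
            return None
        return sum(scores) / len(scores)


conversation = Conversation([
    Statement("user", "I love this", sentiment=1.0),
    Statement("bot", "Glad to hear it", sentiment=0.5),
    Statement("user", "It is fine", sentiment=0.0),
])
average = conversation.get_sentiment("user")
```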

Set up proper oAuth signature generation

Currently POST requests are failing when trying to make replies via the Twitter API. This is because there isn't a valid signature for sending them. The signature is not needed for GET requests, which is why it has been working so far.

print(api.get_list("salviusrobot", "Robots"))
api.tweet_to_friends("salviusrobot", "Robots", debug=True)

tweet = {}
tweet["id_str"] = "508654764713050112"
print(api.favorite(tweet))

Add nltk and python-Levenshtein in requirements.txt

I tested it on my Mac OS X.
It needs these two packages to run.

Twitter social integration

ChatterBot has the capability to integrate with various social networking sites to learn from user input and also respond to input. Implementation has been specified for the following social platforms.

Twitter

Retrieve OAuth token for Twitter
Reply to direct messages

Communication

Create statements / replies based on statements learned from social media sites.