
deepthought's Introduction

deepThought

A chatbot

The answer to life, the universe and everything is 42 --deepThought

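Goals

  • Learn from real conversations or from a corpus
  • Generalize beyond the training data
  • Infer the user's intent from natural language
  • Give an appropriate reply for each intent (by calling back into a task module)
  • Build a general-purpose parsing tool that turns natural language into structured information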

Core concepts

  • Entity
  • Intent
  • Action/response
  • Structured output
  • Entity patterns
  • require: ask the user to fill in missing information
  • Context
  • Session
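
To make the concepts above concrete, here is a minimal sketch of how they might be modelled. All class and field names are assumptions made for illustration; the project does not define them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """A typed value extracted from the user's utterance."""
    name: str   # e.g. "city"
    value: str  # e.g. "Beijing"


@dataclass
class Intent:
    """What the user wants, plus the entity names it needs before it can fire."""
    name: str
    required: List[str] = field(default_factory=list)


@dataclass
class Session:
    """Per-user conversation state; `context` carries entities across turns."""
    user_id: str
    context: Dict[str, Entity] = field(default_factory=dict)

    def missing(self, intent: Intent) -> List[str]:
        # Entity names the intent still needs (the "require" step asks for these).
        return [name for name in intent.required if name not in self.context]


# Hypothetical example: a weather intent that needs a city and a date.
weather = Intent(name="query_weather", required=["city", "date"])
session = Session(user_id="u1")
session.context["city"] = Entity(name="city", value="Beijing")
print(session.missing(weather))
```

Here the session already knows the city, so only the date would have to be requested from the user.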

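Design

  • Plugin-based
      • Input/output
      • Storage
      • Logic units
  • Event-driven (by intent)
  • Let data flow through a pipeline
  • Distinguish two kinds of bot
      • Chit-chat bots
          • Learn how humans talk
              • Learn during conversation
      • Assistant bots
          • Domain knowledge
          • Search
  • Rich output
      • Don't give up hyperlinks (support rich-text output)
      • Experiment interactively in IPython
          • from IPython.display import HTML, Image, YouTubeVideo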
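
The plugin, pipeline, and event-driven ideas above can be sketched roughly as follows. This is only an illustration under assumed names (`input_adapter`, `detect_intent`, `dispatch`, `handlers`); the keyword rule stands in for a real intent classifier.

```python
from typing import Callable, Dict, List

# A stage takes the message dict flowing through the pipeline and returns it, possibly enriched.
Stage = Callable[[dict], dict]


def input_adapter(message: dict) -> dict:
    # Normalise raw text coming from any front end.
    message["text"] = message["raw"].strip().lower()
    return message


def detect_intent(message: dict) -> dict:
    # Toy keyword rule standing in for a real classifier.
    message["intent"] = "greet" if "hello" in message["text"] else "unknown"
    return message


# Event-driven part: one handler is registered per intent name.
handlers: Dict[str, Callable[[dict], str]] = {
    "greet": lambda m: "Hello!",
    "unknown": lambda m: "Sorry, I did not understand that.",
}


def dispatch(message: dict) -> dict:
    # Look up the handler for the detected intent and store its reply.
    message["reply"] = handlers[message["intent"]](message)
    return message


def run_pipeline(stages: List[Stage], raw: str) -> dict:
    message = {"raw": raw}
    for stage in stages:
        message = stage(message)
    return message


pipeline = [input_adapter, detect_intent, dispatch]
result = run_pipeline(pipeline, "  Hello there ")
print(result["reply"])
```

Because each stage has the same signature, a storage or logic plugin could be added by inserting another function into the `pipeline` list.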

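Background knowledge

  • Natural language processing (NLP)
      • Chinese word segmentation
      • Named entity recognition
  • Pattern matching (Haskell)
      • Borrow ideas from Lisp
  • Machine learning
      • Naive Bayes
      • RNN
      • LSTM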

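Approach

Extracting structured information

  • Use named entity recognition and similar techniques to extract the structure of an utterance, then recast it as another problem:
      • a feature vector for machine learning
      • a pattern matching problem (Lisp)
  • The structured information can serve as the arguments of a function (action), which connects the bot to business systems (database/RESTful)
  • The mapping from features/intents to actions is learned through training (neural networks/machine learning)
  • Actions can connect to existing businesses/systems (webapp/database/API)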
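
As a rough illustration of the pattern matching route, the sketch below turns an utterance into named slots and passes them to an action as arguments. The pattern, the `parse` helper, and `weather_action` are all hypothetical; a regular expression stands in here for named entity recognition.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: "weather in <city> <today|tomorrow>"
WEATHER_PATTERN = re.compile(r"weather in (?P<city>[A-Za-z ]+?) (?P<day>today|tomorrow)$")


def parse(text: str) -> Optional[Dict[str, str]]:
    """Turn free text into structured slots, or None if the pattern does not match."""
    match = WEATHER_PATTERN.search(text.lower())
    return match.groupdict() if match else None


def weather_action(city: str, day: str) -> str:
    # A real action would query a database or call a RESTful API here.
    return f"(stub) forecast for {city} {day}"


slots = parse("What is the weather in New York tomorrow")
if slots is not None:
    # The structured information becomes the action's arguments.
    print(weather_action(**slots))
```

The learned mapping from intent to action described above would replace the hard-coded call to `weather_action`.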

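Intent trigger conditions

  • Triggering an intent can depend on preconditions (analogous to the @require-style decorators in Django); when they are not met, the bot asks the user for more information until the trigger conditions are satisfied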
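
One way this could look in code is a decorator that checks the context before the action runs. The `require` decorator, `MissingSlots` exception, and `query_weather` action below are assumptions for illustration, not part of the project or of Django.

```python
from functools import wraps
from typing import Callable, Dict


class MissingSlots(Exception):
    """Raised when an intent is triggered without the information it depends on."""

    def __init__(self, names):
        super().__init__(", ".join(names))
        self.names = list(names)


def require(*slot_names: str) -> Callable:
    """Hypothetical decorator: the action runs only if every named slot is in the context."""
    def decorator(action: Callable[[Dict[str, str]], str]):
        @wraps(action)
        def wrapper(context: Dict[str, str]) -> str:
            missing = [name for name in slot_names if name not in context]
            if missing:
                raise MissingSlots(missing)
            return action(context)
        return wrapper
    return decorator


@require("city", "date")
def query_weather(context: Dict[str, str]) -> str:
    return f"(stub) weather for {context['city']} on {context['date']}"


def respond(context: Dict[str, str]) -> str:
    try:
        return query_weather(context)
    except MissingSlots as err:
        # Ask the user to supply what is missing instead of failing.
        return "Please tell me: " + ", ".join(err.names)


print(respond({"city": "Beijing"}))
print(respond({"city": "Beijing", "date": "tomorrow"}))
```

The first call lacks a date, so the bot asks for it; the second call has everything and the action runs.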

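Corpus

  • Movie subtitles
  • Dialogue from novels
      • Gu Long
  • Build openbot: open up the technology and interfaces as well as the corpus, so that everyone can help collect real conversational data; handle privacy concerns through open source and an explicit agreement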

Todo

  • Use wit as one of the bot's logic adapters
  • Add a timeout

Done

  • Run the bot as a RESTful service
  • Connect to WeChat as an auto-reply bot
  • Run on a Raspberry Pi (stable over the long term)
  • Text -> speech

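Spin-off plans

openBot (chit-chat)

  • Open source / open service / open corpus
  • Allow developers to integrate it into their own applications
      • HTTP requests
      • SDK
  • Run as a RESTful service
      • Use django-rest-framework as the framework, so the API can be built quickly
      • Plenty of free lunch
          • oauth2/access token
          • Access (request count) control
      • Efficiency may need attention later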
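
The access (request count) control mentioned above could work roughly as in the sketch below. It is framework-independent and does not use django-rest-framework's own throttling; the `AccessCounter` class and the token names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict


class AccessCounter:
    """Allow each access token at most `limit` requests (no time window, for brevity)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used: Dict[str, int] = defaultdict(int)

    def allow(self, token: str) -> bool:
        # Reject once the token has used up its quota; otherwise count the request.
        if self.used[token] >= self.limit:
            return False
        self.used[token] += 1
        return True


counter = AccessCounter(limit=2)
results = [counter.allow("token-a") for _ in range(3)]
print(results)
```

A production service would normally reset the counts per time window and keep them in shared storage rather than in process memory.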

The core question for openBot: how to design a mechanism that lets this process keep expanding on its own (see the book "Out of Control").

deepthought's People

Contributors

wwj718


deepthought's Issues

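chatterbot raises an encoding error when using the Chinese corpus

I am using Windows 10 with Python 2.7.
I set the encoding in the code, but it still fails. The English corpus works fine.
bot.py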

# -*- coding: utf-8 -*-
import sys;reload(sys);sys.setdefaultencoding('utf8')

from chatterbot import ChatBot

chatbot = ChatBot(
    'ABC',
    trainer='chatterbot.trainers.ChatterBotCorpusTrainer',
    silence_performance_warning=True
)

# Train based on the Chinese corpus
chatbot.train("chatterbot.corpus.chinese")

# Get a response to an input statement
response = chatbot.get_response("很高兴认识你")
print(response)

Running python bot.py gives:

[nltk_data] Downloading package stopwords to                                                     
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package stopwords is already up-to-date!                                           
[nltk_data] Downloading package wordnet to                                                       
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package wordnet is already up-to-date!                                             
[nltk_data] Downloading package punkt to                                                         
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package punkt is already up-to-date!                                               
[nltk_data] Downloading package vader_lexicon to                                                 
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package vader_lexicon is already up-to-date!                                       
Traceback (most recent call last):                                                               
  File "bot.py", line 13, in <module>                                                            
    chatbot.train("chatterbot.corpus.chinese")                                                   
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\trainers.py", line 117, in train           
    trainer.train(pair)                                                                          
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\trainers.py", line 82, in train            
    statement = self.get_or_create(text)                                                         
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\trainers.py", line 25, in get_or_create    
    statement = self.storage.find(statement_text)                                                
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\storage\jsonfile.py", line 46, in find     
    values = self.database.data(key=statement_text)                                              
  File "D:\AnacondaSetup\lib\site-packages\jsondb\db.py", line 98, in data                       
    return self._get_content(key)                                                                
  File "D:\AnacondaSetup\lib\site-packages\jsondb\db.py", line 52, in _get_content               
    obj = self.read_data(self.path)                                                              
  File "D:\AnacondaSetup\lib\site-packages\jsondb\file_writer.py", line 15, in read_data         
    obj = decode(content)                                                                        
  File "D:\AnacondaSetup\lib\site-packages\jsondb\compat.py", line 28, in decode                 
    return json_decode(value, encoding='utf-8')                                                  
  File "D:\AnacondaSetup\lib\json\__init__.py", line 352, in loads                               
    return cls(encoding=encoding, **kw).decode(s)                                                
  File "D:\AnacondaSetup\lib\json\decoder.py", line 364, in decode                               
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())                                            
  File "D:\AnacondaSetup\lib\json\decoder.py", line 380, in raw_decode                           
    obj, end = self.scan_once(s, idx)                                                            
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position 0: invalid continuation byte 

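I don't know whether something else is interfering. I also filed an issue upstream, but the person who replied was not sure of the cause either.
Also, is there a way to hide the [nltk_data]... lines printed at the start of the console output? They are annoying -.-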
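
One possibility, which the thread does not confirm, is that some text in the JSON database was written in the Windows default code page (GBK on a Chinese system) and is then read back as UTF-8. The Python 3 sketch below only reproduces that class of error in isolation; it does not touch ChatterBot or jsondb.

```python
text = "很高兴认识你"

# Bytes as a GBK-configured writer would produce them (an assumption about the cause).
gbk_bytes = text.encode("gbk")

try:
    gbk_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as err:
    # Same exception type as in the traceback above.
    decoded_ok = False
    print("decode failed:", err.reason)

# Decoding with the encoding that was actually used round-trips cleanly.
restored = gbk_bytes.decode("gbk")
print(restored == text)
```

If this is the cause, deleting the existing database file and making sure it is written and read as UTF-8 would be the thing to try.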

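Some todos that could be added

Hi wwj718, I also came across ChatterBot a while ago; its logic and code are indeed clear and simple.
For chat corpora, I think it is also worth trying to crawl conversations from Weibo and Tieba. That might be quite useful.
Another question is how efficiency holds up as the corpus grows. Right now, matching compares the input against every stored query one by one and computes a similarity for each. With a larger corpus this will probably need attention.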
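
The one-by-one matching described above can be pictured with the sketch below. It is not ChatterBot's implementation: `difflib.SequenceMatcher` is used as a stand-in similarity measure, and the corpus and `best_match` helper are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple

# Tiny hypothetical corpus mapping known queries to responses.
corpus: Dict[str, str] = {
    "how are you": "I am fine, thanks.",
    "what is your name": "I am deepThought.",
    "nice to meet you": "Nice to meet you too.",
}


def best_match(query: str) -> Tuple[str, float]:
    """Compare the query with every stored query: one similarity computation per entry."""
    best_key, best_score = "", -1.0
    for known in corpus:
        score = SequenceMatcher(None, query, known).ratio()
        if score > best_score:
            best_key, best_score = known, score
    return corpus[best_key], best_score


reply, score = best_match("what's your name")
print(reply)
```

The cost of each lookup grows linearly with the number of stored queries, which is why a large corpus would likely need some form of index or candidate pre-filtering.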
