codemayq / chinese-chatbot-corpus
License: Apache License 2.0
A public Chinese chat corpus
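For the paper from Huawei, could you provide the title of that paper?
There is no username or password.
Hello, I am very interested in the training material you compiled, but the Baidu download link has expired. Would it be possible to provide a Google Drive link? Many thanks.
Hello! Could I ask for a copy of the processed Xiaohuangji (小黄鸡) corpus? The .yml file I produced myself triggers a bug after it is read by the train() method, and I have not been able to fix it after many attempts. Thank you for the tutorial and for sharing your experience!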
The link to the Baidu Tieba corpus is down
The download link doesn't work
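Hello, is the Google Drive link no longer accessible?
It seems these two files cannot be opened with UTF-8 encoding. I tried other encodings and none of them worked either: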
stc_weibo_train_post,stc_weibo_train_response
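Hello, if I use the data you collected and compiled in a paper, how should I cite it? Could you add a citation note?
I am on a 100 Mbps fiber connection and the download is currently crawling along at 80 KB/s...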
python 3.10.5
1.
raw_chat_corpus_root =r"E:\temp\chatcorpus\chinese_chatbot_corpus-master\raw_chat_corpus"
With the 'r' prefix it runs correctly; without it, the file cannot be found and the following error is raised.
OSError: [Errno 22] Invalid argument: 'E:\temp\chatcorpus\chinese_chatbot_corpus-master\raw_chat_corpus\douban-multiturn-100w\train.txt'
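A minimal sketch of why the 'r' prefix matters here, assuming the failure comes from backslash escapes in the Windows path: in a normal string literal `\t` and `\r` are turned into a tab and a carriage return, so the path handed to the OS is no longer the one that was typed. The shortened path and the variable names below are illustrative only, not taken from the project.

```python
from pathlib import PureWindowsPath

# Without the r prefix, "\t" becomes a tab and "\r" a carriage return.
plain = "E:\temp\raw_chat_corpus"

# With the r prefix, every backslash is kept literally.
raw = r"E:\temp\raw_chat_corpus"

# Forward slashes avoid the escape problem entirely; PureWindowsPath
# renders them back as backslashes on any platform.
portable = str(PureWindowsPath("E:/temp/raw_chat_corpus"))

print(repr(plain))     # shows the embedded \t and \r control characters
print(repr(raw))       # shows the intended path
print(portable == raw)
```

Either the raw string or the forward-slash form should work for a setting such as `raw_chat_corpus_root`; which one to use is a matter of taste.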
E:\temp\chatcorpus\chinese_chatbot_corpus-master\raw_chat_corpus\qingyun-11w\12涓囧璇濊鏂欓潚浜戝簱.csv
The Chinese file name is garbled (mojibake). I corrected it to "12万对话语料青云库.csv" and ran the script again.
FileNotFoundError: [Errno 2] No such file or directory: 'E:\temp\chatcorpus\chinese_chatbot_corpus-master\raw_chat_corpus\qingyun-11w\12万对话语料青云库.csv'
E:\temp\chatcorpus\chinese_chatbot_corpus-master\raw_chat_corpus\weibo-400w\stc_weibo_train_post E:\temp\chatcorpus\chinese_chatbot_corpus-master\raw_chat_corpus\weibo-400w\stc_weibo_train_response 2800000
....
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte
Deleting the following lines still did not help:
春夏秋冬 🌺🌻🍂🍁
求 不 到 雨 、 去 游泳 。 🏊🏊🏊🏊
I modified weibo.py and added errors='ignore' when opening the files:
raw_corpus_post_file = codecs.open(raw_corpus_post_file_name, encoding=Config.encoding,errors='ignore')
raw_corpus_response_file = codecs.open(raw_corpus_response_file_name, encoding=Config.encoding,errors='ignore')
It finally ran to completion.
Hello, will this project still be updated? The cloud drive link has expired.
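The workaround above can be sketched in isolation. This is a hedged illustration, not the project's code: it uses the built-in `open` rather than `codecs.open`, and the helper names `read_lines_lenient` and `count_bad_lines` plus the sample file are hypothetical. It shows that `errors='ignore'` silently drops the undecodable bytes, and how one might count the affected lines first to judge how much data is being lost.

```python
import os
import tempfile


def read_lines_lenient(path, encoding="utf-8", errors="ignore"):
    """Read a text file line by line, skipping bytes that cannot be decoded."""
    with open(path, encoding=encoding, errors=errors) as f:
        return [line.rstrip("\n") for line in f]


def count_bad_lines(path, encoding="utf-8"):
    """Count lines containing at least one byte sequence invalid in `encoding`."""
    bad = 0
    with open(path, "rb") as f:
        for raw_line in f:
            try:
                raw_line.decode(encoding)
            except UnicodeDecodeError:
                bad += 1
    return bad


# Build a tiny sample file: one valid UTF-8 line and one line that starts
# with a stray 0xe8 byte, similar to the error reported above.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample_post")
    with open(sample, "wb") as f:
        f.write("春夏秋冬\n".encode("utf-8"))
        f.write(b"\xe8abc\n")

    lines = read_lines_lenient(sample)
    bad = count_bad_lines(sample)

print(lines)
print(bad)
```

Passing `errors="replace"` instead keeps a visible U+FFFD marker where bytes were dropped, which can make damaged lines easier to spot afterwards.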