Comments (2)
Hi,
This is a Chinese NLP repo. It contains word banks, industry- and business-specific word banks, poetry banks, and introductions to many interesting Chinese Python packages. Most of the word banks, corpora, and Python packages are used in Chinese-related NLP tasks. There are also some English-related Python packages. For instance,
- No.2: langid and langdetect, Python packages that detect the language of a text: Spanish? English? French? Or one of many others (97 languages).
- No.5: phone, a Python package that identifies the location of a telephone number.
phone('+852 6569-8900')
return ['+85265698900', 'HKG']
- No.7: a regex for email: extracts email addresses from text.
- No.10: English-Chinese name pairs: John (English) -- 约翰 (Chinese). It also contains many Japanese names, since Japanese names are commonly written in Chinese characters.
- No.17: a funny Chinese text-to-speech engine, built by @tinyfool.
English: I love you
say: wo i ni
# says: 我爱你 ("I love you")
- No.20: wordninja: probabilistically splits concatenated words using NLP based on English Wikipedia unigram frequencies; built by @keredson.
import wordninja
wordninja.split('derekanderson')
['derek', 'anderson']
wordninja.split('imateapot')
['im', 'a', 'teapot']
- No.21: a regex for IP: extracts IP addresses from text.
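For a rough idea of what the email and IP regex items (No.7 and No.21) do, here is a minimal Python sketch. The patterns are simplified assumptions for illustration, not the ones shipped in the repo, and the function names `extract_emails` and `extract_ipv4` are hypothetical. It only handles IPv4.

```python
import re

# Simplified email pattern (an assumption; real-world address syntax is broader).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# One IPv4 octet, restricted to 0-255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Four dot-separated octets; the lookarounds stop a match from starting or
# ending inside a longer dotted number such as 999.1.1.1.
IPV4_RE = re.compile(
    rf"(?<!\d)(?<!\d\.){_OCTET}(?:\.{_OCTET}){{3}}(?!\d)(?!\.\d)"
)


def extract_emails(text):
    """Return all substrings of text that look like email addresses."""
    return EMAIL_RE.findall(text)


def extract_ipv4(text):
    """Return all substrings of text that look like IPv4 addresses."""
    return IPV4_RE.findall(text)


sample = (
    "Contact alice@example.com or bob.smith@mail.example.org "
    "about server 192.168.1.10 and ignore 999.1.1.1 please"
)
print(extract_emails(sample))  # ['alice@example.com', 'bob.smith@mail.example.org']
print(extract_ipv4(sample))    # ['192.168.1.10']
```

The octet alternation keeps out-of-range values such as 999 from matching, which a plain `\d{1,3}` pattern would accept.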
Hope this will be helpful.
from funnlp.
Thank you very much, it is helpful.