Topic: corpus
Something interesting about corpora
corpus,Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
User: adbar
Home Page: https://trafilatura.readthedocs.io
corpus,An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
User: blkserene
corpus,Large Scale Chinese Corpus for NLP
User: brightmart
corpus,Datasets for Training Chatbot System: corpora for training Chinese and English dialogue systems
User: candlewill
corpus,CBLUE, a Chinese medical information processing benchmark: A Chinese Biomedical Language Understanding Evaluation Benchmark
Organization: cbluebenchmark
Home Page: https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414&lang=en-us
corpus,❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus
Organization: chatopera
Home Page: https://www.chatopera.com/
corpus,🚁 Insurance industry corpus for chatbots
Organization: chatopera
Home Page: https://www.chatopera.com/
corpus,A multilingual parallel corpus created from translations of the Bible.
User: christos-c
corpus,Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus, and leaderboard
Organization: cluebenchmark
Home Page: http://www.CLUEbenchmarks.com
corpus,Large-scale Pre-training Corpus for Chinese: a 100 GB Chinese pre-training corpus
Organization: cluebenchmark
Home Page: https://arxiv.org/abs/2003.01355
corpus,Search all Chinese NLP datasets, with commonly used English NLP datasets included
Organization: cluebenchmark
Home Page: https://www.cluebenchmarks.com/dataSet_search.html
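corpus,A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and models specialized for similarity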
Organization: cluebenchmark
Home Page: https://arxiv.org/abs/2003.01355
corpus,Some useful Chinese corpus datasets (small-scale Chinese corpus data)
User: crownpku
corpus,Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.
Organization: edinburghnlp
Home Page: https://arxiv.org/abs/1707.02275
corpus,Deep learning and deep reinforcement learning research papers and some code
User: endymecy
corpus,Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
User: fendouai
Home Page: http://www.panchuangai.com/
corpus,A very simple news crawler with a funny name
Organization: flairnlp
corpus,PubMed 200k RCT dataset: a large dataset for sequential sentence classification.
User: franck-dernoncourt
corpus,Generative AI for Math: MathPile
Organization: gair-nlp
Home Page: https://gair-nlp.github.io/MathPile/
corpus,UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Organization: grammarly
Home Page: https://ua-gec-dataset.grammarly.ai/
corpus,Chinese NLP corpus of Chinese science fiction: about 4,675 Chinese science fiction novels, usable as a text corpus or database for natural language processing
User: guhhhhaa
corpus,A multilingual dialog corpus
User: gunthercox
Home Page: http://chatterbot-corpus.readthedocs.io
corpus,Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Organization: helsinki-nlp
corpus,Data resources for Indonesian-language NLP
Organization: kirralabs
corpus,KH Coder: for Quantitative Content Analysis or Text Mining
User: ko-ichi-h
Home Page: http://khcoder.net/en
corpus,Korean corpus repository
Organization: ko-nlp
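corpus,Chinese text classification in practice, based on the Sogou News corpus, using traditional machine learning methods as well as pre-trained models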
User: lijqhs
corpus,Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Organization: lil-lab
Home Page: http://lic.nlp.cornell.edu/nlvr/
corpus,A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
User: louisowen6
corpus,Final Weibo Crawler: scrape anything from Weibo (comments, Weibo posts, followers, anything). The Terminator
User: lucasjinreal
corpus,We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/
Organization: mesolitica
Home Page: https://malaysian-dataset.readthedocs.io/
corpus,Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
User: mhbashari
corpus,A very comprehensive Classical Chinese (ancient prose) to Modern Chinese parallel corpus
Organization: niutrans
corpus,Corpus of WeChat Official Account articles
User: nonamestreet
corpus,A curated list of NLP resources for Hungarian
User: oroszgy
Home Page: https://oroszgy.gitbook.io/awesome-hungarian-nlp-resources/
corpus,Collections of Chinese NLP corpora
User: oye93
corpus,ChatGPT Chinese corpus: dialogue, novel, and customer-service corpora for training large models
User: plexpt
Home Page: https://chat.aimakex.com/
corpus,An R package for the Quantitative Analysis of Textual Data
Organization: quanteda
Home Page: https://quanteda.io
corpus,A dataset of millions of news articles scraped from a curated list of data sources.
User: several27
corpus,This repository contains code and metadata of How2 dataset
Organization: srvk
Home Page: https://srvk.github.io/how2-dataset/
corpus,My fuzzing corpus
User: strongcourage
corpus,Chatbot in 200 lines of code using TensorLayer
Organization: tensorlayer
Home Page: https://github.com/tensorlayer/tensorlayer
corpus,Poetry-related datasets developed by THUAIPoet (Jiuge) group.
Organization: thunlp-aipoet
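corpus,Chinese personal name corpus and name generator: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated foreign names, and English names. Useful for Chinese word segmentation and person-name entity recognition.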
User: wainshine
Home Page: https://open.namemoe.com/
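corpus,Company name and organization name corpus: company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.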
User: wainshine
Home Page: https://open.namemoe.com/
corpus,A command-line toolkit to extract text content and category data from Wikipedia dump files
User: yohasebe
corpus,Chinese Q&A corpus from the PTT Gossiping board
User: zake7749
Home Page: https://www.kaggle.com/zake7749/pttgossipingcorpus