nlp_kesci by loyalzc

nlp_kesci's Introduction

English sentence similarity and automatic essay scoring

Sentence similarity

Blog: http://www.cnblogs.com/infaraway/p/8666269.html

CountVectorizer vectors
TF-IDF vectors
word2vec vectors
ICLR 2017 paper approach: A Simple but Tough-to-Beat Baseline for Sentence Embeddings
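
The first two vectorizations above can be compared with cosine similarity. The sketch below is a minimal standard-library illustration, not the repository's code: it does not use scikit-learn, the function names (`tokenize`, `count_similarity`, `tfidf_similarity`) are hypothetical, and the smoothed IDF formula is an assumption chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def count_similarity(sent_a, sent_b):
    """Cosine similarity of raw term-count vectors."""
    return cosine(Counter(tokenize(sent_a)), Counter(tokenize(sent_b)))


def tfidf_similarity(sent_a, sent_b, corpus):
    """Cosine similarity of TF-IDF vectors, with IDF estimated from `corpus`."""
    docs = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(docs)

    def idf(term):
        doc_freq = sum(1 for doc in docs if term in doc)
        # Smoothed IDF (an assumption), so unseen terms never divide by zero.
        return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0

    def vectorize(sentence):
        counts = Counter(tokenize(sentence))
        return {term: tf * idf(term) for term, tf in counts.items()}

    return cosine(vectorize(sent_a), vectorize(sent_b))


corpus = [
    "a man is playing a guitar",
    "a man plays the guitar",
    "a dog runs in the park",
]
print(count_similarity(corpus[0], corpus[1]))
print(tfidf_similarity(corpus[0], corpus[1], corpus))
```

TF-IDF down-weights words that appear in most sentences, so shared function words such as "a" contribute less to the score than shared content words.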

Automatic English essay scoring

Traditional machine learning approach: extract text features (machine learning based)

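Length: Length Features
- Number of words: word_count
- Number of sentences: sentence_count
- Average number of words per sentence: avg_sentence_len
- Average word length: avg_word_len
- Number of long words: long_word (a word of length ≥ 7 counts as long)
- Number of stop words: stopwords_count
- Number of words longer than 4 characters: long_4word
- Number of distinct words used: unique_word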
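
The length features listed above could be computed roughly as follows. This is a minimal sketch, not the repository's implementation: the regex-based word and sentence splitting, the function name `length_features`, and the tiny `STOPWORDS` set are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a full one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it"}


def length_features(essay):
    """Return the length-related features for one essay as a dict."""
    words = re.findall(r"[A-Za-z']+", essay)
    # Treat runs of ., ! or ? as sentence boundaries and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    word_count = len(words)
    sentence_count = len(sentences)
    return {
        "word_count": word_count,
        "sentence_count": sentence_count,
        "avg_sentence_len": word_count / sentence_count if sentence_count else 0.0,
        "avg_word_len": sum(len(w) for w in words) / word_count if word_count else 0.0,
        "long_word": sum(1 for w in words if len(w) >= 7),
        "stopwords_count": sum(1 for w in words if w.lower() in STOPWORDS),
        "long_4word": sum(1 for w in words if len(w) > 4),
        "unique_word": len({w.lower() for w in words}),
    }


print(length_features("The cat sat. It is wonderful!"))
```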
Punctuation: Occurrence Features
- Number of exclamation marks: exc_count
- Number of question marks: que_count
- Number of commas: comma_count
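
The punctuation counts above are simple character counts. A minimal sketch, assuming ASCII punctuation and a hypothetical function name `occurrence_features`:

```python
def occurrence_features(essay):
    """Count exclamation marks, question marks, and commas in an essay."""
    return {
        "exc_count": essay.count("!"),
        "que_count": essay.count("?"),
        "comma_count": essay.count(","),
    }


print(occurrence_features("Wow! Really, is it? Yes, it is!"))
```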
Error Features
- Number of misspelled words: spelling_errors
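n-grams: ngrams_counts Features. These indicate the richness of the author's vocabulary.
- unigrams_count: tokenize the essay, form 1-grams, and count the distinct grams
- bigrams_count: tokenize the essay, form 2-grams, and count the distinct grams
- trigrams_count: tokenize the essay, form 3-grams, and count the distinct grams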
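
The three n-gram counts above follow the same recipe: tokenize, slide a window of size n over the tokens, and count the distinct windows. A minimal sketch, assuming a simple regex tokenizer and a hypothetical function name `ngram_counts`:

```python
import re


def ngram_counts(essay):
    """Count the distinct 1-grams, 2-grams, and 3-grams in an essay."""
    tokens = re.findall(r"[a-z']+", essay.lower())

    def distinct(n):
        # A set of tuples keeps each n-gram once; short essays yield 0.
        return len({tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)})

    return {
        "unigrams_count": distinct(1),
        "bigrams_count": distinct(2),
        "trigrams_count": distinct(3),
    }


print(ngram_counts("the cat sat on the mat the cat sat"))
```

An essay that repeats the same phrases produces fewer distinct bigrams and trigrams than one of the same length with varied wording, which is why these counts can act as a proxy for vocabulary richness.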
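Part of speech: POS counts Features. These count how many words of each part of speech the essay contains.
- Nouns: noun_count
- Adjectives: adj_count
- Adverbs: adv_count
- Verbs: verb_count
- Foreign words: fw_count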
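
Given tokens that have already been tagged, the part-of-speech counts above reduce to grouping tags by prefix. The sketch below assumes Penn Treebank tags (for example from an external tagger, which is not shown) supplied as `(word, tag)` pairs; the function name `pos_counts` and the prefix mapping are illustrative assumptions, not the repository's code.

```python
# Penn Treebank tag prefixes mapped to the feature names listed above.
TAG_PREFIX_TO_FEATURE = {
    "NN": "noun_count",  # NN, NNS, NNP, NNPS
    "JJ": "adj_count",   # JJ, JJR, JJS
    "RB": "adv_count",   # RB, RBR, RBS
    "VB": "verb_count",  # VB, VBD, VBG, VBN, VBP, VBZ
    "FW": "fw_count",    # foreign word
}


def pos_counts(tagged_tokens):
    """Count nouns, adjectives, adverbs, verbs, and foreign words."""
    counts = {feature: 0 for feature in TAG_PREFIX_TO_FEATURE.values()}
    for _word, tag in tagged_tokens:
        feature = TAG_PREFIX_TO_FEATURE.get(tag[:2])
        if feature is not None:
            counts[feature] += 1
    return counts


tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
    ("quickly", "RB"), ("jumped", "VBD"), ("dogs", "NNS"),
]
print(pos_counts(tagged))
```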
Tone: Personality Features. Classify the tone of each sentence in the essay as positive, negative, or neutral.
- Negative tone: neg_sentiment
- Neutral tone: neu_sentiment
- Positive tone: pos_sentiment

Deep learning approach (deep learning based)

nlp_kesci's Issues

Where does your data come from?

Could you please tell me where your data comes from? Thanks.

Result pearsonr : nan

I got the following warning when I ran essay_scoring_dl.py:

PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())

and the output is:

Result pearsonr : nan

Could you please give me some suggestions? Thanks.
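
For context on the warning above: the Pearson correlation divides by the standard deviations of both inputs, so it is undefined when either array is constant. One common way to hit this is a model that predicts the same score for every essay, though whether that is what happens in essay_scoring_dl.py is not established here. The sketch below is a plain-Python illustration with a hypothetical `pearson` helper, not SciPy's implementation:

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        return float("nan")  # a constant input makes the coefficient undefined
    return cov / denom


true_scores = [1.0, 2.0, 3.0, 4.0]
constant_predictions = [2.5, 2.5, 2.5, 2.5]  # made-up values for illustration
print(pearson(true_scores, constant_predictions))
```

Checking whether the predicted scores actually vary (for example by printing their minimum and maximum) is a quick way to confirm or rule out this cause.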
