nlp_kesci's Introduction

English sentence similarity and automated essay scoring

Sentence similarity

  • CountVectorizer vectors
  • TF-IDF vectors
  • word2vec vectors
  • ICLR 2017 paper approach: A Simple but Tough-to-Beat Baseline for Sentence Embeddings
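
As an illustration of the count-vector approach in the list above, here is a minimal sketch using only the standard library. It is not the repository's code: the function names (`tokenize`, `count_vector`, `sentence_similarity`) and the regex tokenizer are assumptions, and `collections.Counter` stands in for a CountVectorizer row. Similarity is measured as the cosine between the two count vectors.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def count_vector(sentence):
    """Bag-of-words term counts (a sparse stand-in for a CountVectorizer row)."""
    return Counter(tokenize(sentence))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * vec_b[token] for token, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


def sentence_similarity(sent_a, sent_b):
    """Similarity in [0, 1] between two sentences based on shared word counts."""
    return cosine_similarity(count_vector(sent_a), count_vector(sent_b))
```

For example, `sentence_similarity("the cat sat", "the cat ran")` shares two of three words on each side and evaluates to about 0.67. The TF-IDF variant would use the same cosine step with each count reweighted by inverse document frequency.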

Automated English essay scoring

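  • Traditional machine learning approach (machine learning based): extract text features
  1. Length features

    • Number of words: word_count
    • Number of sentences: sentence_count
    • Average number of words per sentence: avg_sentence_len
    • Average word length: avg_word_len
    • Number of long words: long_word (words of length ≥ 7 count as long)
    • Number of stopwords: stopwords_count
    • Number of words longer than 4 characters: long_4word
    • Number of distinct words used: unique_word
  2. Punctuation (occurrence features)

    • Number of exclamation marks: exc_count
    • Number of question marks: que_count
    • Number of commas: comma_count
  3. Error features

    • Number of misspelled words: spelling_errors
  4. n-gram features (ngrams_counts); these reflect the richness of the author's vocabulary

    • unigrams_count: tokenize the essay, form 1-grams, and count the distinct grams
    • bigrams_count: tokenize the essay, form 2-grams, and count the distinct grams
    • trigrams_count: tokenize the essay, form 3-grams, and count the distinct grams
  5. Part-of-speech features (POS counts): the number of words of each part of speech in the essay

    • Nouns: noun_count
    • Adjectives: adj_count
    • Adverbs: adv_count
    • Verbs: verb_count
    • Foreign words: fw_count
  6. Tone (personality features): classify each sentence in the essay as positive, negative, or neutral

    • Negative tone: neg_sentiment
    • Neutral tone: neu_sentiment
    • Positive tone: pos_sentiment
  • Deep learning approach (deep learning based)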
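
The length, punctuation, and n-gram features from the machine learning list can be sketched with the standard library alone. This is an illustrative sketch, not the repository's implementation: `extract_features`, `ngrams`, the regex tokenizer, the sentence splitter, and the tiny `STOPWORDS` set are all assumptions. Spelling, part-of-speech, and tone features are left out because they need an external dictionary, tagger, or sentiment model.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "it"}


def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples so they can go into a set."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(essay):
    """Return a dict of length, punctuation, and n-gram features for one essay."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    # Naive sentence split on ., ! and ?; abbreviations will be over-split.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    word_count = len(words)
    sentence_count = len(sentences)
    return {
        # 1. Length features
        "word_count": word_count,
        "sentence_count": sentence_count,
        "avg_sentence_len": word_count / sentence_count if sentence_count else 0.0,
        "avg_word_len": sum(len(w) for w in words) / word_count if word_count else 0.0,
        "long_word": sum(1 for w in words if len(w) >= 7),
        "stopwords_count": sum(1 for w in words if w in STOPWORDS),
        "long_4word": sum(1 for w in words if len(w) > 4),
        "unique_word": len(set(words)),
        # 2. Punctuation (occurrence) features
        "exc_count": essay.count("!"),
        "que_count": essay.count("?"),
        "comma_count": essay.count(","),
        # 4. n-gram features: number of distinct grams of each order
        "unigrams_count": len(set(ngrams(words, 1))),
        "bigrams_count": len(set(ngrams(words, 2))),
        "trigrams_count": len(set(ngrams(words, 3))),
    }
```

Calling `extract_features("The cat sat. The cat ran, quickly! Why?")` gives, among others, `word_count` 8, `sentence_count` 3, and `bigrams_count` 6. In this sketch n-grams run across sentence boundaries; whether the original code does the same is not stated in the README.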

nlp_kesci's People

Contributors

loyalzc

nlp_kesci's Issues

Result pearsonr : nan

I got the following warning when I ran essay_scoring_dl.py:

PearsonRConstantInputWarning: An input array is constant; the correlation coefficent is not defined.
  warnings.warn(PearsonRConstantInputWarning())

and the output is:

Result pearsonr : nan

Could you please give me some suggestions? Thanks.
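
The warning itself says that one of the arrays passed to the correlation is constant. One possible reading, offered here as an assumption and not confirmed by the repository, is that the model predicted the same score for every essay. The sketch below reimplements the Pearson coefficient in plain Python to show why a constant input yields nan; `pearson_r` and `is_constant` are hypothetical helper names, not functions from essay_scoring_dl.py.

```python
import math


def is_constant(values):
    """True when every element of the sequence is the same value."""
    return len(set(values)) <= 1


def pearson_r(xs, ys):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    if is_constant(xs) or is_constant(ys):
        # The denominator (product of standard deviations) would be zero,
        # so the coefficient is undefined.
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Running `is_constant` on the predicted scores before computing the correlation would show whether the predictions have collapsed to a single value, which would point at training (for example the learning rate, label scaling, or number of epochs) and not at the metric.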
