CoolNLTK

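A toolkit for text classification.

Features

  1. Multiple models behind a largely uniform data input, which makes it easy to compare results
  2. Can be used directly in production
  3. Relatively simple to use

Implemented models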

  1. TextCNN
  2. TextRNN
  3. CLstm

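Model training

1. train file

The input data uses the same format as fastText.

Test data can be downloaded with the fastText code and copied to the ./datasets/dbpedia directory. Following the fastText documentation, run its classification-example.sh to obtain dbpedia.train and dbpedia.test.

Note: class labels start from 1. Training pads with 0, so starting the labels at 1 avoids confusion between the two.

An example: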

__label__7 , joseph purdy homestead
__label__13 , forever young ( 1992 film )
__label__11 , nepenthes ' boca rose
__label__6 , mv eilean bhearnaraigh
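To make the input format concrete, here is a minimal sketch of how one such line could be split into a label id and tokens. The function name `parse_line` and the `drop_leading_comma` option are hypothetical and not taken from the CoolNLTK code; whether the project keeps or drops the comma that follows the label is an assumption.

```python
from typing import List, Tuple

LABEL_PREFIX = "__label__"


def parse_line(line: str, drop_leading_comma: bool = True) -> Tuple[int, List[str]]:
    """Split one fastText-style line into (label_id, tokens)."""
    label_part, _, text = line.strip().partition(" ")
    if not label_part.startswith(LABEL_PREFIX):
        raise ValueError(f"line does not start with {LABEL_PREFIX!r}: {line!r}")
    label_id = int(label_part[len(LABEL_PREFIX):])
    if label_id < 1:
        # Labels start at 1 because 0 is used for padding during training.
        raise ValueError("labels are expected to start at 1")
    tokens = text.split()
    # Assumption: the " , " right after the label is a separator, not content.
    if drop_leading_comma and tokens and tokens[0] == ",":
        tokens = tokens[1:]
    return label_id, tokens


if __name__ == "__main__":
    print(parse_line("__label__7 , joseph purdy homestead"))
```

Passing `drop_leading_comma=False` keeps every whitespace-separated token after the label unchanged.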

Set the training sample paths in train/main.sh:

TRAIN_FILE=./datasets/dbpedia/dbpedia.train
TEST_FILE=./datasets/dbpedia/dbpedia.test

# Model to use; options: cnn, bilstm, clstm
MODEL=cnn

# Output path for intermediate files
DATA_OUT_DIR=./datasets/dbpedia/


# Output path for the model
MODEL_OUT_DIR=./results/dbpedia/

2. embedding

Generate the training data for word2vec:

./main.sh pre

Train the word vectors:

./main.sh vec

3. map file

This step produces the required mapping files:

./main.sh map
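The format of the mapping files is not described here. As a rough illustration of what a token-to-id mapping usually looks like, the sketch below builds one from tokenized samples while reserving id 0 for padding. The names `build_vocab`, `PAD`, `UNK`, and `min_count` are hypothetical; the files that `./main.sh map` actually writes may be organized differently.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical reserved tokens; id 0 is kept for padding.
PAD, UNK = "<PAD>", "<UNK>"


def build_vocab(token_lists: Iterable[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {PAD: 0, UNK: 1}
    # Sort by descending count, then alphabetically so the result is deterministic.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


if __name__ == "__main__":
    samples = [["joseph", "purdy", "homestead"], ["forever", "young"]]
    print(build_vocab(samples))
```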

4. tfrecord

Generate the tfrecord files:

./main.sh data

5. train

Train the model:

./main.sh train

6. Model export

Export to a pb file, which can be read from Java or Go:

./main.sh export
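The six steps above run in a fixed order. A small driver such as the following sketch could chain them and stop at the first failure. The helper names `build_commands` and `run_pipeline` are hypothetical, and the script path `./main.sh` assumes the working directory is the one that contains the script.

```python
import subprocess
from typing import List, Sequence

# Step names in the order they appear in this README.
STEPS = ("pre", "vec", "map", "data", "train", "export")


def build_commands(script: str = "./main.sh", steps: Sequence[str] = STEPS) -> List[List[str]]:
    """Return one argv list per pipeline step."""
    return [[script, step] for step in steps]


def run_pipeline(script: str = "./main.sh", dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(script):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, halting at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

With `dry_run=True` the sketch only prints the commands, which is a safe way to confirm the order before starting a long training run.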

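Using the model

predict.py contains an example that loads the model trained and exported in the steps above, together with the generated vocab.json file.

The TextRNN, TextCNN, and CLstm models can all share this module.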
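Before text can be fed to an exported model it has to be turned into a fixed-length list of ids. The sketch below shows one way this could work, assuming vocab.json is a flat JSON object that maps tokens to integer ids; that layout, the ids used for unknown tokens and padding, and the names `load_vocab` and `encode` are all assumptions, so predict.py remains the reference for the real procedure.

```python
import json
from typing import Dict, List


def load_vocab(path: str) -> Dict[str, int]:
    """Read a token-to-id mapping from a JSON file (assumed flat layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode(text: str, vocab: Dict[str, int], max_len: int,
           unk_id: int = 1, pad_id: int = 0) -> List[int]:
    """Convert whitespace-tokenized text to exactly max_len ids."""
    # Unknown tokens fall back to unk_id; long inputs are truncated.
    ids = [vocab.get(tok, unk_id) for tok in text.split()][:max_len]
    # Short inputs are right-padded with pad_id (0, matching the training note).
    return ids + [pad_id] * (max_len - len(ids))


if __name__ == "__main__":
    demo_vocab = {"joseph": 2, "purdy": 3}
    print(encode("joseph purdy homestead", demo_vocab, max_len=5))
```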

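todo

  • Refactor the code against the latest tensorflow
  • Change the tfrecord output to produce multiple files instead of a single one
  • Add tensorboard

More models are planned, including but not limited to the following: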

  1. HAM
  2. RCNN
  3. Recurrent Entity Network
  4. Dynamic Memory Network
