fendouai / chinese-text-classification

Chinese-Text-Classification: Chinese text classification with a TensorFlow CNN (convolutional neural network). QQ group: 522785813; WeChat group QR code: http://www.tensorflownews.com/

Home Page: http://www.tensorflownews.com/

License: Apache License 2.0

Python 100.00%
tensorflow text-classification chinese cnn cnn-text-classification jieba

chinese-text-classification's Introduction

Chinese text classification implemented in TensorFlow with a convolutional neural network

This project is a rewrite of the following project: cnn-text-classification-tf

Questions about Chinese-Text-Classification are welcome here.

Main changes:

  • Compatible with TensorFlow 1.2 and above
  • Added a Chinese dataset
  • Added a Chinese preprocessing pipeline

Features:

  • Compatible with the latest TensorFlow
  • Chinese dataset
  • Chinese text processing based on jieba (see the sketch after this list)
  • Complete implementation of model training, saving, and evaluation
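Because Chinese is written without spaces, the jieba step segments each sentence into words so that the downstream vocabulary and embedding code can treat them like whitespace-separated English tokens. A minimal sketch of that idea (the function name is illustrative; the repo's actual preprocessing may differ):

    import jieba

    def segment_line(line):
        """Cut one raw Chinese sentence into space-separated tokens."""
        return " ".join(jieba.cut(line.strip()))

    print(segment_line("用卷积神经网络实现的中文文本分类"))
    # -> 用 卷积 神经网络 实现 的 中文 文本 分类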

Training results

Model evaluation

The original project's README follows below.

This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.

It is a slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in TensorFlow.
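For orientation, Kim's model embeds each token, runs parallel convolutions of several widths over the token sequence, max-pools each feature map over time, concatenates the results, and classifies after dropout. A condensed TF 1.x sketch of that shape (hyperparameter names mirror the training flags below; this is not the repo's exact code):

    import tensorflow as tf

    # Illustrative hyperparameters; the real ones come from train.py flags.
    seq_len, vocab_size, num_classes = 56, 5000, 2
    embedding_dim, filter_sizes, num_filters = 128, [3, 4, 5], 128

    input_x = tf.placeholder(tf.int32, [None, seq_len], name="input_x")
    dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

    # Embed each token id and add a channel axis: [batch, seq, emb, 1].
    emb_W = tf.Variable(tf.random_uniform([vocab_size, embedding_dim], -1.0, 1.0))
    embedded = tf.expand_dims(tf.nn.embedding_lookup(emb_W, input_x), -1)

    pooled = []
    for fs in filter_sizes:
        # One convolution per n-gram width, then max-pool over time.
        W = tf.Variable(tf.truncated_normal([fs, embedding_dim, 1, num_filters], stddev=0.1))
        conv = tf.nn.relu(tf.nn.conv2d(embedded, W, strides=[1, 1, 1, 1], padding="VALID"))
        pooled.append(tf.nn.max_pool(conv, ksize=[1, seq_len - fs + 1, 1, 1],
                                     strides=[1, 1, 1, 1], padding="VALID"))

    # Concatenate the pooled features, apply dropout, and classify.
    h = tf.reshape(tf.concat(pooled, 3), [-1, num_filters * len(filter_sizes)])
    h_drop = tf.nn.dropout(h, dropout_keep_prob)
    scores = tf.layers.dense(h_drop, num_classes, name="scores")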

Requirements

  • Python 3
  • Tensorflow > 1.2
  • Numpy

Training

Print parameters:

./train.py --help
optional arguments:
  -h, --help            show this help message and exit
  --embedding_dim EMBEDDING_DIM
                        Dimensionality of character embedding (default: 128)
  --filter_sizes FILTER_SIZES
                        Comma-separated filter sizes (default: '3,4,5')
  --num_filters NUM_FILTERS
                        Number of filters per filter size (default: 128)
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularization lambda (default: 0.0)
  --dropout_keep_prob DROPOUT_KEEP_PROB
                        Dropout keep probability (default: 0.5)
  --batch_size BATCH_SIZE
                        Batch Size (default: 64)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 100)
  --evaluate_every EVALUATE_EVERY
                        Evaluate model on dev set after this many steps
                        (default: 100)
  --checkpoint_every CHECKPOINT_EVERY
                        Save model after this many steps (default: 100)
  --allow_soft_placement ALLOW_SOFT_PLACEMENT
                        Allow device soft device placement
  --noallow_soft_placement
  --log_device_placement LOG_DEVICE_PLACEMENT
                        Log placement of ops on devices
  --nolog_device_placement

Train:

./train.py
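Any of the flags above can be combined on the command line, for example to widen the model and shrink the batches:

    ./train.py --embedding_dim=200 --filter_sizes=2,3,4 --num_filters=256 --batch_size=32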

Evaluating

./eval.py --eval_train --checkpoint_dir="./runs/1459637919/checkpoints/"

Replace the checkpoint dir with the output from the training. To use your own data, change the eval.py script to load your data.
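Concretely, that usually means replacing the part of eval.py that builds the test examples and labels. A hypothetical loader (the file format, function name, and follow-up mapping are assumptions, not code from this repo):

    import numpy as np

    def load_my_data(path):
        """Read one 'label<TAB>text' example per line."""
        texts, labels = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                label, text = line.rstrip("\n").split("\t", 1)
                texts.append(text)
                labels.append(int(label))
        return texts, np.array(labels)

    x_raw, y_test = load_my_data("my_eval_data.tsv")
    # x_raw must still be segmented and mapped through the saved vocabulary
    # before it can be fed to the restored graph.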

References

  • Denny Britz, "Implementing a CNN for Text Classification in TensorFlow" (blog post).
  • Yoon Kim, "Convolutional Neural Networks for Sentence Classification" (EMNLP 2014).

chinese-text-classification's People

Contributors

fendouai


chinese-text-classification's Issues

How do I add L2 regularization to the loss when the output layer is defined with the newer API?

        self.scores = tf.layers.dense(
            self.h_drop, num_classes, name="scores",
            kernel_initializer=tf.contrib.layers.xavier_initializer(),
            bias_initializer=tf.constant_initializer(0.1))

        # Replaced code:
        # W = tf.get_variable(
        #     "W",
        #     shape=[num_filters_total, num_classes],
        #     initializer=tf.contrib.layers.xavier_initializer())
        # b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
        # l2_loss += tf.nn.l2_loss(W)
        # l2_loss += tf.nn.l2_loss(b)
        # self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")

        self.predictions = tf.argmax(self.scores, 1, name="predictions")

I replaced the commented-out lines above with the single tf.layers.dense call at the top. tf.layers.dense takes a kernel_regularizer argument, but I am not sure whether the penalty it creates actually gets added to the loss.
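For what it's worth: in TF 1.x, the penalty created by kernel_regularizer is only recorded in the tf.GraphKeys.REGULARIZATION_LOSSES collection; nothing adds it to the training loss for you. A minimal sketch of wiring it in (shapes and names are illustrative, not this repo's code):

    import tensorflow as tf

    num_classes, hidden_dim, l2_reg_lambda = 2, 384, 0.1
    h_drop = tf.placeholder(tf.float32, [None, hidden_dim])    # pooled features
    input_y = tf.placeholder(tf.float32, [None, num_classes])  # one-hot labels

    # kernel_regularizer records an L2 term in the REGULARIZATION_LOSSES
    # collection; it is NOT applied automatically.
    scores = tf.layers.dense(
        h_drop, num_classes, name="scores",
        kernel_regularizer=tf.contrib.layers.l2_regularizer(l2_reg_lambda))

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=input_y))

    # Collect every recorded regularization term and add it yourself.
    loss = cross_entropy + tf.losses.get_regularization_loss()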

How do I fix the out-of-memory error?

The original code trained without memory problems, but training on the Chinese dataset exhausts memory and crashes. Is there a way to work around this?
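One mitigation that stays within the documented flags (a suggestion, not a confirmed fix from the maintainer) is to shrink the per-step footprint by lowering the batch size and the model width:

    ./train.py --batch_size=32 --embedding_dim=64 --num_filters=64

If the overflow happens while loading the data rather than during training, the dataset itself may need to be read in chunks instead of fully into memory.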
