fendouai / chinese-text-classification

Chinese-Text-Classification: Chinese text classification with a TensorFlow CNN (convolutional neural network). QQ group: 522785813; WeChat group QR code: http://www.tensorflownews.com/

Home Page: http://www.tensorflownews.com/

License: Apache License 2.0

Python 100.00%
tensorflow text-classification chinese cnn cnn-text-classification jieba

chinese-text-classification's Introduction

Chinese text classification implemented in TensorFlow with a convolutional neural network

This project is a rewrite of the following project: cnn-text-classification-tf

Questions about Chinese-Text-Classification are welcome here.

Main changes:

  • Compatible with TensorFlow 1.2 and above
  • Added a Chinese dataset
  • Added a Chinese preprocessing pipeline

Features:

  • Compatible with the latest TensorFlow
  • Chinese dataset
  • Chinese text processing based on jieba (see the sketch after this list)
  • Complete implementation of model training, saving, and evaluation
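Because Chinese is written without spaces, the jieba step segments each sentence into words so that the downstream vocabulary and embedding code can treat them like whitespace-separated English tokens. A minimal sketch of that idea (the function name is illustrative; the repo's actual preprocessing may differ):

    import jieba

    def segment_line(line):
        """Cut one raw Chinese sentence into space-separated tokens."""
        return " ".join(jieba.cut(line.strip()))

    print(segment_line("用卷积神经网络实现的中文文本分类"))
    # -> 用 卷积 神经网络 实现 的 中文 文本 分类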

Training results

Model evaluation

The original project's README follows below.

This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.

It is a slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in TensorFlow.
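For orientation, Kim's model embeds each token, runs parallel convolutions of several widths over the token sequence, max-pools each feature map over time, concatenates the results, and classifies after dropout. A condensed TF 1.x sketch of that shape (hyperparameter names mirror the training flags below; this is not the repo's exact code):

    import tensorflow as tf

    # Illustrative hyperparameters; the real ones come from train.py flags.
    seq_len, vocab_size, num_classes = 56, 5000, 2
    embedding_dim, filter_sizes, num_filters = 128, [3, 4, 5], 128

    input_x = tf.placeholder(tf.int32, [None, seq_len], name="input_x")
    dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

    # Embed each token id and add a channel axis: [batch, seq, emb, 1].
    emb_W = tf.Variable(tf.random_uniform([vocab_size, embedding_dim], -1.0, 1.0))
    embedded = tf.expand_dims(tf.nn.embedding_lookup(emb_W, input_x), -1)

    pooled = []
    for fs in filter_sizes:
        # One convolution per n-gram width, then max-pool over time.
        W = tf.Variable(tf.truncated_normal([fs, embedding_dim, 1, num_filters], stddev=0.1))
        conv = tf.nn.relu(tf.nn.conv2d(embedded, W, strides=[1, 1, 1, 1], padding="VALID"))
        pooled.append(tf.nn.max_pool(conv, ksize=[1, seq_len - fs + 1, 1, 1],
                                     strides=[1, 1, 1, 1], padding="VALID"))

    # Concatenate the pooled features, apply dropout, and classify.
    h = tf.reshape(tf.concat(pooled, 3), [-1, num_filters * len(filter_sizes)])
    h_drop = tf.nn.dropout(h, dropout_keep_prob)
    scores = tf.layers.dense(h_drop, num_classes, name="scores")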

Requirements

  • Python 3
  • Tensorflow > 1.2
  • Numpy

Training

Print parameters:

./train.py --help
optional arguments:
  -h, --help            show this help message and exit
  --embedding_dim EMBEDDING_DIM
                        Dimensionality of character embedding (default: 128)
  --filter_sizes FILTER_SIZES
                        Comma-separated filter sizes (default: '3,4,5')
  --num_filters NUM_FILTERS
                        Number of filters per filter size (default: 128)
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularization lambda (default: 0.0)
  --dropout_keep_prob DROPOUT_KEEP_PROB
                        Dropout keep probability (default: 0.5)
  --batch_size BATCH_SIZE
                        Batch Size (default: 64)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 100)
  --evaluate_every EVALUATE_EVERY
                        Evaluate model on dev set after this many steps
                        (default: 100)
  --checkpoint_every CHECKPOINT_EVERY
                        Save model after this many steps (default: 100)
  --allow_soft_placement ALLOW_SOFT_PLACEMENT
                        Allow device soft device placement
  --noallow_soft_placement
  --log_device_placement LOG_DEVICE_PLACEMENT
                        Log placement of ops on devices
  --nolog_device_placement

Train:

./train.py
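Any of the flags above can be combined on the command line, for example to widen the model and shrink the batches:

    ./train.py --embedding_dim=200 --filter_sizes=2,3,4 --num_filters=256 --batch_size=32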

Evaluating

./eval.py --eval_train --checkpoint_dir="./runs/1459637919/checkpoints/"

Replace the checkpoint dir with the output from the training. To use your own data, change the eval.py script to load your data.
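Concretely, that usually means replacing the part of eval.py that builds the test examples and labels. A hypothetical loader (the file format, function name, and follow-up mapping are assumptions, not code from this repo):

    import numpy as np

    def load_my_data(path):
        """Read one 'label<TAB>text' example per line."""
        texts, labels = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                label, text = line.rstrip("\n").split("\t", 1)
                texts.append(text)
                labels.append(int(label))
        return texts, np.array(labels)

    x_raw, y_test = load_my_data("my_eval_data.tsv")
    # x_raw must still be segmented and mapped through the saved vocabulary
    # before it can be fed to the restored graph.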

References

  • Denny Britz, "Implementing a CNN for Text Classification in TensorFlow" (blog post).
  • Yoon Kim, "Convolutional Neural Networks for Sentence Classification" (EMNLP 2014).

chinese-text-classification's People

Contributors

fendouai


chinese-text-classification's Issues

How do I add L2 regularization to the loss when the output layer is defined with the newer API?

        self.scores = tf.layers.dense(
            self.h_drop, num_classes, name="scores",
            kernel_initializer=tf.contrib.layers.xavier_initializer(),
            bias_initializer=tf.constant_initializer(0.1))

        # Replaced code:
        # W = tf.get_variable(
        #     "W",
        #     shape=[num_filters_total, num_classes],
        #     initializer=tf.contrib.layers.xavier_initializer())
        # b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
        # l2_loss += tf.nn.l2_loss(W)
        # l2_loss += tf.nn.l2_loss(b)
        # self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")

        self.predictions = tf.argmax(self.scores, 1, name="predictions")

I replaced the commented-out lines above with the single tf.layers.dense call at the top. tf.layers.dense takes a kernel_regularizer argument, but I am not sure whether the penalty it creates actually gets added to the loss.
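For what it's worth: in TF 1.x, the penalty created by kernel_regularizer is only recorded in the tf.GraphKeys.REGULARIZATION_LOSSES collection; nothing adds it to the training loss for you. A minimal sketch of wiring it in (shapes and names are illustrative, not this repo's code):

    import tensorflow as tf

    num_classes, hidden_dim, l2_reg_lambda = 2, 384, 0.1
    h_drop = tf.placeholder(tf.float32, [None, hidden_dim])    # pooled features
    input_y = tf.placeholder(tf.float32, [None, num_classes])  # one-hot labels

    # kernel_regularizer records an L2 term in the REGULARIZATION_LOSSES
    # collection; it is NOT applied automatically.
    scores = tf.layers.dense(
        h_drop, num_classes, name="scores",
        kernel_regularizer=tf.contrib.layers.l2_regularizer(l2_reg_lambda))

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=input_y))

    # Collect every recorded regularization term and add it yourself.
    loss = cross_entropy + tf.losses.get_regularization_loss()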

How do I fix the out-of-memory error?

The original code trained without memory problems, but training on the Chinese dataset exhausts memory and crashes. Is there a way to work around this?
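One mitigation that stays within the documented flags (a suggestion, not a confirmed fix from the maintainer) is to shrink the per-step footprint by lowering the batch size and the model width:

    ./train.py --batch_size=32 --embedding_dim=64 --num_filters=64

If the overflow happens while loading the data rather than during training, the dataset itself may need to be read in chunks instead of fully into memory.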
