This repository is my research group project, and it also serves as a study of TensorFlow and deep learning models (FastText, CNN, LSTM, RCNN, etc.).
The main objective of the project is to solve the multi-label text classification problem based on Convolutional Neural Networks. Because a single sample can carry several labels at once, each data label is a multi-hot vector such as [0, 1, 0, ..., 1, 1].
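The label format described above can be sketched in a few lines of plain Python. This is only an illustration, not code from this repository: the function name `to_multi_hot` and the parameter `num_classes` are hypothetical, and the label indices are made up.

```python
def to_multi_hot(label_indices, num_classes):
    """Return a 0/1 list of length num_classes with ones at label_indices."""
    vector = [0] * num_classes
    for index in label_indices:
        if not 0 <= index < num_classes:
            raise ValueError("label index %d out of range" % index)
        vector[index] = 1
    return vector


# A made-up sample tagged with classes 1, 4 and 5 out of 6 possible classes.
example = to_multi_hot([1, 4, 5], num_classes=6)
print(example)  # [0, 1, 0, 0, 1, 1]
```

Unlike a one-hot vector used for single-label classification, several positions can be 1 at the same time.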
- Python 3.6
- TensorFlow 1.8+
- Numpy
- Gensim
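The research data may be protected by copyright under Chinese law, so only the code is provided here.
The experimental data belongs to a joint project between our lab and a company and involves trade secrets, so it cannot be provided here. Thank you for your understanding.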
- Make the data support Chinese and English (which seems easy with `jieba`).
- Can use your own pre-trained word vectors (which seems easy with `gensim`).
- Add embedding visualization based on TensorBoard.
- Add the correct L2 loss calculation operation.
- Add gradient clipping to prevent gradient explosion.
- Add exponential learning rate decay.
- Add a new Highway Layer (which is useful, judging by the model performance).
- Add Batch Normalization Layer.
- Can choose to train the model directly or restore it from a checkpoint in `train.py`.
- Can predict the labels via threshold and topK in `train.py` and `test.py`.
- Add `test.py`, the model test code; it can show the predicted value of each label for the data in the test set when creating the final prediction file.
- Add other useful data preprocessing functions in `data_helpers.py`.
- Use `logging` to record all run information (parameter display, model training info, etc.).
- Provide the ability to save the best n checkpoints in `checkmate.py`, whereas `tf.train.Saver` can only save the last n checkpoints.
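As a rough illustration of the two prediction modes mentioned in the list above (threshold and topK), here is a minimal, framework-independent sketch. It is not the code from `train.py` or `test.py`; the function names and the example scores are assumptions made up for this example.

```python
def predict_by_threshold(scores, threshold=0.5):
    """Return the indices of all labels whose score reaches the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


def predict_by_topk(scores, k=2):
    """Return the indices of the k highest-scoring labels, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Made-up per-label scores for one sample (e.g. sigmoid outputs).
scores = [0.91, 0.08, 0.47, 0.63, 0.02]
print(predict_by_threshold(scores, threshold=0.5))  # [0, 3]
print(predict_by_topk(scores, k=3))                 # [0, 3, 2]
```

The threshold mode returns a variable number of labels per sample (possibly none), while topK always returns exactly k labels, so which one fits depends on how many labels each sample is expected to have.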
It depends on what your data and task are.
You can use the `jieba` package if you are going to deal with Chinese text data.
- Use the `gensim` package to pre-train word vectors on your data.
- Use the `glove` tools to pre-train word vectors on your data.
- You can even use a fasttext network to pre-train word vectors on your data.
References:
- Personal ideas 🙃
References:
- Convolutional Neural Networks for Sentence Classification
- A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
Warning: The model is usable but not finished yet 🤪!
- Add BN-LSTM cell unit.
- Add attention.
References:
- Personal ideas 🙃
References:
- Personal ideas 🙃
Warning: The model is usable but not finished yet 🤪!
- Add attention penalization loss.
- Add visualization.
黄威,Randolph
SCU SE Bachelor; USTC CS Master
Email: [email protected]
My Blog: randolph.pro
LinkedIn: randolph's linkedin