
jiegzhan / multi-class-text-classification-cnn-rnn


Classify Kaggle San Francisco Crime Descriptions into 39 classes. Build the model with CNN, RNN (GRU and LSTM) and Word Embeddings on TensorFlow.

Home Page: https://www.kaggle.com/c/sf-crime/data

License: Apache License 2.0

Python 100.00%
cnn text-classification kaggle tensorflow rnn embeddings lstm

multi-class-text-classification-cnn-rnn's Introduction

Project: Classify Kaggle San Francisco Crime Description

Highlights:

  • This is a multi-class text classification (sentence classification) problem.
  • The goal of this project is to classify Kaggle San Francisco Crime Description into 39 classes.
  • This model was built with CNN, RNN (LSTM and GRU) and Word Embeddings on TensorFlow.
  • Input: Descript

  • Output: Category

  • Examples:

    Descript                                      Category
    GRAND THEFT FROM LOCKED AUTO                  LARCENY/THEFT
    POSSESSION OF NARCOTICS PARAPHERNALIA         DRUG/NARCOTIC
    AIDED CASE, MENTAL DISTURBED                  NON-CRIMINAL
    AGGRAVATED ASSAULT WITH BODILY FORCE          ASSAULT
    ATTEMPTED ROBBERY ON THE STREET WITH A GUN    ROBBERY

Train:

  • Command: python3 train.py train_data.file train_parameters.json
  • Example: python3 train.py ./data/train.csv.zip ./training_config.json
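
For reference, training_config.json holds the model hyperparameters. A sketch of its shape, using the default values quoted in the issues further down (the values are illustrative, not a recommendation):

    {
      "batch_size": 256,
      "dropout_keep_prob": 0.5,
      "embedding_dim": 300,
      "evaluate_every": 100,
      "filter_sizes": "3,4,5",
      "hidden_unit": 300,
      "l2_reg_lambda": 0.0,
      "max_pool_size": 4,
      "non_static": false,
      "num_epochs": 1,
      "num_filters": 128
    }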

Predict:

  • Command: python3 predict.py ./trained_results_dir/ new_data.csv
  • Example: python3 predict.py ./trained_results_1478563595/ ./data/small_samples.csv

Reference:

multi-class-text-classification-cnn-rnn's People

Contributors

gustavomr, jiegzhan, prabh-me, rickyhan


multi-class-text-classification-cnn-rnn's Issues

How do I change the parameters for much longer text in Russian?

Thanks for your code, it's pretty much exactly what I was looking for.
But I need to classify longer texts (around 500 words per article), and they will be in Russian.
Can you advise how to adapt the code for this?

What do I need to change in the config file?
{
  "batch_size": 256,
  "dropout_keep_prob": 0.5,
  "embedding_dim": 300,
  "evaluate_every": 100,
  "filter_sizes": "3,4,5",
  "hidden_unit": 300,
  "l2_reg_lambda": 0.0,
  "max_pool_size": 4,
  "non_static": false,
  "num_epochs": 1,
  "num_filters": 128
}

What do I need to change for longer documents?
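
One hedged starting point (untested with this repo, and an assumption rather than advice from the author): raise num_epochs above 1, lower batch_size because 500-word sequences cost far more memory per example, and set non_static to true so the embeddings can adapt when no pretrained Russian vectors are supplied, for example:

    {
      "batch_size": 64,
      "non_static": true,
      "num_epochs": 10
    }

keeping the remaining keys as quoted above.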

Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

DataLossError (see above for traceback): unable to open table file .\trained_results_1509434162\best_model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_11 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_11/tensor_names, save/RestoreV2_11/shape_and_slices)]]
K:\users\Eduard\Downloads\multi-class-text-classification-cnn-rnn-master\multi-class-text-classification-cnn-rnn-master>

Can it work if "non_static" is set to "True" in the parameters?

Thanks for sharing the project. I found the "non_static" option in the config.

I see code like this:
with open(trained_dir + 'embeddings.pickle', 'wb') as outfile:
    pickle.dump(embedding_mat, outfile, pickle.HIGHEST_PROTOCOL)

"embedding_mat" is initialized with np.random.uniform, and when I train with "non_static": true the values are only updated inside the cnn_rnn model; when I save it, what gets written is still the initial np.random.uniform matrix.

I'm not sure I've described this clearly: if I want to use trainable embeddings, is it enough to set "non_static": true?

Thank you

Error while training

[root@bdl02node04 multi-class-text-classification-cnn-rnn-master]# python train.py
CRITICAL:root:The maximum length is 14
INFO:root:x_train: 711219, x_dev: 79025, x_test: 87805
INFO:root:y_train: 711219, y_dev: 79025, y_test: 87805
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "train.py", line 161, in
train_cnn_rnn()
File "train.py", line 60, in train_cnn_rnn
l2_reg_lambda = params['l2_reg_lambda'])
File "/root/NN/multi-class-text-classification-cnn-rnn-master/text_cnn_rnn.py", line 34, in init
pad_prio = tf.concat(1, [self.pad] * num_prio)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1047, in concat
dtype=dtypes.int32).get_shape(
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 651, in convert_to_tensor
as_ref=False)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 716, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).name))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
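
This TypeError is the classic symptom of running the TF 0.x code on TF 1.x, where tf.concat swapped its arguments (the axis moved from the first position to the last). A minimal sketch of the change for the line quoted in the traceback; the same edit applies to the other tf.concat calls in text_cnn_rnn.py:

    # TensorFlow 0.x, as written in text_cnn_rnn.py:
    pad_prio = tf.concat(1, [self.pad] * num_prio)

    # TensorFlow 1.x: values first, axis second.
    pad_prio = tf.concat([self.pad] * num_prio, 1)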

create a SavedModel

Hi jiegzhan,
I want to create a SavedModel to deploy later on Google ML Engine, but your code only supports TensorFlow 0.9.0, and in that version I can't use the tf.train.Saver class to generate a SavedModel. Do you have any idea how I can fix this problem?
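
Assuming the graph is first migrated to a recent TF 1.x release (see the tf.concat / tf.split notes in the other issues), a minimal SavedModel export sketch could look like the following. The tensor names input_x, dropout_keep_prob and predictions are assumptions about the graph, not names verified against this repo:

    import tensorflow as tf

    def export_saved_model(sess, export_dir, input_x, dropout_keep_prob, predictions):
        # Write a SavedModel directory that Google ML Engine can serve (TF 1.x API).
        builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
        signature = tf.saved_model.signature_def_utils.predict_signature_def(
            inputs={'input_x': input_x, 'dropout_keep_prob': dropout_keep_prob},
            outputs={'predictions': predictions})
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING],
            signature_def_map={'predict': signature})
        builder.save()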

when trying to run train.py

sidrah@sidrah-VirtualBox:/Downloads/multi-class-text-classification-cnn-rnn-master$ python3 train.py ./data/train.csv.zip ./training_config.json
Traceback (most recent call last):
File "train.py", line 8, in
import data_helper
File "/home/sidrah/Downloads/multi-class-text-classification-cnn-rnn-master/data_helper.py", line 13, in
from tensorflow.contrib import learn
File "/usr/local/lib/python3.4/dist-packages/tensorflow/init.py", line 23, in
from tensorflow.python import *
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/init.py", line 48, in
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in
_pywrap_tensorflow = swig_import_helper()
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/usr/lib/python3.4/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
ImportError: /usr/local/lib/python3.4/dist-packages/tensorflow/python/_pywrap_tensorflow.so: invalid ELF header

Error while training: Unknown argument "syntax"

I get the following error when trying to train :
TypeError: __init__() got an unexpected keyword argument 'syntax'

Any idea where this comes from?

File "train.py", line 8, in
import data_helper
File "/Users/Nanous/Desktop/crime_classification/data_helper.py", line 13, in
from tensorflow.contrib import learn
File "/usr/local/lib/python2.7/site-packages/tensorflow/init.py", line 24, in
from tensorflow.python import *
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/init.py", line 54, in
from tensorflow.core.framework.graph_pb2 import *
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 16, in
from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/node_def_pb2.py", line 16, in
from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in
from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in
from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 22, in
serialized_pb=_b('\n/tensorflow/core/framework/resource_handle.proto\x12\ntensorflow"m\n\x0eResourceHandle\x12\x0e\n\x06\x64\x65vice\x18\x01 \x01(\t\x12\x11\n\tcontainer\x18\x02 \x01(\t\x12\x0c\n\x04name\x18\x03 \x01(\t\x12\x11\n\thash_code\x18\x04 \x01(\x04\x12\x17\n\x0fmaybe_type_name\x18\x05 \x01(\tB4\n\x18org.tensorflow.frameworkB\x13ResourceHandleProtoP\x01\xf8\x01\x01\x62\x06proto3')

TypeError: __init__() got an unexpected keyword argument 'syntax'

Extract Associated Probability

This is awesome code. I am fairly new to TF and have tried to google the answer myself, but can't figure it out. How do I also extract the probability associated with the label that the model predicts?
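
One hedged way to do it (a sketch, not code from this repo): apply tf.nn.softmax to the model's raw score tensor and read off the maximum per row. scores_tensor stands in for whatever logit tensor the graph exposes, which is an assumption to verify against the checkpoint:

    import numpy as np
    import tensorflow as tf

    def predictions_with_probabilities(sess, scores_tensor, feed_dict):
        # scores_tensor: raw logits of shape [batch_size, num_classes] (name is an assumption).
        probs = sess.run(tf.nn.softmax(scores_tensor), feed_dict=feed_dict)
        return np.argmax(probs, axis=1), np.max(probs, axis=1)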

How to deal with the imbalanced data problem?

I tried to apply the code to my own text classification data (47 classes in 42,000 records) and found that the classifier tends to choose the larger classes such as THEFT, ASSAULT and so forth. How do you deal with the imbalanced data to make the predictions more balanced?
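
The repo itself does not address imbalance. A common mitigation is to weight the cross-entropy loss by inverse class frequency (or to oversample the small classes when batching); a hedged sketch of the weighted loss, not the repo's implementation:

    import tensorflow as tf

    def class_weighted_loss(logits, one_hot_labels, class_counts):
        # class_counts: Python list with the number of training rows per class.
        total = float(sum(class_counts))
        weights = [total / (len(class_counts) * c) for c in class_counts]    # inverse frequency
        per_class_w = tf.constant(weights, dtype=tf.float32)                 # [num_classes]
        per_example_w = tf.reduce_sum(one_hot_labels * per_class_w, axis=1)  # weight of each row
        ce = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_labels)
        return tf.reduce_mean(ce * per_example_w)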

TensorFlow 1 migration

Migrating to TensorFlow 1:
I changed tf.concat from (1, xxx) to (xxx, 1).
I changed tf.nn.rnn_ to tf.contrib.rnn.

But now I have this error:
File "train.py", line 161, in
train_cnn_rnn()
File "train.py", line 60, in train_cnn_rnn
l2_reg_lambda = params['l2_reg_lambda'])
File "/home/administrator/django/demo/tempo/multi-class-text-classification-cnn-rnn/text_cnn_rnn.py", line 58, in init
inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(1, reduced, pooled_concat)]
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/ops/array_ops.py", line 1203, in split
num = size_splits_shape.dims[0]
IndexError: list index out of range

Any ideas?
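
The IndexError comes from tf.split, whose signature also changed in TF 1.0: the quoted call still passes the axis first, so the new API treats pooled_concat as the axis and the scalar as the size splits. A sketch of the fix for the line in the traceback:

    # TensorFlow 0.x (as in text_cnn_rnn.py):
    inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(1, reduced, pooled_concat)]

    # TensorFlow 1.x: value first, then the number of splits, then the axis.
    # If `reduced` is a NumPy scalar, casting it with int() may also be needed.
    inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(pooled_concat, int(reduced), axis=1)]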

Issue regarding the training file, please help

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "train.py", line 161, in
train_cnn_rnn()
File "train.py", line 60, in train_cnn_rnn
l2_reg_lambda = params['l2_reg_lambda'])
File "/home/akshata/keras_NN/multi-class-text-classification-cnn-rnn-master/text_cnn_rnn.py", line 34, in init
pad_prio = tf.concat(1, [self.pad] * num_prio)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1047, in concat
dtype=dtypes.int32).get_shape(
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 651, in convert_to_tensor
as_ref=False)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 716, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).name))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Saver not working

I believe train.py line 151, os.rename(path, trained_dir + 'best_model.ckpt'), needs to be updated for TF 1.2. The path variable is missing the file extension? Not sure. Is there any other way to fix it?

AND

predict.py lines 109, 110, and 111 need to be updated as well.

checkpoint_file = trained_dir + 'best_model.ckpt'
saver = tf.train.Saver(tf.all_variables())
saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file[:-5]))
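
A hedged workaround for TensorFlow >= 0.12, where saver.save() returns a prefix and writes model-XXXX.index / .meta / .data-* files instead of a single file, so os.rename on the bare prefix fails. The sketch below renames every file sharing that prefix; predict.py would then restore from the best_model.ckpt prefix, which is an assumption about how the restore side is written:

    import glob
    import os

    def rename_best_checkpoint(path, trained_dir):
        # path is the prefix returned by saver.save(), e.g. './checkpoints_.../model-2700'.
        for src in glob.glob(path + '.*'):
            suffix = src[len(path):]                     # '.index', '.meta', '.data-...'
            os.rename(src, trained_dir + 'best_model.ckpt' + suffix)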

Doesn't work with Python 2.7 and TensorFlow 0.9

Traceback (most recent call last):
File "/home/mjq/PycharmProjects/multi-class-text/multi-class-text-classification-cnn-rnn/train.py", line 167, in
train_cnn_rnn()
File "/home/mjq/PycharmProjects/multi-class-text/multi-class-text-classification-cnn-rnn/train.py", line 63, in train_cnn_rnn
l2_reg_lambda=params['l2_reg_lambda'])
File "/home/mjq/PycharmProjects/multi-class-text/multi-class-text-classification-cnn-rnn/text_cnn_rnn.py", line 56, in init
lstm_cell = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_keep_prob)
AttributeError: 'module' object has no attribute 'DropoutWrapper'

What is the reference for the RNN?

The reference in this README only covers the CNN, but what about the RNN?
Could you please give the reference paper or some other materials?

os.rename cannot find model-2600

I didn't change any file
Traceback (most recent call last):
File "train.py", line 161, in
train_cnn_rnn()
File "train.py", line 151, in train_cnn_rnn
os.rename(path, trained_dir + 'best_model.ckpt')
FileNotFoundError: [Errno 2] No such file or directory: './checkpoints_1504000981/model-2600' -> './trained_results_1504000981/best_model.ckpt'

train.py fails with best_model.ckpt not found

Hi,
I am trying the example with Python 3.6.1 and TensorFlow 1.2.1 on Windows 10.

I am getting the following error when I run "python train.py ./data/train.csv.zip ./training_config.json".

CRITICAL:root:Saved model ./checkpoints_1501717661/model-2700 at step 2700
CRITICAL:root:Best accuracy 0.997291996203733 at step 2700
CRITICAL:root:Training is complete, testing the best model on x_test and y_test
INFO:tensorflow:Restoring parameters from ./checkpoints_1501717661/model-2700
INFO:tensorflow:Restoring parameters from ./checkpoints_1501717661/model-2700
CRITICAL:root:Accuracy on test set: 0.9972894482090997
Traceback (most recent call last):
File "train.py", line 161, in
train_cnn_rnn()
File "train.py", line 151, in train_cnn_rnn
os.rename(path, trained_dir + 'best_model.ckpt')
FileNotFoundError: [WinError 2] The system cannot find the file specified: './checkpoints_1501717661/model-2700' -> './trained_results_1501717661/best_model.ckpt'

I have run train.py a couple of times now, same error. Please help me solve this issue.

Thanks,
Hilmi.

Prediction fails using TensorFlow 1.3.0

env: tensorflow (1.3.0)

Using the demo data, executing predict.py failed.

(tf_env) [root@patsnap360svr multi-class-text-classification-cnn-rnn]# python predict.py ../trained_results_1506070488/ ../multi-class-text-classification-cnn-rnn-modified/data/small_samples.csv
2017-09-27 13:00:53.275348: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 13:00:53.275444: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 13:00:53.275473: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 13:00:53.275492: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 13:00:53.275510: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
WARNING:tensorflow:From predict.py:110: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
INFO:tensorflow:Restoring parameters from ../trained_results_1506070488/best_model.ckpt
Traceback (most recent call last):
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
status, run_metadata)
File "/usr/local/lib/python3.6/contextlib.py", line 89, in exit
next(self.gen)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "conv-maxpool-3/b" not found in checkpoint files ../trained_results_1506070488/best_model.ckpt
[[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "predict.py", line 136, in
predict_unseen_data()
File "predict.py", line 112, in predict_unseen_data
saver.restore(sess, checkpoint_file)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1560, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "conv-maxpool-3/b" not found in checkpoint files ../trained_results_1506070488/best_model.ckpt
[[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]

Caused by op 'save/RestoreV2_1', defined at:
File "predict.py", line 136, in
predict_unseen_data()
File "predict.py", line 110, in predict_unseen_data
saver = tf.train.Saver(tf.all_variables())
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1140, in init
self.build()
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1172, in build
filename=self._filename)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 688, in build
restore_sequentially, reshape)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
dtypes=dtypes, name=name)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/root/.virtualenvs/tf_env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Tensor name "conv-maxpool-3/b" not found in checkpoint files ../trained_results_1506070488/best_model.ckpt
[[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
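
A NotFoundError at restore time usually means the graph being rebuilt does not match the graph that produced the checkpoint (for example, variables created under different names or scopes after a code change). One hedged way to see what the checkpoint actually contains:

    import tensorflow as tf

    reader = tf.train.NewCheckpointReader('../trained_results_1506070488/best_model.ckpt')
    for name, shape in sorted(reader.get_variable_to_shape_map().items()):
        print(name, shape)   # compare these against the variables the prediction graph creates

If conv-maxpool-3/b is missing from that list, the checkpoint was written by a different version of the model code than the one predict.py is building.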

Tensorboard embedding

Has anyone tried to visualize text embeddings in tensorboard ? Any guidence on how to implement it ?

Thanks
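
Nothing in the repo does this; a minimal sketch with the TF 1.x projector API, assuming embedding_var is the embedding variable in the graph and metadata.tsv lists one vocabulary word per line in row order:

    import tensorflow as tf
    from tensorflow.contrib.tensorboard.plugins import projector

    def link_embedding_to_tensorboard(log_dir, embedding_var, metadata_path):
        config = projector.ProjectorConfig()
        emb = config.embeddings.add()
        emb.tensor_name = embedding_var.name
        emb.metadata_path = metadata_path
        writer = tf.summary.FileWriter(log_dir)
        projector.visualize_embeddings(writer, config)
        # A checkpoint containing embedding_var must also be saved into log_dir
        # (tf.train.Saver().save(sess, ...)) for the projector tab to find the data.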

AttributeError and IndexError

Hi, I'm getting some errors with Tensorflow 1.0

I ran the tensorflow upgrade script. This fixed the argument order error for concat.

However now I get:

AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'rnn_cell'.

I understand that rnn_cell was moved to tf.contrib.

If I change rnn_cell to tf.contrib I get the following error:

IndexError: list index out of range

Unsuccessful TensorSliceReader constructor

Hi Jie, I'm getting the error below. Any idea how to rectify it?

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./checkpoints_1517652366/model-0
[[Node: save/RestoreV2_36 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_36/tensor_names, save/RestoreV2_36/shape_and_slices)]]

KeyError: 'RECOVERED VEHICLE'. What is small_samples.csv supposed to contain?

(tensorflow) F:\Postgraduate\KaggleLearning\multi-class-text-classification-cnn-rnn-master\multi-class-text-classification-cnn-rnn-master>python predict.py ./trained_results_1541818386/ ./data2/samples.csv
D:\Anaconda\anaconda\envs\tensorflow\lib\site-packages\gensim\utils.py:1212: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
File "predict.py", line 141, in
predict_unseen_data()
File "predict.py", line 68, in predict_unseen_data
x_, y_, df = load_test_data(test_file, labels)
File "predict.py", line 43, in load_test_data
y_ = df[select[1]].apply(lambda x: label_dict[x]).tolist()
File "D:\Anaconda\anaconda\envs\tensorflow\lib\site-packages\pandas\core\series.py", line 3194, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/src\inference.pyx", line 1472, in pandas.libs.lib.map_infer
File "predict.py", line 43, in
y_ = df[select[1]].apply(lambda x: label_dict[x]).tolist()
KeyError: 'RECOVERED VEHICLE'
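
The KeyError means ./data2/samples.csv contains a Category value ('RECOVERED VEHICLE') that was not in the label set saved at training time, so label_dict has no entry for it. A hedged workaround (a sketch, not the repo's code) is to drop rows with unseen labels before building y_ in load_test_data:

    # sketch of a guard inside predict.py's load_test_data(), before the label mapping
    known = df[select[1]].isin(set(label_dict))
    if not known.all():
        print('Dropping rows with labels unseen during training:',
              sorted(df.loc[~known, select[1]].unique()))
        df = df[known]
    y_ = df[select[1]].apply(lambda x: label_dict[x]).tolist()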

webserver for quick demo

Hey,

Hope you are all well !

Is it possible to have a demo web server that predicts classes from the trained model in the browser, for a single short text?

Cheers,
Richard
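
Nothing like this ships with the repo; a minimal Flask sketch of the idea is below. load_model and classify are hypothetical helpers that would wrap the graph and checkpoint restore logic already in predict.py:

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = load_model('./trained_results_1478563595/')   # hypothetical: restores session, graph, labels

    @app.route('/predict', methods=['POST'])
    def predict():
        text = request.get_json(force=True)['description']
        label, prob = classify(model, text)                # hypothetical wrapper around predict.py's logic
        return jsonify({'category': label, 'probability': float(prob)})

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)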

Training using TensorFlow 1.0 fails

I learned that you used TF 0.9 when you built this project. My TF version is 1.0. Some things are different, such as the location of rnn_cell.py. I've changed the source to adjust for the rnn_cell and tf.concat() problems. However, an IndexError: list index out of range exception is thrown when I run the code. Here is the traceback.

    Traceback (most recent call last):
      File "train.py", line 165, in <module>
        train_cnn_rnn()
      File "train.py", line 62, in train_cnn_rnn
        l2_reg_lambda=params['l2_reg_lambda'])
      File "/Users/jiechengwu/Downloads/mctccr/text_cnn_rnn.py", line 60, in __init__
        inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(1, reduced, pooled_concat)]
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1203, in split
        num = size_splits_shape.dims[0]
    IndexError: list index out of range

The exception happens in the tf.split() function. I'm new to NN, please do reply soon.

Thanks.

Training fails

Hi, I'm having this issue when I run training:

python3 train.py ./data/train.csv.zip ./training_config.json

CRITICAL:root:Accuracy on test set: 0.9971641706053186
Traceback (most recent call last):
File "train.py", line 161, in
train_cnn_rnn()
File "train.py", line 151, in train_cnn_rnn
os.rename(path, trained_dir + 'best_model.ckpt')
FileNotFoundError: [Errno 2] No such file or directory: './checkpoints_1486165230/model-2700' -> './trained_results_1486165230/best_model.ckpt'

I'll spend a bit of time tomorrow to see how to fix this problem.

Shape must be of rank 4 but is of rank 3

Hi,

While running train.py I am getting below error:

Shape must be rank 4 but is rank 3 for 'conv-maxpool-3/concat_2' (op: 'ConcatV2') with input shapes: [?,1,300,1], [?,548,1], [?,1,300,1], [].

This is happening in text_cnn_rnn.py at
conv = tf.nn.conv2d(emb_pad, W, strides=[1, 1, 1, 1], padding='VALID', name='conv')

Can anyone provide some help on this?

Can the data only be obtained from the zip?

Hi jiegzhan,
I'm working on using a C-LSTM for classification, but my task has only one label with binary classes, and my data is only in a CSV. How can I use the network on my data?
Thank you!

Training fails with small data set

I only have a small training data set (less than 20k). The training fails to learn, because it never completes a cycle of training.

What parameters could I change in the 'training_config.json' to resolve this issue?
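
A plausible cause, stated as an assumption about train.py's loop rather than a confirmed diagnosis: with fewer than 20k rows, batch_size 256 and num_epochs 1 give well under 100 training steps, so the evaluate_every = 100 checkpointing logic never fires and no best model is ever saved. A hedged set of values to try, keeping the rest of training_config.json unchanged:

    {
      "batch_size": 32,
      "evaluate_every": 25,
      "num_epochs": 20
    }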

Issue With Predictions

I have tried messing around with the code quite a bit but can't figure it out. How do I turn predict.py into more of an API that can be called? I am familiar with setting up APIs, but I have issues restoring the model.
