
slotgated-slu's Introduction

Slot-Gated Modeling for Joint Slot Filling and Intent Prediction

Reference

Main paper to be cited (Goo et al., 2018)

@inproceedings{goo2018slot,
  title={Slot-Gated Modeling for Joint Slot Filling and Intent Prediction},
    author={Chih-Wen Goo and Guang Gao and Yun-Kai Hsu and Chih-Li Huo and Tsung-Chieh Chen and Keng-Wei Hsu and Yun-Nung Chen},
    booktitle={Proceedings of The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
    year={2018}
}

Want to Reproduce the experiment?

Pass --dataset=atis or --dataset=snips to run on the ATIS or Snips (Coucke et al., 2018) dataset.

Where to Put My Dataset?

You need to put your dataset under ./data/ and use --dataset=foldername. For example, if your dataset is in ./data/mydata, pass --dataset=mydata.
Your dataset should be split into three folders - train, test, and valid - which are named 'train', 'test', and 'valid' by default in train.py. Each of these folders contains three files - the word sequences, the slot labels, and the intent labels - named 'seq.in', 'seq.out', and 'label' by default in train.py. For example, the full path to train/slot_label_file is './data/mydata/train/seq.out'.
Each line represents one example, and the slot labels should use the IOB format (a short example is shown below).
Vocabulary files will be generated automatically by utils.createVocabulary().
See ./data/atis for more detail.
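
For illustration, one training example could look like this across the three files (the utterance and labels below are made up, but follow the ATIS-style IOB convention):

     seq.in : show me flights from boston to denver
     seq.out: O O O O B-fromloc.city_name O B-toloc.city_name
     label  : atis_flight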

Requirements

tensorflow 1.4
python 3.5
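
For example, in a Python 3.5 environment (CPU build):

     pip install tensorflow==1.4.0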

Usage

Some sample usage:

  • run with 32 units, the ATIS dataset, and no patience for early stopping
     python3 train.py --num_units=32 --dataset=atis --patience=0

  • disable early stopping, use the Snips dataset, and use the intent-attention-only version
     python3 train.py --no_early_stop --dataset=snips --model_type=intent_only

  • use "python3 train.py -h" for all avaliable parameter settings

  • Note: --dataset must be provided. If you don't want to point it at a named dataset, pass --dataset='' instead.

slotgated-slu's People

Contributors

abc123475, yvchen


slotgated-slu's Issues

Reload saved model caused performance drop by more than 3 percent

Hi,

I tried to reload the whole model by adding

    saver.restore(sess, "model/*")

after sess.run(tf.global_variables_initializer()). I noticed that every time the model is restored, the performance is much lower. Could you have a try and explain the possible reasons?
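
(For reference, a common TF 1.x pattern is to restore the newest checkpoint explicitly; the snippet below assumes checkpoints were saved under ./model, as in this repo's default --model_path:)

     ckpt = tf.train.latest_checkpoint('./model')  # resolves to the newest checkpoint prefix, e.g. ./model/_step_..._epochs_....ckpt
     saver.restore(sess, ckpt)                     # overwrite the freshly initialized variables with the saved weights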

Better performance could be achieved with larger patience

Thanks for sharing the codes.
I ran the model on ATIS with a larger max-epoch number and patience value, and the result is much better than the 82.5% reported in the paper.
So I am writing to ask why you did not report the better result.

2018-06-21 10:37:47,961 : INFO : Step: 8680
2018-06-21 10:37:47,961 : INFO : Epochs: 31
2018-06-21 10:37:47,961 : INFO : Loss: 0.017756752894118627
2018-06-21 10:37:48,120 : INFO : Valid:
2018-06-21 10:37:48,325 : INFO : slot f1: 97.34306569343066
2018-06-21 10:37:48,325 : INFO : intent accuracy: 97.6
2018-06-21 10:37:48,325 : INFO : semantic error(intent, slots are all correct): 89.8
2018-06-21 10:37:48,326 : INFO : Test:
2018-06-21 10:37:48,634 : INFO : slot f1: 95.4809213996835
2018-06-21 10:37:48,634 : INFO : intent accuracy: 94.8488241881299
2018-06-21 10:37:48,634 : INFO : semantic error(intent, slots are all correct): 85.1063829787234

File in the folder "data"

Hi,
I read the paper and I think it is interesting work. Are you going to provide the files inside the data folder? If not, can you please specify what the data format should be in each of the files?

Thank you.

Does the model in the code match what is described in the paper?

In the first attention block, you get hidden features by convolving the state outputs with a filter AttnW; I believe that corresponds to the formula W h_k. BUT you also pass the state outputs through a linear layer to get y, then add y to the hidden features, apply a tanh, and multiply by a vector.

In the paper, I only see tanh(W h_k).

Also, in your code there is AttnV, for which I can't find a corresponding description in the paper.

The paper only has the gate v and the gate W.
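
(For reference, reading the slot-attention code quoted later on this page, the score it computes appears to be a standard additive-attention form, roughly

     e_{k,i} = v^T tanh(W h_i + U h_k + b)

where W is AttnW applied as a 1x1 convolution over every position i, U and b come from the _linear layer that produces y, and v is AttnV.)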

Could you kindly explain this?
I am really confused.
Thank you!

about the data

In data/snips/train/seq.in, line 72, the sample is "find playstation官方杂志 a song" (官方杂志 means "official magazine"). Why is there Chinese text in the dataset?

Clarification on reproducing paper result

I am trying to reproduce the experiment for my thesis and I am having a hard time getting the same numbers.

In your paper, you report: "In all experiments ... the numbers are averaged over 20 runs".
But based on the output file:

...
2020-03-09 19:52:52,220 : INFO : Epochs: 19
2020-03-09 19:52:52,220 : INFO : Loss: 0.02750968653033072
2020-03-09 19:52:53,346 : INFO : Valid:
2020-03-09 19:52:54,032 : INFO : slot f1: 97.74259747874524
2020-03-09 19:52:54,033 : INFO : intent accuracy: 96.6
2020-03-09 19:52:54,033 : INFO : semantic error(intent, slots are all correct): 89.60000000000001
2020-03-09 19:52:54,033 : INFO : Test:
2020-03-09 19:52:54,948 : INFO : slot f1: 95.40960451977402
2020-03-09 19:52:54,948 : INFO : intent accuracy: 95.63269876819709
2020-03-09 19:52:54,948 : INFO : semantic error(intent, slots are all correct): 84.5464725643897
2020-03-09 19:53:11,712 : INFO : Step: 5600
2020-03-09 19:53:11,714 : INFO : Epochs: 20
2020-03-09 19:53:11,714 : INFO : Loss: 0.027824909109663818
2020-03-09 19:53:12,100 : INFO : Valid:
2020-03-09 19:53:12,613 : INFO : slot f1: 97.48538011695906
2020-03-09 19:53:12,613 : INFO : intent accuracy: 97.39999999999999
2020-03-09 19:53:12,613 : INFO : semantic error(intent, slots are all correct): 89.8
2020-03-09 19:53:12,613 : INFO : Test:
2020-03-09 19:53:13,270 : INFO : slot f1: 95.26501766784452
2020-03-09 19:53:13,270 : INFO : intent accuracy: 95.40873460246361
2020-03-09 19:53:13,270 : INFO : semantic error(intent, slots are all correct): 83.87458006718926

I am confused about which set of numbers is representative for one run.
Since an early-stop strategy is applied, do I understand correctly that the representative result is the last output (marked in bold), which is then averaged over 20 runs?
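
(For illustration only: one common convention is, for each run, to take the test metrics at the epoch with the best validation score and then average those per-run numbers over the 20 runs. A hypothetical sketch, where epoch_logs stands for the parsed per-epoch numbers of a single run:)

     # epoch_logs: one dict per epoch, e.g. {'valid_sem_acc': 89.8, 'test_f1': 95.27, 'test_intent_acc': 95.41, 'test_sem_acc': 83.87}
     best = max(epoch_logs, key=lambda e: e['valid_sem_acc'])  # model selection on the validation set
     run_result = (best['test_f1'], best['test_intent_acc'], best['test_sem_acc'])
     # repeat over 20 runs with different seeds and average each test metric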

I would appreciate it if somebody could kindly clarify.

Thanks!

about the paper symbol

[image]

Hello, I have some questions from reading the paper.

[image]
For different values of i, I don't see how (W^S_he h_k) can produce different results; the term (W^S_he h_k) does not contain the symbol i.

Waiting for your reply!

add inference intime code

Hi, I have trained on my Chinese data using your code, and it tests well. Thanks for sharing.
[image]
I want to run inference in real time when user input comes in, so I added some code for real-time inference. Could you please check it? Thanks!
import os
import argparse
import logging
import sys
import tensorflow as tf
import numpy as np
import my_la  # the poster's own helper for loading pretrained word embeddings (see load_emb / find_emb below); not part of this repo
import time

# rnn_cell_impl._linear is used in the attention and projection layers below;
# with TensorFlow 1.4 (see Requirements) a single import is enough:
from tensorflow.python.ops import rnn_cell_impl

from utils import createVocabulary
from utils import loadVocabulary
from utils import computeF1Score
from utils import DataProcessor

parser = argparse.ArgumentParser(allow_abbrev=False)

#Network
parser.add_argument("--num_units", type=int, default=128, help="Network size.", dest='layer_size')
parser.add_argument("--model_type", type=str, default='full', help="""full(default) | intent_only
full: full attention model
intent_only: intent attention model""")

#Training Environment
parser.add_argument("--batch_size", type=int, default=32, help="Batch size.")
parser.add_argument("--max_epochs", type=int, default=30, help="Max epochs to train.")
parser.add_argument("--no_early_stop", action='store_false',dest='early_stop', help="Disable early stop, which is based on sentence level accuracy.")
parser.add_argument("--patience", type=int, default=8, help="Patience to wait before stop.")

#Model and Vocab
parser.add_argument("--dataset", type=str, default='aiways', help="""Type 'atis' or 'snips' to use dataset provided by us or enter what ever you named your own dataset.
Note, if you don't want to use this part, enter --dataset=''. It can not be None""")
parser.add_argument("--model_path", type=str, default='./model', help="Path to save model.")
parser.add_argument("--vocab_path", type=str, default='./vocab', help="Path to vocabulary files.")

#Data
parser.add_argument("--train_data_path", type=str, default='train', help="Path to training data files.")
parser.add_argument("--test_data_path", type=str, default='test', help="Path to testing data files.")
parser.add_argument("--valid_data_path", type=str, default='valid', help="Path to validation data files.")
parser.add_argument("--input_file", type=str, default='sql.in', help="Input file name.")
parser.add_argument("--slot_file", type=str, default='sql.out', help="Slot file name.")
parser.add_argument("--intent_file", type=str, default='label', help="Intent file name.")
parser.add_argument("--emb_dim", type=str, default=100, help="Intent file name.")
parser.add_argument("--pretrain_model", type=str, default='./model', help="Path to save model.")
parser.add_argument("--Is_Train",type=bool,default=False,help="whether is train")
parser.add_argument("--Is_intime",type=bool,default=True,help="whether predict intime")

arg=parser.parse_args()

#Print arguments
for k, v in sorted(vars(arg).items()):
    print(k, '=', v)
print()

if arg.model_type == 'full':
    add_final_state_to_intent = True
    remove_slot_attn = False
elif arg.model_type == 'intent_only':
    add_final_state_to_intent = True
    remove_slot_attn = True
else:
    print('unknown model type!')
    exit(1)

#full path to data will be: ./data + dataset + train/test/valid
if arg.dataset == None:
    print('name of dataset can not be None')
    exit(1)
elif arg.dataset == 'snips':
    print('use snips dataset')
elif arg.dataset == 'atis':
    print('use atis dataset')
else:
    print('use own dataset: ', arg.dataset)
full_train_path = os.path.join('./data',arg.dataset,arg.train_data_path)
full_test_path = os.path.join('./data',arg.dataset,arg.test_data_path)
full_valid_path = os.path.join('./data',arg.dataset,arg.valid_data_path)

createVocabulary(os.path.join(full_train_path, arg.input_file), os.path.join(arg.vocab_path, 'in_vocab'))
createVocabulary(os.path.join(full_train_path, arg.slot_file), os.path.join(arg.vocab_path, 'slot_vocab'))
createVocabulary(os.path.join(full_train_path, arg.intent_file), os.path.join(arg.vocab_path, 'intent_vocab'))

in_vocab = loadVocabulary(os.path.join(arg.vocab_path, 'in_vocab'))
slot_vocab = loadVocabulary(os.path.join(arg.vocab_path, 'slot_vocab'))
intent_vocab = loadVocabulary(os.path.join(arg.vocab_path, 'intent_vocab'))

def createModel(input_data, input_size, sequence_length, slot_size, intent_size, emb,layer_size = 128, isTraining = True):
cell_fw = tf.contrib.rnn.BasicLSTMCell(layer_size)
cell_bw = tf.contrib.rnn.BasicLSTMCell(layer_size)

if isTraining == True:
    cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob=0.5,
                                         output_keep_prob=0.5)
    cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob=0.5,
                                         output_keep_prob=0.5)

embedding = tf.get_variable('embedding', [input_size, arg.emb_dim])
tf.assign(embedding, emb)  # note: this assign op is never run; sess.run() it (or initialize the variable from emb directly) for the pretrained embeddings to take effect
inputs = tf.nn.embedding_lookup(embedding, input_data)

state_outputs, final_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=sequence_length, dtype=tf.float32)

final_state = tf.concat([final_state[0][0], final_state[0][1], final_state[1][0], final_state[1][1]], 1)
state_outputs = tf.concat([state_outputs[0], state_outputs[1]], 2)
state_shape = state_outputs.get_shape()

with tf.variable_scope('attention'):
    slot_inputs = state_outputs
    if remove_slot_attn == False:
        with tf.variable_scope('slot_attn'):
            attn_size = state_shape[2].value
            origin_shape = tf.shape(state_outputs)
            hidden = tf.expand_dims(state_outputs, 1)
            hidden_conv = tf.expand_dims(state_outputs, 2)
            # hidden shape = [batch, sentence length, 1, hidden size]
            k = tf.get_variable("AttnW", [1, 1, attn_size, attn_size])
            hidden_features = tf.nn.conv2d(hidden_conv, k, [1, 1, 1, 1], "SAME")
            hidden_features = tf.reshape(hidden_features, origin_shape)
            hidden_features = tf.expand_dims(hidden_features, 1)
            v = tf.get_variable("AttnV", [attn_size])

            slot_inputs_shape = tf.shape(slot_inputs)
            slot_inputs = tf.reshape(slot_inputs, [-1, attn_size])
            y = rnn_cell_impl._linear(slot_inputs, attn_size, True)
            y = tf.reshape(y, slot_inputs_shape)
            y = tf.expand_dims(y, 2)
            s = tf.reduce_sum(v * tf.tanh(hidden_features + y), [3])
            a = tf.nn.softmax(s)
            # a shape = [batch, input size, sentence length, 1]
            a = tf.expand_dims(a, -1)
            slot_d = tf.reduce_sum(a * hidden, [2])
    else:
        attn_size = state_shape[2].value
        slot_inputs = tf.reshape(slot_inputs, [-1, attn_size])

    intent_input = final_state
    with tf.variable_scope('intent_attn'):
        attn_size = state_shape[2].value
        hidden = tf.expand_dims(state_outputs, 2)
        k = tf.get_variable("AttnW", [1, 1, attn_size, attn_size])
        hidden_features = tf.nn.conv2d(hidden, k, [1, 1, 1, 1], "SAME")
        v = tf.get_variable("AttnV", [attn_size])

        y = rnn_cell_impl._linear(intent_input, attn_size, True)
        y = tf.reshape(y, [-1, 1, 1, attn_size])
        s = tf.reduce_sum(v*tf.tanh(hidden_features + y), [2,3])
        a = tf.nn.softmax(s)
        a = tf.expand_dims(a, -1)
        a = tf.expand_dims(a, -1)
        d = tf.reduce_sum(a * hidden, [1, 2])

        if add_final_state_to_intent == True:
            intent_output = tf.concat([d, intent_input], 1)
        else:
            intent_output = d

    with tf.variable_scope('slot_gated'):
        intent_gate = rnn_cell_impl._linear(intent_output, attn_size, True)
        intent_gate = tf.reshape(intent_gate, [-1, 1, intent_gate.get_shape()[1].value])
        v1 = tf.get_variable("gateV", [attn_size])
        if remove_slot_attn == False:
            slot_gate = v1 * tf.tanh(slot_d + intent_gate)
        else:
            slot_gate = v1 * tf.tanh(state_outputs + intent_gate)
        slot_gate = tf.reduce_sum(slot_gate, [2])
        slot_gate = tf.expand_dims(slot_gate, -1)
        if remove_slot_attn == False:
            slot_gate = slot_d * slot_gate
        else:
            slot_gate = state_outputs * slot_gate
        slot_gate = tf.reshape(slot_gate, [-1, attn_size])
        slot_output = tf.concat([slot_gate, slot_inputs], 1)

with tf.variable_scope('intent_proj'):
    intent = rnn_cell_impl._linear(intent_output, intent_size, True)

with tf.variable_scope('slot_proj'):
    slot = rnn_cell_impl._linear(slot_output, slot_size, True)

outputs = [slot, intent]
return outputs

def valid(in_path, slot_path, intent_path, sg_sess):
data_processor_valid = DataProcessor(in_path, slot_path, intent_path, in_vocab, slot_vocab, intent_vocab)

pred_intents = []
correct_intents = []
slot_outputs = []
correct_slots = []
input_words = []

# used to gate
gate_seq = []
while True:
    in_data, slot_data, slot_weight, length, intents, in_seq, slot_seq, intent_seq = data_processor_valid.get_batch(
        arg.batch_size)
    feed_dict = {input_data.name: in_data, sequence_length.name: length}
    ret = sg_sess.run(inference_outputs, feed_dict)
    for i in ret[0]:
        pred_intents.append(np.argmax(i))
    for i in intents:
        correct_intents.append(i)

    pred_slots = ret[1].reshape((slot_data.shape[0], slot_data.shape[1], -1))
    for p, t, i, l in zip(pred_slots, slot_data, in_data, length):
        p = np.argmax(p, 1)
        tmp_pred = []
        tmp_correct = []
        tmp_input = []
        for j in range(l):
            tmp_pred.append(slot_vocab['rev'][p[j]])
            tmp_correct.append(slot_vocab['rev'][t[j]])
            tmp_input.append(in_vocab['rev'][i[j]])

        slot_outputs.append(tmp_pred)
        correct_slots.append(tmp_correct)
        input_words.append(tmp_input)

    if data_processor_valid.end == 1:
        break

pred_intents = np.array(pred_intents)
correct_intents = np.array(correct_intents)
accuracy = (pred_intents == correct_intents)
semantic_error = accuracy
accuracy = accuracy.astype(float)
accuracy = np.mean(accuracy) * 100.0

index = 0
for t, p in zip(correct_slots, slot_outputs):
    # Process Semantic Error
    if len(t) != len(p):
        raise ValueError('Error!!')

    for j in range(len(t)):
        if p[j] != t[j]:
            semantic_error[index] = False
            break
    index += 1
semantic_error = semantic_error.astype(float)
semantic_error = np.mean(semantic_error) * 100.0

f1, precision, recall = computeF1Score(correct_slots, slot_outputs)
logging.info('slot f1: ' + str(f1))
logging.info('intent accuracy: ' + str(accuracy))
logging.info('semantic error(intent, slots are all correct): ' + str(semantic_error))

data_processor_valid.close()
return f1, accuracy, semantic_error, pred_intents, correct_intents, slot_outputs, correct_slots, input_words, gate_seq

def inference_intime(in_path, slot_path, intent_path, sg_sess, input_sqr):
data_processor_valid = DataProcessor(in_path, slot_path, intent_path, in_vocab, slot_vocab, intent_vocab,is_intime=True)

pred_intents = []
correct_intents = []
slot_outputs = []
correct_slots = []
input_words = []

# used to gate
gate_seq = []

in_data,length = data_processor_valid.get_input(input_sqr)

feed_dict = {input_data.name: in_data, sequence_length.name: length}
ret = sg_sess.run(inference_outputs, feed_dict)
for i in ret[0]:
    pred_intents.append(np.argmax(i))

pred_slots = ret[1].reshape((1, 20, -1))
for p,i, l in zip(pred_slots, in_data, length):
    p = np.argmax(p, 1)
    tmp_pred = []
    tmp_input = []
    for j in range(l):
        tmp_pred.append(slot_vocab['rev'][p[j]])
        tmp_input.append(in_vocab['rev'][i[j]])

    slot_outputs.append(tmp_pred)
    input_words.append(tmp_input)


pred_intents =[intent_vocab['rev'][x] for x in pred_intents]

data_processor_valid.close()

return pred_intents,slot_outputs

# Create Training Model

input_data = tf.placeholder(tf.int32, [None, None], name='inputs')
sequence_length = tf.placeholder(tf.int32, [None], name="sequence_length")
global_step = tf.Variable(0, trainable=False, name='global_step')
slots = tf.placeholder(tf.int32, [None, None], name='slots')
slot_weights = tf.placeholder(tf.float32, [None, None], name='slot_weights')
intent = tf.placeholder(tf.int32, [None], name='intent')

char_list,emb_list = my_la.load_emb('wiki_100.utf8')
d = loadVocabulary('vocab/in_vocab')
emb = my_la.find_emb((char_list,emb_list),d)

with tf.variable_scope('model'):
    # modification: pass the pretrained embeddings (emb) into the model
    training_outputs = createModel(input_data, len(in_vocab['vocab']), sequence_length, len(slot_vocab['vocab']), len(intent_vocab['vocab']), emb, layer_size=arg.emb_dim)

slots_shape = tf.shape(slots)
slots_reshape = tf.reshape(slots, [-1])

slot_outputs = training_outputs[0]
with tf.variable_scope('slot_loss'):
    crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=slots_reshape, logits=slot_outputs)
    crossent = tf.reshape(crossent, slots_shape)
    slot_loss = tf.reduce_sum(crossent*slot_weights, 1)
    total_size = tf.reduce_sum(slot_weights, 1)
    total_size += 1e-12
    slot_loss = slot_loss / total_size

intent_output = training_outputs[1]
with tf.variable_scope('intent_loss'):
    crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=intent, logits=intent_output)
    intent_loss = tf.reduce_sum(crossent) / tf.cast(arg.batch_size, tf.float32)

params = tf.trainable_variables()
opt = tf.train.AdamOptimizer()

intent_params = []
slot_params = []
for p in params:
    if not 'slot_' in p.name:
        intent_params.append(p)
    if 'slot_' in p.name or 'bidirectional_rnn' in p.name or 'embedding' in p.name:
        slot_params.append(p)

gradients_slot = tf.gradients(slot_loss, slot_params)
gradients_intent = tf.gradients(intent_loss, intent_params)

clipped_gradients_slot, norm_slot = tf.clip_by_global_norm(gradients_slot, 5.0)
clipped_gradients_intent, norm_intent = tf.clip_by_global_norm(gradients_intent, 5.0)

gradient_norm_slot = norm_slot
gradient_norm_intent = norm_intent
update_slot = opt.apply_gradients(zip(clipped_gradients_slot, slot_params))
update_intent = opt.apply_gradients(zip(clipped_gradients_intent, intent_params), global_step=global_step)

training_outputs = [global_step, slot_loss, update_intent, update_slot, gradient_norm_intent, gradient_norm_slot]
inputs = [input_data, sequence_length, slots, slot_weights, intent]

# Create Inference Model

with tf.variable_scope('model', reuse=True):
    inference_outputs = createModel(input_data, len(in_vocab['vocab']), sequence_length, len(slot_vocab['vocab']), len(intent_vocab['vocab']), emb, layer_size=arg.emb_dim, isTraining=False)

inference_slot_output = tf.nn.softmax(inference_outputs[0], name='slot_output')
inference_intent_output = tf.nn.softmax(inference_outputs[1], name='intent_output')

inference_outputs = [inference_intent_output, inference_slot_output]
inference_inputs = [input_data, sequence_length]

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

saver = tf.train.Saver()

# Start Training

with tf.Session() as sess:

if not os.path.exists(arg.pretrain_model):
    os.makedirs(arg.pretrain_model)
    sess.run(tf.global_variables_initializer())
else:
    print("加载预训练模型")
    saver.restore(sess, tf.train.latest_checkpoint(arg.pretrain_model))

epochs = 0
loss = 0.0
data_processor = None
line = 0
num_loss = 0
step = 0
no_improve = 0

#variables to store highest values among epochs, only use 'valid_err' for now
valid_slot = 0
test_slot = 0
valid_intent = 0
test_intent = 0
valid_err = 0
test_err = 0

if arg.Is_Train:

    logging.info('Training Start')
    while True:
        if data_processor == None:
            data_processor = DataProcessor(os.path.join(full_train_path, arg.input_file), os.path.join(full_train_path, arg.slot_file), os.path.join(full_train_path, arg.intent_file), in_vocab, slot_vocab, intent_vocab)
        in_data, slot_data, slot_weight, length, intents,_,_,_ = data_processor.get_batch(arg.batch_size)
        feed_dict = {input_data.name: in_data, slots.name: slot_data, slot_weights.name: slot_weight, sequence_length.name: length, intent.name: intents}
        ret = sess.run(training_outputs, feed_dict)
        loss += np.mean(ret[1])

        line += arg.batch_size
        step = ret[0]
        num_loss += 1

        if data_processor.end == 1:
            line = 0
            data_processor.close()
            data_processor = None
            epochs += 1
            logging.info('Step: ' + str(step))
            logging.info('Epochs: ' + str(epochs))
            logging.info('Loss: ' + str(loss/num_loss))
            num_loss = 0
            loss = 0.0

            save_path = os.path.join(arg.model_path,'_step_' + str(step) + '_epochs_' + str(epochs) + '.ckpt')
            saver.save(sess, save_path)

            logging.info('Valid:')
            epoch_valid_slot, epoch_valid_intent, epoch_valid_err,valid_pred_intent,valid_correct_intent,valid_pred_slot,valid_correct_slot,valid_words,valid_gate = valid(os.path.join(full_valid_path, arg.input_file), os.path.join(full_valid_path, arg.slot_file), os.path.join(full_valid_path, arg.intent_file),sess)

            logging.info('Test:')
            epoch_test_slot, epoch_test_intent, epoch_test_err,test_pred_intent,test_correct_intent,test_pred_slot,test_correct_slot,test_words,test_gate = valid(os.path.join(full_test_path, arg.input_file), os.path.join(full_test_path, arg.slot_file), os.path.join(full_test_path, arg.intent_file),sess)

            if epoch_valid_err <= valid_err:
                no_improve += 1
            else:
                valid_err = epoch_valid_err
                no_improve = 0

            if epochs == arg.max_epochs:
                break

            if arg.early_stop == True:
                if no_improve > arg.patience:
                    break
else:
    logging.info('test start')
    if arg.Is_intime:

        while True:
            input_sqr = input("Enter the sentence to query: ")
            start = time.time()
            intent,slot = inference_intime( os.path.join(full_valid_path, arg.input_file), os.path.join(full_valid_path, arg.slot_file),
                os.path.join(full_valid_path, arg.intent_file),sess,input_sqr)
            end = time.time()
            print("耗时={}".format(end-start))
            print("intent={}\nslot={}".format(intent,slot))

    else:
        epoch_valid_slot, epoch_valid_intent, epoch_valid_err, valid_pred_intent, valid_correct_intent, valid_pred_slot, valid_correct_slot, valid_words, valid_gate = \
            valid(
            os.path.join(full_valid_path, arg.input_file), os.path.join(full_valid_path, arg.slot_file),
            os.path.join(full_valid_path, arg.intent_file),sess)

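For reference, assuming this script is saved as, say, inference.py (a hypothetical name) next to utils.py, the poster's my_la.py, and the wiki_100.utf8 embedding file, real-time inference would run with the defaults Is_Train=False and Is_intime=True, e.g.:

     python3 inference.py --dataset=mydata --pretrain_model=./model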
