locuslab / tcn
Sequence modeling benchmarks and temporal convolutional networks
Home Page: https://github.com/locuslab/TCN
License: MIT License
I have a dataset where the texts are very short, and it's a multi-label task with 100+ classes; most texts are only 2-4 tokens long. I tried a TCN, but the result is not good (worse than a plain CNN). My number of levels is 9 and the kernel size is 2. Can you give me some suggestions?
Hello,
When I am running pmnist_test.py, I get the following error:
(Pdb) train_loss.data[0]
*** IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
It happens on line 98.
I am looking for an LSTM replacement that can easily be deployed with ONNX. As there are many issues with LSTMs in ONNX, I was thinking about using TCNs. Now, it turns out that the PyTorch ONNX export module cannot find a suitable op for the weight_norm operation. My question is:
Hi, thanks for your code. This is not an issue, just my confusion, sorry to put it here:
I want to use a TCN for speech recognition, and I have some speech labeled with words, but the recordings do not all have the same length; for example, speech 1 is 10 seconds long and speech 2 is 5 seconds. How can I use a TCN for speech recognition, and how can I handle the variable-length input?
Thanks, looking forward to your reply.
Hey guys,
Thanks very much for sharing the code for the paper.
I tried running add_test.py with no change to the default values and the loss does not converge. However, for some other random seeds it will converge.
OS: Ubuntu 16.04
PyTorch version: 1.0.0
Output:
Namespace(batch_size=32, clip=-1, cuda=True, dropout=0.0, epochs=10, ksize=7, levels=8, log_interval=100, lr=0.004, nhid=30, optim='Adam', seed=1111, seq_len=400)
Producing data...
Train Epoch: 1 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.531405
Train Epoch: 1 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.166126
Train Epoch: 1 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.171005
Train Epoch: 1 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.170864
Train Epoch: 1 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.166217
Train Epoch: 1 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.172447
Train Epoch: 1 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.166411
Train Epoch: 1 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.167221
Train Epoch: 1 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.166980
Train Epoch: 1 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.170149
Train Epoch: 1 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.167781
Train Epoch: 1 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.173033
Train Epoch: 1 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.167806
Train Epoch: 1 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.176322
Train Epoch: 1 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.174221
Test set: Average loss: 0.162485
Train Epoch: 2 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.164098
Train Epoch: 2 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.165515
Train Epoch: 2 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.169491
Train Epoch: 2 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.170406
Train Epoch: 2 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.164345
Train Epoch: 2 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.171381
Train Epoch: 2 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.165580
Train Epoch: 2 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.167373
Train Epoch: 2 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.165166
Train Epoch: 2 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.169122
Train Epoch: 2 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.167244
Train Epoch: 2 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.172299
Train Epoch: 2 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.166954
Train Epoch: 2 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.175234
Train Epoch: 2 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.174419
Test set: Average loss: 0.159353
Train Epoch: 3 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.162857
Train Epoch: 3 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.165053
Train Epoch: 3 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168938
Train Epoch: 3 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.170395
Train Epoch: 3 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.163797
Train Epoch: 3 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.171040
Train Epoch: 3 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164996
Train Epoch: 3 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.167644
Train Epoch: 3 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164682
Train Epoch: 3 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168531
Train Epoch: 3 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.167023
Train Epoch: 3 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.172070
Train Epoch: 3 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.165484
Train Epoch: 3 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.174374
Train Epoch: 3 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.174143
Test set: Average loss: 0.159425
Train Epoch: 4 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.162641
Train Epoch: 4 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.164765
Train Epoch: 4 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168638
Train Epoch: 4 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.170502
Train Epoch: 4 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.163475
Train Epoch: 4 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.170376
Train Epoch: 4 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164857
Train Epoch: 4 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.167976
Train Epoch: 4 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164582
Train Epoch: 4 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168683
Train Epoch: 4 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.166684
Train Epoch: 4 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.171560
Train Epoch: 4 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.165219
Train Epoch: 4 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.174026
Train Epoch: 4 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.173639
Test set: Average loss: 0.159309
Train Epoch: 5 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.162744
Train Epoch: 5 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.164428
Train Epoch: 5 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168446
Train Epoch: 5 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.170423
Train Epoch: 5 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.163086
Train Epoch: 5 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.170119
Train Epoch: 5 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164834
Train Epoch: 5 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.167641
Train Epoch: 5 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164558
Train Epoch: 5 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168754
Train Epoch: 5 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.166589
Train Epoch: 5 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.171319
Train Epoch: 5 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.165145
Train Epoch: 5 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.173861
Train Epoch: 5 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.173357
Test set: Average loss: 0.159252
Train Epoch: 6 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.163019
Train Epoch: 6 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.164251
Train Epoch: 6 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168384
Train Epoch: 6 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.170225
Train Epoch: 6 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.162895
Train Epoch: 6 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.170042
Train Epoch: 6 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164821
Train Epoch: 6 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.167372
Train Epoch: 6 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164525
Train Epoch: 6 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168579
Train Epoch: 6 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.166480
Train Epoch: 6 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.171241
Train Epoch: 6 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.165065
Train Epoch: 6 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.173829
Train Epoch: 6 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.173157
Test set: Average loss: 0.159345
Train Epoch: 7 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.163180
Train Epoch: 7 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.164179
Train Epoch: 7 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168342
Train Epoch: 7 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.169995
Train Epoch: 7 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.162740
Train Epoch: 7 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.169979
Train Epoch: 7 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164783
Train Epoch: 7 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.167070
Train Epoch: 7 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164500
Train Epoch: 7 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168542
Train Epoch: 7 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.166456
Train Epoch: 7 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.171119
Train Epoch: 7 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.164998
Train Epoch: 7 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.173744
Train Epoch: 7 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.172998
Test set: Average loss: 0.159525
Train Epoch: 8 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.163299
Train Epoch: 8 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.164069
Train Epoch: 8 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168318
Train Epoch: 8 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.169810
Train Epoch: 8 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.162647
Train Epoch: 8 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.169944
Train Epoch: 8 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164746
Train Epoch: 8 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.166869
Train Epoch: 8 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164391
Train Epoch: 8 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168394
Train Epoch: 8 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.166326
Train Epoch: 8 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.171027
Train Epoch: 8 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.164934
Train Epoch: 8 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.173725
Train Epoch: 8 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.172872
Test set: Average loss: 0.159705
Train Epoch: 9 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.163319
Train Epoch: 9 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.164034
Train Epoch: 9 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168286
Train Epoch: 9 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.169652
Train Epoch: 9 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.162581
Train Epoch: 9 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.169919
Train Epoch: 9 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164722
Train Epoch: 9 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.166746
Train Epoch: 9 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164353
Train Epoch: 9 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168298
Train Epoch: 9 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.166287
Train Epoch: 9 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.170995
Train Epoch: 9 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.164889
Train Epoch: 9 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.173729
Train Epoch: 9 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.172800
Test set: Average loss: 0.159804
Train Epoch: 10 [ 3168/ 50000 (6%)] Learning rate: 0.0040 Loss: 0.163326
Train Epoch: 10 [ 6368/ 50000 (13%)] Learning rate: 0.0040 Loss: 0.164024
Train Epoch: 10 [ 9568/ 50000 (19%)] Learning rate: 0.0040 Loss: 0.168285
Train Epoch: 10 [ 12768/ 50000 (26%)] Learning rate: 0.0040 Loss: 0.169597
Train Epoch: 10 [ 15968/ 50000 (32%)] Learning rate: 0.0040 Loss: 0.162569
Train Epoch: 10 [ 19168/ 50000 (38%)] Learning rate: 0.0040 Loss: 0.169920
Train Epoch: 10 [ 22368/ 50000 (45%)] Learning rate: 0.0040 Loss: 0.164698
Train Epoch: 10 [ 25568/ 50000 (51%)] Learning rate: 0.0040 Loss: 0.166688
Train Epoch: 10 [ 28768/ 50000 (58%)] Learning rate: 0.0040 Loss: 0.164336
Train Epoch: 10 [ 31968/ 50000 (64%)] Learning rate: 0.0040 Loss: 0.168244
Train Epoch: 10 [ 35168/ 50000 (70%)] Learning rate: 0.0040 Loss: 0.166240
Train Epoch: 10 [ 38368/ 50000 (77%)] Learning rate: 0.0040 Loss: 0.170987
Train Epoch: 10 [ 41568/ 50000 (83%)] Learning rate: 0.0040 Loss: 0.164852
Train Epoch: 10 [ 44768/ 50000 (90%)] Learning rate: 0.0040 Loss: 0.173748
Train Epoch: 10 [ 47968/ 50000 (96%)] Learning rate: 0.0040 Loss: 0.172743
Test set: Average loss: 0.159894
I'd like to ask how the MIDI files of the Nottingham dataset are converted to discrete sequences: by time or by frames?
How can we use the TCN as an encoder? How can we use the network to produce a single vector representation of the input like the context vector for RNNs? Do we take the last timestep of the output like here?
Line 15 in 2221de3
Hello,
Thank you for your great paper and sharing!
I'm wondering how to use a TCN to solve time series regression problems. In my scenario, the data at each time step contains multiple variables and each variable is a real number. For example, the data for time step 0 is something like "vector_0 = <0.1, 0.2, 0.3, ...>", and I want to use the last k vectors to predict the next vector.
I have developed an LSTM model for this problem. The input shape of the LSTM model is (batch_size, time_steps(k), input_size(length of each vector)), and the prediction is the last output of the LSTM. Then I can calculate the MSE loss and call backward. How can I use a TCN to solve this problem?
Best Regards
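For reference, a minimal sketch of one way to set this up, assuming the repo's TemporalConvNet (TCN/tcn.py) is importable and keeping the same (batch, time_steps, input_size) layout used for the LSTM model; the wrapper class and all sizes below are made up for illustration:

import torch
import torch.nn as nn
from TCN.tcn import TemporalConvNet  # the repo's generic TCN module

class TCNRegressor(nn.Module):
    """Hypothetical wrapper: predict the next vector from the last k vectors."""
    def __init__(self, input_size, num_channels, kernel_size=3, dropout=0.2):
        super().__init__()
        self.tcn = TemporalConvNet(input_size, num_channels,
                                   kernel_size=kernel_size, dropout=dropout)
        self.linear = nn.Linear(num_channels[-1], input_size)

    def forward(self, x):
        # x: (batch, time_steps, input_size), same layout as the LSTM model
        y = self.tcn(x.transpose(1, 2))   # TCN expects (batch, channels, time)
        return self.linear(y[:, :, -1])   # last time step -> next-vector prediction

model = TCNRegressor(input_size=8, num_channels=[32, 32, 32])
x = torch.randn(16, 50, 8)                # 16 windows of k=50 historical vectors
target = torch.randn(16, 8)               # the vector at the next time step
loss = nn.MSELoss()(model(x), target)
loss.backward()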
Hi. How do I use a CNN for seq2seq? I understand that a CNN can be used as an encoder. What about the decoder? I looked through the TensorFlow Transformer, where you iteratively generate one symbol at a time by feeding <start symbol, none, none, ...>, then <start symbol, 1st generated symbol, none, none, ...> and so on into the decoder, and use a mask to avoid backward information flow. Is that the same approach here?
Nice work! I'm researching time series regression using machine learning so I'm looking at LSTM, TCN and Transformers based models and getting good results with your model.
One general question: I'm not sure I understand why we pad each layer of a TCN at all. I understand that it ensures each layer produces a sequence of the same length, so there's a benefit in that your predictions are aligned with your inputs. But it's very similar to initialising an AR(p) model with a vector of zeros when you predict forward: the initial predictions will all be "wrong" until the effect of the initial state has decayed out. LSTMs also have this issue; most applications seem to set the initial state per batch to zero, which results in transient errors at the start of the batch (some authors train a separate model to estimate the initial state, which I've had good success with). I would assume this impacts training as well, and it seems to make sense to mask out the start of the output sequence when calculating the loss, or the model may try to adapt to "fix" the impact of the wrong initial condition.
Certainly when I train a regression-based TCN I can observe transient errors at the start of the prediction; i.e. the diagram below underpredicts for the first 96 samples (that's 1 day of 15-minute electricity consumption) then overpredicts for the first week before settling down. Interested in your thoughts.
Also, one general observation: the predictions from the TCN seem noisier than the LSTM's; I thought the long AR window might filter out more noise than it has. It's also quite sensitive to the learning rate: a low learning rate produces a very noisy output sequence.
Hi,
Thanks for this great paper. I am trying to use this architecture in an auto-encoder setting such that the encoder is a stack of strided, dilated, causal conv layers, and I am now thinking about the decoder part.
In terms of up-sampling using transposed convolutions, does the same intuition apply in order to have causal up-sampling (i.e. to exclude reconstruction of the future part)? Or should we generate sample-by-sample without transposed conv layers?
With many thanks in advance
Best Regards
Is it just the test result in the last epoch using the default parameters? I have tried running add_test.py and below are the results I get for the 10 epochs.
Test set: Average loss: 0.168699
Test set: Average loss: 0.001142
Test set: Average loss: 0.000922
Test set: Average loss: 0.000345
Test set: Average loss: 0.000143
Test set: Average loss: 0.000188
Test set: Average loss: 0.000121
Test set: Average loss: 0.000028
Test set: Average loss: 0.000244
Test set: Average loss: 0.000042
Which one should I use for benchmarking? In the paper, the result of TCN was 5.8e-5 but it seems like we can use 2.8e-5 or 4.2e-5 here.
It seems to me that LSTM is faster when the sequence length is short (say 28).
When the sequence length is long (say 784), LSTM will be much slower than TCN.
It seems to me for TCN, the computation time is independent of the sequence length.
Am I correct?
Hi,
I have a few questions about strategies for tuning the hyperparameters of a TCN. Like any other neural network architecture, the performance of a TCN is sensitive to hyperparameter values whose near-ideal settings vary (sometimes significantly) from task to task. It is easy to reason about tuning some hyperparameters, like kernel size and number of levels, but not others. Hence my questions are:
Thanks!!
Thanks for sharing. I'm confused about the TCN model for mnist_pixel. In the MNIST model, why is it OK to use just y1[:, :, -1] before the linear layer? Why discard the other time steps?
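For reference, a rough way to see why the last time step suffices (a sketch assuming the repo's TemporalConvNet is importable; the gradient probe below is illustrative, not code from the repo): with 8 levels and kernel size 7 the receptive field exceeds 784, so the feature at the last position already depends on every earlier pixel and acts as a summary of the whole sequence, much like the final hidden state of an RNN.

import torch
from TCN.tcn import TemporalConvNet  # assumed importable from the repo root

tcn = TemporalConvNet(num_inputs=1, num_channels=[25] * 8, kernel_size=7)
tcn.eval()                                        # disable dropout for the probe
x = torch.randn(4, 1, 784, requires_grad=True)    # (batch, channels, seq_len)
y = tcn(x)                                        # (batch, 25, 784)
y[:, :, -1].sum().backward()                      # probe the last time step only
print((x.grad.abs().sum(dim=1) > 0).sum(dim=1))   # ~784: it "sees" the whole image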
Hey, I was just looking at Figure 1(b) in the paper showing the residual block, which only applies an optional 1x1 conv to the identity connection before adding the result to the output of the convolutional chain.
But in the code, I see that you apply a ReLU after adding the two outputs. Is that a mistake? I can't see it in Figure 1(b).
See here:
https://github.com/locuslab/TCN/blob/master/TCN/tcn.py#L45
This is a great set of experiments! I'm wondering if the code for the RNN/LSTM baselines reported in the paper are available somewhere. At present, I only see code for the TCN model.
Thanks!
I was wondering if the repository in its current state has code to build the residual blocks shown in the paper (Figure 1 (b)).
I'm trying to use your paper's insights to build CNN architectures for sequence modelling using Keras and was a bit confused about implementing the shown residual block of two conv layers followed by a residual connection.
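For reference, the repo does contain this block: TemporalBlock in TCN/tcn.py implements Figure 1(b). A condensed PyTorch sketch of the same structure (not a drop-in copy of the repo's class, and the class name here is made up) that could be translated to Keras:

import torch.nn as nn
from torch.nn.utils import weight_norm

class ResidualBlock(nn.Module):
    """Two dilated causal convs plus an optional 1x1 conv on the skip path (Fig. 1(b)).
    Assumes kernel_size > 1 so the causal chomp below is non-empty."""
    def __init__(self, n_in, n_out, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left context needed for causality
        self.conv1 = weight_norm(nn.Conv1d(n_in, n_out, kernel_size,
                                           padding=self.pad, dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(n_out, n_out, kernel_size,
                                           padding=self.pad, dilation=dilation))
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        # 1x1 conv only when the channel counts differ, as in the paper
        self.downsample = nn.Conv1d(n_in, n_out, 1) if n_in != n_out else None

    def forward(self, x):                          # x: (batch, n_in, length)
        out = self.drop(self.relu(self.conv1(x)[:, :, :-self.pad]))   # chomp -> causal
        out = self.drop(self.relu(self.conv2(out)[:, :, :-self.pad]))
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)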
I don't get why at test time (or when evaluating the model on a validation set) we don't compute the loss on the whole sequence, rather than only on the part of the sequence that has sufficient history.
The model is not evaluated on the whole dataset but only on a sub-part; are the results reliable, or even comparable to other models (LSTM, etc.) that don't use this method?
Is Figure 4(b) of the paper for T=1000 or T=2000? If it's for T=1000, then shouldn't the random-guess baseline loss be 0.02?
Hi,
I am a student. When I run your code, I have a question. Your MNIST pixel task works very well, and that dataset is very standard. When I have my own dataset, what changes should I make to your code, and how should I make them?
Best Regards
I'm sorry for asking what I expect has a really obvious answer. I downloaded the zip file and I get the following results when I try add_test.py
(base) F:\PersystCode\Python\TCN-master\TCN\adding_problem>python add_test.py
Traceback (most recent call last):
File "add_test.py", line 5, in
from TCN.adding_problem.model import TCN
ModuleNotFoundError: No module named 'TCN'
Clearly I need to install the TCN module. I tried >python tcn.py install, but that didn't seem to do anything.
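For reference, one likely cause (an assumption, not confirmed by the maintainers): add_test.py imports the TCN package relative to the repository root, so nothing needs to be installed; Python just needs the directory that contains the top-level TCN/ folder on its path, e.g. by running from F:\PersystCode\Python\TCN-master or by adding it to sys.path first:

# Illustrative workaround only; adjust the path to your own checkout.
import sys
sys.path.insert(0, r"F:\PersystCode\Python\TCN-master")   # directory containing TCN/
from TCN.adding_problem.model import TCN                  # should now resolve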
Hi! I'm learning your TCN architecture and I got stuck trying to understand the following part.
In the code:
https://github.com/locuslab/TCN/blob/master/TCN/tcn.py#L30-L31
You sequentially add a conv layer and a chomp1d layer. I see that chomp1d cuts off the last padding elements (which are redundant, as I understand it).
I assumed that chomp1d is the 1x1 conv described in the paper, but the code doesn't look like what the paper describes, because:
My question is: is the 1x1 convolution from Fig. 1(b) the same as chomp1d? If yes, why is it different from the scheme in the paper, and does that matter conceptually? If not, did you implement it somewhere, or is it worth trying?
I haven't worked with neural networks much, so I don't know some common things and am trying to figure them out.
Thank you in advance!
Oleh
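For reference, Chomp1d in TCN/tcn.py is just a slice that removes the extra elements added by the symmetric padding, which is what makes each convolution causal; it is not the 1x1 convolution from Figure 1(b) (that one sits on the residual path and only appears when the input and output channel counts differ). Roughly:

import torch.nn as nn

class Chomp1d(nn.Module):
    """Drop the last `chomp_size` time steps so the preceding conv stays causal
    (mirrors the class in TCN/tcn.py; unrelated to the 1x1 conv on the skip path)."""
    def __init__(self, chomp_size):
        super().__init__()
        self.chomp_size = chomp_size

    def forward(self, x):                # x: (batch, channels, length + chomp_size)
        return x[:, :, :-self.chomp_size].contiguous()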
To encourage exploration of TCN networks, this code needs a license, for example Mozilla Public License 2.0, that allows commercial and non-commercial entities to build upon this work.
Without such an explicit license, this code can not be used and built upon by other entities without exposing them to legal risk.
I want to learn how to process text data using tf.keras and a TCN.
My sequences have very different lengths: the shortest is about 100 words and the longest about 5000. I tried zero-padding everything to the same length of 5000, but the classification result is terrible. If I instead keep the original lengths and use a batch size of 1, it works well. I don't know why this happens.
If I want to use a TCN with a long step length, how do I calculate the receptive field to guide the num_channels design?
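For reference, a back-of-the-envelope calculation for the architecture in this repo (two convolutions per level, dilation doubling at each level): with kernel size k and L levels the receptive field is 1 + 2(k-1)(2^L - 1). A small helper (illustrative, not from the repo) for sizing the number of levels, and hence the length of num_channels:

def tcn_receptive_field(kernel_size: int, levels: int) -> int:
    """Receptive field of the repo-style TCN: two convs per level, dilation 2**i at level i.
    Each conv at dilation d extends the field by (kernel_size - 1) * d."""
    return 1 + 2 * (kernel_size - 1) * (2 ** levels - 1)

# Example: find how many levels cover a step length of ~1000 with kernel size 7.
for levels in range(1, 12):
    if tcn_receptive_field(kernel_size=7, levels=levels) >= 1000:
        print(levels, tcn_receptive_field(7, levels))   # 7 levels -> 1525
        break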
I haven't managed to find the datasets referenced in the utils inside the poly_music example.
The observations package provides only a jsb_chorales dataset, and it is not pre-processed for this code; I have tried to adapt it but without success. Can you provide a link to download the pre-processed .mat data files?
Hi,
In which part of the code do you make sure the network is causal? Is it Chomp1d?
Thanks a lot!
Amir
I notice that in your code you initialize all the weights as below. Is there any special reason for doing it this way? Why not initialize with Xavier or another method?
def init_weights(self):
    self.conv1.weight.data.normal_(0, 0.01)
    self.conv2.weight.data.normal_(0, 0.01)
    if self.downsample is not None:
        self.downsample.weight.data.normal_(0, 0.01)
How do I infer the next word/char with the word- or character-based TCN? Should I use get_batch like this:
data, targets = get_batch(data_source, i, args, seq_len=1, evaluation=False)
output = model(data)
Thank you.
So in order to make this architecture non-causal, assuming the kernel size is 3, I just need to remove the chomps and make the padding equal to the dilation, right?
Lines 20-22 make the cuda arg True by default:
parser.add_argument('--cuda', action='store_false',
help='use CUDA (default: True)')
Yet lines 60-62 test as if they expect args.cuda to be False by default:
if torch.cuda.is_available():
if not args.cuda:
print("WARNING: You have a CUDA device, so you should probably run with --cuda")
Maybe you could swap in the following for the argparser --cuda argument:
parser.add_argument('--cuda', action='store_false', default=False,
help='use CUDA (default: True)')
For people wanting to test drive the code on small machines (i.e my laptop) it helps remove one snag. I can do a quick PR if you like. Thanks for sharing the code for these experiments, they are super useful!
My goal is to train a model that can output sequences of text from image inputs. Using the IAM handwriting dataset, for example, we would pass the model an image and expect it to return "broadcast and television report on his". Historically, the common (i.e. recurrent) way to accomplish this would be an encoder (CNN) + decoder (LSTM) architecture like OpenNMT's implementation. I am interested in replacing the decoder with a TCN, but am unsure how to approach the image data. The CNN encoder will create a batch of N feature maps with reduced spatial dimensions (H', W').
The issue is that a TCN expects 3D tensors (N, L, C), whereas each "timestep" of the image is 2D (N, H, W, C). Following the p-MNIST example in the paper, we could flatten the image into a 1D sequence of length H' x W'. Then the TCN would effectively snake through the pseudo-timesteps.
However, if we want one prediction per timestep it makes much more sense to define a left-to-right sequence instead of a snaking one, since that's the direction the text is depicted in the image. Did you experiment at all with image-to-text models, and if so, how did you choose to represent the images?
I also wonder about the loss function for training a TCN decoder. Assuming you divide the image width into more timesteps than your maximum expected sequence length, it seems like connectionist temporal classification (CTC) would be a good choice. Then you do not have to worry about alignment between the target sequence and model's prediction. For instance, "bbb--ee-cau--sssss----e" would be collapsed to "because" by combining neighboring duplicates and removing blanks. Do you agree or is there a different loss function you would suggest?
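For reference, if the CTC route is taken, PyTorch ships nn.CTCLoss, which expects per-step log-probabilities of shape (T, N, C) plus the input and target lengths. A rough sketch of feeding a TCN decoder's per-timestep outputs into it (all shapes, the class count, and the random tensors below are made up for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: 32 pseudo-timesteps from the flattened feature map,
# batch of 4, 80 character classes plus 1 CTC blank (index 0).
T, N, C = 32, 4, 81
logits = torch.randn(N, C, T, requires_grad=True)            # e.g. a 1x1 conv head on the TCN output
log_probs = F.log_softmax(logits, dim=1).permute(2, 0, 1)    # -> (T, N, C) as CTCLoss expects

targets = torch.randint(1, C, (N, 10))                        # character indices (0 = blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()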
I used zero padding to make records in my batch share the same length. Is there any other preprocessing required in order to use the model? (scaling, etc.)
Hi, I want to ask a simple question about the MNIST classification example. Images in MNIST are treated as sequences by flattening them to 1D, and then each image gets a probability distribution. But the images have no relation to each other, so what is the meaning of the TCN here when each sequence is processed separately? It looks like a fully connected layer. Did I misunderstand the procedure?
I am currently training a model that requires a very large number of levels (a long effective history), and I cannot fit it onto one GPU. How can we parallelize TCNs across multiple GPUs to fit a single model?
Could you clarify if spatial dropout is used? The paper suggests that it is, but the code seems to use standard dropout.
I am trying to reproduce the results for the polyphonic music (Nott dataset), but I am having trouble with setting the correct TCN hyperparams. Are the ones shown in Table 2 of the paper the ones I should use? If so, then why are the default parameters set differently, e.g. kernel size to 5 instead of 6 for Nott (or 3 for JSB)?
Also, when I manually set the parameters as shown in the paper, by using
--dropout 0.2 --clip 0.4 --ksize 6 --levels 4 --nhid 150
as parameters, the TCN model has 2M parameters, but in the paper the model size is given as roughly 1M. So now I am confused about whether this is actually the setting used to produce the paper results.
Hi Jerry, thanks for the nice work. I'm trying to use a TCN for a multi-variate time series task. Suppose two data points with different sequence lengths are x1 = [t1, t2, t3, t4] and x2 = [t1, t2, t3, t4, t5, t6], with corresponding per-time-step labels [y1, y2, y3, y4] and [y1, y2, y3, y4, y5, y6]. I pad them to the same length for mini-batch input: x1 = [0, 0, t1, t2, t3, t4], x2 = [t1, t2, t3, t4, t5, t6].
My question is: given x1 and x2, how should the model output a prediction for each time step? For example, for x1 we want the model to output [z0, z1, z2, z3, z4, z5] and we only take [z2, z3, z4, z5] for the binary cross-entropy loss with the true labels [y1, y2, y3, y4].
My current approach follows the mnist example, modifying the model's forward with:
def forward(self, inputs):
    """Inputs dimension (N, C_in, L_in)"""
    seq_len = inputs.shape[2]
    y1 = self.tcn(inputs)
    o = torch.cat([self.linear(y1[:, :, i]) for i in range(0, seq_len)], dim=1)
    return o
This outputs the same length as the input, but I'm not sure whether this is the correct way to use the TCN model, or whether there is some other approach.
Thank you.
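For reference, a minimal sketch of one way to ignore the left-padded positions in a per-time-step loss (illustrative only; the shapes, lengths, and random tensors below are made up, and the mask assumes padding is on the left as in the example above):

import torch
import torch.nn.functional as F

batch, max_len = 2, 6
logits = torch.randn(batch, max_len, requires_grad=True)     # per-step scores from the TCN head
labels = torch.randint(0, 2, (batch, max_len)).float()       # padded per-step binary labels
lengths = torch.tensor([4, 6])                                # x1 has 4 real steps, x2 has 6

# Valid positions are the last `length` steps when padding is on the left.
steps = torch.arange(max_len).unsqueeze(0)                    # (1, max_len)
mask = (steps >= (max_len - lengths).unsqueeze(1)).float()    # (batch, max_len), 1 on real steps

loss = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
loss = (loss * mask).sum() / mask.sum()                       # average only over real steps
loss.backward()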
Q1
I noticed that in all examples you use the same hidden layer size across your layers. I was wondering if you've tried altering this and instead using a more traditional V shape, where earlier layers have more filters, gradually mapping down to fewer and fewer filters in deeper layers.
Q2
In the adding problem you use the very last element of the linear layer following the TCN to obtain a single scalar output.
self.linear1 = nn.Linear(num_channels[-1], output_size)
y1 = tcn(x)
out = F.relu(self.linear1(y1[:, :, -1]))
I was wondering if it would make sense to instead learn a combination across all time steps like so:
self.linear1 = nn.Linear(num_channels[-1], output_size)
self.linear2 = nn.Linear(seq_len, 1)
y1 = tcn(x)
y2 = F.relu(self.linear1(y1.transpose(1, 2)))
y3 = self.linear2(y2.transpose(1, 2))[:, :, 0]
Here, we combine all filters from the last layer with a linear layer just like you did, but also combine across all time steps. Is there any reason for not doing so?
Thanks a lot!
Hi,
As written in the paper, the input and output should have the same length, and the output at time t depends on previous values of the input. When I look at the implementation of the adding problem, I see that the input is 2 x T where T is 200, 300, 400, etc.; however, the output is just a scalar. What is the explanation for this?
So, you stated that a TCN can be used as a drop-in replacement for an LSTM.
Lets assume I have a batch of images of shape: N x T x C x H x W.
I reshape the images to be of size N x T x (-1). This is the x given to the forward function of the TCN. The TCN is initialized with number of inputs T, and the number of channels is [T] * num_levels_tcn. This means that effectively the TCN slides over (C x H x W), or am I misunderstanding something? I was under the impression (from the figures in the paper) that the TCN would slide over time.
The mnist_pixel script presumes that the data/mnist/processed folder exists.
I also ran into the PyTorch version issue where data[0] should be data.item(); otherwise you get "IndexError: invalid index of a 0-dim tensor".
In your paper, you mentioned that the TCN should be causal, and in Figure 1 the convolution does indeed look causal.
But in this implementation, I see the only tweak is Chomp1d, and you use conv1d directly?
Can we say that PyTorch's conv1d is causal?
I'm using your TCN module for a language modeling task. My code follows the structure of your char_cnn code. It works but the performance is very bad compared to an LSTM network. Each epoch with the TCN network takes about 10 times longer. Do you know if the performance can be improved? Here is the forward method from the TCN class:
def forward(self, x):
    emb = self.drop(self.encoder(x))
    y = self.tcn(emb.transpose(1, 2))
    o = self.decoder(y.transpose(1, 2))
    return o.contiguous()
Perhaps it is the transpose calls that are making the code slow?
Sorry, silly question..
Let's say my seq_len is 100 and these are time-ordered, so t0 < t1 < ... < t100
and I have a few instances in my batch which are shorter, so I'll pad them with zeros from the left, i.e.:
batch_size=2
0, 0, 0, t1 < t2 < ... < t97
0, 0, 0, 0, t1 < t2 < ... < t96
Am I right in assuming that the TCN will read (enforce causality) from left to right, i.e. the future time points always have to be to the right of the older ones?
Thanks in advance..
If I have many articles, they contain different words, and the articles may have some relationships between them. How can I integrate them?