
dilate's People

Contributors

msakai, vincent-leguen

dilate's Issues

Pure PyTorch implementation

I love the idea behind DILATE and would like to include it in pytorch-forecasting. However, a GPU-capable implementation is probably needed for wider adoption. Do you plan a CUDA kernel or a performant pure PyTorch implementation?
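For what it's worth, a batched soft-DTW forward pass can be written in pure PyTorch along these lines. This is a minimal sketch, not the repository's code: soft_dtw_batch and its shapes are my own naming, it vectorizes only over the batch dimension, and the Python loops make it slow; a production version would also vectorize over anti-diagonals or use a CUDA kernel.

import torch

def soft_dtw_batch(D, gamma=0.01):
    # D: (batch, N, M) pairwise cost matrices between prediction and target steps.
    # Returns one soft-DTW value per batch element; autograd-friendly (no in-place writes).
    B, N, M = D.shape
    inf = torch.full((B,), float('inf'), dtype=D.dtype, device=D.device)
    prev_row = [torch.zeros(B, dtype=D.dtype, device=D.device)] + [inf] * M
    for i in range(1, N + 1):
        row = [inf]
        for j in range(1, M + 1):
            # smooth minimum over the three DTW predecessors (logsumexp trick)
            stacked = torch.stack([prev_row[j], row[j - 1], prev_row[j - 1]])
            softmin = -gamma * torch.logsumexp(-stacked / gamma, dim=0)
            row.append(D[:, i - 1, j - 1] + softmin)
        prev_row = row
    return prev_row[M]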

As a loss function for Boosted Learners

Really interesting stuff!

Can this be used as a loss function for boosted learners? I'm thinking of GBMs/GBDTs. The requirement there is that the loss be twice continuously differentiable. Is this the case with DILATE?
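For gamma > 0, the soft-min inside DILATE's building blocks is a logsumexp, which is smooth, so second derivatives exist at least in principle (my reading, not an authors' statement). As a hedged sketch of the plumbing side, any differentiable PyTorch loss can be exposed as the per-sample (grad, hess) pair a GBDT library such as XGBoost expects; grad_and_hess is a hypothetical helper of mine, and note that the ones-vector trick below yields Hessian row sums, which equal the diagonal only for losses that are separable across predictions:

import torch

def grad_and_hess(loss_fn, preds, targets):
    # Expose a differentiable PyTorch loss as the (grad, hess) pair GBDT libraries expect.
    preds = preds.detach().clone().requires_grad_(True)
    loss = loss_fn(preds, targets)
    (grad,) = torch.autograd.grad(loss, preds, create_graph=True)
    # Hessian-vector product with the all-ones vector, i.e. Hessian row sums;
    # equals the diagonal only when the loss is separable per prediction.
    (hess,) = torch.autograd.grad(grad.sum(), preds)
    return grad.detach(), hess.detach()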

Potential misuse of parameter input

Dear author, thank you so much for sharing your work, but may I ask whether there is a potential mix-up in the argument order? Could you please take the time to check it?

In your main.py, lines 53 to 54, you use:

if (loss_type=='dilate'):
    loss, loss_shape, loss_temporal = dilate_loss(target,outputs,alpha, gamma, device)
So the first argument is target (the ground truth) and the second is outputs (the prediction).

But in your dilate_loss.py, line 5, you use:

def dilate_loss(outputs, targets, alpha, gamma, device):

So here the first argument is outputs (the prediction) and the second is targets (the ground truth).

So may I ask whether the code in main.py, lines 53 to 54, should be changed to:

if (loss_type=='dilate'):
    loss, loss_shape, loss_temporal = dilate_loss(outputs,target,alpha, gamma, device)

Because in your paper, you mention that the latter parameter is the ground truth.
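(A side note from me, not the original poster: keyword arguments would make the order explicit whichever file gets fixed, using the parameter names from dilate_loss.py quoted above:)

loss, loss_shape, loss_temporal = dilate_loss(outputs=outputs, targets=target, alpha=alpha, gamma=gamma, device=device)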

Thanks a lot! Looking forward to your reply!

Why is my loss_shape negative

I have a question about my loss_shape: I don't know why its value is negative. I think that's not right, but I can't see why.
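One possible explanation (my reading, not confirmed by the authors here): the shape term is a soft-DTW, and the smooth minimum it is built from can undershoot the true minimum, so the loss can dip below zero even for well-matched series. A minimal numeric sketch:

import numpy as np

def soft_min(values, gamma):
    # the smooth minimum used inside soft-DTW: -gamma * log(sum(exp(-v / gamma)))
    values = np.asarray(values, dtype=float)
    return -gamma * np.log(np.exp(-values / gamma).sum())

print(min([0.0, 0.0]))                  # 0.0
print(soft_min([0.0, 0.0], gamma=0.1))  # about -0.069: already negative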

The numba and Python versions the code runs with

Hi, I am trying to run your code, and it reports an error no matter which numba version I try with Python 3.9. Do you know the numba version and other environment settings needed to run the code?
In my case, the problem happens in "from numba.np import npyimpl", called by "path, sim = dtw_path(target_k_cpu, output_k_cpu)".
Numba == 0.53 or 0.56: final error message from numba/np/npyimpl.py: kernel = kernels[ufunc]; KeyError: <ufunc 'invert'>
Numba == 0.47: final error message: ImportError: cannot import name 'npyimpl' from 'numba.np'
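(Not an authors' answer, but in my experience this kind of KeyError usually points to a numba/NumPy version mismatch, since each numba release pins a maximum supported NumPy version. If it helps to pin down the environment, a quick way to report the exact versions involved, assuming the failure surfaces through tslearn's numba dependency as the traceback suggests:)

import numba, numpy, tslearn
print(numba.__version__, numpy.__version__, tslearn.__version__)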

Notation

I am looking forward to your code! It's an interesting paper.

Is your arXiv paper already the final version, or is it still possible to suggest improvements? My main suggestion is to rework some of the notation. Just a short example: you use

\{x_i\}_{i \in \{1:N\}}

which is conventionally reserved for denoting a sequence, but you use it to describe a set. Instead, I'd suggest using set-builder notation: https://en.m.wikipedia.org/wiki/Set-builder_notation

Also, the notation for the pairwise cost matrix is difficult to follow.

Is \langle \cdot, \cdot \rangle denoting an inner product? How is it defined if it is not the standard dot product? Logically, I think you want the element-wise Hadamard product, whose symbol is a circle with a dot in the middle (but it should still be introduced as such), together with an L2 norm: \| A \odot \Delta(\dots) \|_2

How is A chosen in (2)?
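For reference (my reading of the usual soft-DTW convention, not something the authors confirm here): if \langle \cdot, \cdot \rangle is the Frobenius inner product between the alignment matrix A and the cost matrix, the term would read

\langle A, \Delta(\hat{y}, y) \rangle = \sum_{i,j} A_{i,j} \, \Delta(\hat{y}, y)_{i,j},

i.e. a weighted sum of pairwise costs, which is different from the norm of a Hadamard product suggested above.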

Multidimensional outputs

Thanks a lot for the code release.

I had a doubt regarding your code: how can I extend the loss function to cases with multi-dimensional outputs, such as vehicle trajectory forecasting (2D)? Currently, I assume the code does not support this.

Any suggestions on the changes I need to make to the code, or other references, would be really helpful.
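For what it's worth, DTW-style recursions only ever consume a pairwise cost matrix, so one hedged route (my sketch, not the repository's code; pairwise_sq_dists is my own name) is to replace the scalar squared difference with a multivariate squared Euclidean distance and leave the rest of the loss unchanged:

import torch

def pairwise_sq_dists(x, y):
    # x: (N, d) predicted steps, y: (M, d) target steps -> (N, M) cost matrix
    x_norm = (x ** 2).sum(dim=1, keepdim=True)   # (N, 1)
    y_norm = (y ** 2).sum(dim=1).unsqueeze(0)    # (1, M)
    return x_norm + y_norm - 2.0 * x @ y.t()

The alignment and the temporal penalty operate on the indices of this matrix, so they should carry over to 2-D trajectories unchanged.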

License is missing

It would be nice if the license were indicated in a LICENSE file or somewhere.

Without a proper license, we cannot use the code without legal concerns.

Data Load

The data-loading stage is quite involved. Is there possibly a way to showcase how a general pandas DataFrame with a date index can be converted to fit your format here?

X_train_input, X_train_target, X_test_input, X_test_target, train_bkp, test_bkp = create_synthetic_dataset(N, N_input, N_output, sigma)
dataset_train = SyntheticDataset(X_train_input, X_train_target, train_bkp)
dataset_test  = SyntheticDataset(X_test_input, X_test_target, test_bkp)
trainloader = DataLoader(dataset_train, batch_size=batch_size, shuffle=True, num_workers=1)
testloader  = DataLoader(dataset_test, batch_size=batch_size, shuffle=False, num_workers=1)
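Not the authors' code, but here is one minimal sketch of how a univariate pandas Series with a DatetimeIndex could be sliced into the (num_samples, length) input/target arrays the snippet above expects (frame_series and the column name in the usage comment are hypothetical):

import numpy as np
import pandas as pd

def frame_series(s: pd.Series, N_input: int, N_output: int):
    # slide a window of length N_input + N_output over the series
    values = s.to_numpy(dtype=np.float32)
    total = N_input + N_output
    windows = np.stack([values[i:i + total]
                        for i in range(len(values) - total + 1)])
    return windows[:, :N_input], windows[:, N_input:]  # inputs, targets

# usage: s = df['load']  (a column of a DataFrame with a date index)
# X_input, X_target = frame_series(s, N_input=84, N_output=56)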

Traffic datasets

Dear author,
I appreciate your work. I notice that the Traffic datasets are mentioned in the original paper; however, this version doesn't release the relevant code for them, especially the data-input part. Would you please send the code? Thank you! My email address is [email protected]
Looking forward to your reply!
Kind regards

Found a small formal bug

Hi @vincent-leguen,

Nice work, congrats.
I am still studying the way you calculate the DILATE loss.
However, I think there is a small formal bug in main.py at row 78:

bath_size, N_output = target.shape[0:2]

should be:

batch_size, N_output = target.shape[0:2]

It is a small typo, but I don't think it impacts the normal flow of the code.
Cheers

A quick question

Hi,
Really appreciate your great work. I really enjoyed reading through your paper and code.

I am curious how long your model takes to train (with and without a GPU) on long time series such as the ECG5000 and traffic cases?

Thanks in advance

Cannot reproduce same dtw loss for ECG5000 dataset

Hello everyone! Christmas is almost here; Merry Christmas to you all :)

Unfortunately, my code is not happy... When I try to reproduce the result from Vincent's paper for the ECG5000 dataset, I fail...

The ECG5000 dataset I used is http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv

Maybe it's a problem with the dataset?
My results for the sequence-to-sequence model with the MSE loss function:

epoch  0  loss  1.0772087574005127  loss shape  0  loss temporal  0
 Eval mse=  0.8767168291977474  dtw=  6.805486422927132  tdi=  2.0440532069970843
epoch  50  loss  0.3731319308280945  loss shape  0  loss temporal  0
 Eval mse=  0.3903078040906361  dtw=  3.0475928054320987  tdi=  1.055505193148688
epoch  100  loss  0.24237403273582458  loss shape  0  loss temporal  0
 Eval mse=  0.3061082886798041  dtw=  2.4698390916706847  tdi=  0.9395733418367348
epoch  150  loss  0.2584376335144043  loss shape  0  loss temporal  0
 Eval mse=  0.22645755005734308  dtw=  1.9507582848166412  tdi=  0.7841070517492711
epoch  200  loss  0.15192793309688568  loss shape  0  loss temporal  0
 Eval mse=  0.24554287110056197  dtw=  2.016065729844596  tdi=  0.6677225765306123
epoch  250  loss  0.1566656529903412  loss shape  0  loss temporal  0
 Eval mse=  0.2019440990473543  dtw=  1.880212425006847  tdi=  0.7010162172011662
epoch  300  loss  0.12690874934196472  loss shape  0  loss temporal  0
 Eval mse=  0.19364993029407093  dtw=  1.852122344428214  tdi=  0.6694605502915453
epoch  350  loss  0.12332551181316376  loss shape  0  loss temporal  0
 Eval mse=  0.1977188979940755  dtw=  1.8767601223560113  tdi=  0.6332683126822158
epoch  400  loss  0.10750801116228104  loss shape  0  loss temporal  0
 Eval mse=  0.21163474129778997  dtw=  1.8957054875381265  tdi=  0.7194324890670553
epoch  450  loss  0.10328985005617142  loss shape  0  loss temporal  0
 Eval mse=  0.19005996425236973  dtw=  1.8016342832546404  tdi=  0.6326609876093293
epoch  500  loss  0.0954132080078125  loss shape  0  loss temporal  0
 Eval mse=  0.1958838226539748  dtw=  1.7668837248907803  tdi=  0.6169864249271136
epoch  550  loss  0.09286423027515411  loss shape  0  loss temporal  0
 Eval mse=  0.19250875785946847  dtw=  1.7858421110580047  tdi=  0.6433294460641399
epoch  600  loss  0.09554877132177353  loss shape  0  loss temporal  0
 Eval mse=  0.19318228970680917  dtw=  1.8026873947053852  tdi=  0.6590691508746356
epoch  650  loss  0.06814754754304886  loss shape  0  loss temporal  0
 Eval mse=  0.19417715136493954  dtw=  1.7603379046970657  tdi=  0.672698250728863
epoch  700  loss  0.07659073919057846  loss shape  0  loss temporal  0
 Eval mse=  0.21282084967408862  dtw=  1.7930373014810026  tdi=  0.6734083454810496
epoch  750  loss  0.07163602858781815  loss shape  0  loss temporal  0
 Eval mse=  0.20653479067342623  dtw=  1.7746248434154144  tdi=  0.6520079263848396
epoch  800  loss  0.06505869328975677  loss shape  0  loss temporal  0
 Eval mse=  0.19753393722432  dtw=  1.7156214133114422  tdi=  0.6494909803206996
epoch  850  loss  0.07344229519367218  loss shape  0  loss temporal  0
 Eval mse=  0.194216572280441  dtw=  1.7329767024008997  tdi=  0.6313555029154518
epoch  900  loss  0.06015300750732422  loss shape  0  loss temporal  0
 Eval mse=  0.20844823292323522  dtw=  1.741320480761508  tdi=  0.6895331632653061
epoch  950  loss  0.05017583444714546  loss shape  0  loss temporal  0
 Eval mse=  0.20004522502422334  dtw=  1.7107445075037588  tdi=  0.6221432215743441

The result from Vincent's paper: mse 0.212, dtw 0.178, tdi 0.827.
Compared to my result, the mse and tdi are OK, but the dtw is far off, and I don't know why.
I really hope someone can help me!

The code is the same as Vincent's, but I will also copy it here in case I made some silly mistake:

import numpy as np
import torch
from data.synthetic_dataset import create_synthetic_dataset, SyntheticDataset
from models.seq2seq import EncoderRNN, DecoderRNN, Net_GRU
from loss.dilate_loss import dilate_loss
from torch.utils.data import DataLoader
import random
from tslearn.metrics import dtw, dtw_path
import matplotlib.pyplot as plt
import warnings; warnings.simplefilter('ignore')
import pandas as pd

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
random.seed(0)

dataframe = pd.read_csv('http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv', header=None)

X_train_input=dataframe.iloc[0:500,0:84].values
X_test_input=dataframe.iloc[500:4000,0:84].values
X_train_target=dataframe.iloc[0:500,84:140].values
X_test_target=dataframe.iloc[500:4000,84:140].values

batch_size = 50
N_input = 84
N_output = 56
gamma = 0.01

dataset_train = SyntheticDataset(X_train_input,X_train_target)
dataset_test  = SyntheticDataset(X_test_input,X_test_target)

trainloader = DataLoader(dataset_train, batch_size=batch_size,shuffle=True, num_workers=0,drop_last=True)
testloader  = DataLoader(dataset_test, batch_size=batch_size,shuffle=False, num_workers=0,drop_last=True)

def train_model(net,loss_type, learning_rate, epochs=1000, gamma = 0.01,
                print_every=50,eval_every=50, verbose=1, Lambda=1, alpha=0.5):
    
    optimizer = torch.optim.Adam(net.parameters(),lr=learning_rate)
    criterion = torch.nn.MSELoss()
    
    for epoch in range(epochs): 
        for i, data in enumerate(trainloader, 0):
            inputs, target = data
            inputs = torch.tensor(inputs, dtype=torch.float32).to(device)
            target = torch.tensor(target, dtype=torch.float32).to(device)
            batch_size, N_output = target.shape[0:2]                     

            # forward + backward + optimize
            outputs = net(inputs)
            loss_mse,loss_shape,loss_temporal = torch.tensor(0),torch.tensor(0),torch.tensor(0)
            
            if (loss_type=='mse'):
                loss_mse = criterion(target,outputs)
                loss = loss_mse                   
 
            if (loss_type=='dilate'):    
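                # (cf. the "Potential misuse of parameter input" issue above:
                # dilate_loss.py defines dilate_loss(outputs, targets, ...),
                # so this call passes target first, mirroring main.py)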
                loss, loss_shape, loss_temporal = dilate_loss(target,outputs,alpha, gamma, device)             
                  
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()          
        
        if(verbose):
            if (epoch % print_every == 0):
                print('epoch ', epoch, ' loss ',loss.item(),' loss shape ',loss_shape.item(),' loss temporal ',loss_temporal.item())
                eval_model(net,testloader, gamma,verbose=1)
  

def eval_model(net,loader, gamma,verbose=1):   
    criterion = torch.nn.MSELoss()
    losses_mse = []
    losses_dtw = []
    losses_tdi = []   

    for i, data in enumerate(loader, 0):
        loss_mse, loss_dtw, loss_tdi = torch.tensor(0),torch.tensor(0),torch.tensor(0)
        # get the inputs
        inputs, target = data
        inputs = torch.tensor(inputs, dtype=torch.float32).to(device)
        target = torch.tensor(target, dtype=torch.float32).to(device)
        batch_size, N_output = target.shape[0:2]
        outputs = net(inputs)
         
        # MSE    
        loss_mse = criterion(target,outputs)    
        loss_dtw, loss_tdi = 0,0
        # DTW and TDI
        for k in range(batch_size):         
            target_k_cpu = target[k,:,0:1].view(-1).detach().cpu().numpy()
            output_k_cpu = outputs[k,:,0:1].view(-1).detach().cpu().numpy()

            path, sim = dtw_path(target_k_cpu, output_k_cpu)   
            loss_dtw += sim
                       
            Dist = 0
            for i,j in path:
                    Dist += (i-j)*(i-j)
            loss_tdi += Dist / (N_output*N_output)            
                        
        loss_dtw = loss_dtw /batch_size
        loss_tdi = loss_tdi / batch_size

        # print statistics
        losses_mse.append( loss_mse.item() )
        losses_dtw.append( loss_dtw )
        losses_tdi.append( loss_tdi )

    print( ' Eval mse= ', np.array(losses_mse).mean() ,' dtw= ',np.array(losses_dtw).mean() ,' tdi= ', np.array(losses_tdi).mean()) 

encoder = EncoderRNN(input_size=1, hidden_size=128, num_grulstm_layers=1, batch_size=batch_size).to(device)
decoder = DecoderRNN(input_size=1, hidden_size=128, num_grulstm_layers=1,fc_units=16, output_size=1).to(device)
net_gru_mse = Net_GRU(encoder,decoder, N_output, device).to(device)
train_model(net_gru_mse,loss_type='mse',learning_rate=0.001, epochs=1000, gamma=gamma, print_every=50, eval_every=50,verbose=1)

I also changed the format in synthetic_dataset.py a little, but I don't think that matters:

class SyntheticDataset(torch.utils.data.Dataset):
    def __init__(self, X_input, X_target):
        super(SyntheticDataset, self).__init__()  
        self.X_input = X_input
        self.X_target = X_target
        
    def __len__(self):
        return (self.X_input).shape[0]

    def __getitem__(self, idx):
        return (self.X_input[idx,:,np.newaxis], self.X_target[idx,:,np.newaxis])

Have a good day! Hopefully someone can answer this :)

Question about TDI metric

In the paper "Frías-Paredes et al., Assessing energy forecasting inaccuracy by simultaneously considering temporal and absolute errors, 2017", TDI is defined on the interval [0, 1]:

  • "TDI is a dimensionless number varying in the interval [0,1], where 0 corresponds with null temporal distortion and 1 with maximum temporal distortion"

However, your results are in the range [0.xx, 2.xx] (Table 1).
What is the difference between the TDI measure defined in that paper and yours?
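For comparison, the TDI actually computed in this repository's evaluation code (see the reproduction attempt quoted above) is, per series,

\mathrm{TDI} = \frac{1}{N^2} \sum_{(i,j) \in \text{path}} (i - j)^2,

summed along the DTW alignment path. Since that path can contain up to N + M - 1 steps with |i - j| as large as N - 1, this quantity is not normalized to [0, 1], which would be consistent with values above 1 in Table 1 (my reading of the quoted code, not an authors' statement).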

Financial Time Series

Were you able to test this model on financial time series? Does the model work well with more complex time series, such as FOREX or stock-market forecasting?
