
Comments (23)

shchur commented on July 27, 2024

You should use F.log_softmax(prior_logits, dim=-1) to obtain prior logits that sum to 1 when exponentiated. I am not 100% sure the rest of the code is correct - there might also be a subtle bug with the shapes. I will publish a notebook showing how to compute the means next week, after the NeurIPS deadline.
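
For reference, a minimal standalone sketch of what this means (with made-up tensor shapes): exponentiating the output of F.log_softmax gives mixture weights that sum to 1.

import torch
import torch.nn.functional as F

raw = torch.randn(4, 8)                    # e.g. [batch_size, n_components]
prior_logits = F.log_softmax(raw, dim=-1)  # normalized log-weights
print(prior_logits.exp().sum(-1))          # every row sums to 1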

guoshnBJTU commented on July 27, 2024

In my code, base_dist = NormalMixtureDistribution(), and I directly use the function NormalMixtureDistribution.get_params() from your code to get prior_logits, so it does sum to 1 when exponentiated. There may be some other bug in my code. Looking forward to the notebook you will publish. Thank you.

guoshnBJTU commented on July 27, 2024

Hi, when will you be able to release the code for computing the mean? Thank you very much~

shchur commented on July 27, 2024

Hi, as you probably heard, the NeurIPS deadline was extended, so I was busy all of last week.

Here is a simple script that computes the means for the test set:

# Obtain a dataloader for the entire test set
all_test = torch.utils.data.DataLoader(d_test, batch_size=len(d_test), shuffle=False, collate_fn=collate)
for batch in all_test:
    break

h = model.rnn(batch)
gmm = model.decoder.base_dist
prior_logits, means, log_scales = gmm.get_params(h, None)
prior = prior_logits.exp()
scales_squared = (log_scales * 2).exp()

affine = model.decoder.transforms[0]
a = affine.log_scale.exp().item()
b = affine.shift.item()

mean_time = (prior * torch.exp(a * means + b +  0.5 * a**2 * scales_squared)).sum(-1)

You can simply run this code in a new cell after executing all the code in interactive.ipynb. Make sure to use a mask to only consider the times corresponding to events that happened, if you want to compute the total MAE or MSE (the tensor mean_time has shape [batch_size, max_seq_len] and is padded to the maximum sequence length in the test set).
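
If it helps, here is one way such a mask could be built from the sequence lengths. This is a sketch rather than code from the repository; it assumes mean_time and batch.out_time are both [batch_size, max_seq_len] and that batch.length holds the true sequence lengths.

# Build a boolean mask that is True only at positions where an event actually happened
seq_len = mean_time.shape[-1]
mask = torch.arange(seq_len, device=mean_time.device)[None, :] < batch.length[:, None]

# Aggregate the errors only over real (non-padded) events
errors = mean_time - batch.out_time
test_mae = errors.abs()[mask].mean()
test_mse = (errors ** 2)[mask].mean()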

As a general comment though, if you only care about computing the MAE or MSE for the inter-event times, you probably don't need our model (or a probabilistic model at all). You could simply train an RNN that directly outputs the expected inter-event time and minimize the MAE/MSE to obtain a better result.
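
For concreteness, a baseline of that kind could look roughly like the following sketch (not code from the repository; the data handling is omitted and the names are illustrative):

import torch
import torch.nn as nn

class RNNRegressor(nn.Module):
    # GRU that reads past inter-event times and regresses the next one
    def __init__(self, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, in_times):                 # in_times: [batch, seq_len]
        h, _ = self.rnn(in_times.unsqueeze(-1))  # encode the history
        return self.head(h).squeeze(-1)          # predicted next inter-event times

regressor = RNNRegressor()
opt = torch.optim.Adam(regressor.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()  # directly minimizes MAE; use nn.MSELoss() for MSE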

guoshnBJTU commented on July 27, 2024

Thank you! It works.

I wonder why simply training an RNN to output the expected inter-event time and directly minimizing the MAE/MSE would obtain a better result. What are the advantages of a probabilistic model for such temporal point process prediction tasks?

shchur commented on July 27, 2024

Well, a probabilistic (generative) model allows you to do many other things as well. You can generate entire realistic sequences, which, for example, allows you to answer questions such as "How many events will happen in the next hour/day/week?". You can answer this question for arbitrary intervals with a single model. You can also construct other prediction tasks that you answer via simulation. Since you have a model for p(t_1, t_2, ...), you can use it to define p(t_1, t_2, ... | z) and use it in an autoencoder-like architecture, or simply to learn sequence embeddings, like we do in our paper. You can also do other "standard" things possible with generative models, such as imputing missing data or detecting anomalies.

guoshnBJTU commented on July 27, 2024

I see. Thank you very much.

guoshnBJTU commented on July 27, 2024

Hi, your script that computes the means for the test set does work. However, I find that when the model is in training mode (model.train()), the means are wrong, while in eval mode (model.eval()) they are right. Can you tell me what the difference is between the training mode and the eval/test mode?

shchur commented on July 27, 2024

The BatchNorm layer acts differently depending on whether you are in the training or eval mode.
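
A minimal standalone demonstration of this (unrelated to the ifl-tpp code itself):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
x = torch.randn(5, 3) * 10 + 4

bn.train()
out_train = bn(x)  # normalized with the statistics of this batch (also updates running stats)

bn.eval()
out_eval = bn(x)   # normalized with the accumulated running mean/variance
print(torch.allclose(out_train, out_eval))  # generally False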

guoshnBJTU commented on July 27, 2024

I found that after training the model for many epochs, the loss is still reasonable, but the MAE and RMSE become abnormal. Only in the first few epochs do the MAE and RMSE look normal.
My code is as follows:
I modified your get_total_loss() to:

def get_total_loss(loader):
    loader_log_prob, loader_lengths = [], []
    loader_mae, loader_rmse = [], []
    for input in loader:
        loader_log_prob.append(model.log_prob(input).detach()) 
        loader_lengths.append(input.length.detach())
        
        # calculate means
        h = model.rnn(input)
        gmm = model.decoder.base_dist
        prior_logits, means, log_scales = gmm.get_params(h, None)
        prior = prior_logits.exp()
        scales_squared = (log_scales * 2).exp()

        affine = model.decoder.transforms[0]
        a = affine.log_scale.exp().item()
        b = affine.shift.item()
        mean_time = (prior * torch.exp(a * means + b + 0.5 * a ** 2 * scales_squared)).sum(-1)
        mae = abs(mean_time - input.out_time)
        rmse = ((mean_time - input.out_time) ** 2)
        
        loader_mae.append(mae)
        loader_rmse.append(rmse)
    return -model.aggregate(loader_log_prob, loader_lengths), torch.sqrt(model.aggregate(loader_mae, loader_lengths)), torch.sqrt(model.aggregate(loader_rmse, loader_lengths)) 

I print the loss, MAE, and RMSE on the validation dataset like this:

for epoch in range(max_epochs):
    model.train()
    for input in dl_train:
        opt.zero_grad()
        log_prob = model.log_prob(input)
        loss = -model.aggregate(log_prob, input.length)
        loss.backward()
        opt.step()
    model.eval()
    loss_val, val_mae, val_rmse = get_total_loss(dl_train)
    training_val_losses.append(loss_val.item())
    if (best_loss - loss_val) < 1e-4:
        impatient += 1
        if loss_val < best_loss:
            best_loss = loss_val.item()
            best_model = deepcopy(model.state_dict())
    else:
        best_loss = loss_val.item()
        best_model = deepcopy(model.state_dict())
        impatient = 0
    if impatient >= patience:
        print(f'Breaking due to early stopping at epoch {epoch}')
        break
    if (epoch + 1) % display_step == 0:
        print(f"Epoch {epoch+1:4d}, loss_train_last_batch = {loss:.4f}, loss_val = {loss_val:.4f}, mae_val={val_mae:.4f}, rmse_val={val_rmse:.4f}")

I ran this many times on the yelp_toronto dataset; the settings are:

dataset_name = 'yelp_toronto' # other: [ 'yelp_toronto', 'wikipedia', 'mooc', 'stack_overflow', 'lastfm',
                              #          'reddit', 'synth/poisson', 'synth/renewal', 'synth/self_correcting',
                               #          'synth/hawkes1', 'synth/hawkes2']

split = 'each_sequence' # How to split the sequences (other 'each_sequence' -- split every seq. into train/val/test)

## General model config
use_history = True        # Whether to use RNN to encode history
history_size = 64         # Size of the RNN hidden vector
rnn_type = 'RNN'          # Which RNN cell to use (other: ['GRU', 'LSTM'])
use_embedding = False     # Whether to use sequence embedding (should use with 'each_sequence' split)
embedding_size = 32       # Size of the sequence embedding vector
                          # IMPORTANT: when using split = 'whole_sequences', the model will only learn embeddings
                          # for the training sequences, and not for validation / test
trainable_affine = False  # Train the final affine layer

## Decoder config
decoder_name = 'LogNormMix' # other: ['RMTPP', 'FullyNeuralNet', 'Exponential', 'SOSPolynomial', 'DeepSigmoidalFlow','LogNormMix']
print('dataset_name:', dataset_name, ' split:', split, ' decoder_name:',decoder_name)
n_components = 64           # Number of components for a mixture model
hypernet_hidden_sizes = []  # Number of units in MLP generating parameters ([] -- affine layer, [64] -- one layer, etc.)

## Flow params
# Polynomial
max_degree = 3  # Maximum degree value for Sum-of-squares polynomial flow (SOS)
n_terms = 4     # Number of terms for SOS flow
# DSF / FullyNN
n_layers = 2    # Number of layers for Deep Sigmoidal Flow (DSF) / Fully Neural Network flow (Omi et al., 2019)
layer_size = 64 # Number of mixture components / units in a layer for DSF and FullyNN

## Training config
regularization = 1e-5 # L2 regularization parameter
learning_rate = 1e-3  # Learning rate for Adam optimizer
max_epochs = 1000     # For how many epochs to train
display_step = 1     # Display training statistics after every display_step
patience = 1000       

The output is:

Starting training...
Epoch    1, loss_train_last_batch = 13.4079, loss_val = 13.2335, mae_val=26766698.0000, rmse_val=1819904613285888.0000
Epoch    2, loss_train_last_batch = 13.1865, loss_val = 13.1495, mae_val=3426.1558, rmse_val=1383180928.0000
Epoch    3, loss_train_last_batch = 12.9530, loss_val = 13.0888, mae_val=512.5600, rmse_val=501593.9688
Epoch    4, loss_train_last_batch = 13.0125, loss_val = 13.0766, mae_val=503.6180, rmse_val=500986.3125
Epoch    5, loss_train_last_batch = 13.0837, loss_val = 13.0718, mae_val=506.8425, rmse_val=501509.8750
Epoch    6, loss_train_last_batch = 13.0529, loss_val = 13.0475, mae_val=499.4471, rmse_val=499397.9375
Epoch    7, loss_train_last_batch = 13.0129, loss_val = 13.0494, mae_val=492.0758, rmse_val=502415.4375
Epoch    8, loss_train_last_batch = 13.1083, loss_val = 13.0371, mae_val=501.7400, rmse_val=498083.4688
Epoch    9, loss_train_last_batch = 13.1387, loss_val = 13.0398, mae_val=498.1692, rmse_val=497672.8125
Epoch   10, loss_train_last_batch = 12.9956, loss_val = 13.0223, mae_val=500.8606, rmse_val=497590.7500
Epoch   11, loss_train_last_batch = 13.1573, loss_val = 13.0222, mae_val=508.6006, rmse_val=498502.3438
Epoch   12, loss_train_last_batch = 12.9300, loss_val = 13.0185, mae_val=500.3276, rmse_val=498012.8438
Epoch   13, loss_train_last_batch = 13.1425, loss_val = 13.0230, mae_val=519.6070, rmse_val=502788.6562
Epoch   14, loss_train_last_batch = 13.2399, loss_val = 13.0151, mae_val=500.7939, rmse_val=496978.7188
Epoch   15, loss_train_last_batch = 13.1108, loss_val = 13.0243, mae_val=507.5722, rmse_val=497184.7188
Epoch   16, loss_train_last_batch = 13.0713, loss_val = 13.0214, mae_val=520.1346, rmse_val=505738.1875
Epoch   17, loss_train_last_batch = 12.6886, loss_val = 13.0367, mae_val=486.1195, rmse_val=500092.6875
Epoch   18, loss_train_last_batch = 12.9877, loss_val = 13.0175, mae_val=497.2266, rmse_val=496743.5000
Epoch   19, loss_train_last_batch = 13.0687, loss_val = 12.9997, mae_val=503.1215, rmse_val=496066.5312
Epoch   20, loss_train_last_batch = 13.1702, loss_val = 13.0088, mae_val=492.2871, rmse_val=496514.0312
Epoch   21, loss_train_last_batch = 13.2350, loss_val = 13.0038, mae_val=497.3408, rmse_val=495459.9062
Epoch   22, loss_train_last_batch = 13.1782, loss_val = 12.9992, mae_val=495.5132, rmse_val=498258.4375
Epoch   23, loss_train_last_batch = 13.0062, loss_val = 13.0005, mae_val=499.0397, rmse_val=519898.7812
Epoch   24, loss_train_last_batch = 12.9067, loss_val = 12.9931, mae_val=507.4612, rmse_val=594324.8750
Epoch   25, loss_train_last_batch = 13.1079, loss_val = 12.9945, mae_val=517.7587, rmse_val=900886.3750
Epoch   26, loss_train_last_batch = 12.9857, loss_val = 12.9920, mae_val=512.2386, rmse_val=4392982.5000
Epoch   27, loss_train_last_batch = 12.9264, loss_val = 12.9950, mae_val=514.3329, rmse_val=5169827.0000
Epoch   28, loss_train_last_batch = 12.8157, loss_val = 13.0302, mae_val=493.4668, rmse_val=1860104.7500
Epoch   29, loss_train_last_batch = 12.5613, loss_val = 12.9847, mae_val=537.9836, rmse_val=11795128.0000
Epoch   30, loss_train_last_batch = 13.2016, loss_val = 12.9804, mae_val=529.3032, rmse_val=6777215.0000
Epoch   31, loss_train_last_batch = 12.9775, loss_val = 13.0134, mae_val=846.9025, rmse_val=128691048.0000
Epoch   32, loss_train_last_batch = 12.8428, loss_val = 12.9859, mae_val=1849.7711, rmse_val=980683584.0000
Epoch   33, loss_train_last_batch = 12.8807, loss_val = 12.9806, mae_val=1070.1407, rmse_val=245876272.0000
Epoch   34, loss_train_last_batch = 12.9979, loss_val = 12.9916, mae_val=2452.5298, rmse_val=1599526016.0000
Epoch   35, loss_train_last_batch = 12.8360, loss_val = 12.9800, mae_val=6217.9438, rmse_val=10881764352.0000
Epoch   36, loss_train_last_batch = 13.0218, loss_val = 12.9772, mae_val=4096.3394, rmse_val=4222642432.0000
Epoch   37, loss_train_last_batch = 13.1249, loss_val = 12.9804, mae_val=14538.7510, rmse_val=55435677696.0000
Epoch   38, loss_train_last_batch = 12.9419, loss_val = 12.9751, mae_val=5957.1592, rmse_val=9103671296.0000
Epoch   39, loss_train_last_batch = 12.9560, loss_val = 12.9771, mae_val=15795.3037, rmse_val=63540269056.0000
Epoch   40, loss_train_last_batch = 12.6515, loss_val = 12.9749, mae_val=20455.7715, rmse_val=111674048512.0000
Epoch   41, loss_train_last_batch = 12.6312, loss_val = 12.9742, mae_val=24490.1016, rmse_val=161982464000.0000
Epoch   42, loss_train_last_batch = 12.8424, loss_val = 12.9783, mae_val=58356.1211, rmse_val=895563464704.0000
Epoch   43, loss_train_last_batch = 12.8352, loss_val = 12.9720, mae_val=707848.0625, rmse_val=131829575188480.0000
Epoch   44, loss_train_last_batch = 13.1982, loss_val = 12.9710, mae_val=159048.2344, rmse_val=6599020642304.0000
Epoch   45, loss_train_last_batch = 13.0262, loss_val = 12.9698, mae_val=1080644.5000, rmse_val=301453822394368.0000
Epoch   46, loss_train_last_batch = 13.0949, loss_val = 12.9716, mae_val=1310502.1250, rmse_val=443600429121536.0000
Epoch   47, loss_train_last_batch = 13.0794, loss_val = 12.9766, mae_val=952063.8750, rmse_val=236723497861120.0000
Epoch   48, loss_train_last_batch = 13.0596, loss_val = 12.9782, mae_val=2891685.2500, rmse_val=2266192517529600.0000
Epoch   49, loss_train_last_batch = 13.0674, loss_val = 12.9788, mae_val=2158038.7500, rmse_val=1224882125799424.0000
Epoch   50, loss_train_last_batch = 12.8285, loss_val = 12.9803, mae_val=6511638.0000, rmse_val=12143592680194048.0000
Epoch   51, loss_train_last_batch = 12.8486, loss_val = 12.9766, mae_val=6650059.5000, rmse_val=12599905038106624.0000
Epoch   52, loss_train_last_batch = 12.9981, loss_val = 12.9706, mae_val=4850225.5000, rmse_val=6423470409777152.0000
Epoch   53, loss_train_last_batch = 12.9091, loss_val = 12.9838, mae_val=8188149.5000, rmse_val=19454163888898048.0000
Epoch   54, loss_train_last_batch = 12.5680, loss_val = 12.9725, mae_val=45244352.0000, rmse_val=inf
Epoch   55, loss_train_last_batch = 12.6416, loss_val = 12.9725, mae_val=3023256.0000, rmse_val=2654210332033024.0000
Epoch   56, loss_train_last_batch = 13.4106, loss_val = 12.9710, mae_val=3500038.2500, rmse_val=3797295208333312.0000
Epoch   57, loss_train_last_batch = 12.7122, loss_val = 12.9776, mae_val=17572978.0000, rmse_val=inf
Epoch   58, loss_train_last_batch = 12.8259, loss_val = 12.9673, mae_val=27126030.0000, rmse_val=inf
Epoch   59, loss_train_last_batch = 13.2648, loss_val = 12.9665, mae_val=23597588.0000, rmse_val=inf
Epoch   60, loss_train_last_batch = 13.1881, loss_val = 12.9709, mae_val=19627790.0000, rmse_val=inf

After around 20 epochs, the train/val loss still looks right, while the MAE and RMSE become abnormal and eventually go to inf. Could you help me find out the reason? Thank you!

yyhyplxyz commented on July 27, 2024

[Quotes @guoshnBJTU's comment above in full.]

I am not sure whether you have solved the problem or not, but this is what the author says in another issue: "Under default settings, we transform the RNN input in_times by applying a logarithm (code), and also additionally normalize the values to have zero mean and unit standard deviation (code) using the statistics of the training set." I suppose this may be the cause of the errors you encountered. Hope it helps.
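
As a sketch of the preprocessing described in that quote (illustrative names, not the repository's exact code): the inter-event times are log-transformed and then standardized with training-set statistics.

import torch

in_times_train = torch.rand(1000) * 100 + 1e-3   # dummy training inter-event times
log_train = in_times_train.log()
mean_in_train, std_in_train = log_train.mean(), log_train.std()

def preprocess(in_times):
    # log-transform, then normalize to zero mean / unit std using training statistics
    return (in_times.log() - mean_in_train) / std_in_train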

shchur commented on July 27, 2024

@guoshnBJTU sorry, I just noticed your last post. I will look into it this week.

guoshnBJTU commented on July 27, 2024

Thank you. I find that if I set n_components to a smaller number such as 2, 4, or 8, the loss is always normal no matter how long the model trains. But I still have not found the reason...

shchur commented on July 27, 2024

@guoshnBJTU I found that it's possible to get somewhat reasonable MAE/MSE values by normalizing the errors before computing the loss (as done in https://arxiv.org/pdf/1907.07561.pdf - see Equation 20 and Table 4).

        # I add 1e-8 to avoid division by zero where out_time = 0
        mae = abs((mean_time - input.out_time) / (input.out_time + 1e-8))
        rmse = (((mean_time - input.out_time) / (input.out_time + 1e-8)) ** 2)

Here is the output that I get with n_components = 64 on yelp_toronto when computing MAE and MSE on the validation set https://pastebin.com/kY4zkEJG.

However, the MSE/MAE values still start to diverge after a while as the model starts overfitting.

I have a hypothesis as to why this happens when we have a large number of mixture components. Log-normal is a heavy-tailed distribution, and if the scale parameter gets large for even a single component of our mixture, we get an extremely high expected value for this one component. This means that in the end this component has a disproportionately large influence on the overall expected inter-arrival time, and the MAE/MSE loss doesn't look great.

Also, it might be that the problem arises because the inter-arrival times are on a very large scale, which leads to some numerical issues. It could be that rescaling the inter-arrival times will lead to a more stable behavior (but I'm not 100% sure about this).
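
To make the heavy-tail argument concrete, here is a small numerical sketch (not from the original reply): a single component with a large scale dominates the expected value of a log-normal mixture even when its weight is tiny.

import torch

prior = torch.full((64,), 1.0 / 64)   # uniform mixture weights
means = torch.zeros(64)
scales = torch.ones(64)
scales[0] = 6.0                       # one component's scale has drifted upwards

# E[tau] of a log-normal mixture: sum_k pi_k * exp(mu_k + 0.5 * sigma_k^2)
component_means = torch.exp(means + 0.5 * scales ** 2)
print(component_means[0].item())               # ~6.6e7 for the heavy component
print((prior * component_means).sum().item())  # the mixture mean is dominated by it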

guoshnBJTU commented on July 27, 2024

@shchur Thank you. I agree with your hypothesis and explanation.

SZH1230456 commented on July 27, 2024

According to your reply, I calculate the mean as follows:

prior_logits, means, log_scales = base_dist.get_params(h, emb)
s = torch.exp(log_scales)
prior = torch.exp(prior_logits)
expectation = torch.sum(prior * torch.exp(a * means + b + a * a * s * s / 2), dim=-1)

where a = std_in_train, b = mean_in_train, and base_dist = NormalMixtureDistribution(). However, I get the error WARNING:root:NaN or Inf found in input tensor.. Could you help me find out why I get this error? Which step in my code is wrong?

Sorry, I am also confused about the expectation calculation. I think expectation = torch.sum(prior * torch.exp(a * means + b + a * a * s * s / 2), dim=-1) computes the expectation of z_2, not E[\tau]. Why not just use E_P[\tau] from page 4?

shchur commented on July 27, 2024

Please have a look at #3 (comment)

SZH1230456 commented on July 27, 2024

I have got it. Thank you very much!

SunderlandAJ-1130 commented on July 27, 2024

Hi, @shchur! Could you please tell me how to obtain the mark of the next event? Thanks.

shchur commented on July 27, 2024

Hi @SunderlandAJ-1130, in this line of code we compute the logits of the mark distribution. If you're interested in the probabilities of the next mark, you can compute

mark_probs = mark_logits.softmax(dim=-1)

mark_probs will be a tensor of shape [batch_size, seq_length, num_marks], where each entry mark_probs[i, j, k] corresponds to the probability that event j of sequence i is of type k.
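
If you then need a single point prediction rather than the full distribution, one option is to take the most likely mark (a follow-up suggestion, not part of the original reply):

predicted_marks = mark_probs.argmax(dim=-1)  # shape [batch_size, seq_length]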

SunderlandAJ-1130 commented on July 27, 2024

@shchur Thank you very much. By the way, I found your video about the paper "Neural Temporal Point Processes: A Review" on YouTube. In this video, you said that a neural TPP can be used to estimate how many events will happen over a forecast horizon (at about 10:30 of the video). So I wonder whether your ifl-tpp model can achieve this goal. If it is possible, could you please describe how to handle this task with your code? Thanks!

shchur commented on July 27, 2024

Suppose you observed a sequence of events over time interval $[0, T]$ and want to predict how many events will happen in the interval $[T, T+H]$. You can do this by sampling many new event sequences and looking at the empirical distribution of sequence lengths (e.g., predicting the mean of this distribution). You can condition on the observed events in $[0, T]$ by providing the context_init parameter to the sample method https://github.com/shchur/ifl-tpp/blob/e7ebab1ceab56cee440bd8e99b5c1bd42d6ada07/code/dpp/models/recurrent_tpp.py#L139

I don't think there is a way to obtain this distribution over the number of events in $[T, T+H]$ analytically for our model or any other autoregressive neural TPP (e.g., RMTPP, NeuralHawkes, THP, SAHP), so sampling seems like the only realistic option.
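
A rough sketch of this Monte Carlo procedure is given below. The exact signature of sample and the attributes of the returned batch are assumptions here and should be checked against recurrent_tpp.py; context is a hypothetical tensor encoding the observed history on $[0, T]$.

import torch

num_samples = 1000  # number of simulated continuations
horizon = 24.0      # length H of the forecast window

with torch.no_grad():
    # Assumed interface: sample(t_end, batch_size, context_init) returns a batch
    # whose mask marks the events generated inside [0, t_end).
    sampled = model.sample(t_end=horizon, batch_size=num_samples, context_init=context)
    counts = sampled.mask.sum(dim=-1)  # number of events in each simulated sequence

# Monte Carlo estimate of the expected number of events in (T, T + H]
print(counts.float().mean().item())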

SunderlandAJ-1130 commented on July 27, 2024

Thank you very much! @shchur
