Comments (23)
You should use F.log_softmax(prior_logits, dim=-1) to obtain the prior logits that sum up to 1 when exponentiated. I am not 100% sure if the rest of the code is correct; there might also be some subtle bug with the shapes. I will publish a notebook showing how to compute the means next week, after the NeurIPS deadline.
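For illustration, a minimal standalone check (the tensor shapes are made up) that log-softmaxed logits exponentiate to a proper distribution:

import torch
import torch.nn.functional as F

# Hypothetical raw logits: [batch_size, seq_len, n_components]
raw_logits = torch.randn(2, 5, 64)
# Normalize so that exp(prior_logits) sums to 1 over the mixture components
prior_logits = F.log_softmax(raw_logits, dim=-1)
print(prior_logits.exp().sum(-1))  # every entry is 1.0 up to floating-point error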
from ifl-tpp.
In my code, base_dist = NormalMixtureDistribution(), and I directly use your function NormalMixtureDistribution.get_params() to get prior_logits, so it does sum up to 1 when exponentiated. There may be some other bugs in my code. Looking forward to your notebook. Thank you.
from ifl-tpp.
Hi, when will you be able to release the code for computing the mean? Thank you very much~
from ifl-tpp.
Hi, as you probably heard, the NeurIPS deadline was extended, so I was busy all of last week.
Here is a simple script that computes the means for the test set:
# Obtain a dataloader for the entire test set
all_test = torch.utils.data.DataLoader(d_test, batch_size=len(d_test), shuffle=False, collate_fn=collate)
for batch in all_test:
    break

# Encode the history of each event with the RNN
h = model.rnn(batch)
# Parameters of the mixture distribution in the (normalized) log-time domain
gmm = model.decoder.base_dist
prior_logits, means, log_scales = gmm.get_params(h, None)
prior = prior_logits.exp()
scales_squared = (log_scales * 2).exp()
# Affine transform applied to the log inter-event times (scale a, shift b)
affine = model.decoder.transforms[0]
a = affine.log_scale.exp().item()
b = affine.shift.item()
# E[tau] = sum_k w_k * exp(a * mu_k + b + 0.5 * a^2 * sigma_k^2)
mean_time = (prior * torch.exp(a * means + b + 0.5 * a**2 * scales_squared)).sum(-1)
You can simply run this code in a new cell after executing all the code in interactive.ipynb. Make sure to use a mask to only consider the times corresponding to events that happened, if you want to compute the total MAE or MSE (the tensor mean_time has shape [batch_size, max_seq_len] and is padded to the maximum sequence length in the test set).
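If you want the aggregate MAE/MSE, here is a minimal masking sketch (assuming batch.out_time holds the target inter-event times and batch.length the true sequence lengths, as they are used later in this thread; adjust the names to your Batch class):

# 1 for positions corresponding to real events, 0 for padding
seq_len = mean_time.shape[-1]
mask = (torch.arange(seq_len)[None, :] < batch.length[:, None]).float()
abs_err = (mean_time - batch.out_time).abs()
mae = (abs_err * mask).sum() / mask.sum()
mse = (((mean_time - batch.out_time) ** 2) * mask).sum() / mask.sum()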
As a general comment though, if you only care about computing the MAE or MSE for the inter-event times, you probably don't need our model (or a probabilistic model at all). You could simply train an RNN that directly outputs the expected inter-event time and minimize the MAE/MSE to obtain a better result (see the sketch below).
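To make that concrete, a rough sketch of such a deterministic baseline (not part of this repo; the names, shapes, and the softplus output head are illustrative choices):

import torch
import torch.nn as nn

class RNNRegressor(nn.Module):
    # Directly predicts the next inter-event time; no distribution involved.
    def __init__(self, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, log_inter_times):  # [batch_size, seq_len, 1]
        h, _ = self.rnn(log_inter_times)
        # softplus keeps the predicted inter-event time positive
        return nn.functional.softplus(self.head(h)).squeeze(-1)

# Training would minimize nn.L1Loss() (MAE) or nn.MSELoss() between the predictions
# and the observed next inter-event times, with padded positions masked out as above.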
from ifl-tpp.
Thank you! It works.
I wonder why one would "simply train an RNN to only output the expected inter-event time and directly minimize the MAE/MSE to obtain a better result"? What are the advantages of a probabilistic model for such temporal point process prediction tasks?
from ifl-tpp.
Well, a probabilistic (generative) model allows you to do many other things as well. You can generate entire realistic sequences, which, for example, allows you to answer questions such as "How many events will happen in the next hour/day/week?". You can answer this question for arbitrary intervals with a single model. You can also construct other prediction tasks that you answer via simulation. Since you have a model for p(t_1, t_2, ...), you can use it to define p(t_1, t_2, ... | z) and use it in an autoencoder-like architecture, or to simply learn sequence embeddings, like we do in our paper. You can also do other "standard" things possible with generative models, such as imputing missing data or detecting anomalies.
from ifl-tpp.
Well, I know. Thank you very much.
from ifl-tpp.
Hi, your script that computes the means for the test set does work. However, I find that when the model is in training mode (model.train()) the means are wrong, while when the model is in eval mode (model.eval()) the means are right. Can you tell me what the difference between training mode and eval/test mode is?
from ifl-tpp.
The BatchNorm layer acts differently depending on whether you are in training or eval mode.
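A small standalone illustration of that difference (generic PyTorch, not specific to this repo): in training mode BatchNorm normalizes with the current batch statistics and updates its running estimates, while in eval mode it uses the stored running statistics.

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)
x = torch.randn(8, 4) * 5 + 3

bn.train()
out_train = bn(x)  # uses this batch's mean/std and updates the running stats

bn.eval()
out_eval = bn(x)   # uses the stored running mean/var, so the output differs
print(torch.allclose(out_train, out_eval))  # typically False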
from ifl-tpp.
I found that after training the model for many epochs, the loss is still fine, but the MAE and RMSE become abnormal. Only in the first few epochs do the MAE and RMSE look normal.
My code is as follows:
I modified your get_total_loss() to:
def get_total_loss(loader):
    loader_log_prob, loader_lengths = [], []
    loader_mae, loader_rmse = [], []
    for input in loader:
        loader_log_prob.append(model.log_prob(input).detach())
        loader_lengths.append(input.length.detach())
        # calculate means
        h = model.rnn(input)
        gmm = model.decoder.base_dist
        prior_logits, means, log_scales = gmm.get_params(h, None)
        prior = prior_logits.exp()
        scales_squared = (log_scales * 2).exp()
        affine = model.decoder.transforms[0]
        a = affine.log_scale.exp().item()
        b = affine.shift.item()
        mean_time = (prior * torch.exp(a * means + b + 0.5 * a ** 2 * scales_squared)).sum(-1)
        mae = abs(mean_time - input.out_time)
        rmse = ((mean_time - input.out_time) ** 2)
        loader_mae.append(mae)
        loader_rmse.append(rmse)
    return -model.aggregate(loader_log_prob, loader_lengths), torch.sqrt(model.aggregate(loader_mae, loader_lengths)), torch.sqrt(model.aggregate(loader_rmse, loader_lengths))
I print the loss, MAE, and RMSE on the validation set like this:
for epoch in range(max_epochs):
    model.train()
    for input in dl_train:
        opt.zero_grad()
        log_prob = model.log_prob(input)
        loss = -model.aggregate(log_prob, input.length)
        loss.backward()
        opt.step()
    model.eval()
    loss_val, val_mae, val_rmse = get_total_loss(dl_train)
    training_val_losses.append(loss_val.item())
    if (best_loss - loss_val) < 1e-4:
        impatient += 1
        if loss_val < best_loss:
            best_loss = loss_val.item()
            best_model = deepcopy(model.state_dict())
    else:
        best_loss = loss_val.item()
        best_model = deepcopy(model.state_dict())
        impatient = 0
    if impatient >= patience:
        print(f'Breaking due to early stopping at epoch {epoch}')
        break
    if (epoch + 1) % display_step == 0:
        print(f"Epoch {epoch+1:4d}, loss_train_last_batch = {loss:.4f}, loss_val = {loss_val:.4f}, mae_val={val_mae:.4f}, rmse_val={val_rmse:.4f}")
I ran this many times on the yelp_toronto dataset; the settings are:
dataset_name = 'yelp_toronto' # other: [ 'yelp_toronto', 'wikipedia', 'mooc', 'stack_overflow', 'lastfm',
# 'reddit', 'synth/poisson', 'synth/renewal', 'synth/self_correcting',
# 'synth/hawkes1', 'synth/hawkes2']
split = 'each_sequence' # How to split the sequences ('each_sequence' -- split every seq. into train/val/test; other: 'whole_sequences')
## General model config
use_history = True # Whether to use RNN to encode history
history_size = 64 # Size of the RNN hidden vector
rnn_type = 'RNN' # Which RNN cell to use (other: ['GRU', 'LSTM'])
use_embedding = False # Whether to use sequence embedding (should use with 'each_sequence' split)
embedding_size = 32 # Size of the sequence embedding vector
# IMPORTANT: when using split = 'whole_sequences', the model will only learn embeddings
# for the training sequences, and not for validation / test
trainable_affine = False # Train the final affine layer
## Decoder config
decoder_name = 'LogNormMix' # other: ['RMTPP', 'FullyNeuralNet', 'Exponential', 'SOSPolynomial', 'DeepSigmoidalFlow','LogNormMix']
print('dataset_name:', dataset_name, ' split:', split, ' decoder_name:',decoder_name)
n_components = 64 # Number of components for a mixture model
hypernet_hidden_sizes = [] # Number of units in MLP generating parameters ([] -- affine layer, [64] -- one layer, etc.)
## Flow params
# Polynomial
max_degree = 3 # Maximum degree value for Sum-of-squares polynomial flow (SOS)
n_terms = 4 # Number of terms for SOS flow
# DSF / FullyNN
n_layers = 2 # Number of layers for Deep Sigmoidal Flow (DSF) / Fully Neural Network flow (Omi et al., 2019)
layer_size = 64 # Number of mixture components / units in a layer for DSF and FullyNN
## Training config
regularization = 1e-5 # L2 regularization parameter
learning_rate = 1e-3 # Learning rate for Adam optimizer
max_epochs = 1000 # For how many epochs to train
display_step = 1 # Display training statistics after every display_step
patience = 1000
The output is:
Starting training...
Epoch 1, loss_train_last_batch = 13.4079, loss_val = 13.2335, mae_val=26766698.0000, rmse_val=1819904613285888.0000
Epoch 2, loss_train_last_batch = 13.1865, loss_val = 13.1495, mae_val=3426.1558, rmse_val=1383180928.0000
Epoch 3, loss_train_last_batch = 12.9530, loss_val = 13.0888, mae_val=512.5600, rmse_val=501593.9688
Epoch 4, loss_train_last_batch = 13.0125, loss_val = 13.0766, mae_val=503.6180, rmse_val=500986.3125
Epoch 5, loss_train_last_batch = 13.0837, loss_val = 13.0718, mae_val=506.8425, rmse_val=501509.8750
Epoch 6, loss_train_last_batch = 13.0529, loss_val = 13.0475, mae_val=499.4471, rmse_val=499397.9375
Epoch 7, loss_train_last_batch = 13.0129, loss_val = 13.0494, mae_val=492.0758, rmse_val=502415.4375
Epoch 8, loss_train_last_batch = 13.1083, loss_val = 13.0371, mae_val=501.7400, rmse_val=498083.4688
Epoch 9, loss_train_last_batch = 13.1387, loss_val = 13.0398, mae_val=498.1692, rmse_val=497672.8125
Epoch 10, loss_train_last_batch = 12.9956, loss_val = 13.0223, mae_val=500.8606, rmse_val=497590.7500
Epoch 11, loss_train_last_batch = 13.1573, loss_val = 13.0222, mae_val=508.6006, rmse_val=498502.3438
Epoch 12, loss_train_last_batch = 12.9300, loss_val = 13.0185, mae_val=500.3276, rmse_val=498012.8438
Epoch 13, loss_train_last_batch = 13.1425, loss_val = 13.0230, mae_val=519.6070, rmse_val=502788.6562
Epoch 14, loss_train_last_batch = 13.2399, loss_val = 13.0151, mae_val=500.7939, rmse_val=496978.7188
Epoch 15, loss_train_last_batch = 13.1108, loss_val = 13.0243, mae_val=507.5722, rmse_val=497184.7188
Epoch 16, loss_train_last_batch = 13.0713, loss_val = 13.0214, mae_val=520.1346, rmse_val=505738.1875
Epoch 17, loss_train_last_batch = 12.6886, loss_val = 13.0367, mae_val=486.1195, rmse_val=500092.6875
Epoch 18, loss_train_last_batch = 12.9877, loss_val = 13.0175, mae_val=497.2266, rmse_val=496743.5000
Epoch 19, loss_train_last_batch = 13.0687, loss_val = 12.9997, mae_val=503.1215, rmse_val=496066.5312
Epoch 20, loss_train_last_batch = 13.1702, loss_val = 13.0088, mae_val=492.2871, rmse_val=496514.0312
Epoch 21, loss_train_last_batch = 13.2350, loss_val = 13.0038, mae_val=497.3408, rmse_val=495459.9062
Epoch 22, loss_train_last_batch = 13.1782, loss_val = 12.9992, mae_val=495.5132, rmse_val=498258.4375
Epoch 23, loss_train_last_batch = 13.0062, loss_val = 13.0005, mae_val=499.0397, rmse_val=519898.7812
Epoch 24, loss_train_last_batch = 12.9067, loss_val = 12.9931, mae_val=507.4612, rmse_val=594324.8750
Epoch 25, loss_train_last_batch = 13.1079, loss_val = 12.9945, mae_val=517.7587, rmse_val=900886.3750
Epoch 26, loss_train_last_batch = 12.9857, loss_val = 12.9920, mae_val=512.2386, rmse_val=4392982.5000
Epoch 27, loss_train_last_batch = 12.9264, loss_val = 12.9950, mae_val=514.3329, rmse_val=5169827.0000
Epoch 28, loss_train_last_batch = 12.8157, loss_val = 13.0302, mae_val=493.4668, rmse_val=1860104.7500
Epoch 29, loss_train_last_batch = 12.5613, loss_val = 12.9847, mae_val=537.9836, rmse_val=11795128.0000
Epoch 30, loss_train_last_batch = 13.2016, loss_val = 12.9804, mae_val=529.3032, rmse_val=6777215.0000
Epoch 31, loss_train_last_batch = 12.9775, loss_val = 13.0134, mae_val=846.9025, rmse_val=128691048.0000
Epoch 32, loss_train_last_batch = 12.8428, loss_val = 12.9859, mae_val=1849.7711, rmse_val=980683584.0000
Epoch 33, loss_train_last_batch = 12.8807, loss_val = 12.9806, mae_val=1070.1407, rmse_val=245876272.0000
Epoch 34, loss_train_last_batch = 12.9979, loss_val = 12.9916, mae_val=2452.5298, rmse_val=1599526016.0000
Epoch 35, loss_train_last_batch = 12.8360, loss_val = 12.9800, mae_val=6217.9438, rmse_val=10881764352.0000
Epoch 36, loss_train_last_batch = 13.0218, loss_val = 12.9772, mae_val=4096.3394, rmse_val=4222642432.0000
Epoch 37, loss_train_last_batch = 13.1249, loss_val = 12.9804, mae_val=14538.7510, rmse_val=55435677696.0000
Epoch 38, loss_train_last_batch = 12.9419, loss_val = 12.9751, mae_val=5957.1592, rmse_val=9103671296.0000
Epoch 39, loss_train_last_batch = 12.9560, loss_val = 12.9771, mae_val=15795.3037, rmse_val=63540269056.0000
Epoch 40, loss_train_last_batch = 12.6515, loss_val = 12.9749, mae_val=20455.7715, rmse_val=111674048512.0000
Epoch 41, loss_train_last_batch = 12.6312, loss_val = 12.9742, mae_val=24490.1016, rmse_val=161982464000.0000
Epoch 42, loss_train_last_batch = 12.8424, loss_val = 12.9783, mae_val=58356.1211, rmse_val=895563464704.0000
Epoch 43, loss_train_last_batch = 12.8352, loss_val = 12.9720, mae_val=707848.0625, rmse_val=131829575188480.0000
Epoch 44, loss_train_last_batch = 13.1982, loss_val = 12.9710, mae_val=159048.2344, rmse_val=6599020642304.0000
Epoch 45, loss_train_last_batch = 13.0262, loss_val = 12.9698, mae_val=1080644.5000, rmse_val=301453822394368.0000
Epoch 46, loss_train_last_batch = 13.0949, loss_val = 12.9716, mae_val=1310502.1250, rmse_val=443600429121536.0000
Epoch 47, loss_train_last_batch = 13.0794, loss_val = 12.9766, mae_val=952063.8750, rmse_val=236723497861120.0000
Epoch 48, loss_train_last_batch = 13.0596, loss_val = 12.9782, mae_val=2891685.2500, rmse_val=2266192517529600.0000
Epoch 49, loss_train_last_batch = 13.0674, loss_val = 12.9788, mae_val=2158038.7500, rmse_val=1224882125799424.0000
Epoch 50, loss_train_last_batch = 12.8285, loss_val = 12.9803, mae_val=6511638.0000, rmse_val=12143592680194048.0000
Epoch 51, loss_train_last_batch = 12.8486, loss_val = 12.9766, mae_val=6650059.5000, rmse_val=12599905038106624.0000
Epoch 52, loss_train_last_batch = 12.9981, loss_val = 12.9706, mae_val=4850225.5000, rmse_val=6423470409777152.0000
Epoch 53, loss_train_last_batch = 12.9091, loss_val = 12.9838, mae_val=8188149.5000, rmse_val=19454163888898048.0000
Epoch 54, loss_train_last_batch = 12.5680, loss_val = 12.9725, mae_val=45244352.0000, rmse_val=inf
Epoch 55, loss_train_last_batch = 12.6416, loss_val = 12.9725, mae_val=3023256.0000, rmse_val=2654210332033024.0000
Epoch 56, loss_train_last_batch = 13.4106, loss_val = 12.9710, mae_val=3500038.2500, rmse_val=3797295208333312.0000
Epoch 57, loss_train_last_batch = 12.7122, loss_val = 12.9776, mae_val=17572978.0000, rmse_val=inf
Epoch 58, loss_train_last_batch = 12.8259, loss_val = 12.9673, mae_val=27126030.0000, rmse_val=inf
Epoch 59, loss_train_last_batch = 13.2648, loss_val = 12.9665, mae_val=23597588.0000, rmse_val=inf
Epoch 60, loss_train_last_batch = 13.1881, loss_val = 12.9709, mae_val=19627790.0000, rmse_val=inf
After around 20 epochs, the train/val loss still looks right, while the mae and rmse become abnormal and eventually go to inf. Could you help me find the reason? Thank you!
from ifl-tpp.
I am not sure whether you have solved the problem or not, but this is what the author says in another issue: "Under default settings, we transform the RNN input in_times by applying a logarithm (code), and also additionally normalize the values to have zero mean and unit standard deviation (code) using the statistics of the training set." I suppose this may cause the errors you encountered. Hope it helps.
from ifl-tpp.
@guoshnBJTU sorry, I just noticed your last post. I will look into it this week.
from ifl-tpp.
Thank you. I find that if I set n_components to a smaller number such as 2, 4, or 8, the loss always stays normal no matter how long the model trains. But I still haven't found the reason...
from ifl-tpp.
@guoshnBJTU I found that it's possible to get somewhat reasonable MAE/MSE values by normalizing the errors before computing the loss (as done in https://arxiv.org/pdf/1907.07561.pdf - see Equation 20 and Table 4).
# I add 1e-8 to avoid division by zero where out_time = 0
mae = abs((mean_time - input.out_time) / (input.out_time + 1e-8))
rmse = (((mean_time - input.out_time) / (input.out_time + 1e-8)) ** 2)
Here is the output that I get with n_components = 64 on yelp_toronto when computing MAE and MSE on the validation set: https://pastebin.com/kY4zkEJG.
However, the MSE/MAE values still start to diverge after a while as the model starts overfitting.
I have a hypothesis as to why this happens when we have a large number of mixture components. Log-normal is a heavy-tailed distribution, and if the scale parameter gets large for even a single component of our mixture, we get an extremely high expected value for this one component. This means that in the end this component has a disproportionately large influence on the overall expected inter-arrival time, and the MAE/MSE loss doesn't look great.
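A quick numeric illustration of that effect (plain math, not tied to the repo): for a single log-normal component the mean is exp(mu + sigma^2 / 2), so even a moderate scale parameter blows it up.

import math

mu = 0.0
for sigma in [1.0, 3.0, 5.0, 8.0]:
    print(sigma, math.exp(mu + sigma ** 2 / 2))
# sigma=1 -> ~1.6, sigma=3 -> ~90, sigma=5 -> ~2.7e5, sigma=8 -> ~8e13
# A single such component, even with weight 1/64, can dominate the mixture mean.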
Also, it might be that the problem arises because the inter-arrival times are on a very large scale, which leads to some numerical issues. It could be that rescaling the inter-arrival times will lead to a more stable behavior (but I'm not 100% sure about this).
from ifl-tpp.
@shchur Thank you. I agree with your hypothesis and explanation.
from ifl-tpp.
According to your reply, I calculate the mean as follows:
prior_logits, means, log_scales = base_dist.get_params(h, emb)
s = torch.exp(log_scales)
prior = torch.exp(prior_logits)
expectation = torch.sum(prior * torch.exp(a * means + b + a * a * s * s / 2), dim=-1)
Here a = std_in_train, b = mean_in_train, and base_dist = NormalMixtureDistribution(). But I got the error WARNING:root:NaN or Inf found in input tensor. Could you help me find out why I got this error? Which step in my code is wrong?
Sorry, I am just confused about the expectation calculation. I think the code expectation = torch.sum(prior * torch.exp(a * means + b + a * a * s * s / 2), dim=-1) computes the expectation of z_2, not E[\tau]. Why not just adopt E_P[\tau] from page 4?
from ifl-tpp.
Please have a look at #3 (comment)
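For reference, a short sketch of where the expectation formula in this thread comes from (my reading of the model: the mixture is over the normalized log inter-event time z, and \log\tau = a z + b with a = std_in_train, b = mean_in_train):

z \sim \sum_k w_k \,\mathcal{N}(\mu_k, \sigma_k^2), \qquad \log\tau = a z + b

\mathbb{E}[\tau] = \sum_k w_k\, \mathbb{E}\big[e^{a z_k + b}\big] = \sum_k w_k \exp\big(a\mu_k + b + \tfrac{1}{2}a^2\sigma_k^2\big)

using the Gaussian moment identity \mathbb{E}[e^{tX}] = \exp(t\mu + t^2\sigma^2/2) for X \sim \mathcal{N}(\mu,\sigma^2). So the expression is E_P[\tau] itself, just written in terms of the normalized mixture parameters.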
from ifl-tpp.
I have got it. Thank you very much!
from ifl-tpp.
Hi, @shchur! Could you please tell me how to obtain the mark at the next event? Thanks.
from ifl-tpp.
Hi @SunderlandAJ-1130, in this line of code we compute the logits of the mark distribution. If you're interested in the probabilities of the next mark, you can compute
mark_probs = mark_logits.softmax(dim=-1)
mark_probs will be a tensor of shape [batch_size, seq_length, num_marks], where each entry mark_probs[i, j, k] corresponds to the probability that event j of sequence i is of type k.
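If you want a single predicted mark per event rather than the full distribution, a small follow-up sketch (mark_probs as above; batch.marks and mask are assumptions about how you store the ground-truth marks and the padding mask):

# Most likely mark for each event: [batch_size, seq_length]
predicted_marks = mark_probs.argmax(dim=-1)
# Accuracy over real (non-padded) events, assuming ground-truth marks in batch.marks
# and a 0/1 padding mask of the same shape
accuracy = ((predicted_marks == batch.marks).float() * mask).sum() / mask.sum()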
from ifl-tpp.
@shchur Thank you very much. By the way, I found your video about the paper "Neural Temporal Point Processes: A Review" on YouTube. In this video, you said that a neural TPP can be used to estimate how many events will happen over the forecast horizon (at about 10:30 of the video). So, I wonder whether your ifl-tpp model can achieve this goal. If it is possible, could you please describe how to deal with this task with your code? Thanks!
from ifl-tpp.
Suppose you have observed a sequence of events over some time interval and want to know how many events will happen in the next interval. You can encode the observed history into a context vector and pass it as the context_init parameter to the sample method https://github.com/shchur/ifl-tpp/blob/e7ebab1ceab56cee440bd8e99b5c1bd42d6ada07/code/dpp/models/recurrent_tpp.py#L139 to simulate continuations of the sequence, then count how many sampled events fall into the forecast horizon. I don't think there is a way to obtain this distribution over the number of events in closed form.
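A rough Monte-Carlo sketch of that idea (the argument and attribute names of sample are my reading of recurrent_tpp.py and should be treated as assumptions; adjust to the actual signature):

import torch

num_simulations = 1000
horizon = 10.0  # hypothetical length of the forecast window

counts = []
for _ in range(num_simulations):
    # context: hidden state summarizing the observed history, e.g. shape [1, context_size]
    sampled = model.sample(t_end=horizon, batch_size=1, context_init=context)
    # assumption: the returned batch exposes the sampled inter-event times
    arrival_times = sampled.inter_times.cumsum(-1)
    counts.append((arrival_times < horizon).sum().item())

# Empirical distribution over the number of events in the forecast window
counts = torch.tensor(counts, dtype=torch.float)
print(counts.mean().item(), counts.std().item())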
from ifl-tpp.
Thank you very much! @shchur
from ifl-tpp.