
ifl-tpp's Introduction

Intensity-Free Learning of Temporal Point Processes

PyTorch implementation of the paper "Intensity-Free Learning of Temporal Point Processes", Oleksandr Shchur, Marin Biloš and Stephan Günnemann, ICLR 2020.

Refactored code

The master branch contains a refactored version of the code. Some of the original functionality is missing, but the code is much cleaner and should be easier to extend.

You can find the original code (used for experiments in the paper) on branch original-code.

Usage

To run the code, you need to install the dpp library, which contains all the algorithms described in the paper:

cd code
python setup.py install

A Jupyter notebook code/interactive.ipynb contains the code for training models on the datasets used in the paper. The same code can also be run as a Python script code/train.py.

Using your own data

You can save your custom dataset in the format used in our code as follows:

import torch

dataset = {
    "sequences": [
        {"arrival_times": [0.2, 4.5, 9.1], "marks": [1, 0, 4], "t_start": 0.0, "t_end": 10.0},
        {"arrival_times": [2.3, 3.3, 5.5, 8.15], "marks": [4, 3, 2, 2], "t_start": 0.0, "t_end": 10.0},
    ],
    "num_marks": 5,
}
torch.save(dataset, "data/my_dataset.pkl")
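
The saved file can then be loaded back with torch.load (the dpp library may also provide its own dataset-loading utilities, which this one-liner does not assume):

    dataset = torch.load("data/my_dataset.pkl")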

Defining new models

RecurrentTPP is the base class for marked TPP models.

You just need to inherit from it and implement the get_inter_time_dist method that defines how to obtain the distribution (an instance of torch.distributions.Distribution) over the inter-event times given the context vector. For example, have a look at the LogNormMix model from our paper. You can also change the get_features and get_context methods of RecurrentTPP to, for example, use a transformer instead of an RNN.
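
For example, here is a minimal sketch of a custom model. The import path, the constructor arguments, and the context_size attribute are assumptions about the dpp API rather than its exact interface, and the base class may expect additional methods (e.g. a log survival function), so treat this as a sketch, not a drop-in implementation:

    import torch.nn as nn
    import torch.nn.functional as F
    from torch.distributions import Exponential

    from dpp.models import RecurrentTPP  # assumed import path


    class ExponentialTPP(RecurrentTPP):
        """Toy model: inter-event times follow an Exponential distribution
        whose rate is predicted from the context vector."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # context_size is assumed to be an attribute set by RecurrentTPP
            self.linear_rate = nn.Linear(self.context_size, 1)

        def get_inter_time_dist(self, context):
            # context: (batch_size, seq_len, context_size)
            rate = F.softplus(self.linear_rate(context)).squeeze(-1)
            return Exponential(rate=rate)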

Mistakes in the old version

  • In the old code we used to normalize the NLL of each sequence by the number of events in that sequence; this was incorrect. When computing the NLL for multiple TPP sequences, we may only divide the NLL of each sequence by the same constant (see the sketch below).
  • In the old code we didn't include the survival time of the last event (i.e. the time from the last event until the end of the observed interval) in the NLL computation. This is fixed in the refactored version (and, by the way, this seems to be a common mistake in other TPP implementations online).
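
A minimal sketch of the normalization point above; model.log_prob and the sequence objects are placeholders, not the exact dpp API:

    import torch

    def batch_nll(model, batch):
        # `batch` is a list of event sequences; model.log_prob(seq) stands in for
        # the per-sequence log-likelihood computed by the model.
        per_seq_nll = torch.stack([-model.log_prob(seq) for seq in batch])
        # Old (incorrect): per_seq_nll[i] / num_events[i], which weights sequences
        # inconsistently. Correct: divide every sequence's NLL by the same constant,
        # e.g. simply average over sequences.
        return per_seq_nll.mean()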

Requirements

numpy=1.16.4
pytorch=1.2.0
scikit-learn=0.21.2
scipy=1.3.1

Cite

Please cite our paper if you use the code or datasets in your own work:

@article{
    shchur2020intensity,
    title={Intensity-Free Learning of Temporal Point Processes},
    author={Oleksandr Shchur and Marin Bilo\v{s} and Stephan G\"{u}nnemann},
    journal={International Conference on Learning Representations (ICLR)},
    year={2020},
}


ifl-tpp's Issues

Calculate the mean of the entire distribution.

Thanks for your code.
I wonder how to calculate the mean of the entire distribution.
Besides the NLL of \tau, I also want to compute the RMSE/MAE of \tau, and for that I need the mean of \tau.
I think I should use the equation E[\tau] = \sum_k w_k \exp(\mu_k + s_k^2/2), but I cannot get the right answer.
Could you tell me how to get the mean? Thank you!
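
For what it's worth, here is a minimal sketch of the log-normal mixture mean for generic tensors. This is not the dpp API; if the inter-event times were standardized during training, the component parameters must be un-standardized first, as in the follow-up below:

    import torch

    def lognormal_mixture_mean(weights, means, log_scales):
        # weights: (..., K) mixture weights summing to 1
        # means, log_scales: (..., K) Gaussian parameters in log-space
        # E[tau] = sum_k w_k * exp(mu_k + s_k^2 / 2)
        scales = torch.exp(log_scales)
        return torch.sum(weights * torch.exp(means + 0.5 * scales ** 2), dim=-1)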

Loss with NLL of mark and MAE of inter-event time

Hi Oleksandr,

I want to model a marked TPP with LogNormMix. For that I also want to use the MAE as a loss for the inter-event times, but I'm unsure how to combine it with the NLL loss of the predicted marks. I tried the following and it worked quite well:

Loss = abs(E[p(tau)] - tau) + NLL(mark) = MAE(tau) + NLL(mark)

But is this a reasonable loss function?

Thank you for your help!
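
A minimal sketch of the proposed loss, assuming the model's expected inter-event time E[p(tau)] and the mark NLL are already available as tensors; these names are placeholders, not the dpp API:

    import torch

    def combined_loss(expected_tau, observed_tau, mark_nll):
        # MAE between the predicted mean inter-event time and the observed one,
        # plus the negative log-likelihood of the predicted marks.
        mae = torch.abs(expected_tau - observed_tau).mean()
        return mae + mark_nll.mean()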

Calculate the mean of the entire distribution.

According to your reply, I calculate the mean as follows:

    prior_logits, means, log_scales = base_dist.get_params(h, emb)
    s = torch.exp(log_scales)
    prior = torch.exp(prior_logits)
    expectation = torch.sum(prior * torch.exp(a * means + b + a * a * s * s / 2), dim=-1)

where a = std_in_train, b = mean_in_train, and base_dist = NormalMixtureDistribution(). However, I get the error WARNING:root:NaN or Inf found in input tensor. Could you help me find out why I get this error? Which step in my code is wrong?

LogNorm curiosity

Hi @shchur - my code has deviated somewhat from the repo as it currently stands, but we are still experiencing a few oddities. Notably, we are sometimes getting negative inter-event times when calling decoder.sample with the LogNormMix implementation.

Having dug around the codebase, I think I've found the problem, but I just wanted to verify!

Looking at normal_sample:

    def normal_sample(means, log_scales):
        if means.shape != log_scales.shape:
            raise ValueError("Shapes of means and scales don't match.")
        z = torch.empty(means.shape).normal_(0., 1.)
        return torch.exp(log_scales) * z + means

As it is called in the _sample call, it seems that nothing prevents a negative result if the sample drawn at Line 16 is negative and large enough in magnitude relative to the mean of the corresponding mixture component. Is my understanding of this presently correct?
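
For context, the Gaussian sample above can indeed be negative; a log-normal sample is obtained by exponentiating it, which is always positive. A minimal sketch, not the dpp code:

    import torch

    def lognormal_sample(means, log_scales):
        # Exponentiating the Gaussian sample guarantees a positive inter-event time.
        z = torch.empty(means.shape).normal_(0., 1.)
        return torch.exp(torch.exp(log_scales) * z + means)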

How to get sequence embeddings?

Would you provide the code to get sequence embeddings? Or, if it is already provided in the code, would you please let me know which part? It would save me a lot of time. Thanks.

Hyperparameters for reproducibility

Thank you for providing your code, it is very well organized.

Would it also be possible to release the hyperparameters used for each of the datasets in the paper? I'm having trouble reproducing the numbers in Table 3 in the appendix for the provided datasets (StackOverflow and the synthetic datasets). For example, running the code out of the box gives:

Breaking due to early stopping at epoch 172
Negative log-likelihood:

  • Train: 1151.5
  • Val: 1179.1
  • Test: 1131.5

for the StackOverflow dataset at seed = 0, which does not match the 14.4 figure.

In addition, I was wondering if it would be possible to release the code for event time prediction using history?

Understanding given datasets

Hi @shchur,

I do not really know if this is the right place to ask this question, but I need to know the background of these datasets.

I see that in your data, the datasets are partitioned into multiple sequences. What is the reason behind partitioning a single sequence into multiple sequences? How do I partition my sequence into multiple sub-sequences? For your information, my data consists of multiple users (for the moment, around 50) posting on social media. Unique marks are assigned to every user.

Any help would be greatly appreciated. Thank you.

How could I get the predicted results?

This repo has been very enlightening for me.

However, in my work I focus more on the predicted results.

It's hard for me to understand the implementation details well enough to derive the predicted results, as I only stepped into this area a few months ago.

Could you please tell me how to get the predicted results based on your code?

I'd appreciate it if you could help.

Thanks anyway.

Implementation on missing data imputation

Hi @shchur, thanks for releasing the code! I'm wondering if it is possible that you could provide the code for the section of missing data imputation - Sec. 5.4 & F.4 MISSING DATA IMPUTATION from your paper.

I'm curious about your implementation of feeding imputations to the RNN. If we keep your current framework, then to include the imputations in the history, the batch keeps changing and we need to call get_features(batch) and get_context(features) every time we have a new imputed event. This makes training very slow. Could you provide your implementation of this part, or give me some suggestions on implementing this 'training while imputing'? Thanks!

Learning with Marks

Thanks for the release of your paper and code.
In trying to implement learning with marks with the provided interactive notebook, adapting the remarks in the paper, I'm also running into some trouble. Based on appendix F.2, I assume it's a case of just adding the terms?

model.log_prob in this case returns (time_log_prob, mark_nll, accuracy) - so, adapting the training loop, is it as simple as changing the lines below?

        log_prob = model.log_prob(input)
        loss = -model.aggregate(log_prob, input.length)

for

        if use_marks:
            log_prob, mark_nll, mark_acc = model.log_prob(input)
        else:
            log_prob = model.log_prob(input)
            mark_nll = 0.0  # no mark term when marks are not used
        loss = -model.aggregate(log_prob + mark_nll, input.length)

As a side problem - when doing the above with my custom dataset (which conforms to the same formatting as the example datasets, so arrival_times and marks), all loss terms are NaN. I'm wondering if you might have some insight as to why this might be occurring! When using the reddit dataset with the above modifications, I get non-zero loss terms for both log_prob and mark_nll.

Code for using context vector in the models

@shchur - thank you so much for sharing the code, it is very readable and follows the paper closely. I was hoping you could give me pointers on how to incorporate the context vector $y_i$ in the models. This is described for the Yelp dataset in sec. F.3 in the paper but I was not able to find it in your codebase.
Thanks again for your help!

on log likelihood misunderstanding

In the log-likelihood there are two terms:
\log\lambda(\tau) and -\int_0^\tau \lambda(s) ds.
However, the first term in your code is computed directly using the inter_time_dist.log_prob() function, which is already the log of the PDF, i.e. the sum of the two terms. Then the second term is computed again by log_survival_function. I want to make sure whether I have misunderstood, or whether this is an error.
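
For reference, the standard point-process identities relating the intensity, the inter-event time density, and the survival function are (textbook relations, not a claim about what the dpp code computes):

    \log f(\tau) = \log\lambda(\tau) - \int_0^\tau \lambda(s)\,ds,
    \qquad
    \log S(\tau) = -\int_0^\tau \lambda(s)\,ds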

Missing data imputation

Hi, this is a very inspiring work, and I'm quite interested in section 5.4 (Missing data imputation). Could you please provide a pointer to the code implementation? Thanks very much!

use other dataset

Hi, I am trying to use the Wikipedia data from the original code, but the type of the data is different. Please tell me what I should do. I am new to this area; I am sorry if the question is naive.

all evaluation experiments code

Would you mind releasing the code for the other evaluation experiments, including learning with marks, mark prediction, and event time prediction using history?

Sampling with additional conditional information

Hi, Shchur! Thank you so much for sharing your code. It is pretty helpful for my current research, but I encountered a problem with sampling.

I trained an ifl-tpp model with additional conditional information (e.g., Yelp dataset). However, during the sampling process, there is no additional information for the model. So, I don't know how to get the final context embedding with additional conditional information during sampling.

Do you have some suggestions for this problem? Thanks!

Sampling points of a specific mark

Hi, thanks for the wonderful code. I am able to use it easily and reproduce the results.

I wanted to sample points of a specific mark within a given horizon. I would appreciate any pointers on how I could do this with your code.

ATM dataset testing

Hi,

I'm new to TPP.

I was trying to run your interactive IPython notebook with the ATM dataset as a CSV file. I see that all the datasets you used are .npz files.
Do I have to preprocess the ATM CSV file into .npz? How do you set the sequence length, given that the ATM dataset I have contains only 4 months of data?

Could you please clarify?

Regards,
kiahsa

history

I want to know whether your method takes into account the influence of historical events. Thank you.
