
patrick-kidger / torchcde


Differentiable controlled differential equation solvers for PyTorch with GPU support and memory-efficient adjoint backpropagation.

License: Apache License 2.0

Python 100.00%
time-series machine-learning neural-differential-equations controlled-differential-equations deep-learning deep-neural-networks pytorch dynamical-systems differential-equations neural-networks

torchcde's Introduction

torchcde

Differentiable GPU-capable solvers for CDEs

Update: for any new projects, I would now recommend using Diffrax instead. It is much faster, and production-quality. torchcde was its prototype as a research project!

This library provides differentiable GPU-capable solvers for controlled differential equations (CDEs). Backpropagation through the solver or via the adjoint method is supported; the latter allows for improved memory efficiency.

In particular this allows for building Neural Controlled Differential Equation models, which are state-of-the-art models for (arbitrarily irregular!) time series. Neural CDEs can be thought of as a "continuous time RNN".


Installation

pip install torchcde

Requires PyTorch >=1.7.

Example

import torch
import torchcde

# Create some data
batch, length, input_channels = 1, 10, 2
hidden_channels = 3
t = torch.linspace(0, 1, length)
t_ = t.unsqueeze(0).unsqueeze(-1).expand(batch, length, 1)
x_ = torch.rand(batch, length, input_channels - 1)
x = torch.cat([t_, x_], dim=2)  # include time as a channel

# Interpolate it
coeffs = torchcde.hermite_cubic_coefficients_with_backward_differences(x)
X = torchcde.CubicSpline(coeffs)

# Create the Neural CDE system
class F(torch.nn.Module):
    def __init__(self):
        super(F, self).__init__()
        self.linear = torch.nn.Linear(hidden_channels,
                                      hidden_channels * input_channels)

    def forward(self, t, z):
        return self.linear(z).view(batch, hidden_channels, input_channels)

func = F()
z0 = torch.rand(batch, hidden_channels)

# Integrate it
zt = torchcde.cdeint(X=X, func=func, z0=z0, t=X.interval)  # shape (batch, len(X.interval), hidden_channels)

See time_series_classification.py, which demonstrates how to use the library to train a Neural CDE model to predict the chirality of a spiral.

Also see irregular_data.py, for demonstrations on how to handle variable-length inputs, irregular sampling, or missing data, all of which can be handled easily, without changing the model.

Citation

If you found this library useful, please consider citing

@article{kidger2020neuralcde,
    title={{N}eural {C}ontrolled {D}ifferential {E}quations for {I}rregular {T}ime {S}eries},
    author={Kidger, Patrick and Morrill, James and Foster, James and Lyons, Terry},
    journal={Advances in Neural Information Processing Systems},
    year={2020}
}

Documentation

The library consists of two main components: (1) integrators for solving controlled differential equations, and (2) ways of constructing controls from data.

Integrators

The library provides the cdeint function, which solves the system of controlled differential equations:

dz(t) = f(t, z(t))dX(t)     z(t_0) = z0

The goal is to find the response z driven by the control X. This can be re-written as the following differential equation:

dz/dt(t) = f(t, z)dX/dt(t)     z(t_0) = z0

where the right hand side describes a matrix-vector product between f(t, z) and dX/dt(t).
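To make the matrix-vector product concrete, here is a minimal, dependency-free sketch of the simplest possible solver: an explicit Euler scheme that replaces dX/dt(t) dt by the increment X(t_{k+1}) - X(t_k). It is plain Python for illustration only, not how torchcde solves the system (it delegates to torchdiffeq or torchsde); `euler_cde`, `matvec`, `f` and `X` are hypothetical names.

```python
import math

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def euler_cde(f, X, z0, ts):
    """Explicit Euler for dz(t) = f(t, z(t)) dX(t).

    f(t, z) returns a hidden_channels x input_channels matrix,
    X(t) returns the control at time t (length input_channels),
    ts is an increasing list of times. Returns z at every time in ts.
    """
    z = list(z0)
    zs = [z]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        # Increment of the control over the step, i.e. roughly dX/dt(t0) * (t1 - t0).
        dX = [b - a for a, b in zip(X(t0), X(t1))]
        # Matrix-vector product f(t, z) dX gives the increment of the hidden state.
        z = [zi + dzi for zi, dzi in zip(z, matvec(f(t0, z), dX))]
        zs.append(z)
    return zs

# hidden_channels = 1, input_channels = 2, control X(t) = (t, sin t),
# so dz = -z dt + d(sin t), i.e. dz/dt = -z + cos t.
X = lambda t: [t, math.sin(t)]
f = lambda t, z: [[-z[0], 1.0]]
ts = [i / 1000 for i in range(1001)]
zs = euler_cde(f, X, [0.0], ts)
print(zs[-1])  # close to (cos 1 + sin 1 - exp(-1)) / 2
```

With a constant identity matrix for f, the scheme reduces to z(t) - z(t_0) = X(t) - X(t_0), which is a handy sanity check.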

This is solved by

cdeint(X, func, z0, t, adjoint, backend, **kwargs)

where letting ... denote an arbitrary number of batch dimensions:

  • X is a torch.nn.Module with method derivative, such that X.derivative(t) is a Tensor of shape (..., input_channels),
  • func is a torch.nn.Module, such that func(t, z) returns a Tensor of shape (..., hidden_channels, input_channels),
  • z0 is a Tensor of shape (..., hidden_channels),
  • t is a one-dimensional Tensor of times to output z at.
  • adjoint is a boolean (defaulting to True).
  • backend is a string (defaulting to "torchdiffeq").
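The shape contract in the list above can be summarised as a small checker. cdeint returns z at each of the times in t, as a Tensor of shape (..., len(t), hidden_channels). The helper `cdeint_output_shape` below is hypothetical and only manipulates shape tuples; it is simplified in that it requires the batch dimensions to match exactly, whereas real tensors may broadcast.

```python
def cdeint_output_shape(z0_shape, t_len, dXdt_shape, func_out_shape):
    """Check the cdeint shape contract and return the solution's shape,
    (..., len(t), hidden_channels). Simplified: no broadcasting."""
    *batch, hidden_channels = z0_shape
    *x_batch, input_channels = dXdt_shape
    expected_func = (*batch, hidden_channels, input_channels)
    if tuple(x_batch) != tuple(batch):
        raise ValueError(f"X.derivative(t) batch dims {tuple(x_batch)} != z0 batch dims {tuple(batch)}")
    if tuple(func_out_shape) != expected_func:
        raise ValueError(f"func(t, z) should have shape {expected_func}, got {tuple(func_out_shape)}")
    return (*batch, t_len, hidden_channels)

# The earlier example: batch=1, hidden_channels=3, input_channels=2, t=X.interval (2 times).
print(cdeint_output_shape((1, 3), 2, (1, 2), (1, 3, 2)))  # (1, 2, 3)
```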

Adjoint backpropagation (which is slower but more memory efficient) can be toggled with adjoint=True/False.

The backend should be either "torchdiffeq" or "torchsde", corresponding to which underlying library to use for the solvers. If using torchsde then the stochastic term is zero -- so the CDE is still reduced to an ODE. This is useful if one library supports a feature that the other doesn't. (For example torchsde supports a reversible solver, the reversible Heun method; at time of writing torchdiffeq does not support any reversible solvers.)

Any additional **kwargs are passed on to torchdiffeq.odeint[_adjoint] or torchsde.sdeint[_adjoint], for example to specify the solver.

Constructing controls

A very common scenario is to construct the continuous control X from discrete data (which may be irregularly sampled with missing values). To support this, we provide three main interpolation schemes:

  • Hermite cubic splines with backwards differences
  • Linear interpolation
  • Rectilinear interpolation

Note that if for some reason you already have a continuous control X then you won't need an interpolation scheme at all!

Hermite cubic splines are usually the best choice, if possible. Linear and rectilinear interpolations are particularly useful in causal settings -- when at inference time the data is arriving over time. We go into further detail in the Further documentation section below.

Just demonstrating Hermite cubic splines for now:

import torchcde

coeffs = torchcde.hermite_cubic_coefficients_with_backward_differences(x)

# coeffs is a torch.Tensor you can save, load,
# pass through Datasets and DataLoaders etc.

X = torchcde.CubicSpline(coeffs)

where:

  • x is a Tensor of shape (..., length, input_channels), where ... is some number of batch dimensions. Missing data should be represented as a NaN.

The interface provided by CubicSpline is:

  • .interval, which gives the time interval the spline is defined over. (Often used as the t argument in cdeint.) This is determined implicitly from the length of the data, and so does not in general correspond to the time your data was actually observed at. (See the Further Documentation note on reparameterisation invariance.)
  • .grid_points is all of the knots in the spline, so that for example X.evaluate(X.grid_points) will recover the original data.
  • .evaluate(t), where t is an any-dimensional Tensor, to evaluate the spline at any (collection of) time(s).
  • .derivative(t), where t is an any-dimensional Tensor, to evaluate the derivative of the spline at any (collection of) time(s).
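For intuition about what a "Hermite cubic spline with backward differences" is, here is a one-channel sketch in plain Python. It assumes knots at times 0, 1, ..., n-1 (so the interval is [0, n-1]) and sets the derivative at each knot to the backward difference x_i - x_{i-1}, reusing the first difference at the first knot; these conventions and the names `hermite_backward_coeffs`, `evaluate` and `derivative` are illustrative assumptions, and torchcde's actual coefficient layout differs. As with X.evaluate(X.grid_points), evaluating at the knots recovers the data.

```python
def hermite_backward_coeffs(xs):
    """Per-interval (x_i, m_i, x_{i+1}, m_{i+1}) for 1-D data observed at
    times 0, 1, ..., n-1, with m_i the backward difference x_i - x_{i-1}."""
    m = [xs[1] - xs[0]] + [xs[i] - xs[i - 1] for i in range(1, len(xs))]
    return [(xs[i], m[i], xs[i + 1], m[i + 1]) for i in range(len(xs) - 1)]

def _locate(coeffs, t):
    """Index of the interval containing t, and the local coordinate s in [0, 1]."""
    i = min(int(t), len(coeffs) - 1)
    return i, t - i

def evaluate(coeffs, t):
    i, s = _locate(coeffs, t)
    x0, m0, x1, m1 = coeffs[i]
    # Cubic Hermite basis functions on the unit interval.
    return ((2 * s**3 - 3 * s**2 + 1) * x0 + (s**3 - 2 * s**2 + s) * m0
            + (-2 * s**3 + 3 * s**2) * x1 + (s**3 - s**2) * m1)

def derivative(coeffs, t):
    i, s = _locate(coeffs, t)
    x0, m0, x1, m1 = coeffs[i]
    return ((6 * s**2 - 6 * s) * x0 + (3 * s**2 - 4 * s + 1) * m0
            + (-6 * s**2 + 6 * s) * x1 + (3 * s**2 - 2 * s) * m1)

xs = [0.0, 1.0, 4.0, 9.0, 16.0]
coeffs = hermite_backward_coeffs(xs)
interval = (0.0, float(len(xs) - 1))               # analogue of X.interval
grid_points = [float(i) for i in range(len(xs))]   # analogue of X.grid_points
print([evaluate(coeffs, t) for t in grid_points])  # [0.0, 1.0, 4.0, 9.0, 16.0]
```

Each interval [i, i+1] only uses data up to x_{i+1}, which is why this scheme is "kind-of" causal, and the derivative is continuous across knots, so there are no kinks.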

Usually hermite_cubic_coefficients_with_backward_differences should be computed as a preprocessing step, whilst CubicSpline should be called inside the forward pass of your model. See time_series_classification.py for a worked example.

Then call:

cdeint(X=X, func=..., z0=..., t=X.interval)

Further documentation

The earlier documentation section should give everything you need to get up and running.

Here we discuss a few more advanced bits of functionality:

  • The reparameterisation invariance property of CDEs.
  • Other interpolation methods, and the differences between them.
  • The use of fixed solvers. (They just work.)
  • Stacking CDEs (i.e. controlling one by the output of another).
  • Computing logsignatures for the log-ODE method.

Reparameterisation invariance

This is a classical fact about CDEs.

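Let $\psi \colon [a, b] \to [c, d]$ be differentiable and increasing, with $\psi(a) = c$ and $\psi(b) = d$. Let $t \in [c, d]$, let $s = \psi^{-1}(t)$, let $\tilde{z} = z \circ \psi$, and let $\tilde{X} = X \circ \psi$. Then substituting $t = \psi(s)$ into a CDE whose vector field $f$ does not depend explicitly on time (and just using the standard change of variables formula):

$$\tilde{z}(s) = z(\psi(s)) = z(c) + \int_c^{\psi(s)} f(z(\tau))\,\mathrm{d}X(\tau) = \tilde{z}(a) + \int_a^{s} f(z(\psi(\sigma)))\,\mathrm{d}X(\psi(\sigma)) = \tilde{z}(a) + \int_a^{s} f(\tilde{z}(\sigma))\,\mathrm{d}\tilde{X}(\sigma)$$

We see that $\tilde{z}$ also satisfies the neural CDE equation, just with $\tilde{X}$ as input instead of $X$. In other words, using $\psi$ changes the speed at which we traverse the input $X$, and correspondingly changes the speed at which we traverse the output $z$ -- and that's it! In particular the CDE itself doesn't need any adjusting.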

This ends up being a really useful fact for writing neater software. We can handle things like messy data (e.g. variable length time series) just during data preprocessing, without it complicating the model code. In time_series_classification.py, X.interval is used as a standardised region to integrate over. In irregular_data.py, we use this to handle variable-length data.
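A quick numerical check of the reparameterisation invariance, in plain Python: the same explicit Euler scheme is run once on X over [0, 1] and once on X̃ = X ∘ ψ with the arbitrary choice ψ(s) = s², on the same uniform grid in s. The vector field is autonomous, as assumed in the derivation; `euler_cde_scalar`, `psi` and `X_tilde` are hypothetical names for this sketch. Both terminal values agree up to discretisation error.

```python
import math

def euler_cde_scalar(f, X, grid):
    """Euler for a scalar-state CDE dz = f(z) dX with a 2-channel control X,
    starting from z = 0. f(z) returns a length-2 row vector."""
    z = 0.0
    for s0, s1 in zip(grid[:-1], grid[1:]):
        dX = [b - a for a, b in zip(X(s0), X(s1))]
        z += sum(fi * dxi for fi, dxi in zip(f(z), dX))
    return z

f = lambda z: [-z, 1.0]             # autonomous vector field (no explicit time)
X = lambda t: [t, math.sin(t)]      # original control on [c, d] = [0, 1]
psi = lambda s: s ** 2              # increasing, psi(0) = 0, psi(1) = 1
X_tilde = lambda s: X(psi(s))       # reparameterised control on [a, b] = [0, 1]

grid = [k / 20000 for k in range(20001)]
z_original = euler_cde_scalar(f, X, grid)
z_reparam = euler_cde_scalar(f, X_tilde, grid)
print(z_original, z_reparam)        # agree to roughly 1e-4
```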

Different interpolation methods

For a full breakdown of the interpolation schemes, see Neural Controlled Differential Equations for Online Prediction Tasks, where each interpolation scheme is scrutinised and best practices are presented.

In brief:

  • Will your data: (a) be arriving in an online fashion at inference time; and (b) be multivariate; and (c) potentially have missing values?
    • Yes: rectilinear interpolation.
    • No: Are you using an adaptive step size solver (e.g. the default dopri5)?
      • Yes: Hermite cubic splines with backwards differences.
      • No: linear interpolation.
      • Not sure / both: Hermite cubic splines with backwards differences.
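The rule of thumb above can be written down as a small function. `choose_interpolation` and its string return values are hypothetical labels, not part of torchcde; the comments name the torchcde calls this README pairs with each choice.

```python
def choose_interpolation(online, multivariate, missing_values, adaptive_solver):
    """Return the interpolation scheme suggested by the rule of thumb.

    adaptive_solver is True, False, or None for "not sure / both".
    """
    if online and multivariate and missing_values:
        # linear_interpolation_coeffs(x, rectilinear=<time channel>) + LinearInterpolation
        return "rectilinear"
    if adaptive_solver is False:
        # linear_interpolation_coeffs(x) + LinearInterpolation
        return "linear"
    # hermite_cubic_coefficients_with_backward_differences(x) + CubicSpline
    return "hermite_cubic_backward"

print(choose_interpolation(online=False, multivariate=True, missing_values=True, adaptive_solver=True))
# hermite_cubic_backward
```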

In more detail:

  • Linear interpolation: these are "kind-of" causal.

During inference we can simply wait at each time point for the next data point to arrive, and then interpolate towards the next data point when it arrives, and solve the CDE over that interval.

If there is missing data, however, then this isn't possible. (As some of the channels might not have observations you can interpolate to.) In this case use rectilinear interpolation, below.

Example:

coeffs = torchcde.linear_interpolation_coeffs(x)
X = torchcde.LinearInterpolation(coeffs)
cdeint(X=X, ...)

Linear interpolation has kinks. If using adaptive step size solvers then it should be told about the kinks. (Rather than expensively finding them for itself -- slowing down to resolve the kink, and then speeding up again afterwards.) This is done with the jump_t option when using the torchdiffeq backend:

cdeint(...,
       backend='torchdiffeq',
       method='dopri5',
       options=dict(jump_t=X.grid_points))

Although adaptive step size solvers will probably find it easier to resolve Hermite cubic splines with backward differences, below.
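Why linear interpolation has kinks can be seen in a few lines of plain Python: the derivative of the interpolant, which is what the solver consumes through dX/dt, is piecewise constant and jumps at every interior grid point. Those jump locations are what options=dict(jump_t=X.grid_points) passes to an adaptive solver. `linear_interp` is a hypothetical helper for illustration, not torchcde's implementation.

```python
def linear_interp(ts, xs):
    """Return (evaluate, derivative) for the piecewise-linear interpolant of xs at times ts."""
    def locate(t):
        # Index i of the interval [ts[i], ts[i + 1]] containing t.
        for i in range(len(ts) - 2):
            if t < ts[i + 1]:
                return i
        return len(ts) - 2

    def evaluate(t):
        i = locate(t)
        w = (t - ts[i]) / (ts[i + 1] - ts[i])
        return xs[i] + w * (xs[i + 1] - xs[i])

    def derivative(t):
        i = locate(t)
        return (xs[i + 1] - xs[i]) / (ts[i + 1] - ts[i])

    return evaluate, derivative

ts, xs = [0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 1.0]
evaluate, derivative = linear_interp(ts, xs)
# The derivative jumps across the grid point t = 1.0: this is a kink.
print(derivative(0.999), derivative(1.001))  # 2.0 -1.0
```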

  • Hermite cubic splines with backwards differences: these are "kind-of" causal in the same way as linear interpolation, but don't have kinks, which makes them faster with adaptive step size solvers. (But they are simply an unnecessary overhead when working with fixed step size solvers, which is why we recommend linear interpolation if you know you're only going to be using fixed step size solvers.)

Example:

coeffs = torchcde.hermite_cubic_coefficients_with_backward_differences(x)
X = torchcde.CubicSpline(coeffs)
cdeint(X=X, ...)

  • Rectilinear interpolation: This is appropriate if there is multivariate missing data, and you need causality.

What is done is to linearly interpolate forward in time (keeping the observations constant), and then linearly interpolate the values (keeping the time constant). This is possible because time is a channel (and doesn't need to line up with the "time" used in the differential equation solver, as per the reparameterisation invariance of the previous section).

This can be a bit unintuitive at first. We suggest firing up matplotlib and plotting things to get a feel for what's going on. As a fun sidenote, using rectilinear interpolation makes neural CDEs generalise ODE-RNNs.

Example:

import torch
import torchcde

# standard setup for a neural CDE: include time as a channel
t = torch.linspace(0, 1, 10)
x = torch.rand(2, 10, 3)
t_ = t.unsqueeze(0).unsqueeze(-1).expand(2, 10, 1)
x = torch.cat([t_, x], dim=-1)
del t, t_  # won't need these again!
# The `rectilinear` argument is the channel index corresponding to time
coeffs = torchcde.linear_interpolation_coeffs(x, rectilinear=0)
X = torchcde.LinearInterpolation(coeffs)
torchcde.cdeint(X=X, ...)

As before, if using an adaptive step size solver, it should be informed about the kinks.

cdeint(...,
       backend='torchdiffeq',
       method='dopri5',
       options=dict(jump_t=X.grid_points))
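The rectilinear construction described above can be sketched in plain Python. `rectilinear_knots` is a hypothetical helper that only builds the knot sequence; it is not torchcde's code, which works on tensors. Each observation after the first contributes two knots, so n observations give 2n - 1 knots, and linearly interpolating between consecutive knots gives the rectilinear path.

```python
def rectilinear_knots(times, values):
    """Knots of a rectilinear path through observations (times[k], values[k]).

    Time is channel 0 of every knot. Between consecutive observations we first
    move forward in time with the observations held constant, then update the
    observations with time held constant.
    """
    knots = [(times[0], values[0])]
    for k in range(1, len(times)):
        knots.append((times[k], values[k - 1]))  # advance time, keep old observation
        knots.append((times[k], values[k]))      # keep time, jump to new observation
    return knots

print(rectilinear_knots([0.0, 0.5, 1.0], [1.0, 3.0, 2.0]))
# [(0.0, 1.0), (0.5, 1.0), (0.5, 3.0), (1.0, 3.0), (1.0, 2.0)]
```

The value observed at times[k] only enters the path once time has reached times[k], which is what makes the scheme causal.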

Fixed solvers

Solving CDEs (regardless of the choice of interpolation scheme in a Neural CDE) with fixed solvers like euler, midpoint, rk4 etc. is pretty much exactly the same as solving an ODE with a fixed solver. Just make sure to set the step_size option to something sensible; for example the smallest gap between times:

X = torchcde.LinearInterpolation(coeffs)
# smallest gap between consecutive grid points
step_size = (X.grid_points[1:] - X.grid_points[:-1]).min()
torchcde.cdeint(
    X=X, t=X.interval, func=..., z0=..., method='rk4',
    options=dict(step_size=step_size)
)

Stacking CDEs

You may wish to use the output of one CDE to control another. That is, to solve the coupled CDEs:

du(t) = g(t, u(t))dz(t)     u(t_0) = u0
dz(t) = f(t, z(t))dX(t)     z(t_0) = z0

There are two ways to do this. The first way is to put everything inside a single cdeint call, by solving the system

v = [u, z]
v0 = [u0, z0]
h(t, v) = [g(t, u)f(t, z), f(t, z)]

dv(t) = h(t, v(t))dX(t)      v(t_0) = v0

and using cdeint as normal. This is usually the best way to do it! It's simpler and usually faster. (But forces you to use the same solver for the whole system, for example.)
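The combined vector field h of the first approach can be sketched with plain Python lists. Here f(t, z) returns a hidden_z × input_channels matrix and g(t, u) returns a u_dim × hidden_z matrix; `make_h`, `matmul` and `matvec` are hypothetical helpers. The check at the end shows that one increment dv = h(t, v) dX moves u by g(t, u) dz, where dz = f(t, z) dX, which is exactly the first equation of the coupled system.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def make_h(f, g, u_dim):
    """Combined vector field h(t, v) = [g(t, u) f(t, z), f(t, z)] for v = [u, z]."""
    def h(t, v):
        u, z = v[:u_dim], v[u_dim:]
        fz = f(t, z)
        return matmul(g(t, u), fz) + fz  # stack the two row blocks
    return h

f = lambda t, z: [[1.0, z[0]], [0.0, 2.0]]  # hidden_z = 2, input_channels = 2
g = lambda t, u: [[3.0, -u[0]]]             # u_dim = 1, driven by z
h = make_h(f, g, u_dim=1)

v, dX = [0.5, 1.0, 2.0], [0.1, 0.2]
dv = matvec(h(0.0, v), dX)
dz = matvec(f(0.0, v[1:]), dX)
du = matvec(g(0.0, v[:1]), dz)
print(dv, du + dz)  # equal up to floating-point rounding
```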

The second way is to have cdeint output z(t) at multiple times t, interpolate the discrete output into a continuous path, and then call cdeint again. This is probably less memory efficient, but allows for different choices of solver for each call to cdeint.

For example, this could be used to create multi-layer Neural CDEs, just like multi-layer RNNs. Although as of writing this, no-one seems to have tried this yet!

The log-ODE method

This is a way of reducing the length of data by using extra channels. (For example, this may help train Neural CDE models faster, as the extra channels can be parallelised, but extra length cannot.)

This is done by splitting the control X up into windows, and computing the logsignature of the control over each window. The logsignature is a transform known to extract the information that is most important to describing how X controls a CDE.

This is supported by the logsig_windows function, which takes in data, and produces a transformed path, that now exists in logsignature space:

import torch
import torchcde

batch, length, channels = 1, 100, 2
x = torch.rand(batch, length, channels)
depth, window = 3, 10.0
x = torchcde.logsig_windows(x, depth, window)
# use x as you would normally: interpolate, etc.

See the paper Neural Rough Differential Equations for Long Time Series for more information. See logsignature_example.py for a worked example.

Note that this requires installing the Signatory package.
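To give a feel for what is computed per window, here is a plain-Python sketch of the simplest non-trivial case: a 2-channel piecewise-linear path at depth 2, where the logsignature over a window is the total increment plus the Lévy area. `logsig_depth2` and `logsig_windows_sketch` are hypothetical helpers, not the Signatory or torchcde implementations; here `window` counts path segments (an assumption), and no normalisation by window length is applied.

```python
def logsig_depth2(path):
    """Depth-2 logsignature of a piecewise-linear 2-D path (list of (x, y)):
    the total increment in each channel, plus the Levy area."""
    x0, y0 = path[0]
    area = 0.0
    for (xa, ya), (xb, yb) in zip(path[:-1], path[1:]):
        dx, dy = xb - xa, yb - ya
        # On a linear segment, 0.5 * integral of (X1 - X1_0) dX2 - (X2 - X2_0) dX1
        # reduces to 0.5 * ((xa - x0) * dy - (ya - y0) * dx).
        area += 0.5 * ((xa - x0) * dy - (ya - y0) * dx)
    xn, yn = path[-1]
    return (xn - x0, yn - y0, area)

def logsig_windows_sketch(path, window):
    """One depth-2 logsignature per window of `window` segments
    (consecutive windows share their end points)."""
    return [logsig_depth2(path[i:i + window + 1]) for i in range(0, len(path) - 1, window)]

# A counter-clockwise unit square: no net increment, Levy area 1.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(logsig_depth2(square))  # (0.0, 0.0, 1.0)
```

A path of 9 points with window=4 becomes 2 logsignature "points" of 3 channels each: shorter, but wider, as described above.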

torchcde's People

Contributors

jambo6, jeongwhanchoi, patrick-kidger, zymrael


torchcde's Issues

Online prediction tasks need examples

Hi Patrick. First of all thank you all for creating such a beautiful library for the neural CDE model!

I have read both of your papers, on neural CDEs and on their extension to online prediction. As far as I understand, rectilinear interpolation allows for "causal" interpolation without requiring future values. However, since constant interpolation is used for every channel in X, the change in the control signal over time is always going to be 0, except at the knot points, where the gradient is possibly undefined due to a discrete jump. This means that the change in the hidden state over time is also going to be 0, since $\frac{dz}{dt}(t) = f(t, z(t))\frac{dX}{dt}(t)$. Please correct me if I'm wrong. Also, from looking at the published repositories, it is still not clear to me how I could use the online prediction extension of the neural CDE algorithm.

To my understanding, the entire channel dimension could be thought of as a concatenation of measurements from multiple different sensors, and each channel (except for time and observational density) as the measurement of a single sensor. So it would be much clearer, and much appreciated, if you could provide a minimal code example of how to do online predictions as data arrives for a given channel, because I could not put the pieces together myself.

Cheers,
Deniz

Much slower when using torchsde as the backend

Hello, I'm trying to use a neural CDE with some noise. I found that torchsde is one choice of backend, but after solving the 2-dim-limits problems, it becomes much slower than just using torchdiffeq. Am I using it the wrong way, or is this a problem with neural SDEs?
Or is there a better way to incorporate noise?
Thanks.

TupleControl + computed parameters

TupleControl does not propagate computed parameters.

Perhaps just remove computed parameters and require that people pass them in via adjoint_params.

Prediction of irregular time series

Hi Patrick! Congratulations on your research work on neural differential equations. It's quite impressive, and thank you for the torchcde and Diffrax libraries.

I've been experimenting with the torchcde module for some time now. I've read the repository and related papers: https://arxiv.org/abs/2005.08926, https://arxiv.org/abs/2106.11028. Currently, I'm working on a time series prediction problem using neural CDEs. I will migrate it to Diffrax, but I have a question, and I think your experience can help me to address it.

In a nutshell, I'm predicting a substance concentration in blood, denoted as $Y$, from different patients. This concentration is irregularly sampled within and across patients. For instance, patient A has eleven measurements in ~50 minutes, patient B has only two measurements in ~4 hours, while patient C has five measurements in ~2 hours. The goal is to estimate $Y$ based on a set of medical signals $X$ (e.g., heart rate, $O_2$ level) that are almost uniformly sampled for all patients (handling $X$ is not an issue). My objective is to predict $Y$ at time $t_n$ , considering all historical information [ $X$ and $Y$ (at least an initial condition of $Y$)] from $t_0$ to $t_{n-1}$ for each patient. This means I would like to have ten predictions of $Y$ for patient A, one prediction for patient B, and so on. This setup is quite different from the examples I have seen, as neuralCDEs are mainly used for classification or static regression tasks (such as the BeijingPM10 or the LOS examples in https://arxiv.org/abs/2106.11028).

I've tried several strategies:

  • Training a neuralCDE for each patient and validating it using a rolling window strategy. However, the out-of-sample predictions seem to be quite similar to the last value of $Y$ observed, indicating possible overfitting. Moreover, the coeffs obtained from interpolation change their size across windows, which raises concerns about the approach's validation and effectiveness.

  • Instead of using the window strategy, I considered replacing t=X.interval with t=X.grid_points in the torchcde.cdeint(...) function (assuming that the hidden channel, dim=1, directly represents $Y$). This change would allow me to obtain an estimated array $\hat{Y}$ for all time steps considered, but the true value of $Y$ is recorded only at specific steps. Not sure about how to compute the loss function in this case.

  • Another approach I considered is splitting $X$ and $Y$ into one-step $Y$-related measurements for all patients. For example, if $Y$ is available for patient A at $t_{a1}$, $t_{a2}$, $t_{a3}$ ..., I would divide $X$, $Y$ for patient A into batches $[t_{0}, t_{a1-1}], [t_{a1}, t_{a2-1}]$, and so on. I would apply a similar strategy for patient B, and then group all batches from all patients to follow the irregular data strategy as commented in irregular_data.py. This approach would allow me to perform train/validation/test splits, ensuring that all sets have the same coeffs length and making testing more manageable. However, I'm concerned that with this strategy I'm losing information as predicting $Y$ at $t_{a2}$ would mean missing records before $t_{a1}$ that may be useful.

As you can see, it's a question about preprocessing and train/test strategy, but given the way neural CDEs take their input data, it might be worth thinking over carefully. Any comments would be greatly appreciated. Thank you very much!

Very slow training with market data, normal?

I adapted your time_series_classification example for market data prediction. It seems to be working, but training is exceptionally slow on a P100 GPU, which normally finishes similar tasks in 30 minutes. After 4 hours it had completed only the first 2 epochs. Is this normal with CDEs, or did I do something wrong? Training loss is also diverging, but that might be due to the learning rate; I haven't checked that yet.
Here is the dataprep function I added, as well as some minor adaptations to the model.

The complete code with corresponding data CSV: time_series_prediction example

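import pandas as pd
import torch
import torchcde
# normalize, create_sequences, arr_to_tensor and NeuralCDE are defined in the complete code linked above.

def get_data():
    btc_df = pd.read_csv('example/btc_data.csv', parse_dates=['open_time'])
    btc_df_n_t = normalize(btc_df)

    # Split training/testing
    train_size = int(len(btc_df_n_t) * .8)
    train_df, test_df = btc_df_n_t[:train_size], btc_df_n_t[train_size:]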
    
    # Create sequences
    SEQUENCE_LENGTH = 120
    train_X, train_y = create_sequences(train_df, 'close', SEQUENCE_LENGTH)
    test_X, test_y = create_sequences(test_df, 'close', SEQUENCE_LENGTH)
    
    # Create tensor arrays
    train_X, train_y  = arr_to_tensor(train_X), arr_to_tensor(train_y)
    test_X, test_y  = arr_to_tensor(test_X), arr_to_tensor(test_y)

    return train_X, train_y, test_X, test_y

def main(num_epochs=30):
    train_X, train_y, test_X, test_y = get_data()

    ######################
    # input_channels=8 must equal the number of channels (last dimension) of train_X.
    # hidden_channels=8 is the number of hidden channels for the evolving z_t, which we get to choose.
    # output_channels=1 because the model produces a single output per sequence.
    ######################
    model = NeuralCDE(input_channels=8, hidden_channels=8, output_channels=1)
    optimizer = torch.optim.Adam(model.parameters())

    ######################
    # Now we turn our dataset into a continuous path. We do this here via Hermite cubic spline interpolation.
    # The resulting `train_coeffs` is a tensor describing the path.
    # For most problems, it's probably easiest to save this tensor and treat it as the dataset.
    ######################
    train_coeffs = torchcde.hermite_cubic_coefficients_with_backward_differences(train_X)

    train_dataset = torch.utils.data.TensorDataset(train_coeffs, train_y)
    train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=32)
    for epoch in range(num_epochs):
        for batch in train_dataloader:
            batch_coeffs, batch_y = batch
            pred_y = model(batch_coeffs).squeeze(-1)
            loss = torch.nn.functional.binary_cross_entropy_with_logits(pred_y.unsqueeze(1), batch_y)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        print('Epoch: {}   Training loss: {}'.format(epoch, loss.item()))

    test_coeffs = torchcde.hermite_cubic_coefficients_with_backward_differences(test_X)
    pred_y = model(test_coeffs).squeeze(-1)
    
    # TODO: Modify evaluation for non-binary prediction
    binary_prediction = (torch.sigmoid(pred_y) > 0.5).to(test_y.dtype)
    prediction_matches = (binary_prediction == test_y).to(test_y.dtype)
    proportion_correct = prediction_matches.sum() / test_y.size(0)
    print('Test Accuracy: {}'.format(proportion_correct))

Integration into a PyTorch Lightning pipeline

I'm trying to integrate torchcde into a BTC price prediction pipeline using PyTorch Lightning, but I'm not able to figure out how to do it based on the examples provided.
The goal is to predict whether the price will go up or down in the next interval, and by how much. So the target variable is close.

Scaled Data Example

minute     hour       day_of_week  vol        open       high       low        close_change  close
-1.000000  -0.391304  -0.666667    -0.992874  -0.635116  -0.657622  -0.621309  0.035182      -0.654171
-0.818182  -0.391304  -0.666667    -0.993615  -0.647342  -0.663678  -0.625783  0.014025      -0.658862
-0.636364  -0.391304  -0.666667    -0.997344  -0.652328  -0.668183  -0.625867  0.035498      -0.658792

Feel free to have a look at the complete pipeline.
Any guidance on how to put things together would be highly appreciated!

Breaking install

'torchsde @ git+https://github.com/google-research/torchsde.git>=0.2.4']

This line is breaking installation for me, whether or not I have the latest torchsde available.

Sequence outputs from Neural ODE (similar to 'many to many' RNN)?

Hi, Patrick!
I'm currently training on irregularly sampled data, and previously I used a many-to-many RNN for modeling.
When time-series data is sampled at times t1, t2, t3, and t4, my intended model will predict matched outcomes y1, y2, y3, and y4.

And I want my model not to know the future sequence data.
For example, when predicting at the time of t2, the model should not know the information of t3 and t4 and it should yield the same result even though the information of t3 and t4 will change.

My previous RNN code is below:
L_LSTM = nn.LSTM(n_hidden, n_hidden, batch_first=True)
sequences_l, _ = L_LSTM(X, (state_h, state_c)) # sequences_l.shape => n_batch, n_sequence, n_hidden

I recently found this article, neural ODE, and I'm really interested in its amazing concepts.
Because my dataset is severely irregularly sampled, neural ODE seems to improve the model performance.
I want to apply a neural ODE to my dataset instead of an RNN, but I'm not quite sure whether my code is right or not.

Example code in README.md is like below:
zt = torchcde.cdeint(X=X, func=self.func, z0=z0, t=X.interval)
zT = zt[..., -1, :] # get the terminal value of the CDE, zT.shape => n_batch, n_hidden

and I want to change this code like below:
sequences_t = torchcde.cdeint(X=X, func=func, z0=z0, t=X._t) # sequences_t.shape => n_batch, n_sequence, n_hidden
(X._t represents all time sequences according to my best knowledge)

Does it make sense to change codes like that? Will sequences_t act like sequences_l (from RNN code)?
Thanks in advance!

Basic prediction problem.

Hi @patrick-kidger!! I hope you're well and staying safe during these times.

Can this model be used for a basic single-variable prediction problem?

For example, for the data below:

import numpy as np

import yfinance as yf
data = yf.download("SPY", start="2017-01-01", end="2017-04-30")['Adj Close']

y=data.to_numpy()
X = np.linspace(1, len(data), len(data))  # one time index per observation (linspace defaults to 50 points otherwise)

CDEs with Image Data

Hey Patrick!
I want to utilize neural CDEs for the purpose of time series with images at different time points, as opposed to structured data.

I have seen that on the torchdiffeq repo, they use a convolutional layer in their "ODEFunc" in order to use image data. I was wondering if a similar approach could be taken with neural CDEs. I'm just not sure how to prepare the data for doing such with the interpolation and cubic spline. I was wondering if you had any suggestions for how I could go about doing this.

In addition (unrelated to the main question), I was wondering whether interpolation with the hermite cubic spline could be performed during model training, as opposed to beforehand during data preprocessing.

I would greatly appreciate any help with these questions.

Regards,
Aashish

Consider opening GH Discussions

Hi Patrick,

First, I would like to thank you and everyone involved in this package and the related research 🎉! I've been trying to use it, but I'm seeing some NaNs, and I would really appreciate your (and/or others') insights on this. So I'd like to know your thoughts about opening a GH Discussions tab for conversations that aren't issues; it is basically a forum inside GH.

Best regards,

piping in & predicting arbitrary streams of values

repost from other related thread which was closed:

Can you please speak to the general problem of how to pipe in an arbitrary stream of price values (say from a single column of a .csv), where the goal is to use Neural CDE (edit: SDE changed to CDE) to train and predict the next value in sequence?

It is not immediately clear from the get_data function here how to import a sequence of values (or whether get_data is the appropriate function to modify). Any pointers are greatly valued! Thank you very much.

Time channel unaltered after interpolation and automatic padding when missing values

Hi Patrick,

I was testing the solution to issue #14, and it seems to work well when filling the last/first valid row forward/backward, except that the time channel is not padded along with it. This is also the case when using linear interpolation.

Below is a minimal working example. If I understood correctly from the examples (irregular_data.py -> variable_length_data() -> final comments), the output in the time dimension should read 0.1667 in the first timestep and 0.8333 in the last timestep, rather than 0.0 and 1.0 (the time values in the first and last rows of the interpolated data), for it to work properly. Or perhaps I misunderstood and this is actually correct behavior?

Cheers,
Joaquin

import numpy as np
import torch
import torchcde

torch.set_printoptions(sci_mode=False, linewidth=200)

# Toy data
x = torch.rand(5, 2) # 5 timesteps and 2 features
nans_row = torch.empty(x.size(-1)) * np.nan
x = torch.cat([nans_row.unsqueeze(0), x, nans_row.unsqueeze(0)], dim=0)

# Include cumulative observational mask
obs_mask = (~torch.isnan(x)).cumsum(dim=-2)
x = torch.cat([x, obs_mask], dim=-1)

# Add time as the first feature
t = torch.linspace(0., 1., x.size(-2))
x = torch.cat([t.unsqueeze(-1), x], dim=-1)

# Interpolate and recover data at knots
print('Original data:\n',x, x.shape)
coeffs = torchcde.natural_cubic_coeffs(x) # or torchcde.linear_interpolation_coeffs(x) 
X = torchcde.NaturalCubicSpline(coeffs) # or torchcde.LinearInterpolation(coeffs)
data = X.evaluate(X.grid_points)
print('Interpolated data:\n', data, data.shape)

Output:
Original data:
tensor([[0.0000, nan, nan, 0.0000, 0.0000],
[0.1667, 0.0771, 0.4205, 1.0000, 1.0000],
[0.3333, 0.5345, 0.6326, 2.0000, 2.0000],
[0.5000, 0.8389, 0.2319, 3.0000, 3.0000],
[0.6667, 0.6324, 0.5267, 4.0000, 4.0000],
[0.8333, 0.4999, 0.8982, 5.0000, 5.0000],
[1.0000, nan, nan, 5.0000, 5.0000]]) torch.Size([7, 5])
Interpolated data:
tensor([[0.0000, 0.0771, 0.4205, 0.0000, 0.0000],
[0.1667, 0.0771, 0.4205, 1.0000, 1.0000],
[0.3333, 0.5345, 0.6326, 2.0000, 2.0000],
[0.5000, 0.8389, 0.2319, 3.0000, 3.0000],
[0.6667, 0.6324, 0.5267, 4.0000, 4.0000],
[0.8333, 0.4999, 0.8982, 5.0000, 5.0000],
[1.0000, 0.4999, 0.8982, 5.0000, 5.0000]]) torch.Size([7, 5])

publish `torchcde` on PyPI

Any plan to publish torchcde on PyPI? It's inconvenient to build packages that rely on torchcde as a dependency since it is not served on PyPI. There are workarounds with some versioning tools, like poetry, which I'm using for torchdyn. But it'd be lovely to have this out.

Linked to this, which is also a bottleneck.

Fix interpolation documentation

The docstrings currently advocate for batching irregular samples by putting NaNs to represent missing data at each other's observation times. This is wildly inefficient and completely unnecessary.

Add log-ODE example.

Additionally improve the log-ODE documentation, docstring etc. It's not really understandable unless you're already an expert.

Fix logsignatures to divide by window length

At the moment the logsignature functionality is a little buggy, in that it doesn't normalise the logsignature by the length of the interval it's taken over.

This should be fixed in a backward-compatible manner.
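To make the normalisation concrete: for a piecewise-linear path, the depth-2 logsignature over a window consists of the total increment plus the Lévy areas $A^{ij} = \frac{1}{2}\sum_{k<l}(\Delta X_k^i \Delta X_l^j - \Delta X_k^j \Delta X_l^i)$. The plain-Python sketch below computes both and then divides every term by the window length, which is the normalisation this issue describes. The function name is hypothetical and this is not torchcde's implementation.

```python
def logsig_depth2(times, path):
    """Depth-2 logsignature of a piecewise-linear path, divided by the window length."""
    d = len(path[0])
    running = [0.0] * d                   # sum of increments seen so far
    area = [[0.0] * d for _ in range(d)]  # antisymmetric Levy area matrix
    for p, q in zip(path, path[1:]):
        inc = [b - a for a, b in zip(p, q)]
        for i in range(d):
            for j in range(d):
                area[i][j] += 0.5 * (running[i] * inc[j] - running[j] * inc[i])
        running = [r + x for r, x in zip(running, inc)]
    length = times[-1] - times[0]
    return [r / length for r in running], [[a / length for a in row] for row in area]


# Counter-clockwise unit square traversed over t = 0..4:
# zero increment, Levy area 1 before dividing by the window length 4.
square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
inc, area = logsig_depth2([0, 1, 2, 3, 4], square)
```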

Comparison to alternative ODE models

Hi Patrick,

I'm really interested in your work, and while reading the neural CDE paper carefully I came across the subsection where you compare neural CDEs to seemingly alternative neural ODE models. There, the conclusion was that any such neural ODE can be rewritten as a neural CDE, but not vice versa.

However, I cannot convince myself this is the case. Let's take a really simple example where the control input is a scalar, the hidden state is also a scalar, and the hidden state is simply the integral of the control path, $z_t = \int_0^t X_s\,ds$. Also, assume for the sake of simplicity that $X_t = 1$ for all $t$, so the control path is constant. In this case we would expect $z_t = t$. However, in the neural CDE dynamics $dz_t = f_\theta(z_t)\,dX_t$, if the control path doesn't change, neither does the hidden state. So how would we be able to rewrite this very simple "neural ODE", which is just an identity function, as a neural CDE?

Cheers,
Deniz
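A small numerical sketch of one possible reconciliation, assuming (as in the README example) that time is included as a channel, so the control is $Y_t = (t, X_t)$ and is never constant even when $X_t$ is. Taking the example to be $z_t = \int_0^t X_s\,ds$, i.e. $dz_t = X_t\,dt$, augment the hidden state to $w_t = (z_t, X_t)$: then $dw_t = f(w_t)\,dY_t$ with the matrix-valued field $f(z, x) = [[x, 0], [0, 1]]$ reproduces the ODE, because its first row gives $dz = x\,dt$ and its second row copies $dX$ into the state. The explicit Euler loop below (plain Python, hypothetical helper names) checks this for constant $X_t = 1$ and for $X_t = t$.

```python
def euler_cde(f, w0, ys):
    """Explicit Euler for dw = f(w) dY, where ys are samples of the control Y."""
    w = list(w0)
    for y0, y1 in zip(ys, ys[1:]):
        dy = [b - a for a, b in zip(y0, y1)]
        w = [wi + sum(m * d for m, d in zip(row, dy)) for wi, row in zip(w, f(w))]
    return w


def augmented_field(w):
    """f(z, x) = [[x, 0], [0, 1]] for control Y = (t, X)."""
    _, x = w
    return [[x, 0.0],    # dz = x dt
            [0.0, 1.0]]  # dx = dX: the state carries a copy of the control


n = 100
ts = [k / n for k in range(n + 1)]
constant = [[t, 1.0] for t in ts]  # X_t = 1: only the time channel moves
z_T, _ = euler_cde(augmented_field, [0.0, 1.0], constant)   # close to 1.0, i.e. z_t = t
ramp = [[t, t] for t in ts]        # X_t = t
z_ramp, _ = euler_cde(augmented_field, [0.0, 0.0], ramp)    # close to 0.5 = integral of t dt
```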

Masking Coefficients?

Hi Patrick,

Thanks for all your work with Diffrax and torchcde.

I noticed that for each time step in a two-channel time series dataset, there are 4 coefficients associated with that particular time step (so 10 time steps would have 40 coefficients per example in the batch).

I am attempting to combine a Neural CDE with a Transformer decoder, and wanted to apply masking to the coefficients to avoid any lookahead bias in the CDE model. My question, then, is whether this can be done.

My immediate thought would be to reshape the coefficients into a (batch_size, 4, 10) tensor, find some way to apply a (10, 10) tril mask, and unsqueeze the result to pass into a CubicSpline interpolation, but I'm not sure exactly how this could be done.

Any help with this would be greatly appreciated!

Thanks,

Aashish
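Rather than masking coefficients, one common way to avoid lookahead is to use an interpolation whose piece on each interval depends only on current and past observations, and then to evaluate the CDE only up to the current time. The plain-Python sketch below builds cubic Hermite pieces whose knot derivatives are backward differences (an assumption made for illustration; it is not a copy of torchcde's `hermite_cubic_coefficients_with_backward_differences`) and checks that changing the last observation leaves every earlier piece unchanged. A natural cubic spline would generally fail this check, since its coefficients come from solving a linear system over all knots.

```python
def hermite_backward_pieces(ts, xs):
    """Per-interval coefficients (a, b, c, d) of a + b*s + c*s**2 + d*s**3, with s = t - t_i."""
    n = len(ts) - 1
    slopes = [(xs[i + 1] - xs[i]) / (ts[i + 1] - ts[i]) for i in range(n)]
    pieces = []
    for i in range(n):
        h = ts[i + 1] - ts[i]
        d0 = slopes[i - 1] if i > 0 else slopes[0]  # backward difference at t_i
        d1 = slopes[i]                               # backward difference at t_{i+1}
        c = (3 * slopes[i] - 2 * d0 - d1) / h
        d = (d0 + d1 - 2 * slopes[i]) / h ** 2
        pieces.append((xs[i], d0, c, d))
    return pieces


ts = [0.0, 1.0, 2.0, 3.0, 4.0]
xs = [0.0, 1.0, 0.5, 2.0, 1.0]
original = hermite_backward_pieces(ts, xs)
perturbed = hermite_backward_pieces(ts, xs[:-1] + [10.0])  # change only the last observation
# original[:-1] == perturbed[:-1]: earlier pieces never see the future value
```

In a CDE, solving up to time t_k then only touches pieces 0..k-1, each of which uses data up to t_k.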

setup.py and torchsde

I might be missing something, but it looks to me like https://github.com/google-research/torchsde is only at version 0.2.4, while torchcde's setup.py asks for version >= 0.2.5?

I could only pip install by changing:
install_requires = ['torch>=1.7.0', 'torchdiffeq>=0.2.0',
'torchsde @ git+https://github.com/google-research/torchsde.git>=0.2.5']
to ...
install_requires = ['torch>=1.7.0', 'torchdiffeq>=0.2.0', 'torchsde>=0.2.4']

How to use log_ode? What does "window" represent: time or number of points?

Hi Patrick,

First of all, thank you for posting this work - very impressive (and beyond my mathematical background).

I was able to use the neural CDE for a small dataset of irregularly sampled tracks; however, when moving to a larger dataset the training times become much too long.

Therefore, I am attempting to use the RDE formulation with log signatures, but it's not clear to me how time is processed by the logsig_windows function, and what the units of window_length are: time (seconds) or number of points.

Based on James Morrill's logsignature-example (https://github.com/jambo6/torchcde/blob/logsignature-example/example/logsignature_example.py), it looks like window_length is in terms of number of points, but then how is time preserved?

In my application, I am presented with a data batch of size (b, num_points, C), where the time series are irregularly sampled, so channel 0 is time (in hours) and channels 1-C are features. However, the time series can differ both in number of points and in duration (hours).

# x is my data batch (filled forward so that every track has the same number of points)
t = x[:, :, 0]  # time for each point in each track
y = torchcde.logsig_windows(x, depth=3, window_length=4)  # "t" is not passed explicitly; it's already part of "x"
train_coeffs = torchcde.natural_cubic_coeffs(y)

Is this implementation correct? Does this mean that each signature window is looking at 4 sampled points? How does it represent the time variability from track to track: 4 points for track A can be 4 minutes, but for track B it can be 40 minutes, depending on the sampling rate.

Thanks,
Alex
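The distinction in this question can be illustrated independently of torchcde: grouping observations into windows of a fixed number of points gives windows of different durations for tracks with different sampling rates, whereas grouping by elapsed time does not. The plain-Python helpers below are hypothetical; how `torchcde.logsig_windows` itself interprets `window_length` should be checked against its docstring.

```python
def windows_by_count(times, n_points):
    """Consecutive windows of n_points observation times each (the last may be shorter)."""
    return [times[i:i + n_points] for i in range(0, len(times), n_points)]


def windows_by_duration(times, length):
    """Windows that each cover `length` units of elapsed time, starting from times[0]."""
    windows = {}
    for t in times:
        windows.setdefault(int((t - times[0]) // length), []).append(t)
    return [windows[k] for k in sorted(windows)]


track_a = [0, 1, 2, 3, 4, 5, 6, 7]         # sampled every minute
track_b = [0, 10, 20, 30, 40, 50, 60, 70]  # sampled every 10 minutes
spans_a = [w[-1] - w[0] for w in windows_by_count(track_a, 4)]  # [3, 3] minutes
spans_b = [w[-1] - w[0] for w in windows_by_count(track_b, 4)]  # [30, 30] minutes
by_time_b = windows_by_duration(track_b, 30)  # [[0, 10, 20], [30, 40, 50], [60, 70]]
```

Time-based windows contain varying numbers of points (and can be empty for sparse tracks, in which case this sketch simply skips them), which is the trade-off against count-based windows.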

Overfitting

Hi,
Any tips for dealing with overfitting? Can we add dropout?
For example, in

def forward(self, t, z):
    # z has shape (batch, hidden_channels)
    z = self.linear1(z)
    z = z.relu()
    z = self.linear2(z)
    ######################

Can we add a dropout somewhere?

I am new to NeuralCDEs, apologies if I am missing anything obvious.
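One point to keep in mind is that the vector field is evaluated many times per solve, so an ordinary dropout layer inside `forward` resamples its mask at every evaluation and makes the vector field random within a single solve, which can interfere with adaptive step-size control. One workaround is to sample a single mask per forward pass of the whole model and reuse it for every evaluation. The plain-Python sketch below (a hypothetical `FixedMaskDropout` class, lists instead of tensors) shows the idea; other regularisers such as weight decay are also options.

```python
import random


class FixedMaskDropout:
    """Dropout whose mask is resampled only when resample() is called."""

    def __init__(self, p, size, seed=None):
        self.keep = 1.0 - p
        self.size = size
        self.rng = random.Random(seed)
        self.resample()

    def resample(self):
        # Inverted dropout: kept units are scaled by 1/keep so the expectation is unchanged.
        self.mask = [1.0 / self.keep if self.rng.random() < self.keep else 0.0
                     for _ in range(self.size)]

    def __call__(self, z):
        return [zi * mi for zi, mi in zip(z, self.mask)]


drop = FixedMaskDropout(p=0.5, size=6, seed=0)
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
first, second = drop(z), drop(z)  # same mask for every evaluation within one solve
drop.resample()                   # call once per training step, before solving the CDE
```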

About how to use CDE in Variational Autoencoders?

Hello, Mr Kidger! This work gives me a lot of insight!

I've created a repo, Forecast (also mentioned in #39). It's not too hard to use CDEs in Seq2Seq models, but how can they be used in Variational Autoencoders?

Here is a quotation about modeling uncertainty from your paper:

As presented here, Neural CDEs do not give any measure of uncertainty about their predictions. Such extensions are likely to be possible, given the close links between CDEs and SDEs, and existing work on Neural SDEs.

ODE_RNN is equipped with a GRU cell that can model h_0 and h_0_std. The latter can be viewed as "uncertainty" and is crucial to the VAE architecture. Is there related work on how to model uncertainty using CDEs?
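In latent-ODE-style VAEs, the encoder (for example an ODE-RNN, or possibly a CDE run over the observed path) outputs a mean and standard deviation for the initial state, $z_0$ is drawn with the reparameterisation trick, and the KL divergence to a standard normal prior is added to the loss. A plain-Python sketch of those two pieces, with hypothetical function names; in a neural CDE decoder the sampled $z_0$ would then be the initial condition passed to the solver.

```python
import math
import random


def sample_z0(mean, std, rng):
    """Reparameterisation trick: z0 = mean + std * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def kl_to_standard_normal(mean, std):
    """KL(N(mean, diag(std**2)) || N(0, I)) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mean, std))


rng = random.Random(0)
z0 = sample_z0([0.5, -1.0], [0.1, 0.2], rng)  # would be fed to the CDE as its initial state
kl = kl_to_standard_normal([0.5, -1.0], [0.1, 0.2])
```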

generation example

Hi,
This work is really amazing, but it seems the experiments in your paper are all classification tasks. Could you please also provide a sequence generation example? Many thanks~

About how to use on seq2seq works

Hi, Patrick!
First, thank you very much for sharing your work with us! It is very useful!
I have a question: does it work on regression tasks, such as seq2seq? I found that all the examples are classification, and in the examples cdeint only returns the terminal value (the initial value too, but it is not used), which is linearly mapped to a category. So if I want to use it for seq2seq, such as filling NaNs, it is somewhat difficult. How, or where, can I make modifications to adapt it to a seq2seq task?
Thank you very much!
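`cdeint` returns the hidden state at every time in its `t` argument, so the README example's `t=X.interval` gives only the initial and terminal states; passing a finer set of times, such as `X.grid_points`, gives one hidden state per observation, which a linear readout can map to a per-step output for seq2seq-style tasks. The idea is sketched below in plain Python with a hypothetical Euler solver that records its state at every observation instead of only at the end.

```python
def euler_cde_trajectory(f, z0, path):
    """Explicit Euler for dz = f(z) dX, returning z at every observation time."""
    z = list(z0)
    trajectory = [list(z)]
    for x0, x1 in zip(path, path[1:]):
        dx = [b - a for a, b in zip(x0, x1)]
        z = [zi + sum(m * d for m, d in zip(row, dx)) for zi, row in zip(z, f(z))]
        trajectory.append(list(z))
    return trajectory


def field(z):
    """Hypothetical 2 x 2 vector field (hidden_channels x input_channels)."""
    return [[1.0, 0.0], [0.0, 0.1 * z[0] + 1.0]]


def readout(z):
    """Hypothetical linear readout from a 2-d hidden state to a scalar prediction."""
    return 0.5 * z[0] - z[1]


# Control with time as channel 0.
path = [[0.0, 0.0], [0.25, 1.0], [0.5, 0.5], [0.75, 2.0], [1.0, 1.5]]
zs = euler_cde_trajectory(field, [0.0, 0.0], path)
predictions = [readout(z) for z in zs]  # one output per time step, len(path) in total
```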

Integrate the ODE function of the CDE system to infinity

Hi Patrick, I am really amazed by, and grateful for, your and your team's work on handling these dynamical systems. As a medical student, my thinking might be wrong from a mathematician's perspective, so please forgive me if I ask silly questions.

My work, in a nutshell, is to estimate the signal intensity of dynamic contrast-enhanced MRI images at all (any) time points. The latent space in my model represents human body dynamics that are controlled by a NeuralCDE system (you may regard them as interpolated signal intensities). Interestingly, and specific to this study, you can imagine that the initial latent space z0 is more or less similar to zT when T is large enough (e.g. after 2 days all the contrast should have been eliminated from the body, just as no contrast has yet arrived at z0).

From my understanding, a CDE first combines the ODE func (a neural network) and the control X into a vector field, then passes that vector field as the func to torchdiffeq.odeint(). I am still considering the optimal (clinically justified) method for initializing a reliable latent space; for now, I would like to integrate the ODE func alone (not the vector field) to infinity, or to a relatively large time, outside my model to get zT and thus the initial latent space (using the same ODE func that the CDE will use).

I would appreciate your comments on this, and it would be great if you could suggest how to integrate it (elegantly) rather than just setting a large end time. I have also read the paper "Deep Equilibrium Models", but I am not sure whether their fixed-point solvers can be applied to ODE functions. Is there any paper or topic you would suggest for researching this further? I look forward to your reply.

The last part is just my appreciation to you, sir! I like your "mentally substitute the above treatment throughout" in the "textbook", because I did it a lot, haha. The human body follows a far more complex dynamical system that we often have to regard as a black box. I do believe your work could significantly help in getting to the root of these unknown body dynamics in the clinical field. Big applause!
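If the trajectory of $dz/dt = g(z)$ does converge as $t \to \infty$, its limit is an equilibrium, i.e. a root of $g(z^*) = 0$, so "integrating to infinity" can be replaced either by integrating until $g$ is negligibly small or by handing $g$ to a root-finder (Deep Equilibrium Models solve a related fixed-point problem with root-finding methods). A scalar plain-Python sketch of both, assuming a hypothetical `g` with a stable equilibrium at 2; a root found this way is only the long-time limit if the trajectory from your starting point actually converges to it.

```python
def integrate_until_steady(g, z0, dt=0.01, tol=1e-8, max_steps=1_000_000):
    """Explicit Euler on dz/dt = g(z), stopping once |g(z)| < tol."""
    z = z0
    for step in range(max_steps):
        dz = g(z)
        if abs(dz) < tol:
            return z, step
        z += dt * dz
    raise RuntimeError("no steady state reached")


def newton_root(g, z0, tol=1e-12, max_iter=50, eps=1e-6):
    """Newton's method on g(z) = 0 with a finite-difference derivative."""
    z = z0
    for _ in range(max_iter):
        gz = g(z)
        if abs(gz) < tol:
            break
        slope = (g(z + eps) - gz) / eps
        z -= gz / slope
    return z


def g(z):
    """Hypothetical dynamics with a single stable equilibrium at z* = 2."""
    return -(z - 2.0) * (1.0 + z * z)


z_inf, steps = integrate_until_steady(g, z0=0.0)
z_star = newton_root(g, z0=0.0)
```

The root-finder typically needs far fewer evaluations of `g` than long-time integration, but it carries no information about which equilibrium a given starting point flows to when there are several.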

Seq2seq forecasting: adding temporal information and extrapolation

Hi Patrick! Thank you so much for your inspiring work.

I am currently trying to implement sequence-to-sequence forecasting with CDEs, but the output is smoother without adding time as an additional dimension to my data. I am not sure whether I have implemented the code in the right way.

My input data has shape (batch size, input sequence length, feature dimension), and I mapped it to a latent space using a GRU. The result has shape (batch size, input sequence length, latent dimension), which I treated as the observation at each time step.

I then concatenated the time points with this latent vector (adding 1 to the latent dimension) before fitting the cubic spline.

x = torch.cat((t_x, x), dim=2)

After that I do:

z0 = self.mlp(X0) 
t_forecast = torch.arange(start=0, end=t_y.shape[1], dtype=dtype, device=self.args.device) # create the forecasting time step
z_T = cdeint(X=X, z0=z0, func=self.func, t=t_forecast) # size: batch size, output sequence length, latent dimension
pred_y = self.decoder(z_T) # linear layer on the latent dimension, output size: batch size, output sequence length

The loss is then computed between pred_y and the ground truth.

My questions are:

  1. Is it correct to do time series forecasting with a CDE in this way? If so, why does the version without t work better?
  2. If I want to do extrapolation, is it correct to feed a longer t, built with torch.arange, into the CDE?

Thank you in advance!
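On the extrapolation question, one property worth checking is what the control path does beyond the last observation: since $dz_t = f(z_t)\,dX_t$, the hidden state can only change where $X$ changes, so if the path is held constant after the final observation the CDE state simply stops evolving, however long `t` is. How a given interpolation behaves outside its knots depends on the scheme; the plain-Python sketch below just assumes a constant extension to illustrate the effect, with a hypothetical Euler solver.

```python
def euler_cde_trajectory(f, z0, path):
    """Explicit Euler for dz = f(z) dX along sampled control values, recording every state."""
    z = list(z0)
    out = [list(z)]
    for x0, x1 in zip(path, path[1:]):
        dx = [b - a for a, b in zip(x0, x1)]
        z = [zi + sum(m * d for m, d in zip(row, dx)) for zi, row in zip(z, f(z))]
        out.append(list(z))
    return out


def field(z):
    """Hypothetical vector field: 2-d hidden state, 1-d control."""
    return [[0.5], [-z[0]]]


observed = [[1.0], [3.0], [2.0], [4.0]]   # single-channel control, no time channel
extended = observed + [observed[-1]] * 3  # hypothetical constant extension for "forecasting"
zs = euler_cde_trajectory(field, [0.0, 0.0], extended)
# zs[3:] are all equal: the state is frozen once X stops changing
```

If time is one of the channels and the extension keeps that channel increasing, the state instead continues to evolve through the part of f that multiplies dt.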
