google-research / torchsde Goto Github PK
View Code? Open in Web Editor NEWDifferentiable SDE solvers with GPU support and efficient sensitivity analysis.
License: Apache License 2.0
Differentiable SDE solvers with GPU support and efficient sensitivity analysis.
License: Apache License 2.0
Hey,
Does this project include the code for the demo gif in the readme file?
BR,
Hello!
First of all, fantastic work!! This framework is amazing and I've managed to implement some basic SDE's and solve them. My current goal is to go through some of the canonical SDEs used in Finance and implement them, which leads to my question.
Q: I discovered some slides pertaining to the Neural SDE as GANs paper, in which you reference the Heston model, in torchsde
how would I link the two Brownian motions together by their correlation factor rho
? I think I managed to do it in a quick and dirty way below
class CorrelatedBrownianIntervals(BrownianInterval):
def __init__(self, t0, t1, rho, w1, W=None, H=None):
super().__init__(t0, t1, (batch_size, 1), W, H)
self.rho = rho # Correlation Coefficient between the 2 Brownians
self.w1 = w1 # The first correlated Brownian
def correlated_brownian(self, ta, tb):
z1 = super().__call__(ta, tb) # This generates an uncorrelated Brownian motion Z1
corr = (self.rho * self.w1_val + torch.sqrt(1 - self.rho ** 2) * z1) ## Creates the correlated Brownian W2
return corr
def __call__(self, ta, tb):
self.w1_val = self.w1(tb) - self.w1(ta)
return self.correlated_brownian(ta, tb)
Is this the right approach? Or is there a better / more flexible way to do this?
Thanks again!
I think this relates to the issue @patrick-kidger mentioned that when running the scalar diagnostics, large amounts of memory is consumed. I'm seeing this also on my machine.
I've managed to pinpoint the issue: When tb-ta
is very small, the dependency tree creation phase hangs, with _set_points
being repeated indefinitely. More specifically, this line.
Creating a list of potential features that may be nice to have, but aren't necessarily a priority right now.
f_and_g
function on SDEs, that computes drift+diffusion simultaneously. (#84)Tests:
Maybe?
dg_ga
, if and when batch-vjp ever becomes available.sdeint
, sdeint_adjoint
This bug relates to my concern that there's virtually no test for BrownianInterval at the moment, and in retrospect it shouldn't have been merged into dev without testing out the aspects I mentioned in #15. Shape tests are basic, though it is helpful at most times. The change of API for the solvers also breaks backward compatibility.
The data structure couldn't produce A
with the correct shape
import torch
from torchsde import BrownianInterval
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
batch_size, d, m = 16, 3, 2
t0, t1 = 0.0, 0.3
dtype = torch.get_default_dtype()
bm_general = BrownianInterval(
t0=t0, t1=t1, shape=(batch_size, m), dtype=dtype, device=device, levy_area_approximation='foster'
)
W, U, A = bm_general(0.0, 0.1) # Fails.
print(W.size())
print(U.size())
print(A.size())
Could you please provide the reproducible experimental source code for experiment 4.2-4.4?
Or, I would like to see the code for the reproducible metric (Classification, Prediction, MMD) to check the results of running the experiment.
thanks for reading.
Hi,
I observed that the memory usage increases while training goes on. This is unusual in standard NN, since the size of the parameters is fixed. Is this normal in this model or did I get something wrong? I also noticed there is a cache_size parameter. What is the unit of this integer, is it in MB?
thanks
I was trying to use general noise with either sdeint or sdeint_adjoint. But I always get the error message below:
SDE has noise type general but solver only supports noise types ('additive', 'diagonal', 'scalar')
Would you mind implementing the corresponding solver?
Another question: what are the meanings of g_prod, gdg_prod and g_prod_and_gdg_prod?
Thanks
To be clear, not raising this because I feel like being nitpicky - read on!
Other than very minor differences (e.g. order), the code for each Euler--Maruyama/Milstein/SRK method is essentially the same across noise types. Instead one can have e.g. a single Euler
that accepts order
as an argument, and four different wrappers that set that argument.
Similarly, the adjoint SDE definitions have lots of code reuse; it'd easier to write down a single adjoint SDE for the general noise case. (And any unnecessary computation can be masked out with if statements checking the noise type.)
I think the latter in particular offers an easy opportunity to unify Ito and Stratonovich: it'd just be another if statement checking if the correction term should be applied.
As for why I'm raising this - I'm interested in:
general
noise. (And I'm happy to do so with Euler--Maruyama, knowing the strong order is 0.5. It should still work, right?)The first is aided by tidying up the existing code for solvers (I don't want to define MyCustomSolverAdditive
, MyCustomSolverGeneral
,...), the second is made possible by improving the adjoint SDE, and the third is a stretch goal that'll probably be easier if it can just be implemented with if sde_type=
corrections in the right places.
Tidying up the solvers at least is probably something I can offer a PR on, but I wanted to check if this fit your general vision, or if you think there's anything that might go wrong, as it pretty much involves removing the whole methods
directory structure.
To-do list so far:
Brownian motions:
examples/demo.ipynb
tests/problems.py
tests/test_brownian_path.py
tests/test_brownian_tree.py
__init__
:
Solvers:
integrate_logqp
and interp_logqp
SDEs:
gdg_prod
for the adjoint. (Chen)sde_type
and the method
.Misc:
python_requires
.diagnostics/ito_scalar.py
appears to be using huge amount of memory, and is also using a diagonal (and thus not compatible) problem.sdeint
, sdeint_adjoint
to automatically select the probably-best solver for a given noise type / SDE type combination. (Rather than just always using SRK.) (#73)Tests:
test_sdeint.py
, test_adjoint.py
, test_adjoint_logqp.py
(Chen) (#21)test_brownian_interval.py::test_normality_conditional
(#44)test_sdeint.py
to pytest. (#73)Bugs to fix:
a
from space-time Levy area to Levy area is of the wrong shape. (Patrick) (#28)I_k0
in SRK.method='srk'
and levy_area_approximation='none'
are incompatible with one another. (Done in #19)bm
isn't passed then it defaults to being a BrownianPath
on the CPU, potentially causing device errors later. (Done in #19)Following the note, the entropy
argument passed to initialize BrownianTree
doesn't govern the random numbers sampled outside the predetermined range [t0, t1].
Queries of this type is rate. Though, at the very least, there should be some warning or documentation on this.
It would be great if torchsde
could be installed into an environment with numpy==1.20.*
and scipy==1.6.*
Line 37 in 4c84649
pip install git+ssh://[email protected]/google-research/torchsde
throws an error for me ([email protected]: Permission denied (publickey)
), presumably because one is required to have set their SSH key with GitHub.
Is there a particular reason for this choice? Rather than the usual
pip install git+https://github.com/google-research/torchsde.git
which is a little more user-friendly. :)
Latest on dev, the contract checking fails for the following
import torch
from torchsde import BrownianInterval, BaseSDE, sdeint
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
batch_size, d, m = 16, 3, 2
t0, t1 = 0.0, 0.3
dtype = torch.get_default_dtype()
y0 = torch.zeros(batch_size, d, device=device)
class ScalarSDE(BaseSDE):
def f(self, t, y):
return torch.sin(y)
def g(self, t, y):
return torch.cos(y).sigmoid()
sde = ScalarSDE(sde_type='ito', noise_type='scalar')
bm = BrownianInterval(t0=t0, t1=t1, shape=(batch_size, 1), dtype=dtype, device=device, levy_area_approximation='foster')
sdeint(sde=sde, y0=y0, ts=[t0, t1], bm=bm)
Also it might be worthwhile to characterize exactly what API we'd like to use. I think we should go with g
outputting (batch_size, d, 1)
or (batch_size, d)
or preferably both if possible for backward compatibility.
Hello, thanks for the great work!
I was trying the library and I think there is a problem with sdeint_adjoint
when it's used in custom loss functions (sdeint
works however). Below I give an example based on the LatentSDE which shows a case of the error. The example doesn't have any meaningful interpretation, it is just for illustration purposes.
import torch
from torch import nn
from torchsde import sdeint, sdeint_adjoint, SDEIto
from torch.distributions import Normal
class LatentSDE(SDEIto):
def __init__(self, drift, sigma=0.5):
super(LatentSDE, self).__init__(noise_type="diagonal")
self.drift = torch.tensor(drift, requires_grad=False)
self.sigma = torch.tensor(sigma, requires_grad=False)
# Approximate posterior drift: Takes in 2 positional encodings and the state.
self.net = nn.Sequential(
nn.Linear(2, 10),
nn.Tanh(),
nn.Linear(10, 1)
)
self.net[-1].weight.data.fill_(0.)
self.net[-1].bias.data.fill_(0.)
def h(self, t, y): # Prior drift.
return self.drift * y
def f(self, t, y): # Approximate posterior drift.
if t.dim() == 0:
t = float(t) * torch.ones_like(y)
# Positional encoding in transformers; must use `t`, since the posterior is likely inhomogeneous.
inp = torch.cat((torch.sin(t), torch.cos(t)), dim=-1)
return self.net(inp)
def g(self, t, y): # Shared diffusion.
return self.sigma
batch_size, d, T = 3, 1, 20
sde = LatentSDE(1.0)
ts = torch.linspace(0, 1, T)
y0 = torch.zeros(batch_size, 1).fill_(0.1) # (batch_size, d)
dist = Normal(0,1.0)
def loss():
ys, logqp = sdeint_adjoint(sde, y0, ts, logqp=True)
data = ys.sum(0).mean()
res = -dist.log_prob(data)
return res
learning_rate = 1e-4
optimizer = torch.optim.Adam(sde.parameters(), lr=learning_rate)
for t in range(3):
optimizer.zero_grad()
l = loss()
print(l)
l.backward()
# Calling the step function on an Optimizer makes an update to its
# parameters
optimizer.step()
The error is
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-16-65102f371b73> in <module>
3 l = loss()
4 print(l)
----> 5 l.backward()
6
7 # Calling the step function on an Optimizer makes an update to its
~/opt/anaconda3/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
196 products. Defaults to ``False``.
197 """
--> 198 torch.autograd.backward(self, gradient, retain_graph, create_graph)
199
200 def register_hook(self, hook):
~/opt/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
98 Variable._execution_engine.run_backward(
99 tensors, grad_tensors, retain_graph, create_graph,
--> 100 allow_unreachable=True) # allow_unreachable flag
101
102
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([3, 1]) and output[0] has a shape of torch.Size([]).
Is this a known issue or am I doing something wrong?
Thanks a lot in advance.
From:
torchsde/torchsde/_core/base_sde.py
Line 148 in 647b6e5
A prior drift is assumed to exist on base_sde
.
@lxuechen soliciting your thoughts on something I've been contemplating, namely to make BrownianPath and BrownianTree special cases of BrownianInterval.
BrownianInterval w/ infinite cache is essentially the same as BrownianPath.
BrownianInterval w/ some tolerance at which the search is terminated, and a prespecified tree structure, is essentially the same as BrownianTree.
Would need to add a tolerance to the search in BrownianInterval, and for robustness a trampoline to the relevant parts, neither of which should be very hard.
It'd be nice to make torchsde available on PyPI. There's a few limitations with hosting it only on GitHub, most notably that (a) you can't specify a range of acceptable versions (torchsde>=0.2.4
), and (b) it's not possible to upload any other package to PyPI that depends on torchsde: PyPI disallows uploading packages that have dependencies outside of PyPI.
Prototype here
Repro:
torchsde.BrownianTree(t0=torch.tensor(0.), w0=torch.tensor(0.))(2.)
Not helped by the implicit assumption that t1 = t0 + 1
.
Not suggesting that this happen now; I'm just opening an issue to keep track of it.
I think the current hierarchy of:
_brownian
_core
methods
brownian_lib
_brownian_lib [from C++]
(not counting _core/adjoint_sdes
which I know you've removed in your latest branch)
should be switched to:
_impl
brownian
methods
core
(So additionally tidying the Brownian things together somehow. If we end up implementing every Brownian*
in C++ then that would suffice, for example.)
Hello,
Thanks for the work, it looks great. However, I have a question about logqp_path and its involvement in KL term in latent_sde.py example. I have read the paper "Neural SDE: Stabilizing Neural ODE ..." and went through your code, but it is still not clear why logqp_path is a part of KL term in addition to logqp0 and why we compute it as 1/2*((f-h)/g)^2. It looks like log of Gaussian pdf, but I did not make a connection. I would appreciate if you can clarify how to derive logqp_path.
Jurijs
Hello,
I've tried to simulation the basic model supplied in the example for 20 seconds. t_size remained 100.
The method used us euler as in the supplied example. It results in an error. The file is attached.
long_SDE_error.zip
Thanks in advance,
Ron Teichner
The error is:
/torchsde/_brownian/brownian_interval.py", line 322, in _split_exact
generator = np.random.SeedSequence(entropy=self._top._entropy,
File "bit_generator.pyx", line 316, in numpy.random.bit_generator.SeedSequence.init
File "bit_generator.pyx", line 391, in numpy.random.bit_generator.SeedSequence.get_assembled_entropy
File "bit_generator.pyx", line 156, in numpy.random.bit_generator._coerce_to_uint32_array
File "<array_function internals>", line 5, in concatenate
RecursionError: maximum recursion depth exceeded while calling a Python object
Hi,
I am re-implementing Neural SDE paper.
When I tried using Neural SDE with bunch of linear layers, every thing seem to be good, the train accuracy about 0.92 on valid set.
After that, I tried with convolution layers. First of all, I realize that f
and g
take (B,N)
shape of input. So in the forward
step of my network, I used torch.view
method as a way to reshape flattened input into the square image. And continue feed the data through bunch of Convolution layer, and as final step, one more time, I use the torch.view
method to reshape in to (B,N)
shape. (Take a look below).
class ConvolutionDrift(nn.Module):
def __init__(self, in_channel, size=32, device="cpu"):
super(ConvolutionDrift,self).__init__()
self.size=size
self.in_channel=in_channel
self.conv1 = ConcatConv2d(in_channel, 64,ksize=3,padding=1)
self.norm1 = Norm(64)
self.relu = nn.ReLU(inplace=True)
self.conv2 = ConcatConv2d(64, 64, ksize=3,padding=1)
self.norm2 = Norm(in_channel)
def forward(self,t,x):
bs = x.shape[0]
out = x.view(bs, self.in_channel, self.size, self.size)
# print(f"{out.shape}\n\n\n\n")
out = self.conv1(t,out)
out = self.norm1(out)
out = self.relu(out)
out = self.conv2(t,out)
out = self.norm2(out)
out = self.relu(out)
out = out.view(bs,-1)
return out
class ConvolutionDiffusion(nn.Module):
def __init__(self, in_channel, size=32, brownian_size = 2, device="cpu"):
super(ConvolutionDiffusion,self).__init__()
self.size=size
self.in_channel=in_channel
# self.net = nn.Sequential(*[
# nn.Conv2d(in_channel, 64,3,padding=1),
# nn.GroupNorm(32,64),
# nn.ReLU(),
# nn.Conv2d(64, 64,3,padding=1),
# nn.GroupNorm(32,64),
# nn.ReLU(),
# nn.Conv2d(64,in_channel * 2,3,padding=1),
# nn.ReLU(),
#
# ]).to(device)
self.relu = nn.ReLU()
self.norm1 = Norm(64)
self.conv1 = ConcatConv2d(in_channel, 64, ksize=3, padding = 1)
self.conv2 = ConcatConv2d(64,64, ksize=3, padding = 1)
self.norm2 = Norm(64)
self.conv3 = ConcatConv2d(64, in_channel * brownian_size, ksize = 3, padding = 1)
self.norm3 = Norm(in_channel * brownian_size)
def forward(self,t,x):
bs = x.shape[0]
out = x.view(bs, self.in_channel, self.size, self.size)
# out = self.net(out)
out = self.conv1(t,out)
out = self.norm1(out)
out = self.relu(out)
out = self.conv2(t,out)
out = self.norm2(out)
out = self.relu(out)
out = self.conv3(t,out)
out = self.norm3(out)
out = self.relu(out)
out = out.view(bs,-1)
return out
class SDEBlock(nn.Module):
noise_type="general"
sde_type="ito"
def __init__(self, state_size, brownian_size, batch_size, option = dict(), device="cpu", parallel=False,
method="euler", noise_type="general", integral_type="ito", is_ode=False, input_conv_channel = 64,input_conv_size=6, layers="linear"):
super(SDEBlock, self).__init__()
self.noise_type=noise_type
self.sde_type=integral_type
self.state_size = state_size
self.batch_size = batch_size
self.brownian_size = brownian_size
self.parallel = parallel
self.device = device
self.is_ode = is_ode
if parallel:
self.batch_size = int(self.batch_size / 2)
if layers=="linear":
self.drift = LinearDrift(state_size, state_size).to(device)
self.diffusion = LinearDiffusion(state_size, state_size * brownian_size).to(device)
elif layers=="conv":
self.drift = ConvolutionDrift(input_conv_channel, input_conv_size).to(device)
self.diffusion = ConvolutionDiffusion(input_conv_channel, input_conv_size, brownian_size = self.brownian_size).to(device)
def f(self,t,x):
out = self.drift(t,x)
return out
#return self.f(x)
def g(self,t,x):
bs = x.shape[0]
if self.is_ode:
out = torch.zeros_like((self.batch_size,self.state_size, self.brownian_size)).to(self.device)
return out
out = self.diffusion(t,x)
out = out.view(bs, self.state_size, self.brownian_size)
return out
"""
SDEBlock: LinearDrift dx + LinearDiffusion dW
SDENet: fe -> SDEBlock -> fcc
"""
class SDENet(Model):
def __init__(self, input_channel, input_size, state_size, brownian_size, batch_size, option = dict(), method="euler",
noise_type="general", integral_type="ito", device = "cpu", is_ode=False,parallel = False):
""""""""""""""""""""""""
super(SDENet, self).__init__()
self.batch_size = batch_size
self.parallel = parallel
self.input_size = input_size
self.option = option
self.input_channel = input_channel
#state_size = 64 * 14 * 14
self.device = device
self.fe = nn.Sequential(*[
nn.Conv2d(input_channel,16,3,padding=1),
nn.GroupNorm(8,16),
nn.ReLU(),
nn.Conv2d(16,32,4,padding=2),
nn.GroupNorm(16,32),
nn.ReLU(),
nn.Conv2d(32,64,3,2),
nn.GroupNorm(32,64),
nn.ReLU(),
]).to(device)
state_size, input_conv_channel, input_conv_size = self.get_state_size()
self.input_conv_channel = input_conv_channel
self.input_conv_size = input_conv_size
# print(state_size, input_conv_channel, input_conv_size, "ehehehehehe\n\n\n\n")
# print(f"Init features extraction layer with device {self.device}")
# Output shape from (B,3,32,32) -> (B,64,14,14)
if parallel:
self.batch_size = int(self.batch_size / 2)
self.rm = SDEBlock(
state_size=state_size,
brownian_size = brownian_size,
batch_size = batch_size,
option=option,
method=method,
integral_type=integral_type,
noise_type=noise_type,
device=device,
parallel=parallel,
is_ode=is_ode,
input_conv_channel=input_conv_channel,
input_conv_size=input_conv_size,
layers="conv"
).to(device)
self.fcc = nn.Sequential(*[
nn.AdaptiveAvgPool2d(1),
nn.Flatten(),
nn.Linear(input_conv_channel,10),
nn.Softmax(dim = 1)
]).to(device)
self.intergrated_time = torch.Tensor([0.0,1.0]).to(device)
self.device = device
self.method = method
def get_state_size(self):
out = torch.rand((1,self.input_channel,self.input_size, self.input_size)).to(self.device)
shape = self.fe(out)
return shape.view(1,-1).shape[-1], shape.shape[1], shape.shape[2]
def forward(self,x):
out = self.fe(x)
bs = x.shape[0]
# print(f"Shape after Feature Extraction Layer: {out.shape}")
out = out.view(bs,-1)
# print(f"After the feature extraction step, shape is: {out.shape}")
# print(f"Device of out {out.device}")
# print(f"Shape before the SDE Intergral: {out.shape}")
out = sdeint(self.rm,out,self.intergrated_time, options=self.option,method="euler", atol=5e-2,rtol=5e-2, dt=0.1, dt_min=0.05,adaptive=True)[-1]
out = out.view(bs,self.input_conv_channel, self.input_conv_size, self.input_conv_size)
out = self.fcc(out)
return out
The code run with no any mistake. But there is a thing that the results is not so good. It usually converges at second / thrid epoch.
And accuracy with convolution layers is not good at all.
So my question here is: "Will the convolution layer is available in torchsde by someway or the loss.backward()
does not update ConcatConv2d
s' parameters?"
ForwardSDE
and AdjointSDE
for g_prod
and gdg_prod
/gdg_jvp
/gdg_jvp
.It assumes that the batch dimension can be modified, but something like e.g.
z = (batch, channel)
def g(self, t, y):
return torch.stack([z, y], dim=2)
will then throw an error because of the different batch dimensions. (This being a scenario that I've actually run into.)
Incidentally I'm also concerned (but have not tested) that multiplying the batch dimension will put a peak in our memory usage. Having a dramatically non-rectangular memory profile could reduce the maximum possible batch size and thus slow things down overall.
I'm wondering about switching to v1
for now? At least on test_strat.py
I don't find it that much slower. (And as per above, may even be faster across a batch.)
Hi,
First of all, thanks a lot for the work, this is excellent research!
I am aware of a Julia library that does roughly the same thing (https://github.com/SciML/DiffEqFlux.jl).
Could you please describe what the differences are in your opinion (either algorithmic differences or interface ones)? So far the only one I can see relates to the fact that it seems easier to request a batch of trajectories in torchsde than it is in DiffEqFlux (might be wrong though).
Thanks a lot
Adrien
When I run the tests, it looks like BrownianTree
and BrownianPath
are missing in brownian_lib
module.
After inspecting the test itself, it looks like BrownianTree
and BrownianPath
from brownian_lib
and _brownian
should work compatibly.
I think there are two ways to fix it:
if hasattr
check in the testing code_brownian
in brownian_lib/__init__.py
if the imports in the try
block fails.I observed something strange about computation time for brownain interval
sde.cnt = 0
%timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 1.0]), dt=0.01, method="euler")
print(sde.cnt)
# 1.87 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
# 1428
sde.cnt = 0
%timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 5.0]), dt=0.05, method="euler")
print(sde.cnt)
# 57.3 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
# 1414
sde.cnt = 0
%timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 10.0]), dt=0.1, method="euler")
print(sde.cnt)
# 57.2 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
# 1414
where the sde is very similar to the one defined in the Quick example in README. In the above three examples, I change the different ts
and dt
. I think they should have roughly the same computation time. But it turns out the time used by the line are very different. According to the paper, the worse case should roughly be O(log T/dt) if I understand correctly. Why the first case is so slow?
Run the code in latent_sde.py. On cpu everything works fine, but when switching to GPU and using sdeint_adjoint() the following error appears:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
As mentioned above, the only difference is the use of a GPU. Tried, just in case, moving leaf variables to device and it didn't fix the problem (they had already been moved to the appropriate device in the original code). I can also confirm that all variables have requires_grad = True.
Finally, sdeint() works on both CPU and GPU.
I am trying to reproduce the results of the CMU Motion Capture dataset. I use references from examples/latent_sde_lorenz.py, the paper, and the preprocessed dataset linked by ODE2VAE's repo.
My current runs' results have large discrepancies from the results in the paper so I want to check if there are any training details I'm missing. (I am not very familiar with Bayesian modeling so I try to follow the hyperparameters in the paper as closely as possible.)
Here are the main issues:
10^6
to 10^9
depending on the choice of hyperparameters, while the code for ODE2VAE has the log-likelihood for ODE2VAE in the magnitude of 10^4
.Here are the main training details I think I am most likely to wrongly interpret from the paper:
euler_heun
, milstein
, and even reversible_heun
.0.01
in the paper, but in some runs with the initial lr to be 0.001
, they seem to be more stable. I'm curious if you have any comments about this.1/5
of the minimum time difference between two observations. All observations are regular so we can choose the minimum time to be a particular value a
(eg., 1), then dt
would be 0.2 * a
. I want to know if my interpretation here is correct. The paper didn't mention this value a
or the start/end time. It would be nice if you remembered this.Tagging @lxuechen since you know the most about the exp details. Thank you for your time!
I have a set of coupled SDE with white noise. I would like to solve it using torchsde
but I completely do not understand what is exactly W(t)
from documentation.
Is it Wiener process with zero mean and unit volatility? How can I add Wiener process with sigma != 1
into each equation of my set?
Our naive guess was:
def g(self, t, number_of_equations):
noise = np.sqrt(2*A)* torch.ones_like(number_of_equations).unsqueeze(-1)
return noise
where 2*A
is the volatility of Wiener process.
However, it seems that it does not work.
Next, consider the object BrownianInterval
. We have a hope that this can help. We initialize:
bm = BrownianInterval(t0=ts[0]., t1=ts[-1]., size=(1,1), device='cuda')
and next try to solve SDE by using sdeint
.
What is exactly batch_size
? What is exactly brownian_size
? For brownian_size
I can not find any definition or any other documentation.
Finally, from documentation:
size
(tuple of int): The shape of each Brownian sample. If zero dimensional represents a scalar Brownian motion. If one dimensional represents a batch of scalar Brownian motions. If >two dimensional the last dimension represents the size of a a multidimensional Brownian motion, and all previous dimensions represent batch dimensions.
but if we set size=1
, we see TypeError: int object is no iterable
Could you please clarify these points?
Can we pick a Python version to support? I'd suggest >=3.6.
Future statements: currently used are absolute_import
, division
and print_function
. As far as I know these only affect Python 2 which is out of support anyway. Should we really include them?
Likewise, the setting of annotations to {}
seems unnecessary to me. The minimum verison of Python now supported by PyTorch is 3.6, and type annotations were introduced in 3.5 I believe.
I think we're already demanding at least Python 3.3 for the @property@abstractmethod
to work.
Hey,
Could you guys share the example code for the toy problems presented in the paper? Especially the one for Stochastic Lorenz Attractor?
I was looking at how to solve a system of stochastic differential equations using torchsde where there is inter-dependecy between some of the functions. Can you help me out?
Cheers,
Miguel
Can someone explain what exactly does the parameter W
(The increment of the Brownian motion over the interval [t0, t1]) in BrownianInterval
mean mathematically?
Let's say that I have a Brownian motion with E[dB_t dB_t^T] = Qdt
. Is the parameter W
related to Q
? If yes, how should I set the value of W
given that I have Q
?
I'm proposing to get rid of the 2d shape checks. These were added in #88 as part of the v0.2.4.
These checks are creating a huge barrier for applications that don't have vectorized data naturally. Flattening and unflattening will hurt efficiency tremendously.
See this comment:
I think this line is slightly wrong:
torchsde/torchsde/_brownian/brownian_tree.py
Line 175 in 381f4e9
side='right'
set, as np.searchsorted([0., 1.], 0.) == 0
for example.
searchsorted
is used in a few places elsewhere and I'm not completely sure if they're correct either, which is why I'm opening an issue rather than just fixing it.
Incidentally torch.bucketize
is available as of 1.6 so we can switch to that now.
setup.py
doesn't currently enable OpenMP for the C++ extensions. (Which by-the-by we may wish to think about the future of as maintaining those is going to be rather a hassle.)
See e.g.
https://github.com/patrick-kidger/signatory/blob/master/setup.py#L33
for an example of how to enable the OpenMP flags.
(Whilst we're at it we should also add the fvisibility flag; see the same example.)
Hi there,
first of all thank you for your work! This is great!
Disclaimer: i don't know much about SDE, but i would love to make use of neural sde, since i have some use cases that i think could work with it. But mostly i find it highly interesting! So i hope i didn't get it all wrong entirely.
I think there are quite some differences between your code example and whats decribed in your paper.
In the paper it says, that you have a 1 layer GRU to recover the dynamics and f_net and h_net both are 1 layer MLP, while f_net takes an additional context variable of size 1 . Then, there's a decoder to map back from latent space to feature space.
So if i understand correctly, this model would work just like shown in figure 4 of your paper:
The GRU consumes a (time reversed) sequence of inputs in observation space and outputs a sequence in a 4D latent space. The final output (at t0) then is your initial condition (z0) for the sde, which is integrated through time, producing another sequence in latent space. So the dynamics would happen in latent space. Then the decocer maps back to observation space. What's a little unclear to me, is what the context would be. Just one (the last) latent variable from each of the GRU outputs?
Anyway, this describes an actual latent sde, since all the dynamics are happening in latent space.
However, in your implementation in the example, all the dynamics happen in observation space directly. f_net and h_net both map from observation space to drifts in obeservation space (well f_net again sees an additional context, which is of size 64 here). So the GRU encoder "only " provides the context, but not the initial latent state. Meaning it does not recover the dynamics, if i understood it right.
So this is not an actual "Latent SDE" model, is it? More like a "Latent informed/controlled SDE"?
Similar thing for the latent_sde example. The dynamics are learned in obesrvation space as well, so there is no latent space involved. One could claim that the latent space is equal to the observation space here of course. Or do i simply have the wrong idea of what "Latent SDE/ODE" actually means?
In general, i think this package could greatly benefit from a more in depth documentation in regards to training models with it and/or better explained examples. One or two examples of standart use cases in jupyter notebooks with detailed explanations could help a lot to make this more accessible to people that don't have much background in SDE (like me).
Projet
Hi,
I am trying to modify the latent_sde_lorenz example to learn an SDE from a simple 2-dimension noise data. The code works well but the only problem is that the learned SDE will always generate quite smooth samples.
I'd like the learned SDE to show the same amount of stochasticity as the data, any idea how to deal with this? Is there any parameter that needs to be tuned?
I am trying the run the example sde_gan.py
with GPU support.
I have run the code directly as it is in a Google Colab session (with GPU execution) as well as in local and in both cases, there is almost no improvement in the running time in comparison with CPU execution.
I have looked up the code but I did not find any problem in sde_gan.py
; maybe the problem is in the source files for torchsde and torchcde.
Could we make this argument an optional one so that users are freed from typing this in when they explicitly initialize BrownianInterval? An optional could be torch.get_default_dtype()
.
Hi Xuechen - I appreciate I'm bombarding this repository with a lot of issues/PRs.
Something we've recently found over in torchdiffeq#115 is that it's noticeably more efficient to have the internals of the library operate on tensors, rather than tuples-of-tensors, and have the adjoint method simply torch.cat
its different channels together.
Of course this also makes the code a whole lot cleaner, and I expect easier to develop going forward.
Not something I've got time to put together a PR for I'm afraid, but I thought I'd flag this up as something you might be interested in over here.
When I run the default sdeint, it says "evaluation times 'ts' must be strictly increasing", is there any setting or solver that can be used to solve the problem with decreasing time.
Hi, thanks for the great work. I have questions on using Latent SDE for irregularly spaced time series.
As far as I can tell there are no code examples covering latent SDE for irregularly spaced time series in this repo. latent_sde_lorenz.py
is regularly spaced and the encoder is just a regular GRU with no time information. Is there any publicly available code on this?
If we would like to adapt latent SDE for irregularly spaced time series, what kind of encoder would you suggest? Would an ODE-RNN work for example?
Thanks!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.