Comments (8)
Hi @mitchelldaneker, it looks like I get the same results as the OP when running the PyTorch script; only G, for which we have observations, seems well estimated by the SBINN after training. So to be clear: should we expect the other outputs to be estimated as neatly as G? Or are the OP's plots the final results, which demonstrate the usefulness of the ODE model?
Yes, you can only "trust" G in this case, and even that is a loose trust. Remember that G has data, so you can compare against that data and judge trustworthiness in that sense. We have found that, with inverse PINNs in general, the parameters are learned long before the state variables are, hence the "loose trust". You may need to train 4-5x as long to get good results on the other state variables. In this case, since we have a standard, fast method for solving the ODE model, it is better to plug the inferred parameters into that solver and use its predictions.
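The "plug the inferred parameters into a standard solver" step can be sketched with SciPy. This is a toy two-state system standing in for the actual six-state glucose-insulin model; the model, parameters, and initial conditions here are placeholders, not the ones from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy 2-state stand-in for the glucose-insulin ODE model (the real system
# has 6 state variables). p_inferred plays the role of the parameters the
# SBINN estimated from the G data.
def rhs(t, y, a, b):
    g, h = y
    dg = -a * g + h          # hypothetical dynamics, for illustration only
    dh = -b * h + np.sin(t)
    return [dg, dh]

p_inferred = (0.5, 0.8)      # hypothetical SBINN-inferred parameters
y0 = [1.0, 0.0]              # initial conditions
sol = solve_ivp(rhs, (0.0, 10.0), y0, args=p_inferred, dense_output=True)

# All state variables are now available from the solver, not just the
# observed one, which is why this beats reading them off the network.
g_pred, h_pred = sol.sol(np.linspace(0, 10, 101))
```

Any standard stiff/non-stiff integrator works here; the point is that the solver gives every state variable at whatever resolution you want once the parameters are known.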
from sbinn.
We only have 1 observable, G. This means the network will only be looking at G when it is training and estimating parameters. Since the network cannot observe the other 5 state variables, its prediction of those state variables will be very poor as seen in your figure. That is why we have the ODE model. You will get much better information on the other state variables when you solve the ODE model with the inferred parameters.
Note that in the practical identifiability analysis section of the paper, you will find that one of the parameters cannot be inferred. That parameter has no effect on G, which makes it unidentifiable when G is the only observable used to estimate parameters. This means that solving the ODE model for the other 5 state variables will carry some error, but the result will still be much better than the network alone.
Thank you for your detailed answer! It helped a lot!!
Do you have any comments on the output transform mentioned in the SBINN paper? I don't quite understand it. Thanks!
There is a description in the paper, but essentially an output transform is a function applied to the output of the network. For a simple description, imagine we have two outputs, A and B. The output transform can serve a few purposes; two of the main uses are scaling and applying hard constraints.
For the scaling, imagine B/1000 ~ O(A). In that case, the network may struggle to produce both outputs because of the order-of-magnitude difference. To work around this, we scale the variables so that the raw network outputs have the same order of magnitude. In this simple case, multiplying the network's B output by 1000 in the output transform means both outputs are O(A) inside the network: the network is actually predicting B/1000, and multiplying by 1000 recovers B. In the paper, we use the mean of the data as the scaling factor.
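A minimal sketch of the scaling transform, in plain NumPy (in DeepXDE this function would be registered with `net.apply_output_transform`). The mean values here are made up for illustration:

```python
import numpy as np

# Hypothetical data means: A is O(1), B is O(1000). The network is trained
# to emit O(1) values for both; the transform rescales them back to
# physical units, so the network effectively learns B/1000.
data_means = np.array([1.0, 1000.0])

def output_transform(t, y):
    # y: raw network outputs, both O(1); returns physically scaled outputs
    return y * data_means

raw = np.array([0.7, 0.9])        # pretend O(1) network output
A, B = output_transform(0.0, raw) # A stays O(1); B is scaled to O(1000)
```

Because the transform is part of the model, its gradient flows through training, so the network only ever has to fit O(1) quantities.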
As for hard constraints, these are useful for applying initial or boundary conditions. Say that as an IC, at t = 0, we need B = 0 and A = 1. We could apply soft constraints via dde.IC. Hard constraints instead multiply A and B by functions that force them to always satisfy those initial conditions. For B, we may multiply by tanh(t), which is zero at t = 0. For A, we may also multiply by tanh(t), but to supply the initial value we add exp(t). The transformed output is then A*tanh(t) + exp(t), which equals 1 at t = 0 and so satisfies the initial condition. The exact expressions you use depend on the IC/BC and the system.
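The hard-constraint construction above can be checked directly: the ICs hold no matter what the raw network outputs are. A NumPy sketch (the numbers fed in stand for arbitrary untrained-network outputs):

```python
import numpy as np

# Hard-constraint transform for the ICs A(0) = 1, B(0) = 0.
# A_net and B_net are raw network outputs; tanh(0) = 0 wipes out their
# influence at t = 0, and exp(t) supplies A's initial value (exp(0) = 1).
def hard_constraint(t, A_net, B_net):
    A = A_net * np.tanh(t) + np.exp(t)
    B = B_net * np.tanh(t)
    return A, B

# The ICs are satisfied exactly, regardless of the network outputs:
A0, B0 = hard_constraint(0.0, 123.4, -56.7)  # arbitrary "network" values
```

This is the key difference from a soft constraint: the IC is built into the function by construction rather than being penalized in the loss.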
I found that convergence generally becomes very slow after implementing hard constraints with this method. Is this a defect of hard constraints, or could the loss-term weights or some other factor be responsible?
With soft constraints it converges in about 100,000 to 200,000 iterations.
Sorry for the late reply @chenyv118. By using hard constraints, you are changing the function from the very start, which can have a potent effect on the loss landscape and thus may require a change in your loss weights. Initialization methods generally produce network outputs near zero; if you apply hard constraints, especially with linear scaling and addition like we do here, the initial outputs (and therefore the loss) change, and slightly different weights may be needed.
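The point about initialization can be made concrete. With outputs near zero at initialization, a soft-constrained model starts near zero everywhere, while the hard-constrained model above starts at exp(t), so the initial residuals (and hence the balance between loss terms) differ from step one. A sketch using the earlier A-transform:

```python
import numpy as np

# Idealized "freshly initialized network": outputs are ~0 everywhere.
t = np.linspace(0.0, 2.0, 5)
A_net_init = np.zeros_like(t)

# Soft constraints: the model starts near 0 and must learn the IC.
soft_start = A_net_init

# Hard constraint A = A_net*tanh(t) + exp(t): the model starts at exp(t),
# so the ODE/data residuals at step one are very different, which is why
# the loss weights tuned for the soft case may no longer be appropriate.
hard_start = A_net_init * np.tanh(t) + np.exp(t)
```

So slow convergence under hard constraints is not inherently a defect; re-tuning the loss weights for the shifted starting point is a reasonable first thing to try.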