
Comments (8)

mitchelldaneker commented on August 11, 2024

> Hi @mitchelldaneker, it looks like I get the same results as the OP when running the PyTorch script; only G, for which we have observations, seems well estimated by the SBINN after training. So to be clear, should we expect the other outputs to be estimated as neatly as G? Or are the OP's plots the final results that demonstrate the usefulness of the ODE model?

Yes, you can only "trust" G in this case, and even that is a loose trust. Remember that G has data, so you can compare the prediction to that data and judge its trustworthiness in that sense. We have found that, generally with inverse PINNs, the parameters are learned long before the state variables are, hence the "loose trust". You may need to train 4-5x as long to get good results on the other state variables. In this case, since we have a standard method for solving the ODE model and we know it is fast, it is better to plug the inferred parameters into that solver and use its predictions.
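As a minimal sketch of that comparison, assuming `model` is the trained dde.Model and `t_obs`, `G_obs` are the glucose observations (these names and the output column index of G are placeholders, not from the sbinn script):

```python
import numpy as np

# Predict all state variables at the observation times with the trained SBINN.
y_pred = model.predict(t_obs)

# Hypothetical column index of G among the network outputs; adjust to your script.
G_pred = y_pred[:, 2:3]

# Relative L2 error of the network's G against the observed data.
rel_err = np.linalg.norm(G_pred - G_obs) / np.linalg.norm(G_obs)
print(f"Relative L2 error on G: {rel_err:.3e}")
```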


mitchelldaneker commented on August 11, 2024

We only have one observable, G. This means the network only sees G while it is training and estimating parameters. Since the network cannot observe the other 5 state variables, its predictions of those state variables will be very poor, as seen in your figure. That is why we have the ODE model. You will get much better information on the other state variables when you solve the ODE model with the inferred parameters.

Note that in the practical identifiability analysis section of the paper, you will find that one of the parameters cannot be inferred. That parameter has no effect on G, which makes it unidentifiable when G is the only observable used to estimate the parameters. While this means that solving the ODE model for the other 5 state variables will carry some error, the result will still be much better than the network alone.
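A minimal sketch of that workflow with SciPy is below; `ode_rhs`, the parameter values, and the initial conditions are placeholders, not the glucose-insulin model from the paper:

```python
import numpy as np
from scipy.integrate import solve_ivp

def ode_rhs(t, y, params):
    # Placeholder two-state system; substitute the same equations used in the SBINN.
    k1, k2 = params
    return [-k1 * y[0], k1 * y[0] - k2 * y[1]]

inferred_params = (0.1, 0.05)   # values read from the trained dde.Variables
y0 = [1.0, 0.0]                 # known initial conditions of the state variables
t_eval = np.linspace(0.0, 100.0, 500)

sol = solve_ivp(ode_rhs, (t_eval[0], t_eval[-1]), y0,
                args=(inferred_params,), t_eval=t_eval, method="LSODA")
# sol.y contains every state variable, including those the network never observed.
```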


RubADuckDuck commented on August 11, 2024

Thank you for your detailed answer! It helped a lot!!


ZSTanone commented on August 11, 2024

Thank you for your detailed answer! It helped a lot!!
Do you have any comments on the output transform mentioned in the SBINN paper? I don't quite understand it. Thanks.


mitchelldaneker commented on August 11, 2024

There is a description in the paper, but essentially an output transform is a function applied to the output of the network. For a simple description, imagine we have two outputs, A and B. We can use the output transform to do a few things; two of the main uses are scaling and applying hard constraints.

For the scaling, imagine B/1000 ~ O(A). In this case the network may struggle to produce both outputs well because of the order-of-magnitude difference. As a way around this, we can scale the variables so that the raw network outputs are the same order of magnitude. In this simple case, if we multiply the network's output for B by 1000 in the output transform, both variables are the same order of magnitude inside the network. This means the network is actually predicting B/1000, and multiplying by 1000 gives us B. In the paper, we use the mean of the data as the scaling factor.
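A minimal sketch of such a scaling transform, assuming the PyTorch backend of DeepXDE and placeholder magnitudes (not the values from the paper):

```python
import torch
import deepxde as dde

# Hypothetical typical magnitudes of the six state variables (e.g. the data means).
scales = torch.tensor([1.0e2, 1.0e1, 1.0e-1, 1.0e2, 1.0e2, 1.0e4])

def output_transform(t, y):
    # The raw outputs y are all O(1); multiplying by the scales means the
    # network effectively learns y_i / scale_i for each state variable.
    return y * scales.to(y.device)

net = dde.nn.FNN([1] + [128] * 3 + [6], "swish", "Glorot normal")
net.apply_output_transform(output_transform)
```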

As for hard constraints, these are useful for applying initial or boundary conditions. Say that as an IC, at t = 0, B = 0 and A = 1. We could apply soft constraints via dde.IC. Hard constraints instead multiply A and B by functions that force them to always satisfy those initial conditions. For B, we may multiply by tanh(t), which is zero at t = 0. For A, we may also multiply by tanh(t), but to recover the IC we add exp(t). The expression A*tanh(t) + exp(t) equals 1 at t = 0, so it satisfies the initial condition exactly. The exact functions you use depend on the IC/BC and the system.
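A minimal sketch of that hard constraint for the two-output A/B example, again assuming the PyTorch backend (the architecture here is arbitrary):

```python
import torch
import deepxde as dde

def output_transform(t, y):
    A_raw, B_raw = y[:, 0:1], y[:, 1:2]
    A = A_raw * torch.tanh(t) + torch.exp(t)  # tanh(0) = 0 and exp(0) = 1, so A(0) = 1
    B = B_raw * torch.tanh(t)                 # B(0) = 0 exactly
    return torch.cat([A, B], dim=1)

net = dde.nn.FNN([1, 64, 64, 2], "tanh", "Glorot uniform")
net.apply_output_transform(output_transform)
```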


chenyv118 commented on August 11, 2024

> There is a description in the paper, but essentially an output transform is a function applied to the output of the network. [...] The exact functions you use depend on the IC/BC and the system.

I found that convergence generally becomes very slow after implementing hard constraints in the same way. Is this a drawback of hard constraints, or could the loss-term weights or other factors be affecting it?
[screenshot: loss history]
With soft constraints it converges in about 100,000 to 200,000 iterations.


HGangloff commented on August 11, 2024

Hi @mitchelldaneker,
It looks like I get the same results as the OP when running the PyTorch script; only G, for which we have observations, seems well estimated by the SBINN after training. So to be clear, should we expect the other outputs to be estimated as neatly as G? Or are the OP's plots the final results that demonstrate the usefulness of the ODE model?


mitchelldaneker commented on August 11, 2024

> I found that convergence generally becomes very slow after implementing hard constraints in the same way. Is this a drawback of hard constraints, or could the loss-term weights or other factors be affecting it? With soft constraints it converges in about 100,000 to 200,000 iterations.

Sorry for the late reply @chenyv118. By using hard constraints, you are changing the function from the very start. This can have a potent effect on the loss landscape, and thus may require a change in your loss weights. Most initialization methods give network outputs near zero; if you apply hard constraints, especially with linear scaling and addition like we do here, this changes the initial outputs and the loss, and may require slightly different weights.
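As a minimal sketch of adjusting the weights, assuming `data`, `net`, and the list of inferred `params` (dde.Variables) are defined as in the sbinn script; the number of loss terms and the weight values below are placeholders to tune per problem:

```python
import deepxde as dde

model = dde.Model(data, net)
model.compile(
    "adam",
    lr=1e-3,
    # e.g. 6 ODE residual losses followed by 1 observation loss.
    loss_weights=[1, 1, 1, 1, 1, 1, 1e2],
    external_trainable_variables=params,
)
losshistory, train_state = model.train(iterations=200000)
```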

