Comments (10)

gabrielastro commented on July 30, 2024
  1. Also related, would it be possible to have in database.add_model() the option of limiting not only teff but also the other parameter ranges? Some parameter ranges are sometimes known to be much too wide (especially if they will be fixed), so one could save some (a lot of) read-in time.

  2. Atmospheric model files can have maaaany significant digits:

# Wavelength (um) - Flux (W m-2 um-1)
5.000000000000000000e-01 1.046887465549093861e+02
5.001237804050535640e-01 1.020516848591326635e+02
5.002475914532845680e-01 1.008752535495290061e+02
5.003714331522789438e-01 1.008323757761552457e+02

but this could be dramatically reduced without loss of precision, which would reduce not only the needed RAM (I guess) but certainly the file size by a large factor (ca. 3, or 0.5 dex!).
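As a rough illustration of the size argument, the four sample rows above can be re-written with fewer significant digits and the text lengths compared. This is only a sketch: the choice of 8 digits for the wavelength and 5 for the flux is an assumption made for the example, and `format_rows` is a hypothetical helper, not part of species. Note that a 64-bit float holds only about 15 to 17 significant decimal digits, so the 19 printed digits already exceed what is stored after reading the file.

```python
# Four sample rows copied from the excerpt above.
rows = [
    (5.000000000000000000e-01, 1.046887465549093861e+02),
    (5.001237804050535640e-01, 1.020516848591326635e+02),
    (5.002475914532845680e-01, 1.008752535495290061e+02),
    (5.003714331522789438e-01, 1.008323757761552457e+02),
]


def format_rows(rows, wavel_digits, flux_digits):
    """Return the rows as text with the given numbers of significant digits."""
    lines = []
    for wavel, flux in rows:
        # '.{n-1}e' yields n significant digits in scientific notation.
        lines.append(f"{wavel:.{wavel_digits - 1}e} {flux:.{flux_digits - 1}e}")
    return "\n".join(lines) + "\n"


full_text = format_rows(rows, 19, 19)   # same layout as the original file
short_text = format_rows(rows, 8, 5)    # assumed reduced precision

# Largest relative change in flux caused by the rounding.
short_flux = [float(line.split()[1]) for line in short_text.splitlines()]
max_rel_err = max(abs(new - old) / old for new, (_, old) in zip(short_flux, rows))

ratio = len(full_text) / len(short_text)
print(len(full_text), len(short_text), ratio)  # 200 100 2.0
print(max_rel_err)
```

With these particular digit counts the text shrinks by a factor of two; the actual saving depends on how many digits are kept.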

from species.

tomasstolker commented on July 30, 2024

Thanks for opening another issue 😊!

  1. Indeed the whole file is unpacked. Ideally it would check if the files are present and if the files have the correct size (i.e. that a previous try was not aborted), but this is not something I plan to implement (but would welcome a PR!). I can add an unpack parameter in add_model though.

  2. There is the configuration file where you can set the download/unpack folder and also the HDF5 file that is to be used. So downloading, unpacking, and adding is only required once.

  3. I have implemented this in commit 4804241. So it only unpacks spectra files within the selected teff_range. For now, I am not planning to add support for adding partial files, since setting teff_range should for most cases be fine, but would welcome a PR.

  4. Hm not sure, should be somewhat straightforward to implement, but my guess is that such additional restrictions are typically not needed. It is only needed to add the model spectra once, so should be fine if that takes a few minutes (at most). In FitModel it is possible to fix a parameter though!

  5. I wouldn't know what would be a reasonable number... It may also depend on the model? So I would rather stick with the safe number of decimals 😉
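The check mentioned in item 1 (is the file present, and does it have the expected size?) could look roughly like the sketch below. The function name `can_skip_unpack` and its arguments are hypothetical and not part of species; the expected size would have to come from somewhere, for example the archive's member list, which is an assumption here.

```python
import os


def can_skip_unpack(file_path, expected_size=None):
    """Decide whether an already unpacked file can be reused.

    Hypothetical helper, not part of species. If expected_size is None,
    only the existence of the file is checked, so a file truncated by an
    aborted run would go unnoticed.
    """
    if not os.path.isfile(file_path):
        return False

    if expected_size is None:
        # Existence check only: the caller accepts the risk of a corrupt file.
        return True

    # A size mismatch suggests that a previous unpacking was interrupted.
    return os.path.getsize(file_path) == expected_size
```

Comparing sizes catches interrupted extractions but not corrupted content; a checksum would be stricter at the cost of reading every file.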

gabrielastro commented on July 30, 2024

Thanks for answering so quickly on yet another issue 😊. Thanks a lot for adding the unpack parameter! It works very nicely. Now it is much faster.

  1. Good point that it would require some checks to avoid bad surprises later on. There probably exist some high-quality libraries implementing this but this is not my area… Maybe an easy compromise would be to offer a "use-at-your-risk" option only checking if the file exists (and warn the user that it is up to him or her to make sure the files are not corrupted)?

  2. Ah! perfect. Sorry for having missed this!

  3. Thanks! I agree that partial wavelength ranges are not needed (and would require some trouble to program).

4a. Even though add_model() is now much faster, it is taking seeeeveral minutes when initialising:

database.add_model(model='exo-rem-highres', teff_range=(1000., 1500.))

This printed "Unpacking 3171/9575 model spectra from Exo-REM (16 GB)..." after eight minutes and, after roughly four more minutes:

Please cite Charnay et al. (2018) when using Exo-REM in a publication
Reference URL: https://ui.adsabs.harvard.edu/abs/2018ApJ...854..172C/abstract
Wavelength range (um) = 0.67 - 250.0
Spectral resolution = 20000
Teff range (K) = 1000.0 - 1500.0

Then the unpacking began. After ten more minutes it printed "Adding Exo-REM model spectra... [DONE]", and about two minutes later it finished, with the output including:

Number of stored grid points: 3300
Number of interpolated grid points: 128

So, in total 25 minutes. Programming the main part of the code obviously takes (me) much more time than this :), but still… The (University) computer is quite modern, with many cores and sufficient RAM, I thought, and there are no network issues. Or do these numbers seem suspicious to you?

I guess it has to be done only once per database file. I was thinking of having a local database for every project to keep it tidier, but given the time it takes I will probably re-use the database across projects. Which is ok too! I am still a beginner.

Maybe restricting the other parameters in the same way as Teff would still be relatively easy to implement? Or maybe not because not all grids have the same axes…

4b. If database.add_model() is called again for the same models, the same process starts over, right? Maybe the following changes would be sensible: if wavel_range changes, keep the behaviour as is (everything just has to be overwritten anyway to avoid complicated programmer-side checks and tracking), but if some other parameter range differs from what is stored in the database, add only the needed part instead of unpacking and reading in everything again. If the range (e.g., of Teff) gets reduced, only the models outside it would need to be deleted from the database file, instead of deleting all models from that family and reading the desired ones in again. But again, if it is too complicated (messy) to program, maybe it is not needed…

  1. Actually, it may be simple:
  • The smallest delta lambda for each grid is known, so keeping three more significant digits than this smallest number will be more than enough. Actually, since the wavelength is in scientific notation and the resolution = lambda/Delta lambda is constant, the number of digits needed is easy! N_digits = ceil(log10(R))+3 is more than enough.
  • What is the highest per-bin flux accuracy conceivable in the model files that you read in and/or in observations? SNR = 1e3? So four or at the very most five significant digits would be more than enough, right?

Was that convincing ;)? These measures would reduce the file sizes dramatically!
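The digit estimate above, N_digits = ceil(log10(R)) + 3, is easy to evaluate for the resolutions mentioned in this thread. A minimal sketch, where the function name `wavelength_digits` and the `margin` argument are chosen only for this illustration:

```python
import math


def wavelength_digits(spec_res, margin=3):
    """Significant digits for the wavelength column, following
    N_digits = ceil(log10(R)) + margin, with R = lambda / Delta lambda."""
    return math.ceil(math.log10(spec_res)) + margin


print(wavelength_digits(20000))  # 8, for the R = 20000 Exo-REM grid
print(wavelength_digits(5000))   # 7, for a grid resampled to R = 5000
```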

tomasstolker commented on July 30, 2024
  1. That does take a long time! The high-res grid had not been used before by anyone I think... Okay perhaps adding the additional parameter restrictions would help in that case. What you can also do is create a separate folder, copy into it from data_folder only the model spectra files that you want to use, and then add them with add_custom_model.

  2. SNR of 1000 would certainly be sufficient for planet observations but I don't know if that is also a typical convergence accuracy for the atmospheric models?
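The workaround in item 1 (copying only the wanted spectra into a separate folder) could be scripted along these lines. This is a sketch under a loud assumption: the parameter values are taken to be encoded in the file names as `_<name>_<value>`, for example `exo-rem_teff_1200_logg_4.0_spec.dat`. The actual naming in data_folder may differ and should be checked first. `parse_parameters` and `copy_model_subset` are hypothetical helpers, not part of species.

```python
import re
import shutil
from pathlib import Path


def parse_parameters(file_name, names):
    """Read parameter values from a file name, assuming '_<name>_<value>'."""
    values = {}
    for name in names:
        match = re.search(rf"_{re.escape(name)}_(-?\d+(?:\.\d+)?)", file_name)
        if match is None:
            return None  # file name does not follow the assumed pattern
        values[name] = float(match.group(1))
    return values


def copy_model_subset(source_dir, target_dir, bounds):
    """Copy the files whose parameters all lie within bounds.

    bounds maps a parameter name to an inclusive (min, max) tuple.
    Returns the sorted list of copied file names.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if not path.is_file():
            continue
        values = parse_parameters(path.name, bounds)
        if values is None:
            continue
        if all(low <= values[key] <= high for key, (low, high) in bounds.items()):
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

A call such as `copy_model_subset(data_folder, "subset", {"teff": (1000.0, 1500.0), "logg": (3.5, 4.5)})` would then fill the folder that is passed on to add_custom_model.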

tomasstolker commented on July 30, 2024
  1. I just realized that add_model has the wavel_range parameter. It does have to read in each file though, but then only stores the selected wavelength range.

gabrielastro commented on July 30, 2024

Remaining in this thread:

  1. I realised that when FitModel() is called with bounds set for teff but without database.add_model() having been called first, the models for all Teff values are added instead of only the needed range. Maybe it would make sense to align the behaviour with that of database.add_model(). Also, I guess it means that one never(?) needs to call database.add_model() explicitly and that FitModel() will take care of this.

In passing: "safe some time" → "save some time".

  1. Right, perfect! Thanks.

  2. Thanks for pointing that out! By the way, about spec_res:

spec_res : float, None
    Spectral resolution to which the spectra will be resampled.
    This parameter is optional since the spectra have already
    been resampled to a lower, constant resolution (typically
    :math:`R = 5000`). The argument is only used if
    ``wavel_range`` is not ``None``.

I confused myself into thinking that it is meaningful to set this if the model resolution is much higher than that of the data but that is not true; FitModel will take care of this. Maybe it would be good to add a comment on this in the documentation. Is there a typical scenario where spec_res would be useful? Also, what happens if the model resolution is worse than that of the data?

  1. Good suggestion, thank you!

  2. Maybe the SNR I mentioned is a red herring. Certainly interpolating the models means that we cannot trust more than the third or fourth significant digit to be physically meaningful, so anything past this can be removed from the input file (but of course then working numerically in double precision or whatever is needed). That will dramatically reduce the file size.

tomasstolker commented on July 30, 2024

There is more in species than FitModel 😉 so 1 and 6 seem too specific. The spec_res parameter from add_model was needed in an early version of the package. Currently, it only helps in terms of time/storage efficiency.

gabrielastro commented on July 30, 2024

Ok about 1 😄. What happens if the model resolution is worse than that of the data, though? But for 6, I cannot imagine how an interpolated model could have more than three or four significant digits; the rest is basically numerical noise… (But there is no real harm; just to storage space, transfer time, and elegance 😉.)

tomasstolker commented on July 30, 2024

Not all routines require interpolation so I think that your suggestion could introduce unwanted inaccuracies for some of the non-FitModel applications.

gabrielastro commented on July 30, 2024

Hmm… Ok, as you wish… To be continued another time. Or just try truncating the input files and see how many bug reports start coming in 😁. Thanks in any case for this exchange!
