When doing several times within a session or across sessions but in the same dir

Making unpacking/storing of atmospheric models more memory-efficient about species HOT 10 CLOSED

gabrielastro commented on July 30, 2024

Making unpacking/storing of atmospheric models more memory-efficient

from species.

Comments (10)

gabrielastro commented on July 30, 2024

Also related, would it be possible to have in database.add_model() the option of limiting not only teff but also the other parameter ranges? Some parameter ranges are sometimes known to be much too wide (especially if they will be fixed), so one could save some (a lot of) read-in time.
Atmospheric model files can have maaaany significant digits:

# Wavelength (um) - Flux (W m-2 um-1)
5.000000000000000000e-01 1.046887465549093861e+02
5.001237804050535640e-01 1.020516848591326635e+02
5.002475914532845680e-01 1.008752535495290061e+02
5.003714331522789438e-01 1.008323757761552457e+02

but this could be dramatically reduced without loss of precision, which would reduce not only the needed RAM (I guess) but certainly the file size by a large factor (ca. 3, or 0.5 dex!).
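
As a rough illustration of the text-file side of this argument, the sketch below formats the four rows quoted above with 18 decimals (the layout of the excerpt) and with fewer decimals, then compares the byte counts. The choice of 6 decimals and the helper name `format_rows` are assumptions made for the example. It says nothing about the size of the HDF5 database, which stores binary values.

```python
# Sketch only: compare the text size of the same rows at two precisions.
# The 6-decimal setting is an arbitrary illustration, not a recommendation.

ROWS = [
    (5.000000000000000000e-01, 1.046887465549093861e+02),
    (5.001237804050535640e-01, 1.020516848591326635e+02),
    (5.002475914532845680e-01, 1.008752535495290061e+02),
    (5.003714331522789438e-01, 1.008323757761552457e+02),
]


def format_rows(rows, decimals):
    """Write (wavelength, flux) pairs in scientific notation, one pair per line."""
    return "".join(
        f"{wavel:.{decimals}e} {flux:.{decimals}e}\n" for wavel, flux in rows
    )


full_text = format_rows(ROWS, 18)  # same layout as the excerpt above
short_text = format_rows(ROWS, 6)  # hypothetical reduced precision

ratio = len(full_text) / len(short_text)
print(f"{len(full_text)} bytes vs {len(short_text)} bytes (ratio {ratio:.2f})")
```

The actual saving depends on how many decimals are kept and on whether the files are compressed.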

tomasstolker commented on July 30, 2024

Thanks for opening another issue 😊!

Indeed the whole file is unpacked. Ideally it would check if the files are present and if the files have the correct size (i.e. that a previous try was not aborted), but this is not something I plan to implement (but would welcome a PR!). I can add an unpack parameter in add_model though.
There is the configuration file where you can set the download/unpack folder and also the HDF5 file that is to be used. So downloading, unpacking, and adding is only required once.
I have implemented this in commit 4804241. So it only unpacks spectra files within the selected teff_range. For now, I am not planning to add support for adding partial files, since setting teff_range should for most cases be fine, but would welcome a PR.
Hm not sure, should be somewhat straightforward to implement, but my guess is that such additional restrictions are typically not needed. It is only needed to add the model spectra once, so should be fine if that takes a few minutes (at most). In FitModel it is possible to fix a parameter though!
I wouldn't know what would be a reasonable number... It may also depend on the model? So I would rather stick with the safe number of decimals 😉
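
The presence-and-size check described in the first point could look roughly like the sketch below. It is not part of species: the function name `members_to_unpack` is hypothetical, and the sketch assumes that the downloaded models are in a tar archive that Python's `tarfile` module can open. Note that listing the members of a compressed archive still requires reading through it, so this mainly saves the time spent writing files.

```python
import tarfile
from pathlib import Path


def members_to_unpack(archive_path, target_dir):
    """Return the names of regular files in the archive that are missing
    from target_dir or whose size on disk differs from the archived size."""
    target = Path(target_dir)
    pending = []
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            on_disk = target / member.name
            # A size mismatch suggests that an earlier unpack was aborted.
            if not on_disk.is_file() or on_disk.stat().st_size != member.size:
                pending.append(member.name)
    return pending
```

An empty return value would mean that every file is already present with the expected size, so unpacking could be skipped.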

gabrielastro commented on July 30, 2024

Thanks for answering quickly to one more issue 😊. Thanks a lot for adding the unpack parameter! It works very nicely. Now it is much faster.

Good point that it would require some checks to avoid bad surprises later on. There probably exist some high-quality libraries implementing this but this is not my area… Maybe an easy compromise would be to offer a "use-at-your-risk" option only checking if the file exists (and warn the user that it is up to him or her to make sure the files are not corrupted)?
Ah! perfect. Sorry for having missed this!
Thanks! I agree that partial wavelength ranges are not needed (and would require some trouble to program).

4a. Even though add_model() is now much faster, it is taking seeeeveral minutes when initialising:

database.add_model(model='exo-rem-highres', teff_range=(1000., 1500.))

printed after eight minutes "Unpacking 3171/9575 model spectra from Exo-REM (16 GB)..." and, after a further ca. four minutes:

Please cite Charnay et al. (2018) when using Exo-REM in a publication
Reference URL: https://ui.adsabs.harvard.edu/abs/2018ApJ...854..172C/abstract
Wavelength range (um) = 0.67 - 250.0
Spectral resolution = 20000
Teff range (K) = 1000.0 - 1500.0

and the unpacking began, and after ten more minutes it printed "Adding Exo-REM model spectra... [DONE]", and then after ca. two more minutes it was done, including:

Number of stored grid points: 3300
Number of interpolated grid points: 128

So, in total 25 minutes. Programming the main part of the code obviously takes (me) much more time than this :), but still… The (University) computer is quite modern, with many cores and sufficient RAM, I thought, and there are no network issues. Or do these numbers seem suspicious to you?

I guess it has to be done only once per database file. I was thinking of having a local database for every project to keep it tidier, but given the time it takes I will probably re-use the database across projects. Which is ok too! I am still a beginner.

Maybe restricting the other parameters in the same way as Teff would still be relatively easy to implement? Or maybe not because not all grids have the same axes…

4b. If database.add_model() is called again for the same models, the same process starts over, right? Maybe the following changes would be sensible: if wavel_range changes, keep the behaviour as is (everything just has to be overwritten anyway to avoid complicated programmer-side checks and tracking), but if some other parameter range differs from what is stored in the database, add only the needed part instead of unpacking and reading in everything again. If the range (e.g., of Teff) gets reduced, only the models outside the new range would need to be deleted from the database file, instead of all models from that family being deleted and only the desired ones being read in again. But again, if it is too complicated (messy) to program, maybe it is not needed…
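
The bookkeeping proposed in 4b amounts to two set differences. The sketch below is purely illustrative: `plan_update` is a hypothetical helper, not a species function, and the Teff values are made up for the example.

```python
def plan_update(stored_teff, requested_teff):
    """Return (to_add, to_delete) as sorted lists of Teff grid values."""
    stored = set(stored_teff)
    requested = set(requested_teff)
    to_add = sorted(requested - stored)     # needed but not yet in the database
    to_delete = sorted(stored - requested)  # in the database but no longer wanted
    return to_add, to_delete


# Made-up example: the range moves from 1000-1300 K to 1200-1400 K.
stored = [1000.0, 1100.0, 1200.0, 1300.0]
requested = [1200.0, 1300.0, 1400.0]
to_add, to_delete = plan_update(stored, requested)
print("add:", to_add, "delete:", to_delete)
```

Only the grid points in `to_add` would need to be unpacked and read in; those already stored and still requested would be left untouched.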

Actually, it may be simple:

The smallest delta lambda for each grid is known, so keeping three more significant digits than this smallest number will be more than enough. Actually, since the wavelength is in scientific notation and the resolution = lambda/Delta lambda is constant, the number of digits needed is easy! N_digits = ceil(log10(R))+3 is more than enough.
What is the highest per-bin flux accuracy conceivable in the model files that you read in and/or in observations? SNR = 1e3? So four or at the very most five significant digits would be more than enough, right?

Was that convincing ;)? These measures would reduce the file sizes dramatically!
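
The digit estimate above, N_digits = ceil(log10(R)) + 3, can be checked with a few lines. This is a sketch under the stated assumption of constant R; the helper names are hypothetical, and R = 20000 is the value printed for exo-rem-highres earlier in this comment.

```python
import math


def wavelength_digits(spec_res, margin=3):
    """Significant digits from N_digits = ceil(log10(R)) + margin."""
    return math.ceil(math.log10(spec_res)) + margin


def format_significant(value, digits):
    """Format value in scientific notation with `digits` significant digits."""
    return f"{value:.{digits - 1}e}"


spec_res = 20000
n_digits = wavelength_digits(spec_res)

wavel = 0.67                                  # um, lower edge of the grid
next_wavel = wavel * (1.0 + 1.0 / spec_res)   # neighbouring bin at constant R

# Adjacent wavelengths must stay distinguishable after rounding.
rounded = format_significant(wavel, n_digits)
rounded_next = format_significant(next_wavel, n_digits)
print(n_digits, rounded, rounded_next, rounded != rounded_next)
```

The same `format_significant` helper could be applied to the fluxes with four or five digits, as suggested in the second point.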

tomasstolker commented on July 30, 2024

That does take a long time! The high-res grid had not been used before by anyone I think... Okay, perhaps adding the additional parameter restrictions would help in that case. What you can also do is create a separate folder, copy into it from data_folder only the model spectra files that you want to use, and then add them with add_custom_model.
SNR of 1000 would certainly be sufficient for planet observations but I don't know if that is also a typical convergence accuracy for the atmospheric models?
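
The folder-based workaround from the first point could be scripted along these lines. This is a sketch only: the file naming scheme (`..._teff_1200_logg_4.0_spec.dat`), the `.dat` extension, and both helper names are assumptions made for the example, so the parsing would need to be adapted to the actual file names in data_folder. The subsequent add_custom_model call is not shown because its arguments are not given in this thread.

```python
import shutil
from pathlib import Path


def parse_parameters(filename):
    """Read name/value pairs from a name like 'model_teff_1200_logg_4.0_spec.dat'.

    The naming scheme is an assumption made for this sketch.
    """
    tokens = Path(filename).stem.split("_")
    params = {}
    for name, value in zip(tokens, tokens[1:]):
        try:
            params[name] = float(value)
        except ValueError:
            continue  # the second token is not a number, so not a name/value pair
    return params


def copy_subset(source_dir, target_dir, ranges):
    """Copy the .dat files whose parameters all lie inside the given ranges.

    ranges maps a parameter name to an inclusive (low, high) tuple.
    Returns the sorted list of copied file names.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).glob("*.dat")):
        params = parse_parameters(path.name)
        inside = all(
            name in params and low <= params[name] <= high
            for name, (low, high) in ranges.items()
        )
        if inside:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

For example, `copy_subset(data_folder, subset_folder, {"teff": (1000.0, 1500.0), "logg": (3.5, 4.5)})` would restrict two parameters at once, with `data_folder` and `subset_folder` being whatever paths are used locally.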

tomasstolker commented on July 30, 2024

I just realized that add_model has the wavel_range parameter. It does have to read in each file though, but then only stores the selected wavelength range.

gabrielastro commented on July 30, 2024

Remaining in this thread:

I realised that when calling FitModel() with a value for teff in bounds, but without having first called database.add_model(), the models for all Teff values are added instead of only the needed range. Maybe it would make sense to align the behaviour with that of database.add_model(). Also, I guess it means that one never(?) needs to call database.add_model() explicitly and that FitModel() will take care of this.

In passing: "safe some time" → "save some time".

Right, perfect! Thanks.
Thanks for pointing that out! By the way, about spec_res:

spec_res : float, None
    Spectral resolution to which the spectra will be resampled.
    This parameter is optional since the spectra have already
    been resampled to a lower, constant resolution (typically
    :math:`R = 5000`). The argument is only used if
    ``wavel_range`` is not ``None``.

I confused myself into thinking that it is meaningful to set this if the model resolution is much higher than that of the data, but that is not true; FitModel will take care of this. Maybe it would be good to add a comment on this in the documentation. Is there a typical scenario where spec_res would be useful? Also, what happens if the model resolution is worse than that of the data?

Good suggestion, thank you!
Maybe the SNR I mentioned is a red herring. Certainly interpolating the models means that we cannot trust more than the third or fourth significant digit to be physically meaningful, so anything past this can be removed from the input file (but of course then working numerically in double precision or whatever is needed). That will dramatically reduce the file size.

tomasstolker commented on July 30, 2024

There is more in species than FitModel 😉 so 1 and 6 seem too specific. The spec_res parameter from add_model was needed in an early version of the package. Currently, it only helps in terms of time/storage efficiency.

gabrielastro commented on July 30, 2024

Ok about 1 😄. What happens if the model resolution is worse than that of the data, though? But for 6, I cannot imagine how an interpolated model could have more than three or four significant digits; the rest is basically numerical noise… (But there is no real harm; just to storage space, transfer time, and elegance 😉.)

tomasstolker commented on July 30, 2024

Not all routines require interpolation so I think that your suggestion could introduce unwanted inaccuracies for some of the non-FitModel applications.

gabrielastro commented on July 30, 2024

Hmm… Ok, as you wish… To be continued another time. Or just try truncating the input files and see how many bug reports start coming in 😁. Thanks in any case for this exchange!
