Comments (10)
-
Also related: would it be possible to have in `database.add_model()` the option of limiting not only `teff` but also the other parameter ranges? Some parameter ranges are sometimes known to be much too wide (especially if they will be fixed), so one could save some (a lot of) read-in time.
-
Atmospheric model files can have maaaany significant digits:
# Wavelength (um) - Flux (W m-2 um-1)
5.000000000000000000e-01 1.046887465549093861e+02
5.001237804050535640e-01 1.020516848591326635e+02
5.002475914532845680e-01 1.008752535495290061e+02
5.003714331522789438e-01 1.008323757761552457e+02
but this could be dramatically reduced without loss of precision, which would reduce not only the needed RAM (I guess) but certainly the file size by a large factor (ca. 3, or 0.5 dex!).
from species.
Thanks for opening another issue 😊!
-
Indeed the whole file is unpacked. Ideally it would check if the files are present and if the files have the correct size (i.e. that a previous try was not aborted), but this is not something I plan to implement (but would welcome a PR!). I can add an `unpack` parameter in `add_model` though.
-
There is the configuration file where you can set the download/unpack folder and also the HDF5 file that is to be used. So downloading, unpacking, and adding is only required once.
-
I have implemented this in commit 4804241, so it only unpacks spectra files within the selected `teff_range`. For now, I am not planning to add support for adding partial files, since setting `teff_range` should be fine for most cases, but would welcome a PR.
-
Hm, not sure; it should be somewhat straightforward to implement, but my guess is that such additional restrictions are typically not needed. The model spectra only need to be added once, so it should be fine if that takes a few minutes (at most). In `FitModel` it is possible to fix a parameter though!
-
I wouldn't know what a reasonable number would be... It may also depend on the model? So I would rather stick with a safe number of decimals 😉
Thanks for answering one more issue so quickly 😊. Thanks a lot for adding the `unpack` parameter! It works very nicely; it is much faster now.
-
Good point that it would require some checks to avoid bad surprises later on. There probably exist some high-quality libraries implementing this, but it is not my area… Maybe an easy compromise would be to offer a "use-at-your-own-risk" option that only checks whether the file exists (and warns the user that it is up to them to make sure the files are not corrupted)?
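The compromise above could look something like the following sketch (a hypothetical helper, not part of `species`; the name `needs_unpack` and the `expected_size` parameter are my own inventions):

```python
from pathlib import Path


def needs_unpack(dest, expected_size=None):
    """Return True if the file at ``dest`` still needs to be unpacked.

    A missing file always needs unpacking. If the expected size is
    known (e.g. from the archive member's header when iterating over
    a tar file), a size mismatch indicates a leftover from a
    previously aborted attempt.
    """
    dest = Path(dest)
    if not dest.exists():
        return True
    if expected_size is not None and dest.stat().st_size != expected_size:
        return True  # truncated/partial file from an interrupted run
    return False
```

When extracting with the standard `tarfile` module, each `TarInfo` member carries its uncompressed size in `member.size`, so the size check comes essentially for free; without it, the helper degrades to the "use-at-your-own-risk" existence check.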
-
Ah! perfect. Sorry for having missed this!
-
Thanks! I agree that partial wavelength ranges are not needed (and would require some trouble to program).
4a. Even though `add_model()` is now much faster, it is taking seeeeveral minutes when initialising:
database.add_model(model='exo-rem-highres', teff_range=(1000., 1500.))
After eight minutes it printed `Unpacking 3171/9575 model spectra from Exo-REM (16 GB)...` and after a further ca. four minutes:
Please cite Charnay et al. (2018) when using Exo-REM in a publication
Reference URL: https://ui.adsabs.harvard.edu/abs/2018ApJ...854..172C/abstract
Wavelength range (um) = 0.67 - 250.0
Spectral resolution = 20000
Teff range (K) = 1000.0 - 1500.0
and the unpacking began; after ten more minutes it printed `Adding Exo-REM model spectra... [DONE]`, and then after ca. two more minutes it was done, including:
Number of stored grid points: 3300
Number of interpolated grid points: 128
So, in total 25 minutes. Programming the main part of the code obviously takes (me) much more time than this :), but still… The (University) computer is quite modern, with many cores and sufficient RAM, I thought, and there are no network issues. Or do these numbers seem suspicious to you?
I guess it has to be done only once per database file. I was thinking of having a local database for every project to keep it tidier, but given the time it takes I will probably re-use the database across projects. Which is ok too! I am still a beginner.
Maybe restricting the other parameters in the same way as Teff would still be relatively easy to implement? Or maybe not because not all grids have the same axes…
4b. If `database.add_model()` is called again for the same model, the same process starts over, right? Maybe the following changes would be sensible: if `wavel_range` changes, keep the behaviour as is (everything just has to be overwritten anyway, to avoid complicated programmer-side checks and tracking), but if some other parameter range differs from what is stored in the database, add only the needed part instead of again unpacking and reading in everything. If the range (e.g., of Teff) gets reduced, models would need to be deleted from the database file, instead of deleting all models from that family and reading in only the desired ones again. But again, if it is too complicated (messy) to program, maybe it is not needed…
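The bookkeeping suggested in 4b might reduce to a set difference over the grid points, something like this sketch (hypothetical helper `plan_update`; the actual HDF5 reading/deleting is omitted):

```python
def plan_update(stored_teff, requested_teff):
    """Given the Teff grid points already stored in the database and
    those covered by the newly requested teff_range, return which grid
    points to add and which to delete, so that only the difference
    needs to be unpacked/read or removed.
    """
    stored = set(stored_teff)
    requested = set(requested_teff)
    to_add = sorted(requested - stored)     # unpack/read only these
    to_delete = sorted(stored - requested)  # drop these if the range shrank
    return to_add, to_delete
```

For example, widening the range from 1000-1200 K to 1100-1300 K would only require reading the 1300 K spectra and deleting the 1000 K ones, rather than redoing the whole family.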
- Actually, it may be simple:
- The smallest delta lambda for each grid is known, so keeping three more significant digits than needed to resolve this smallest spacing will be more than enough. Actually, since the wavelength is in scientific notation and the resolution R = lambda/Delta lambda is constant, the number of digits needed is easy: N_digits = ceil(log10(R)) + 3 is more than enough.
- What is the highest per-bin flux accuracy conceivable in the model files that you read in and/or in the observations? SNR = 1e3? So four, or at the very most five, significant digits would be more than enough, right?
Was that convincing ;)? These measures would reduce the file sizes dramatically!
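A minimal sketch of this truncation (a hypothetical helper, not part of `species`; in practice the files would be streamed line by line rather than held in memory):

```python
import math


def rewrite_spectrum(src_lines, resolution, flux_digits=5):
    """Reformat 'wavelength flux' lines with just enough significant
    digits: ceil(log10(R)) + 3 for the wavelength (R = lambda/dlambda,
    assumed constant over the grid) and a handful for the flux.
    """
    wl_digits = math.ceil(math.log10(resolution)) + 3
    out = []
    for line in src_lines:
        wl, flux = (float(x) for x in line.split())
        # .Ne format keeps N+1 significant digits in scientific notation
        out.append(f"{wl:.{wl_digits - 1}e} {flux:.{flux_digits - 1}e}")
    return out
```

For the Exo-REM output quoted above (R = 20000), this gives 8 significant digits for the wavelength and 5 for the flux, roughly halving each line before any compression is even applied.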
-
That does take a long time! The high-res grid had not been used by anyone before, I think... Okay, perhaps adding the additional parameter restrictions would help in that case. What you can also do is create a separate folder, copy from `data_folder` only the model spectra files that you want to use, and then add them with `add_custom_model`.
-
An SNR of 1000 would certainly be sufficient for planet observations, but I don't know whether that is also a typical convergence accuracy for the atmospheric models?
- I just realized that `add_model` has the `wavel_range` parameter. It does have to read in each file though, but then only stores the selected wavelength range.
Remaining in this thread:
- I realised that when calling `FitModel()` with a value for `teff` in `bounds` but without having first called `database.add_model()`, the models for all Teff values are added, instead of only the needed range. Maybe it would make sense to align the behaviour with that of `database.add_model()`. Also, I guess this means that one (never?) needs to call `database.add_model()` explicitly and that `FitModel()` will take care of it.
In passing: "safe some time" → "save some time".
-
Right, perfect! Thanks.
-
Thanks for pointing that out! By the way, about `spec_res`:
spec_res : float, None
Spectral resolution to which the spectra will be resampled.
This parameter is optional since the spectra have already
been resampled to a lower, constant resolution (typically
:math:`R = 5000`). The argument is only used if
``wavel_range`` is not ``None``.
I confused myself into thinking that it is meaningful to set this if the model resolution is much higher than that of the data, but that is not true; `FitModel` will take care of this. Maybe it would be good to add a comment on this in the documentation. Is there a typical scenario where `spec_res` would be useful? Also, what happens if the model resolution is worse than that of the data?
-
Good suggestion, thank you!
-
Maybe the SNR I mentioned is a red herring. Certainly, interpolating the models means that we cannot trust more than the third or fourth significant digit to be physically meaningful, so anything past this can be removed from the input files (while of course still working numerically in double precision, or whatever is needed). That would dramatically reduce the file size.
There is more in species than `FitModel` 😉, so 1 and 6 seem too specific. The `spec_res` parameter of `add_model` was needed in an early version of the package. Currently, it only helps in terms of time/storage efficiency.
Ok about 1 😄. What happens if the model resolution is worse than that of the data, though? But for 6, I cannot imagine how an interpolated model could carry more than three or four physically meaningful significant digits; the rest is basically numerical noise… (But there is no real harm; it just costs storage space, transfer time, and elegance 😉.)
Not all routines require interpolation, so I think that your suggestion could introduce unwanted inaccuracies for some of the non-`FitModel` applications.
Hmm… Ok, as you wish… To be continued another time. Or just try truncating the input files and see how many bug reports start coming in 😁. Thanks in any case for this exchange!