bd-j / prospector Goto Github PK
View Code? Open in Web Editor NEWPython code for Stellar Population Inference from Spectra and SEDs
Home Page: http://prospect.readthedocs.io
License: MIT License
Python code for Stellar Population Inference from Spectra and SEDs
Home Page: http://prospect.readthedocs.io
License: MIT License
Write a nested sampling backend using nestle. This goes with #17.
When reinitializing the sampler try to estimate and use the covariance matrix of the current walker positions (or rather a high probability subset) instead of using an uncorrelated gaussian centered on the highest probability walker.
as per usual, dust attenuation is left to languish. some basic implementation should be added.
allow user to use george+HODLR for the gaussian processes
When I run in debug=True, I make it to the intermediate stage with no errors. Once it tries to go past that, this error pops up: [Final error at bottom]
Is this something on my end? It seems like on one of the functions is taking **kwargs but python is not handling it properly.
NameError Traceback (most recent call last)
/Users/imad/research/fsps/bsfh_fitting/prospector.py in ()
218 esampler, burn_p0, burn_prob0 = fitting.run_emcee_sampler(lnprobfn, initial_center, model,
219 postkwargs=postkwargs, pool=pool, initial_prob=initial_prob,
--> 220 **rp)
221 edur = time.time() - tstart
222 if rp['verbose']:
/Volumes/Berkeley/FSPS/bsfh/bsfh/fitting/fitterutils.pyc in run_emcee_sampler(lnprobf, initial_center, model, postargs, postkwargs, initial_prob, nwalkers, nburn, niter, walker_factor, initial_disp, nthreads, pool, verbose, **kwargs)
36 # Loop over the number of burn-in reintializations
37 for k, iburn in enumerate(nburn[:-1]):
---> 38 epos, eprob, state = esampler.run_mcmc(initial, iburn)
39 # find best walker position
40 # if multiple walkers in best position, cut down to one walker
/Library/Anaconda/lib/python2.7/site-packages/emcee/sampler.pyc in run_mcmc(self, pos0, N, rstate0, lnprob0, *_kwargs)
155 """
156 for results in self.sample(pos0, lnprob0, rstate0, iterations=N,
--> 157 *_kwargs):
158 pass
159 return results
/Library/Anaconda/lib/python2.7/site-packages/emcee/ensemble.pyc in sample(self, p0, lnprob0, rstate0, blobs0, iterations, thin, storechain, mh_proposal)
196 blobs = blobs0
197 if lnprob is None:
--> 198 lnprob, blobs = self._get_lnprob(p)
199
200 # Check to make sure that the probability function didn't return
/Library/Anaconda/lib/python2.7/site-packages/emcee/ensemble.pyc in _get_lnprob(self, pos)
380
381 # Run the log-probability calculations (optionally in parallel).
--> 382 results = list(M(self.lnprobfn, [p[i] for i in range(len(p))]))
383
384 try:
/Library/Anaconda/lib/python2.7/site-packages/emcee/ensemble.pyc in call(self, x)
503 def call(self, x):
504 try:
--> 505 return self.f(x, _self.args, *_self.kwargs)
506 except:
507 import traceback
/Users/imad/research/fsps/bsfh_fitting/prospector.py in lnprobfn(theta, model, obs, verbose)
84 t1 = time.time()
85 try:
---> 86 mu, phot, x = model.mean_model(theta, obs, sps=sps)
87 except(ValueError):
88 return -np.infty
/Volumes/Berkeley/FSPS/bsfh/bsfh/models/sedmodel.pyc in mean_model(self, theta, obs, sps, *_extras)
50 Any extra aspects of the model that are returned.
51 """
---> 52 s, p, x = self.sed(theta, obs, sps=sps, *_extras)
53 if obs.get('logify_spectrum', True):
54 s = np.log(s) + np.log(self.spec_calibration(obs=obs, **extras))
/Volumes/Berkeley/FSPS/bsfh/bsfh/models/sedmodel.pyc in sed(self, theta, obs, sps, *_kwargs)
85 spec, phot, extras = sps.get_spectrum(outwave=obs['wavelength'],
86 filters=obs['filters'],
---> 87 *_self.params)
88
89 spec *= obs.get('normalization_guess', 1.0)
/Volumes/Berkeley/FSPS/bsfh/bsfh/source_basis/galaxy_basis.pyc in get_spectrum(self, outwave, filters, **params)
346 SedModel class.
347 """
--> 348 self.params.update(**kwargs)
349 # Pass the model parameters through to the sps object
350 ncomp = len(self.params['mass'])
NameError: global name 'kwargs' is not defined
Hi all,
I am a new user of Python as well as of Prospector. After Installing Prospector with all dependencies (which are probably OK and verified !), I ran the program with demo_files provided for tutorial. I tried to execute following command
python prospector.py --param_file=demo_params.py --objid=0 --outfile=demo_obj0
and got this error
Traceback (most recent call last):
File "prospector.py", line 19, in
run_params = model_setup.get_run_params(argv=sargv, **clargs)
File "build/bdist.linux-x86_64/egg/prospect/models/model_setup.py", line 60, in get_run_params
File "build/bdist.linux-x86_64/egg/prospect/models/model_setup.py", line 163, in import_module_from_file
File "/home/iucaa/Softwares/anaconda2/lib/python2.7/importlib/init.py", line 37, in import_module
import(name)
File "demo_params.py", line 247
None})
^
Can anyone help me to sort it out ?
This falls under the general category of making the FSPS/python-FSPS/prospector pipeline independent of the filesystem, such that two or more such tasks (in separate python processes) can peacefully coexistent on the same machine.
take advantage of the emcee iterator to output the walker positions incrementally to a file so progress can be tracked in real time. Additionally implement easy restarts for the sampler given last location of chains.
Hey @bd-j,
What do you think about using argparse for the command-line arguments to make this more flexible and cleaner?
If you're interested and/or think this would be useful, I'd be happy to implement it. I'd also like to make it easier to pass model parameters at the command line.
I'm looking for some way to help out with development -- please let me know what you think.
Cheers.
use the matrix determinant lemma and woodbury matrix identity formula to reuse factorized kernel matrix as diagonal changes
It would be nice to break up some of the smoothing logic to have rest-frame (i.e. physical velocity dispersion) and observed frame (i.e. instrumental) components.
It would also be good to split up logic for cosmological redshift and peculiar velocities (and possibly have a separate cosmological redshift parameter that can affect photometry only, if fitting spectra in the rest-frame)
following bd-j/sedpy#6, need to interpolate the model spectrum onto the velocity grid of the filter transmission function for fast filter projections given arbitrary redshift.
I am using a Mac 10.12.6, gcc 7.1.0 (but also tried with v6.3.0), and I am finding a problem when running the first instruction in the tutorial:
python prospector.py --param_file=demo_params.py --objid=0 --outfile=demo_obj0
gives the following error:
SPS_SETUP ERROR: miles spectral library cannot be opened Z=0.
in principle one could have a library of Cloudy models (with parameters of Q(H), , Z, and f(esc), following Charlot + Longhetti), but it may be better to simply allow for individual emission lines a la gandalf. This will add many more parameters to the model
After updating emcee
by accident to (3.0.0.dev0
)...I notice that prospector
has some compatibility issue. I trace it down to:
initial = resample_until_valid(sampler_ball, initial_center, disps, sampler.k,
limits=limits, prior_check=model)
Apparently, sampler.k
is gone, and now is sampler.nwalkers
.
for k, iburn in enumerate(nburn):
epos, eprob, state = sampler.run_mcmc(initial, iburn, storechain=True)
and
for i, result in enumerate(esampler.sample(initial, iterations=niter,
storechain=storechain)):
And, storechain
is now just store
.
These are the only two I noticed so far. I have fixed them in a private version, and I think it is pretty straightforward to add support of v3.0
, something like:
if emcee.__version__[0] is '3':
xxxx
Since the v3.0
version has been merged into the master branch of emcee
, maybe this is worth considering.
modify likelihood methods to optionally return a residual vector for use in least-squares algorithms like Levenburg-Marquardt
Is the following RuntimeWarning in models.priors.__call__
non-catastrophic?
/usr/local/anaconda3/envs/py35/lib/python3.6/site-packages/prospect-c556c0f-py3.6.egg/prospect/models/priors.py:141: RuntimeWarning: divide by zero encountered in log
If so, it would be nice if it was suppressed.
The PDF should always be positive but I guess sometimes it returns zero.
using high spectral resolution models will allow for metallicity constraints, though restricted to optical. velocity dispersion implementation should be tested and refined as well
Allow for multiple spectra to be fit, potentially sharing some parameters.
The following error occurs in fitting.minimizer.minimizer_ball
when using the new L-M initial optimization option:
*** TypeError: sample() missing 1 required positional argument: 'self'
It's odd because I'm only using the TopHat
and LogUniform
prior classes, both of which inherit the Prior
class defined in models.priors
.
The issue goes away, however, if prior_args
isn't used in the model_params
dictionary. In other words, this works:
model_params.append({
'name': 'mass',
'N': 1,
'isfree': True,
'init': mass,
'init_disp': 5e11,
'units': r'$M_{\odot}$',
'prior_function': priors.LogUniform(mini=1e8, maxi=1e13),
})
but this borks with the error above:
model_params.append({
'name': 'mass',
'N': 1,
'isfree': True,
'init': mass,
'init_disp': 5e11,
'units': r'$M_{\odot}$',
'prior_function': priors.LogUniform,
'prior_args': {'mini': 1e8, 'maxi': 1e13}
})
For maximum portability and reproducibility, the output should include the full git history, and the text of both the parameter file and the prospector script. This should wait on the hdf5 output to be implemented.
photometry sometimes has outliers. see #24
try to set up a system so that lnprob functions can be imported from a module, allowing for more flexibility in likelihood functions. This needs to work over MPI with likelihood functions that require unpickleable objects.
AttributeError Traceback (most recent call last)
in ()
11 if obs["wavelength"] is None:
12 # restframe spectral wavelengths, since obs["wavelength"] is None
---> 13 wspec = sps.wavelengths
14 wspec *= a #redshift them
15 else:
/home/rroy/Software/ANACONDA/anaconda2/lib/python2.7/site-packages/prospect-17f9326-py2.7.egg/prospect/sources/ssp_basis.pyc in wavelengths(self)
264 @Property
265 def wavelengths(self):
--> 266 return self.ssp.wavelengths
267
268
AttributeError: 'CSPSpecBasis' object has no attribute 'ssp'
Split up the get_spectrum method in the sources into smaller more digestible pieces. Ideally, move it to the SedModel class or otherwise use a suite of shared methods so it doesn't have to be written for each source type separately.
Also clarify units.
implement a gaussian process for photometric data, to deal with outliers in SEDs
When they are determined from the model_params list, there should be a check to make sure that they are within the priors and that they are not zero.
Take emission line luminosities from FSPS and add them to spectra (and photometry).
This should use flags/arguments to specify the number of sigmas from line center to go out, and whether to coadd the lines together into a single spectrum (and into the continuum) or keep them separate (for possible use with gradient based methods)
Also use a line mask to choose which lines to actually calculate/use
We need to allow for looping over objects with the same mpi pool and globals to avoid startup penalties. Also, we could parallelize optimization between objects instead of having many optimizers for each object.
Hi Ben et al.,
I'm bumping into some issues with array broadcasting when using the sps function to generate a spectra.
Following the workflow in the docs I attempt to extract the Maximum a posteriori model spectra for the photometric data.
import prospect.io.read_results as bread
from prospect.sources import CSPBasis
res, pr, mod = bread.results_from('zfourge_test_128w_1024i_1488030812_1108_mcmc.h5')
res['obs']['wavelength'] = np.arange(1,100000, 1)
sps = CSPBasis(**res['run_params'])
ind_max = res["lnprobability"].argmax()
walker, iteration = np.unravel_index(ind_max, res["lnprobability"].shape)
spec, phot, mfrac = mod.mean_model(res['chain'][walker, iteration, :], obs=res['obs'], sps=sps, peraa=True)
When attempting to extract the mean_model I get a value error. The attached folder contains a screenshot of the full error report and the outputs used to extract the spectra.
files.zip
This is driven by the extra dimensions in the spectra (possibly for the extra time-steps?) generated when executing a parameter file similar to @jrleja 's non-parametric SFHs: https://github.com/jrleja/threedhst_bsfh/tree/master/parameter_files/brownseds_np
Simply transposing the spectra to fit numpy broadcasting rules fixes this issue but there are multiple errors afterwards due to the extra dimensions in the spectrum array.
So I was wondering you have an different approach to extracting best-fit models for such param files?
When I use the same structure as the demo param file this works fine. However, the best-fit parameters (including the redshift) doesn't get sampled properly, for the zfourge z=2 galaxies.
So I thought to use similar parameters to Joel's study at z~0. This is when I run into these issues.
allow for blobs to be returned by the lnprobfn, related #13
the .h5 file is fine. but the output sample chain from the mcmc file has (0) for one of its dimensions (presumably niter). suspect this is related to the dynamic resizing of the chain variable implemented during the convergence checks?
/Users/dweisz/gitcode/prospector 10:09am > python -V
Python 3.5.2 :: Anaconda 4.1.1 (x86_64)
/Users/dweisz/gitcode/prospector 10:09am > python setup.py install
Traceback (most recent call last):
File "setup.py", line 16, in
vers = subprocess.check_output(["git", "log", "--format=%h"]).split('\n')[0]
TypeError: a bytes-like object is required, not 'str'
Having all these different parameter classes floating around holding states and talking to each other is confusing. Put all the model state in one place along with the fit parameters. While we're at it, streamline the initialization of this new Parameter class and the adding of observational data.
Not a bug report but more of a clarification:
How does Prospector handle non-detections? (i.e. galaxies within the FOV of a given filter but with no continuum detection)
In those cases is it better to keep the fluxes to the 1-sigma sky detection level, or is there a specific flag?
It would be nice to have convergence criteria that are checked during sampling.
Relatedly, we need a mechanism to efficiently/easily restart sampling from a particular (set of) walker locations
Question: I'd like to be able to write out and read a CSPSpecBasis
object, since it takes about ~20 seconds to initialize every time the code is called.
Writing out a basic pickle file generically works but after reading it back in the code borks when the sedmodel.SedModel.mean_model
method gets called (in which the CSPSpecBasis
object is passed via the sps
input) with the message
SSP_GEN ERROR0: SPS_SETUP must be run once before calling SSP_GEN.
Any thoughts / suggestions?
I tried executing the commands in the demo/DeconstructedDemo.ipynb
notebook using an Anaconda installation of Python 3.6 with SciPy version 1.1.0 and Numpy version 1.12. However, I encounter the following error:
ValueError Traceback (most recent call last)
<ipython-input-25-f5c3ccf19dcb> in <module>()
13 res = least_squares(chivecfn, np.array(pinit), method='lm', x_scale='jac',
14 xtol=run_params["xtol"], ftol=run_params["ftol"],
---> 15 max_nfev=run_params["maxfev"])
16 guesses.append(res)
17
~/anaconda2/envs/python3/lib/python3.6/site-packages/scipy/optimize/_lsq/least_squares.py in least_squares(fun, x0, jac, bounds, method, ftol, xtol, gtol, x_scale, loss, f_scale, diff_step, tr_solver, tr_options, jac_sparsity, max_nfev, verbose, args, kwargs)
901 if method == 'lm':
902 result = call_minpack(fun_wrapped, x0, jac_wrapped, ftol, xtol, gtol,
--> 903 max_nfev, x_scale, diff_step)
904
905 elif method == 'trf':
~/anaconda2/envs/python3/lib/python3.6/site-packages/scipy/optimize/_lsq/least_squares.py in call_minpack(fun, x0, jac, ftol, xtol, gtol, max_nfev, x_scale, diff_step)
65 x, info, status = _minpack._lmdif(
66 fun, x0, (), full_output, ftol, xtol, gtol,
---> 67 max_nfev, epsfcn, factor, diag)
68 else:
69 if max_nfev is None:
ValueError: The array returned by a function changed size between calls
I checked to make sure that the array pinit
does not change size, and also ensured that chivecfn(pinit)
evaluates correctly.
After trying to debug this, I eventually downgraded my Scipy version to 0.19 (with which, when installed on Python 2.7, I had no problem running the code). During the downgrade, Numpy, blas, and several other packages were updated. The demo notebook worked without a problem after then, so it seems like Scipy version 1.1 was the issue.
I'm curious if anybody else has had difficulties running prospector
using Scipy version >0.19?
What's the preferred citation for published work that uses Prospector?
So up till a few days ago I was running prospector fine. I haven't really changed anything about my parameter file, but now when I try to run it, I get this error:
/Volumes/Berkeley/FSPS/bsfh/bsfh/models/parameters.pyc in theta_bounds(self)
221 for p, v in list(self.theta_index.items()):
222 sz = v[1] - v[0]
--> 223 pb = priors.plotting_range(self._config_dict[p]['prior_args'])
224 if sz == 1:
225 bounds[v[0]] = pb
key error: 'prior_args'
This happens when running prospector, or , when I use debug mode I can manually ask for model.theta_bounds() and the same error gets thrown, which is what's causing the error during the main run of the code. I don't know how to resolve it because it seems to be saying 'prior_args' is not a valid key for the config dictionary, which I can't manually access via model to bugtest.
Thanks!
use objects for priors, with methods giving lnp_prior as well as samplings from the prior
Making the obs
dictionary into a full fledged Observations
object might make it clearer to handle and interact with in many cases.
Like for the emcee
fitting, we should have an option for periodically saving the dynesty results to disk during a run (the current setup already uses the dynesty sampling iterator) to guard against cancelled jobs and help in debugging. Also should probably keep the output from flushing stderr, since this overwrites other diagnostic info.
Loading and unpickling details stored in HDF5 output, then using them to generate the model for a given location on the chain:
with h5py.File('chains.hdf5,'r') as f:
chain = f['sampling']['chain'][()]
model_params = pickle.loads(f.attrs['model_params'])
run_params = pickle.loads(f.attrs['run_params'])
# reconstruct model
model = sedmodel.SedModel(model_params)
samples = chain.reshape( (-1,chain.shape[2]) )
local_result = samples[0,:]
spec, phot, mfrac = model.mean_model(local_result, obs=obs, sps=sps)
produces an error
TypeError: 'list' object is not callable
> ./prospect/models/parameters.py(146)propagate_parameter_dependencies()
--> 146 value = info['depends_on'](**self.params)
I fixed it with this rather suspicious code?
def propagate_parameter_dependencies(self):
"""Propogate any parameter dependecies. That is, for parameters whose
value depends on another parameter, calculate those values and store
them in the ``params`` dictionary.
"""
if self._has_parameter_dependencies is False:
return
for p, info in list(self._config_dict.items()):
if 'depends_on' in info:
if type(info['depends_on']) is list:
import importlib
module = importlib.import_module(info['depends_on'][1])
func = getattr(module, info['depends_on'][0])
else:
func = info['depends_on']
value = func(**self.params)
self.params[p] = np.atleast_1d(value)
pickle files are stupid. store in hdf5, with serialized versions of everything that isn't an array as giant text strings
CSPSpecBasis
does not accept the vactoair_flag
optional input at instantiation, but it should.
To aid in clarity, the various object classes might store lists of the available fittable parameters. I.e. CSPSpecBasis
might have an attribute available_parameters
that is a list that includes all the fsps.StellarPopulation
parameters as well as things like mass
, lumdist
, wavecal_coeffs
, while the SedModel class would have poly_coeffs
and spec_norm
.
Transformed parameters would be a bit trickier to incorporate.
(This is Imad Pasha, working with Mariska)
In my params file, I am importing load_filters from sedpy.observate. It is correctly reading my list of filter names, but is throwing the error:
Filter transmission file /Library/Anaconda/lib/python2.7/site-packages/sedpy-0.1.0-py2.7.egg/sedpy/data/filters/filter_001.par does not exist!
Clearly the second half of the path (starting with /sedpy) is correct, but it's being appended to some strange location that seems to be related to sedpy's installation as a package for anaconda. I looked up the observate file in sedpy and didn't see anywhere in load_filters to specify a specific path to the sedpy folder, for it to append to. How do I set that up?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.