bd-j / prospector Goto Github PK

View Code? Open in Web Editor NEW

154.0 25.0 71.0 37.03 MB

Python code for Stellar Population Inference from Spectra and SEDs

Home Page: http://prospect.readthedocs.io

License: MIT License

Python 99.92% Shell 0.08%

stellar-population-properties fitting

prospector's People

Contributors

Stargazers

Watchers

prospector's Issues

Nested sampling

Write a nested sampling backend using nestle. This goes with #17.

When reinitializing the sampler try to estimate and use the covariance matrix of the current walker positions (or rather a high probability subset) instead of using an uncorrelated gaussian centered on the highest probability walker.

dust attenuation

as per usual, dust attenuation is left to languish. some basic implementation should be added.

use George

allow user to use george+HODLR for the gaussian processes

Name Error: kwargs not defined

When I run in debug=True, I make it to the intermediate stage with no errors. Once it tries to go past that, this error pops up: [Final error at bottom]
Is this something on my end? It seems like on one of the functions is taking **kwargs but python is not handling it properly.

NameError Traceback (most recent call last)
/Users/imad/research/fsps/bsfh_fitting/prospector.py in ()
218 esampler, burn_p0, burn_prob0 = fitting.run_emcee_sampler(lnprobfn, initial_center, model,
219 postkwargs=postkwargs, pool=pool, initial_prob=initial_prob,
--> 220 **rp)
221 edur = time.time() - tstart
222 if rp['verbose']:

/Volumes/Berkeley/FSPS/bsfh/bsfh/fitting/fitterutils.pyc in run_emcee_sampler(lnprobf, initial_center, model, postargs, postkwargs, initial_prob, nwalkers, nburn, niter, walker_factor, initial_disp, nthreads, pool, verbose, **kwargs)
36 # Loop over the number of burn-in reintializations
37 for k, iburn in enumerate(nburn[:-1]):
---> 38 epos, eprob, state = esampler.run_mcmc(initial, iburn)
39 # find best walker position
40 # if multiple walkers in best position, cut down to one walker

/Library/Anaconda/lib/python2.7/site-packages/emcee/sampler.pyc in run_mcmc(self, pos0, N, rstate0, lnprob0, *_kwargs)
155 """
156 for results in self.sample(pos0, lnprob0, rstate0, iterations=N,
--> 157 *_kwargs):
158 pass
159 return results

/Library/Anaconda/lib/python2.7/site-packages/emcee/ensemble.pyc in sample(self, p0, lnprob0, rstate0, blobs0, iterations, thin, storechain, mh_proposal)
196 blobs = blobs0
197 if lnprob is None:
--> 198 lnprob, blobs = self._get_lnprob(p)
199
200 # Check to make sure that the probability function didn't return

/Library/Anaconda/lib/python2.7/site-packages/emcee/ensemble.pyc in _get_lnprob(self, pos)
380
381 # Run the log-probability calculations (optionally in parallel).
--> 382 results = list(M(self.lnprobfn, [p[i] for i in range(len(p))]))
383
384 try:

/Library/Anaconda/lib/python2.7/site-packages/emcee/ensemble.pyc in call(self, x)
503 def call(self, x):
504 try:
--> 505 return self.f(x, _self.args, *_self.kwargs)
506 except:
507 import traceback

/Users/imad/research/fsps/bsfh_fitting/prospector.py in lnprobfn(theta, model, obs, verbose)
84 t1 = time.time()
85 try:
---> 86 mu, phot, x = model.mean_model(theta, obs, sps=sps)
87 except(ValueError):
88 return -np.infty

/Volumes/Berkeley/FSPS/bsfh/bsfh/models/sedmodel.pyc in mean_model(self, theta, obs, sps, *_extras)
50 Any extra aspects of the model that are returned.
51 """
---> 52 s, p, x = self.sed(theta, obs, sps=sps, *_extras)
53 if obs.get('logify_spectrum', True):
54 s = np.log(s) + np.log(self.spec_calibration(obs=obs, **extras))

/Volumes/Berkeley/FSPS/bsfh/bsfh/models/sedmodel.pyc in sed(self, theta, obs, sps, *_kwargs)
85 spec, phot, extras = sps.get_spectrum(outwave=obs['wavelength'],
86 filters=obs['filters'],
---> 87 *_self.params)
88
89 spec *= obs.get('normalization_guess', 1.0)

/Volumes/Berkeley/FSPS/bsfh/bsfh/source_basis/galaxy_basis.pyc in get_spectrum(self, outwave, filters, **params)
346 SedModel class.
347 """
--> 348 self.params.update(**kwargs)
349 # Pass the model parameters through to the sps object
350 ncomp = len(self.params['mass'])

NameError: global name 'kwargs' is not defined

Problem in running test file

Hi all,
I am a new user of Python as well as of Prospector. After Installing Prospector with all dependencies (which are probably OK and verified !), I ran the program with demo_files provided for tutorial. I tried to execute following command
python prospector.py --param_file=demo_params.py --objid=0 --outfile=demo_obj0
and got this error
Traceback (most recent call last):
File "prospector.py", line 19, in
run_params = model_setup.get_run_params(argv=sargv, **clargs)
File "build/bdist.linux-x86_64/egg/prospect/models/model_setup.py", line 60, in get_run_params
File "build/bdist.linux-x86_64/egg/prospect/models/model_setup.py", line 163, in import_module_from_file
File "/home/iucaa/Softwares/anaconda2/lib/python2.7/importlib/init.py", line 37, in import_module
import(name)
File "demo_params.py", line 247
None})
^
Can anyone help me to sort it out ?

avoid creation of lsf.dat for spectral smoothing

This falls under the general category of making the FSPS/python-FSPS/prospector pipeline independent of the filesystem, such that two or more such tasks (in separate python processes) can peacefully coexistent on the same machine.

incremental sampling

take advantage of the emcee iterator to output the walker positions incrementally to a file so progress can be tracked in real time. Additionally implement easy restarts for the sampler given last location of chains.

use argparse?

Hey @bd-j,

What do you think about using argparse for the command-line arguments to make this more flexible and cleaner?

If you're interested and/or think this would be useful, I'd be happy to implement it. I'd also like to make it easier to pass model parameters at the command line.

I'm looking for some way to help out with development -- please let me know what you think.

Cheers.

refactorize covariance matrix only when GP changes

use the matrix determinant lemma and woodbury matrix identity formula to reuse factorized kernel matrix as diagonal changes

rest-frame/observed-frame smoothing and velocities

It would be nice to break up some of the smoothing logic to have rest-frame (i.e. physical velocity dispersion) and observed frame (i.e. instrumental) components.

It would also be good to split up logic for cosmological redshift and peculiar velocities (and possibly have a separate cosmological redshift parameter that can affect photometry only, if fitting spectra in the rest-frame)

filter projections with even velocity spacings

following bd-j/sedpy#6, need to interpolate the model spectrum onto the velocity grid of the filter transmission function for fast filter projections given arbitrary redshift.

Problem reading MILES library

I am using a Mac 10.12.6, gcc 7.1.0 (but also tried with v6.3.0), and I am finding a problem when running the first instruction in the tutorial:

python prospector.py --param_file=demo_params.py --objid=0 --outfile=demo_obj0

gives the following error:

SPS_SETUP ERROR: miles spectral library cannot be opened Z=0.

emission lines

in principle one could have a library of Cloudy models (with parameters of Q(H), , Z, and f(esc), following Charlot + Longhetti), but it may be better to simply allow for individual emission lines a la gandalf. This will add many more parameters to the model

Compatibility with emcee v3.0

After updating emcee by accident to (3.0.0.dev0)...I notice that prospector has some compatibility issue. I trace it down to:

https://github.com/bd-j/prospector/blob/master/prospect/fitting/ensemble.py#L193

 initial = resample_until_valid(sampler_ball, initial_center, disps, sampler.k,
                                   limits=limits, prior_check=model)

Apparently, sampler.k is gone, and now is sampler.nwalkers.

https://github.com/bd-j/prospector/blob/master/prospect/fitting/ensemble.py#L124
and https://github.com/bd-j/prospector/blob/master/prospect/fitting/ensemble.py#L198

    for k, iburn in enumerate(nburn):
        epos, eprob, state = sampler.run_mcmc(initial, iburn, storechain=True)

and

    for i, result in enumerate(esampler.sample(initial, iterations=niter,
                                               storechain=storechain)):

And, storechain is now just store.

These are the only two I noticed so far. I have fixed them in a private version, and I think it is pretty straightforward to add support of v3.0, something like:

if emcee.__version__[0] is '3':
     xxxx

Since the v3.0 version has been merged into the master branch of emcee, maybe this is worth considering.

chi vector option in likelihood

modify likelihood methods to optionally return a residual vector for use in least-squares algorithms like Levenburg-Marquardt

suppress "RuntimeWarning: divide by zero encountered in log" warning(?)

Is the following RuntimeWarning in models.priors.__call__ non-catastrophic?

/usr/local/anaconda3/envs/py35/lib/python3.6/site-packages/prospect-c556c0f-py3.6.egg/prospect/models/priors.py:141: RuntimeWarning: divide by zero encountered in log

If so, it would be nice if it was suppressed.

The PDF should always be positive but I guess sometimes it returns zero.

high resolution models

using high spectral resolution models will allow for metallicity constraints, though restricted to optical. velocity dispersion implementation should be tested and refined as well

Multiple spectra

Allow for multiple spectra to be fit, potentially sharing some parameters.

*** TypeError: sample() missing 1 required positional argument: 'self'

The following error occurs in fitting.minimizer.minimizer_ball when using the new L-M initial optimization option:

*** TypeError: sample() missing 1 required positional argument: 'self'

It's odd because I'm only using the TopHat and LogUniform prior classes, both of which inherit the Prior class defined in models.priors.

The issue goes away, however, if prior_args isn't used in the model_params dictionary. In other words, this works:

    model_params.append({                                                                              
        'name': 'mass',                                                                                
        'N': 1,                                                                                        
        'isfree': True,                                                                                
        'init':      mass,                                                                             
        'init_disp': 5e11,                                                                             
        'units': r'$M_{\odot}$',                                                                       
        'prior_function': priors.LogUniform(mini=1e8, maxi=1e13),                                      
        })

but this borks with the error above:

    model_params.append({                                                                              
        'name': 'mass',                                                                                
        'N': 1,                                                                                        
        'isfree': True,                                                                                
        'init':      mass,                                                                             
        'init_disp': 5e11,                                                                             
        'units': r'$M_{\odot}$',                                                                       
        'prior_function': priors.LogUniform,
        'prior_args': {'mini': 1e8, 'maxi': 1e13}
        })

Store code details in output

For maximum portability and reproducibility, the output should include the full git history, and the text of both the parameter file and the prospector script. This should wait on the hdf5 output to be implemented.

Outlier Mixture modeling

photometry sometimes has outliers. see #24

generalize lnprob

try to set up a system so that lnprob functions can be imported from a module, allowing for more flexibility in likelihood functions. This needs to work over MPI with likelihood functions that require unpickleable objects.

Error while running the interactive demo, or the deconstructed demo

I installed the prospector few months ago, although no able to use it due to some other issues. Recently I found that in the site two demos (interactive demo and deconstructed demo) have been added. I tried to follow the demos but getting following problem just before plotting the SEDs. Should I reinstall prospector ? Is 'ssp' a new attribute added in the code ? Please let me know ASAP. The error is here.

AttributeError Traceback (most recent call last)
in ()
11 if obs["wavelength"] is None:
12 # restframe spectral wavelengths, since obs["wavelength"] is None
---> 13 wspec = sps.wavelengths
14 wspec *= a #redshift them
15 else:

/home/rroy/Software/ANACONDA/anaconda2/lib/python2.7/site-packages/prospect-17f9326-py2.7.egg/prospect/sources/ssp_basis.pyc in wavelengths(self)
264 @Property
265 def wavelengths(self):
--> 266 return self.ssp.wavelengths
267
268

AttributeError: 'CSPSpecBasis' object has no attribute 'ssp'

get_spectrum clarification

Split up the get_spectrum method in the sources into smaller more digestible pieces. Ideally, move it to the SedModel class or otherwise use a suite of shared methods so it doesn't have to be written for each source type separately.

Also clarify units.

GP for photometry

implement a gaussian process for photometric data, to deal with outliers in SEDs

initial parameters checking

When they are determined from the model_params list, there should be a check to make sure that they are within the priors and that they are not zero.

Emission lines

Take emission line luminosities from FSPS and add them to spectra (and photometry).

This should use flags/arguments to specify the number of sigmas from line center to go out, and whether to coadd the lines together into a single spectrum (and into the continuum) or keep them separate (for possible use with gradient based methods)

Also use a line mask to choose which lines to actually calculate/use

Efficient looping over objects

We need to allow for looping over objects with the same mpi pool and globals to avoid startup penalties. Also, we could parallelize optimization between objects instead of having many optimizers for each object.

error when generating spectra

Hi Ben et al.,

I'm bumping into some issues with array broadcasting when using the sps function to generate a spectra.

Following the workflow in the docs I attempt to extract the Maximum a posteriori model spectra for the photometric data.

import prospect.io.read_results as bread
from prospect.sources import CSPBasis

res, pr, mod = bread.results_from('zfourge_test_128w_1024i_1488030812_1108_mcmc.h5')
res['obs']['wavelength'] = np.arange(1,100000, 1)

sps = CSPBasis(**res['run_params'])

ind_max = res["lnprobability"].argmax()
walker, iteration = np.unravel_index(ind_max, res["lnprobability"].shape)

spec, phot, mfrac = mod.mean_model(res['chain'][walker, iteration, :], obs=res['obs'], sps=sps,  peraa=True)

When attempting to extract the mean_model I get a value error. The attached folder contains a screenshot of the full error report and the outputs used to extract the spectra.
files.zip

This is driven by the extra dimensions in the spectra (possibly for the extra time-steps?) generated when executing a parameter file similar to @jrleja 's non-parametric SFHs: https://github.com/jrleja/threedhst_bsfh/tree/master/parameter_files/brownseds_np

Simply transposing the spectra to fit numpy broadcasting rules fixes this issue but there are multiple errors afterwards due to the extra dimensions in the spectrum array.

So I was wondering you have an different approach to extracting best-fit models for such param files?
When I use the same structure as the demo param file this works fine. However, the best-fit parameters (including the redshift) doesn't get sampled properly, for the zfourge z=2 galaxies.
So I thought to use similar parameters to Joel's study at z~0. This is when I run into these issues.

`AssertionError: ``sourcewave`` not in ascending order` error when running NestedDemo.ipynb

Whether or not I use the master branch or version 0.1.0 of sedpy installed by python setup.py install I get the error below. Thoughts / suggestions?

Blobs

allow for blobs to be returned by the lnprobfn, related #13

sample_results chain from *mcmc file is empty

the .h5 file is fine. but the output sample chain from the mcmc file has (0) for one of its dimensions (presumably niter). suspect this is related to the dynamic resizing of the chain variable implemented during the convergence checks?

installation problem in python 3.5

/Users/dweisz/gitcode/prospector 10:09am > python -V
Python 3.5.2 :: Anaconda 4.1.1 (x86_64)
/Users/dweisz/gitcode/prospector 10:09am > python setup.py install
Traceback (most recent call last):
File "setup.py", line 16, in
vers = subprocess.check_output(["git", "log", "--format=%h"]).split('\n')[0]
TypeError: a bytes-like object is required, not 'str'

combine ProspectrParams and ThetaParameters classes

Having all these different parameter classes floating around holding states and talking to each other is confusing. Put all the model state in one place along with the fit parameters. While we're at it, streamline the initialization of this new Parameter class and the adding of observational data.

Tutorials update

Switch docs and tutorials to new prior objects and to CSPSpecBasis instead of CSPBasis.
Add a demonstration of output using the mocks.
Add nested sampling tutorial.

How are non-detections treated?

Not a bug report but more of a clarification:
How does Prospector handle non-detections? (i.e. galaxies within the FOV of a given filter but with no continuum detection)
In those cases is it better to keep the fluxes to the 1-sigma sky detection level, or is there a specific flag?

Convergence criteria and sampling restarts

It would be nice to have convergence criteria that are checked during sampling.

Relatedly, we need a mechanism to efficiently/easily restart sampling from a particular (set of) walker locations

writing and reading CSPSpecBasis object

Question: I'd like to be able to write out and read a CSPSpecBasis object, since it takes about ~20 seconds to initialize every time the code is called.

Writing out a basic pickle file generically works but after reading it back in the code borks when the sedmodel.SedModel.mean_model method gets called (in which the CSPSpecBasis object is passed via the sps input) with the message

 SSP_GEN ERROR0: SPS_SETUP must be run once before calling SSP_GEN.

Any thoughts / suggestions?

Incompatibility with Scipy=1.1.0: `ValueError: The array returned by a function changed size between calls`

I tried executing the commands in the demo/DeconstructedDemo.ipynb notebook using an Anaconda installation of Python 3.6 with SciPy version 1.1.0 and Numpy version 1.12. However, I encounter the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-25-f5c3ccf19dcb> in <module>()
     13     res = least_squares(chivecfn, np.array(pinit), method='lm', x_scale='jac',
     14                         xtol=run_params["xtol"], ftol=run_params["ftol"],
---> 15                         max_nfev=run_params["maxfev"])
     16     guesses.append(res)
     17 

~/anaconda2/envs/python3/lib/python3.6/site-packages/scipy/optimize/_lsq/least_squares.py in least_squares(fun, x0, jac, bounds, method, ftol, xtol, gtol, x_scale, loss, f_scale, diff_step, tr_solver, tr_options, jac_sparsity, max_nfev, verbose, args, kwargs)
    901     if method == 'lm':
    902         result = call_minpack(fun_wrapped, x0, jac_wrapped, ftol, xtol, gtol,
--> 903                               max_nfev, x_scale, diff_step)
    904 
    905     elif method == 'trf':

~/anaconda2/envs/python3/lib/python3.6/site-packages/scipy/optimize/_lsq/least_squares.py in call_minpack(fun, x0, jac, ftol, xtol, gtol, max_nfev, x_scale, diff_step)
     65         x, info, status = _minpack._lmdif(
     66             fun, x0, (), full_output, ftol, xtol, gtol,
---> 67             max_nfev, epsfcn, factor, diag)
     68     else:
     69         if max_nfev is None:

ValueError: The array returned by a function changed size between calls

I checked to make sure that the array pinit does not change size, and also ensured that chivecfn(pinit) evaluates correctly.

After trying to debug this, I eventually downgraded my Scipy version to 0.19 (with which, when installed on Python 2.7, I had no problem running the code). During the downgrade, Numpy, blas, and several other packages were updated. The demo notebook worked without a problem after then, so it seems like Scipy version 1.1 was the issue.

I'm curious if anybody else has had difficulties running prospector using Scipy version >0.19?

Prospector literature reference

What's the preferred citation for published work that uses Prospector?

New error- model.theta_bounds() not working

So up till a few days ago I was running prospector fine. I haven't really changed anything about my parameter file, but now when I try to run it, I get this error:

/Volumes/Berkeley/FSPS/bsfh/bsfh/models/parameters.pyc in theta_bounds(self)
221 for p, v in list(self.theta_index.items()):
222 sz = v[1] - v[0]
--> 223 pb = priors.plotting_range(self._config_dict[p]['prior_args'])
224 if sz == 1:
225 bounds[v[0]] = pb

key error: 'prior_args'

This happens when running prospector, or , when I use debug mode I can manually ask for model.theta_bounds() and the same error gets thrown, which is what's causing the error during the main run of the code. I don't know how to resolve it because it seems to be saying 'prior_args' is not a valid key for the config dictionary, which I can't manually access via model to bugtest.

Thanks!

Prior objects

use objects for priors, with methods giving lnp_prior as well as samplings from the prior

obs dictionary as object

Making the obs dictionary into a full fledged Observations object might make it clearer to handle and interact with in many cases.

Incremental saving of dynesty results

Like for the emcee fitting, we should have an option for periodically saving the dynesty results to disk during a run (the current setup already uses the dynesty sampling iterator) to guard against cancelled jobs and help in debugging. Also should probably keep the output from flushing stderr, since this overwrites other diagnostic info.

make model from pickled hdf5 info

Loading and unpickling details stored in HDF5 output, then using them to generate the model for a given location on the chain:

    with h5py.File('chains.hdf5,'r') as f:
        chain = f['sampling']['chain'][()]
        model_params = pickle.loads(f.attrs['model_params'])
        run_params = pickle.loads(f.attrs['run_params'])

    # reconstruct model
    model = sedmodel.SedModel(model_params)

    samples = chain.reshape( (-1,chain.shape[2]) )
    local_result = samples[0,:]
    spec, phot, mfrac = model.mean_model(local_result, obs=obs, sps=sps)

produces an error

TypeError: 'list' object is not callable
> ./prospect/models/parameters.py(146)propagate_parameter_dependencies()
--> 146                 value = info['depends_on'](**self.params)

I fixed it with this rather suspicious code?

    def propagate_parameter_dependencies(self):
        """Propogate any parameter dependecies. That is, for parameters whose
        value depends on another parameter, calculate those values and store
        them in the ``params`` dictionary.
        """
        if self._has_parameter_dependencies is False:
            return
        for p, info in list(self._config_dict.items()):
            if 'depends_on' in info:
                if type(info['depends_on']) is list:
                    import importlib
                    module = importlib.import_module(info['depends_on'][1])
                    func = getattr(module, info['depends_on'][0])
                else:
                    func = info['depends_on']
                value = func(**self.params)
                self.params[p] = np.atleast_1d(value)

HDF5 output

pickle files are stupid. store in hdf5, with serialized versions of everything that isn't an array as giant text strings

CSPSpecBasis missing vactoair_flag optional input

CSPSpecBasis does not accept the vactoair_flag optional input at instantiation, but it should.

have sps and sedmodel objects list their available parameters

To aid in clarity, the various object classes might store lists of the available fittable parameters. I.e. CSPSpecBasis might have an attribute available_parameters that is a list that includes all the fsps.StellarPopulation parameters as well as things like mass, lumdist, wavecal_coeffs, while the SedModel class would have poly_coeffs and spec_norm.

Transformed parameters would be a bit trickier to incorporate.

Load_filters() appending path to Anaconda lib?

(This is Imad Pasha, working with Mariska)
In my params file, I am importing load_filters from sedpy.observate. It is correctly reading my list of filter names, but is throwing the error:

Filter transmission file /Library/Anaconda/lib/python2.7/site-packages/sedpy-0.1.0-py2.7.egg/sedpy/data/filters/filter_001.par does not exist!

Clearly the second half of the path (starting with /sedpy) is correct, but it's being appended to some strange location that seems to be related to sedpy's installation as a package for anaconda. I looked up the observate file in sedpy and didn't see anywhere in load_filters to specify a specific path to the sedpy folder, for it to append to. How do I set that up?

bd-j / prospector Goto Github PK

prospector's People

Contributors

Stargazers

Watchers

Forkers

prospector's Issues

Recommend Projects

Recommend Topics

Recommend Org