princetonuniversity / fasttemplateperiodogram Goto Github PK
View Code? Open in Web Editor NEWNlogN algorithm for least-squares fitting of periodic templates to noisy, non-equispaced time-series data.
License: MIT License
NlogN algorithm for least-squares fitting of periodic templates to noisy, non-equispaced time-series data.
License: MIT License
In this line you doubled the indices, i.e. CCh[:H][:H]
. Did you mean them to be CCh[:H, :H]
? Because in this context CCh[:H][:H]
is identical to CCh[:H]
.
When trying to figure out what was going on there, I had a really hard time tracing down what sums
is supposed to be. You might consider using a dictionary or a namedtuple
to make the code easier to follow.
My guess would be that the compute_summations
function could be made much faster. How does this output compare to the standard Lomb-Scargle summations in, e.g. astropy.stats.LombScargle
?
I just released a lightweight package that computes fast NFFTs, using only numpy
and scipy
rather than a custom compiled package. It's a much lighter dependency (pure Python, pip installable), and the speed is comparable to pynfft
(faster in many cases).
Take a look, and let me know what you think. It might be worth changing to depend on that just for the sake of simplifying our installation requirements here.
a = PseudoPolynomial(p=[ 0, 1, 4 ], q=[2, 3, 0], r=0)
b = Polynomial([ 1, 4, 6 ])
prod_ab = a * b
prod_ba = b * a
The a*b
product is a PseudoPolynomial
, however the b*a
product is a Polynomial
; I'm pretty sure this is because Python will attempt to evaluate b.__mul__(a)
before a.__rmul__(b)
, and I don't see any way to work around that...
Any suggestions or is this a necessary evil? It doesn't affect the rest of the code in any way, and the "rest of the code" is the entire reason the PseudoPolynomial
class exists in the first place, so this is an extremely minor problem.
Currently, it's not clear from the package layout which routines are meant to be user-facing, and which routines are meant as utilities. Probably the easiest way to address this is to import all objects which are intended to be user-facing within tempfit/__init__.py
Recent commits make tests pretty slow (1.5 minutes on my computer): most of this time is spent in test_modeler.py
. If there are ways to speed this up (e.g. by not computing such large grids) it would facilitate further development.
This seems like a bit of a hack to me, and I'm worried about its implications. It would be useful to have this tunable at the highest level, so we can make sure it's not impacting results negatively.
pyftp is already taken.
I'm trying to do some basic imports to start to understand how the code works, and I'm running into errors immediately. This error comes up on both Python 2.7 and Python 3.6. Any ideas what might be happening here?
In [1]: from pyftp.rrlyrae import FastRRLyraeTemplateModeler
In [2]: modeler = FastRRLyraeTemplateModeler()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-06871de6eacb> in <module>()
----> 1 modeler = FastRRLyraeTemplateModeler()
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/rrlyrae.pyc in __init__(self, filts, redo, **kwargs)
92 self.params['redo'] = redo
93 self.params['filts'] = self.filts
---> 94 self._load_templates()
95
96 def _load_templates(self):
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/rrlyrae.pyc in _load_templates(self)
97 pars = [ 'filts', 'template_fname', 'errfunc', 'stop', 'nharmonics', 'redo' ]
98 kwargs = { par : self.params[par] for par in pars if par in self.params }
---> 99 self.templates = get_rrlyr_templates(**kwargs)
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/rrlyrae.pyc in get_rrlyr_templates(template_fname, errfunc, stop, filts, nharmonics, redo)
41 ftp_templates = { ID : Template(phase=T, y=Y, errfunc=errfunc,
42 nharmonics=nharmonics, stop=stop).precompute() \
---> 43 for ID, T, Y in zip(IDs, Ts, Ys) }
44 #print "done"
45 if not template_fname is None:
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/rrlyrae.pyc in <dictcomp>((ID, T, Y))
41 ftp_templates = { ID : Template(phase=T, y=Y, errfunc=errfunc,
42 nharmonics=nharmonics, stop=stop).precompute() \
---> 43 for ID, T, Y in zip(IDs, Ts, Ys) }
44 #print "done"
45 if not template_fname is None:
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/modeler.pyc in precompute(self)
213 #print self.cn
214 #print self.sn
--> 215 self.pvectors = ftp.get_polynomial_vectors(self.cn, self.sn, sgn=1)
216
217 #print "computing ptensors"
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/pseudo_poly.pyc in get_polynomial_vectors(cn, sn, sgn)
346 A_n, B_n, and their derivatives
347 """
--> 348 A = ABpoly(cn, sn, sgn, 0)
349 B = ABpoly(cn, sn, sgn, 1)
350
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/pseudo_poly.pyc in <lambda>(c, s, sgn, alpha)
278 ABpoly = lambda c, s, sgn, alpha : [ Afunc_pp(n+1, C if alpha == 0 else S,
279 S if alpha == 0 else -C, sgn) \
--> 280 for n, (C, S) in enumerate(zip(c, s)) ]
281
282 # Hardcoded, should probably be double checked but this
/Users/jakevdp/github/PrincetonUniversity/FastTemplatePeriodogram/pyftp/pseudo_poly.pyc in <lambda>(n, p, q, sgn)
270 # An (or Bn) as a PseudoPolynomial
271 Afunc_pp = lambda n, p, q, sgn : PseudoPolynomial( \
--> 272 p= p * np.array(chebyt(n).coef)[::-1],
273 q= - sgn * q * np.array(chebyu(n-1).coef)[::-1] \
274 if n > 0 else np.array([0]),
/Users/jakevdp/anaconda/envs/ftp27/lib/python2.7/site-packages/scipy/special/orthogonal.pyc in chebyt(n, monic)
1201 kn = 2**(n - 1)
1202 p = orthopoly1d(x, w, hn, kn, wfunc, (-1, 1), monic,
-> 1203 lambda x: eval_chebyt(n, x))
1204 return p
1205
/Users/jakevdp/anaconda/envs/ftp27/lib/python2.7/site-packages/scipy/special/orthogonal.pyc in __init__(self, roots, weights, hn, kn, wfunc, limits, monic, eval_func)
143 kn = 1.0
144 self.__dict__['normcoef'] = mu
--> 145 self.__dict__['coeffs'] *= kn
146
147 # Note: eval_func will be discarded on arithmetic
TypeError: ufunc 'multiply' output (typecode 'O') could not be coerced to provided output parameter (typecode 'd') according to the casting rule ''same_kind''
Users beware, the periodogram value appears to be correct for all frequencies, however, the best fit model parameters (a=amplitude
, c=offset
, b=cos(omega * tau)
, sgn=sign(sin(omega * tau))
) returned by model.get_best_model()
are not correct. If you take the parameters returned by model.get_best_model()
and compute what the periodogram should be;
P = 1 - chi2(model) / chi2(constant)
the value does not agree with the periodogram computed by model.periodogram()
.
I'm working on a fix, but the periodogram returned by model.periodogram()
should be correct.
Here's a minimal working example:
from pyftp import modeler
import numpy as np
Nobs = 100
freq = 10
sigma = 0.01
# random observation times
x = np.sort(np.random.rand(Nobs))
# sinusoidal signal (or any other signal)
y = np.cos(2 * np.pi * freq * x)
# add some error
y += np.random.normal(loc=0, scale=sigma, size=Nobs)
err = np.ones(Nobs) * sigma
# generate sine template & precompute necessary values
sine_template = modeler.Template(cn=np.array([ 1 ]), sn=np.array([ 0 ])).precompute()
# initialize model
model = modeler.FastTemplateModeler(ofac=100, hfac=1)
model.add_templates([ sine_template ])
# fit model to data
model.fit(x, y, err)
# compute periodogram
frq, periodogram = model.periodogram()
# obtain best fit model params
best_template, params = model.get_best_model()
# find the best frequency
best_freq = frq[np.argmax(periodogram)]
# output best fit parameters
print "frq = ", best_freq, " should be ", 1.0
print "a = ", params.a, " should be ", 1.0
print "b = ", params.b, " should be ", 1.0
print "c = ", params.c, " should be ", 0.0
print "sin(omega*tau) = ", params.sgn, " should be ", 1
Hi Jake,
I've added a branch that I have been using to update the API for the FastTemplateModeler
class; in the course of unit testing, I've uncovered some discrepancies which need to be addressed and understood before moving forward. I think these issues are related to the discrepancies you were finding between SlowTemplateModeler
and FastTemplateModeler
(i.e. SlowTemplateModeler
was able to find better solutions at some frequencies).
These issues are recreated by running the custom_testing.py
script (which generates an animation illustrating several hopefully-related problems). One of the changes I've made in this branch is the addition of a TemplateModel
class, which contains the model fit (frequency, template, and template fit parameters (ModelFitParams
)), and is callable.
Here's a smaller example that uses the TemplateModel
to inject a signal and to produce the model fit y_fit
from the best_fit_pars
returned by fit_template
.
from pyftp.modeler import TemplateModel
from pyftp.template import Template
from pyftp.utils import ModelFitParams, weights
from pyftp.fast_template_periodogram import fit_template
import numpy as np
ndata = 100
frequency = 1.0
sigma = 0.1
c_n=[0.1, 0.5, 0.1]
s_n=[0.5, 0.5, 0.1]
a, b, c, sgn = 1.0, 1.0, 0.0, 1
# generate template
template = Template(c_n=c_n, s_n=s_n)
# set up model parameters
parameters = ModelFitParams(a=a, b=b, c=c, sgn=sgn)
model = TemplateModel(template, frequency=frequency, parameters=parameters)
# generate random observation times
t = np.sort(np.random.rand(ndata))
# generate signal
y = model(t)
# inject white noise
y += sigma * np.random.randn(ndata)
yerr = sigma * np.ones_like(y)
# recover parameters
p_max, best_fit_pars = fit_template(t, y, yerr, template, frequency, allow_negative_amplitudes=True)
# obtain model fit
model_fit = TemplateModel(template, frequency=frequency, parameters=best_fit_pars)
y_fit = model_fit(t)
# use definition of periodogram to compute periodogram from best_fit_pars and signal
w = weights(yerr)
ybar = np.dot(w, y)
chi2_0 = np.dot(w, (y - ybar)**2)
chi2_model = np.dot(w, (y - y_fit)**2)
chi2_signal = np.dot(w, (y - model(t))**2)
p_model = 1 - chi2_model / chi2_0
p_signal = 1 - chi2_signal / chi2_0
print("p_max (fit_template) = {p_max}\np_model (computed from best_fit_pars) = {p_model}\np_signal (computed from signal parameters) = {p_signal}".format(p_max=p_max, p_model=p_model, p_signal=p_signal))
p_max
, p_model
and p_signal
all disagreecustom_testing.py
script, you should see an animation that showsbest_fit_pars
compared with the true signal and noisy datafit_template
, one for the true underlying signal, and one for the periodogram implied by best_fit_pars
(though this is not visible due to being << 1).freq * tau_best_fit - freq * tau_signal
vs freq * tau_signal
.
The above is a snapshot of the animation at the last frame (H=1 case top, H=3 case bottom)
fit_template
is inconsistent with the best fit parameters returned by fit_template
.fit_template
is sub-optimal for certain values of tau
(around freq * tau
= 0, 0.5, 1)fit_template
nor the "true" periodogram value. (shown below).SlowTemplateModeler
was able to find more accurate solutions than the FastTemplateModeler
, which I believe is related to this collection of problems.Let me know if you have any thoughts about this. I'll keep working on this in the meantime!
John
Hi @jakevdp, I signed into Travis and ran the first test, but the build failed with this error message:
0.41s$ python setup.py install
Traceback (most recent call last):
File "setup.py", line 42, in <module>
requires=['numpy', 'scipy', 'pynfft', 'gatspy', 'astroML', 'scikit-learn'],
File "/home/travis/miniconda/envs/test-env/lib/python3.5/distutils/core.py", line 108, in setup
_setup_distribution = dist = klass(attrs)
File "/home/travis/miniconda/envs/test-env/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/setuptools/dist.py", line 318, in __init__
File "/home/travis/miniconda/envs/test-env/lib/python3.5/distutils/dist.py", line 253, in __init__
getattr(self.metadata, "set_" + key)(val)
File "/home/travis/miniconda/envs/test-env/lib/python3.5/distutils/dist.py", line 1208, in set_requires
distutils.versionpredicate.VersionPredicate(v)
File "/home/travis/miniconda/envs/test-env/lib/python3.5/distutils/versionpredicate.py", line 114, in __init__
raise ValueError("expected parenthesized list: %r" % paren)
ValueError: expected parenthesized list: '-learn'
The command "python setup.py install" failed and exited with 1 during .
Your build has been stopped.
I fixed this problem by changing scikit-learn
to sklearn
. However, Travis then reports that py.test
fails to find the pyftp
package:
0.45s$ python setup.py install
running install
running bdist_egg
running egg_info
creating pyftp.egg-info
writing pyftp.egg-info/PKG-INFO
writing top-level names to pyftp.egg-info/top_level.txt
writing dependency_links to pyftp.egg-info/dependency_links.txt
writing manifest file 'pyftp.egg-info/SOURCES.txt'
reading manifest file 'pyftp.egg-info/SOURCES.txt'
writing manifest file 'pyftp.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/pyftp
copying ./pyftp/__init__.py -> build/lib/pyftp
copying ./pyftp/fast_template_periodogram.py -> build/lib/pyftp
copying ./pyftp/gatspy_template_modeler.py -> build/lib/pyftp
copying ./pyftp/modeler.py -> build/lib/pyftp
copying ./pyftp/pseudo_poly.py -> build/lib/pyftp
copying ./pyftp/rrlyrae.py -> build/lib/pyftp
copying ./pyftp/utils.py -> build/lib/pyftp
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/pyftp
copying build/lib/pyftp/__init__.py -> build/bdist.linux-x86_64/egg/pyftp
copying build/lib/pyftp/fast_template_periodogram.py -> build/bdist.linux-x86_64/egg/pyftp
copying build/lib/pyftp/gatspy_template_modeler.py -> build/bdist.linux-x86_64/egg/pyftp
copying build/lib/pyftp/modeler.py -> build/bdist.linux-x86_64/egg/pyftp
copying build/lib/pyftp/pseudo_poly.py -> build/bdist.linux-x86_64/egg/pyftp
copying build/lib/pyftp/rrlyrae.py -> build/bdist.linux-x86_64/egg/pyftp
copying build/lib/pyftp/utils.py -> build/bdist.linux-x86_64/egg/pyftp
byte-compiling build/bdist.linux-x86_64/egg/pyftp/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/pyftp/fast_template_periodogram.py to fast_template_periodogram.pyc
byte-compiling build/bdist.linux-x86_64/egg/pyftp/gatspy_template_modeler.py to gatspy_template_modeler.pyc
byte-compiling build/bdist.linux-x86_64/egg/pyftp/modeler.py to modeler.pyc
byte-compiling build/bdist.linux-x86_64/egg/pyftp/pseudo_poly.py to pseudo_poly.pyc
byte-compiling build/bdist.linux-x86_64/egg/pyftp/rrlyrae.py to rrlyrae.pyc
byte-compiling build/bdist.linux-x86_64/egg/pyftp/utils.py to utils.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying pyftp.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pyftp.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pyftp.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pyftp.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating dist
creating 'dist/pyftp-0.3.0-py2.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing pyftp-0.3.0-py2.7.egg
Copying pyftp-0.3.0-py2.7.egg to /home/travis/miniconda/envs/test-env/lib/python2.7/site-packages
Adding pyftp 0.3.0 to easy-install.pth file
Installed /home/travis/miniconda/envs/test-env/lib/python2.7/site-packages/pyftp-0.3.0-py2.7.egg
Processing dependencies for pyftp==0.3.0
Finished processing dependencies for pyftp==0.3.0
0.01s$ mkdir -p $TEST_DIR
The command "mkdir -p $TEST_DIR" exited with 0.
0.39s$ cd $TEST_DIR && py.test --pyargs pyftp
============================= test session starts ==============================
platform linux2 -- Python 2.7.13, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
rootdir: /tmp/pyftp, inifile:
========================= no tests ran in 0.00 seconds =========================
ERROR: file or package not found: pyftp (missing __init__.py?)
The command "cd $TEST_DIR && py.test --pyargs pyftp" exited with 4.
Done. Your build exited with 1.
I have reproduced this problem on my machine. This may be related to this Conda problem:
conda/conda#2075
To test that, I made a setup.cfg
file and set
[easy_install]
zip_ok = False
and ran
[jah5@macpro:testing]$ py.test --pyargs pyftp
============================================================ test session starts =============================================================
platform darwin -- Python 2.7.12, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/jah5/Documents/projects/fast_template_periodogram/testing, inifile:
collected 0 items
======================================================== no tests ran in 0.01 seconds ========================================================
from my own testing directory (../testing
). Any ideas for what the problem might be?
Is there a good reason to avoid the Polynomial API rather than the low-level routines in pseudo_poly.py
? It may be a bit slower, but I doubt the difference would be significant overall.
Using that API would make the code much easier to understand and maintain in the long-run. Perhaps once we have some comprehensive benchmarks, we could think about translating parts of this.
I think it would make sense to rename the top-level method to FastTemplatePeriodogram
, to match the name of the package. Additionally, reading through the code I was initially confused by the names TemplateModel
and FastTemplateModeler
, because at first glance it sounds like the two classes should do the same thing, just at different speeds. A name change would prevent that confusion as well.
Thoughts?
This function is a big workhorse of the algorithm, and it could be sped up significantly, I think, using some vectorized operations rather than a triply-nested loop.
But first we need to unit test it. It might be best to first split it into two functions, one which computes the f-hat terms using the NFFT (which could be tested using a straightforward O[N^2] computation of the same thing) and one which uses these results to compute the C, S, YC, etc.
I was trying to understand this โ are these sums just the standard Lomb-Scargle-type terms outlined in, e.g. the Zechmeister paper?
make test failed with error "ImportError: No module named 'nfft'" . Somehow python code does not sees nfft (pynfft installed)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.