radiocosmology / draco
A pipeline for the analysis and simulation of drift scan radio data
License: MIT License
I am having issues with commit 1341eb8 (check task output for NaN's/Inf's and log/dump/skip them). For now I am just going to set nan_dump = False in order not to run into this issue.
I did not look into this bug in more detail, but I think it has to do with the fact that it's trying to dump the file on each rank, because the error message is
IOError: Unable to create file (unable to open file: name = 'nandump_LoadDataFiles_0.h5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2
This is the full traceback (sorry, this was run with 4 processes; it works fine with one):
273.6s [MPI 2/4] - INFO draco.core.task.LoadDataFiles: Reading file 1 of 774. (/project/rpp-krs/chime/chime_archive/20121207T174000Z_mingun_weather/20140621.h5)
273.6s [MPI 2/4] - INFO draco.core.task.LoadDataFiles: NaN's found in dataset /windGustDir [13 of 288 elements]
273.6s [MPI 2/4] - INFO draco.core.task.LoadDataFiles: NaN's found in dataset /windDir [13 of 288 elements]
273.6s [MPI 0/4] - DEBUG draco.core.task.LoadDataFiles: NaN found. Dumping nandump_LoadDataFiles_0.h5
273.6s [MPI 1/4] - DEBUG draco.core.task.LoadDataFiles: NaN found. Dumping nandump_LoadDataFiles_0.h5
273.6s [MPI 2/4] - DEBUG draco.core.task.LoadDataFiles: NaN found. Dumping nandump_LoadDataFiles_0.h5
273.6s [MPI 3/4] - DEBUG draco.core.task.LoadDataFiles: NaN found. Dumping nandump_LoadDataFiles_0.h5
File "/project/6003614/cahofer/ch_pipeline/venv/src/draco/draco/core/task.py", line 327, in next
out = self.next(*args)
File "/project/6003614/cahofer/ch_pipeline/venv/src/draco/draco/core/task.py", line 327, in next
dispatch(parser, *args, **kwargs)
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/argh/dispatching.py", line 174, in dispatch
for line in lines:
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/argh/dispatching.py", line 277, in _execute_command
for line in result:
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/argh/dispatching.py", line 260, in _call
result = function(*positional, **keywords)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/scripts/caput-pipeline", line 24, in run
P.run()
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/pipeline.py", line 473, in run
out = task._pipeline_next()
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/pipeline.py", line 818, in _pipeline_next
out = self.next(*args)
File "/project/6003614/cahofer/ch_pipeline/venv/src/draco/draco/core/task.py", line 327, in next
output = self._nan_process_output(output)
File "/project/6003614/cahofer/ch_pipeline/venv/src/draco/draco/core/task.py", line 394, in _nan_process_output
output = self._nan_process_output(output)
output = self._nan_process_output(output)
File "/project/6003614/cahofer/ch_pipeline/venv/src/draco/draco/core/task.py", line 394, in _nan_process_output
File "/project/6003614/cahofer/ch_pipeline/venv/src/draco/draco/core/task.py", line 394, in _nan_process_output
self.write_output(outfile, output)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/pipeline.py", line 1275, in write_output
self.write_output(outfile, output)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/pipeline.py", line 1275, in write_output
self.write_output(outfile, output)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/pipeline.py", line 1275, in write_output
output.save(filename)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/memh5.py", line 1467, in save
output.save(filename)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/memh5.py", line 1467, in save
output.save(filename)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/memh5.py", line 1467, in save
self._data.to_hdf5(filename, **kwargs)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/memh5.py", line 483, in to_hdf5
self._data.to_hdf5(filename, **kwargs)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/memh5.py", line 483, in to_hdf5
with h5py.File(filename, **kwargs) as f:
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/h5py/_hl/files.py", line 394, in init
with h5py.File(filename, **kwargs) as f:
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/h5py/_hl/files.py", line 394, in init
swmr=swmr)
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/h5py/_hl/files.py", line 195, in make_fid
fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
self._data.to_hdf5(filename, **kwargs)
File "/project/6003614/cahofer/ch_pipeline/venv/src/caput/caput/memh5.py", line 483, in to_hdf5
with h5py.File(filename, **kwargs) as f:
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/h5py/_hl/files.py", line 394, in init
swmr=swmr)
File "/project/6003614/chime/chime_env/2018_04/base/lib/python2.7/site-packages/h5py/_hl/files.py", line 195, in make_fid
fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 105, in h5py.h5f.create
IOError: Unable to create file (unable to open file: name = 'nandump_LoadDataFiles_0.h5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2
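The log above shows all four ranks dumping to the same nandump_LoadDataFiles_0.h5 while h5py opens it in exclusive-create mode. One possible way to avoid the collision, sketched here under the assumption that each rank is meant to write its own file, is to put the rank into the filename. The helper nandump_filename is hypothetical and is not part of draco or caput.

```python
def nandump_filename(task_name, count, rank=None):
    """Build a NaN-dump filename, optionally made unique per MPI rank.

    Hypothetical helper for illustration only; not a draco function.
    """
    if rank is None:
        # Same pattern as the filename seen in the log output.
        return "nandump_{}_{}.h5".format(task_name, count)
    return "nandump_{}_{}_rank{}.h5".format(task_name, count, rank)


# The four ranks from the log would now target four distinct files.
names = [nandump_filename("LoadDataFiles", 0, rank=r) for r in range(4)]
```

Whether per-rank files or a single collective write is the better fix probably depends on whether the container being dumped is distributed across ranks.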
We now have ThresholdVisWeights and ThresholdVisWeightsTime, and will have ThresholdVisWeightsBaseline after #200 is merged. These should be refactored in an intelligent way.
BasicCont objects have a .history section. This is where we should have added the version and config tracking, so we should put them there (using the .add_history(...) call) rather than in the root metadata.
This task should:
weight property).
RandomGen which is already a dependency)
@tristpinsm feel free to add items to this one.
The SpectroscopicCatalog container has a z_error field that currently isn't modified by the Add{Gaussian,EBOSS}ZErrorsToCatalog tasks. It would make sense to store the standard deviation of the distribution the z errors are drawn from (added in quadrature to the existing z_error value if one exists).
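As a minimal sketch of what "added in quadrature" would mean for the z_error field: the new value is the square root of the sum of the squared existing error and the squared standard deviation of the added errors. The function name combine_z_error is hypothetical, and the sketch assumes the existing errors are available as a plain array.

```python
import numpy as np


def combine_z_error(existing, sigma):
    """Add the standard deviation `sigma` in quadrature to existing z errors.

    Hypothetical helper; `existing` is an array of current z_error values
    (zeros where no error was recorded).
    """
    existing = np.asarray(existing, dtype=float)
    return np.sqrt(existing**2 + sigma**2)


# An entry with no prior error just takes sigma; others combine in quadrature.
combined = combine_z_error([0.0, 3.0], 4.0)
```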
ch_pipeline.analysis.calibration to draco.analysis.calibration:
DetermineSourceTransit
TransitFit
GainFromTransitFit
FlagAmplitude
ch_util.ephemeris.
EigenContainer with the equivalent of the evec, eval, and erms datasets from the chime real-time pipeline. Create a subclass of TimeStream and EigenContainer that can replicate the chimecal acquisitions.
EigenContainer.
ch_util modules. We will need to decide where to put these:
CollateProducts introduces a numpy unicode type somewhere in the timestream file. This only causes an issue when the timestream is saved, as h5py does not handle numpy unicode. There is a method in caput.memh5 designed to handle this, but it misses this somehow. Related issues tracked here: radiocosmology/caput#230
This ticket is to track the migration of the code to NumPy 1.17.
In NumPy 1.17, RandomGen was integrated into NumPy.
The advantage of using RandomGen over NumPy's legacy random is mainly performance.
From @jrs65:
As @tristpinsm says the advantage is performance, and there are certain tasks (of which this will be one) where the speed of the RNG is the bottleneck. I introduced it for the delay power spectrum estimator (which in some sense internally does what you're doing here hundreds of times), and it took it down from 40 mins per power spectrum to more like 10 mins.
However, the changes made to the NumPy API as part of this integration were substantial. A small excerpt:
So, seeding seems to work in different ways for the "legacy random" and the "new generators".
RandomState provides access to legacy random https://numpy.org/devdocs/reference/random/legacy.html. get_state/set_state/seed specifically work with the legacy randoms https://numpy.org/devdocs/reference/random/legacy.html?highlight=seed.
The new RandomGenerator works by initialising a generator with a seed https://numpy.org/devdocs/reference/random/generator.html#numpy.random.Generator. SeedSequence https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence is the main class that determines the sequence of seeds.
So if we bump to NumPy 1.17, it will be a bit of a refactor, and the two random generators do not intersect with their seed states.
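The difference between the two seeding models can be sketched as follows, assuming NumPy 1.17 or later. The seed value and the idea of spawning one child seed per MPI rank are illustrative assumptions, not something draco is confirmed to do.

```python
import numpy as np

seed = 12345  # arbitrary illustrative seed

# Legacy interface: RandomState, which supports seed/get_state/set_state.
legacy = np.random.RandomState(seed)
legacy_draws = legacy.standard_normal(3)

# New interface: a Generator is constructed from a seed up front.
rng = np.random.default_rng(seed)
new_draws = rng.standard_normal(3)

# SeedSequence can spawn independent child seeds, e.g. one per MPI rank.
children = np.random.SeedSequence(seed).spawn(4)
rank_rngs = [np.random.default_rng(child) for child in children]
rank_draws = [r.standard_normal(3) for r in rank_rngs]
```

The same integer seed gives unrelated streams in the two interfaces, which is why saved legacy states cannot simply be carried over to the new generators.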
This task has a few issues that should be fixed up.
Overall the point of the task is twofold:
Both of these regions should have their weights set to zero.
As implemented this task has several issues:
ApplyRFIMask task.
I think a reasonable path to achieve this is:
This enhancement is to add to draco.core.task.SingleTask a config option which allows setting of arbitrary output parameters on any BasicCont style outputs.
This would be something like:
<task stuff>
params:
  attributes:
    tracer: QSO
    oldtag: "oldtag_{tag}"
where the {...} allows interpolation of strings using any item already within the params (plus maybe the count parameter from each task).
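The {...} interpolation described here could be sketched with Python's built-in str.format. The function resolve_attributes and the example parameter values are hypothetical; this only illustrates the proposed behaviour and is not an existing SingleTask feature.

```python
def resolve_attributes(attributes, params):
    """Interpolate {name} placeholders in string attribute values.

    Hypothetical helper: `attributes` is the mapping from the config and
    `params` holds the items available for interpolation.
    """
    resolved = {}
    for key, value in attributes.items():
        if isinstance(value, str):
            # Raises KeyError if a placeholder has no matching parameter.
            value = value.format(**params)
        resolved[key] = value
    return resolved


attrs = resolve_attributes(
    {"tracer": "QSO", "oldtag": "oldtag_{tag}"},
    {"tag": "run1", "count": 0},
)
```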
=============================== warnings summary ===============================
test/test_write_metadata.py::test_metadata_to_hdf5
test/test_write_metadata.py::test_metadata_to_yaml
test/test_write_metadata.py::test_metadata_to_yaml
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/caput/pipeline.py:792: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
setup_argspec = inspect.getargspec(self.setup)
test/test_write_metadata.py::test_metadata_to_hdf5
test/test_write_metadata.py::test_metadata_to_yaml
test/test_write_metadata.py::test_metadata_to_yaml
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/caput/pipeline.py:813: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
next_argspec = inspect.getargspec(self.next)
test/test_write_metadata.py::test_metadata_to_hdf5
test/test_write_metadata.py::test_metadata_to_yaml
test/test_write_metadata.py::test_metadata_to_yaml
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/draco/core/task.py:302: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
pro_argspec = inspect.getargspec(self.process)
test/test_write_metadata.py::test_metadata_to_yaml
/home/travis/virtualenv/python3.7.1/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
-- Docs: https://docs.pytest.org/en/latest/warnings.html
==================== 2 passed, 10 warnings in 0.34 seconds =====================
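The DeprecationWarning entries above name the replacements themselves: inspect.getfullargspec() or inspect.signature(). A minimal sketch of both, using a made-up ExampleTask class rather than the real pipeline tasks:

```python
import inspect


class ExampleTask:
    """Stand-in class for illustration; not a caput or draco task."""

    def process(self, inp, extra=None):
        return inp


task = ExampleTask()

# Closest replacement for inspect.getargspec: still reports `self`.
argspec = inspect.getfullargspec(task.process)

# inspect.signature on a bound method omits the bound `self` parameter.
sig_params = list(inspect.signature(task.process).parameters)
```

Code that counts arguments needs to account for the difference in how the two calls treat self.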
Although LoadFilesFromParams accepts lists of indices for axis selections, it appears that caput.mpiarray only supports slices.
Here is the relevant traceback for a pipeline that attempted to use LoadFilesFromParams with a freq_index property.
output = self.process()
File "/project/6003614/tristpm/maps/code/draco/draco/core/io.py", line 528, in process
cont = self._load_file(file_)
File "/project/6003614/tristpm/maps/code/draco/draco/core/io.py", line 434, in _load_file
cont = new_cls.from_file(
File "/project/6003614/tristpm/maps/code/caput/caput/memh5.py", line 1550, in from_file
data = MemGroup.from_hdf5(
File "/project/6003614/tristpm/maps/code/caput/caput/memh5.py", line 469, in from_hdf5
self = _distributed_group_from_hdf5(
File "/project/6003614/tristpm/maps/code/caput/caput/memh5.py", line 2601, in _distributed_group_from_hdf5
_copy_from_file(f, group, selections)
File "/project/6003614/tristpm/maps/code/caput/caput/memh5.py", line 2568, in _copy_from_file
pdata = mpiarray.MPIArray.from_hdf5(
File "/project/6003614/tristpm/maps/code/caput/caput/mpiarray.py", line 659, in from_hdf5
gshape.append(_len_slice(sl, l))
File "/project/6003614/tristpm/maps/code/caput/caput/mpiarray.py", line 1110, in _len_slice
start, stop, step = slice_.indices(n)
AttributeError: 'list' object has no attribute 'indices'
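The traceback ends in a call to slice_.indices(n) on a list. One way the length calculation could accept both kinds of selection is sketched below. The function selection_length is hypothetical and is not the actual caput _len_slice implementation.

```python
def selection_length(sel, n):
    """Number of elements selected along an axis of length n.

    Hypothetical helper: `sel` is either a slice or a sequence of indices.
    """
    if isinstance(sel, slice):
        start, stop, step = sel.indices(n)
        return len(range(start, stop, step))
    # Otherwise treat the selection as an explicit sequence of indices.
    return len(sel)


n_from_slice = selection_length(slice(0, 10, 3), 8)
n_from_list = selection_length([1, 4, 5], 8)
```

Computing the shape is only part of the problem; the read itself would also need to handle index lists.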
I pulled the most recent version of draco, which I haven't done in a few months (it includes the most recent pull request, #13), and there seems to be some issue with the new SiderealStream container when I try to simulate a sidereal stream:
When using draco/synthesis.stream.SimulateSidereal(task.SingleTask)
sstream = containers.SiderealStream(freq=freqmap, ra=ntime, input=feed_index, prod=tel.uniquepairs, distributed=True, comm=map_.comm)
File "/project/6003614/cahofer/ch_pipeline/venv/src/draco/draco/core/containers.py", line 571, in __init__
stack['prod'][:] = np.arange(len(prod))
ValueError: could not broadcast input array from shape (752) into shape (752,2)
prod is a (752,2) array in the old format.
This seems to be quite an old change already (made 4 months ago by Tristan).
Here is the traceback:
File "/home/pboubel/code/draco/draco/analysis/transform.py", line 548, in process
marray = _make_marray(sstream.vis[:], mmax)
File "/home/pboubel/code/draco/draco/analysis/transform.py", line 568, in _make_marray
marray = _pack_marray(mmodes, mmax)
File "/home/pboubel/code/draco/draco/analysis/transform.py", line 593, in _pack_marray
marray[:mlim+1, 0] = mmodes[:mlim+1] # Non-negative modes
ValueError: could not broadcast input array from shape (3,7155,4096) into shape (2049,3,7155)
This worked before the latest changes.
When I try to use DirtyMapMaker on some input m-modes, I get an error with the following traceback:
File "/home/sforeman/ch/ch_pipeline/src/draco/draco/core/task.py", line 329, in next
output = self.process(*input)
File "/home/sforeman/ch/ch_pipeline/src/draco/draco/analysis/mapmaker.py", line 84, in process
freq_ind = [find_key(bt_freq, mf) for mf in mm_freq]
File "/home/sforeman/ch/ch_pipeline/src/draco/draco/analysis/mapmaker.py", line 84, in <listcomp>
freq_ind = [find_key(bt_freq, mf) for mf in mm_freq]
File "/home/sforeman/ch/ch_pipeline/src/draco/draco/analysis/mapmaker.py", line 74, in find_key
return map(tuple, list(key_list)).index(tuple(key))
AttributeError: 'map' object has no attribute 'index'
StackOverflow (https://stackoverflow.com/questions/33717314/attributeerror-map-obejct-has-no-attribute-index-python-3) says this is a Python 3 compatibility thing (I'm using Python 3.6.9), but the fix they recommend doesn't work either. On the other hand, if we take the find_key() routine,
draco/draco/analysis/mapmaker.py
Lines 72 to 78 in f9659fa
and change line 75 from except TypeError: to except (TypeError, AttributeError):, everything runs fine. Should I submit a PR with this change?
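For reference, a Python 3 compatible version of such a lookup might look like the sketch below, which materialises the map as a list before calling .index. This is written from the single line shown in the traceback, not from the full source of find_key, and it has not been checked against the m-mode data in this report.

```python
def find_key(key_list, key):
    """Return the index of `key` in `key_list`, or None if it is absent.

    Sketch only: assumes keys are either sequences (compared as tuples)
    or plain scalars.
    """
    try:
        # list(...) is needed on Python 3, where map returns an iterator.
        keys = list(map(tuple, key_list))
        target = tuple(key)
    except TypeError:
        # Keys are not iterable (e.g. plain numbers): compare them directly.
        keys, target = list(key_list), key
    try:
        return keys.index(target)
    except ValueError:
        return None


pair_index = find_key([(0, 1), (2, 3)], (2, 3))
scalar_index = find_key([400.0, 600.0], 600.0)
```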
version strings should be in SingleTask._global_config SingleTask._metadata
This is an issue to track the NaN/inf values appearing in the SiderealRegridder task. Based on the current testing:
SmoothVisWeights (draco#187) and setting weights to zero where vis_weights are zero avoids the issue, suggesting that it's related to the inclusion of zeros in the median calculation in SmoothVisWeights. Setting all weights to 1 or using scipy.ndimage.median_filter produces the error.
SmoothVisWeights and then feeding that into SiderealRegridder in either a notebook or a new pipeline task avoids the issue. The only thing I can think of here is that the save/load process affects the data in a subtle way
--convention=numpy --add-ignore=D105,D202
...will require fixing a lot of docstrings
On days where there is no data available (e.g. CSD=2214) the daily pipeline will run and not do anything except the BeamFormCat task, which crashes as the .epoch attribute hasn't been set (as it derives from the data).
Traceback (most recent call last):
File "/home/jrs65/chime_pipeline_stable/code/caput/caput/scripts/runner.py", line 430, in <module>
cli()
File "/project/rpp-chime/chime/chime_env/modules/chime/python/2021.03/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/project/rpp-chime/chime/chime_env/modules/chime/python/2021.03/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/project/rpp-chime/chime/chime_env/modules/chime/python/2021.03/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/project/rpp-chime/chime/chime_env/modules/chime/python/2021.03/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/project/rpp-chime/chime/chime_env/modules/chime/python/2021.03/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/jrs65/chime_pipeline_stable/code/caput/caput/scripts/runner.py", line 148, in run
P.run()
File "/project/6003614/chime/chime_processed/daily/rev_03/code/caput/caput/pipeline.py", line 603, in run
out = task._pipeline_next()
File "/project/6003614/chime/chime_processed/daily/rev_03/code/caput/caput/pipeline.py", line 1038, in _pipeline_next
out = self.next(*args)
File "/project/6003614/chime/chime_processed/daily/rev_03/code/draco/draco/core/task.py", line 319, in next
output = self.process(*input)
File "/project/6003614/chime/chime_processed/daily/rev_03/code/draco/draco/analysis/beamform.py", line 695, in process
self._process_catalog(source_cat)
File "/project/6003614/chime/chime_processed/daily/rev_03/code/draco/draco/analysis/beamform.py", line 611, in _process_catalog
catalog["position"]["ra"], catalog["position"]["dec"], self.epoch
AttributeError: 'BeamFormCat' object has no attribute 'epoch'
Hi all,
I am trying to implement a pipeline for measuring the power spectrum via draco (or, more specifically, the tools for Radio Cosmology). I wrote a config file with the appropriate tasks but I get the error:
WARNING:draco.synthesis.stream.SimulateSidereal:Use of output_root is deprecated.
Traceback (most recent call last):
File "/cluster/home/bin/caput-pipeline", line 8, in <module>
sys.exit(cli())
File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/cluster/home/lib64/python3.7/site-packages/caput/scripts/runner.py", line 161, in run
P.run()
File "/cluster/home/lib64/python3.7/site-packages/caput/pipeline.py", line 633, in run
out = task._pipeline_next()
File "/cluster/home/lib64/python3.7/site-packages/caput/pipeline.py", line 1085, in _pipeline_next
out = self.next(*args)
File "/cluster/home/lib64/python3.7/site-packages/draco/core/task.py", line 347, in next
output = self.process(*input)
TypeError: process() missing 1 required positional argument: 'inp'
I couldn't pinpoint the source of the error and was not sure if it's related to the config file I wrote. Can you please help me with this?
The numpy.where call in util.tools.invert_no_zero unnecessarily makes a full copy of the array in memory. We should re-implement this function with Cython to do it in place, and probably also parallelise it.
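As a point of comparison before writing any Cython, the in-place behaviour can be approximated in plain NumPy using the out and where arguments of np.divide. This is a stand-in sketch, not the proposed Cython implementation: it is not parallelised, it assumes a floating-point input array, and it still allocates a boolean mask (one byte per element). The name invert_no_zero_inplace is hypothetical.

```python
import numpy as np


def invert_no_zero_inplace(x):
    """Replace each non-zero element of the float array x by 1/x, in place.

    Zero elements are left as zero. Hypothetical NumPy-only sketch.
    """
    nonzero = x != 0  # boolean mask, much smaller than a full float copy
    # Elements where the mask is False are not written, so they stay zero.
    np.divide(1.0, x, out=x, where=nonzero)
    return x


weights = np.array([0.0, 2.0, -4.0])
result = invert_no_zero_inplace(weights)
```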
The routines in draco.analysis.ringmapmaker are incompatible with sidereal streams simulated with non-CHIME telescope classes in at least two ways:
The following check in MakeVisGrid fails:
draco/draco/analysis/ringmapmaker.py
Lines 59 to 65 in 0c4b397
view of sstream.prodstack as np.uint16. Changing the type to np.int resolves things:
Sidereal streams from draco.synthesis.stream.SimulateSidereal do not have reverse_map["stack"], causing this to fail in MakeVisGrid:
draco/draco/analysis/ringmapmaker.py
Lines 106 to 112 in 0c4b397
reverse_map["stack"] is constructed by a few lines in draco.analysis.transform.CollateProducts, we could simply copy these lines into draco.synthesis.stream.SimulateSidereal to ensure that simulated sidereal streams have this field. This would probably help for future compatibility of simulated sidereal streams and other routines as well.