Comments (16)
that's right, except that `vis_weight` is actually the inverse variance
from draco.
Edited based on Tristan's comment
My understanding:
- The `weight` property is the estimate of the variance of the noise in each data sample. `vis_weight` is the inverse variance
- The task should zero all data and replace it with a realisation of a Gaussian noise distribution with variance given by `weight`
Is the above correct?
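A minimal sketch of the second point, assuming `vis_weight` holds the inverse variance as noted above. The function name `noise_from_inverse_variance`, the equal split of variance between real and imaginary parts, and the handling of zero weights are all assumptions for illustration, not the draco implementation:

```python
import numpy as np


def noise_from_inverse_variance(vis_weight, rng):
    """Draw complex Gaussian noise whose total variance is 1 / vis_weight.

    Assumptions (not taken from draco): the variance is split equally
    between the real and imaginary parts, and samples with zero weight
    are left at zero.
    """
    vis_weight = np.asarray(vis_weight, dtype=np.float64)
    # Invert the weight; zero-weight samples keep a variance of zero
    variance = np.zeros_like(vis_weight)
    np.divide(1.0, vis_weight, out=variance, where=vis_weight > 0)
    # Each of the real and imaginary parts carries half the variance
    sigma = np.sqrt(variance / 2.0)
    shape = vis_weight.shape
    noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return sigma * noise


rng = np.random.default_rng(12345)
weight = np.full((4, 1000), 4.0)  # inverse variance 4, so variance 0.25
weight[0] = 0.0                   # e.g. flagged samples
noise = noise_from_inverse_variance(weight, rng)
```

The existing data would be overwritten with `noise` rather than added to it, since the task is meant to zero the data first.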
from draco.
@tristpinsm Thanks!
from draco.
Notes from meeting:
For a random generator, the following key things are needed:
- you want it to be reproducible
- you want it to generate different arrays for each node and for each realisation
- for `mpi_random_seed`, the `extra` param takes care of generating a different array for each node. We may also want to have multiple reproducible realisations. The proposal is to either add a counter to tasks that employ it, or have the `mpi_random_seed` context manager keep track of how often it was run.
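One way to sketch the counter proposal, using NumPy's `SeedSequence` (NumPy 1.17 or later) rather than `mpi_random_seed` itself. The helper `make_rng` and the class `RealisationCounter` are hypothetical names, and the rank is passed in as a plain integer so the example runs without MPI:

```python
import numpy as np


def make_rng(base_seed, rank, realisation):
    """Return a generator that is reproducible for a given
    (base_seed, rank, realisation) and differs when any of them changes."""
    # spawn_key mixes the rank and the realisation counter into the seed
    seq = np.random.SeedSequence(base_seed, spawn_key=(rank, realisation))
    return np.random.default_rng(seq)


class RealisationCounter:
    """Hypothetical per-task counter, incremented once per realisation."""

    def __init__(self, base_seed, rank):
        self.base_seed = base_seed
        self.rank = rank
        self.count = 0

    def next_rng(self):
        rng = make_rng(self.base_seed, self.rank, self.count)
        self.count += 1
        return rng


# Two realisations on rank 0, and the first realisation on rank 1
counter = RealisationCounter(base_seed=7, rank=0)
first = counter.next_rng().standard_normal(5)
second = counter.next_rng().standard_normal(5)
other_rank = make_rng(7, 1, 0).standard_normal(5)
```

Rerunning with the same base seed reproduces every realisation, while the counter ensures successive calls on one node do not repeat the same array.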
from draco.
Do we want to update the other tasks to use `RandomGen` instead of `numpy.random`?
from draco.
Would the reason to do so be performance? Do we think using `numpy` is currently a limitation?
from draco.
I will look into the difference between them! I just noticed that @jrs65 recommended using `RandomGen`, and that other code in `synthesis/noise.py` uses `numpy.random`.
from draco.
yeah I had never heard of `RandomGen` until now and I'm still not sure what its advantages are. From what I read in their docs it is supposed to be quite a bit faster in certain situations. They also note that compatibility (presumably between versions) is not guaranteed.
from draco.
I think this has been integrated in `numpy` already: https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator. As long as we are running a recent version this functionality should already be available (?)
from draco.
I don't think there's any reason to go back and change things, but as `RandomGen` is already a requirement we should use it when we can.
As @tristpinsm says, the advantage is performance, and there are certain tasks (of which this will be one) where the speed of the RNG is the bottleneck. I introduced it for the delay power spectrum estimator (which in some sense internally does what you're doing here hundreds of times), and it took it down from 40 mins per power spectrum to more like 10 mins.
from draco.
I agree that it looks like it has been merged into numpy in 1.17 (or at least the core). I'd like to check that it still has the OpenMP parallel mode though before dropping the dependency.
from draco.
Understood!
The reason behind changing the original ones was to simplify the seed state setting (to avoid having one for setting the NumPy seed and one for setting the RandomGen seed). But if it is integrated into NumPy (I will check if it still has the OpenMP parallel mode), perhaps they share a state.
from draco.
So, seeding seems to work in different ways for the "legacy random" and the "new generators".
`RandomState` provides access to legacy random https://numpy.org/devdocs/reference/random/legacy.html. `get_state`/`set_state`/`seed` specifically work with the legacy randoms https://numpy.org/devdocs/reference/random/legacy.html?highlight=seed.
The new `Generator` works by initialising a generator with a seed https://numpy.org/devdocs/reference/random/generator.html#numpy.random.Generator. `SeedSequence` https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence is the main class that determines the sequence of seeds.
So if we bump to NumPy 1.17, it will be a bit of a refactor, and the two random generators do not share their seed states.
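A small check of that last point, assuming NumPy 1.17 or later. It only illustrates that the global legacy state and a `Generator` are independent; it says nothing about how draco seeds its tasks:

```python
import numpy as np

# Legacy interface: a single global RandomState, seeded in place
np.random.seed(42)
legacy_first = np.random.standard_normal(3)

# New interface: each Generator owns its own state
rng = np.random.default_rng(42)
new_first = rng.standard_normal(3)

# Reseed the legacy state, then draw from the Generator in between
np.random.seed(42)
rng.standard_normal(100)
legacy_again = np.random.standard_normal(3)

# Drawing from the Generator did not advance the legacy state
legacy_unaffected = np.array_equal(legacy_first, legacy_again)
# The same integer seed gives different streams in the two interfaces
streams_differ = not np.array_equal(legacy_first, new_first)
```

Code that mixes the two interfaces therefore needs to seed each one separately, which is the duplication mentioned earlier in the thread.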
from draco.
I cannot confirm whether it still has OpenMP parallel mode, but I do not see anything in here that would indicate support being removed: numpy/numpy#13163.
from draco.
It seems like the changes are pretty significant between 1.16 and 1.17. Should we create an issue to migrate whatever uses the old `RandomState` methods to `Generator`s?
from draco.
@tristpinsm Agreed, they are big and out of scope for this issue/pr, and I think it would be good to do the migration.
from draco.