
asteca's People

Contributors

gabriel-p, msolpera, waffle-iron


asteca's Issues

Localization bug in p-value

Sometimes this will happen:

 .../functions/get_p_value.py", line 142, in get_pval
p_vals_cl.append(float(str(p_val_cl)[4:]))
ValueError: invalid literal for float(): 0,2902803

Need to add localization so commas are never used.
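
A minimal sketch of one way around the error, assuming the value arrives as a string that may use a decimal comma. The helper name `to_float` is hypothetical and not part of ASteCA; it normalizes the separator before converting instead of touching the process locale.

```python
def to_float(text):
    """Convert a numeric string to float, accepting ',' or '.' as decimal mark.

    Hypothetical helper: assumes the string holds a single number with no
    thousands separators.
    """
    return float(text.strip().replace(",", "."))


print(to_float("0,2902803"))  # decimal comma, as in the traceback
print(to_float("0.2902803"))  # decimal point
```

If the comma comes from a locale-aware library, an alternative would be to force the numeric locale at start-up with `locale.setlocale(locale.LC_NUMERIC, "C")`.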

Assignation of binary masses

There's a bug in synth_clust where sometimes (haven't reproduced it) the array mass_bin0 is a float, possibly because m1 is empty.

Fix data_input

Once the data_output is stable, fix 'clusters_input.dat'.
1- It could match the output file's format, or
2- have a different, easier to read format.

Missing term in likelihood

Check Cabrera-Caño & Alfaro 1990, there appears to be a missing 1/(n-1) term in the likelihood function.

Create bad pixels mask

To be used on frames with complicated geometries or bad pixels that show blank portions.

Empty or bad pixel regions could be either filled with an average sample of stars from the frame or marked as ignored.

Manual detection

Give the option to supply a file with one set of xmin, xmax, ymin, ymax values per line (as many lines as needed) to leave out rectangular sections of the frame that are either empty or unusable for some reason.

Related: #68, #107
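
A rough sketch of how the manual option could work, assuming a whitespace-separated file with one `xmin xmax ymin ymax` row per rectangle. The function names and the `#` comment convention are hypothetical, not existing ASteCA code.

```python
def read_exclusion_boxes(lines):
    """Parse 'xmin xmax ymin ymax' rows, skipping blank lines and '#' comments."""
    boxes = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        xmin, xmax, ymin, ymax = (float(v) for v in line.split())
        boxes.append((xmin, xmax, ymin, ymax))
    return boxes


def outside_all_boxes(x, y, boxes):
    """Return True if the point (x, y) lies in none of the excluded rectangles."""
    return not any(
        xmin <= x <= xmax and ymin <= y <= ymax
        for xmin, xmax, ymin, ymax in boxes
    )


# Example: two excluded rectangles and three stars (made-up coordinates).
boxes = read_exclusion_boxes(["# xmin xmax ymin ymax", "0 10 0 10", "50 60 20 30"])
stars = [(5.0, 5.0), (25.0, 25.0), (55.0, 25.0)]
kept = [s for s in stars if outside_all_boxes(s[0], s[1], boxes)]
print(kept)  # only the star outside both rectangles remains
```

In practice `lines` would be the open file object, since iterating over a file yields its lines.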

Auto detection

  1. Perhaps use Monte Carlo to automatically detect empty regions in the frame? Generate N random points and check which ones have no neighbor stars around them. Grouping these points we could detect the empty areas.
  2. Even simpler would be to obtain the KDE for the field and identify as empty those regions enclosed by a density contour below x% of the maximum value. The problem here would be how to check whether a star is inside or outside a closed curve of accepted density value. Non-closed curves are even more complicated.
  3. No need to use curves. Just check the center of each bin in the 2D spatial histogram. If the density at that point is below a certain threshold (for example, a given percentage of the maximum KDE density value over all the bins in the frame), assume that bin is in an empty region (i.e., composed of "bad pixels") of the frame and mark it as invalid.
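
Option 3 could be sketched roughly as follows. Two assumptions are made here: raw star counts per bin stand in for the KDE density at the bin center, and the threshold is taken as a fraction `frac` of the maximum count. The function name and parameters are hypothetical.

```python
def empty_bins(xs, ys, x_range, y_range, n_bins, frac=0.1):
    """Return the set of (i, j) bins whose star count is below frac * max count.

    Simplification: counts per bin replace the KDE evaluated at the bin
    center. All points are assumed to lie inside the given ranges.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        # Clamp so that points on the upper edge fall in the last bin.
        i = min(int((x - x0) / (x1 - x0) * n_bins), n_bins - 1)
        j = min(int((y - y0) / (y1 - y0) * n_bins), n_bins - 1)
        counts[i][j] += 1
    threshold = frac * max(max(row) for row in counts)
    return {
        (i, j)
        for i in range(n_bins)
        for j in range(n_bins)
        if counts[i][j] < threshold
    }


# Example: a 2x2 grid where the upper-right bin has no stars.
xs = [0.5] * 5 + [0.5] * 5 + [1.5] * 5
ys = [0.5] * 5 + [1.5] * 5 + [0.5] * 5
print(empty_bins(xs, ys, (0.0, 2.0), (0.0, 2.0), n_bins=2))  # {(1, 1)}
```

Stars falling in a flagged bin could then be ignored, or the bin area subtracted when computing densities.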

Fix colorbar position

The colorbar in the CMD with the membership probs keeps moving around when the GA and/or the p-value test are disabled/enabled.

Don't stretch cluster when zooming in

If the cluster is located near a border (see NGC1863) the zoom will look stretched because the axes won't span the same range and the plot is always square.

Subtract averaged field integrated mag?

Check if I should subtract the averaged field integrated magnitude curve from the cluster region integrated magnitude curve to obtain a more accurate estimate of the true cluster integrated magnitude.

Re-define cluster region

See Ref 27/SL351, the cluster region defined cuts a portion of the cluster. Perhaps define it as a square centered on the center of the cluster of length 2*1.5*r_cl.
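
A small sketch of the square definition, reading the side length as 2 * 1.5 * r_cl (an interpretation of the expression above). The function name and the `factor` parameter are hypothetical.

```python
def in_square_region(x, y, center, r_cl, factor=1.5):
    """Return True if (x, y) falls inside a square of side 2 * factor * r_cl
    centered on the cluster center."""
    half_side = factor * r_cl
    return abs(x - center[0]) <= half_side and abs(y - center[1]) <= half_side


# Example with made-up values: center (100, 100), r_cl = 10, half side = 15.
print(in_square_region(112.0, 90.0, (100.0, 100.0), 10.0))   # True
print(in_square_region(116.0, 100.0, (100.0, 100.0), 10.0))  # False
```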

Add no background flag

If the cluster is too large or the frame too small, it could happen that the region selected for obtaining the background falls inside the cluster.

In this case the radius, background, density, field regions, Bayesian decontamination algorithm and p-value test should be skipped.

Add a flag so that the user can indicate when this happens.

Add Saha's W parameter?

Write a function that calculates Saha's W parameter between the cluster region and all the field regions. It is another version of what the p-values distribution function does.

On second thought, not sure it is the same thing.


This short article, "The W-function applied to the age of Globular Clusters", Rengel & Bruzual (2002), uses the W function to estimate ages for GCs.

The method used is similar to what ASteCA does to estimate the cluster probability of being a real physical entity through the KDE p-value: compares synthetic clusters of the same age with each other to generate a distribution of W values, then compares the observed cluster with synthetic clusters of the same age, and finally selects the "best" age estimate as that which produces the largest overlap between distributions.

More details can be found in the PhD thesis (dead link) on which the article is based. There it is stated that the number of model points is fixed to the number of observed points (stars), see p. 29.


Confirmed by Dr P Saha: the W function should be used when the number of model points is fixed.

Dr Saha suggested to fix this parameter to a large value (as large as possible) and assign per-star masses after the fitting is completed. But, as stated by Dr Saha: "If M is very large, W should go to the Poisson formula", which sort of defeats the purpose of using W.

This statistic is also discussed in "Bayesian isochrone fitting and stellar ages", Valls-Gabaud (2014), who concludes that W is "the statistic of choice to be used in the context of CMD modeling".

Bug in GA (possibly in decode)

There's an issue in the elitism/decode/fitness_eval block where the best solution is apparently not being passed along to the fitness_eval function.

I suspect this is related to the decode_ function not transforming the solution correctly.

Removal of stars in p-value is wrong

Right now before comparing the cluster region with a field region, a number of n_f stars are removed from the cluster region where n_f is the number of stars in that field region.

This results, for heavily contaminated regions, in a cluster region almost devoid of stars which forces high p-values for the cluster-field regions comparisons. For clusters not too contaminated the effect is diminished.

This was introduced via issue #12.

Accept a CMD from any single photometric system

Generalize the code to process a CMD from any arbitrary photometric system defined in the Girardi set.

Old attempts: Old 0.2.0 branch with 70 commits, 34 older commits

  • Photometric systems inp\input_params
  • Read params_input data as pd dict, get rid of global variables.
  • Read cluster data as cld dictionary with keys: id, coordinates, magnitudes (and errors), colors (and errors)
  • Add parameters (center, radius, etc) to clp dictionary as they are found.
  • Tidy up func_caller.
  • Define method to read the photometric data properly in params_input.dat
  • Read theoretical isochrones and extra parameters (mass, etc) from CMD metallicity files.
  • Read cluster's photometric data in the same format as the theoretical data.
  • Add necessary checks.
  • Integrated magnitude
  • KDE pvalue (not generalized to N dimensions)
  • Read membership probabilities.
  • Bayesian DA (see #352)
  • Clean cluster region
    • Local cell clean (see #311)
    • Other methods (see #319)
  • Observed cluster prepare
    • Tolstoy
    • Dolphin
  • Fix synthetic cluster generation.
    • Read separated theoretical filters for each color (used in binarity)
    • Get extinction coefficients from Cardelli model
    • Move isochrone
    • Max magnitude cut
    • Mass interpolation
    • Binarity
    • Completeness removal
    • Add errors
  • Likelihood (see #325)
    • Tolstoy
    • Dolphin
    • Mighell
  • Brute force
  • Genetic algorithm
  • Bootstrap
  • Synth cluster write to file
  • Add data output (not generalized to N dimensions)
  • Top tiers (not generalized to N dimensions)
  • TEST

Correct comments in get-regions

Correct the description of flag_area_stronger (currently tied to the decont algor in the comments) and other things that need it.

Use metallicity and age steps

Currently the steps defined in the input file for these two parameters have little use.

Make it so that these values are used when reading the isochrone files so as to skip values in between.
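
One way this could look, as a sketch only: keep the available grid values that fall on `v_min + k * step`, within a small tolerance to absorb floating point noise. The function name, the tolerance and the sample log(age) values are assumptions for illustration.

```python
def select_by_step(available, v_min, v_max, step, tol=1e-6):
    """Keep the values in `available` that fall on the grid v_min + k * step
    inside [v_min, v_max], within a small tolerance."""
    selected = []
    for value in sorted(available):
        if value < v_min - tol or value > v_max + tol:
            continue
        k = round((value - v_min) / step)
        if abs(v_min + k * step - value) <= tol:
            selected.append(value)
    return selected


# Made-up log(age) values stored with a 0.05 spacing, read with a 0.1 step.
ages = [7.0, 7.05, 7.1, 7.15, 7.2, 7.25, 7.3]
print(select_by_step(ages, 7.0, 7.3, 0.1))  # [7.0, 7.1, 7.2, 7.3]
```

The same function would apply to the metallicity values, with the in-between isochrones simply never being read.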

Read mem probs from file

Add an option to read the membership probabilities from file (from a previous run or user-provided) to speed up calculations.
See what happens with the plotted KDE when this happens.

Select CMD, not mag, color and phot system separated

Restrict the selection to a given CMD so that it defines the magnitude, color and photometric system used.

There's no point in letting these be picked separately until the code learns how to deal with them separately (currently it does not).

Restrict radius for high CI clusters

When the cluster has a high CI (contamination index) the radius should be restricted to a value lower than the one found by the get_radius function. This way fewer field stars will be present in the r<r_cl CMD and the isochrone fitting process will be more accurate.

'cluster_region': re-define

The cluster-region array is constantly being used and the stars in it are always being filtered to only use stars inside the cluster's radius. Re-write this so this "cleaning" is not needed anymore.

Re-write 'get_most_probable_memb'

It needs a re-name and a re-write. The name does not accurately describe what it does anymore and neither do the descriptions inside the function.

Read p-value qq-plot data from file

Create a file that can store all the values necessary for the p-value and qq-plot functions to be processed without running them, just reading data from said file.

Re-locate + discard plots

1- Move the integrated magnitude plot to the first column, fifth row.
2- Move the membership probability distribution to the second column, fifth row.
3- Discard the m_p>0.75 diagram.
4- Discard the N_c CMD diagram.
5- Displace the full CMD two columns to the right.
6- Replace the m_p>0.5 diagram with m_p>mu and locate it after the full CMD.

Change LF presentation

Plot the cluster and field regions LFs in a single graph, flip the x axis and use high-steps.

Add Mighell's Chi_Gamma parameter?

Use it along with Saha's W parameter to estimate the field-cluster region fit.

Same issue as with Saha's W parameter, not sure it does the same thing as the p-value.

Change p-values distribution function

This is important

The 'cluster_region' should be randomly cleaned by removing a given number of stars so that it will have the same number of stars as the field region being compared to.
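
A minimal sketch of the random cleaning, assuming the cluster region is a list of stars and `n_field` is the star count of the field region being compared. The function name is hypothetical.

```python
import random


def match_star_count(cluster_region, n_field, rng=None):
    """Return a random subsample of the cluster region with n_field stars.

    If the cluster region already has n_field stars or fewer, it is
    returned unchanged (as a new list).
    """
    rng = rng or random.Random()
    if len(cluster_region) <= n_field:
        return list(cluster_region)
    return rng.sample(cluster_region, n_field)


# Example: 100 cluster region stars (ids only) matched to a 30-star field.
cluster_region = list(range(100))
cleaned = match_star_count(cluster_region, 30, rng=random.Random(42))
print(len(cleaned))  # 30
```

Passing a seeded `random.Random` keeps a run reproducible; a fresh draw per field region comparison would use a different subsample each time.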

Revise radius assignment

The radius is assigned perhaps too fast by using the first density point that falls within the back+delta threshold (point A).

Instead take the first four density points starting from point A and select as the radius the one that lies closest to the background value.
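
The proposal could be sketched as follows. It assumes point A is the first density value at or below `background + delta`, which is one reading of "within the threshold"; the function name and arguments are hypothetical.

```python
def pick_radius(radii, densities, background, delta, n_points=4):
    """Pick the radius among the first n_points starting at point A whose
    density lies closest to the background value.

    Point A is taken as the first density value at or below
    background + delta. Returns None if no value reaches that threshold.
    """
    threshold = background + delta
    for a, dens in enumerate(densities):
        if dens <= threshold:
            break
    else:
        return None
    candidates = range(a, min(a + n_points, len(densities)))
    best = min(candidates, key=lambda i: abs(densities[i] - background))
    return radii[best]


# Made-up radial density profile; point A is the fourth point (density 2.4).
radii = [1, 2, 3, 4, 5, 6, 7, 8]
densities = [10.0, 7.0, 4.0, 2.4, 2.1, 1.9, 2.05, 2.3]
print(pick_radius(radii, densities, background=2.0, delta=0.5))  # 7
```

With the current behavior the radius would be 4 (point A itself); looking at four points moves it out to where the profile actually settles on the background.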

Should the isoch fit be made with N_c stars only?

Currently the best fitting algorithm makes use of all the stars in the cluster region to compare with the synthetic clusters and obtain a best fit.

Wouldn't it be more reasonable to just use the N_c stars (most probable members) from the cluster region in this comparison?
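
If this route is taken, the selection itself is simple. A sketch, assuming each star is a `(star_id, membership_probability)` pair and N_c is already known; the function name is hypothetical.

```python
def most_probable_members(stars, n_c):
    """Return the n_c stars with the highest membership probability.

    Each star is assumed to be a (star_id, probability) pair.
    """
    ranked = sorted(stars, key=lambda star: star[1], reverse=True)
    return ranked[:n_c]


# Example with made-up probabilities and N_c = 2.
stars = [("a", 0.2), ("b", 0.9), ("c", 0.6), ("d", 0.75)]
print(most_probable_members(stars, 2))  # [('b', 0.9), ('d', 0.75)]
```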
