
legacyqa's Introduction

DESI Legacy Survey Exposure QA

The purpose of this repo is to support automated QA of DECaLS imaging used to select DESI spectroscopic targets. The three main components are described below.

Prepare Data

Process the community pipeline output for all ~120K candidate DECaLS exposures to produce (inverse-variance weighted) downsampled thumbnails.

More details here.
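The inverse-variance weighting can be sketched as a block-average in which each pixel contributes in proportion to its inverse variance. This is a minimal numpy sketch, not the actual pipeline code; the function name and block size are assumptions.

```python
import numpy as np

def downsample_ivar(image, ivar, block=8):
    """Inverse-variance weighted block-average downsampling.
    Hypothetical helper: the real prepare step may differ in detail."""
    ny = (image.shape[0] // block) * block
    nx = (image.shape[1] // block) * block
    img = image[:ny, :nx].reshape(ny // block, block, nx // block, block)
    wgt = ivar[:ny, :nx].reshape(ny // block, block, nx // block, block)
    wsum = wgt.sum(axis=(1, 3))
    # Blocks whose pixels are all masked (zero total weight) map to zero.
    wavg = (img * wgt).sum(axis=(1, 3)) / np.where(wsum > 0, wsum, 1.0)
    return np.where(wsum > 0, wavg, 0.0)
```

With uniform weights this reduces to a plain block average, while masked pixels (ivar = 0) are simply ignored.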

Visual Inspection

Allow rapid visual inspection of about 5% of the thumbnail images to collect expert labels for the main classes of bad exposures.

Try it out here.

Machine Learning

Train various algorithms using the expert labeled data to automate the identification of bad exposures from the full dataset.

More details here.

legacyqa's People

Contributors

dkirkby


Forkers

rongpu

legacyqa's Issues

Implement user sign in

It will be useful to uniquely identify each person providing labels. For example, this would allow conflicting labels from people with different levels of expertise to be optimally combined.

This issue is to implement a simple one-time sign-in process, with the username cached for subsequent visits to the page.

Use lower jpeg quality

Spinning off from #9: thumbnail images are currently written with the default JPEG quality of 95, which seems unnecessarily high and pushes client memory usage to ~800 MB.

This issue is to replace plt.imsave in prepare/extract.py with:

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

fig = Figure(dpi=dpi, frameon=False)
FigureCanvas(fig)  # attach an Agg canvas so the figure can be rasterized
fig.figimage(arr, cmap=cmap, vmin=vmin, vmax=vmax, origin=origin, resize=True)
# Note: matplotlib >= 3.5 drops the quality kwarg; pass pil_kwargs={'quality': 80} instead.
fig.savefig(fname, dpi=dpi, format=format, transparent=True, quality=80)

where the new quality has been tuned for a reasonable size / appearance tradeoff.

Investigate alternate image scaling

The prototype uses histogram equalization to scale each image individually (code here).

This issue is to investigate alternative schemes that use the same scaling for all images of the same band.
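One candidate scheme, sketched below, is a fixed arcsinh stretch that maps every image of a band through the same (lo, hi) limits, e.g. global percentiles computed once per band. The function name and the choice of limits are assumptions, not a decided design.

```python
import numpy as np

def scale_band(image, lo, hi):
    """Map pixel values to [0, 1] with an arcsinh stretch, using the
    same fixed limits (lo, hi) for every image of a band. Hypothetical
    alternative to the per-image histogram equalization in the prototype."""
    x = np.arcsinh((image - lo) / (hi - lo))
    return np.clip(x / np.arcsinh(1.0), 0.0, 1.0)
```

Because the limits are shared across the band, unusually bright or dark exposures stay visibly unusual instead of being normalized away, which is exactly what the labeling task needs.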

Filter out known bad exposures

The prototype uses all 121,123 exposures listed in:

/global/project/projectdirs/cosmo/work/legacysurvey/dr8b/image-lists/image-list-decam-dr8.txt

This issue is to use a different list that has known bad exposures filtered out, so the expert labeling can focus on identifying bad exposures that are not already being automatically filtered by the existing quality cuts.
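A minimal sketch of the filtering step, assuming both the full list and the bad-exposure list contain one image path per line (the actual DR8 file formats may differ):

```python
def filter_image_list(list_path, bad_path):
    """Return entries of list_path that do not appear in bad_path.
    Hypothetical helper: assumes one image path per line in both files."""
    with open(bad_path) as f:
        bad = {line.strip() for line in f if line.strip()}
    with open(list_path) as f:
        return [line.strip() for line in f
                if line.strip() and line.strip() not in bad]
```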

Implement labeling backend

Clicking a label on the prototype currently just prints a message to the JavaScript console.

This issue is to POST the label to a backend server, e.g. a google form.
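The browser client would issue the request itself; the shape of the call is sketched here in Python for concreteness. The field names and JSON format are hypothetical placeholders, and a Google Form backend would expect its own form-encoded fields instead.

```python
import json
import urllib.request

def make_label_payload(expnum, band, label, user):
    """Assemble the data for one label click (hypothetical field names)."""
    return {"expnum": expnum, "band": band, "label": label, "user": user}

def post_label(url, payload):
    """POST one label as JSON to the backend and return the HTTP status."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```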

Identify the main classes of bad exposure

The purpose of this issue is to discuss the main classes of bad exposure and select the labels to use in visual inspection and for machine learning classification.

The ideal number of classes (not including "good") is 4, since that is easy to implement (one per thumbnail corner) and still few enough to allow quick manual classification of ~6K images.

Slurm "out of memory" error

I am trying to process ~120K compressed FITS files (.fits.fz) from the DESI legacy imaging survey, e.g.

/global/project/projectdirs/cosmo/work/legacysurvey/dr8b/images/decam/DECam_CP/CP20140810_g_v2/c4d_140815_235550_ooi_g_v2.fits.fz

Each compressed input file is ~300 MB.

I have tested a Python script (extract.py in this repo) that processes 10 files in under 10 minutes, and estimate its RSS memory requirement at ~4 GB (measured with ps -p PID -o time,rss,vsz).

I have also tested a slurm script (extract.slurm in this repo) that works with a stub python script.

However, when I run with the real python script, jobs are killed with:

Starting slurm script at Fri Mar 29 10:28:57 PDT 2019
slurmstepd: error: Detected 1 oom-kill event(s) in step 12810111.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: nid06081: tasks 0-23: Out Of Memory
srun: Terminating job step 12810111.0

real	0m35.528s
user	0m0.040s
sys	0m0.032s
done with slurm script at Fri Mar 29 10:29:33 PDT 2019

Is there a way to increase the default memory available per process?
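Depending on how the cluster's Slurm is configured, the per-task limit can usually be raised by requesting memory explicitly, or by running fewer tasks per node so each task gets a larger share. A hypothetical adjustment to extract.slurm (the directive values are illustrative, not tuned):

```shell
#SBATCH --ntasks-per-node=12   # fewer tasks per node, so each gets more memory
#SBATCH --mem-per-cpu=8G       # or: raise the per-CPU memory request directly
```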

Any suggestions for decreasing the memory footprint of the script? I am already opening the FITS files with memmap=True and only keep 2 HDUs in memory at once (~2 × 32 MB), so I don't understand why the memory usage is so much larger than the whole (compressed) input file.
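One likely culprit, assuming astropy.io.fits is being used: memmap=True has little effect on tile-compressed .fits.fz files, because accessing .data must decompress the whole HDU into an in-memory array anyway. Explicitly freeing each decompressed array before reading the next keeps the peak footprint near one HDU, sketched here (the real extract.py loop may differ):

```python
import gc
from astropy.io import fits

def process_hdus(path, handle):
    """Visit the image HDUs of a FITS file one at a time, freeing each
    decompressed array before the next is read. Sketch only; memmap=True
    does not avoid decompression for .fits.fz tile-compressed data."""
    with fits.open(path, memmap=True) as hdus:
        for hdu in hdus[1:]:
            handle(hdu.data)
            del hdu.data  # drop astropy's cached, decompressed array
            gc.collect()
```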

Implement progressive thumbnail loading

The prototype currently loads 100 images for each band.

This issue is to implement progressive loading of ~2K images per band, so that the help tab can be read immediately and images can be labeled as soon as they have loaded.
