
bfg-nets's People

Contributors

nsfabina, pgbrodrick


Forkers

dmarvs

bfg-nets's Issues

Reporter: clean up reports, misc changes

✔️ combine input and output plots to avoid redundancy

  • only show predictions when available, automatically

✔️ prediction plot labels run into one another excessively

  • turn header labels 45 degrees

✔️ printing "sample x" in each row is excessive

  • print once in header or once for central row or just rotate

✔️ weights coloring

  • use white on outside, viridis on inside

✔️ history

  • change histogram to line plot for cumulative epochs completed vs minutes elapsed

✔️ for categorical

  • correct category plot needs colors lightened
  • combine responses into a single plot and predictions into another; warn if there are more than 20 classes!
  • don't show transforms

README

Also add a disclaimer that this is not being maintained for everyone to use: it is basically an internal tool and a living codebase for our own purposes. We are making it available for others to kick around, so be sure to use tags if you need certain functionality.

Add coverage into tox tests

Note that pytest-cov is not playing well with tox right now when using the following configuration option in tox.ini:

[pytest]
addopts = --cov --no-cov-on-fail
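A common source of pytest-cov/tox friction is that tox installs the package into its own virtualenv, so coverage measures the installed copy rather than the source tree. A hedged sketch of a tox.ini that works around this (untested here; the envlist, package name, and paths are assumptions, not this repo's actual configuration):

```ini
# Hypothetical tox.ini sketch; envlist and --cov target are assumptions.
[tox]
envlist = py37

[testenv]
deps =
    pytest
    pytest-cov
# usedevelop installs the package in editable mode so coverage maps measured
# files back to the source tree instead of the .tox site-packages copy.
usedevelop = true
commands = pytest --cov=rsCNN --cov-report=term-missing {posargs}
```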

Design data report, prior to writing code

  • sample base stats on the dataset in both performance and comparison reports
  • total area in original
  • total area in existing plots
  • category names
  • band names
  • response names
  • number of samples
  • fold splits and fold identity, for verification

Reporter: workup histograms, for Phil

Per conversation:

  • need better guidance on histogram plots
  • legend
  • constant y-axis across plots
  • differentiate bins for zero from non-zero bins
  • misc improvements as they come up
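For differentiating the zero bin from non-zero bins, one option is to count exact zeros separately and histogram only the non-zero values, so the zero spike can be drawn as its own bar. A plain-Python sketch of the binning logic (function name and signature are made up; the reporter presumably uses numpy/matplotlib):

```python
def split_zero_bin(values, n_bins=10):
    """Count exact zeros separately so the zero spike (e.g. nodata pixels)
    can be plotted as its own bar instead of dominating the first bin.
    Illustrative sketch only, not reporter code.
    """
    zero_count = sum(1 for v in values if v == 0)
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return zero_count, [], []
    low, high = min(nonzero), max(nonzero)
    width = (high - low) / n_bins or 1  # avoid zero width when all values equal
    counts = [0] * n_bins
    for v in nonzero:
        # clamp the max value into the last bin
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return zero_count, counts, edges
```

A constant y-axis across plots then falls out of taking the max over all returned counts before plotting.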

Move application functions to own module?

Seems like data_management is for building data and handling samples, primarily?

If so, maybe an application module for taking a completed model and applying it to any input raster?

Reporter: add "spatial confusion matrices" to results

Confusion matrices give information on correct/incorrect predictions relative to the predicted and actual classes. We want to know about the spatial context in those predictions.

e.g., trying to predict land cover classes: are predictions for class a more likely to be correct when class a is found next to class b or class c?

@pgbrodrick has an idea of how to do this easily, so talk to him about implementation
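Until then, a naive 4-neighbor tally illustrates the idea (plain-Python sketch on 2D lists; function and argument names are made up, and this is likely not the efficient approach @pgbrodrick has in mind):

```python
from collections import defaultdict

def spatial_confusion(actual, predicted):
    """Tally correct/total predictions keyed by (pixel class, adjacent class),
    so we can ask whether class A is predicted correctly more often when it
    borders class B than class C. Illustrative sketch only.
    """
    counts = defaultdict(lambda: [0, 0])  # (class, neighbor class) -> [correct, total]
    rows, cols = len(actual), len(actual[0])
    for i in range(rows):
        for j in range(cols):
            is_correct = actual[i][j] == predicted[i][j]
            # 4-connected neighborhood
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    key = (actual[i][j], actual[ni][nj])
                    counts[key][0] += int(is_correct)
                    counts[key][1] += 1
    return dict(counts)
```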

handle output directories

Determine if we want to have a network-specific output directory, and what exactly we want to go in there.

Dependencies: switch from keras to tf.keras after tensorflow 2.0 is stable

This sounds easy, but we'd like this to be relatively comprehensive:

  • update environment files
  • update imports and API references
  • in the process, ensure that the references to tensorflow and keras are compartmentalized so that we could easily switch to a new wrapper or backend if we wanted. This is unlikely, but it would be good coding practice and keep our focus on what is important about our contribution: automated tools for handling remote sensing data, building remote-sensing-relevant models, and generating remote-sensing-relevant reports. The actual network building is not a core contribution.
  • confirm things are working correctly

Handle simultaneous attempts at data build on parallel envs like SLURM

Use case: a user starts an analysis pipeline with ten jobs that have different model parameters but the same raw files and data build configuration. Currently, all jobs will attempt to build the data and cause chaos in a single directory. This can be handled manually by having the user start a single job, wait until the data build is complete, and then start the remaining jobs. However, one solution could be to create a file lock in the data build directory: any job that finds the lock would skip the data build and poll until the build is complete. (Apologies if the terminology is off here.)
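A minimal sketch of that lock, assuming the data build directory lives on storage shared by all SLURM jobs (lock filename and function names are hypothetical, not rsCNN API; note too that O_EXCL atomicity can be unreliable on some older NFS setups):

```python
import errno
import os
import time

def acquire_build_lock(data_dir, poll_seconds=30):
    """Return True if this job should run the data build; otherwise block
    until the lock disappears (build complete) and return False.
    Hypothetical helper, not rsCNN API.
    """
    lockfile = os.path.join(data_dir, '.build_lock')
    try:
        # O_CREAT | O_EXCL fails if the file already exists, making the
        # check-and-create a single atomic step across parallel jobs.
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except OSError as error:
        if error.errno != errno.EEXIST:
            raise
    while os.path.exists(lockfile):
        time.sleep(poll_seconds)
    return False

def release_build_lock(data_dir):
    """Called by the building job once the data build has finished."""
    os.remove(os.path.join(data_dir, '.build_lock'))
```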

Configs: use absolute paths in config

Convert relative paths to absolute paths when configs are loaded, so that models, histories, and other artifacts can be found regardless of the Python working directory.
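A sketch of the conversion, resolving relative paths against the directory containing the config file (shown over a plain dict with made-up key names; the real config objects and keys will differ):

```python
import os

def absolutize_paths(config, path_keys, config_dir):
    """Rewrite any relative paths in a loaded config so they resolve against
    the config file's own directory rather than the current working
    directory. Key names here are hypothetical.
    """
    for key in path_keys:
        value = config.get(key)
        if value is not None and not os.path.isabs(value):
            config[key] = os.path.abspath(os.path.join(config_dir, value))
    return config
```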

incorrect projection error

Currently, the following doesn't pass our checks in training_data.check_projections:

Feature/Response projection mismatch at site 0
Feature proj: WGS_1984_UTM_Zone_10N
Response proj: WGS 84 / UTM zone 10N

Find a way to make sure that these different styles of projection strings don't cause spurious failures; resolving both straight to EPSG codes may be the cleanest fix.
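The robust fix is resolving both projections to EPSG codes (e.g. via osgeo.osr) before comparing. As a lightweight illustration of why raw string comparison fails, a heuristic normalizer that makes the two spellings above compare equal (sketch only; not a substitute for proper CRS comparison):

```python
import re

def normalize_projection_name(proj):
    """Canonicalize projection name strings so that spellings like
    'WGS_1984_UTM_Zone_10N' and 'WGS 84 / UTM zone 10N' compare equal.
    Heuristic sketch; the real fix is EPSG resolution via osgeo.osr.
    """
    proj = proj.lower().replace('_', ' ').replace('/', ' ')
    proj = proj.replace('wgs 1984', 'wgs 84')
    # collapse runs of whitespace left over from the replacements
    return re.sub(r'\s+', ' ', proj).strip()
```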

Design model comparison report, prior to writing code

comparison report sample content:

  • table with all models
      • print configs in table
      • highlight rows that are different
      • plus some other things
  • network summaries
      • layers
      • coefficients
      • time training
  • features
      • how many features
      • list of built data filepaths
  • samples
      • total samples
      • total training samples
      • total validation samples
      • image size
  • loss window
  • training time graph
      • include loss and validation loss as well

Modify tensorboard callback to append to existing files

Use case: a model is being run and exits due to error or preemption, and we pick up where we left off with fitting. The history object is handled appropriately, but it would be wonderful if the tensorboard callback would simply append to the logdir so that it shows up as one run in the graphs, not multiple runs that overlap with one another.
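TensorBoard treats multiple event files in a single directory as one run, so one possible approach is to derive the log directory deterministically from the experiment configuration instead of a timestamp; a restarted fit then writes new event files into the same directory and appears as one continuous run. A sketch, with a hypothetical helper name:

```python
import hashlib
import json
import os

def stable_tensorboard_logdir(base_dir, config):
    """Derive a log_dir from the experiment config rather than a timestamp,
    so a preempted-and-restarted fit appends event files to the same run.
    Hypothetical helper, not part of rsCNN.
    """
    # sort_keys makes the digest independent of dict insertion order
    digest = hashlib.sha1(
        json.dumps(config, sort_keys=True).encode('utf-8')
    ).hexdigest()[:12]
    log_dir = os.path.join(base_dir, digest)
    os.makedirs(log_dir, exist_ok=True)
    return log_dir
```

The result would be passed as log_dir to the keras TensorBoard callback when building the callback list.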

multi-response vectors

Responses provided as input vector files are currently assumed to be single, binary vector responses (e.g., each vector file is its own response category). Consider enabling multi-response vectors.

Fix uncaught / uninformative errors on data build with no raw files

Example:

  File "/home/nfabina/rsCNN/rsCNN/data_management/data_core.py", line 93, in build_or_load_rawfile_data
    self.config.raw_files.boundary_files)
  File "/home/nfabina/rsCNN/rsCNN/configuration/sections.py", line 517, in check_input_file_validity
    num_r_bands_per_file = [gdal.Open(x, gdal.GA_ReadOnly).RasterCount for x in r_file_list[0]]
  File "/home/nfabina/rsCNN/rsCNN/configuration/sections.py", line 517, in <listcomp>   
    num_r_bands_per_file = [gdal.Open(x, gdal.GA_ReadOnly).RasterCount for x in r_file_list[0]]
AttributeError: 'NoneType' object has no attribute 'RasterCount'
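The opaque AttributeError comes from gdal.Open returning None for a missing file. One way to fail fast with an informative message is to validate the raw file lists before any GDAL calls (sketch only; the function name and list-of-lists argument shape are assumptions mirroring the feature/response file lists):

```python
import os

def check_raw_files_exist(file_lists):
    """Raise an informative error for any missing raw files before the data
    build, instead of letting gdal.Open return None and blow up later with
    an AttributeError on NoneType. Illustrative helper, not rsCNN API.
    """
    missing = [
        filepath
        for file_list in file_lists
        for filepath in file_list
        if not os.path.isfile(filepath)
    ]
    if missing:
        raise FileNotFoundError(
            'Raw files not found, check config paths: {}'.format(', '.join(missing))
        )
```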

maximum_likelihood_classification error when applied to application function output

Code that caused this error:

apply_model_to_data.apply_model_to_raster(
    experiment.model, data_container, raster_for_application, basename_out)

apply_model_to_data.maximum_likelihood_classification(
    basename_out + '.tif', any_filepath_out)

Traceback:

Traceback (most recent call last):
  File "run_classification.py", line 106, in <module>
    run_classification(**args)
  File "run_classification.py", line 97, in run_classification
    filepath_out_base + 'apply.tif', filepath_out_base + 'class.tif')
  File "/home/nfabina/rsCNN/rsCNN/data_management/apply_model_to_data.py", line 182, in maximum_likelihood_classification
    prob = dataset.ReadAsArray(0, line, dataset.RasterXSize, 1)
  File "/home/nfabina/miniconda3/envs/asu/lib/python3.7/site-packages/osgeo/gdal.py", line 2089, in ReadAsArray
    callback_data = callback_data )
  File "/home/nfabina/miniconda3/envs/asu/lib/python3.7/site-packages/osgeo/gdal_array.py", line 304, in DatasetReadAsArray
    buf_obj, buf_type, resample_alg, callback, callback_data ) != 0:
  File "/home/nfabina/miniconda3/envs/asu/lib/python3.7/site-packages/osgeo/gdal_array.py", line 147, in DatasetIONumPy
    return _gdal_array.DatasetIONumPy(ds, bWrite, xoff, yoff, xsize, ysize, psArray, buf_type, resample_alg, callback, callback_data)
TypeError: in method 'DatasetIONumPy', argument 4 of type 'int'
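Argument 4 of DatasetIONumPy is the y offset, i.e. `line`, so the TypeError suggests a non-builtin integer (e.g. a numpy scalar from an arange-style loop) reaching GDAL's SWIG bindings, which require plain Python ints. A hedged guess at the fix, casting before the read (wrapper name is made up; this is a suspected diagnosis, not a confirmed one):

```python
def read_probability_line(dataset, line):
    """Read one row of the probability raster. Casting `line` to a built-in
    int guards against numpy integer/float scalars, which GDAL's SWIG
    bindings reject with "argument 4 of type 'int'". Suspected fix only.
    """
    return dataset.ReadAsArray(0, int(line), dataset.RasterXSize, 1)
```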

enable multi-file scaling

Scaling is currently static. Link it to the specified band types through data core, and possibly allow a list of multiple scalings.
