
This project forked from irslushy/ronin.jl


RONIN (Random forest Optimized Nonmeteorological IdentificatioN) (currently RadarQC.jl) is a Julia implementation of Dr. Alex DesRosiers' P3 MLQC code for removing non-meteorological gates from airborne radar scans using random forests.

Home Page: https://irslushy.github.io/Ronin.jl/

License: Other

Shell 0.06% Python 4.71% Julia 36.94% Jupyter Notebook 58.30%


Ronin.jl

Ronin.jl (Random forest Optimized Nonmeteorological IdentificatioN) contains a Julia implementation of the algorithm described in DesRosiers and Bell 2023 for removing non-meteorological gates from airborne radar scans. Care has been taken to keep the implementation close to the form described in the manuscript, but some changes have been made in the interest of computational speed.

A key part of the process is computing necessary derived parameters from the raw radar moments, which may be custom-specified in a parameters file. Many of the relevant functions for these calculations are contained within Ronin.jl.
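As a rough illustration of what a derived parameter can look like, the sketch below computes a windowed standard deviation of a velocity field while skipping missing gates. The helper name windowed_std, the window size, and the sample values are all hypothetical; this is not Ronin.jl's implementation, only an example of the kind of calculation involved.

```julia
using Statistics

# Hypothetical helper for illustration only; not part of Ronin.jl's API.
# Computes the standard deviation of the valid gates in a square window
# centered on each gate of a 2-D field.
function windowed_std(field::AbstractMatrix, halfwidth::Int=1)
    nrows, ncols = size(field)
    out = Matrix{Union{Missing, Float64}}(missing, nrows, ncols)
    for j in 1:ncols, i in 1:nrows
        # Clip the window at the edges of the scan
        rows = max(1, i - halfwidth):min(nrows, i + halfwidth)
        cols = max(1, j - halfwidth):min(ncols, j + halfwidth)
        vals = collect(skipmissing(field[rows, cols]))
        # A sample standard deviation needs at least two valid gates
        if length(vals) >= 2
            out[i, j] = std(vals)
        end
    end
    return out
end

# Made-up velocity values; `missing` marks a gate with no data
velocity = Union{Missing, Float64}[1.0 1.0 1.0; 1.0 missing 1.0; 1.0 1.0 9.0]
texture = windowed_std(velocity)
```

Gates in a uniform neighborhood get a value of zero, while gates near the outlier in the corner get a larger value.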



Acknowledgments

Much of the data used to train the models in this repository is the product of arduous manual editing of radar scans. ELDORA data is provided by the authors of Bell, Lee, Wolff, & Cai 2013. NOAA P3 TDR data is courtesy of Dr. Paul Reasor, Dr. John Gamache, and Kelly Neighbour. As mentioned above, the code is adapted from the original work of Dr. Alex DesRosiers.


Getting Started:

Setting up the environment (CSU)

After cloning the repository, start Julia using Ronin as the project directory, either by calling

julia --project=Ronin

from the parent directory of Ronin or modifying the JULIA_PROJECT environment variable.
Then, enter package mode in the REPL by pressing ].


Next, run instantiate. This downloads and installs the necessary dependencies and precompiles the Ronin package. Run add IJulia if you will be viewing the code in a Jupyter notebook and need access to the Jupyter kernel. Now, exit package mode using the delete (backspace) key. To ensure that everything was installed properly, run using Ronin on the Julia REPL. No errors or information should print out if successful.

Guide adapted from https://github.com/mmbell/Scythe.jl/tree/main
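The same setup can also be scripted with the Pkg API instead of the interactive package mode. This is a minimal sketch that assumes Julia was started from the parent directory of the clone and that the clone is named Ronin; adjust project_dir otherwise.

```julia
using Pkg

# Path to the cloned repository, relative to the directory Julia was started from.
# "Ronin" is an assumption about where the clone lives.
project_dir = "Ronin"

if isfile(joinpath(project_dir, "Project.toml"))
    Pkg.activate(project_dir)    # equivalent to starting Julia with --project=Ronin
    Pkg.instantiate()            # download and precompile the dependencies
    # Pkg.add("IJulia")          # optional: only needed for Jupyter notebooks
    @eval using Ronin            # should load without printing anything
else
    @warn "No Project.toml found; adjust project_dir to point at the clone" project_dir
end
```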

Setting up the environment (Derecho)

Getting Julia

export JULIA_DEPOT_PATH=$SCRATCH/julia
curl -fsSL https://install.julialang.org | sh

Now, exit package mode using the delete key. To ensure that everything was installed properly, run using Ronin on the Julia REPL. No errors or information should print out if successful.

Guide adapted from https://github.com/mmbell/Scythe.jl/tree/main

Example notebook


If you're looking to jump right in, check out Ronin Example Notebook - it contains everything you need to get up and running.



Guide: Processing new data, training, and evaluating a new model



The first step in training a new random forest model is determining which portions of the data will be used for training, testing, and validation. A helpful function here is split_training_testing!, which automatically splits a collection of scans into a training directory and a testing directory. For it to work properly, the user must set the variables for the relevant paths, as shown in the example notebook.

The current configuration is consistent with the 80/20 training/testing split described in the manuscript and places an equal number of scans from each "case" in the testing set. The function is expected to work for other training/testing splits, but this has not yet been tested.
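The idea of an 80/20 split with an equal number of testing scans per case can be sketched in plain Julia as follows. The helper split_by_case, the random selection of scans, and the file names are hypothetical and chosen for illustration; they are not how split_training_testing! is implemented or called.

```julia
using Random

# Hypothetical helper for illustration only; not Ronin.jl's split_training_testing!.
# Takes scans grouped by case and returns (training, testing) file lists, drawing
# the same number of testing scans from every case.
function split_by_case(cases::Dict{String, Vector{String}};
                       test_fraction=0.2, rng=Random.default_rng())
    total = sum(length, values(cases))
    # Equal share of the testing set for each case
    per_case = floor(Int, test_fraction * total / length(cases))
    training = String[]
    testing = String[]
    for name in sort(collect(keys(cases)))
        scans = shuffle(rng, cases[name])
        n_test = min(per_case, length(scans))
        append!(testing, scans[1:n_test])
        append!(training, scans[n_test+1:end])
    end
    return training, testing
end

# Made-up scan names for two cases of different sizes
cases = Dict(
    "case_a" => ["a_$(i).nc" for i in 1:30],
    "case_b" => ["b_$(i).nc" for i in 1:20],
)
training, testing = split_by_case(cases; rng=MersenneTwister(1))
```

With 50 scans in total, this places 5 scans from each case in the testing set and the remaining 40 in the training set.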

Once the training and testing scans have been placed into separate directories, data processing may begin. calculate_features is the primary function here. It processes a directory of scans (or a single scan) and writes the calculated features to an .h5 file, with the desired features specified by the user in a text file.

For the case where training scans are located within /cfradials/training/, the desired features to be calculated are specified in features.txt, and you wish to output the input features to training_set.h5, invoke the function as

calculate_features("/cfradials/training", "features.txt", "training_set.h5")

If you wish to remove a validation set from the training dataset, utilize remove_validation.

Finally, we can train a model to process our data. To do so, utilize train_model. If training data is contained within training_set.h5, and you wish to name your trained model trained_model.jld2, invoke as follows. It's recommended to end the model name in .jld2, as this is the format used to serialize it to disk.

train_model("training_set.h5", "trained_model.jld2")

NOTE: This may take on the order of 20-30 minutes if running on the entire ELDORA set.

This function also includes the option to verify the model on the training set and output the results to a separate .h5 file. To do this, invoke it as above, but include the keyword argument verify=true.

Evaluating the model

Now, let's apply the trained model to a set of data. The useful function here is QC_scan. In order, pass it the input location, the configuration file, and the path to the trained model. For this reason, it's important to keep the configuration file used to calculate input features in a known location.

The function will calculate the necessary input features, apply the random forest model, and apply the resulting prediction to the fields specified by the keyword argument VARIABLES_TO_QC. The quality-controlled variables are then written back to the specified NetCDF file under the original field name concatenated with the keyword argument QC_suffix. If this name is already in use in the NetCDF file, it will be overwritten.
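The masking and naming step described above can be pictured with a small in-memory sketch. It uses a Dict in place of a NetCDF file, and the field values, the "_QC" suffix, and the use of missing for removed gates are assumptions made for illustration; it does not call QC_scan or reproduce its internals.

```julia
# In-memory stand-in for a scan's fields (made-up values, four gates each)
fields = Dict{String, Vector{Union{Missing, Float64}}}(
    "VV" => [1.5, -3.0, 12.0, 0.2],
    "ZZ" => [35.0, 2.0, 41.5, -5.0],
)

# Per-gate model output: true = meteorological, false = non-meteorological
prediction = [true, false, true, false]

# Names mirror the keyword arguments described above; the values are assumptions
VARIABLES_TO_QC = ["VV", "ZZ"]
QC_suffix = "_QC"

for name in VARIABLES_TO_QC
    raw = fields[name]
    # Keep gates predicted meteorological; mark the rest as missing.
    # An existing entry with the same name would be overwritten.
    fields[name * QC_suffix] = [keep ? value : missing for (value, keep) in zip(raw, prediction)]
end
```

After the loop, fields contains VV_QC and ZZ_QC alongside the untouched raw fields.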


Notes on data conventions


Some important data conventions to make note of:

  • Meteorological Data is referred to by 1 or true
  • Non-Meteorological Data is referred to by 0 or false
  • ELDORA scan variable names:
    • Raw Velocity: VV
    • QC'ed Velocity (Used for ground truth): VG
    • Raw Reflectivity: ZZ
    • QC'ed Reflectivity (Used for ground truth): DBZ
    • Normalized Coherent Power/Signal Quality Index: NCP
  • NOAA TDR scan variable names:
    • Raw Velocity: VEL
    • QC'ed Velocity (Used for ground truth): VG
    • Raw Reflectivity: DBZ
    • QC'ed Reflectivity: ZZ
    • Normalized Coherent Power/Signal Quality Index: SQI
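To make the labeling convention concrete, the sketch below derives ground-truth labels from ELDORA-style raw and QC'ed velocity fields. It assumes that a gate removed during manual editing appears as missing in VG while still holding a value in VV; that assumption and the sample values are illustrative and are not taken from Ronin.jl's code.

```julia
# Raw and manually QC'ed velocity for five gates (made-up values);
# `missing` marks a gate with no data
VV = Union{Missing, Float64}[12.1, -4.0, missing, 7.5, 30.2]
VG = Union{Missing, Float64}[12.1, missing, missing, 7.5, missing]

# Only gates that had raw data can be labeled
has_raw = .!ismissing.(VV)

# true (1) = meteorological: the gate survived manual editing
# false (0) = non-meteorological: the gate was removed during editing
labels = .!ismissing.(VG[has_raw])
```

Here the third gate is skipped because it has no raw data, and the remaining four gates are labeled true, false, true, false.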

Contributors

irslushy, cenamiller
