snowdragon

This repository can be used to run and compare different models for the classification and segmentation of Snow Micro Pen (SMP) profiles.

The SMP is a fast, high-resolution, portable device for measuring snow hardness. The automatic classification and segmentation models enable fast analysis of large numbers of SMP profiles. For background on snow layer segmentation and grain type classification, please refer to the related publications. The SMP dataset collected during the MOSAiC expedition was used throughout the project. The plots and results of the different models can be reproduced with this repository.

Related publications

Overview

  • data/: Preprocessed SMP profiles as npz files
  • data_handling/: Scripts to preprocess and clean the raw SMP profiles
  • models/: Scripts to create, use and store models
  • output/: Output and results are stored here. The subdirectory evaluation/ contains plots for each model and profile, plots_data/ contains plots giving an overview of the data, plots_results/ contains plotted results, predictions/ is where predictions are stored, and scores/ contains all the scores.
  • tuning/: Scripts to tune models
  • visualization/: Scripts to create plots

Setup

This repository runs on Python 3.6. For a quick setup, run pip install -e . in the repository root. The required packages can also be installed with pip install -r requirements.txt. If desired, create an environment beforehand (e.g. conda create --name=snowdragon python=3.6).

The repository does not contain the MOSAiC data used here. The data is publicly available here: https://doi.org/10.1594/PANGAEA.935934. Any other SMP dataset can be used as well with this repository to train models for automatic SMP classification and segmentation.

If you would like to use our pre-trained models instead of training the models yourself, you can download them here: https://doi.org/10.5281/zenodo.7063520.

Usage

Data Preprocessing

To preprocess all SMP profiles, run:

python -m data_handling.data_loader [path_npz_file] --smp_src [path_raw_smp_data] --exp_loc [path_dir_smp_preprocessed]
  • [path_npz_file]: Path and name of the npz file where the complete preprocessed SMP dataset is stored. For example: data/all_smp_profiles.npz
  • [path_raw_smp_data]: Path to the directory where the raw SMP data is stored
  • [path_dir_smp_preprocessed]: Path to the directory where each single preprocessed SMP profile will be stored. For example: data/smp_profiles

To get information about an already preprocessed data set, run:

python -m data_handling.data_loader [path_npz_file] --load_only --test_print

For explanations of further preprocessing options run python -m data_handling.data_loader -h.
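A preprocessed npz file can also be inspected directly from Python. The following is a minimal sketch, not part of the repository: the helper name summarize_npz is hypothetical, and it assumes the file holds plain numeric arrays (if the file contains pickled objects, loading with allow_pickle=False raises an error).

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an npz file.

    Hypothetical helper for illustration; it makes no assumption about
    which array names the repository actually writes.
    """
    summary = {}
    # allow_pickle=False is the safe default; enable pickling only for trusted files.
    with np.load(path, allow_pickle=False) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

For example, summarize_npz("data/all_smp_profiles.npz") returns one entry per stored array, which gives a quick check that preprocessing produced the expected number of arrays.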

For smooth default usage, set the SMP_LOC in data_handling/data_parameters.py to the path where all raw SMP data is stored.

Tuning

Tuning can be skipped. The default hyperparameters of all models are set to the values which produced the best results for the MOSAiC SMP dataset.

To run tuning, first run the model evaluation (see Model Evaluation) to create a split (training, validation, testing) and normalized dataset. The results are saved e.g. in data/preprocessed_data_k5.txt. To tune all models, simply run the prepared bash script:

bash tuning/tune_models.sh [path_results_csv]

[path_results_csv] could be e.g. tuning/tuning_results/tuning_run01_all.csv.

To tune a single model run:

python -m tuning.tuning --model_type [wished_model] [path_results_csv]

See help options for more information.

After tuning, run python -m tuning.eval_tuning to aggregate and sort the tuning results for each model. The results are stored in the folder tuning/tuning_results/tables.
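To illustrate what aggregating and sorting tuning results per model means, here is a minimal sketch using only the standard library. It is not the implementation in tuning.eval_tuning: the function name and the column names "model" and "score" are hypothetical and must be adapted to the actual header of the results CSV.

```python
import csv
from collections import defaultdict


def best_runs_per_model(csv_path, model_col="model", score_col="score", top_n=3):
    """Group tuning rows by model and keep each model's top_n rows by score.

    model_col and score_col are hypothetical column names (assumptions),
    not taken from the repository's CSV format.
    """
    grouped = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            grouped[row[model_col]].append(row)
    # Sort each model's runs from highest to lowest score and truncate.
    return {
        model: sorted(rows, key=lambda r: float(r[score_col]), reverse=True)[:top_n]
        for model, rows in grouped.items()
    }
```

The sketch assumes that a higher score is better; for a loss-like metric, drop reverse=True.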

Model Evaluation

Model evaluation consists of preprocessing the complete dataset (in contrast to the individual SMP profiles in the first step), evaluating each model, and, if desired, validating each model.

Preprocessing the complete data set (including data splits and preparing it for model usage) only needs to be done one time:

python -m models.run_models --preprocess

Afterwards, only the flags for evaluating or validating need to be passed (all results are stored for each model in the folder output/evaluation):

python -m models.run_models --evaluate --validate

Here is the full command, where the smp file and the preprocessed dataset file can be set:

python -m models.run_models --smp_npz [path_npz_file] --preprocess_file [path_txt_file] --preprocess --validate --evaluate
  • [path_npz_file]: Path to the npz file where the complete SMP dataset was stored. For example: data/all_smp_profiles_updated.npz
  • [path_txt_file]: Path to the txt file where the SMP dataset is or will be stored and the different splits of the dataset can be accessed. For example: data/preprocessed_data_k5.txt
  • preprocess: Preprocesses the [path_npz_file] data and stores the split, model-ready data in [path_txt_file].
  • evaluate: Evaluates each model based on the dataset [path_txt_file]. (Go into run_models, function evaluate_all_models to choose between different models and which evaluation information you want to have from them). Results are stored in output/evaluation/
  • validate: Validates each model based on the dataset [path_txt_file]. Results are stored in output/tables/.
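When the evaluation is driven from a script, the flags described above can be assembled programmatically. The following is a minimal sketch, assuming it is run from the repository root; the helper names build_run_models_command and run_pipeline are hypothetical, while the module name and flags are the ones listed in this section.

```python
import subprocess
import sys


def build_run_models_command(smp_npz=None, preprocess_file=None,
                             preprocess=False, evaluate=False, validate=False):
    """Assemble the argument list for `python -m models.run_models`.

    Hypothetical helper: it only passes the flags documented in this README.
    """
    command = [sys.executable, "-m", "models.run_models"]
    if smp_npz is not None:
        command += ["--smp_npz", str(smp_npz)]
    if preprocess_file is not None:
        command += ["--preprocess_file", str(preprocess_file)]
    if preprocess:
        command.append("--preprocess")
    if evaluate:
        command.append("--evaluate")
    if validate:
        command.append("--validate")
    return command


def run_pipeline(**kwargs):
    """Run the assembled command and raise CalledProcessError if it fails."""
    return subprocess.run(build_run_models_command(**kwargs), check=True)
```

A typical call would be run_pipeline(preprocess_file="data/preprocessed_data_k5.txt", evaluate=True, validate=True), which reuses an already preprocessed dataset and skips the one-time preprocessing step.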

Visualization

The data, preprocessing and results are also visualized. The plots are stored in output/ and can already be found there. Three sets of plots can be created: visualizations of the original data, the normalized data, and the results. Look into the code to see which plots are shown, and comment out specific plots in run_visualization.py if desired.

python -m visualization.run_visualization --original_data --normalized_data --results

Structure

.
├── data
│   └── smp_profiles
├── data_handling
├── models
│   └── stored_models
├── output
│   ├── evaluation
│   │   ├── baseline
│   │   ├── blstm
│   │   ├── bmm
│   │   ├── easy_ensemble
│   │   ├── enc_dec
│   │   ├── gmm
│   │   ├── kmeans
│   │   ├── knn
│   │   ├── label_spreading
│   │   ├── lstm
│   │   ├── rf
│   │   ├── rf_bal
│   │   ├── self_trainer
│   │   └── svm
│   ├── plots_data
│   │   ├── normalized
│   │   └── original
│   ├── plots_results
│   ├── predictions
│   └── scores
├── tuning
│   └── tuning_results
│       └── tables
└── visualization
