rajarshiroychoudhury / clustering_superdarn_data

This project forked from vtsuperdarn/clustering_superdarn_data

Classification of SuperDARN backscatter using machine learning algorithms

Home Page: https://vtsuperdarn.github.io/clustering_superdarn_data

License: BSD 3-Clause "New" or "Revised" License

Python 3.27% HTML 1.02% Jupyter Notebook 95.71% JavaScript 0.01%

clustering_superdarn_data

We are developing new models for classifying SuperDARN (Super Dual Auroral Radar Network) data using machine learning algorithms. In the past, this data has been classified point-by-point using a quadratic formula based on Doppler velocity and spectral width. Recently, researchers successfully applied unsupervised clustering techniques to this data. These approaches improved on past methods, but they used a very limited set of features to create clusters and relied on simple methods (k-means, depth-first search) that do not easily capture non-linear relationships or subtle probability distributions.

This project applies DBSCAN and Gaussian Mixture Models (GMM) to the data, and provides a library of models and classification thresholds that can be used on SuperDARN data. Depending on the characteristics of the data the user wants to study, different models, parameters, and thresholds may be suitable. For example, the Ribeiro threshold is best for mid-latitude radars, and the Blanchard thresholds are best for high-latitude radars. See below for more details about the individual models and thresholds.

Google Summer of Code 2018 project link:

https://summerofcode.withgoogle.com/projects/#5795870029643776

Project website:

https://vtsuperdarn.github.io/clustering_superdarn_data

Plotting tool:

http://vt.superdarn.org/tiki-index.php?page=class_models

Algorithms

GMM

GMM runs on 5 features by default: beam, gate, time, velocity, and spectral width. It performs well overall, even on clusters that are not well-separated in space and time. However, it often creates clusters with too high a variance, causing it to pull in scattered points that do not look like they belong together (see the fanplots in cluster.ipynb). It is also slow, taking 5-10 minutes for one day of data.

https://github.com/vtsuperdarn/clustering_superdarn_data/blob/master/plotters/gmm.ipynb
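As a rough illustration of clustering on the five features listed above, here is a minimal sketch using sklearn's `GaussianMixture`. It is not this project's code: the synthetic data, the choice of two components, and the standardization step are all assumptions made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for radar scatter (not real SuperDARN data).
# Columns: beam, gate, time (hours), velocity (m/s), spectral width (m/s).
slow = rng.normal([5, 20, 2.0, 0, 10], [2, 3, 0.5, 10, 5], size=(300, 5))
fast = rng.normal([10, 50, 6.0, 400, 120], [2, 5, 0.5, 80, 30], size=(300, 5))
X = np.vstack([slow, fast])

# The features have very different scales, so standardize them first.
# (Assumption: the project may scale or transform features differently.)
X_scaled = StandardScaler().fit_transform(X)

# Fit a 2-component mixture and assign each point to its most likely component.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X_scaled)

print(np.bincount(labels))  # number of points assigned to each component
```

With full covariance matrices, each component can stretch along any direction in feature space, which is one way a single component can end up with high variance and absorb scattered points.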

DBSCAN

DBSCAN runs on 3 features: beam, gate, and time (that is, space and time). It can identify clusters that are well-separated in space and time, but does not perform well on mixed scatter. It uses sklearn's implementation of DBSCAN, which is highly optimized, so it runs in ~10s on one day of data.
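The space-time clustering described above can be sketched with sklearn's `DBSCAN` on synthetic (beam, gate, time) points. The `eps` and `min_samples` values and the unscaled integer coordinates are assumptions for illustration, not the project's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

def patch(beams, gates, times, n):
    """Draw n random (beam, gate, time) cells from the given integer ranges."""
    return np.column_stack([
        rng.integers(*beams, n),
        rng.integers(*gates, n),
        rng.integers(*times, n),
    ]).astype(float)

# Two patches of scatter that are far apart in both range gate and time.
patch_a = patch((0, 4), (10, 15), (0, 10), 200)
patch_b = patch((10, 14), (40, 45), (30, 40), 200)
X = np.vstack([patch_a, patch_b])

# eps is the neighborhood radius in (beam, gate, time) units (assumed values).
labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with the label -1.
n_clusters = len(set(labels.tolist()) - {-1})
print(n_clusters)
```

Because velocity and spectral width are not part of the feature set, two kinds of scatter that overlap in space and time end up in the same cluster, which is the mixed-scatter limitation noted above.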

DBSCAN + GMM

Applies DBSCAN on the space-time features, then applies GMM to separate clusters based on velocity and width. Unlike pure DBSCAN, it can identify mixed scatter. It is also much faster than GMM, running in ~15-60s on a full day of data.

https://github.com/vtsuperdarn/clustering_superdarn_data/blob/master/plotters/dbscan_gmm.ipynb
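The two-stage idea can be sketched as follows: cluster on space-time with DBSCAN, then split each resulting cluster with a GMM fitted on velocity and width only. The function name `dbscan_then_gmm`, the fixed number of GMM components, and all parameter values are hypothetical. The project's own implementation may decide how to split clusters differently.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture

def dbscan_then_gmm(space_time, vel_width, eps=2.0, min_samples=5, n_components=2):
    """Hypothetical helper: DBSCAN on space-time, then a GMM within each cluster."""
    st_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(space_time)
    final = np.full(len(st_labels), -1)  # DBSCAN noise stays labeled -1
    next_id = 0
    for c in np.unique(st_labels):
        if c == -1:
            continue
        idx = np.flatnonzero(st_labels == c)
        if len(idx) < 2 * n_components:
            # Too few points to fit a mixture; keep the cluster whole.
            final[idx] = next_id
            next_id += 1
            continue
        gmm = GaussianMixture(n_components=n_components, random_state=0)
        sub = gmm.fit_predict(vel_width[idx])
        final[idx] = next_id + sub
        next_id += n_components
    return final

# One space-time patch containing mixed slow and fast scatter (synthetic data).
rng = np.random.default_rng(2)
n = 400
space_time = np.column_stack([
    rng.integers(0, 4, n), rng.integers(10, 15, n), rng.integers(0, 10, n),
]).astype(float)
vel_width = np.vstack([
    rng.normal([0, 10], [10, 5], size=(n // 2, 2)),
    rng.normal([400, 120], [60, 30], size=(n // 2, 2)),
])

labels = dbscan_then_gmm(space_time, vel_width)
print(np.unique(labels))
```

Each GMM here is fitted on only two features and on one cluster's points at a time. That makes the fits much smaller than a single five-feature GMM over a full day of data.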

GridBasedDBSCAN

Based on Kellner et al. 2012. Grid-based DBSCAN is a modification of regular DBSCAN designed for automotive radars. It assumes close objects will appear wider, and distant objects will appear narrower, and varies the search area accordingly. It is not yet clear whether this assumption is advantageous for SuperDARN data.

My implementation of GBDBSCAN has not been optimized to the extent sklearn's DBSCAN has, so it takes 5-10 minutes, but there is room for improvement. So far, it provides similar performance to DBSCAN, but creates fewer small clusters at close ranges due to its wide search area.

https://github.com/vtsuperdarn/clustering_superdarn_data/blob/master/plotters/grid_based_dbscan.ipynb

GridBasedDBSCAN + GMM

Applies GBDBSCAN on the space-time features, then applies GMM to separate clusters based on velocity and width. It takes 5-10 minutes. It is not yet clear whether it is any better than DBSCAN + GMM.

https://github.com/vtsuperdarn/clustering_superdarn_data/blob/master/plotters/grid_based_dbscan_gmm.ipynb

Classification thresholds

Blanchard paper

This is the 'traditional' point-by-point classification method, developed in Blanchard 2009 for high-latitude radars. We apply it to the median values of a cluster instead of to one point at a time.

|vel| < 33.1 + 0.139 * |width| - 0.00133 * |width|^2
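A minimal sketch of applying this inequality to the median values of each cluster follows. It assumes the usual convention that scatter satisfying the inequality is flagged as ground scatter. The function names and the sample values are hypothetical.

```python
import numpy as np

def blanchard_paper_is_gs(vel, width):
    """True where |vel| < 33.1 + 0.139 * |width| - 0.00133 * |width|^2."""
    vel = np.abs(vel)
    width = np.abs(width)
    return vel < 33.1 + 0.139 * width - 0.00133 * width ** 2

def classify_clusters(vel, width, labels):
    """Apply the threshold to each cluster's median velocity and median width."""
    result = {}
    for c in np.unique(labels):
        if c == -1:  # skip noise points, if the clustering step produced any
            continue
        mask = labels == c
        result[int(c)] = bool(
            blanchard_paper_is_gs(np.median(vel[mask]), np.median(width[mask]))
        )
    return result

# Two small made-up clusters: one slow and narrow, one fast and wide.
vel = np.array([5.0, -10.0, 8.0, 300.0, 350.0, 280.0])
width = np.array([20.0, 25.0, 15.0, 150.0, 160.0, 140.0])
labels = np.array([0, 0, 0, 1, 1, 1])

result = classify_clusters(vel, width, labels)
print(result)  # {0: True, 1: False}
```

Using medians means every point in a cluster receives the same class, so a few outlying points do not flip the classification of the whole cluster.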

Blanchard code

This is the classification threshold used in the RST library. It is credited there to Blanchard et al., but we don't know why it is used instead of the formula from the paper.

|vel| < 30 - 1/3 * |width|
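The linear threshold is simple to write down. The sketch below again assumes that scatter satisfying the inequality is flagged as ground scatter; the function name is hypothetical and this is not the RST code itself.

```python
def blanchard_code_is_gs(vel, width):
    """True if |vel| < 30 - |width| / 3."""
    return abs(vel) < 30.0 - abs(width) / 3.0

narrow = blanchard_code_is_gs(10.0, 30.0)  # bound is 20 m/s, so True
wide = blanchard_code_is_gs(10.0, 75.0)    # bound is 5 m/s, so False
edge = blanchard_code_is_gs(0.0, 90.0)     # bound is 0 m/s, so False
print(narrow, wide, edge)
```

The bound reaches zero at a width of 90, so under this formula nothing with a spectral width of 90 or more can satisfy the inequality, whatever its velocity.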

Ribeiro

This classification method was developed for mid-latitude radars and applied to clusters created using a depth-first search over space and time [Ribeiro 2011]. Clusters are classified based on their time duration (L, hours) and the ratio (R) of high-velocity to low-velocity scatter points in the cluster. See Ribeiro 2011 Figure 4 for the full flowchart.

Setup instructions

This project was written in Python 3.5 and Python 3.6 on Ubuntu 16.04 and Ubuntu 18.04.

Ubuntu setup:

Make sure Python 3 and the Python 3 tkinter package are installed. tkinter is required for matplotlib.

sudo apt-get install python3-tk

Install the dependencies using pip (if Python 2 is your default, make sure to use the pip3 command):

pip install -r requirements.txt

If that doesn't work, install the libraries manually using pip (sklearn is published on PyPI as scikit-learn):

pip install matplotlib scipy numpy scikit-learn pillow jupyter

Now you can run the files using Python 3.

Windows setup

Not tested. Anaconda may be useful for Windows setup, as it contains many of the packages we use.

Contributors

e-271, muhammadvt
