License: MIT License


Astronomical Classification of Light Curves with an Ensemble of Gated Recurrent Units


Abstract

With an ever-increasing amount of astronomical data being collected, manual classification has become obsolete, and machine learning is the only way forward. Keeping this in mind, the Large Synoptic Survey Telescope (LSST) Team hosted the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) in 2018. The aim of this challenge was to develop ML models that accurately classify astronomical sources into different classes, scaling from a limited training set to a large test set. In this text, we report our results from experimenting with Bidirectional Gated Recurrent Unit (GRU) based models on the time-series data of PLAsTiCC. We demonstrate that GRUs are indeed suitable for handling time-series data. With minimal preprocessing and without augmentation, our stacked ensemble of GRU and Dense networks achieves an accuracy of 76.243%. Data from astronomical surveys such as LSST will help researchers answer questions pertaining to dark matter, dark energy and the origins of the universe; accurate classification of astronomical sources is the first step towards achieving this.

This project is part of a submission for the course DSE 301: AI, IISER Bhopal, Fall 2020.

Prerequisites

Python 3.6+, numpy, pandas, matplotlib, scikit-learn, keras, tensorflow 2+ and scikitplot are required. The repository's directory structure is as follows:

Astronomical-Classification-PLASTICC
├── Astronomical Classification Report.pdf
├── README.md	
├── LICENSE		
├── OnePageAbstract.pdf		
└── code/
    ├── *.py
    ├── *.h5
    ├── *.csv
    └── *.pickles

The PLAsTiCC dataset can be downloaded from [Kaggle](https://www.kaggle.com/c/PLAsTiCC-2018/data). It can be stored anywhere on the computer; you will be prompted for its location the first time you run the program.
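The location prompt might look like the following sketch; the helper name is hypothetical, not the repository's actual code:

```python
import os

def resolve_data_location(path: str) -> str:
    """Expand and normalise a user-supplied dataset directory so that
    filenames can simply be appended to it."""
    path = os.path.expanduser(path)
    if not path.endswith(os.sep):
        path += os.sep
    return path

# In the actual scripts the path would come from input() on first run:
# data_location = resolve_data_location(input("Enter PLAsTiCC data location: "))
```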

Order of running

Run the py files in the order:

1)      preprocessing.py
2a)     cross_val_2dsubm.py
2b)     cross_val_3dsubm.py
3a)     random_search_2dsubm.py
3b)     random_search_3dsubm.py
4)      create_ensemble.py
5)      create_submission.py
6)      evaluate.py

Description of py files

  1. preprocessing.py This preprocesses the input data as outlined in Section 3.2 of our report.

         Input: The light curve data and metadata made available by the PLAsTiCC team on [Kaggle](https://www.kaggle.com/c/PLAsTiCC-2018/data).
         
         Output: Preprocessed data files stored as pickle files:
             filename_3d_pickle: for the 3DSubM (3D Sub Model) data.
             filename_2d_pickle: for the 2DSubM (2D Sub Model) data.
             filename_label_pickle: for the true classes of each object.
             
         Note: While not made available originally in the competition, we also make use of the [unblinded PLAsTiCC dataset](https://zenodo.org/record/2539456) to obtain the true class of each object in the test dataset. This is then used to evaluate our performance in evaluate.py. No other data from the unblinded PLAsTiCC dataset is used.
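The pickle outputs can be read back with Python's standard pickle module. A minimal round-trip sketch (the filename and dictionary contents here are made up for illustration):

```python
import pickle

def save_pickle(obj, path):
    # Serialise any Python object to disk.
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    # Restore the object exactly as it was saved.
    with open(path, "rb") as f:
        return pickle.load(f)

save_pickle({"object_id": [615, 713], "true_target": [92, 88]}, "example_label_pickle")
labels = load_pickle("example_label_pickle")
```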
    
  2. a) cross_val_2dsubm.py This calculates the cross-validation accuracy for an elementary 2DSubM densely connected deep network, using the 2D data.

         Input:  The 2DSubM training data pickles created by preprocessing.py
         
         Output: Prints the cross-validation accuracy for the basic model.
    

    b) cross_val_3dsubm.py This calculates the cross-validation accuracy for an elementary 3DSubM deep network consisting of Bidirectional GRUs and Dense layers, using the 3D data.

         Input:  The 3DSubM training data pickles created by preprocessing.py
         
         Output: Prints the cross-validation accuracy for the basic model.
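The cross-validation loop used by both scripts can be sketched generically; this is a plain, unstratified k-fold average, not the repository's exact implementation:

```python
import numpy as np

def kfold_accuracy(X, y, train_and_score, k=5, seed=0):
    """Average held-out score over k folds.
    train_and_score(X_tr, y_tr, X_val, y_val) must return an accuracy."""
    rng = np.random.default_rng(seed)
    # Shuffle indices once, then split into k roughly equal folds.
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train], y[train], X[val], y[val]))
    return float(np.mean(scores))
```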
    
  3. a) random_search_2dsubm.py This performs a random search over the hyperparameter space to find the hyperparameters that maximise the validation accuracy of the 2D Sub Model, 2DSubM.

         Input:  The 2DSubM training data pickles created by preprocessing.py
         
         Output: The top 20 2DSubM models from the random search are saved in the form of h5 files.
    

    b) random_search_3dsubm.py This performs a random search over the hyperparameter space to find the hyperparameters that maximise the validation accuracy of the 3D Sub Model, 3DSubM.

         Input:  The 3DSubM training data pickles created by preprocessing.py
         
         Output: The top 20 3DSubM models from the random search are saved in the form of h5 files.
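The random-search pattern both scripts follow can be sketched generically; the search space and scoring function below are placeholders, not the ones used in the repository:

```python
import random

def random_search(score_fn, space, n_trials=20, top_k=2, seed=0):
    """Sample n_trials hyperparameter combinations from `space`
    (a dict of name -> list of candidate values) and return the
    top_k highest-scoring (score, params) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        results.append((score_fn(params), params))
    # Sort by score only, so tied scores never compare the params dicts.
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]

# Placeholder space: in practice this would hold layer sizes, dropout rates, etc.
space = {"units": [32, 64, 128], "dropout": [0.1, 0.3, 0.5]}
best = random_search(lambda p: p["units"] - 100 * p["dropout"], space)
```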
    
  4. create_ensemble.py This creates an ensemble of the top 2 2DSubM models and top 2 3DSubM models. This is trained on the validation data.

         Input: The top 2 2DSubM h5, top 2 3DSubM h5 models, the 2DSubM training data pickles and the 3DSubM training data pickles created by preprocessing.py
         
         Output: Ensemble h5 file
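Stacking here means feeding the sub-models' predicted class probabilities, concatenated, into a final meta-network. The feature-building step can be sketched as follows (shapes and names are illustrative):

```python
import numpy as np

def stack_predictions(per_model_probs):
    """Concatenate each sub-model's (n_samples, n_classes) probability
    matrix column-wise into meta-features for the ensemble network."""
    return np.concatenate(per_model_probs, axis=1)

# e.g. two 2DSubM and two 3DSubM outputs for 5 samples and 14 classes
probs = [np.full((5, 14), 1 / 14) for _ in range(4)]
meta_features = stack_predictions(probs)
```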
    
  5. create_submission.py This creates a submission CSV file in the format specified by the Kaggle team, which can then be submitted to the Kaggle competition.

         Input:  The ensemble h5 file, the 2DSubM training data pickles and the 3DSubM training data pickles created by preprocessing.py
         
         Output: Submission CSV file.
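Writing per-object class probabilities to CSV can be sketched with the standard library; the column names below are illustrative, so consult Kaggle's sample submission for the exact header:

```python
import csv

def write_submission(path, object_ids, probabilities, class_columns):
    """Write one row of class probabilities per object."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["object_id"] + class_columns)
        for oid, probs in zip(object_ids, probabilities):
            writer.writerow([oid] + list(probs))

# Hypothetical two-class example.
write_submission("submission_example.csv", [13], [[0.9, 0.1]], ["class_6", "class_15"])
```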

  6. evaluate.py This evaluates the model against the test data pickles created by preprocessing.py, and calculates evaluation metrics by using the true classes provided in the unblinded PLAsTiCC dataset.

         Input: The test data pickles created by preprocessing.py
         
         Output: Prints the accuracy and other evaluation metrics for the ensemble model.
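The core metric is plain classification accuracy; a minimal version is below (the repository also reports further metrics via scikit-learn/scikitplot):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true class labels."""
    assert len(y_true) == len(y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```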
    

Authors

Siddharth Chaini, Soumya Sanjay Kumar

IISER Bhopal


astronomical-classification-plasticc's Issues

SyntaxError: invalid syntax in preprocessing.py

I'm trying to run the preprocessing.py code.
However, when I try, I get:

    python preprocessing.py
    File "preprocessing.py", line 20
    if (os.path.isfile(fr"{pickle_location}{mainfilename}_3d_pickle")
    ^
    SyntaxError: invalid syntax
