DeepSeti - Breakthrough Listen Deep Learning Search Tool

This is a Python implementation of DeepSeti - an algorithm designed to detect anomalies in radio telescope data open sourced by Breakthrough Listen. These Python scripts provide the custom architecture and training loops that the DeepSeti algorithm needs to perform a multichannel search for anomalies. The main objective is to develop software that increases the computational sensitivity and speed of the search for 'unknown' anomalies. NOTE: Currently this code only works for MID-RES filterbank and h5 files. Further development will involve running tests on full-resolution products from GBT.

Introduction

The purpose of this algorithm is to help detect anomalies within the GBT dataset from Breakthrough Listen. The code demonstrates a potential method for accelerating ML-based SETI searches in large unlabeled datasets. This approach extends the original paper [https://arxiv.org/pdf/1901.04636.pdf] by performing the final classification on the encoded feature vector, using a triplet loss between an anchor and positive or negative samples.
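To make the triplet idea concrete, here is a minimal sketch of the common triplet margin loss on toy vectors. It is an illustration only: the function names, the margin value, and the use of plain Euclidean distance are assumptions, not the exact loss implemented in this repository.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Common triplet margin loss: max(d(a, p) - d(a, n) + margin, 0).

    The loss is zero once the negative sample is at least `margin`
    farther from the anchor than the positive sample is.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Toy encoded vectors (made-up values, not real telescope data).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # close to the anchor
negative = [3.0, 4.0]  # far from the anchor

loss = triplet_loss(anchor, positive, negative, margin=1.0)
print(loss)
```

In a real training loop the vectors would be encoder outputs and the loss would be computed with a tensor library so that it can be differentiated.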

Deep Learning Architecture

What makes this algorithm unique is that it injects an encoder, previously trained on a classification dataset, into an autoencoder trained through unsupervised techniques. This method relies on a small initial labeled dataset: a CNN-LSTM classifier is trained on it intermittently, and its encoder is then injected into the CNN-LSTM autoencoder.

Rationale: The goal is to steer the CNNs' feature selection toward the desired labels, while the unsupervised method gives the model the “freedom” to familiarize itself with "normal data" and detect novel anomalies beyond the small labeled dataset. The supervised and unsupervised models are run together, and model injections occur intermittently.
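The injection step can be pictured as copying the classifier's encoder weights over the autoencoder's encoder weights while leaving the decoder alone. The sketch below is a toy illustration under loud assumptions: models are represented as plain dictionaries of weight lists, and the layer names and the `encoder/` prefix are hypothetical. The actual project works on Keras models, whose weight-copying calls are not shown here.

```python
import copy


def inject_encoder(classifier_weights, autoencoder_weights, prefix="encoder/"):
    """Return a copy of the autoencoder weights whose encoder layers are
    overwritten with the classifier's weights for the same layer names.

    Layers that do not start with `prefix` (for example the classifier
    head or the decoder) are left untouched.
    """
    injected = copy.deepcopy(autoencoder_weights)
    for name, value in classifier_weights.items():
        if not name.startswith(prefix):
            continue  # skip the classification head
        if name not in injected:
            raise KeyError(f"autoencoder has no layer named {name!r}")
        injected[name] = copy.deepcopy(value)
    return injected


# Hypothetical layer names and made-up weights.
classifier = {
    "encoder/conv1": [0.5, -0.2],
    "encoder/lstm": [0.1],
    "head/dense": [1.0],
}
autoencoder = {
    "encoder/conv1": [0.0, 0.0],
    "encoder/lstm": [0.0],
    "decoder/deconv1": [0.3],
}

autoencoder = inject_encoder(classifier, autoencoder)
print(autoencoder)
```

Repeating this copy between rounds of supervised and unsupervised training corresponds to the intermittent injections described above.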

Reference diagram below

Preliminary - Results

In tests on new data, the algorithm generalized to a variety of use cases. The image below shows how sensitive it is to small, weak beams across multiple channels. Despite never being trained on the sinusoidal beam on the left [sample A], it detected the anomaly in the dataset. This shows promise for the algorithm's intended use case.

The example detection is made by taking the MSE / Euclidean distance between two vectors: an anchor vector and an unknown vector. The algorithm treats spikes in this distance as anomalies, which gives us these two detections.
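The distance-based detection described above can be sketched as follows: score each candidate vector by its MSE against the anchor vector and keep the highest-scoring ones. The function names, the parameter `k`, and the toy vectors are assumptions for illustration; this is not the repository's `prediction` code.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def top_hits(anchor, candidates, k=1):
    """Return the k (index, score) pairs with the largest MSE to the anchor."""
    scores = [(i, mse(anchor, vec)) for i, vec in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy encoded vectors (made-up values, not real telescope data).
anchor = [0.0, 0.0, 0.0]
candidates = [
    [0.1, 0.0, 0.1],  # close to the anchor: looks "normal"
    [2.0, 2.0, 2.0],  # far from the anchor: a spike in the distance
    [0.0, 0.2, 0.0],
]

hits = top_hits(anchor, candidates, k=1)
print(hits)
```

A larger distance from the anchor means the candidate looks less like the "normal" reference observation, so ranking by distance surfaces the most unusual samples first.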

Round 1 - 2 Terabytes Search @BreakthroughListen April 08 2020

In the first round of execution, the algorithm searched through the first 2 terabytes of Breakthrough Listen data for signs of "intelligence". Over the 20-hour compute time, this signal was perhaps the strangest among its finds! Further analysis is needed, but the search is promising.

If you are an astronomer and would like to see the results of the first-round searches, check out the folder titled first round! The complete CSV is also available.

Round 2 - 4 Terabytes Search - Coming soon!

Updates coming soon!!

How To Use The Algorithm

Some features are still under construction, but you can test the current capabilities of this algorithm by following the simple guide below. Note: This requires Blimpy and Setigen to operate properly. Install these requirements by running the following command in the terminal in your Python environment.

pip3 install -r requirements.txt

After getting this set up, download a radio observation from the UC Berkeley SETI open database [http://seti.berkeley.edu/opendata], or get a test sample with this command:

wget http://blpd13.ssl.berkeley.edu/dl/GBT_58402_66623_NGC5238_mid.h5

After that, all you need to do is clone the repository and navigate into the folder where it is cloned.

git clone https://github.com/PetchMa/DeepSeti.git

Once you're inside the cloned folder, copy the code block below into a new Python script. Fill in the missing directories, and you can train a model on your custom data. Note: you can also load a pretrained model called encoder_injection_model_cudda.h5, which has been trained on 500,000 radio samples. Keep in mind that this requires CUDA-capable devices and drivers. Without CUDA support, try the vanilla encoder_injection_model(1).h5.

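# Colab-only magic that selects TensorFlow 1.x; remove this line outside Google Colab.
%tensorflow_version 1.x
import tensorflow
from DeepSeti import DeepSeti

# file_download was undefined in the original snippet. As an assumption, it is
# set here to the test sample fetched with wget above; change it to your own file.
file_download = 'GBT_58402_66623_NGC5238_mid.h5'

# Use a distinct instance name so the DeepSeti class is not shadowed.
seti = DeepSeti()
seti.load_model_function(model_location='/content/encoder_injected_model_CUDA_04-13-2020.h5')
seti.load_anchor(anchor_location='/content/GBT_58402_66967_HIP66130_mid.h5')
print("Model Loaded... Executing search loop")
# Search the observation and write candidate images and arrays to the output folders.
search_return = seti.prediction(
    test_location='/content/' + file_download,
    top_hits=1,
    target_name=file_download,
    output_folder='/content/drive/My Drive/Deeplearning/SETI/output_folder/',
    numpy_folder='/content/drive/My Drive/Deeplearning/SETI/numpy_output_folder/',
)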

This example searches for the most confident candidates and saves images of them. You can check out an example notebook that loads and queries the database here: [https://github.com/PetchMa/DeepSeti/blob/master/Examples/DeepSeti_Engine.ipynb]

Next Steps

Next, this project will be scaled up to TPUs to allow training on larger datasets. The goal is to train the model on over 1 million radio samples! Follow updates on my Twitter: [https://twitter.com/peterma02]. Feel free to reach out to me by email at [email protected] if you have any questions or concerns.

Acknowledgements

Special thanks to the UC Berkeley SETI Research Team for their continual support. A list of their work is here: [https://github.com/UCBerkeleySETI]. Thanks also to Breakthrough Listen and the Green Bank Telescope for making the data openly accessible. More on them here: [https://breakthroughinitiatives.org/initiative/1]

deepseti's Issues

Implementing DeepSeti

Hey, I love this project! Would you be interested in seeing how we could collaborate to get this implemented in the data of our instruments? The data stream consists of FFT samples (a waterfall basically), but raw I/Q data can also be passed to the classifier if that's more convenient (probably not, looking at the structure of the repository). I've also sent you a mail (check Twitter too 🙂 ).
