
Modern Hopfield Networks and Attention for Immune Repertoire Classification

Michael Widrich (1), Bernhard Schäfl (1), Milena Pavlović (3, 4), Hubert Ramsauer (1), Lukas Gruber (1), Markus Holzleitner (1), Johannes Brandstetter (1), Geir Kjetil Sandve (4), Victor Greiff (3), Sepp Hochreiter (1, 2), Günter Klambauer (1)

(1) ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
(2) Institute of Advanced Research in Artificial Intelligence (IARAI)
(3) Department of Immunology, University of Oslo, Oslo, Norway
(4) Department of Informatics, University of Oslo, Oslo, Norway

Paper: https://arxiv.org/abs/2007.13505

Quickstart

conda

Conda setup:

conda env create -f conda_install.yml --name deeprc_env
conda activate deeprc_env

pip

Alternatively, you can install DeepRC via pip:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install git+https://github.com/ml-jku/DeepRC

To update your installation with dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --upgrade git+https://github.com/ml-jku/DeepRC

To update your installation without dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --no-dependencies --upgrade git+https://github.com/ml-jku/DeepRC
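
After installing or updating, it can be useful to confirm that the packages are visible to the Python interpreter you intend to use. The following is a minimal sketch using only the standard library. The import names `widis_lstm_tools` and `torch` are assumptions (only `deeprc` appears in this README), and `check_installed` is a hypothetical helper, not part of DeepRC.

```python
from importlib.util import find_spec


def check_installed(packages=("deeprc", "widis_lstm_tools", "torch")):
    """Report which top-level packages can be found, without importing them.

    The default package names are assumptions; adjust them to your setup.
    """
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, the environment that ran the script is probably not the one you installed into (for example, the conda environment was not activated).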

Usage

Training DeepRC on pre-defined datasets

You can train a DeepRC model on the pre-defined datasets of the DeepRC paper using one of the Python files in folder deeprc/examples. The datasets will be downloaded automatically.

You can run tensorboard --logdir [results_directory] --port=6060 and open http://localhost:6060/ in your web browser to monitor the performance during training.

Real-world data with implanted signals

This category has the smallest dataset files and is a good starting point. Training a binary DeepRC classifier on dataset "0" of category "real-world data with implanted signals":

python3 -m deeprc.examples.simple_cmv_with_implanted_signals 0 --n_updates 10000 --evaluate_at 2000

To get more information, you can use the help function:

python3 -m deeprc.examples.simple_cmv_with_implanted_signals -h
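
If you want to train on several datasets of this category in a row, the command above can be assembled from a small script. This is a sketch only: `build_training_command` and `run_trainings` are hypothetical helpers, and dataset IDs other than 0 are an assumption about what the category offers (check the help output for the valid range). Only the module name and the `--n_updates` and `--evaluate_at` flags are taken from the command shown above.

```python
import subprocess
import sys

# Module name taken from the example command in this README.
EXAMPLE_MODULE = "deeprc.examples.simple_cmv_with_implanted_signals"


def build_training_command(dataset_id, n_updates=None, evaluate_at=None,
                           module=EXAMPLE_MODULE):
    """Build the argument list for one training run of an example module."""
    command = [sys.executable, "-m", module, str(dataset_id)]
    if n_updates is not None:
        command += ["--n_updates", str(n_updates)]
    if evaluate_at is not None:
        command += ["--evaluate_at", str(evaluate_at)]
    return command


def run_trainings(dataset_ids, dry_run=True, **options):
    """Print (dry_run=True) or execute the training command per dataset ID."""
    commands = [build_training_command(i, **options) for i in dataset_ids]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # stops at the first failure
    return commands


if __name__ == "__main__":
    # Dataset IDs 1 and 2 are assumed to exist; this only prints the commands.
    run_trainings((0, 1, 2), dry_run=True, n_updates=10000, evaluate_at=2000)
```

Using `sys.executable` instead of a hard-coded `python3` keeps the runs inside the currently active environment.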

LSTM-generated data

Training a binary DeepRC classifier on dataset "0" of category "LSTM-generated data":

python3 -m deeprc.examples.simple_lstm_generated 0

Simulated immunosequencing data

Training a binary DeepRC classifier on dataset "0" of category "simulated immunosequencing data":

python3 -m deeprc.examples.simple_simulated 0

Warning: The download size is ~20GB per dataset!

Real-world data

Training a binary DeepRC classifier on dataset "real-world data":

python3 -m deeprc.examples.simple_cmv

Training DeepRC on a custom dataset

You can train DeepRC on custom text-based datasets, which will be automatically converted to hdf5 containers. Specifications of the supported formats are given here: deeprc/datasets/README.md

from deeprc.deeprc_binary.dataset_readers import make_dataloaders
from deeprc.deeprc_binary.architectures import DeepRC
from deeprc.deeprc_binary.training import train, evaluate

# Let's assume this is your dataset metadata file
metadatafile = 'custom_dataset/metadata.tsv'

# Get data loaders from text-based dataset (see `deeprc/datasets/README.md` for format)
trainingset, trainingset_eval, validationset_eval, testset_eval = make_dataloaders(
    metadatafile, target_label='status', true_class_label_value='+', id_column='ID', 
    single_class_label_columns=('status',), sequence_column='amino_acid',
    sequence_counts_column='templates', column_sep='\t', filename_extension='.tsv')

# Train a DeepRC model
model = DeepRC(n_input_features=23, n_output_features=1, max_seq_len=30)
train(model, trainingset_dataloader=trainingset, trainingset_eval_dataloader=trainingset_eval,
      validationset_eval_dataloader=validationset_eval, results_directory='results')

# Evaluate on test set
roc_auc, bacc, f1, scoring_loss = evaluate(model=model, dataloader=testset_eval)

print(f"Test scores:\nroc_auc: {roc_auc:6.4f}; bacc: {bacc:6.4f}; f1:{f1:6.4f}; scoring_loss: {scoring_loss:6.4f}")
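
To try the code above without real data, you could first generate a tiny synthetic dataset whose column names match the arguments passed to make_dataloaders(). The sketch below is based on assumptions, not on the format specification: the `repertoires` subdirectory, the use of the file name (without extension) as the `ID` value, and `-` as the negative label are all hypothetical choices, and `write_toy_dataset` is not part of DeepRC. `deeprc/datasets/README.md` remains the authoritative description of the layout.

```python
import csv
import random
import tempfile
from pathlib import Path

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def write_toy_dataset(root, n_repertoires=4, n_sequences=5, seed=0):
    """Write a metadata file plus one random repertoire file per sample."""
    rng = random.Random(seed)
    root = Path(root)
    repertoire_dir = root / "repertoires"  # assumed location of repertoire files
    repertoire_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = root / "metadata.tsv"

    with open(metadata_path, "w", newline="") as metadata_file:
        metadata = csv.writer(metadata_file, delimiter="\t", lineterminator="\n")
        metadata.writerow(["ID", "status"])
        for index in range(n_repertoires):
            repertoire_id = f"rep_{index:03d}"
            # '+' matches true_class_label_value; '-' is an assumed negative label
            metadata.writerow([repertoire_id, "+" if index % 2 == 0 else "-"])

            repertoire_path = repertoire_dir / f"{repertoire_id}.tsv"
            with open(repertoire_path, "w", newline="") as repertoire_file:
                repertoire = csv.writer(repertoire_file, delimiter="\t",
                                        lineterminator="\n")
                repertoire.writerow(["amino_acid", "templates"])
                for _ in range(n_sequences):
                    length = rng.randint(8, 20)  # stays below max_seq_len=30
                    sequence = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                    repertoire.writerow([sequence, rng.randint(1, 10)])
    return metadata_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_toy_dataset(tmp).read_text())
```

Random sequences carry no signal, so a model trained on such data is only useful for checking that the pipeline runs end to end.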

Note that make_dataloaders() automatically creates an hdf5 container of your dataset. Later, you can load this hdf5 container directly instead of the text-based dataset:

from deeprc.deeprc_binary.dataset_readers import make_dataloaders
# Get data loaders from hdf5 container
trainingset, trainingset_eval, validationset_eval, testset_eval = make_dataloaders('dataset.hdf5')
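
One way to combine both variants is to prefer the hdf5 container when it already exists and fall back to the text-based dataset otherwise. The sketch below only selects the arguments; `dataloader_arguments` is a hypothetical helper, and the location of the generated container is an assumption you need to adapt, since this README does not state where make_dataloaders() writes it. The keyword arguments are copied from the example above.

```python
from pathlib import Path

# Keyword arguments for the text-based dataset, copied from the example above.
TEXT_DATASET_OPTIONS = dict(
    target_label="status", true_class_label_value="+", id_column="ID",
    single_class_label_columns=("status",), sequence_column="amino_acid",
    sequence_counts_column="templates", column_sep="\t", filename_extension=".tsv")


def dataloader_arguments(metadatafile, hdf5_file):
    """Return (path, kwargs) for make_dataloaders, preferring an existing container."""
    hdf5_file = Path(hdf5_file)
    if hdf5_file.is_file():
        # Container already exists: no text-format options are needed.
        return str(hdf5_file), {}
    # No container yet: read the text-based dataset with its column options.
    return str(metadatafile), dict(TEXT_DATASET_OPTIONS)


# Intended use (requires DeepRC to be installed):
#   from deeprc.deeprc_binary.dataset_readers import make_dataloaders
#   path, kwargs = dataloader_arguments('custom_dataset/metadata.tsv', 'dataset.hdf5')
#   trainingset, trainingset_eval, validationset_eval, testset_eval = \
#       make_dataloaders(path, **kwargs)
```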

You can run tensorboard --logdir [results_directory] --port=6060 and open http://localhost:6060/ in your web browser to monitor the performance during training.

Structure

deeprc
      |--datasets : stores datasets
      |   |--README.md : Information on supported dataset formats
      |--deeprc_binary : DeepRC implementation for binary classification
      |   |--architectures.py : DeepRC network architecture
      |   |--dataset_converters.py : Converter for text-based datasets
      |   |--dataset_readers.py : Tools for reading datasets
      |   |--predefined_datasets.py : Pre-defined datasets from paper
      |   |--training.py : Tools for training DeepRC model
      |--examples : DeepRC examples

Note

We are currently cleaning up and uploading the code for the paper. Baseline methods, contribution analysis, LSTM embedding, and other features will follow soon.

Requirements
