Learning the language of viral evolution and escape

This repository contains the analysis code, links to the data, and pretrained models for the paper "Learning the language of viral evolution and escape" by Brian Hie, Ellen Zhong, Bonnie Berger, and Bryan Bryson (2021).

Data

You can download the relevant datasets (including training and validation data) using the commands

wget http://cb.csail.mit.edu/cb/viral-mutation/data.tar.gz
tar xvf data.tar.gz

within the same directory as this repository.

Dependencies

The major Python package requirements and their tested versions are in requirements.txt.

Our experiments were run with Python version 3.7 on Ubuntu 18.04.
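
Because results were only tested under Python 3.7, it can help to confirm the interpreter version before running anything (for example, after installing the packages with something like `python -m pip install -r requirements.txt`). The following is a minimal sketch, not part of the repository; the helper name `matches_tested_python` is hypothetical.

```python
import sys

# Python version this README reports as the tested configuration.
TESTED_PYTHON = (3, 7)


def matches_tested_python(version_info=None):
    """Return True if the major.minor version equals the tested one.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == TESTED_PYTHON


if not matches_tested_python():
    # Only a warning: other versions may work but were not tested.
    print(
        "Warning: running Python {}.{}, but the experiments were tested with {}.{}".format(
            sys.version_info[0], sys.version_info[1], *TESTED_PYTHON
        )
    )
```

A mismatch does not necessarily mean the code will fail; it only signals that you are outside the tested configuration.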

Experiments

Key results from our experiments can be found in the results/ directory and can be reproduced with the commands below. The models/ directory contains key pretrained models used in our analyses.

To run the experiments below, first download the data (see the instructions above). Our experiments require at most 400 GB of CPU RAM and 32 GB of GPU RAM, and often much less. In silico fitness and escape model inference can take around 35 minutes for influenza HA, 90 minutes for HIV Env, and 10 hours for SARS-CoV-2 Spike.
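
Since some of these runs take hours, a quick check that the expected files are in place can save a failed start. The sketch below tests for paths named in this README. It assumes the working directory is the repository root and that `data.tar.gz` unpacks into a `data` directory, which this README does not state; `missing_paths` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Paths named in this README. "data" is an ASSUMPTION: the README does not
# say what data.tar.gz unpacks to, so adjust it if your layout differs.
EXPECTED_PATHS = (
    "bin/flu.py",
    "bin/hiv.py",
    "bin/cov.py",
    "models/flu.hdf5",
    "models/hiv.hdf5",
    "models/cov.hdf5",
    "requirements.txt",
    "data",
)


def missing_paths(repo_root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing before running experiments:", ", ".join(missing))
    else:
        print("All expected paths are present.")
```

The check only looks at file and directory names; it does not verify that the downloaded data or the checkpoints are complete.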

Influenza HA

Influenza HA semantic embedding UMAPs and log files with statistics can be generated with the command

python bin/flu.py bilstm --checkpoint models/flu.hdf5 --embed \
    > flu_embed.log 2>&1

Single-residue escape prediction using validation data from Doud et al. (2018) and Lee et al. (2019) can be done with the command

python bin/flu.py bilstm --checkpoint models/flu.hdf5 --semantics \
    > flu_semantics.log 2>&1

Combinatorial fitness experiments measuring correlation with grammaticality and semantic change using data from Doud and Bloom (2016) and from Wu et al. (2020) can be done with the command

python bin/flu.py bilstm --checkpoint models/flu.hdf5 --combfit \
    > flu_combfit.log 2>&1

Training a new model on flu HA sequences can be done with the command

python bin/flu.py bilstm --train --test \
    > flu_train.log 2>&1

HIV Env

HIV Env semantic embedding UMAPs and log files with statistics can be generated with the command

python bin/hiv.py bilstm --checkpoint models/hiv.hdf5 --embed \
    > hiv_embed.log 2>&1

Single-residue escape prediction using validation data from Dingens et al. (2019) can be done with the command

python bin/hiv.py bilstm --checkpoint models/hiv.hdf5 --semantics \
    > hiv_semantics.log 2>&1

Combinatorial fitness experiments measuring correlation with grammaticality and semantic change using data from Haddox et al. (2018) can be done with the command

python bin/hiv.py bilstm --checkpoint models/hiv.hdf5 --combfit \
    > hiv_combfit.log 2>&1

Training a new model on HIV Env sequences can be done with the command

python bin/hiv.py bilstm --train --test \
    > hiv_train.log 2>&1

SARS-CoV-2 Spike

Coronavirus spike semantic embedding UMAPs and log files with statistics can be generated with the command

python bin/cov.py bilstm --checkpoint models/cov.hdf5 --embed \
    > cov_embed.log 2>&1

Single-residue escape prediction using validation data from Baum et al. (2020) and DMS data from Greaney et al. (2020) can be done with the command

python bin/cov.py bilstm --checkpoint models/cov.hdf5 --semantics \
    > cov_semantics.log 2>&1

Combinatorial fitness experiments measuring correlation with grammaticality and semantic change using data from Starr et al. (2020) can be done with the command

python bin/cov.py bilstm --checkpoint models/cov.hdf5 --combfit \
    > cov_combfit.log 2>&1

Experiments measuring grammaticality and semantic change of a SARS-CoV-2 reinfection event documented by To et al. (2020) can be done with the command

python bin/cov.py bilstm --checkpoint models/cov.hdf5 --reinfection \
    > cov_reinfection.log 2>&1

Training a new model on coronavirus spike sequences can be done with the command

python bin/cov.py bilstm --train --test \
    > cov_train.log 2>&1
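
The pretrained-model commands above all follow one pattern: a per-virus script, the matching checkpoint, one analysis flag, and a log file named after both. The sketch below builds those commands from that pattern, using only the script names and flags shown in this README. The function names `build_command` and `run_analysis` are hypothetical, and the driver defaults to a dry run that only prints each command.

```python
import subprocess

# Script name prefix for each virus, as used in the commands above.
VIRUSES = ("flu", "hiv", "cov")
# Analysis flags shown for all three scripts; --reinfection is shown only for cov.
SHARED_MODES = ("embed", "semantics", "combfit")


def build_command(virus, mode):
    """Return (argv, log_name) for one pretrained-model analysis."""
    if virus not in VIRUSES:
        raise ValueError("unknown virus: {}".format(virus))
    allowed = SHARED_MODES + (("reinfection",) if virus == "cov" else ())
    if mode not in allowed:
        raise ValueError("mode {} is not listed for {}".format(mode, virus))
    argv = [
        "python", "bin/{}.py".format(virus), "bilstm",
        "--checkpoint", "models/{}.hdf5".format(virus),
        "--{}".format(mode),
    ]
    return argv, "{}_{}.log".format(virus, mode)


def run_analysis(virus, mode, dry_run=True):
    """Print the command; unless dry_run, run it and write output to the log."""
    argv, log_name = build_command(virus, mode)
    print(" ".join(argv), ">", log_name, "2>&1")
    if dry_run:
        return None
    with open(log_name, "w") as log:
        # stderr=STDOUT mirrors the shell redirection `2>&1`.
        return subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT).returncode


if __name__ == "__main__":
    for virus in VIRUSES:
        for mode in SHARED_MODES:
            run_analysis(virus, mode)  # dry run: prints only
    run_analysis("cov", "reinfection")
```

Passing `dry_run=False` runs the analyses one after another, so the total wall-clock time is roughly the sum of the per-virus estimates given earlier. The training commands (`--train --test`) are left out on purpose, since they do not take a checkpoint.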

Questions

For questions about the pipeline and code, contact [email protected]. We will do our best to provide support, address any issues, and keep improving this software. And do not hesitate to submit a pull request and contribute!
