
# Mutational antigenic profiling of the Delta SARS-CoV-2 RBD

The effects of mutations on antibody and serum binding to the RBD from the Delta (B.1.617.2) SARS-CoV-2 variant.

Study and analysis by Allie Greaney, Tyler Starr, Jesse Bloom, and co-authors.

## Summary of workflow and results

For a summary of the workflow and links to key results files, click here. Reading this summary is the best way to understand the analysis.

## Running the analysis

The analysis consists of three components, all of which are contained in this repository:

  1. Instructions to build the computing environment.

  2. The computer code itself.

  3. The required input data.

### Configure .git to not track Jupyter notebook metadata

To simplify git tracking of Jupyter notebooks, we have added the filter described here, which strips notebook metadata, to .gitattributes and .gitconfig. The first time you check out this repo, run the following command to enable this configuration (see here):

```bash
git config --local include.path ../.gitconfig
```

Then don't worry about it anymore.
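As a quick sanity check that the include took effect, you can query the setting back with `git config --get`. The sketch below uses a throwaway repository so it is self-contained; in the real repo you would just run the query in the checkout itself.

```shell
# Sanity check (illustrative, in a throwaway repository): set the
# include.path as above, then read it back with --get.
tmpdir=$(mktemp -d)
git init -q "$tmpdir/demo"
cd "$tmpdir/demo"
git config --local include.path ../.gitconfig
val=$(git config --get include.path)
echo "$val"   # prints ../.gitconfig
```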

### Build the computing environment

First, set up the computing environment. This is done partly via mamba, essentially a faster drop-in replacement for conda. Ensure you have mamba installed; if not, install it via Miniconda as described here. If you have not previously built the conda environment, build the environment specified in environment.yml into ./env with:

```bash
mamba env create -f environment.yml -p ./env
```

After building the environment, activate it with:

```bash
conda activate ./env
```

Setting up the conda environment above installs everything needed to run all parts of the analysis except the R Markdown notebooks. For those, the pipeline currently uses the Fred Hutch computing cluster module R/3.6.2-foss-2019b, as specified in Snakefile. That module is not packaged with this repo, so if you aren't on the Fred Hutch cluster you'll have to create a similar R environment yourself (all the R packages are listed at the beginning of their output in the summary results).

### Input data

The input data are specified in ./data/; see the README in that subdirectory for more details.

### Running the code

Now you can run the entire analysis. The analysis consists primarily of a series of Jupyter notebooks and R Markdown scripts in the top-level directory, along with some additional code in Snakefile. You can run the analysis by using Snakemake to run Snakefile:

```bash
snakemake --use-conda --conda-prefix ./env
```

However, you probably want to use a cluster for the computationally intensive parts of the analysis. To run with the Fred Hutch cluster configuration, simply run the bash script run_Hutch_cluster.bash, which executes Snakefile in a way that takes advantage of the Hutch cluster resources. This script also automates the environment-building steps above, so really all you have to do is run it. Since it takes a while to run, you likely want to submit run_Hutch_cluster.bash itself to the cluster with:

```bash
sbatch -t 7-0 run_Hutch_cluster.bash
```

### Configuring the analysis

The configuration for the analysis is specified in config.yaml. This file defines key variables for the analysis, and should be relatively self-explanatory. You should modify the analysis by changing this configuration file; do not hard-code crucial experiment-specific variables within the Jupyter notebooks or Snakefile.

The input files pointed to by config.yaml are in the ./data/ subdirectory. See the ./data/README.md file for details.

Note that the raw sequencing data are on the SRA in BioProject PRJNA639956 as well as on the Hutch cluster.

### Cluster configuration

There is a cluster configuration file cluster.yaml that configures Snakefile for the Fred Hutch cluster, as recommended by the Snakemake documentation. The run_Hutch_cluster.bash script uses this configuration to run Snakefile. If you are using a different cluster than the Fred Hutch one, you may need to modify the cluster configuration file.
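The contents of the repo's cluster.yaml are not reproduced here, but Snakemake cluster-config files of this style typically map a `__default__` entry (plus optional per-rule overrides) to scheduler parameters. The keys and values below are hypothetical, purely to illustrate the shape you would adapt for a different cluster:

```yaml
# Hypothetical sketch of a Snakemake cluster-config file; the actual
# cluster.yaml in this repo may use different keys and values.
__default__:
  partition: campus-new
  time: "1-0"
  mem: 16G
```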

### Notebooks that perform the analysis

The Jupyter notebooks and R markdown scripts that perform most of the analysis are in this top-level directory with the extension *.ipynb or *.Rmd. These notebooks read the key configuration values from config.yaml.

There is also a ./scripts/ subdirectory with related scripts.

The notebooks need to be run in the order described in the workflow and results summary. This will occur automatically if you run them via Snakefile as described above.

### Results

Results are placed in the ./results/ subdirectory. Many of the files created in this subdirectory are not tracked in the git repo as they are very large. However, key results files are tracked as well as a summary that shows the code and results. Click here to see that summary.

The large results files are tracked via git-lfs. This requires git-lfs to be installed, which it is in the conda environment specified by environment.yml. git-lfs was then initialized with:

```bash
git lfs install
```

You may need to run this if you are tracking these files and haven't installed git-lfs in your user account. Then the large results files were added for tracking with:

```bash
git lfs track <FILENAME>
```
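For illustration only (the filename below is hypothetical, not an actual file in this repo), each `git lfs track` call appends a pattern line like this to .gitattributes, which is how git knows to route those files through LFS:

```
results/variant_counts.csv.gz filter=lfs diff=lfs merge=lfs -text
```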

