mapmetasvsrefs

Mapping Oligotrophic Metagenomes to Reference Genomes

Sarah Stevens, Elizabeth McDaniel, Matthew Wolff

This pipeline calculates the coverage and average nucleotide identity (ANI) of each included metagenome against each reference genome.

Directory Structure:

| - metagenomes/ : directory to place (or link) all the metagenomes to include in analysis
| - refGenomes/ : directory to place (or link) all the reference genomes to include in analysis
| - scripts/ : directory that contains scripts for analysis
| - mappingResults/ : directory that holds the resulting mapping files (.bam)
| - runAll.sh : script that runs the pipeline
| - resetFiles.sh : script that removes intermediate files to reset the repo
| - setup.sh : sets up the starting directory structure
| - Readme.md : this helpful file

Requirements

  • Samtools
  • BBMap
  • Python 2.7
  • Python Modules:
    • multiprocessing
    • pandas

Setup

To set up the directory structure run

./setup.sh

Then:

  • place all the metagenomes (FASTA files) you want to map in metagenomes/
  • place all of the reference genomes you want to map to in refGenomes/
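Because every metagenome is mapped against every reference genome (the combinations recorded in mappingCombos.txt), pairing the inputs amounts to a Cartesian product. The sketch below illustrates that idea and is not the repo's own code: the helper names list_inputs and mapping_combos are hypothetical, and it assumes every regular file (or symlink to one) in each directory is an input.

```python
# Hypothetical sketch: pair every metagenome with every reference genome.
# Assumption: each regular file (or symlink to one) in a directory is an input.
import itertools
import os


def list_inputs(directory):
    """Return the sorted names of the regular files in `directory`.
    os.path.isfile follows symlinks, so linked files are included."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def mapping_combos(metagenomes, ref_genomes):
    """Cartesian product: one (metagenome, reference) pair per mapping job."""
    return list(itertools.product(metagenomes, ref_genomes))
```

For example, mapping_combos(list_inputs("metagenomes"), list_inputs("refGenomes")) would yield one pair per mapping job, so 3 metagenomes and 4 reference genomes give 12 jobs.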

Running all mapping

Before starting, you may need to open runAll.sh and set the bbpath variable to the location of the BBMap software (relative to this repo).
By default, it assumes the bbmap directory is one level above this repo and that bbmap.sh is inside that directory.
Run the whole analysis with the following command:

./runAll.sh threads memlimit

Arguments (parsing is very naive; positional arguments only):

  • threads = number of threads to use (default=10)
  • memlimit = java memory limit for each mapping job (default=4g)
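The positional-only parsing can be illustrated with the Python sketch below. It is a hypothetical helper (runAll.sh itself is a shell script): argv[1] is threads with a default of 10, argv[2] is memlimit with a default of "4g", and memlimit is turned into the Java heap flag passed to each mapping job.

```python
# Hypothetical sketch of the naive positional parsing described above.
# argv[1] = threads (default 10), argv[2] = memlimit (default "4g").


def parse_run_args(argv):
    """Return (threads, memlimit), falling back to the documented defaults."""
    threads = int(argv[1]) if len(argv) > 1 else 10
    memlimit = argv[2] if len(argv) > 2 else "4g"
    return threads, memlimit


def java_mem_flag(memlimit):
    """Turn a memlimit such as '4g' into a JVM heap flag such as '-Xmx4g'."""
    return "-Xmx" + memlimit
```

Because the arguments are positional, memlimit cannot be set without also giving threads: parse_run_args(["runAll.sh", "20", "8g"]) returns (20, "8g"), while a lone "8g" would be read as the thread count and fail.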

To run in the background with timestamped log files:

ts=$(date +%Y%m%d_%H%M%S); nohup bash runAll.sh threads memlimit > ${ts}_nohup.log 2> ${ts}_nohup.err &

Example w/ 20 threads and 4g memory each:

ts=$(date +%Y%m%d_%H%M%S); nohup bash runAll.sh 20 4g > ${ts}_nohup.log 2> ${ts}_nohup.err &

Mapping default arguments (see bbmap for details):

  • idtag
  • minid=.8
  • threads=1 (WARNING: this does not seem to limit BBMap to 1 CPU. If you are on a shared resource, make sure no one else is using it at the same time.)
  • nodisk
  • -Xmx4g (unless changed with the runAll.sh memlimit argument)

To change these settings, edit the 'cmd=...' line in runMapping.py.

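The default mapping arguments can be pictured as a single bbmap.sh argument list, sketched below in Python. This is an assumption-laden illustration, not the actual 'cmd=...' line from runMapping.py: the function name build_bbmap_cmd and the ref=/in=/out= file paths are hypothetical placeholders; only the flags listed above come from this README.

```python
# Hypothetical sketch: one bbmap.sh invocation using the README's default flags.
# build_bbmap_cmd and the ref=/in=/out= paths are illustrative placeholders.
import os


def build_bbmap_cmd(bbpath, metagenome, ref_genome, out_bam, memlimit="4g"):
    """Return the argument list for mapping one metagenome to one reference."""
    return [
        os.path.join(bbpath, "bbmap.sh"),
        "ref=" + ref_genome,   # reference genome to map against
        "in=" + metagenome,    # metagenome reads
        "out=" + out_bam,      # mapping output (.bam)
        "idtag",               # tag each read with its percent identity
        "minid=.8",            # approximate minimum alignment identity
        "threads=1",           # see the WARNING above: may not cap CPU use
        "nodisk",              # keep the reference index in memory only
        "-Xmx" + memlimit,     # Java heap limit for this job
    ]
```

The list could then be passed to subprocess.call(); keeping it as a list avoids shell quoting problems with unusual file names.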
Output files

  • refGenomeList.txt - List of all the reference genomes runAll.sh last ran on

  • metagenomeList.txt - List of all the metagenomes runAll.sh last ran on

  • mappingCombos.txt - All of the combinations of mapping metagenomes to reference genomes that runAll.sh last ran on

  • mappingResults/ - directory that stores all of the mapping results files (.bam)

    • *.bam - all the output files from all the combinations of mapping metagenomes to reference genomes
  • *.depth - the resulting depth (for each base) for all of the combinations of mapping metagenomes to reference genomes

  • resultingPIDs.txt - All the lines from the *.bam (converted to sam) that contain the percent identity (PID) information

  • parsedPID.txt - All of the percent identity hits, with the file each came from (which metagenome vs. which reference genome)

  • coverage.txt - The number of reads that mapped from each metagenome to each reference genome and the average coverage of each base.
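How coverage and identity might be summarized from these outputs is sketched below in Python. The sketch rests on assumptions that are not taken from the repo's scripts: that the .depth files are `samtools depth` output (chrom, position, depth, tab-separated), that per-read percent identities are read from BBMap's YI:f: tag written by idtag, and that ANI is approximated as the mean per-read identity. All function names are hypothetical.

```python
# Hypothetical sketch. Assumptions (not from the repo's scripts):
#  * .depth lines are `samtools depth` output: chrom<TAB>pos<TAB>depth
#  * per-read identities come from BBMap's YI:f:<value> SAM tag (idtag)
#  * ANI is approximated as the mean per-read percent identity


def mean_coverage(depth_lines, genome_length=None):
    """Average depth per base. If genome_length is given, positions missing
    from the depth output (samtools depth omits zero-depth sites unless run
    with -a) count as zero coverage."""
    total = 0
    n_sites = 0
    for line in depth_lines:
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += int(fields[2])
        n_sites += 1
    denom = genome_length if genome_length else n_sites
    return total / float(denom) if denom else 0.0


def read_pids(sam_lines):
    """Collect percent identities from YI:f: tags in SAM alignment records."""
    pids = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        # optional TAG:TYPE:VALUE fields start after the 11 mandatory columns
        for field in line.rstrip("\n").split("\t")[11:]:
            if field.startswith("YI:f:"):
                pids.append(float(field[5:]))
    return pids


def approx_ani(pids):
    """Mean percent identity of mapped reads; NaN if there are none."""
    return sum(pids) / len(pids) if pids else float("nan")
```

Passing the reference genome length to mean_coverage matters: without it, the average is taken only over covered positions and overstates coverage of sparsely covered genomes.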

Resetting files

To reset repo use:

./resetFiles.sh

If you also want to remove the files from mappingResults/:

./resetFiles.sh True

Contributors

matthewwolff, sstevens2

Issues

With Makefile...args?

I think we will need to update the README with the new way to run it with make but...as far as I know you can't change the args for runAll.sh if you run it with make as is.
@MatthewWolff and @elizabethmcd, opinions?
Which do you prefer?

  1. I update the README so you can either run it with make for defaults and then have people use the runAll.sh if you change the arguments.
  2. Someone figures out how to add arguments in make. Puts finger on nose Not it! 🙃

refGenomes.len file location?

All references to refGenomes.len indicate it should be in the main directory, but there aren't any refGenome/*.len files being generated to create the refGenomes.len file, and thus it's always 0 bytes