
lrasmanuscript's Introduction

Reproducible Analyses from the Manuscript Introducing DADA2 + PacBio

This repository hosts the reproducible workflow that performed the analyses presented in the manuscript "High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution" by Callahan et al. Nucleic Acids Research, 2019.

Rmarkdown documents are hosted in the root directory. The input sequencing data is not included in the repository for size reasons, and is instead available from the SRA under Bioproject accession PRJNA521754. Auxiliary data is included in the Docs/ directory, RDS files holding intermediate data objects suitable for performing the analyses of the processed sequencing data are in the RDS/ directory, and figures created by the Rmarkdown documents are in the Figures/ directory.

You can run these analyses on your own machine by (1) cloning the repository, (2) obtaining the raw sequencing data, (3) modifying the paths defined at the start of each Rmd document, (4) installing required libraries, and (5) pressing Run! Even without the sequencing data, the analysis portion of each Rmarkdown document can be run using the stored data objects in the RDS/ directory.
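For the RDS-only route described above, here is a minimal base-R sketch that loads every stored object at once. It assumes the objects are saved as files ending in `.rds` inside `RDS/`. The helper name `load_rds_dir` is hypothetical, so check the actual file names against the repository.

```r
# Hypothetical helper: read every .rds file in a directory into a named list.
# Assumes the stored objects use the .rds extension (matched case-insensitively).
load_rds_dir <- function(dir = "RDS") {
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE,
                      ignore.case = TRUE)
  objs <- lapply(files, readRDS)
  # Name each element after its file, minus the extension
  names(objs) <- tools::file_path_sans_ext(basename(files))
  objs
}

# Usage, run from the root of the cloned repository:
# rds <- load_rds_dir("RDS")
# names(rds)
```

A missing or empty directory simply yields an empty list, so the helper is safe to call before the data objects are in place.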

These Rmarkdown documents have also been rendered into html format, and can be viewed in your web browser:

DADA2 and PacBio

The dada2 R package is available through GitHub and Bioconductor. Full PacBio functionality was introduced in version 1.9.1, with additional improvements in subsequent releases.
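Because PacBio support depends on the installed version, compare version numbers as versions rather than as strings. As plain text, "1.30.0" sorts before "1.9.1". The following is a minimal base-R sketch. The function name is hypothetical, and the 1.9.1 threshold is the one stated above.

```r
# Hypothetical helper: TRUE if a dada2 version includes full PacBio support,
# using the 1.9.1 threshold given in the text above.
has_pacbio_support <- function(v, min_version = "1.9.1") {
  # as.package_version() accepts a string or an existing package_version,
  # so the comparison is numeric per component, not alphabetical
  as.package_version(v) >= min_version
}

# Usage with the installed package (requires dada2):
# has_pacbio_support(packageVersion("dada2"))
```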

The dada2 R package is maintained by Benjamin Callahan (benjamin DOT j DOT callahan AT gmail DOT com). Twitter: @bejcal

lrasmanuscript's People

Contributors

benjjneb, kant

lrasmanuscript's Issues

Error during primer removal

Hi

I get the following error message when I execute the loop that removes the primers from my PacBio reads:
(Error in sapply(match.fwd, end) + 1 :
non-numeric argument to binary operator)

I am currently using the following script:

#setting the path and assign the files to list
path <- "~/OneDrive - Syddansk Universitet/HzsA/packed/"
list.files(path)
fns <- list.files(path, pattern="fastq.gz", full.names=TRUE)

#assign primers
FWD <- "GAGCACGTAGGTGGGTTTGT"
REV <- "AAAACCCCTCTACTTAGTGCCC"

rc <- dada2:::rc

nops <- file.path(path, "noprimers", basename(fns))
for(i in seq_along(fns)) {
  fn <- fns[[i]]; nop <- nops[[i]]
  dada2:::removePrimers(fn, nop, primer.fwd=FWD, primer.rev=rc(REV), orient=TRUE)
}

Do you have any idea what is causing the error?

Cheers, and thanks for the splendid support you provide,

Clemens
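The message itself can be reproduced in base R. One plausible explanation is that `match.fwd` is empty for some input file, for example when no read in that file matched the primer. This is an assumption and is not confirmed in the thread. `sapply()` over a zero-length list returns an empty list rather than a numeric vector, and adding 1 to a list fails. The variable below only stands in for the internal object named in the error. It is not dada2's code.

```r
# Stand-in for the object named in the error: no primer matches at all
match.fwd <- list()

# sapply() cannot simplify an empty result, so it returns list()
ends <- sapply(match.fwd, end)
print(is.list(ends))

# Adding 1 to that list reproduces the reported error message
msg <- tryCatch(ends + 1, error = conditionMessage)
print(msg)

# vapply() with a declared return type stays numeric even when empty
ends_num <- vapply(match.fwd, end, FUN.VALUE = numeric(1))
print(ends_num + 1)
```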

SRA accession

Hello,
I am wondering if you could help me with the SRA accession to get the data files.
Thanks,

Error in primer removal

Hello Dr. Benjamin,

I am having an issue while removing primers. This code worked for one data set, and I am using the same code for another data set of the same nature. I found similar issues that others reported earlier in 2019 on your page, where the problem turned out to be wrong primers. In my case, however, the primers had not been removed by other tools, and the code worked for other data of the same nature.

library(dada2)    # removePrimers()
library(ggplot2)  # theme_set(), theme_bw()
setwd("/Users/kishanmahmud/Desktop/Soil Microbiome Data/Non Toxic Endo")

path1 <- "ntcbind"
path2 <- "ntcbind"
path.out <- "Figures/"
path.rds <- "RDS/"
fns1 <- list.files(path1, pattern="fastq.gz", full.names=TRUE)
fns2 <- list.files(path2, pattern="fastq.gz", full.names=TRUE)
F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
rc <- dada2:::rc
theme_set(theme_bw())
nops2 <- file.path(path2, "noprimers", basename(fns1))
prim2 <- removePrimers(fns1, nops2, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE)

It produces a noprimers folder with fastq.gz files, but it also gives me this message:
"Error in sapply(match.fwd, end) + 1 :
non-numeric argument to binary operator"

Any help would be appreciated. Thanks.

Best
Kishan

Sample names in fecal.Rmd

Hi! I am trying to run the code below line 138. In theory, reading the st2 and tax2 RDS files should be enough, but line 154 requires sample.names2, which is not in the environment. Since the initial raw files are not yet available, I am stuck at this point. Thanks

Running LRASms_Zymo.Rmd: stuck at learnErrors()

Hi benjjneb,

I've downloaded the fastq.gz file for zymo_CCS_99.9.fastq.gz (SRR9089357) from NCBI, and while working through LRASms_Zymo.Rmd I'm stuck at line ~92, err <- learnErrors(drp, BAND_SIZE=32, multithread=TRUE, errorEstimationFunction=dada2:::PacBioErrfun)

The output is: Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

The dada2 version is 1.30.0.

Comparing my run of the Rmd with your LRASms_Zymo.html on GitHub, I notice some differences:

track <- fastqFilter(nop, filt, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2, verbose=TRUE)
Warning: '.\Zymo\noprimers\filtered' already exists
Overwriting file:./Zymo//noprimers/filtered/zymo_CCS_99_9.fastq.gz
Read in 73057, output 72940 (99.8%) filtered sequences.

drp <- derepFastq(filt, verbose=TRUE)
Dereplicating sequence entries in Fastq file: ./Zymo//noprimers/filtered/zymo_CCS_99_9.fastq.gz
Encountered 22309 unique sequences from 72940 total sequences read.

err <- learnErrors(drp, BAND_SIZE=32, multithread=TRUE, errorEstimationFunction=dada2:::PacBioErrfun)
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

I do not know whether the higher number of output reads here (72940, 99.8%) versus 69367 (94.9%) in the html has anything to do with the eventual learnErrors() error.

I had a similar situation running the LRASms_fecal.Rmd as well.

Thank you for your time,
Mark
