benjjneb / lrasmanuscript

Reproducible Analyses accompanying DADA2 + PacBio Manuscript

HTML 99.98% Shell 0.02%

lrasmanuscript's Introduction

Reproducible Analyses from the Manuscript Introducing DADA2 + PacBio

This repository hosts the reproducible workflow that performed the analyses presented in the manuscript "High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution" by Callahan et al. Nucleic Acids Research, 2019.

Rmarkdown documents are hosted in the root directory. The input sequencing data is not included in the repository for size reasons; it is instead available from the SRA under BioProject accession PRJNA521754. Auxiliary data is included in the Docs/ directory, RDS files holding intermediate data objects (sufficient to run the analyses of the processed sequencing data) are in the RDS/ directory, and figures created by the Rmarkdown documents are in the Figures/ directory.

You can run these analyses on your own machine by (1) cloning the repository, (2) obtaining the raw sequencing data, (3) modifying the paths defined at the start of each Rmd document, (4) installing required libraries, and (5) pressing Run! Even without the sequencing data, the analysis portion of each Rmarkdown document can be run using the stored data objects in the RDS/ directory.
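Falling back on the stored RDS objects when the raw reads are absent follows a common caching pattern. The following is a minimal base-R sketch of that pattern only, not code taken from the manuscript's Rmd files: `load_or_compute` and the file name `st.rds` are hypothetical. If a stored object exists it is read back with `readRDS()`; otherwise the computation runs and its result is saved with `saveRDS()`.

```r
# Directory layout described in this README (relative to the repository root)
path.rds <- "RDS"
path.out <- "Figures"

# Hypothetical helper: reuse a stored intermediate object when it exists,
# otherwise run the (expensive) computation and cache its result as RDS.
load_or_compute <- function(rds_file, compute_fn) {
  if (file.exists(rds_file)) {
    return(readRDS(rds_file))
  }
  result <- compute_fn()
  dir.create(dirname(rds_file), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, rds_file)
  result
}

# Example (hypothetical object and file names):
# st <- load_or_compute(file.path(path.rds, "st.rds"), function() {
#   # ... filtering, dereplication and denoising of the raw SRA reads ...
# })
```

With this arrangement the expensive processing runs at most once, and anyone holding only the RDS/ directory can still reach the analysis steps.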

These Rmarkdown documents have also been rendered into HTML and can be viewed in your web browser:

DADA2 and PacBio

The dada2 R package is available through GitHub and Bioconductor. Full PacBio functionality was introduced in version 1.9.1, with additional improvements in subsequent releases.

dada2 GitHub repository: https://github.com/benjjneb/dada2
dada2 Bioconductor page: https://www.bioconductor.org/packages/release/bioc/html/dada2.html
dada2 website: https://benjjneb.github.io/dada2/
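To confirm that an installed dada2 meets the 1.9.1 threshold mentioned above, the version string can be compared with base R's `utils::compareVersion()`. This is a minimal sketch; the helper name `has_pacbio_support` is hypothetical, and passing the check only means the release is at least 1.9.1, not that it includes the later improvements.

```r
# Hypothetical helper: TRUE if a version string is at least 1.9.1,
# the release that introduced full PacBio functionality in dada2.
# compareVersion() compares dot-separated components numerically,
# so "1.30.0" correctly ranks above "1.9.1".
has_pacbio_support <- function(version, minimum = "1.9.1") {
  utils::compareVersion(as.character(version), minimum) >= 0
}

# Check the locally installed package, if there is one.
if (requireNamespace("dada2", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("dada2"))
  status <- if (has_pacbio_support(installed)) "meets" else "is below"
  message("Installed dada2 ", installed, " ", status, " the 1.9.1 threshold")
}
```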

The dada2 R package is maintained by Benjamin Callahan (benjamin DOT j DOT callahan AT gmail DOT com). Twitter: @bejcal

lrasmanuscript's People


kant haythem-abdelkhalek guokai8 yilmazbah mengqingren byjediaz bsalehe sramani-spark adalisan wangzhichao1990 yuhe-kan

lrasmanuscript's Issues

Error while primer removal

I get the following error message when I try to remove the primers from my PacBio reads by executing the loop:
Error in sapply(match.fwd, end) + 1 :
non-numeric argument to binary operator

I am currently using the following script:

# load dada2
library(dada2)

# set the path and list the input files
path <- "~/OneDrive - Syddansk Universitet/HzsA/packed/"
list.files(path)
fns <- list.files(path, pattern="fastq.gz", full.names=TRUE)

# assign primers
FWD <- "GAGCACGTAGGTGGGTTTGT"
REV <- "AAAACCCCTCTACTTAGTGCCC"

rc <- dada2:::rc

# output paths for the primer-trimmed reads
nops <- file.path(path, "noprimers", basename(fns))
for(i in seq_along(fns)) {
  fn <- fns[[i]]; nop <- nops[[i]]
  dada2:::removePrimers(fn, nop, primer.fwd=FWD, primer.rev=dada2:::rc(REV), orient=TRUE)
}

Do you have any idea what is causing the error?

Cheers, and thanks for the splendid support you provide,

Clemens
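The script above passes the reverse complement of the reverse primer as `primer.rev`, computed with `dada2:::rc`, an unexported internal function. For checking that conversion by hand, here is a base-R sketch of an IUPAC-aware reverse complement; `rc_iupac` is a hypothetical name, and it is not claimed to match `dada2:::rc` in every detail.

```r
# Hypothetical helper: reverse complement of DNA strings that may contain
# IUPAC ambiguity codes. R/Y, K/M, B/V and D/H swap; S, W and N map to
# themselves. Input is upper-cased first.
rc_iupac <- function(seqs) {
  complemented <- chartr("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN", toupper(seqs))
  vapply(strsplit(complemented, ""),
         function(bases) paste(rev(bases), collapse = ""),
         character(1))
}

rc_iupac("AAAACCCCTCTACTTAGTGCCC")  # returns "GGGCACTAAGTAGAGGGGTTTT"
rc_iupac("RGYTACCTTGTTACGACTT")     # returns "AAGTCGTAACAAGGTARCY"
```

Because the complement mapping is its own inverse, applying `rc_iupac` twice returns the original primer, which is a quick sanity check on the table.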

SRA accession

Hello,
I am wondering if you could help me with the SRA accession to get the data files.
Thanks,

where is the example data: m150206_s1_p0_1_ccs_minpass_2_minprdaccry_99.fastq.gz

The page "DADA2 + PacBio: S. aureus from Wagner et al. 2016" uses an example data file. How can I download it?

fn <- file.path(path, "m150206_s1_p0_1_ccs_minpass_2_minprdaccry_99.fastq.gz")

Error in primer removal error

Hello Dr. Benjamin,

I am having an issue while removing primers. This code worked on one data set, and I am using the same code on another data set of the same nature. I ran into a problem similar to ones others reported on your page earlier in 2019, where the cause was wrong primers. In my case, however, the primers had not been removed with other tools, and the code worked for other data of the same nature.

library(dada2)
library(ggplot2)

setwd("/Users/kishanmahmud/Desktop/Soil Microbiome Data/Non Toxic Endo")

# input and output paths
path1 <- "ntcbind"
path2 <- "ntcbind"
path.out <- "Figures/"
path.rds <- "RDS/"
fns1 <- list.files(path1, pattern="fastq.gz", full.names=TRUE)
fns2 <- list.files(path2, pattern="fastq.gz", full.names=TRUE)

# 27F / 1492R full-length 16S primers (IUPAC ambiguity codes)
F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
rc <- dada2:::rc
theme_set(theme_bw())

# remove primers; the reverse primer is given as its reverse complement
nops2 <- file.path(path2, "noprimers", basename(fns1))
prim2 <- removePrimers(fns1, nops2, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE)

It creates a noprimers folder with fastq.gz files, but it also gives me this message:
"Error in sapply(match.fwd, end) + 1 :
non-numeric argument to binary operator"

I would appreciate your help. Thanks.

Best
Kishan
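Since the earlier reports mentioned in this issue traced the error to wrong primers, one rough way to test that explanation is to count how many reads contain each primer before calling `removePrimers`. The base-R sketch below is not part of dada2: `iupac_regex` and `count_primer_hits` are hypothetical names. Matching is exact apart from IUPAC ambiguity codes, so reads with sequencing errors in the primer are missed, and only the orientation given is searched, so with mixed-orientation PacBio reads expect hits in only part of a file.

```r
# Hypothetical helper: convert an IUPAC primer into a regular expression.
iupac_regex <- function(primer) {
  codes <- c(A = "A", C = "C", G = "G", T = "T",
             R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
             K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
             H = "[ACT]", V = "[ACG]", N = "[ACGT]")
  bases <- strsplit(toupper(primer), "")[[1]]
  stopifnot(all(bases %in% names(codes)))
  paste(codes[bases], collapse = "")
}

# Hypothetical helper: number of reads in a FASTQ file (gzipped or plain)
# that contain each primer, up to IUPAC ambiguity, in the orientation given.
count_primer_hits <- function(fastq, primers) {
  con <- gzfile(fastq, "rt")
  on.exit(close(con))
  lines <- readLines(con)
  reads <- toupper(lines[seq(2, length(lines), by = 4)])  # sequence lines
  vapply(primers, function(p) sum(grepl(iupac_regex(p), reads)), integer(1))
}

# Example with the primers from the script above:
# count_primer_hits(fns1[1], c(F27 = F27, R1492.rc = dada2:::rc(R1492)))
```

A count of zero for a primer across a whole file would point back toward a primer mismatch rather than a problem with the reads themselves.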

Sample names in fecal.Rmd

Hi! I am trying to run the code below line 138. In theory, reading the st2 and tax2 RDS files should be enough, but line 154 needs sample.names2, which is not in the environment. Since the initial raw files are not available yet, I am stuck at this point. Thanks
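If `st2` is a standard dada2 sequence table (a matrix with one row per sample and one column per sequence variant), its row names normally carry the sample names and might serve as a stand-in for `sample.names2`. That is an assumption about how `st2` was built, not something the Rmd confirms, and the recovered names may not be formatted exactly like the original `sample.names2`. The sketch checks for row names rather than relying on them; `sample_names_from_seqtab` and the RDS file name are hypothetical.

```r
# Hypothetical stand-in for sample.names2 when only the stored sequence
# table is available. Assumes rows are samples and row names are their names.
sample_names_from_seqtab <- function(st) {
  if (is.null(rownames(st))) {
    stop("sequence table has no row names; sample names cannot be recovered")
  }
  rownames(st)
}

# st2 <- readRDS(file.path("RDS", "st2.rds"))   # hypothetical file name
# sample.names2 <- sample_names_from_seqtab(st2)
```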

Running LRASms_Zymo.Rmd and hung up at learnErrors()

Hi benjjneb,

I've downloaded the fastq.gz file from NCBI corresponding to zymo_CCS_99.9.fastq.gz (SRR9089357), and while going through LRASms_Zymo.Rmd I'm hung up on line ~92, err <- learnErrors(drp, BAND_SIZE=32, multithread=TRUE, errorEstimationFunction=dada2:::PacBioErrfun)

The output is: Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

The dada2 version is 1.30.0.

I'm comparing my running Rmd to your GitHub LRASms_Zymo.html, and I notice some differences:

track <- fastqFilter(nop, filt, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2, verbose=TRUE)
Warning: '.\Zymo\noprimers\filtered' already exists
Overwriting file:./Zymo//noprimers/filtered/zymo_CCS_99_9.fastq.gz
Read in 73057, output 72940 (99.8%) filtered sequences.

drp <- derepFastq(filt, verbose=TRUE)
Dereplicating sequence entries in Fastq file: ./Zymo//noprimers/filtered/zymo_CCS_99_9.fastq.gz
Encountered 22309 unique sequences from 72940 total sequences read.

err <- learnErrors(drp, BAND_SIZE=32, multithread=TRUE, errorEstimationFunction=dada2:::PacBioErrfun)
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

I do not know whether the higher number of output reads in my run, 72940 (99.8%), versus 69367 (94.9%) in your rendered HTML has anything to do with the learnErrors() error.

I had a similar situation running the LRASms_fecal.Rmd as well.

thank you for your time,
Mark
