schlosslab / schloss_pacbio16s_peerj_2015

Repository to accompany "Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system"

License: MIT License

schloss_pacbio16s_peerj_2015's Introduction

The paper

This is the repository for the manuscript "Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system" written by Patrick D. Schloss, Sarah L. Westcott, Matthew L. Jenior, and Sarah K. Highlander. The raw data can be obtained from the Sequence Read Archive at NCBI under accession SRP051686, which is associated with BioProject PRJNA271568, and the processed movies can be obtained using the commands in mothur_raw.bash. The entire manuscript can be generated on a Mac OS X or Linux computer by running the following commands:

git clone https://github.com/SchlossLab/PacBio_16S.git
cd PacBio_16S
sh write.paper

These commands assume that R and mothur are installed and available in your PATH. You can then convert the resulting markdown file to a Word docx file by running:

sh compile_docx.sh PacBioManuscript.md
open PacBioManuscript.docx
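
Because write.paper relies on R and mothur being reachable through your PATH, it can help to confirm that before starting. The following is a minimal sketch of such a check; the function name check_tools is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Report whether each required program can be found on the PATH.
# Prints one "missing: <tool>" line per absent tool and returns
# non-zero if any tool is missing. (check_tools is a hypothetical helper.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The README names R and mothur as prerequisites for write.paper.
if check_tools R mothur; then
    echo "R and mothur found; ready to run: sh write.paper"
fi
```

If either program is reported as missing, install it or adjust your PATH before running write.paper.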

Background

We ran 8 SMRT cells. Their data were stored in the following folders, all housed within a folder called raw_data:

B01_1_Cell2_PacBioRun109_PSchloss311_p15  
C01_1_Cell3_PacBioRun109_PSchloss315_p16  
D01_1_Cell4_PacBioRun108_PSchloss310_p13  
D01_1_Cell4_PacBioRun109_PSchloss316_p4  
E01_1_Cell5_PacBioRun108_PSchloss311_p15  
E01_1_Cell5_PacBioRun109_PSchloss317_p19  
F01_1_Cell6_PacBioRun108_PSchloss312_p35  
H01_1_Cell8_PacBioRun112_PSchloss319_p19

If you replace the "_p" in each folder name with "_v", you'll see which region each run represented. There were two V15 runs and two V19 runs. These regions were amplified with the following primer sets:

V19	AGRGTTTGATYMTGGCTCAG	GGYTACCTTGTTACGACTT  
V16	AGRGTTTGATYMTGGCTCAG	ACRACACGAGCTGACGAC  
V15	AGRGTTTGATYMTGGCTCAG	CCCGTCAATTCMTTTRAGT  
V13	AGRGTTTGATYMTGGCTCAG	ATTACCGCGGCTGCTGG  
V35	CCTACGGGAGGCAGCAG	CCCGTCAATTCMTTTRAGT  
V4	GTGCCAGCMGCCGCGGTAA	GGACTACHVGGGTWTCTAAT
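
The "_p" to "_v" substitution described above can be sketched in shell as follows. This is only an illustration: the function name region_of is hypothetical, and it assumes the region suffix is always the text after the final "_p" in the folder name.

```shell
#!/bin/sh
# Derive the region label from a run folder name by taking the text after
# the final "_p" and prefixing it with "v" (e.g. "..._p15" -> "v15").
# region_of is a hypothetical helper, not part of the repository.
region_of() {
    echo "v${1##*_p}"
}

# Print "region<TAB>folder" for each of the eight SMRT cell folders.
for folder in \
    B01_1_Cell2_PacBioRun109_PSchloss311_p15 \
    C01_1_Cell3_PacBioRun109_PSchloss315_p16 \
    D01_1_Cell4_PacBioRun108_PSchloss310_p13 \
    D01_1_Cell4_PacBioRun109_PSchloss316_p4 \
    E01_1_Cell5_PacBioRun108_PSchloss311_p15 \
    E01_1_Cell5_PacBioRun109_PSchloss317_p19 \
    F01_1_Cell6_PacBioRun108_PSchloss312_p35 \
    H01_1_Cell8_PacBioRun112_PSchloss319_p19
do
    printf '%s\t%s\n' "$(region_of "$folder")" "$folder"
done
```

The label in the first column can then be matched against the region names in the primer table.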

We also put sequencing indices at the 5' and 3' ends of the inserts to correspond to the different samples:

mock1	ccaac	ccaac  
mock2	ggttg	ccaac  
mock3	ttggt	ccaac  
mouse1	aggtg	ccaac  
mouse2	cttac	ccaac  
mouse3	gaact	ccaac  
human1	tccga	ccaac  
human2	acagt	ccaac  
human3	cactg	ccaac  
soil1	aacca	ccaac  
soil2	tgtca	ccaac  
soil3	aaacc	ccaac

Putting it all together in an oligos file (pacbio.oligos), we get...

primer	AGRGTTTGATYMTGGCTCAG	GGYTACCTTGTTACGACTT	v19  
primer	AGRGTTTGATYMTGGCTCAG	ACRACACGAGCTGACGAC	v16  
primer	AGRGTTTGATYMTGGCTCAG	CCCGTCAATTCMTTTRAGT	v15  
primer	AGRGTTTGATYMTGGCTCAG	ATTACCGCGGCTGCTGG	v13  
primer	CCTACGGGAGGCAGCAG	CCCGTCAATTCMTTTRAGT	v35  
primer	GTGCCAGCMGCCGCGGTAA	GGACTACHVGGGTWTCTAAT	v4  
barcode	ccaac	ccaac	mock1  
barcode	ggttg	ccaac	mock2  
barcode	ttggt	ccaac	mock3  
barcode	aggtg	ccaac	mouse1  
barcode	cttac	ccaac	mouse2  
barcode	gaact	ccaac	mouse3  
barcode	tccga	ccaac	human1  
barcode	acagt	ccaac	human2  
barcode	cactg	ccaac	human3  
barcode	aacca	ccaac	soil1  
barcode	tgtca	ccaac	soil2  
barcode	aaacc	ccaac	soil3

schloss_pacbio16s_peerj_2015's Issues

Abstract

Missing V1-V5 in listing of regions that were analyzed

shell script

I am new to this field, so forgive me if I sound stupid.

When running the write.paper shell script, I see the following error.
It is looking for a reference file, v4/HMP_MOCK.filter.fasta, that is not on my system.
/references/HMP_MOCK.fasta was found in the current directory of the project.
What is the best way to fix this issue?

mothur > get.groups(fasta=v4/v4.trim.unique.good.filter.unique.fasta, name=v4/v4.trim.unique.good.filter.names, group=v4/v4.good.groups, groups=mock1.v4-mock2.v4-mock3.v4)
Unable to open v4/v4.trim.unique.good.filter.unique.fasta. Trying default /remote/RSU/sw-cache/metag/bin/v4.trim.unique.good.filter.unique.fasta
Unable to open /remote/RSU/sw-cache/metag/bin/v4.trim.unique.good.filter.unique.fasta
Unable to open v4/v4.trim.unique.good.filter.names. Trying default /remote/RSU/sw-cache/metag/bin/v4.trim.unique.good.filter.names
Unable to open /remote/RSU/sw-cache/metag/bin/v4.trim.unique.good.filter.names
Unable to open v4/v4.good.groups. Trying default /remote/RSU/sw-cache/metag/bin/v4.good.groups
Unable to open /remote/RSU/sw-cache/metag/bin/v4.good.groups
You have no current groupfile, designfile, countfile or sharedfile and one is required.
You must provide at least one of the following: fasta, name, taxonomy, group, shared, design, count or list.
[ERROR]: did not complete get.groups.

mothur > seq.error(fasta=current, name=current, reference=v4/HMP_MOCK.filter.fasta, processors=8)
[WARNING]: no file was saved for fasta parameter.
[WARNING]: no file was saved for name parameter.
You have no current fasta file and the fasta parameter is required.
Unable to open v4/HMP_MOCK.filter.fasta. Trying default /remote/RSU/sw-cache/metag/bin/HMP_MOCK.filter.fasta
Unable to open /remote/RSU/sw-cache/metag/bin/HMP_MOCK.filter.fasta

Using 8 processors.
[ERROR]: did not complete seq.error.

mothur > quit()

P4-C2 chemistry is quite old

Hi,

Thank you for posting the article. It seems to be interesting work. In particular, the results about potential bias in error randomness are noteworthy to me, and I am looking forward to future studies on that point. However, I am missing some qualification here. The chemistry used is P4-C2. P6-C4 is now available and seems to be a significant improvement. While technically totally different, the quality differences observed by using a different chemistry with ONT MinION are striking. And while the differences will probably not be that pronounced with PacBio, I nevertheless find the statement "[...] the error rates we observed with the PacBio platform were nearly 5-fold higher than what has previously been reported. It appears that the promise offered by the PacBio platform has not been realized." in the conclusion somewhat too strong in light of the new chemistry. Accordingly, I think readers might benefit from a corresponding reference or sentence clarifying that the chemistry might have an effect and might reduce the error rate. In any case, the 5-fold higher error rate is something that I find of interest too...

Looking forward to your comments.

Best,

Cedric
