Finding region coordinates
Before running the pipeline, find the coordinates of your target region. The steps below walk through this for the V4 region.
- Locate your forward and reverse primers. Replace each degenerate base with any one of the bases it can represent (for example, Y with C or T).
Ex.
> forward (w/ degenerates)= GTGYCAGCMGCCGCGGTAA
> forward (w/out degenerates)= GTGCCAGCAGCCGCGGTAA
> reverse_806r (w/ degenerates)= GGACTACNVGGGTWTCTAAT
> reverse_806r (w/out degenerates)= GGACTACAGGGGTATCTAAT
- Find the reverse complement of your reverse primer. To do this, first find the complement of your reverse primer.
GGACTACAGGGGTATCTAAT (reverse primer)
CCTGATGTCCCCATAGATTA (complement of reverse primer)
Next find the reverse of this complement.
ATTAGATACCCCTGTAGTCC (reverse of the complement of the reverse primer)
Save this for later use.
- In NCBI Primer-BLAST, enter your forward and reverse primers into the boxes labeled "Use my own forward primer (5'->3' on plus strand)" and "Use my own reverse primer (5'->3' on minus strand)".
- Scroll down to the heading "Primer Pair Specificity Checking Parameters" and change the database setting to "nr".
- Below that, change the organism setting to "Escherichia coli (taxid:562)".
- Click "Get Primers".
- In the results, find a line that looks similar to the following:
CP101983.1 Escherichia coli strain STEC1096 chromosome, complete genome
Copy the alphanumeric combination up to the dot. In this case, that would be CP101983.
- Go to https://www.ncbi.nlm.nih.gov/ and paste this string into the search bar.
- Click on the result and then select "fasta".
- Use ctrl/command + f to locate your forward primer. Starting there, copy the ~400-500 base pairs that follow and paste them into a text editor.
- Use ctrl/command + f to locate the reverse complement of your reverse primer within this text. Some bases may not match exactly, so try searching for shorter sections of the primer if you cannot find the whole sequence at first.
- Once you locate the reverse complement primer, check the length of the section from the forward primer to there. It should be around 300 base pairs. Copy this section into a file called ecoliv4.fasta. You will need this file later in the pipeline.
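The primer steps above can also be sketched in code. The following is a minimal Python sketch, not part of the original pipeline. The function names (`reverse_complement`, `primer_to_regex`, `extract_amplicon`) are hypothetical and chosen for illustration. It assumes the genome has already been read into a plain string (FASTA header and line breaks removed), that both primer sites lie on the strand given, and that the only mismatches to tolerate are the ones covered by the degenerate bases themselves. Instead of picking one base per degenerate position, it turns each degenerate base into a set of allowed bases.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code, so degenerate primers can be handled too.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_to_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return "".join(parts)


def extract_amplicon(genome, forward, reverse):
    """Return the stretch from the forward primer through the reverse
    complement of the reverse primer, or None if either is not found."""
    genome = genome.upper()
    fwd = re.search(primer_to_regex(forward), genome)
    if fwd is None:
        return None
    rev_pattern = re.compile(primer_to_regex(reverse_complement(reverse)))
    rev = rev_pattern.search(genome, fwd.end())
    if rev is None:
        return None
    return genome[fwd.start():rev.end()]


if __name__ == "__main__":
    # Same reverse primer as in the worked example above.
    print(reverse_complement("GGACTACAGGGGTATCTAAT"))
```

If `extract_amplicon` returns `None` on a real genome, the primer site may contain a mismatch outside the degenerate positions or sit on the opposite strand, in which case the manual search with shorter primer sections described above is still the fallback.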
Downloading reference files
- Go to https://mothur.org/wiki/silva_reference_files/
- Download the latest version of the files called "full-length sequence and taxonomy database" and "seed database file", move them to your working directory, and unzip them. This should generate 4 files whose names will vary slightly depending on the version downloaded:
silva.nr_v138_1.tax
silva.nr_v138_1.align
silva.seed_v138_1.tax
silva.seed_v138_1.align
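Because the version part of the file names changes between releases, a quick check that all four files are present can help before starting the pipeline. The snippet below is a minimal sketch under that assumption. The function names and the glob patterns are illustrative guesses based on the file names listed above, not something defined by mothur.

```python
from pathlib import Path

# Hypothetical patterns derived from the example file names above;
# the "*" stands in for the version string, which varies by release.
PATTERNS = [
    "silva.nr_*.tax",
    "silva.nr_*.align",
    "silva.seed_*.tax",
    "silva.seed_*.align",
]


def find_reference_files(directory="."):
    """Map each pattern to the sorted list of matching file names."""
    folder = Path(directory)
    return {
        pattern: sorted(p.name for p in folder.glob(pattern))
        for pattern in PATTERNS
    }


def missing_patterns(directory="."):
    """Return the patterns for which no file was found."""
    found = find_reference_files(directory)
    return [pattern for pattern, names in found.items() if not names]


if __name__ == "__main__":
    # An empty list means all four reference files were found.
    print(missing_patterns("."))
```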
Thank you
Thank you to Dr. Bradley Tolar for providing the code for the mothur pipeline. Thank you to Jessica Bullington and Dr. Katie Langenfeld for providing the R notebook, which was originally adapted from Earth System Science 210 Techniques in Environmental Microbiology by Anne Dekas.
Reference
Schloss PD et al. 2009. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537-7541.