vascoelbrecht / primerminer Goto Github PK


R based batch sequence downloader, with primer development and in silico evaluation capabilities

R 100.00%

metabarcoding bioinformatics primer-design batch-download boldsystems ncbi database-mining

primerminer's Issues

Make empty taxonomy file when generating the batch file

check how gaps are handled with ribosomal alignments

add function to selectively trim alignments

don't clip all data, just the flanking regions where primers would bind

add an option to include custom data into the clustering process

If you have your own unpublished sequence data, or data from other repositories not included in PrimerMiner, there should be an option to add these sequences manually (in fasta format) before clustering
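
A minimal sketch of what such a step could look like, assuming both the downloaded sequences and the custom sequences are plain fasta files. The function name `merge_custom_fasta` and the file arguments are hypothetical and not part of PrimerMiner.

```r
# Hypothetical helper (not part of PrimerMiner): combine the downloaded
# fasta file with custom sequences before clustering.
merge_custom_fasta <- function(downloaded, custom, out) {
  seqs <- c(readLines(downloaded), readLines(custom))
  seqs <- seqs[nzchar(seqs)]   # drop empty lines between records
  writeLines(seqs, out)
  sum(startsWith(seqs, ">"))   # number of fasta records written
}
```

The return value is the record count, which makes it easy to confirm that the custom sequences were actually added.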

Batch download will fail if no sub categories are defined

Temporary workaround:

add

delete_me,
,Georissidae
,Gyrinidae

to the end of your taxa file
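
The workaround can also be scripted. This is only a sketch, assuming the taxa file is a plain text csv; `add_placeholder_group` is a hypothetical helper name, not a PrimerMiner function.

```r
# Hypothetical helper: append the placeholder group from the workaround
# to the end of an existing taxa file.
add_placeholder_group <- function(taxa_file) {
  placeholder <- c("delete_me,", ",Georissidae", ",Gyrinidae")
  existing <- readLines(taxa_file, warn = FALSE)
  writeLines(c(existing, placeholder), taxa_file)
  invisible(taxa_file)
}
```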

Add maximum sequence number to download!

set this for mitochondria individually etc.

Also give a maximum number of OTUs to produce.
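
One way a download cap could be expressed on the GenBank side is the `retmax` parameter of the NCBI E-utilities `esearch` call, which limits how many record IDs are returned. The sketch below only builds the request URL; `build_esearch_url` and its default of 500 are assumptions for illustration, not existing PrimerMiner code.

```r
# Hypothetical helper: build an esearch URL that returns at most
# `max_records` record IDs for a search term.
build_esearch_url <- function(term, max_records = 500) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  paste0(base, "?db=nuccore",
         "&term=", utils::URLencode(term, reserved = TRUE),
         "&retmax=", as.integer(max_records))  # integer avoids "1e+05" style output
}

url <- build_esearch_url("Bathyraja[Organism] AND COI", max_records = 100)
```

A separate limit could then be passed for the mitochondrial genome query.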

batch download - unknown error

Error in batch download script

speed up writing OTU consensus sequences in the batch_download command

implement the consensus generation scripts from plot_alignments -> will lead to significant speed up!

(I didn't look at the code for ages, so it might take some time to get this running again)

0.3 bases

0.3 is used instead of 1/3, which might cause issues? Have to check this.

PrimerMiner/PrimerMiner/R/plot_alignments.R

Line 20 in 5e0fb21

B,C or G or T,0,0.3,0.3,0.3,T
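
A quick check of the concern, assuming the three 0.3 entries in that row are the proportions assigned to C, G and T for the ambiguity code B. With rounded values the proportions no longer sum to 1, so any downstream step that expects a total of 1 would be off by 10%.

```r
rounded <- c(0.3, 0.3, 0.3)   # values as written in the table row
exact   <- rep(1 / 3, 3)      # exact thirds

isTRUE(all.equal(sum(rounded), 1))   # FALSE
isTRUE(all.equal(sum(exact), 1))     # TRUE

# Renormalising the stored values restores a total of 1
normalised <- rounded / sum(rounded)
```

Whether this matters in practice depends on how plot_alignments uses the table, which still has to be checked.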

update error matching

based on table 7 in https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecs2.3193

use --msaout to write majority consensus OTU sequences

With the new Vsearch version, faster generation of OTU consensus sequences might be possible. Will be tested.

NCBI download currently does not work

changes on the NCBI side of things

Check version in config file against installed PrimerMiner file

Stop execution with an error if there are mismatches

Reason: The config file variables might change over time (spelling, additional features, features becoming obsolete). Backwards compatibility can't be guaranteed as PrimerMiner is in very active development, and a new config file is quickly made!
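
A rough sketch of such a check, assuming the config file stores the version it was written for as a string. `check_config_version` is a hypothetical name; in real use the second argument could come from `utils::packageVersion("PrimerMiner")`.

```r
# Hypothetical check: halt if the config file version does not match
# the installed package version.
check_config_version <- function(config_version, installed_version) {
  cfg  <- package_version(config_version)
  inst <- package_version(installed_version)
  if (cfg != inst) {
    stop("Config file was written for PrimerMiner ", as.character(cfg),
         " but version ", as.character(inst), " is installed. ",
         "Please generate a new config file.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Comparing `package_version` objects rather than raw strings avoids treating "0.2" and "0.20" as different versions.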

Compare each sequence against a mitochondrial reference before clustering and verify that primers are trimmed with BOLD api

Very traffic intensive, not sure if this will be implemented, should rather be done on the BOLD end. We will see...

PrimerMiner

Hi Dears

I got the following message when trying to install PrimerMiner with this code:

install.packages("PrimerMiner", repos = NULL, type="source", dependencies=T)

Warning: invalid package 'PrimerMiner'
Error: ERROR: no packages specified
Warning in install.packages :
installation of package ‘PrimerMiner’ had non-zero exit status

Any clue to solve this issue?

Install issue

Hi Vasco,

I'm keen to give PrimerMiner a try but have run into an issue. When installing, I get the following Warnings and Error:

> install.packages("~/Downloads/PrimerMiner-0.12.tar.gz", repos = NULL, type = "source")
Warning in untar2(tarfile, files, list, exdir, restore_times) :
  skipping pax global extended headers
ERROR: cannot extract package from ‘/Users/sebastian/Downloads/PrimerMiner-0.12.tar.gz’
Warning in install.packages :
  installation of package ‘/Users/sebastian/Downloads/PrimerMiner-0.12.tar.gz’ had non-zero exit status

Any idea what I might be doing wrong?

Many thanks,
Sebastian

Include scoring table in PrimerMiner itself!

Like in JAMP, no changes to the tables

Any option to get the species in each cluster?

Is there any option to obtain the species in each cluster?

I know this is related with vsearch, they mention in the documentation an example including the taxonomy and the species name, following the "s" identifier: ">X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli"

Maybe I'm not reading the output correctly. For example one of my groups Bathyraja, has 8 species

Bathyraja,
,"Bathyraja brachyurops"
,"Bathyraja cousseauae"
,"Bathyraja scaphiops"
,"Bathyraja griseocauda"
,"Bathyraja magellanica"
,"Bathyraja albomaculata"
,"Bathyraja macloviana"
,"Bathyraja multispinis"

The results report that they were grouped into 5 clusters, but I couldn't find which species were grouped in each cluster.

cluster_file:

Reading file Bathyraja/Vsearch/Bathyraja_all_drep+1.fasta 100%
35182 nt in 45 seqs, min 591, max 1757, avg 782
Masking 100%
Sorting by length 100%
Counting unique k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 5 Size min 1, max 37, avg 9.0
Singletons: 2, 4.4% of seqs, 40.0% of clusters
Multiple alignments 100%
vsearch v1.10.2_osx_x86_64, 8.0GB RAM, 4 cores
https://github.com/torognes/vsearch

The log.txt output is:

2020-08-18 00:13:05 - Downloading BOLD sequence data

#Bold_data: Folder Bathyraja/Bathyraja
Taxon	Sequences	downl_time
Bathyraja brachyurops	6	0.31 secs
Bathyraja cousseauae	3	0.31 secs
Bathyraja scaphiops	8	0.36 secs
Bathyraja griseocauda	12	0.34 secs
Bathyraja magellanica	4	0.3 secs
Bathyraja albomaculata	8	0.86 secs
Bathyraja macloviana	4	0.95 secs
Bathyraja multispinis	10	0.32 secs
#Bold_data_end

2020-08-18 00:13:09 - Downloading GenBank sequence data

Search query: REPLACE_WITH_TAXA[Organism] AND (COi OR CO1 OR COXi OR COX1) AND 1:2000[Sequence Length]

#GB_data: Folder Bathyraja/Bathyraja
Taxon	Sequences	downl_time
Bathyraja brachyurops	5	5.8 secs
Bathyraja cousseauae	3	5.3 secs
Bathyraja scaphiops	7	5.4 secs
Bathyraja griseocauda	10	5.4 secs
Bathyraja magellanica	4	5.1 secs
Bathyraja albomaculata	6	5.5 secs
Bathyraja macloviana	4	5.3 secs
Bathyraja multispinis	8	5.2 secs
#GB_data_end

2020-08-18 00:13:52  - Downloading Miochondrial Genomes from GenBank

Search query: REPLACE_WITH_TAXA[Organism] AND mitochondrion[filter] AND genome AND 2001:80000[Sequence Length]

#mito_data: Folder Bathyraja/Bathyraja
Taxon	Sequences	downl_time
Bathyraja brachyurops	0	0.83 secs
Bathyraja cousseauae	0	0.92 secs
Bathyraja scaphiops	0	0.84 secs
Bathyraja griseocauda	2	4.8 secs
Bathyraja magellanica	0	1.4 secs
Bathyraja albomaculata	2	4.5 secs
Bathyraja macloviana	0	1.4 secs
Bathyraja multispinis	2	4.7 secs
#mito_data_end

2020-08-18 00:14:11  - Converting Mito Genbank to fasta

#mito_gb2fasta
GBfile	noMito	unique
Bathyraja/Bathyraja/Bathyraja albomaculata_mito.gb	2	1
Bathyraja/Bathyraja/Bathyraja griseocauda_mito.gb	2	1
Bathyraja/Bathyraja/Bathyraja multispinis_mito.gb	2	1
#mito_gb2fasta_end

2020-08-18 00:14:11 - Merging fasta files

Reading in files matching BOLD\.fasta$:
Folders: Bathyraja/Bathyraja
Files: 
Clipping: Left 0 bp, Right 0 bp

Matching files which were written into  Bathyraja/Bathyraja_Bold.fasta : 
 Bathyraja/Bathyraja/Bathyraja albomaculata_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja brachyurops_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja cousseauae_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja griseocauda_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja macloviana_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja magellanica_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja multispinis_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja scaphiops_BOLD.fasta 

2020-08-18 00:14:11 - Merging fasta files

Reading in files matching GB\.fasta$:
Folders: Bathyraja/Bathyraja
Files: 
Clipping: Left 0 bp, Right 0 bp

Matching files which were written into  Bathyraja/Bathyraja_GB.fasta : 
 Bathyraja/Bathyraja/Bathyraja albomaculata_GB.fasta, Bathyraja/Bathyraja/Bathyraja brachyurops_GB.fasta, Bathyraja/Bathyraja/Bathyraja cousseauae_GB.fasta, Bathyraja/Bathyraja/Bathyraja griseocauda_GB.fasta, Bathyraja/Bathyraja/Bathyraja macloviana_GB.fasta, Bathyraja/Bathyraja/Bathyraja magellanica_GB.fasta, Bathyraja/Bathyraja/Bathyraja multispinis_GB.fasta, Bathyraja/Bathyraja/Bathyraja scaphiops_GB.fasta 

2020-08-18 00:14:11 - Merging fasta files

Reading in files matching [mito]\.fasta$:
Folders: Bathyraja/Bathyraja
Files: 
Clipping: Left 0 bp, Right 0 bp

Matching files which were written into  Bathyraja/Bathyraja_mito.fasta : 
 Bathyraja/Bathyraja/Bathyraja albomaculata_mito.fasta, Bathyraja/Bathyraja/Bathyraja griseocauda_mito.fasta, Bathyraja/Bathyraja/Bathyraja multispinis_mito.fasta 

2020-08-18 00:14:11 - Merging fasta files

Reading in files matching \.fasta$:
Folders: 
Files: Bathyraja/Bathyraja_Bold.fasta, Bathyraja/Bathyraja_GB.fasta, Bathyraja/Bathyraja_mito.fasta
Clipping: Left 0 bp, Right 0 bp

Matching files which were written into  Bathyraja/Bathyraja_all.fasta : 
 Bathyraja/Bathyraja_Bold.fasta, Bathyraja/Bathyraja_GB.fasta, Bathyraja/Bathyraja_mito.fasta 

2020-08-18 00:14:11 - Clustering reads with Vsearch
vsearch v1.10.2_osx_x86_64, 8.0GB RAM, 4 cores

Used fasta file: Bathyraja_all.fasta
Number of imput sequences: 105
Dereplicated: 45
Cluster: 5

VSEARCH comands:

/usr/local/lib/R/site-library/PrimerMiner/vsearch-1.10.2_osx_x86_64 -derep_fulllength Bathyraja/Bathyraja_all.fasta -output Bathyraja/Vsearch/Bathyraja_all_drep.fasta >Bathyraja/Vsearch/temp.txt
/usr/local/lib/R/site-library/PrimerMiner/vsearch-1.10.2_osx_x86_64 -derep_fulllength Bathyraja/Vsearch/Bathyraja_all_drep.fasta -output Bathyraja/Vsearch/Bathyraja_all_drep+1.fasta -sizeout
/usr/local/lib/R/site-library/PrimerMiner/vsearch-1.10.2_osx_x86_64 -cluster_fast Bathyraja/Vsearch/Bathyraja_all_drep+1.fasta -strand both -id 0.97 -msaout Bathyraja/Vsearch/cluster_file  >Bathyraja/Vsearch/temp.txt
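
One possible way to recover cluster membership, sketched under assumptions: vsearch can write a UC-format table (its `--uc` option) when clustering, in which S rows mark cluster centroids and H rows mark the sequences assigned to them. Adding that option to the clustering command above would be a manual modification, not an existing PrimerMiner feature. Species names are only recoverable if they are present in the fasta headers, and membership refers to dereplicated sequences. The function name and the records below are made up for illustration.

```r
# Parse UC-format lines into a list of sequence labels per cluster.
clusters_from_uc <- function(uc_lines) {
  fields  <- strsplit(uc_lines, "\t", fixed = TRUE)
  type    <- vapply(fields, `[`, character(1), 1)   # record type
  cluster <- vapply(fields, `[`, character(1), 2)   # cluster number
  label   <- vapply(fields, `[`, character(1), 9)   # query label
  keep <- type %in% c("S", "H")   # S = centroid, H = member; C rows are summaries
  split(label[keep], cluster[keep])
}

# Made-up example records, only to show the shape of the result
uc <- c(
  "S\t0\t652\t*\t*\t*\t*\t*\tseqA;size=3\t*",
  "H\t0\t652\t98.5\t+\t0\t0\t652M\tseqB;size=1\tseqA;size=3",
  "S\t1\t600\t*\t*\t*\t*\t*\tseqC;size=1\t*",
  "C\t0\t2\t*\t*\t*\t*\t*\tseqA;size=3\t*"
)
members <- clusters_from_uc(uc)
```

In real use the lines would come from `readLines()` on the file written by vsearch.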

Error in rawToChar(res$content) : long vectors not supported yet: raw.c:68

Error in rawToChar(res$content) :
long vectors not supported yet: raw.c:68

I was able to run it successfully the first time I downloaded it, but when I downloaded the same content a second time, I kept encountering this issue. How can I resolve it?

If no sequences are obtained, a ReadLines error occurs

"When no COI sequences are available for a given order (and I assume, family), the error "Error in readLines(file) : 'con' is not a connection" is given and the download terminates. Manual removal of the taxon in question is then required, although this isn't obvious to a less-seasoned R user. A potential consideration would be somehow bypassing this error and instead noting that 0 sequences were available for the taxon. The download could then continue without need for regular surveillance. "

Thanks to Jordan Cuff for submitting this
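
The suggested behaviour could be sketched with `tryCatch`: a failed taxon is recorded as 0 sequences and the loop carries on. `download_or_zero` and `fake_fetch` are hypothetical stand-ins; the real download function in PrimerMiner is not shown here.

```r
# Hypothetical wrapper: `fetch` stands in for whatever function
# downloads the sequences of one taxon.
download_or_zero <- function(taxon, fetch) {
  tryCatch(
    fetch(taxon),
    error = function(e) {
      message("No sequences retrieved for ", taxon,
              " (", conditionMessage(e), ")")
      character(0)   # treat the failure as an empty result
    }
  )
}

# Stand-in downloader that fails for one taxon
fake_fetch <- function(taxon) {
  if (taxon == "TaxonB") stop("'con' is not a connection")
  c(">seq1", "ACGT")
}

taxa    <- c("TaxonA", "TaxonB")
results <- lapply(taxa, download_or_zero, fetch = fake_fetch)
counts  <- vapply(results, function(x) sum(startsWith(x, ">")), integer(1))
```

The per-taxon counts could then be written to the log, so the download continues without manual removal of the taxon.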

auto delete folders when downloading failed because of server error

delete incomplete folders and start over
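
A minimal sketch of the cleanup part, assuming each taxon is downloaded into its own folder. `download_with_cleanup` and its `download` argument are hypothetical names.

```r
# Hypothetical wrapper: remove the target folder again if the download
# into it fails, so no incomplete folder is left behind.
download_with_cleanup <- function(folder, download) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  tryCatch(
    download(folder),
    error = function(e) {
      unlink(folder, recursive = TRUE)   # delete the incomplete folder
      stop("Download into ", folder, " failed and the folder was removed: ",
           conditionMessage(e), call. = FALSE)
    }
  )
}
```

The caller can catch the re-raised error and start the download for that folder again.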

automatically cluster mitogenomes as well, to provide a backbone to map against

Currently, mitogenome sequences are provided to make an alignment, and its degenerate consensus is used as a backbone for mapping all reads against. It would be better to cluster the mito sequences first to prevent overrepresentation of certain taxa (this will rarely be the case and does not matter much, but it does not hurt to implement this at some point).

Some errors when trying to batch download with 12S and 16S markers

Hello,

I am finding two errors when batch downloading using PrimerMiner...

I have set the marker in the config file to:

Marker = c("16s", "16S", "16S ribosomal RNA", "16s Ribosomal RNA") # specify target gene

and I have turned off the downloads from BOLD as specified in the Wiki.
But then, when running the batch download script, I run into this problem (for Coleoptera families there is no problem at all, only when moving on to other orders and families of insects):

Error in download.file(paste("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=", :
'curl' call had nonzero exit status
curl: (16) Error in the HTTP2 framing layer

Also, I am noticing that even when I check the Coleoptera file, there is no Groups_mito.fasta file to work with in the alignment process in Geneious
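
If the curl error turns out to be transient on the server side (an assumption, the cause is not established in this report), retrying the request with a pause in between might help. `with_retry` is a hypothetical helper, not part of PrimerMiner.

```r
# Hypothetical helper: call `action()` up to `attempts` times,
# waiting longer after each failure.
with_retry <- function(action, attempts = 3, wait = 1) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(action(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait * 2^(i - 1))   # exponential backoff
  }
  stop("All ", attempts, " attempts failed", call. = FALSE)
}
```

In use, `action` would wrap the failing `download.file()` call in a function.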

plot_alignments crashes when only plotting one fasta file

Temporary fix: load the same fasta file 2 times. Then it does get plotted.
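
The temporary fix can be applied in code before calling plot_alignments. The file name below is a made-up example.

```r
alignments <- c("Coleoptera.fasta")   # hypothetical single input file

# Workaround from above: supply the same file twice when only one is given
if (length(alignments) == 1) {
  alignments <- rep(alignments, 2)
}
```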

Use "stop(), warning() or message()" instead of "print()"

Change the way messages to the user are displayed / printed.
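
A short illustration of how the three condition functions differ from `print()`, using a hypothetical reporting function (`report_download` is not a PrimerMiner function).

```r
# Hypothetical example: report the outcome of a download with conditions
report_download <- function(taxon, n_sequences) {
  if (!is.numeric(n_sequences) || n_sequences < 0) {
    stop("Invalid sequence count for ", taxon, call. = FALSE)     # halts execution
  }
  if (n_sequences == 0) {
    warning("No sequences found for ", taxon, call. = FALSE)      # continues, flags a problem
  } else {
    message(taxon, ": ", n_sequences, " sequences downloaded")    # status note on stderr
  }
  invisible(n_sequences)
}
```

Unlike `print()`, these can be silenced or handled by the caller, for example with `suppressMessages()` or `tryCatch()`.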

include product annotation in mitogenome extractions!

"that for many of the mitogenomes the 12S or 16S rRNA gene is not annotated with a gene ID in genbank. Instead the 12S and 16S regions are often annotated as product=”12S ribosomal RNA” and product=”16S ribosomal RNA”. I have tried specifying these product annotations in the config file but primerminer does not retrieve these sequences." - suggested from Jonas Bylemans

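A sketch of how features could be located by their product qualifier in a GenBank flat file, assuming the usual layout in which a feature key line (for example rRNA with its location) is followed by indented qualifier lines. `product_locations` is a hypothetical helper, and the coordinates in the example are invented.

```r
# Hypothetical helper: return the location of every feature whose
# /product qualifier matches one of `products`.
product_locations <- function(gb_lines, products) {
  # Feature key lines start with exactly five spaces, e.g. "     rRNA   70..1024"
  is_feature <- grepl("^ {5}\\S+ +\\S", gb_lines, perl = TRUE)
  feature_id <- cumsum(is_feature)   # index of the feature each line belongs to
  prod    <- sub('^ +/product="([^"]*)".*$', "\\1", gb_lines, perl = TRUE)
  is_prod <- grepl('^ +/product="', gb_lines, perl = TRUE) & prod %in% products
  key_lines <- gb_lines[is_feature]
  data.frame(
    product  = prod[is_prod],
    location = sub("^ {5}\\S+ +", "", key_lines[feature_id[is_prod]], perl = TRUE),
    stringsAsFactors = FALSE
  )
}

# Invented feature table excerpt
gb <- c(
  "     rRNA            70..1024",
  "                     /product=\"12S ribosomal RNA\"",
  "     tRNA            1025..1096",
  "                     /product=\"tRNA-Val\"",
  "     rRNA            1097..2770",
  "                     /product=\"16S ribosomal RNA\""
)
hits <- product_locations(gb, c("12S ribosomal RNA", "16S ribosomal RNA"))
```

The returned locations could then be used to cut the marker region out of the mitogenome sequence.
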
Can't make plots from lower case fasta files

implement toupper or a different fasta read-in method.
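
A minimal sketch of the `toupper` approach, assuming the fasta file has already been read with `readLines()`. Only the sequence lines are converted, so header lines keep their original case. `fasta_to_upper` is a hypothetical name.

```r
# Hypothetical helper: convert sequence lines to upper case,
# leaving header lines (starting with ">") unchanged.
fasta_to_upper <- function(fasta_lines) {
  is_header <- startsWith(fasta_lines, ">")
  fasta_lines[!is_header] <- toupper(fasta_lines[!is_header])
  fasta_lines
}
```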

Thank you Evgeny Shcherbakov for finding this issue.

Prevent Vsearch from writing lowercase characters

this would allow the consensus writing to skip the toupper step

not available for this version of R

Hello,

I'm trying to install PrimerMiner, but I get the following error:

Warning in install.packages :
package ‘PrimerMiner-0.21.tar.gz’ is not available for this version of R

What's the problem?

plot_alignment step restrictions

Hi @VascoElbrecht

I hope everything's going well!

I have been using PrimerMiner to develop some specific and metabarcoding primers, so I've been using some Geneious fasta files that I already have (but complying with your recommendations in the YT tutorial) instead of following the batch download steps.

The problem arises when I try to plot more than 4 individual fasta files with plot_alignment. Using 5 or more files results in an incomplete plot (without the consensus and nucleotide letter part), with the console returning this error:

> plot_alignments(alignments, Order_names=gsub(".*./._(.*)_.*", "\\1", alignments))
Error in list[[k]][start:end, 2:5] : subscript out of bounds

I have tried changing the height and width, but this did not improve anything. Any recommendations?

Design Primers

Maybe I got it wrong but does PrimerMiner design primers as well? I couldn't find any tutorial in the wiki...

primer_evaluation

Error when only one sequence is in the fasta file...

workaround: Copy sequence to have 2 ; )

Write PrimerMiner version in config file

Error installing JAMP locally

Hi Vasco
We are following package_tutorial.R, and when we run the following line:

install_github("VascoElbrecht/JAMP", subdir="JAMP")

We obtain the following error:

Downloading GitHub repo VascoElbrecht/JAMP@master
tar: Failed to set default locale
tar: Failed to set default locale
Skipping 1 packages not available: PrimerMiner
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
v checking for file '/private/var/folders/l0/636zb7xd0fg015xw9v4mgt380000gn/T/Rtmpymi7u5/remotes21c51e830429/VascoElbrecht-JAMP-54b0c90/JAMP/DESCRIPTION' ...

preparing 'JAMP':
v checking DESCRIPTION meta-information ...
checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
building 'JAMP_0.67.tar.gz'

Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) :
(converted from warning) installation of package '/var/folders/l0/636zb7xd0fg015xw9v4mgt380000gn/T//Rtmpymi7u5/file21c554371e63/JAMP_0.67.tar.gz' had non-zero exit status

And when we install
install_github("VascoElbrecht/PrimerMiner", subdir="PrimerMiner")
We obtain:
Downloading GitHub repo VascoElbrecht/PrimerMiner@master
tar: Failed to set default locale
tar: Failed to set default locale
These packages have more recent versions available.
Which would you like to update?

1: XML (3.98-1.17 -> 3.98-1.18) [CRAN]

Enter one or more numbers separated by spaces, or an empty line to cancel
1: 1
XML (3.98-1.17 -> 3.98-1.18) [CRAN]
Installing 1 packages: XML

There is a binary version available but the source version is later:
binary source needs_compilation
XML 3.98-1.17 3.98-1.18 TRUE

Do you want to install from sources the package which needs compilation? (Yes/no/cancel) Yes
installing the source package 'XML'

trying URL 'https://cran.rstudio.com/src/contrib/XML_3.98-1.18.tar.gz'
Content type 'application/x-gzip' length 1601173 bytes (1.5 MB)

downloaded 1.5 MB

Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) :
(converted from warning) installation of package 'XML' had non-zero exit status

And the package is not installed.

Any suggestion? Thanks

María
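
The installation stops because a locale warning is converted into an error. A possible workaround, offered as an unverified sketch: set a valid locale for the child R processes and ask remotes not to turn warnings into errors. It assumes the en_US.UTF-8 locale exists on the machine and that the installed remotes version honours the `R_REMOTES_NO_ERRORS_FROM_WARNINGS` environment variable.

```r
# Assumption: en_US.UTF-8 is available on this system
Sys.setenv(LC_ALL = "en_US.UTF-8")

# Ask remotes not to convert installation warnings into errors
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

# Then retry the installation from the report above, e.g.:
# remotes::install_github("VascoElbrecht/PrimerMiner", subdir = "PrimerMiner")
```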
