vascoelbrecht / primerminer
R-based batch sequence downloader, with primer development and in silico evaluation capabilities
allows for numbering of folders!
0.3 instead of 1/3, might cause issues? have to check this
implement toupper or different fasta read in method.
Thank you Evgeny Shcherbakov for finding this issue.
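A minimal sketch of the `toupper` idea, assuming the FASTA file has already been read as plain lines; the example records are made up and this is not PrimerMiner's actual read-in code. Only sequence lines are converted, so header lines keep their original case:

```r
# Made-up FASTA lines with mixed-case sequences (illustration only).
lines <- c(">seq1 Bathyraja", "acgtnACGT", ">seq2", "ttga")

# Header lines start with ">"; everything else is treated as sequence.
is_header <- startsWith(lines, ">")

# Convert only the sequence lines to upper case.
lines[!is_header] <- toupper(lines[!is_header])

print(lines)
```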
Don't clip all data, just the flanking regions where primers would bind
Add support for different markers, currently only COI is retrieved!
http://v4.boldsystems.org/index.php/resources/api?type=webservices#sequenceParameters
this would allow the consensus writing to skip the toupper step
Currently, the mitogenome sequences are aligned and the resulting degenerate consensus is used as a backbone for mapping all reads against. It would be better to cluster the mito sequences first to prevent overrepresentation of certain taxa (this will rarely be the case and does not matter much, but it would not hurt to implement this at some point).
changes on the NCBI side of things
Should not be a problem in typical usage. Low priority bug.
caused by duplicates in the "log.txt" file, for example if you cluster sequences a second time.
Change the way messages to the user are displayed / printed.
"When no COI sequences are available for a given order (and I assume, family), the error "Error in readLines(file) : 'con' is not a connection" is given and the download terminates. Manual removal of the taxon in question is then required, although this isn't obvious to a less-seasoned R user. A potential consideration would be somehow bypassing this error and instead noting that 0 sequences were available for the taxon. The download could then continue without need for regular surveillance. "
Thanks to Jordan Cuff for submitting this
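One way the suggestion above could be approached is to wrap each per-taxon download in `tryCatch`, so that a failure is recorded as 0 sequences and the loop continues. This is only a sketch: `fetch_sequences` and `safe_fetch` are hypothetical stand-ins, not PrimerMiner functions, and the sequences are made up.

```r
# Hypothetical stand-in for a per-taxon download step. It fails for taxa
# without sequences, similar to the readLines() error quoted above.
fetch_sequences <- function(taxon) {
  available <- list(Gyrinidae = c("ACGT", "ACGA"))
  if (is.null(available[[taxon]])) stop("'con' is not a connection")
  available[[taxon]]
}

# Wrap the download so an error is reported and treated as 0 sequences
# instead of terminating the whole batch.
safe_fetch <- function(taxon) {
  tryCatch(
    fetch_sequences(taxon),
    error = function(e) {
      message("No sequences for ", taxon, " (", conditionMessage(e), ")")
      character(0)
    }
  )
}

taxa <- c("Gyrinidae", "Georissidae")
counts <- vapply(taxa, function(tx) length(safe_fetch(tx)), integer(1))
print(counts)
```

With this pattern the taxon without data shows up with a count of 0 in the summary rather than stopping the download.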
set this for mitochondria individually etc.
Also give a maximum number of OTUs to produce.
Is there any option to obtain the species in each cluster?
I know this is related with vsearch, they mention in the documentation an example including the taxonomy and the species name, following the "s" identifier: ">X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli"
Maybe I'm not reading the output correctly. For example one of my groups Bathyraja, has 8 species
Bathyraja,
,"Bathyraja brachyurops"
,"Bathyraja cousseauae"
,"Bathyraja scaphiops"
,"Bathyraja griseocauda"
,"Bathyraja magellanica"
,"Bathyraja albomaculata"
,"Bathyraja macloviana"
,"Bathyraja multispinis"
The results report that they were grouped into 5 clusters, but I couldn't find which species were grouped in each cluster.
cluster_file:
Reading file Bathyraja/Vsearch/Bathyraja_all_drep+1.fasta 100%
35182 nt in 45 seqs, min 591, max 1757, avg 782
Masking 100%
Sorting by length 100%
Counting unique k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 5 Size min 1, max 37, avg 9.0
Singletons: 2, 4.4% of seqs, 40.0% of clusters
Multiple alignments 100%
vsearch v1.10.2_osx_x86_64, 8.0GB RAM, 4 cores
https://github.com/torognes/vsearch
The log.txt output is:
2020-08-18 00:13:05 - Downloading BOLD sequence data
#Bold_data: Folder Bathyraja/Bathyraja
Taxon Sequences downl_time
Bathyraja brachyurops 6 0.31 secs
Bathyraja cousseauae 3 0.31 secs
Bathyraja scaphiops 8 0.36 secs
Bathyraja griseocauda 12 0.34 secs
Bathyraja magellanica 4 0.3 secs
Bathyraja albomaculata 8 0.86 secs
Bathyraja macloviana 4 0.95 secs
Bathyraja multispinis 10 0.32 secs
#Bold_data_end
2020-08-18 00:13:09 - Downloading GenBank sequence data
Search query: REPLACE_WITH_TAXA[Organism] AND (COi OR CO1 OR COXi OR COX1) AND 1:2000[Sequence Length]
#GB_data: Folder Bathyraja/Bathyraja
Taxon Sequences downl_time
Bathyraja brachyurops 5 5.8 secs
Bathyraja cousseauae 3 5.3 secs
Bathyraja scaphiops 7 5.4 secs
Bathyraja griseocauda 10 5.4 secs
Bathyraja magellanica 4 5.1 secs
Bathyraja albomaculata 6 5.5 secs
Bathyraja macloviana 4 5.3 secs
Bathyraja multispinis 8 5.2 secs
#GB_data_end
2020-08-18 00:13:52 - Downloading Miochondrial Genomes from GenBank
Search query: REPLACE_WITH_TAXA[Organism] AND mitochondrion[filter] AND genome AND 2001:80000[Sequence Length]
#mito_data: Folder Bathyraja/Bathyraja
Taxon Sequences downl_time
Bathyraja brachyurops 0 0.83 secs
Bathyraja cousseauae 0 0.92 secs
Bathyraja scaphiops 0 0.84 secs
Bathyraja griseocauda 2 4.8 secs
Bathyraja magellanica 0 1.4 secs
Bathyraja albomaculata 2 4.5 secs
Bathyraja macloviana 0 1.4 secs
Bathyraja multispinis 2 4.7 secs
#mito_data_end
2020-08-18 00:14:11 - Converting Mito Genbank to fasta
#mito_gb2fasta
GBfile noMito unique
Bathyraja/Bathyraja/Bathyraja albomaculata_mito.gb 2 1
Bathyraja/Bathyraja/Bathyraja griseocauda_mito.gb 2 1
Bathyraja/Bathyraja/Bathyraja multispinis_mito.gb 2 1
#mito_gb2fasta_end
2020-08-18 00:14:11 - Merging fasta files
Reading in files matching BOLD\.fasta$:
Folders: Bathyraja/Bathyraja
Files:
Clipping: Left 0 bp, Right 0 bp
Matching files which were written into Bathyraja/Bathyraja_Bold.fasta :
Bathyraja/Bathyraja/Bathyraja albomaculata_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja brachyurops_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja cousseauae_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja griseocauda_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja macloviana_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja magellanica_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja multispinis_BOLD.fasta, Bathyraja/Bathyraja/Bathyraja scaphiops_BOLD.fasta
2020-08-18 00:14:11 - Merging fasta files
Reading in files matching GB\.fasta$:
Folders: Bathyraja/Bathyraja
Files:
Clipping: Left 0 bp, Right 0 bp
Matching files which were written into Bathyraja/Bathyraja_GB.fasta :
Bathyraja/Bathyraja/Bathyraja albomaculata_GB.fasta, Bathyraja/Bathyraja/Bathyraja brachyurops_GB.fasta, Bathyraja/Bathyraja/Bathyraja cousseauae_GB.fasta, Bathyraja/Bathyraja/Bathyraja griseocauda_GB.fasta, Bathyraja/Bathyraja/Bathyraja macloviana_GB.fasta, Bathyraja/Bathyraja/Bathyraja magellanica_GB.fasta, Bathyraja/Bathyraja/Bathyraja multispinis_GB.fasta, Bathyraja/Bathyraja/Bathyraja scaphiops_GB.fasta
2020-08-18 00:14:11 - Merging fasta files
Reading in files matching [mito]\.fasta$:
Folders: Bathyraja/Bathyraja
Files:
Clipping: Left 0 bp, Right 0 bp
Matching files which were written into Bathyraja/Bathyraja_mito.fasta :
Bathyraja/Bathyraja/Bathyraja albomaculata_mito.fasta, Bathyraja/Bathyraja/Bathyraja griseocauda_mito.fasta, Bathyraja/Bathyraja/Bathyraja multispinis_mito.fasta
2020-08-18 00:14:11 - Merging fasta files
Reading in files matching \.fasta$:
Folders:
Files: Bathyraja/Bathyraja_Bold.fasta, Bathyraja/Bathyraja_GB.fasta, Bathyraja/Bathyraja_mito.fasta
Clipping: Left 0 bp, Right 0 bp
Matching files which were written into Bathyraja/Bathyraja_all.fasta :
Bathyraja/Bathyraja_Bold.fasta, Bathyraja/Bathyraja_GB.fasta, Bathyraja/Bathyraja_mito.fasta
2020-08-18 00:14:11 - Clustering reads with Vsearch
vsearch v1.10.2_osx_x86_64, 8.0GB RAM, 4 cores
Used fasta file: Bathyraja_all.fasta
Number of imput sequences: 105
Dereplicated: 45
Cluster: 5
VSEARCH comands:
/usr/local/lib/R/site-library/PrimerMiner/vsearch-1.10.2_osx_x86_64 -derep_fulllength Bathyraja/Bathyraja_all.fasta -output Bathyraja/Vsearch/Bathyraja_all_drep.fasta >Bathyraja/Vsearch/temp.txt
/usr/local/lib/R/site-library/PrimerMiner/vsearch-1.10.2_osx_x86_64 -derep_fulllength Bathyraja/Vsearch/Bathyraja_all_drep.fasta -output Bathyraja/Vsearch/Bathyraja_all_drep+1.fasta -sizeout
/usr/local/lib/R/site-library/PrimerMiner/vsearch-1.10.2_osx_x86_64 -cluster_fast Bathyraja/Vsearch/Bathyraja_all_drep+1.fasta -strand both -id 0.97 -msaout Bathyraja/Vsearch/cluster_file >Bathyraja/Vsearch/temp.txt
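Regarding the question above about which sequences end up in each cluster: vsearch documents a `--uc` option for its clustering commands, which writes a tab-separated table with one row per sequence and its cluster number. The sketch below assumes that option was added to the `cluster_fast` call and that the sequence labels contain the species name. The labels and identity values are made up for illustration, and PrimerMiner itself may not expose this output.

```r
# Example UC lines (made-up labels). Real output would come from a file
# written by adding the uc option to the cluster_fast call.
uc_text <- c(
  "S\t0\t650\t*\t*\t*\t*\t*\tseq1_Bathyraja_griseocauda\t*",
  "H\t0\t650\t99.1\t+\t0\t0\t650M\tseq2_Bathyraja_scaphiops\tseq1_Bathyraja_griseocauda",
  "S\t1\t640\t*\t*\t*\t*\t*\tseq3_Bathyraja_macloviana\t*",
  "C\t0\t2\t*\t*\t*\t*\t*\tseq1_Bathyraja_griseocauda\t*",
  "C\t1\t1\t*\t*\t*\t*\t*\tseq3_Bathyraja_macloviana\t*"
)

# Column 1 is the record type, column 2 the cluster number,
# column 9 the sequence label.
uc <- read.delim(text = uc_text, header = FALSE, stringsAsFactors = FALSE)

# Keep centroid (S) and hit (H) rows; C rows are per-cluster summaries.
members <- uc[uc$V1 %in% c("S", "H"), ]

# List the sequence labels belonging to each cluster.
clusters <- split(members$V9, members$V2)
print(clusters)
```

Because the input was dereplicated before clustering, identical sequences from different species would likely appear under a single label, so the species list per cluster may be incomplete.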
Hi Vasco,
I'm keen to give PrimerMiner a try but have run into an issue. When installing, I get the following warnings and error:
> install.packages("~/Downloads/PrimerMiner-0.12.tar.gz", repos = NULL, type = "source")
Warning in untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
ERROR: cannot extract package from ‘/Users/sebastian/Downloads/PrimerMiner-0.12.tar.gz’
Warning in install.packages :
installation of package ‘/Users/sebastian/Downloads/PrimerMiner-0.12.tar.gz’ had non-zero exit status
Any idea what I might be doing wrong?
Many thanks,
Sebastian
Maybe I got it wrong but does PrimerMiner design primers as well? I couldn't find any tutorial in the wiki...
Very traffic intensive, not sure if this will be implemented, should rather be done on the BOLD end. We will see...
Error in rawToChar(res$content) :
long vectors not supported yet: raw.c:68
I was able to run it successfully the first time I downloaded it, but when I downloaded the same content a second time, I kept encountering this issue. How can I resolve it?
Stop execution with an error if there are mismatches
Reason: The config file variables might change over time (spelling, additional features, features becoming obsolete). Backwards compatibility can't be guaranteed as PrimerMiner is in very active development, and a new config file is quickly made!
"that for many of the mitogenomes the 12S or 16S rRNA gene is not annotated with a gene ID in genbank. Instead the 12S and 16S regions are often annotated as product=”12S ribosomal RNA” and product=”16S ribosomal RNA”. I have tried specifying these product annotations in the config file but primerminer does not retrieve these sequences." - suggested by Jonas Bylemans
Error when only one sequence is in the fasta file...
workaround: Copy sequence to have 2 ; )
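The workaround can be sketched as follows, assuming a plain-text FASTA file; the file here is a temporary one with a made-up record. Whether duplicated sequence names are acceptable downstream is an open assumption, so the copy may need renaming in practice.

```r
# Write a one-sequence FASTA to a temporary file for demonstration.
fasta <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT"), fasta)

lines <- readLines(fasta)
n_seqs <- sum(startsWith(lines, ">"))

if (n_seqs == 1) {
  # Append a copy of the only record so the file holds two sequences.
  writeLines(c(lines, lines), fasta)
}

result <- readLines(fasta)
print(result)
```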
With the new Vsearch version, faster generation of OTU consensus sequences might be possible. This will be tested.
Hi Vasco
We follow the package_tutorial.R and when we run the following line:
install_github("VascoElbrecht/JAMP", subdir="JAMP")
We obtain the following error:
Downloading GitHub repo VascoElbrecht/JAMP@master
tar: Failed to set default locale
tar: Failed to set default locale
Skipping 1 packages not available: PrimerMiner
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
v checking for file '/private/var/folders/l0/636zb7xd0fg015xw9v4mgt380000gn/T/Rtmpymi7u5/remotes21c51e830429/VascoElbrecht-JAMP-54b0c90/JAMP/DESCRIPTION' ...
Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) :
(converted from warning) installation of package '/var/folders/l0/636zb7xd0fg015xw9v4mgt380000gn/T//Rtmpymi7u5/file21c554371e63/JAMP_0.67.tar.gz' had non-zero exit status
And when we install
install_github("VascoElbrecht/PrimerMiner", subdir="PrimerMiner")
We obtain:
Downloading GitHub repo VascoElbrecht/PrimerMiner@master
tar: Failed to set default locale
tar: Failed to set default locale
These packages have more recent versions available.
Which would you like to update?
1: XML (3.98-1.17 -> 3.98-1.18) [CRAN]
Enter one or more numbers separated by spaces, or an empty line to cancel
1: 1
XML (3.98-1.17 -> 3.98-1.18) [CRAN]
Installing 1 packages: XML
There is a binary version available but the source version is later:
binary source needs_compilation
XML 3.98-1.17 3.98-1.18 TRUE
Do you want to install from sources the package which needs compilation? (Yes/no/cancel) Yes
installing the source package 'XML'
downloaded 1.5 MB
Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) :
(converted from warning) installation of package 'XML' had non-zero exit status
And the package is not installed.
Any suggestion? Thanks
María
Hello,
I'm trying to install PrimerMiner, but I get the following error:
Warning in install.packages :
package ‘PrimerMiner-0.21.tar.gz’ is not available for this version of R
What's the problem?
I hope everything's going well!
I have been using PrimerMiner to develop some specific and metabarcoding primers, so I've been using some Geneious fasta files that I already have (but complying with your recommendations in the YT tutorial) instead of following the batch download steps.
The problem arises when I try to plot more than 4 individual fasta files with plot_alignments. Using 5 or more files results in an incomplete plot (without the consensus and nucleotide letter part), with the console returning this error:
> plot_alignments(alignments, Order_names=gsub(".*./._(.*)_.*", "\\1", alignments))
Error in list[[k]][start:end, 2:5] : subscript out of bounds
I have tried changing the height and width to see if that would solve it, without any improvement. Any recommendations?
If you have your own unpublished sequence data, or data from other repositories not included in PrimerMiner, there should be an option to add your own sequences manually (in fasta format) before clustering.
Temporary workaround:
add
delete_me,
,Georissidae
,Gyrinidae
to the end of your taxa file
Hi Dears
I got the following message when trying to install PrimerMiner with the code
install.packages("PrimerMiner", repos = NULL, type="source", dependencies=T):
Warning: invalid package 'PrimerMiner'
Error: ERROR: no packages specified
Warning in install.packages :
installation of package ‘PrimerMiner’ had non-zero exit status
Any clue to solve this issue?
delete incomplete folders and start over
implement the consensus generation scripts from plot_alignments -> will lead to significant speed up!
(I didn't look at the code for ages, so it might take some time to get this running again)
Temporary fix: load the same fasta file 2 times. Then it does get plotted.
Hello,
I am finding two errors when batch downloading using PrimerMiner...
I have set the marker in the config file to:
Marker = c("16s", "16S", "16S ribosomal RNA", "16s Ribosomal RNA") # specify target gene
and I have turned off the downloads from BOLD as specified in the Wiki.
But then, when running the batch download script I find this problem (For Coleoptera families there is no problem at all, only when going to other orders and families of insects):
Error in download.file(paste("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=", :
'curl' call had nonzero exit status
curl: (16) Error in the HTTP2 framing layer
Also, I am noticing that even when I check the Coleoptera folder, there is no Groups_mito.fasta file to work with in the alignment process in Geneious.
Like in JAMP, no changes to the tables
based on table 7 in https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecs2.3193
The dependencies are available on CRAN, so they could have been listed in the DESCRIPTION file (Imports) and imported in the NAMESPACE file so that they are installed automatically when the package is installed.