
phylostratr's Introduction


phylostratr

Predict and explore the age of genes using phylostratigraphic methods.

Phylostratr Workflow

Installation

You can install from GitHub with:

library(devtools)
install_github("arendsee/phylostratr")

Dependencies

  • R packages from CRAN (see DESCRIPTION)
  • Biostrings from Bioconductor, install with devtools::install_bioc("Biostrings")
  • NCBI BLAST+ - blastp (the protein BLAST command) must be in PATH. You can tell if blastp is properly installed by calling blastp -help from the command line.
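The PATH check described above can also be run from inside R, which is useful because an R session (for example one started from a GUI) can see a different PATH than your terminal. A minimal sketch using base R's `Sys.which`; the helper name `tool_in_path` is invented for illustration and is not part of phylostratr:

```r
# Hypothetical helper (not part of phylostratr): TRUE if an executable
# with the given name is found on the PATH seen by this R session.
tool_in_path <- function(name) {
  unname(nzchar(Sys.which(name)))
}

# BLAST+ programs mentioned in this README and in the issues on this page
tools <- c("blastp", "makeblastdb", "blastdbcmd")
found <- vapply(tools, tool_in_path, logical(1))
print(found)
```

If an entry is FALSE here but the program runs from your shell, the two environments probably have different PATH settings.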

Citation

Zebulun Arendsee, Jing Li, Urminder Singh, Arun Seetharam, Karin Dorman,
Eve Syrkin Wurtele (2019) "phylostratr: A framework for phylostratigraphy." Bioinformatics. https://doi.org/10.1093/bioinformatics/btz171

Funding

This work is funded by the National Science Foundation grant:

NSF-IOS 1546858 Orphan Genes: An Untapped Genetic Reservoir of Novel Traits

phylostratr's People

Contributors

arendsee, lijing28101, urmi-21


phylostratr's Issues

No alias or index file found

Hello,

I am getting a lot of these messages when I run phylostratr:

BLAST Database error: No alias or index file found for protein database [blastdb/4932.faa] in search path [/mypath:/data/bio/ncbi/refseq200v5:]
24932: blasting ...
BLAST Database error: No alias or index file found for protein database [blastdb/9606.faa] in search path [/mypath:/data/bio/ncbi/refseq200v5:]
29606: blasting ...

How can I solve it?

Thanks

Error in (function (fastafile, blastdb = "blastdb", verbose = FALSE) : Failed to make blast database blastdb/Araport11_genes.201606.pep.fasta In addition: Warning message: In system2("makeblastdb", stderr = TRUE, stdout = TRUE, args = c("-dbtype", : running command '"makeblastdb" -dbtype prot -in Araport11_genes.201606.pep.fasta -out blastdb/Araport11_genes.201606.pep.fasta' had status 1

Error in (function (fastafile, blastdb = "blastdb", verbose = FALSE) :
Failed to make blast database blastdb/Araport11_genes.201606.pep.fasta
In addition: Warning message:
In system2("makeblastdb", stderr = TRUE, stdout = TRUE, args = c("-dbtype", :
running command '"makeblastdb" -dbtype prot -in Araport11_genes.201606.pep.fasta -out blastdb/Araport11_genes.201606.pep.fasta' had status 1

Hi everyone, when running the code for recurrence, something goes wrong with makeblastdb. I have installed BLAST+ locally and set up the environment path, and I have searched extensively for a solution without success. Has anyone else met this problem, or does anyone know how to tackle it? Thank you so much!

strata_blast issue, similar to #15

Greetings,

My problem is similar to the previously posted issue #15: when I try to run strata_blast(strata), I get the same error:
Error in system2("blastdbcmd", stdout = TRUE, stderr = TRUE, args = c("-info", :
error in running command

I have blast+ installed on my system and can run blastdbcmd from command line, and I am on Mac as well. I have the latest version of the package installed, but for me it doesn't output the error message that you described:
BLAST Database error: No alias or index file found for protein database [xxxxxx] in search path [/path/to/working/directory::]

Additionally, when I assign a variable with:
var1 <- strata@data$faa
I get, for example, 134 faxid's, and there are also 134 files in the uniprot-seqs directory. I'm not sure if this is related to the error. I just checked it because #15 seemed to have an issue with that.

Apologies for any dumb questions as I am quite new to R.

Thank you!
Best regards

Error when importing external blast results

Hello,

I have been using this great package for a while now, but recently something changed. I could not work out what, so I am writing here for help.

I retrieved all sequences for blasting using phylostratr's algorithm and performed the BLAST searches manually. I then formatted the results for import (I have attached a sample of such output, 9925.tab.txt, with a .txt suffix because of GitHub). When I tried to import these results, I got the following error message:

Error in strata_from_taxids(focal_species, taxids, ...) : 
  object 'blastfiles' not found
In addition: Warning message:
In strata_from_taxids(focal_species, taxids, ...) :
  The following subspecies taxa where removed: 100901019510228132806618197451884772323232526712698142792328230128390928788929159300693079723450634508345133762140068242192453514651446704467314683552066610562106239625363596692026956697370707091721772207227723072347240724472457260757476347635765476567719774079558364860376903192179954495979606982399139925

I tried to look into the files, but could not find anything out of order. Do you have any suggestions for a workaround?

Thank you in advance!

Here is the session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hrbrthemes_0.8.0  wesanderson_0.3.6 reshape2_1.4.4    knitr_1.36        taxizedb_0.3.0    forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7      
 [9] purrr_0.3.4       readr_2.1.0       tidyr_1.1.4       tibble_3.1.6      ggplot2_3.3.5     tidyverse_1.3.1   magrittr_2.0.1    phylostratr_0.2.1

loaded via a namespace (and not attached):
 [1] httr_1.4.2          bit64_4.0.5         jsonlite_1.7.2      modelr_0.1.8        assertthat_0.2.1    BiocManager_1.30.16 rvcheck_0.2.1      
 [8] yulab.utils_0.0.4   blob_1.2.2          cellranger_1.1.0    yaml_2.2.1          gdtools_0.2.3       Rttf2pt1_1.3.9      pillar_1.6.4       
[15] RSQLite_2.2.8       backports_1.3.0     lattice_0.20-45     glue_1.5.0          extrafontdb_1.0     digest_0.6.28       rvest_1.0.2        
[22] colorspace_2.0-2    ggfun_0.0.4         htmltools_0.5.2     plyr_1.8.6          pkgconfig_2.0.3     broom_0.7.10        haven_2.4.3        
[29] patchwork_1.1.1     scales_1.1.1        ggplotify_0.1.0     tzdb_0.2.0          generics_0.1.1      ellipsis_0.3.2      cachem_1.0.6       
[36] withr_2.4.2         cli_3.1.0           crayon_1.4.2        readxl_1.3.1        evaluate_0.14       memoise_2.0.0       fs_1.5.0           
[43] fansi_0.5.0         nlme_3.1-153        xml2_1.3.2          tools_4.0.3         hms_1.1.1           lifecycle_1.0.1     aplot_0.1.1        
[50] munsell_0.5.0       reprex_2.0.1        compiler_4.0.3      systemfonts_1.0.3   gridGraphics_0.5-1  rlang_0.4.12        grid_4.0.3         
[57] rstudioapi_0.13     rappdirs_0.3.3      rmarkdown_2.11      gtable_0.3.0        DBI_1.1.1           curl_4.3.2          R6_2.5.1           
[64] lubridate_1.8.0     extrafont_0.17      fastmap_1.1.0       bit_4.0.4           utf8_1.2.2          hoardr_0.5.2        ape_5.5            
[71] stringi_1.7.5       parallel_4.0.3      Rcpp_1.0.7          vctrs_0.3.8         dbplyr_2.1.1        tidyselect_1.1.1    xfun_0.28

Heatmap layout

Hello again!

I am using phylostratr to recover data about the conservation level (and age) of analogous enzymes in the human genome, and the software is fantastic for this purpose. In our work, the results produced with BLASTp, against a customized dataset (78 proteomes - https://www.ebi.ac.uk/reference_proteomes), were processed (considering evalue, Score, Identity, and coverage), and some missing data were observed. I don't have problems with these missing data, but it would be helpful if the color of the border in the heatmap was drawn in black (the matrix will represent the evalue, the presence, and absence of orthologs for each gene). Also, species with long names showed overlap with the heatmap.

What can I do to fix this?

Thank you very much!

Invalid Strata object

tid<-'47426'
strata=uniprot_strata(tid, from = 2) %>% strata_apply(f= diverse_subtree,n=5,weights=uniprot_weight_by_ref()) %>% use_recommended_prokaryotes() %>% add_taxa(c('4932', '9606')) %>% uniprot_fill_strata()

#it shows the following error
Error in is_valid_strata(strata) :
Invalid Strata object, the focal species '47426' is not found in the tree

May I know what the issue might be? I checked the organism 'A.cepistipes': it exists in the NCBI taxonomy, but the program could not identify it. Please help me solve the issue.

Any help would be appreciated! Thanks

Cannot resolve tree ids in `merge_besthits`

Hi,

Thank you for the package and the framework, it is really great!

I ran into a small issue while running merge_besthits() on some new data:

Error in clean_phyid(x, id, type = type) : 
  Cannot automatically resolve tree id.It could be either a phylo object index or a taxon name.Please specify the type.
Calls: merge_besthits ... clean_phyid -> parent -> parent.phylo -> clean_phyid
Execution halted

As I am working with unpublished data, I added several IDs and FASTA files manually. This caused no problem for the BLAST step or for plotting the tree. I do not know whether it is connected, but I mention it in case. For every other step, I followed your tutorials.
Also, I installed the newest versions of the required packages (treeio, ggtree).

Thank you for the answers in advance!

Issues in compiling results

At the very end, after I run BLAST and everything works seamlessly, I run into an error at this line:
results <- merge_besthits(strata)

Error in `$<-.data.frame`(`*tmp*`, mrca, value = "58024") :
replacement has 1 row, data has 0
What do you think it means? Am I missing something?

Duplicated factor levels while calling stratify()

Hello. When calling the stratify function I get an error message of
Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [20] is duplicated

The code used was
focal_taxid <- '7230'
str_dmoj <- strata_from_blast_dir(focal_taxid, blastdir = "/Users/ferenckagan/Documents/Bioinformatic_analysis/phylostratr_proteomes/Comparative_phylostrat/7230", ext = "tab") %>% strata_besthits()
results_dmoj <- strata_convert(str_dmoj, target = 'all', to = 'name') %>% merge_besthits()
str_dmoj <- stratify(results_dmoj, strata_names = get_mrca_names(results_dmoj))

I tried to check the results table for any anomalies but could not find anything in particular at the ps 20 compared to the rest of phylostrata. In parallel I have read in other blast outputs from other focal species without any problems so the issue is in this particular blast result. Do you have any ideas what could cause the problem?

Thank you very much in advance for the help.

Cheers,
Ferenc
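One possible cause, offered only as a hypothesis, is that the vector passed as strata_names contains the same name twice: base R refuses to build a factor with repeated levels and reports the position of the repeat. The sketch below uses an invented toy vector (not the real output of get_mrca_names) to show how to find such repeats and how the message arises:

```r
# Toy stand-in for the vector passed as `strata_names`; values are invented.
strata_names <- c("cellular organisms", "Eukaryota", "Metazoa", "Eukaryota")

# Positions of names that repeat an earlier entry
dup_idx <- which(duplicated(strata_names))
print(strata_names[dup_idx])

# Building a factor with repeated levels fails in base R; capture the message
msg <- tryCatch(
  {
    factor("Metazoa", levels = strata_names)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running `which(duplicated(...))` on the actual names would show whether position 20 repeats an earlier clade name.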

swap `BiocInstaller` with `BiocManager`

Looks like package BiocInstaller is not available (for R version 3.5.0) and it is superseded by BiocManager. Can this be changed so that the installer finds BiocManager and does not complain?

> install_github("arendsee/phylostratr")
Downloading GitHub repo arendsee/phylostratr@master
from URL https://api.github.com/repos/arendsee/phylostratr/zipball/master
Installing phylostratr
‘BiocInstaller’ must be installed for this functionality.
Would you like to install it?

1: Yes
2: No

Selection: 1
Installing package into ‘/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/r-rsqlite-2.0-zew7f5pkhdh6gk4qdv6hown4uf4zszsb/rlib/R/library’
(as ‘lib’ is unspecified)
Warning in install.packages(pkg) :
  'lib = "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/r-rsqlite-2.0-zew7f5pkhdh6gk4qdv6hown4uf4zszsb/rlib/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Error in loadNamespace(name) : there is no package called ‘BiocInstaller’
In addition: Warning message:
package ‘BiocInstaller’ is not available (for R version 3.5.0)

Thanks,
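Since the report above notes that BiocManager supersedes BiocInstaller, one possible workaround is to install the Bioconductor dependency through BiocManager before installing phylostratr. A minimal sketch, assuming Biostrings (the Bioconductor dependency named in the README) is what triggers the prompt; the wrapper name `install_biostrings` is invented here, and whether this avoids the prompt depends on the installed devtools version:

```r
# Sketch only: install Biostrings through BiocManager when it is missing.
# `install_biostrings` is a hypothetical wrapper, not a phylostratr function.
install_biostrings <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  if (!requireNamespace("Biostrings", quietly = TRUE)) {
    BiocManager::install("Biostrings")
  }
  # TRUE if Biostrings can now be loaded
  invisible(requireNamespace("Biostrings", quietly = TRUE))
}

# Run install_biostrings() once, then retry
# devtools::install_github("arendsee/phylostratr")
```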

qseqid problem

Hi @arendsee,

I have managed to run the same analysis as the tutorial for my data, but I am wondering why the qseqid for all ps is ensembl_peptide_ID, except for ps 2, which has different ID names that I don't know how to convert in biomart.

(Two screenshots attached, dated 2020-07-13.)

Could you please help me on this? Thank you very much!
Best,
CW

Let phylostratr read blast outputs directly!

Hi, It would be great if phylostratr can directly read blast output files. Currently, it still needs to run strata_blast() which is not necessary once we have the blast outputs already. This will also be helpful when one wants to run phylostratr multiple times using different thresholds.
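Another issue on this page calls strata_from_blast_dir, which appears to cover this request. Independently of phylostratr, tabular BLAST output can be read and re-filtered with base R, so thresholds can be changed without re-running the search. The sketch below assumes the column layout from the -outfmt string quoted elsewhere on this page ("6 qseqid sseqid qstart qend sstart send evalue score"); the function names and the example rows are invented:

```r
# Column names taken from the -outfmt string quoted on this page
blast_cols <- c("qseqid", "sseqid", "qstart", "qend",
                "sstart", "send", "evalue", "score")

# Hypothetical reader for one headerless, tab-separated BLAST result file
read_blast_tab <- function(path) {
  read.table(path, sep = "\t", header = FALSE, col.names = blast_cols,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Keep the highest-scoring hit for each query sequence
best_hit_per_query <- function(hits) {
  ordered <- hits[order(hits$qseqid, -hits$score), ]
  ordered[!duplicated(ordered$qseqid), ]
}

# Invented example: two hits for q1, one for q2
tmp <- tempfile(fileext = ".tab")
writeLines(c("q1\ts1\t1\t50\t1\t50\t1e-20\t200",
             "q1\ts2\t1\t40\t3\t42\t1e-05\t80",
             "q2\ts3\t5\t60\t2\t57\t1e-10\t120"), tmp)
best <- best_hit_per_query(read_blast_tab(tmp))
print(best)

# Re-filtering with a different e-value cutoff needs no new BLAST run
strict <- best[best$evalue < 1e-15, ]
```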

Possible bug? number of genes per strata is way higher than the total number of genes

I recently re-ran phylostratr on Apis florea and got really weird results. The number of genes reported per stratum was far higher than in the previous run, and the sum does not equal the total number of genes. I think it is due to a recently introduced bug, because the previous run gave much more reasonable numbers. The only difference between the previous run and this one is that I added 16 extra species for the latest run.

Here is more information:

R script for previous run

library(devtools)
source("https://bioconductor.org/biocLite.R")
library(phylostratr)
library(magrittr)
library(plotly)
focal_taxid <- '7463'
strata <-
  uniprot_strata(focal_taxid, from=2) %>%
  add_taxa('7463') %>%
  strata_apply(f=diverse_subtree, n=5, weights=uniprot_weight_by_ref()) %>%
  use_recommended_prokaryotes %>%
  uniprot_fill_strata
results <- strata_blast(strata, blast_args=list(nthreads=8)) %>%
  strata_besthits %>%
  merge_besthits
phylostrata <- stratify(results)
write.csv(phylostrata, "phylostrata_table.csv")
tabled <- table(stratify(results)$mrca_name)
write.csv(tabled, "phylostrata_stats.csv")

R script for the present run:

#!/usr/bin/Rscript
library(devtools)
source("https://bioconductor.org/biocLite.R")
library(phylostratr)
library(magrittr)
library(plotly)
focal_taxid <- '7463'
strata <-
  uniprot_strata(focal_taxid, from=2) %>%
  add_taxa('7463') %>%
  strata_apply(f=diverse_subtree, n=5, weights=uniprot_weight_by_ref()) %>%
  add_taxa(c('30195','88501','78185','78189','44477','132113','143995','156304','166423','175324','175328','178035','481568','516756','597456')) %>%
  use_recommended_prokaryotes %>%
  uniprot_fill_strata
results <- strata_blast(strata, blast_args=list(nthreads=8)) %>%
  strata_besthits %>%
  merge_besthits
phylostrata <- stratify(results)
write.csv(phylostrata, "phylostrata_table.csv")
tabled <- table(stratify(results)$mrca_name)
write.csv(tabled, "phylostrata_stats.csv")
plot_heatmaps(results, "heatmaps3.pdf")

Previous results:

"","Var1","Freq"
"1","cellular organisms",6853
"2","Eukaryota",5120
"3","Opisthokonta",415
"4","Metazoa",1940
"5","Eumetazoa",333
"6","Bilateria",456
"7","Protostomia",277
"8","Ecdysozoa",62
"9","Panarthropoda",48
"10","Arthropoda",216
"11","Mandibulata",72
"12","Pancrustacea",131
"13","Hexapoda",112
"14","Neoptera",639
"15","Holometabola",62
"16","Apocrita",305
"17","Aculeata",188
"18","Apoidea",75
"19","Apidae",70
"20","Apis",97
"21","Apis florea",205

Present results:

"","Var1","Freq"
"1","cellular organisms",284161
"2","Eukaryota",103882
"3","Opisthokonta",44785
"4","Metazoa",63526
"5","Eumetazoa",63633
"6","Bilateria",130111
"7","Protostomia",153915
"8","Ecdysozoa",173269
"9","Panarthropoda",37748
"10","Arthropoda",95879
"11","Mandibulata",14997
"12","Pancrustacea",57473
"13","Hexapoda",48852
"14","Neoptera",106423
"15","Holometabola",116966
"16","Apocrita",33352
"17","Aculeata",72995
"18","Apoidea",38565
"19","Apidae",158736
"20","Apinae",8673
"21","Apis",15274
"22","Apis florea",15810

Clearly, this isn't right. Could you please take a look at it?

Thanks!
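The size of the discrepancy can be checked by summing the Freq columns of the two tables above. The numbers below are copied from those tables; only the variable names are invented:

```r
# Freq column of the "Previous results" table
previous <- c(6853, 5120, 415, 1940, 333, 456, 277, 62, 48, 216, 72,
              131, 112, 639, 62, 305, 188, 75, 70, 97, 205)

# Freq column of the "Present results" table
present <- c(284161, 103882, 44785, 63526, 63633, 130111, 153915, 173269,
             37748, 95879, 14997, 57473, 48852, 106423, 116966, 33352,
             72995, 38565, 158736, 8673, 15274, 15810)

total_previous <- sum(previous)  # 17676
total_present <- sum(present)    # 1839025

# How many times larger the new total is
print(total_present / total_previous)
```

The new total is roughly 104 times the old one, which is consistent with rows being counted many times over, not with a change in gene content.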

BLAST Database error

Hello!
Thank you for such important and useful software. Unfortunately, an error occurred during the similarity search step. The full code is below:


setwd("/media/sf_Space/Genes/Phylostratr/")
library(phylostratr)
library(reshape2)
library(taxizedb)
library(dplyr)
library(readr)
library(magrittr)
focal_taxid <- '981537'
strata <- uniprot_strata(focal_taxid, from=1) %>%
strata_apply(f=diverse_subtree, n=5, weights=uniprot_weight_by_ref()) %>%
use_recommended_prokaryotes %>%
add_taxa(c('4932', '9606')) %>%
uniprot_fill_strata
The focal species is not present in UniProt. You may add it after retrieving uniprot sequences (i.e. with 'uniprot_fill_strata') with a command such as: strata_obj@data$faa[[focal_taxid]] <- '/path/to/your/focal-species.faa'
strata@data$faa[[focal_taxid]] <- '/media/sf_Space/Genes/Final_sets_of_sequences/Psimillimum/good.Psimillimum_ref.clustered.genes_level.after_filters.prot.fasta'
strata <- add_taxa(strata, "79327")
strata@data$faa[["79327"]] <- '/media/sf_Space/Genes/Phylostratr/WormBase_parasites/schmidtea_mediterranea.PRJNA379262.WBPS14.protein.fa'
pdf(file="Tree_plot.pdf")
strata %>% strata_convert(target='all', to='name') %>% sort_strata %>% plot(cex=0.2, no.margin=TRUE, label.offset=1)
dev.off()
null device
1
strata <- strata_blast(strata, blast_args=list(nthreads=3))
BLAST Database error: No alias or index file found for protein database [blastdb/4932.faa] in search path [/media/sf_Space/Genes/Phylostratr::]
24932: blasting ...


The FASTA file with the sequences (4932.faa) exists and is located in the /media/sf_Space/Genes/Phylostratr/uniprot-seqs directory.
Additional information about system:
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS


library(devtools)
session_info()
─ Packages ───────────────────────────────────────────────────────────────────
package * version date lib source
ape 5.4 2020-06-03 [1] CRAN (R 3.6.3)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
backports 1.1.8 2020-06-17 [1] CRAN (R 3.6.3)
bit 1.1-15.2 2020-02-10 [1] CRAN (R 3.6.3)
bit64 0.9-7 2017-05-08 [1] CRAN (R 3.6.1)
blob 1.2.1 2020-01-20 [1] CRAN (R 3.6.3)
callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.3)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
curl 4.3 2019-12-02 [1] CRAN (R 3.6.3)
DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.3)
dbplyr 1.4.4 2020-05-27 [1] CRAN (R 3.6.3)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
devtools * 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.3)
dplyr * 1.0.0 2020-05-29 [1] CRAN (R 3.6.3)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.3)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.3)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.1)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.1)
glue 1.4.1 2020-05-13 [1] CRAN (R 3.6.3)
hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.3)
hoardr 0.5.2 2018-12-02 [1] CRAN (R 3.6.3)
lattice 0.20-41 2020-04-02 [4] CRAN (R 3.6.3)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3)
magrittr * 1.5 2014-11-22 [1] CRAN (R 3.6.1)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
nlme 3.1-147 2020-04-13 [4] CRAN (R 3.6.3)
phylostratr * 0.2.1 2020-07-09 [1] Github (dc1e49a)
pillar 1.4.4 2020-05-05 [1] CRAN (R 3.6.3)
pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 3.6.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 3.6.3)
plyr 1.8.6 2020-03-03 [1] CRAN (R 3.6.3)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.3)
processx 3.4.3 2020-07-05 [1] CRAN (R 3.6.3)
ps 1.3.3 2020-05-08 [1] CRAN (R 3.6.3)
purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.3)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.3)
rappdirs 0.3.1 2016-03-28 [1] CRAN (R 3.6.3)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 3.6.3)
readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.1)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.1)
reshape2 * 1.4.4 2020-04-09 [1] CRAN (R 3.6.3)
rlang 0.4.6 2020-05-02 [1] CRAN (R 3.6.3)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
RSQLite 2.2.0 2020-01-07 [1] CRAN (R 3.6.3)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.3)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
taxizedb * 0.1.7.9601 2020-07-09 [1] Github (arendsee/taxizedb@a5a0b5c)
testthat 2.3.2 2020-03-02 [1] CRAN (R 3.6.3)
tibble 3.0.2 2020-07-07 [1] CRAN (R 3.6.3)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.3)
usethis * 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
vctrs 0.3.1 2020-06-05 [1] CRAN (R 3.6.3)
withr 2.2.0 2020-04-20 [1] CRAN (R 3.6.3)


Any help would be appreciated!
Thanks

strata_blast issue

Hi there,
I couldn't run strata_blast(strata), and it says:
Error in system2("blastdbcmd", stdout = TRUE, stderr = TRUE, args = c("-info", :
error in running command

How can I solve the problem?

Thank you!
Best,
CW

Error in readLines(con) : HTTP error 400.

Hi,

I am trying to run the Arabidopsis example and am running into difficulties. When running any of the uniprot commands e.g. uniprot_weight_by_ref or uniprot_strata I get the same error message: Error in readLines(con) : HTTP error 400.

My error for this chunk of code in the markdown file is this:

Error in readLines(con) : HTTP error 400.
5. readLines(con)
4. readLines(con) %>% cast
3. wrap_uniprot_id_retrieval(db = "taxonomy", query = query, cast = as.integer, ...)
2. uniprot_downstream_ids(clade, reference_only = TRUE)
1. uniprot_weight_by_ref()

Thank you,
Kathy

Empty faa file for 1008392 produces error with diamond

I found that the function add_recommended_prokaryotes adds a species with taxon ID 1008392. The uniprot_fill_strata function downloads an empty faa file for it. This causes a problem with the newly added strata_diamond function, because diamond throws an error when given an empty file.

I looked up NCBI and found this species only has a nucleotide sequence https://www.ncbi.nlm.nih.gov/taxonomy/?term=1008392
Uniprot also returns empty result: https://www.uniprot.org/uniprot/?query=taxonomy%3A1008392&sort=score

I think this species should be removed from add_recommended_prokaryotes

My current workaround for this problem is to manually remove 1008392 from list of prokaryotes:

prokaryote_sample <- readRDS(system.file("extdata","prokaryote_sample.rda", package = "phylostratr"))
prokaryotes_toadd<-prokaryote_sample$tip.label
#remove these
toremove<-as.character(c(1008392)) #this has empty faa file
prokaryotes_toadd<-prokaryotes_toadd [! prokaryotes_toadd %in% toremove]
h_strata <-
  uniprot_strata(focal_taxid, from=2) %>%
  strata_apply(f=diverse_subtree, n=10, weights=uniprot_weight_by_ref()) %>%
  add_taxa(prokaryotes_toadd) %>%
  uniprot_fill_strata

#rundiamond
h_strata <- strata_diamond(h_strata, blast_args=list(nthreads=20)) %>% strata_besthits
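A more general variant of this workaround is to look for empty or missing sequence files before the search step, instead of hard-coding one taxon ID. The sketch below uses only base R; the helper name `empty_or_missing` is invented, and the assumption that `strata@data$faa` is a list of file paths keyed by taxon ID is inferred from the assignments shown in other issues on this page:

```r
# Hypothetical helper: names of entries whose file is missing or has zero bytes.
empty_or_missing <- function(paths) {
  sizes <- file.size(paths)  # NA for files that do not exist
  names(paths)[is.na(sizes) | sizes == 0]
}

# Toy demonstration with temporary files standing in for downloaded proteomes
ok <- tempfile(fileext = ".faa")
writeLines(c(">p1", "MKV"), ok)
empty <- tempfile(fileext = ".faa")
file.create(empty)

faa <- c("9606" = ok, "1008392" = empty)
bad <- empty_or_missing(faa)
print(bad)

# With a real object, something like
#   empty_or_missing(unlist(h_strata@data$faa))
# might apply (assumption about the slot layout, see above).
```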

Error (?) - strata_besthits

Hello!
I am trying to use phylostratr to analyze a subset of 2000 human proteins. I can load the BLAST results, but I find an error when I try to recover the best hits.

My code:

library(devtools)
source("https://bioconductor.org/biocLite.R")
library(phylostratr)
library(magrittr)
library(plotly)

weights=uniprot_weight_by_ref()
focal_taxid <- '9606'
strata <-
  uniprot_strata(focal_taxid, from=2L) %>%
  strata_apply(f=diverse_subtree, n=5L, weights=weights) %>%
  use_recommended_prokaryotes %>%
  uniprot_fill_strata
strata@data$faa[['9606']] <- "my_subset.faa"
strata_blast(strata, blast_args=list(nthreads=8)) %>%
  strata_besthits %>%
  merge_besthits

#Error in mutate_impl(.data, dots) : 
#Evaluation error: Column `staxid` not found in `.data`
#Call `rlang::last_error()` to see a backtrace.

rlang::last_error()
#<error>
#message: Column `staxid` not found in `.data`
#class:   `rlang_data_pronoun_not_found`
#backtrace:
#-phylostratr::strata_blast(strata, blast_args = list(nthreads = 4))
#-phylostratr::strata_besthits(.)
#Call `summary(rlang::last_error())` to see the full backtrace

summary(rlang::last_error())

#<error>
#message: Column `staxid` not found in `.data`
#class:   `rlang_data_pronoun_not_found`
#fields:  `message`, `trace` and `parent`
#backtrace:
#x
#+-`%>%`(...)
#| +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
#| \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
#|   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
#|     \-global::`_fseq`(`_lhs`)
#|       \-magrittr::freduce(value, `_function_list`)
#|         \-function_list[[i]](value)
#|           \-phylostratr::strata_besthits(.)
#|             \-base::lapply(taxa, get_besthit, strata = strata)
#|               \-phylostratr:::FUN(X[[i]], ...)
#|                 \-`%>%`(...)
#|                   +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
#|                   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
#|                     \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
#|                       \-phylostratr:::`_fseq`(`_lhs`)
#|                         \-magrittr::freduce(value, `_function_list`)
#|                           +-base::withVisible(function_list[[k]](value))
#|                           \-function_list[[k]](value)
#|                             \-phylostratr::get_max_hit(.)
#|                               \-`%>%`(...)
#|                                 +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
#|                                 \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
#|                                   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
#|                                     \-phylostratr:::`_fseq`(`_lhs`)
#|                                       \-magrittr::freduce(value, `_function_list`)
#|                                         \-function_list[[i]](value)
#|                                           +-dplyr::group_by(., .data$qseqid, .data$staxid)
#|                                           \-dplyr:::group_by.data.frame(., .data$qseqid, .data$staxid)
#|                                             \-dplyr::group_by_prepare(.data, ..., add = add)
#|                                               \-dplyr:::add_computed_columns(.data, new_groups)
#|                                                 +-dplyr::mutate(.data, !!!mutate_vars)
#|                                                 \-dplyr:::mutate.tbl_df(.data, !!!mutate_vars)
#|                                                   \-dplyr:::mutate_impl(.data, dots)
#+-base::tryCatch(...)
#| \-base:::tryCatchList(expr, classes, parentenv, handlers)
#|   +-base:::tryCatchOne(...)
#|   | \-base:::doTryCatch(return(expr), name, parentenv, handler)
#|   \-base:::tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#|     \-base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
#|       \-base:::doTryCatch(return(expr), name, parentenv, handler)
#+-base::evalq(.data$staxid, <environment>)
#| \-base::evalq(.data$staxid, <environment>)
#|   +-staxid
#|   \-rlang:::`$.rlang_data_pronoun`(.data, staxid)
#|     \-rlang:::data_pronoun_get(x, nm)
#\-rlang:::abort_data_pronoun(x)

Thanks,

Rafael

Error: 'classification' is not an exported object from 'namespace:taxizedb'

Hello,

I am trying to use phylostratr and I cannot even run the example. When I run the function "uniprot_strata" I get the following error:

Error: 'classification' is not an exported object from 'namespace:taxizedb'

I looked at the function (uniprot_strata) and it seems to call the function "taxizedb::classification", but this function is not part of the package taxizedb:

lsf.str("package:taxizedb")
db_download_col : function (verbose = TRUE)
db_download_gbif : function (verbose = TRUE)
db_download_itis : function (verbose = TRUE)
db_download_tpl : function (verbose = TRUE)
db_load_col : function (path, user = "root", pwd = NULL, verbose = TRUE)
db_load_gbif : function (verbose = TRUE)
db_load_itis : function (path, user, pwd = NULL, verbose = TRUE)
db_load_tpl : function (path, user, pwd = NULL, verbose = TRUE)
sql_collect : function (src, query, ...)
src_col : function (user = "root", password = NULL, dbname = "col", ...)
src_gbif : function (path)
src_itis : function (user, password, dbname = "ITIS", ...)
src_tpl : function (user, password, dbname = "plantlist", ...)

The following are the versions of packages loaded:
[1] testthat_2.3.2 onekp_0.3.0 rlang_0.4.7 taxize_0.9.97
[5] magrittr_1.5 phylostratr_0.2.1 readr_1.3.1 dplyr_1.0.1
[9] taxizedb_0.1.4 hoardr_0.5.2 reshape2_1.4.4 withr_2.2.0
[13] devtools_2.3.0 usethis_1.6.1

Am I using the wrong version of one of the packages?
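The package list above shows taxizedb 0.1.4, while session listings in other reports on this page show 0.1.7.9601 (from arendsee/taxizedb on GitHub) and 0.3.0, so a version mismatch seems plausible. A small base R sketch for checking whether an installed package exports a given function; the helper name `is_exported` is invented:

```r
# Hypothetical helper: is `fn` exported by the installed package `pkg`?
# Returns FALSE when the package itself is not installed.
is_exported <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

# Demonstration with a package that ships with R
demo <- is_exported("stats", "median")
print(demo)

# The case from this report (needs taxizedb installed):
#   is_exported("taxizedb", "classification")
#   packageVersion("taxizedb")
```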

Error: The focal species is not present in UniProt

Hi, I am trying to repeat an analysis that I managed to run in the past, but now that I am running it on our HPC it no longer works. I get the error below no matter which focal_id I use.

For example, taking the code from the main page:
focal_taxid<-"3702"
strata<-uniprot_strata(focal_taxid,from=2) %>%
use_recommended_prokaryotes %>%
add_taxa(c('4932','9606')) %>%
uniprot_fill_strata

The focal species is not present in UniProt. You may add it after retrieving uniprot sequences (i.e. with 'uniprot_fill_strata') with a command such as: strata_obj@data$faa[[focal_taxid]] <- '/path/to/your/focal-species.faa'
Error in integer(max(oldnodes)) : vector size cannot be infinite
In addition: Warning message:
In max(oldnodes) : no non-missing arguments to max; returning -Inf

I have installed phylostratr from github and blastp.

Thank you very much!

Best,
CW

Issue in compiling phylostratr results

After I loaded BLAST, I ran this line
strata <- strata_blast(strata, blast_args=list(nthreads=8L)) %>% strata_besthits

It skipped a bunch of taxonomy IDs, and stopped at 3311. It gave this error:

3311: blasting ...

sh: line 1: 7524 Segmentation fault (core dumped) 'blastp' -db blastdb/SGTW.faa -query uniprot-seqs/3702.faa -outfmt "6 qseqid sseqid qstart qend sstart send evalue score" -num_threads 8 -seg no > '3311.tab'

Error in .validate_table(out, "read_blast", c("qseqid", "sseqid", "evalue")) :

'read_blast': input table is missing required columns: qseqid, sseqid, evalue

Columns:

I then looked at my .tab files and I had no tab file for 3311. I don't know what I may be doing wrong. I'm attaching my R code. Maybe it can give you some clue as to what I may be doing wrong.

Thank you!

Error in if (class(strata) != "Strata") { : the condition has length > 1

Hi Arendsee,
Thanks for your help; I now have preliminary results from the phylostratr package. It has helped my study a lot, thank you sincerely!
However, I ran into some new problems in further analysis, for example with plot_heatmaps, strata2 <- strata_convert(strata, target='all', to='name'), and get_phylostrata_map(strata). The errors are:
①strata2 <- strata_convert(strata, target='all', to='name') %>% print
Error in ap_vector_dispatch(x = x, db = db, cmd = "taxid2name", verbose = verbose, :
trying to get slot "tree" from an object (class "glue") that is not an S4 object

②map <- get_phylostrata_map(strata) %>% select(mrca, ps) %>% distinct
Error in if (class(strata) != "Strata") { : the condition has length > 1
👆 I really don't know how to resolve this one.

③plot_heatmaps(results, filename="human.pdf",tree=strata@tree, focal_id=focal_taxid,to_name = TRUE, scheme = )
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
………………
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Error in stat_tree():
! Problem while converting geom to grob.
ℹ Error occurred in the 1st layer.
Caused by error in check.length():
! 'gpar' element 'lwd' must not be length 0
Run rlang::last_error() to see where the error occurred.

Could you advise me on how to solve these problems? Thank you! I wish you all the best!
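The message in ② is what R reports when an `if` condition has more than one element (an error from R 4.2.0 onward). That happens when `class()` returns several class names, which, together with the class "glue" mentioned in ①, suggests that `strata` may no longer hold a Strata object at that point. The toy object below is invented to reproduce the shape of the problem; it is not phylostratr code:

```r
# Toy object carrying two classes, the same shape as a "glue" string
x <- structure("Arabidopsis thaliana", class = c("glue", "character"))

# class(x) has two elements, so the comparison yields two logical values
n_classes <- length(class(x))            # 2
cmp <- class(x) != "Strata"              # logical vector of length 2

# inherits() always returns a single TRUE or FALSE
is_strata <- inherits(x, "Strata")
print(is_strata)
```

Printing `class(strata)` just before the failing call would show whether the variable was overwritten by an earlier step.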

[Question]NCBI nr database VS uniprot

Hi arendsee,
The first paper, "Domazet-Loso et al. 2007", uses the NCBI nr database. Your R code indicates that phylostratr uses UniProt. I find that the classification results from the two databases are very different. I am new to protein databases. Do you know what the difference between them is? Why does phylostratr use UniProt?

Thanks,
Haojing
