I re-ran the phylostratr
recently on Apis florea
and got really weired results. The number of genes reported per strata were way higher than the previous run and the sum doesn't equate to the total number of genes. I think it is due to some bug introduced recently because the previous run had much resonable numbers. The only difference between the previous run and this run is that I've added 16 extra spp for the latest run.
Here is more information:
R script for previous run
library(devtools)
source("https://bioconductor.org/biocLite.R")
library(phylostratr)
library(magrittr)
library(plotly)
focal_taxid <- '7463'
strata <-
uniprot_strata(focal_taxid, from=2) %>%
add_taxa('7463') %>%
strata_apply(f=diverse_subtree, n=5, weights=uniprot_weight_by_ref()) %>%
use_recommended_prokaryotes %>%
uniprot_fill_strata
results <- strata_blast(strata, blast_args=list(nthreads=8)) %>%
strata_besthits %>%
merge_besthits
phylostrata <- stratify(results)
write.csv(phylostrata, "phylostrata_table.csv")
tabled <- table(stratify(results)$mrca_name)
write.csv(tabled, "phylostrata_stats.csv")
R script for the present run:
#!/usr/bin/Rscript
library(devtools)
source("https://bioconductor.org/biocLite.R")
library(phylostratr)
library(magrittr)
library(plotly)
focal_taxid <- '7463'
strata <-
uniprot_strata(focal_taxid, from=2) %>%
add_taxa('7463') %>%
strata_apply(f=diverse_subtree, n=5, weights=uniprot_weight_by_ref()) %>%
add_taxa(c('30195','88501','78185','78189','44477','132113','143995','156304','166423','175324','175328','178035','481568','516756','597456')) %>%
use_recommended_prokaryotes %>%
uniprot_fill_strata
results <- strata_blast(strata, blast_args=list(nthreads=8)) %>%
strata_besthits %>%
merge_besthits
phylostrata <- stratify(results)
write.csv(phylostrata, "phylostrata_table.csv")
tabled <- table(stratify(results)$mrca_name)
write.csv(tabled, "phylostrata_stats.csv")
plot_heatmaps(results, "heatmaps3.pdf")
Previous results:
"","Var1","Freq"
"1","cellular organisms",6853
"2","Eukaryota",5120
"3","Opisthokonta",415
"4","Metazoa",1940
"5","Eumetazoa",333
"6","Bilateria",456
"7","Protostomia",277
"8","Ecdysozoa",62
"9","Panarthropoda",48
"10","Arthropoda",216
"11","Mandibulata",72
"12","Pancrustacea",131
"13","Hexapoda",112
"14","Neoptera",639
"15","Holometabola",62
"16","Apocrita",305
"17","Aculeata",188
"18","Apoidea",75
"19","Apidae",70
"20","Apis",97
"21","Apis florea",205
Present results:
"","Var1","Freq"
"1","cellular organisms",284161
"2","Eukaryota",103882
"3","Opisthokonta",44785
"4","Metazoa",63526
"5","Eumetazoa",63633
"6","Bilateria",130111
"7","Protostomia",153915
"8","Ecdysozoa",173269
"9","Panarthropoda",37748
"10","Arthropoda",95879
"11","Mandibulata",14997
"12","Pancrustacea",57473
"13","Hexapoda",48852
"14","Neoptera",106423
"15","Holometabola",116966
"16","Apocrita",33352
"17","Aculeata",72995
"18","Apoidea",38565
"19","Apidae",158736
"20","Apinae",8673
"21","Apis",15274
"22","Apis florea",15810
Clearly, this isn't right. Could you please take a look at it?
Thanks!