medvir / virmet Goto Github PK

Set of tools for viral metagenomics.

Python 84.95% R 1.94% Makefile 6.50% Batchfile 6.61%

bioinformatics bioinformatics-pipeline genome miseq python viral-database viral-metagenomics ncbi database bacterial-database

virmet's People

Contributors

Stargazers

Watchers

Forkers

msegurawang maryamzaheri gerbenvoshol nailouzhang wook2014

virmet's Issues

Read counts in orgs_list.tsv is underestimated for some species.

Do the grouping only based on accession ID. No need to group based on 'stitle', 'ssciname', 'species', 'tax_id'. In some cases, ssciname is nan and as a results the corresponding reads are excluded from the grouping and final results.

AssertionError

root@116ebf9b765d:/home/ubuntu# virmet fetch --viral n
Traceback (most recent call last):
File "/opt/miniconda/envs/test-virmet/bin/virmet", line 9, in
load_entry_point('VirMet==1.0rc1', 'console_scripts', 'virmet')()
File "/opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/main.py", line 110, in main
args.func(args)
File "/opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/main.py", line 39, in fetch_db
fetch.main(args)
File "/opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/fetch.py", line 31, in main
assert gids_1 == gids_2
AssertionError

Reporting

NA in species list
turn on/off reporting of human endogenous retroviruses
mark usual suspects (HIV-2, choristoneura, uncultured marine virus) as contamination?

When picked_seqs are present, they must be added to viral_gi_taxid.dmp

Read counts in covplot is overestimated

covplot reports number of reads from samtools idxstats that counts the number of alignments, not the number of reads. For genomes with repetitive regions, this leads to overestimate the read counts.

Fetching databases in parallel

Downloading takes time, a trivial parallelisation would be useful.

Logging improvements

log more often which sample is being analysed
use different logging levels
move logging file to output directory

Re-analysis of samples

To re-analyse a sample (because of stoppen pipeline, new version), the output folder of that sample should first be deleted manually. The pipeline should skip samples that already have an output folder.

FileNotFoundError when running single files

When running a single file, VirMet is trying to create an output folder with the whole path as name instead only the run/file name.

mihuber@timavo:~$ virmet wolfpack --file /data/MiSeq/MiSeqOutput/170426_M01274_0185_000000000-B4FP5/Data/Intensities/BaseCalls/1000251161-UR-10-1_S9_L001_R1_001.fastq.gz
Traceback (most recent call last):
  File "/opt/miniconda/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1', 'console_scripts', 'virmet')()
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/__main__.py", line 120, in main
    args.func(args)
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/__main__.py", line 25, in wolfpack_run
    od = wolfpack.main(args)
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/wolfpack.py", line 385, in main
    os.chdir(out_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'virmet_output_/data/MiSeq/MiSeqOutput/170426_M01274_0185_000000000-B4FP5/Data/Intensities/BaseCalls/1000251161-UR-10-1_S9_L001_R1_001'

Adapt to phasing out of GI numbers by NCBI.

As written in their blog, NCBI phased out GI. VirMet has to adapt, for sure changing how fetch and update work.

Run `makeblastdb` at the end of `fetch_viral_genomes.py`

The command line is already copied.

More aggressive clean up

Get rid of

good.fastq
bad.fastq
good_*_.fastq

Compress

unique.tsv
hq_decont_reads.fasta

`virmet covplot` for viruses with a segmented genome

Because viruses with a segmented genome share the same ssciname in orgs_list.tsv it's only possible to make a coverage plot of one of the segments.

If one could (alternatively) use stitle for the --organism argument, it would be possible to distinguish between the different segments.

VirMet/src/virmet/covplot.py

Line 53 in 97351c2

criterion = orgs_list.loc[:, 'ssciname'].str.startswith(org_name).fillna(False)

if ssciname in unique.tsv is N/A, no entry is written in orgs_list.tsv

This bug was detected because there were much less (23 Pepino mosaic virus reads) listed in orgs_list.tsv compared to the amount of reads aligned to Pepino mosaic virus with virmet covplot(874).

Sample:

/analysis/VirMetResults/virmet_output_181116_M01274_0246_000000000-C6W4V/1000420494-ST-RNA_S9

sseqid not written in orgs_list.tsv:

MF422611.1 (Pepino mosaic virus isolate CH_bpo160, complete genome)
MF422613.1 (Pepino mosaic virus isolate CH_bpo162, complete genome)

sseqid written in orgs_list.tsv (example where it works):

gb|FJ212288.1| (Pepino mosaic virus, complete genome)

Adapt to phase out of genbank identifiers

NCBI phased GI numbers out in September 2016. The pipeline must adapt to this by retrieving and annotating sequences and taxonomies by accession numbers rather than by GIs.

Output

tidy data format: table with columns run, sample_id, sample_name, quality, organism_category, species, reads, percent instead of separate stats.csv and orgs_list.csv
coverage plot

TidyData Table

Include in VirMet the generation of a TidyData.csv Table.
Run, Sample, Name, Quality, Category, Species, Reads, Percent (of what?)

Locking strategy

Touch a file when a step is done so that it can be rerun skipping it.

docker scripts point to wrong ftp addresses for fetching genomes

Hi,
I have modified some of the scripts in the docker container to use to right ftp addresses to fetch for genomes. I have seen that also the scripts in github need to be modified.

Comparing your original scripts (here are fetch_ori.py, common_ori.py, wolfpack_ori.py) and the ones I'm using at the moment, here are the differences:

diff /opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/fetch_ori.py /opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/fetch.py
27c27
<         cml = 'cut -f 1,2 viral_seqs_info.tsv > viral_gi_taxid.dmp'
---
>         cml = 'cut -f 1,2 viral_seqs_info.tsv | awk \'{print $2"\t"$1}\' > viral_gi_taxid.dmp'
29,31c29,31
<         gids_1 = set(get_gids('viral_database.fasta'))
<         gids_2 = set([l.split()[0] for l in open('viral_gi_taxid.dmp')])
<         assert gids_1 == gids_2
---
>         ##gids_1 = set(get_gids('viral_database.fasta'))
>         ##gids_2 = set([l.split()[0] for l in open('viral_gi_taxid.dmp')])
>         ##assert gids_1 == gids_2
132c132
<             fasta_url = 'ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/Assembled_chromosomes/seq/bt_ref_Bos_taurus_UMD_3.1.1_%s.fa.gz' % chrom
---
>             fasta_url = 'ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/ARCHIVE/BUILD.6.1/Assembled_chromosomes/seq/bt_ref_Bos_taurus_UMD_3.1_%s.fa.gz' % chrom
136c136
<         gff3_url = 'ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/GFF/ref_Bos_taurus_UMD_3.1.1_top_level.gff3.gz'
---
>         gff3_url = 'ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/ARCHIVE/BUILD.6.1/GFF/ref_Bos_taurus_UMD_3.1_top_level.gff3.gz'


diff /opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/wolfpack_ori.py /opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/wolfpack.py
22c22
<     'bt_ref': '/data/virmet_databases/bovine/fasta/bt_ref_Bos_taurus_UMD_3.1.1.fasta.gz'
---
>     'bt_ref': '/data/virmet_databases/bovine/fasta/ref_Bos_taurus_UMD_3.1_top_level.gff3.gz'


diff /opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/common_ori.py /opt/miniconda/envs/test-virmet/lib/python3.5/site-packages/VirMet-1.0rc1-py3.5.egg/virmet/common.py
128c128
<     x = gb['ftp_path'].apply(lambda col: col + '/' + col.split('/')[5] + '_genomic.fna.gz')
---
>     x = gb['ftp_path'].apply(lambda col: col + '/' + col.split('/')[9] + '_genomic.fna.gz') ```
I saw that this line is correct in the scripts in github

Best regards,
Maia

Build database virus

I try to build the database with the subcommand fetch, I obtained the next problem:

Traceback (most recent call last):
File "./virmet", line 11, in
load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
File "/home/sonia/anaconda3/lib/python3.6/site-packages/virmet/main.py", line 120, in main
args.func(args)
File "/home/sonia/anaconda3/lib/python3.6/site-packages/virmet/main.py", line 44, in fetch_db
fetch.main(args)
File "/home/sonia/anaconda3/lib/python3.6/site-packages/virmet/fetch.py", line 21, in main
cml_search = viral_query('n')
File "/home/sonia/anaconda3/lib/python3.6/site-packages/virmet/common.py", line 134, in viral_query
os.mkdir(target_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/data/virmet_databases/viral_nuccore'

could you tell me which the mistake in the build of database? please

FutureDeprecation warning in `covplot.py`

A warning is given by pandas: sort_values should replace order as this will be removed in a future release.

*coverage.pdf of large genomes not displayed properly

For large genomes sometimes the coverage plots (created with virmet covplot) are not displayed well.
One example is:
/analyses/VirMetResults/virmet_output_171110_M02081_0000_000000000-BF24W/BE-spike-SSIV-1-1_S6/Human_herpesvirus_5/Human_herpesvirus_5_coverage.pdf

Looking at the depth.txt file it looks like there is an even distribution of reads across a large part of the genome but in the .pdf nothing is visible.

Quality filtering of NanoPore reads

Disable length trimming based on quality for NanoPore reads. NanoPore quality scores do not follow PHRED scores.

@2b89169b-74c9-4653-8877-c3f21ec7674d runid=2564fad3a9906330fcc0c6c90ac0c8128b89d9ff read=213 ch=124 start_time=2017-05-24T11:01:53Z
TTGTTCGTTCAGTTGGGTGTTTTATGGTTTCGTTTTTCGTGCGCCGCTTCAACAGATGAAGATGTGACATCCATTATTAATAAAGTAGACAGGCATGCTGTGGTCAAAATGGCAGTTTGTGGTAATATAGCACCATATAGCAAGAATTCTAACAAAGTTTTAACTGTAATTTAACTAATCATTATCAAATCTCTGTAGCTGCTGAAGAGAAAAGAAAAGTGGTACACTTCTGCAAATGAGACAAATCCTGTTCATCGCCATCGACAGCATGGTTGAAAAAACTCTTTGATGGTTGTTGCATTTGGAGTGCTCTGTAGTTGACATTAAACCAGCCGTCAAAAGAATTGGCCTATTAAGCCAATGGCTTGATGTTGACTAAGAGTGTAGGAGCAGC
+
$$$')7',+*$&%(+2/3/2=,)''.+)*.,,0:830,+612033).95:8*1239BDCA2461=;<A?=B:268184?7>?@928+(63+.126>[email protected]=:4*5134;F5/3,+'5*-.926:3:B;C<9..238./53;;11)249*.1DEA1?2/),(--74-++'128;1E0-.27E3.++)'(,17876355<.*,'-880'/46.,6.7570/0B1D?8202--,*-0&01*)?1..EFFBE@3254+/+,+34666=8:0/;EFFGEBD5>=/++1+35@,=A35.68++2)(+'),)213.369;>4*/+<?>=856?7>B?9C7.(.*5/245?//7:=8544.,.23/2*+(.*@.+*2(0/&*+/*))+0-,-*)+

wolfpack

Hi,

virmet wolfpack --run /home/Exp01
Traceback (most recent call last):
  File "/home/miniconda2/envs/virmet1/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/__main__.py", line 120, in main
    args.func(args)
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/__main__.py", line 25, in wolfpack_run
    od = wolfpack.main(args)
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/wolfpack.py", line 410, in main
    sd = hunter(fq)
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/wolfpack.py", line 126, in hunter 
    with open('prinseq.log') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'prinseq.log'

virmet.log
INFO 2017/12/01 14:22:40 __main__.py: main() 116:       /home/miniconda2/envs/virmet1/bin/virmet wolfpack --run /home/Exp01
INFO 2017/12/01 14:22:40 wolfpack.py: main() 391:       samples to run: S16 S7
INFO 2017/12/01 14:22:40 wolfpack.py: main() 409:       running hunter on /home/Exp01/A-16_S16_L001_R1_001.fastq.gz
DEBUG 2017/12/01 14:22:40 wolfpack.py: hunter() 47:     hunter will run on 16 processors
INFO 2017/12/01 14:22:40 common.py: run_child() 56:     Running instance of gunzip
DEBUG 2017/12/01 14:22:45 common.py: run_child() 62:    Completed
DEBUG 2017/12/01 14:22:45 wolfpack.py: hunter() 77:     trimming with seqtk
INFO 2017/12/01 14:22:45 common.py: run_child() 56:     Running instance of seqtk
DEBUG 2017/12/01 14:22:50 common.py: run_child() 62:    Completed
INFO 2017/12/01 14:22:50 common.py: run_child() 56:     Running instance of wc
DEBUG 2017/12/01 14:22:50 common.py: run_child() 62:    Completed
INFO 2017/12/01 14:22:50 common.py: run_child() 56:     Running instance of split
DEBUG 2017/12/01 14:22:52 common.py: run_child() 62:    Completed
DEBUG 2017/12/01 14:22:52 wolfpack.py: hunter() 101:    filtering with prinseq
INFO 2017/12/01 14:22:52 common.py: run_child() 56:     Running instance of /usr/bin/seq
ERROR 2017/12/01 14:22:52 common.py: run_child() 64:    Execution of /usr/bin/seq -f %03g 0 15 | xargs -P 16 -I {} prinseq             -fastq splitted{}.fastq -lc_method entropy -lc_threshold 70             -log prinseq{}.log -min_qual_mean 20             -out_good ./good{} -out_bad ./bad{} > ./prinseq.err 2>&1 failed with returncode 127:
ERROR 2017/12/01 14:22:52 common.py: run_child() 65:    /usr/bin/seq -f %03g 0 15 | xargs -P 16 -I {} prinseq             -fastq splitted{}.fastq -lc_method entropy -lc_threshold 70             -log prinseq{}.log -min_qual_mean 20             -out_good ./good{} -out_bad ./bad{} > ./prinseq.err 2>&1
DEBUG 2017/12/01 14:22:52 wolfpack.py: hunter() 108:    cleaning up
INFO 2017/12/01 14:22:52 common.py: run_child() 56:     Running instance of rm
DEBUG 2017/12/01 14:22:53 common.py: run_child() 62:    Completed

virmet readthedocs website not displayed properly

Something broke the website. Probably something with the sidebar, the top of each page is now cut off.

https://virmet.readthedocs.io/en/latest/

Error when no viral reads are found: AttributeError: 'Series'

AttributeError: 'Series' object has no attribute 'order' is thrown when no read is above the coverage/identity threshold. orgs_list must be written anyway.

Write better output for `update`

E.g., when update returns zero new sequences, exit and print a message to screen.

virmet wolfpack --run doesn't work if run_name starts with a 2

I'm not sure if this is the only line of code which causes this issue:

VirMet/src/virmet/wolfpack.py

Line 504 in d6fd569

if run_name.startswith('1') and len(run_name.split('-')[-1]) == 5 and \

To reproduce issue try running virmet wolfpack --run /data/MiSeq/MiSeqOutput/200110_M01274_0316_000000000-CW5HM/ on timavo

Document hot run capabilities

Add in the documentation what files should be deleted to rerun parts of the analysis.

Coverage plot

~~After a wolfpack run, user should be able to select a viral organism, realign viral reads to that and produce a coverage plot.~~
Already a point in #4

Update search query to include all viral genomes

As @sschmutz pointed out, the viral genomes page at NCBI lists, as of today, 7477 genomes (9533 sequences). While many of these are already included in the database, there are some notable exceptions like Influenza B. The viral query search in common.py returns 64165 sequences,

The query search given by NCBI to list all viral genomes is

"Viruses"[Organism] AND srcdb_refseq[PROP] NOT wgs[PROP] NOT "cellular organisms"[Organism] NOT AC_000001[PACC] : AC_999999[PACC]

The query search below should return both our old sequences and RefSeq genomes, and lists 67920 sequences

esearch -db nuccore -query "Viruses [orgn]
AND (\"complete genome\" [Title] OR srcdb_refseq[prop])
NOT txid131567 [orgn] NOT wgs[prop]
NOT AC_000001[PACC] : AC_999999[PACC]"

This seems to return a more complete database.

virmet tidytable does not finish if one sample doesn't contain any viral reads

Happens sometime (negative controls) that there are sequences, but no viral ones. If virmet tidytable is then used, following error is returned and it stops there
FileNotFoundError: File b'NTC-DNA_S3/stats.tsv' does not exist

This example is from the analysis virmet_output_180720_M01274_0227_000000000-BY9JM
@ozagordi

Input

run entire run (i.e. all fast files in a given directory) or just a single file
not restricted to only Illumina file name (Name_SX_L001_R....), i.e. input can also be a file like "file1.fastq" from any other source

Version of VirMet and virus reference database

It would be good if one could easily retrieve the version of MinVar and the last update date of the virus reference database (e.g. with a --version subcommand).

This would be important for

citation in publications
follow-up of changes in virus taxonomy

add measured variable of coverage and species information

This issue suggests two options to enhance the VirMet output.

Include a measurement of coverage for each entry in orgs_list.tsv, that way one could faster/better distinguish false from true hits.
One example is on timavo /analyses/VirMetResults/virmet_output_171127_M01274_0200_000000000-BHDG5/1000378040-AR-1_S1. If the coverage plots of the first three hits are compared (Moraxella phage, BeAn 58058 virus, Streptococcus phage) it's likely that the first two are false while the Streptococcus phage hits are true.
Add species level to each entry in orgs_list.tsv so it can be summarized. Sometimes there are a lot of different entries coming from the same virus. We don't always know if it's actually the same virus (keeping the original information is important) but it would help to make the report more readable.
One example is on timavo /Volumes/analyses/VirMetResults/virmet_output_171110_M02081_0000_000000000-BF24W/BE-spike-SSIV-1-1_S6.
sscinames: Human adenovirus 7d2 (staxids: 326124)
sscinames: Human adenovirus 7 (staxids: 10519)
...
they don't share the same staxids but if you look at the classification (in that case with the R package taxize), you see that they share the same species. Same with other adenovirus hits.

> library(taxize)
> classification("Human adenovirus 7d2", db="ncbi")

Retrieving data for taxon 'Human adenovirus 7d2'

$`Human adenovirus 7d2`
                         name         rank     id
1                     Viruses superkingdom  10239
2 dsDNA viruses, no RNA stage      no rank  35237
3                Adenoviridae       family  10508
4              Mastadenovirus        genus  10509
5      Human mastadenovirus B      species 108098
6         Human adenovirus B1      no rank 565302
7          Human adenovirus 7      no rank  10519
8        Human adenovirus 7d2      no rank 326124

attr(,"class")
[1] "classification"
attr(,"db")
[1] "ncbi"

> classification("Human adenovirus 7", db="ncbi")

Retrieving data for taxon 'Human adenovirus 7'

$`Human adenovirus 7`
                         name         rank     id
1                     Viruses superkingdom  10239
2 dsDNA viruses, no RNA stage      no rank  35237
3                Adenoviridae       family  10508
4              Mastadenovirus        genus  10509
5      Human mastadenovirus B      species 108098
6         Human adenovirus B1      no rank 565302
7          Human adenovirus 7      no rank  10519

attr(,"class")
[1] "classification"
attr(,"db")
[1] "ncbi"

Virus reads used for alignment covplot

We were wondering, why covplot aligns all viral reads to the sequence with the highest number of reads mapped and not just the reads of the selected organism. Depending on the alignment thresholds in virmet and covplot, reads from different organism could be combined in this new alignment.

Matching contaminant

When there is no match, pipeline reports '1'

read_table is deprecated, use read_csv instead

FutureWarning: read_table is deprecated, use read_csv instead.

We should probably replace this in tidytable.py to avoid future issues.

VirMet/src/virmet/tidytable.py

Line 27 in 97351c2

df = pd.read_table(stat_file, sep='\t', header=None,

VirMet/src/virmet/tidytable.py

Line 34 in 97351c2

df = pd.read_table(orgs_file, sep='\t', header=0)

virmet covplot not working because of missing ggplot package

Virmet covplot did run and generated a depth.txt file, but no plot because of missing ggplot package?
Error message: Error in library(ggplot2) : there is no package called ‘ggplot2’
Conda base environment was used.
Example here: /analyses/VirMetResults/virmet_output_191010_M01274_0292_000000000-CNPVH/1000458456-BE-DNA_S3/Torque/

Write `covplot` documentation

./ or ../ as --outdir argument in tidytable.py leads to an error

...but it works with covplot.py (virmet covplot --outdir ./ --organism "...").
I don't know if this is something which needs to be fixed (since it works with the absolute path).
This is the output I get:

stschmu@timavo:/analyses/VirMetResults/virmet_output_171024_M01274_0196_000000000-BF7JF$ virmet tidytable --outdir ./
Traceback (most recent call last):
  File "/opt/miniconda/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/__main__.py", line 120, in main
    args.func(args)
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/__main__.py", line 16, in tidytable_run
    tidytable.main(args)
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/tidytable.py", line 15, in main
    run = outdir.rstrip('/').split('/')[-1].split('virmet_output_')[1]
IndexError: list index out of range

Command to update fungal db, check also bacterial update

Write tests possibly.

depth.txt is overwritten if you run covplot more than once

When the subcommand covplot is used, organism_coverage.pdf and depth.txt results as output.
If you run the subcommand more than once (e.g. for another organism), the organism_coverage.pdf is unique, but the depth.txt file is overwritten.
A suggestion would be to rename depth.txt to organism_depth.txt.

mihuber@timavo$ virmet tidytable --outdir /analyses/VirMetResults_KidneyPairs/virmet_output_170407_M01274_0180_000000000-B4LLY/
Traceback (most recent call last):
  File "/opt/miniconda/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1', 'console_scripts', 'virmet')()
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/__main__.py", line 120, in main
    args.func(args)
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/__main__.py", line 16, in tidytable_run
    tidytable.main(args)
  File "/opt/miniconda/lib/python3.5/site-packages/virmet/tidytable.py", line 15, in main
    run = outdir.rstrip().split('/')[-1].split('virmet_output_')[1]
IndexError: list index out of range

This works

mihuber@timavo$ virmet tidytable --outdir /analyses/VirMetResults_KidneyPairs/virmet_output_170407_M01274_0180_000000000-B4LLY```

Inconsistency of viral reads count.

tax_orgs correctly excludes low coverage and low identity hits, summarise doesn't.