iqbal-lab-org / bigsi
This project forked from phelimb/bigsi
BItsliced Genomic Signature Index - Efficient indexing and search in very large collections of WGS data
License: MIT License
This is the test coverage report from PR #1, which adds more tests to BIGSI. Source files are sorted from lowest to highest test coverage:
Coverage report.pdf
We should improve the test coverage in many files.
I wonder whether you find the tests cumbersome to run because they use real DBs. One option would be to mock the DBs, although it is also good to have tests that use real DBs. Many of the current tests are really integration tests: they exercise not only the function or method under test but also every function and method it calls. I could start adding unit tests with mocking if you wish, although integration tests tend to be more complete, if less specific, than unit tests.
Anyway, this issue is meant as a discussion of how we should improve the testing framework.
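To make the unit-test option concrete, here is a minimal sketch of a test that replaces the storage backend with a mock, so no real DB is needed. SampleLookup is a hypothetical class written only for this example and is not part of BIGSI; the key layout "metadata:<colour>:string" is assumed from the key names that show up in the tracebacks reported in other issues.

```python
import unittest
from unittest import mock


class SampleLookup:
    """Hypothetical class (not BIGSI code): maps a colour id to a sample name."""

    def __init__(self, storage):
        # storage is anything that supports storage[key] -> bytes
        self.storage = storage

    def colour_to_sample(self, colour):
        key = "metadata:%d:string" % colour  # assumed key layout
        return self.storage[key].decode("utf-8")


class TestSampleLookupUnit(unittest.TestCase):
    def test_colour_to_sample_reads_expected_key(self):
        # The storage backend is a mock, so the test never touches a real DB.
        storage = mock.MagicMock()
        storage.__getitem__.return_value = b"s1"

        lookup = SampleLookup(storage)

        self.assertEqual(lookup.colour_to_sample(0), "s1")
        # The unit test can also assert exactly how storage was queried,
        # which an integration test against a real DB usually cannot.
        storage.__getitem__.assert_called_once_with("metadata:0:string")
```

Saved in a test module, this runs with python -m unittest. The same test body could be reused against a real backend to keep an integration variant alongside the mocked one.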
We had issues caused by dependency updates and API changes that modified BIGSI's behaviour. I feel very strongly that users and devs should have exactly the same environment, so that execution and issues are reproducible and we don't hit unexpected bugs. The only way to fully achieve this is to require everyone to use the containers, but I don't think that is feasible. So we should at least pin the versions of the Python dependencies, so that the Python environment is always the same.
I don't think pinning the versions carries the downside of our dependencies never getting upgraded: we are simply taking control of the versions. We become responsible for upgrading the dependencies from time to time, but before each upgrade we can check that everything still works by running a comprehensive test suite.
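As a rough illustration of what pinned versions buy us, the sketch below checks an environment against a list of exact pins. The helper names parse_pins and find_mismatches are made up for this example, and the version numbers in the usage lines are placeholders, not the versions BIGSI actually needs.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines, skipping blank lines and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError("not an exact pin: %r" % line)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (pinned, installed_or_None)} for packages that deviate."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Placeholder pins: the package names appear in the tracebacks in this
# tracker, but the version numbers are purely illustrative.
example_pins = parse_pins(["hug==0.0.0  # placeholder", "bsddb3==0.0.0"])
for name, (pinned, installed) in find_mismatches(example_pins).items():
    print("%s: pinned %s, installed %s" % (name, pinned, installed))
```

A check like this could run at the start of the test suite, so that a drifting environment is reported before any behavioural test fails for an unrelated-looking reason.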
I want to build a BIGSI index for a large dataset, and I would like to know whether BIGSI supports building Bloom filters directly from sequence files (FASTA/FASTQ). As far as I can tell from reading the code, nothing relates to this except a statement in the bloom function's docstring. In general, BIGSI requires cortex graphs to build Bloom filters. Can you please clarify this?
Thank you!
Hi,
I tried to follow the manual at https://bigsi.readme.io/docs/your-first-bigsi and ran BIGSI in the Docker image (phelimb/bigsi:63768c2).
I got the following error in the search step:
❯ docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi search --config /data/configs/berkeleydb.yaml CGGCGAGGAAGCGTTAAATCTCTTTCTGACG
Traceback (most recent call last):
File "/usr/local/bin/bigsi", line 11, in <module>
load_entry_point('bigsi==0.3.2', 'console_scripts', 'bigsi')()
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/__main__.py", line 178, in main
File "hug/api.py", line 390, in hug.api.CLIInterfaceAPI.__call__
File "hug/interface.py", line 551, in hug.interface.CLI.__call__
File "hug/interface.py", line 547, in hug.interface.CLI.__call__
File "hug/interface.py", line 100, in hug.interface.Interfaces.__call__
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/__main__.py", line 158, in search
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/graph/bigsi.py", line 133, in __init__
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/graph/index.py", line 23, in __init__
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/matrix/bitmatrix.py", line 16, in __init__
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/storage/base.py", line 67, in get_integer
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/storage/base.py", line 21, in __getitem__
File "/usr/local/lib/python3.7/dist-packages/bsddb3/__init__.py", line 239, in __getitem__
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
File "/usr/local/lib/python3.7/dist-packages/bsddb3/dbutils.py", line 67, in DeadlockWrap
return function(*_args, **_kwargs)
File "/usr/local/lib/python3.7/dist-packages/bsddb3/__init__.py", line 239, in <lambda>
return _DeadlockWrap(lambda: self.db[key]) # self.db[key]
KeyError: b'number_of_rows:int'
Here is the output of the build step:
❯ docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi build --config /data/configs/berkeleydb.yaml /data/test1.bloom /data/test2.bloom -s s1 -s s2
INFO:bigsi.cmds.build:Building index: 0/1
DEBUG:bigsi.cmds.build:Loading /data/test1.bloom/test1.bloom
DEBUG:bigsi.cmds.build:Loading /data/test2.bloom/test2.bloom
DEBUG:bigsi.graph.bigsi:Insert sample metadata
DEBUG:bigsi.graph.bigsi:Create signature index
DEBUG:bigsi.graph.index:Transpose bitarrays
DEBUG:bigsi.graph.index:Insert rows
DEBUG:bigsi.storage.base:set bitarrays
I also tried the latest Docker image, phelimb/bigsi:310ef4c, and it failed at the build step:
❯ docker run -v $PWD/example-data:/data phelimb/bigsi:310ef4c bigsi build --config /data/configs/berkeleydb.yaml /data/test1.bloom /data/test2.bloom -s s1 -s s2
Traceback (most recent call last):
File "/usr/local/bin/bigsi", line 11, in <module>
load_entry_point('bigsi==0.3.8', 'console_scripts', 'bigsi')()
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.8-py3.7.egg/bigsi/__main__.py", line 324, in main
File "/usr/local/lib/python3.7/dist-packages/hug/api.py", line 439, in __call__
result = self.commands.get(command)()
File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 631, in __call__
raise exception
File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 627, in __call__
result = self.output(self.interface(**pass_to_function), context)
File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 123, in __call__
return __hug_internal_self._function(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.8-py3.7.egg/bigsi/__main__.py", line 157, in build
AssertionError
Could you please provide any suggestions?
Hello,
I work at a ministry of health, in pathogen detection.
I read about BIGSI (congratulations!), and I think it could be useful.
However, I didn't find the docs very useful (sorry), and the demo link https://bigsi.readme.io/ doesn't work.
We need to detect the presence of specific genes in 600,000 Salmonella genomes.
I am in charge of finding the best tool to do that.
Could you please tell me whether you think this is doable with BIGSI, how much disk space I would need,
and (approximately, of course) how long it would take?
Thank you very much
David
Hi,
I have downloaded the all-microbial-bigsi-v03* files from the FTP at ftp://ftp.ebi.ac.uk/pub/software/bigsi/nat_biotech_2018/all-microbial-index-v03/ and concatenated them to get a combined-index file, which I then referenced using the template config from FTP:
h: 3
m: 25000000
nproc: 4
k: 31
storage-engine: berkeleydb
storage-config:
  filename: /media/disk2/combined-index # cat * > combined-index
  flag: "c" ## Change to 'r' for read-only access
However, I get the following error when I use this config to run a search:
(base) joe@fractal:~/BIGSI$ bigsi search -c config.yaml CGGCGAGGAAGCGTTAAATCTCTTTCTGACG
Traceback (most recent call last):
File "/home/joe/anaconda3/bin/bigsi", line 33, in <module>
sys.exit(load_entry_point('bigsi==0.3.8', 'console_scripts', 'bigsi')())
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/__main__.py", line 402, in main
File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/api.py", line 441, in __call__
result = self.commands.get(command)()
File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/interface.py", line 650, in __call__
raise exception
File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/interface.py", line 646, in __call__
result = self.output(self.interface(**pass_to_function), context)
File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/interface.py", line 129, in __call__
return __hug_internal_self._function(*args, **kwargs)
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/__main__.py", line 283, in search
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/__main__.py", line 66, in search_bigsi
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/bigsi.py", line 181, in search
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/bigsi.py", line 196, in exact_filter
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/bigsi.py", line 208, in get_sample_list
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 70, in colours_to_samples
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 71, in <dictcomp>
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 59, in colour_to_sample
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 96, in _get_string
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/storage/base.py", line 84, in get_string
File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/storage/base.py", line 21, in __getitem__
KeyError: b'metadata:447931:string'
Are you able to provide any guidance on what I might be doing wrong?
Thanks for your work on a really fascinating research paper.
Could an updated conda package please be built from this branch of the code? The one currently available is over a year old.
Hi all,
When running:
bigsi bulk_search -c config_10K_00.yaml -f csv -t 0.0 --score True foo.fas
with the files from the paper
https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000499
I get this error:
Traceback (most recent call last):
File "/usr/local/bin/bigsi", line 11, in <module>
load_entry_point('bigsi==0.3.5', 'console_scripts', 'bigsi')()
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.5-py3.7.egg/bigsi/__main__.py", line 307, in main
File "/usr/local/lib/python3.7/dist-packages/hug/api.py", line 399, in __call__
result = self.commands.get(command)()
File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 546, in __call__
raise exception
File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 542, in __call__
result = self.output(self.interface(**pass_to_function), context)
File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 100, in __call__
return __hug_internal_self._function(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.5-py3.7.egg/bigsi/__main__.py", line 259, in bulk_search
File "/usr/local/lib/python3.7/dist-packages/pyfasta/fasta.py", line 67, in __init__
raise FastaNotFound('"' + fasta_name + '"')
pyfasta.fasta.FastaNotFound: "True"
Looking at the code, --score should accept True or False. Without --score it works properly.
What am I doing wrong?
Thank you in advance,
Alex
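One possible reading of the traceback above is that --score is handled as a boolean switch that takes no value, so the word True is left over and ends up in the FASTA position. The sketch below reproduces that pattern with argparse purely as an analogy: BIGSI's CLI is built on hug, and whether hug behaves this way for bulk_search in 0.3.5 is an assumption, not something confirmed here.

```python
import argparse


def build_parser():
    # Stand-in for the bulk_search command; BIGSI itself uses hug, not argparse.
    parser = argparse.ArgumentParser(prog="bulk_search_sketch")
    parser.add_argument("fasta", help="path to the query FASTA file")
    # A boolean switch: its presence means True and it consumes no value.
    parser.add_argument("--score", action="store_true")
    return parser


parser = build_parser()

# The switch takes no value, so the string "True" fills the positional slot.
args = parser.parse_args(["--score", "True"])
print("score=%r fasta=%r" % (args.score, args.fasta))

# Passing the switch on its own keeps the FASTA path where it belongs.
args = parser.parse_args(["--score", "foo.fas"])
print("score=%r fasta=%r" % (args.score, args.fasta))
```

If the same thing is happening here, calling bulk_search with --score alone, without True after it, would be worth trying.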
Details here: #2 (comment)