mdu-phl / lissero



In silico serotype prediction for Listeria monocytogenes

License: GNU General Public License v3.0

Python 98.03% Shell 1.97%

listeria-monocytogenes bioinformatics microbial-genomics serotyping public-health

lissero's People

kwongj redmar-van-den-berg andersgs lindechun senasica-cnrpyc kristyhoran vikash84 vincenzopennone sap-phe-bioinformatics

lissero's Issues

Vomit stacktrace when unknown command?

lissero something
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/lissero", line 8, in <module>
    sys.exit(run_lissero())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/lissero/run_lissero.py", line 47, in run_lissero
    path_serodb = os.path.realpath(serotype_db)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/posixpath.py", line 394, in realpath
    filename = os.fspath(filename)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Add to Bioconda

Create a recipe for Bioconda.

Here are some docs: https://bioconda.github.io/contributor/guidelines.html

Here is the emmtyper recipe as a model: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/emmtyper/meta.yaml

You will need to start by forking the Bioconda repo https://github.com/bioconda/, and creating a new branch (lissero).

Once you have finished creating and testing the recipe, make sure to squash your commits (git rebase -i master). Here is a good instruction set to keep handy: https://github.com/wprig/wprig/wiki/How-to-squash-commits. Then you can open the PR against the main Bioconda repo. @tseemann can help.

"run_lissero somestring" gives vomit backtrace

linuxbrew@deepthought:~ $ run_lissero  foo
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/run_lissero", line 8, in <module>
    sys.exit(run_lissero())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/lissero/run_lissero.py", line 52, in run_lissero
    path_serodb = os.path.realpath(serotype_db)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/posixpath.py", line 394, in realpath
    filename = os.fspath(filename)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Discrepancy between Doumith and lissero

I have created the following table to show how LisSero interprets each combination of amplicons, compared to the Doumith paper. As you can see, LisSero assigns a serotype to many combinations that do not appear in the Doumith paper. Is this classification documented somewhere, or does it come from another publication?

gene combinations	doumith	lissero
Prs	?4a,4c	?4a,4c
ORF2110	?	4b,4d,4e
ORF2110,Prs	?	4b,4d,4e
ORF2819	1/2b,3b,7	1/2b,3b,7
ORF2819,Prs	1/2b,3b,7	1/2b,3b,7
ORF2819,ORF2110	4b,4d,4e	4b,4d,4e
ORF2819,ORF2110,Prs	4b,4d,4e	4b,4d,4e
lmo0737	1/2a,3a	1/2a,3a
lmo0737,Prs	1/2a,3a	1/2a,3a
lmo0737,ORF2110	?	4b,4d,4e
lmo0737,ORF2110,Prs	?	4b,4d,4e
lmo0737,ORF2819	?	4b,4d,4e
lmo0737,ORF2819,Prs	?	4b,4d,4e
lmo0737,ORF2110,ORF2819	?	4b,4d,4e
lmo0737,ORF2110,ORF2819,Prs	?	4b,4d,4e
lmo1118,Prs	?	1/2c,3c
lmo1118,ORF2110	?	1/2c,3c
lmo1118,lmo0737	1/2c,3c	1/2c,3c
lmo1118,lmo0737,Prs	1/2c,3c	1/2c,3c
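
To make the discrepancy easier to check, the "doumith" column of the table above can be written as a lookup. This is only a sketch built from the rows in this issue: the names `DOUMITH` and `doumith_call` are hypothetical, and it does not reflect how LisSero itself assigns serotypes.

```python
from typing import Iterable

# Rows copied from the "doumith" column of the table in this issue.
# Combinations marked "?" in that column are deliberately left out.
DOUMITH = {
    frozenset({"Prs"}): "?4a,4c",
    frozenset({"ORF2819"}): "1/2b,3b,7",
    frozenset({"ORF2819", "Prs"}): "1/2b,3b,7",
    frozenset({"ORF2819", "ORF2110"}): "4b,4d,4e",
    frozenset({"ORF2819", "ORF2110", "Prs"}): "4b,4d,4e",
    frozenset({"lmo0737"}): "1/2a,3a",
    frozenset({"lmo0737", "Prs"}): "1/2a,3a",
    frozenset({"lmo1118", "lmo0737"}): "1/2c,3c",
    frozenset({"lmo1118", "lmo0737", "Prs"}): "1/2c,3c",
}


def doumith_call(amplicons: Iterable[str]) -> str:
    """Return the table's Doumith call for a set of amplicons, or "?" if unlisted."""
    return DOUMITH.get(frozenset(amplicons), "?")
```

Any combination that returns "?" here but gets a serotype from LisSero is one of the rows this issue is asking about.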

LISSERO_DB environment variable could not be found

From Christhian Ulises Franco Frias:

The reason for my email is to ask a question about LisSero.
I installed the tool and all its dependencies on my computer.
However, when I run it, the following message appears:

2020-08-11 11:01:34.867 | ERROR    | lissero.run_lissero:run_lissero:109 - Please provide a correct serotype db path or set correct PATH for LISSERO_DB
Greetings, and hopefully you can help me, please.
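
One way the DB lookup could fail more gently is to resolve the path in a single place and raise a readable error when nothing usable is found. The sketch below is an assumption about how this might be done, not LisSero's actual code; `resolve_db_path` is a hypothetical helper, and only the `LISSERO_DB` variable name comes from the error message above.

```python
import os
from typing import Mapping, Optional


def resolve_db_path(serotype_db: Optional[str],
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the absolute DB directory, or raise ValueError with a clear message."""
    # Prefer an explicit path, then fall back to the LISSERO_DB variable.
    candidate = serotype_db or env.get("LISSERO_DB")
    if not candidate:
        raise ValueError("No serotype DB given: pass a DB path or set LISSERO_DB.")
    path = os.path.realpath(candidate)
    if not os.path.isdir(path):
        raise ValueError(f"Serotype DB directory not found: {path}")
    return path
```

Checking for a missing value before calling `os.path.realpath` also avoids the `TypeError` on `NoneType` reported in other issues here.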

there should be no spaces after commas in the same field

Hi Jason.

When printing multiple elements that are in a field separated by commas, there should be no space between the comma and the record. That makes parsing the output a bit of a nightmare.

So:

1/2b, 3b, 7

Should be:

1/2b,3b,7

So, the whole line looks like this:

contigs.fa   1/2b,3b,7     ORF2819,Prs

Anders.
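
A minimal sketch of the requested formatting, assuming a tab-separated row with comma-joined fields; `format_row` is a hypothetical name, not a function from the LisSero code base.

```python
from typing import Sequence


def format_row(sample: str, serotypes: Sequence[str], amplicons: Sequence[str]) -> str:
    """Join multi-value fields with bare commas and columns with tabs."""
    return "\t".join([sample, ",".join(serotypes), ",".join(amplicons)])
```

With no space after the comma, a consumer can split each row on tabs and each field on "," without stripping whitespace.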

run_lissero on non-fasta gives vomit

run_lissero  not_fasta_but_exists.txt

Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/run_lissero", line 8, in <module>
    sys.exit(run_lissero())
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/lissero/run_lissero.py", line 52, in run_lissero
    path_serodb = os.path.realpath(serotype_db)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/posixpath.py", line 394, in realpath
    filename = os.fspath(filename)
TypeError: expected str, bytes or os.PathLike object, not NoneType
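
The traceback above fails on the DB path rather than on the input file, but a cheap input check before running anything would still give a clearer message for non-FASTA files. The following is a sketch under that assumption; `looks_like_fasta` is a hypothetical helper and only inspects the first non-blank line.

```python
def looks_like_fasta(path: str) -> bool:
    """Return True if the first non-blank line of the file starts with '>'."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if line.strip():
                    return line.startswith(">")
    except OSError:
        # Unreadable or missing files are reported as "not FASTA".
        return False
    return False  # empty file
```

This is a heuristic only: it does not validate the sequence lines, and it would not recognise compressed input.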

mistake

If you download the file from GitHub, you must modify lines 13, 14, and 15: delete lissero there.

In line 16, delete the . in init.

BLASTN issue

Hi Anders,

There seems to be an issue with BLASTN...

When I tried to run the job, it looks like the failing bit is at this step.

$ lissero test.fna
2021-07-14 12:01:45.731 | INFO     | lissero.scripts.Serotype:_load_log:285 - Successfully loaded DB log!
2021-07-14 12:01:45.731 | INFO     | lissero.scripts.Serotype:check_db:260 - DB was created on 2020-06-04T11:57:55.145347
2021-07-14 12:01:45.731 | INFO     | lissero.scripts.Serotype:check_db:261 - DB is ready for use.
2021-07-14 12:01:45.732 | INFO     | lissero.scripts.Sample:get_serotype:29 - Serotyping: test.fna
2021-07-14 12:01:45.732 | INFO     | lissero.scripts.Blast:run:52 - Blastn is running: /home/users/iidlleo/miniconda3/envs/phetype/bin/blastn -db /home/users/iidlleo/miniconda3/envs/phetype/lib/python3.6/site-packages/lissero/db/lissero -ungapped -culling_limit 1 -outfmt 6 qaccver saccver length slen pident -dust no -query /scratch/iidlleo/eyre/test.fna
2021-07-14 12:01:45.743 | CRITICAL | lissero.scripts.Blast:run:55 - Failed to run /home/users/iidlleo/miniconda3/envs/phetype/bin/blastn -db /home/users/iidlleo/miniconda3/envs/phetype/lib/python3.6/site-packages/lissero/db/lissero -ungapped -culling_limit 1 -outfmt 6 qaccver saccver length slen pident -dust no -query /scratch/iidlleo/eyre/test.fna

I am using BLAST v2.5.0, and I can find the DBs if I list the directory:

$ ls /home/users/iidlleo/miniconda3/envs/phetype/lib/python3.6/site-packages/lissero/db/
lissero_db.json  lissero.nhr  lissero.nog  lissero.not  lissero.ntf  sequences.fasta
lissero.ndb      lissero.nin  lissero.nos  lissero.nsq  lissero.nto

Missing pkg_resources module when installing in fresh conda environment using conda

This is caused by missing setuptools in a brand new environment.

We should migrate to importlib.resources, but that would mean setting Python 3.7 as the minimum requirement.

In the meantime, we need to add setuptools as a dependency to ensure proper installation from conda.
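
As a rough illustration of the migration, a `pkg_resources.resource_filename` call could be replaced with a small wrapper around `importlib.resources`. This sketch uses `importlib.resources.files`, which needs Python 3.9 or later (on 3.7 the older `importlib.resources.path` context manager is the available API); `resource_path` is a hypothetical helper name.

```python
from importlib.resources import files


def resource_path(package: str, name: str) -> str:
    """Return the filesystem path of a resource shipped inside a package."""
    # Equivalent in spirit to pkg_resources.resource_filename(package, name),
    # assuming the package is installed as regular files (not inside a zip).
    return str(files(package).joinpath(name))
```

For LisSero the call would presumably look like `resource_path("lissero", "db")`, which removes the runtime dependency on setuptools.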

lissero 0.4.0 on PyPI not on GitHub

pkg_resource

When I run the command python3 run_lissero.py (in a terminal, not in a conda environment), I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 349, in get_provider
    module = sys.modules[moduleOrReq]
KeyError: 'lissero'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/chris_franco/bin/programas_bioinformaticos/LisSero/lissero/run_lissero.py", line 20, in <module>
    DEFAULT_DB = pkg_resources.resource_filename("lissero", "db")
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1135, in resource_filename
    return get_provider(package_or_requirement).get_resource_filename(
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 351, in get_provider
    __import__(moduleOrReq)
ModuleNotFoundError: No module named 'lissero'

Do you have any suggestions for fixing this?

lissero --version Does not work

Usage: lissero [OPTIONS] FASTA...
Try 'lissero --help' for help.
Error: no such option: --version

JOSS submission

Requirements:

The software should be open source as per the OSI definition.
The software should have an obvious research application.
You should be a major contributor to the software you are submitting.
The software should be a significant contribution to the available open source software that either enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler).
The software should be feature-complete (no half-baked solutions) and designed for maintainable extension (not one-off modifications). Minor ‘utility’ packages, including ‘thin’ API clients, and single-function packages are not acceptable.
Your paper (paper.md and BibTeX files, plus any figures) must be hosted in a Git-based repository. Placing these items together with your software (rather than in a separate repository) is strongly encouraged.

Your paper should include:

A list of the authors of the software and their affiliations, using the correct format (see the example below).
A summary describing the high-level functionality and purpose of the software for a diverse, non-specialist audience.
A clear Statement of Need that illustrates the research purpose of the software.
A list of key references, including to other software addressing related needs.
Mention (if applicable) a representative set of past or ongoing research projects using the software and recent scholarly publications enabled by it.
Acknowledgement of any financial support.

Example paper: https://joss.readthedocs.io/en/latest/submitting.html#example-paper-and-bibliography

Create a GitHub action to run validation

Create a GitHub Action called validation.yml that will run the tool against all the validation genomes and assert that we get the right results.

I would add the assemblies to the repo (but not package them) under validations/data/<assembler_version>/<sample_id>/contigs.fa, relative to the root folder.

Then add a csv with the expected results in validations/data/pcr-results.csv.

It will then need a small NextFlow pipeline that runs the latest lissero on the controls, and on each of the genomes.

It then compares the output of lissero with the expected output to assert they are the same.

I believe @kristyhoran has a template of what the validation report might look like, so the NextFlow pipeline would ideally output that in PDF. The plain text version of the filled-in template (before conversion to PDF) would be committed to validations/reports/ along with a CSV of the raw data (i.e., one row per sample with the sample ID, the expected result, the observed result, and whether they match or not).

The NextFlow pipeline should be in validations/validation.nf and the config file should be validations/config.nextflow.

I appreciate there is a lot to this issue. I would start with just getting the NextFlow pipeline working on a local clone of the repo, and then work towards getting it working with the GitHub Actions.
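
The comparison step described above could be sketched as follows. The column names (`sample_id`, `expected`, `observed`, `match`) and both function names are assumptions chosen for illustration; the issue does not fix a CSV layout.

```python
import csv
import io
from typing import Dict, List, Mapping

FIELDS = ["sample_id", "expected", "observed", "match"]


def compare_results(expected: Mapping[str, str],
                    observed: Mapping[str, str]) -> List[Dict[str, object]]:
    """Build one row per expected sample, flagging whether the calls agree."""
    rows = []
    for sample_id in sorted(expected):
        want = expected[sample_id]
        got = observed.get(sample_id, "")  # missing samples count as mismatches
        rows.append({"sample_id": sample_id, "expected": want,
                     "observed": got, "match": want == got})
    return rows


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialise the comparison rows as CSV text for the raw-data report."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A validation job could then fail whenever any row has `match` set to false, for example with `assert all(row["match"] for row in rows)`.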

Clarify and detect min blast version

Fails on BLAST 2.5.
Please make lissero check the BLAST version (blastn -version) before running,
and add the minimum supported version to the docs.
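
A version check along these lines could run before any search is started. This is a sketch, not the project's implementation: the helper names are hypothetical, and the minimum version passed to `is_supported` is a placeholder, since the issue only establishes that 2.5 fails.

```python
import re
import subprocess
from typing import Tuple


def parse_blast_version(text: str) -> Tuple[int, ...]:
    """Extract the first x.y.z version number from `blastn -version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"Could not find a version number in: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_blast_version(executable: str = "blastn") -> Tuple[int, ...]:
    """Run `<executable> -version` and parse the reported version."""
    result = subprocess.run([executable, "-version"],
                            capture_output=True, text=True, check=True)
    return parse_blast_version(result.stdout)


def is_supported(version: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Compare versions as integer tuples; `minimum` is supplied by the caller."""
    return version >= minimum
```

Comparing integer tuples avoids the string-comparison trap where "2.10.0" sorts before "2.9.0".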

Use the serogroup DB in Pasteur instance of BigsDB

Pasteur is curating FASTA files with various alleles of the Listeria serogroups.

https://bigsdb.pasteur.fr/api/db/pubmlst_listeria_seqdef/schemes/4

Fix docs space after ## headings

EMBOSS is alive and well

http://emboss.sourceforge.net/download/

Maybe it was just an outage at sf.net?

issues downloading the dependency EMBOSS PrimerSearch

Dear Jason,

First of all, congratulations for LisSero. It will be a very useful tool!!!

I am trying to install it in a Unix environment, but the link provided in the HELP file for downloading EMBOSS PrimerSearch is not working (and consequently, LisSero doesn't run!).

Could you kindly provide the correct link, or an alternative way to install PrimerSearch?

I am looking forward to testing LisSero!!!

Best regards,

Vítor Borges

Add controls options

We need the following three flags added to the tool before we can publish:

--positive-control
--negative-control
--controls

The first one should run a positive control, which can be the EGD-e strain (https://www.ncbi.nlm.nih.gov/assembly/GCF_000196035.1/).

The second one runs two negative controls: the Listeria innocua Clip11262 strain (https://www.ncbi.nlm.nih.gov/assembly/GCF_000195795.1/) and the Salmonella enterica subsp. enterica serovar Typhimurium LT2 strain (https://www.ncbi.nlm.nih.gov/assembly/GCF_000006945.2).

Download these genomes, and add them to the repo to be part of the packaged data.

When the user passes the --positive-control flag, it will run the EGD-e strain and should assert that the result is correct before going forward (I believe it is 1/2a), or gracefully exit with an error.

When the user adds the --negative-control flag, it will run the Listeria innocua and Salmonella enterica genomes, both of which should come back as Nontypeable. That particular Listeria innocua strain lacks the Prs gene. It should assert we get the expected values before going forward, or gracefully exit with an error.

The option --controls runs both --positive-control and --negative-control.
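
The "assert or exit gracefully" behaviour could be sketched like this. The control labels, `EXPECTED_CONTROLS`, and both function names are hypothetical; the expected values are taken from this issue, including its tentative "1/2a" for the positive control, and may need adjusting to match what the tool actually prints.

```python
import sys
from typing import List, Mapping

# Expected calls as stated in the issue; the positive-control value is the
# issue author's tentative guess ("I believe it is 1/2a").
EXPECTED_CONTROLS = {
    "positive: L. monocytogenes EGD-e": "1/2a",
    "negative: L. innocua Clip11262": "Nontypeable",
    "negative: S. Typhimurium LT2": "Nontypeable",
}


def failed_controls(observed: Mapping[str, str],
                    expected: Mapping[str, str] = EXPECTED_CONTROLS) -> List[str]:
    """Return the names of controls whose observed call differs from the expected one."""
    return [name for name, want in expected.items() if observed.get(name) != want]


def check_controls_or_exit(observed: Mapping[str, str]) -> None:
    """Exit with a short message (no traceback) if any control fails."""
    failures = failed_controls(observed)
    if failures:
        sys.exit("Control check failed for: " + ", ".join(failures))
```

Passing a string to `sys.exit` prints it to stderr and sets a non-zero exit status, which keeps the failure readable for users and detectable in pipelines.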