
sting's People

Contributors

apetkau, ar0ch, hspitia, lavanyarishishwar

Forkers

shulp2211 apetkau

sting's Issues

db_util.py not able to download data from PubMLST

Describe the bug
When trying to use db_util.py to download data from PubMLST I encountered an error at the indexer stage.

Command:

./db_util.py fetch --query "Campylobacter jejuni" --out_dir sting-db --build_index

Error:

Database: "Campylobacter jejuni"
 Fetching allele sequences:
 - https://rest.pubmlst.org/db/pubmlst_campylobacter_seqdef/loci/aspA/alleles_fasta -> /home/CSCScience.ca/apetkau/workspace/thesis-benchmarking/sting/STing/scripts/sting-db/campylobacter_jejuni/alleles_fasta.fa
 - https://rest.pubmlst.org/db/pubmlst_campylobacter_seqdef/loci/glnA/alleles_fasta -> /home/CSCScience.ca/apetkau/workspace/thesis-benchmarking/sting/STing/scripts/sting-db/campylobacter_jejuni/alleles_fasta.fa
...
subprocess.CalledProcessError: Command '/home/CSCScience.ca/apetkau/workspace/thesis-benchmarking/lib/bin/STing/indexer -c /home/CSCScience.ca/apetkau/workspace/thesis-benchmarking/sting/STing/scripts/sting-db/campylobacter_jejuni/config.txt -p /home/CSCScience.ca/apetkau/workspace/thesis-benchmarking/sting/STing/scripts/sting-db/campylobacter_jejuni/db/index' returned non-zero exit status 1.

It looks like this happens because every allele FASTA file is being saved to the same file, alleles_fasta.fa, instead of to a separate file per locus, so each download overwrites the previous one. I think the PubMLST API may have changed, leading to this issue. I already have a fix in PR #8.
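The naming problem described above can be illustrated with a minimal sketch. It assumes the REST URLs keep the layout shown in the log (`.../loci/<locus>/alleles_fasta`) and takes the locus name from the path segment after `loci` rather than from the last segment. The helper name `locus_fasta_name` is hypothetical; this is not the code from PR #8 or from db_util.py.

```python
from urllib.parse import urlparse


def locus_fasta_name(url: str) -> str:
    """Return '<locus>.fa' for a PubMLST REST alleles_fasta URL.

    Hypothetical helper: assumes the path looks like
    db/<database>/loci/<locus>/alleles_fasta, as in the log above.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "loci" not in parts or parts.index("loci") + 1 >= len(parts):
        raise ValueError(f"no locus name found in URL: {url}")
    # The segment right after 'loci' is the locus name (aspA, glnA, ...).
    return parts[parts.index("loci") + 1] + ".fa"


if __name__ == "__main__":
    base = "https://rest.pubmlst.org/db/pubmlst_campylobacter_seqdef/loci"
    for locus in ("aspA", "glnA"):
        url = f"{base}/{locus}/alleles_fasta"
        print(url, "->", locus_fasta_name(url))
```

Taking the last path segment would give `alleles_fasta` for every locus, which matches the single `alleles_fasta.fa` file seen in the log.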

To Reproduce
Steps to reproduce the behavior:

  1. Run the above command and you will see the error message.

Expected behavior
I should be able to download and index data from PubMLST.

Additional context
None.

Rewrite git history for publication

For licensing reasons, we may want to rewrite the Git history before we make this repo public. Early in development, some versions of STing were released under a different license, and we would prefer to have everything under a single license throughout the public Git history.

finding rST through rMLST scheme

Hi @ar0ch, I appreciate your work very much. I have successfully built an rMLST database using the STing indexer.
However, the Status values are always no_st_in_table because of mismatches at one or several loci. Is there a way to find the closest rST or the species identity using STing? Thank you very much.
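As an illustration of what "closest rST" could mean, here is a minimal sketch that works outside STing: it compares the called alleles against a profile table and reports the profiles with the fewest mismatching loci. Whether STing offers this itself is not stated in this issue. The function name `closest_profiles` and the dictionary layout are assumptions made for the example.

```python
def closest_profiles(observed, profiles):
    """Return (fewest_mismatches, [profile IDs with that many mismatches]).

    observed: {locus: allele} for one sample (hypothetical layout).
    profiles: {profile_id: {locus: allele}} loaded from a profile table.
    A locus missing from a profile counts as a mismatch.
    """
    best, ids = None, []
    for pid, alleles in profiles.items():
        mismatches = sum(
            1 for locus, allele in observed.items() if alleles.get(locus) != allele
        )
        if best is None or mismatches < best:
            best, ids = mismatches, [pid]
        elif mismatches == best:
            ids.append(pid)
    return best, ids


if __name__ == "__main__":
    # Made-up allele numbers, for illustration only.
    table = {
        "1": {"locusA": "1", "locusB": "2", "locusC": "3"},
        "2": {"locusA": "1", "locusB": "5", "locusC": "9"},
    }
    sample = {"locusA": "1", "locusB": "2", "locusC": "7"}
    print(closest_profiles(sample, table))
```

Ties are all reported, since with many loci several profiles can be equally close.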

Check licensing info

We should check that we're reproducing all the required licensing information for any libraries being used in STing.

subprocess.CalledProcessError

Hi!

I'm testing your tool but got an error message with your test command db_util.py fetch --query "Neisseria spp." --out_dir my_dbs --build_index. Here is my command and the output. Can you help me with this? Thanks!

(base) mkzhouy1@supermicro:/mnt/data/2007-mlst-silke/STing_demo$ ~/git/STing/scripts/db_util.py fetch --query "Neisseria spp." --out_dir my_dbs --build_index
Database: "Neisseria spp."
 Fetching allele sequences: 
 - https://pubmlst.org/data/alleles/neisseria/abcZ.tfa -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/abcZ.fa
 - https://pubmlst.org/data/alleles/neisseria/adk.tfa -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/adk.fa
 - https://pubmlst.org/data/alleles/neisseria/aroE.tfa -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/aroE.fa
 - https://pubmlst.org/data/alleles/neisseria/fumC.tfa -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/fumC.fa
 - https://pubmlst.org/data/alleles/neisseria/gdh.tfa -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/gdh.fa
 - https://pubmlst.org/data/alleles/neisseria/pdhC.tfa -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/pdhC.fa
 - https://pubmlst.org/data/alleles/neisseria/pgm.tfa -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/pgm.fa
 Fetching profiles: 
 - https://pubmlst.org/data/profiles/neisseria.txt -> /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/profile.txt
Building STing index:
/home/mkzhouy1/git/STing/bin/indexer -c /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/config.txt -p /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/db/index
Traceback (most recent call last):
  File "/home/mkzhouy1/git/STing/scripts/db_util.py", line 271, in <module>
    main()
  File "/home/mkzhouy1/git/STing/scripts/db_util.py", line 267, in main
    args.func(dbFileContent, args, parser)
  File "/home/mkzhouy1/git/STing/scripts/db_util.py", line 197, in fetch
    build_index(configFileName, outPrefix)
  File "/home/mkzhouy1/git/STing/scripts/db_util.py", line 245, in build_index
    stderr=subprocess.STDOUT).decode("utf-8")
  File "/home/mkzhouy1/anaconda3/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/mkzhouy1/anaconda3/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '/home/mkzhouy1/git/STing/bin/indexer -c /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/config.txt -p /mnt/data/2007-mlst-silke/STing_demo/my_dbs/neisseria_spp/db/index' returned non-zero exit status 1.
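The traceback above shows only the exit status, not what the indexer printed. Because the call uses `check_output` with `stderr=subprocess.STDOUT`, the child's messages are available on the exception's `output` attribute. The sketch below shows that pattern; `run_and_report` is a hypothetical wrapper, not a function in db_util.py, and the demo uses a stand-in command rather than the real indexer.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, combined stdout+stderr as text)."""
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return 0, out.decode("utf-8", errors="replace")
    except subprocess.CalledProcessError as err:
        # err.output holds whatever the child printed before exiting non-zero.
        return err.returncode, err.output.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Stand-in for the indexer: prints a message and exits with status 1.
    fake = [sys.executable, "-c", "import sys; print('ERROR: example'); sys.exit(1)"]
    code, text = run_and_report(fake)
    print("exit status:", code)
    print(text)
```

Running the indexer command from the log directly in a shell would show the same message without any wrapper.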

Profile file for cgMLST

Hi, related to #11, could you show me what a profile file should look like for cg/wgMLST? I downloaded the wgMLST scheme from the ChewBBACA website and I want to make a STing database out of it. I basically ran this inside the folder of locus fasta files:

ls *.fasta| head | \
  perl -MFile::Basename=basename -lane '
    BEGIN{print "[loci]";} 
    $n=basename($F[0], ".fasta"); 
    $n=~s/INNUENDO_cgMLST-//; 
    print join("\t", $n, $F[0]); 
    END{print "[profile]"; print "profile\tprofile.txt";}
  ' > config.txt

touch profile.txt

And then I get this error:

[gzu2@monolith3 Salmonella_enterica.stringMLST]$ indexer -c config.txt
Loading sequences from sequences files:

N       Loci    #Seqs.  File
1       00031717        15      ./INNUENDO_cgMLST-00031717.fasta
2       00031718        35      ./INNUENDO_cgMLST-00031718.fasta
3       00031719        14      ./INNUENDO_cgMLST-00031719.fasta
4       00031720        42      ./INNUENDO_cgMLST-00031720.fasta
5       00031721        5       ./INNUENDO_cgMLST-00031721.fasta
6       00031722        30      ./INNUENDO_cgMLST-00031722.fasta
7       00031723        17      ./INNUENDO_cgMLST-00031723.fasta
8       00031724        17      ./INNUENDO_cgMLST-00031724.fasta
9       00031725        23      ./INNUENDO_cgMLST-00031725.fasta
10      00031726        11      ./INNUENDO_cgMLST-00031726.fasta

Total sequences loaded: 209

Loading the profiles file...
ERROR: At least 11 columns (a ST column + # loci in config file) are required in the profiles file but only 0 were found.
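The error says the profiles file needs an ST column plus one column per locus in the config, so an empty profile.txt cannot satisfy it. A minimal sketch of writing such a table follows. The header row, the column name `ST`, and the allele numbers are assumptions based on the error message and on the tab-separated PubMLST profile files; the exact format the indexer expects is not confirmed here.

```python
import csv
import os
import tempfile


def write_profile(path, loci, profiles):
    """Write a tab-separated table: an ST column, then one column per locus.

    profiles maps a profile ID to a list of allele numbers in locus order.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["ST"] + list(loci))  # header row (assumed)
        for st, alleles in profiles.items():
            if len(alleles) != len(loci):
                raise ValueError(
                    f"profile {st} has {len(alleles)} alleles, expected {len(loci)}"
                )
            writer.writerow([st] + list(alleles))


if __name__ == "__main__":
    loci = ["00031717", "00031718", "00031719"]  # first three loci listed above
    profiles = {1: [1, 1, 1], 2: [1, 2, 1]}  # made-up allele numbers
    out = os.path.join(tempfile.mkdtemp(), "profile.txt")
    write_profile(out, loci, profiles)
    with open(out) as fh:
        print(fh.read())
```

The locus order in the table should match the order of the entries in the [loci] section of config.txt.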

indexer reports "ERROR: The config file does not contain the [loci] section."

Hello. I am having the same problem as mentioned in a previous issue: the tab-separated config file I created results in ERROR: The config file does not contain the [loci] section. I can't figure out what the problem is, since the Neisseria config file works fine and looks identical to my non-working config file.

Do you have any suggestions for creating the config file?
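When two config files look identical but behave differently, the difference is often invisible: Windows line endings, a byte order mark, or spaces where tabs are expected. The sketch below checks for those. That any of them causes this particular error is an assumption, not something confirmed by the indexer's source; `diagnose_config` is a hypothetical helper.

```python
def diagnose_config(text: str) -> list:
    """Return a list of likely problems in a config file's text."""
    problems = []
    if text.startswith("\ufeff"):
        problems.append("file starts with a UTF-8 byte order mark")
    if "\r" in text:
        problems.append("file contains carriage returns (Windows line endings)")
    lines = text.replace("\r\n", "\n").split("\n")
    # Lines that look like the [loci] header once hidden characters are ignored.
    headers = [ln for ln in lines if ln.strip().lstrip("\ufeff").lower() == "[loci]"]
    if not headers:
        problems.append("no [loci] line found")
    elif all(ln != "[loci]" for ln in lines):
        problems.append("[loci] line has extra whitespace, case, or hidden characters")
    for number, ln in enumerate(lines, 1):
        stripped = ln.strip()
        if not stripped or stripped.startswith("["):
            continue  # blank lines and section headers need no separator
        if "\t" not in ln:
            problems.append(f"line {number} has no tab separator")
    return problems


if __name__ == "__main__":
    import sys

    # newline="" keeps carriage returns so they can be detected.
    with open(sys.argv[1], encoding="utf-8", newline="") as fh:
        for problem in diagnose_config(fh.read()) or ["no obvious problems"]:
            print(problem)
```

Comparing the working Neisseria config and the failing one byte by byte (for example with a hex viewer) would show the same kind of difference.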

Indexer throwing ParseError

Hello! I am trying to use STing to do a cgMLST analysis on Salmonella samples. I downloaded all the FASTA files for the scheme from Enterobase and tried to build a database using the command 'indexer -m MLST -c my_salmonella_db/config.txt'. It reads a couple of FASTA files fine, but then it gets stuck and throws this error: 'terminate called after throwing an instance of 'seqan::ParseError'
what(): Unexpected character 'R' found.
Aborted (core dumped)'. I get the same problem with some of my other FASTA files. I hope you can give me a hint on how to fix it. Thank you!
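'R' is the IUPAC ambiguity code for a purine (A or G), so the message suggests that some alleles contain ambiguous bases. A minimal sketch for locating them follows. Which characters the indexer accepts is an assumption here (the default below is A, C, G, T only), and `unexpected_bases` is a hypothetical helper.

```python
from collections import Counter


def unexpected_bases(fasta_text, allowed="ACGT"):
    """Map each sequence ID to a Counter of characters outside `allowed`."""
    found = {}
    seq_id = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            continue
        bad = Counter(ch for ch in line.upper() if ch not in allowed)
        if bad:
            found.setdefault(seq_id, Counter()).update(bad)
    return found


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path) as fh:
            for seq_id, counts in unexpected_bases(fh.read()).items():
                print(path, seq_id, dict(counts), sep="\t")
```

Run over every FASTA file in the scheme, this lists the alleles that would need to be removed or handled before indexing.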

Proper usage of index

Hi,

I'm new to this software, and I have been trying to make it work for the past two days.
I'm using the vfdb database provided by you.

This is my STing command:

OUTDIR="/path/to/output_directory"
DATABASE="/path/to/STing_datasets-20200718/vfdb/db/gene_set"
READ_BASE="/path/to/fastq/reads"
PREFIX="F3_F_R1"

./programs/STing/typer \
--index-prefix ${DATABASE} \
--fastq-1-files ${READ_BASE}/${PREFIX}.fastq \
--sample-name ${PREFIX} \
--sensitive \
--kmer-length 30 \
--n-top-alleles 2 \
--output-file ${OUTDIR}/${PREFIX}.out \
--print-tidy

I am a bit confused about how to use the --index-prefix option. From what I could gather, it should point at the database, and all files in that database share a prefix that should be given as a basename (similar to the bowtie2-build basename structure).

In this case it is the one I specify in DATABASE, but I get this error:

ERROR: Could not open the input file '/path/to/STing_datasets-20200718/vfdb/db/gene_set.prof_idx.sa'
ERROR: Index file '/path/to/STing_datasets-20200718/vfdb/db/gene_set.prof_idx.sa' not found.

I "masked" the paths leaving only the relevant parts in. The paths are correct, i.e. that's my index prefix and all of the other files exist, except for this one.
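To see which index files exist for a given prefix, a short sketch like the one below can help. It lists the suffixes of all files that start with the prefix, so a missing `.prof_idx.sa` stands out. It does not know which files the typer requires; the demo file names other than `.prof_idx.sa` are made up, and `index_files` is a hypothetical helper.

```python
import glob
import os
import tempfile


def index_files(prefix):
    """Return the sorted suffixes of existing files whose path starts with prefix."""
    matches = glob.glob(glob.escape(prefix) + "*")
    return sorted(p[len(prefix):] for p in matches if os.path.isfile(p))


if __name__ == "__main__":
    # Demo with a temporary directory; the suffixes here are illustrative only.
    demo_dir = tempfile.mkdtemp()
    prefix = os.path.join(demo_dir, "gene_set")
    for suffix in (".example_a", ".example_b"):
        open(prefix + suffix, "w").close()
    print(index_files(prefix))
    print(".prof_idx.sa" in index_files(prefix))
```

With the real prefix from the command above, the output shows at a glance whether any profile index files were built for this database.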

Could not open the input file 'ERR026529_1.fastq.gz'

typer -x my_dbs/neisseria_spp/db/index -1 ERR026529_1.fastq.gz -2 ERR026529_2.fastq.gz -s ERR026529 -c -a -d -t ERR026529.depth.tsv -y -o ERR026529.results.tsv --sensitive
ERROR: Could not open the input file 'ERR026529_1.fastq.gz'

The file fails to open when I run the above command.
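Two different causes can produce this message: the file is not at the given path relative to the working directory, or the file is there but the program cannot read its format. The sketch below separates the two by checking existence, read permission, and the gzip magic bytes. Whether a given typer build can read gzip-compressed input is not stated in this issue, so that part is an open question rather than a diagnosis. `check_reads_file` is a hypothetical helper.

```python
import gzip
import os
import tempfile


def check_reads_file(path):
    """Classify a reads file as 'missing', 'unreadable', 'gzip', or 'plain'."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # Gzip streams start with the bytes 0x1f 0x8b.
    return "gzip" if magic == b"\x1f\x8b" else "plain"


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    gz_path = os.path.join(demo_dir, "demo_1.fastq.gz")
    with gzip.open(gz_path, "wb") as fh:
        fh.write(b"@read1\nACGT\n+\nIIII\n")
    print(check_reads_file(gz_path))
    print(check_reads_file(os.path.join(demo_dir, "absent.fastq.gz")))
```

If the file is reported as 'gzip', decompressing it and passing the plain .fastq file is one way to test whether compression is the issue.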

Missing files

No file is found at the following link. Kindly provide the files.

wget https://github.com/jordanlab/STing/releases/download/1.0.0/sting_v1.0.0.tar.gz

A 404 Not Found error occurred.

See the error below:

yasirniazi@yasirniazi:~/STing-master$ wget https://github.com/jordanlab/STing/releases/download/1.0.0/sting_v1.0.0.tar.gz
--2020-10-29 19:46:51-- https://github.com/jordanlab/STing/releases/download/1.0.0/sting_v1.0.0.tar.gz
Resolving github.com (github.com)... 13.234.176.102
Connecting to github.com (github.com)|13.234.176.102|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-10-29 19:46:53 ERROR 404: Not Found.

cgMLST schemes?

Could you explain a little bit how to download a cgMLST scheme for Neisseria spp. for use with this tool? I'm interested in doing this for a few samples, but the documentation seems relevant only to traditional 7-locus MLST. As a long-time StringMLST user, thanks for this tool!
