

Contributors

peterthorpe5


public_scripts's Issues

Warning messages - BLAST data is space separated

Dear Peter,

First of all, thank you for your scripts!
I have just used Diamond_blast_to_taxid.py, but the result was an empty output file and a log file full of warning messages. Below is an excerpt from the log file:

WARNING: changing to an Unknown tax_id 32644
WARNING: Your BLAST data is space separated. This is weird
WARNING: Your BLAST data is space separated. This is weird
WARNING: try updating your tax info tax_id database file
WARNING: tax_id for ��<o��˒>��" is not found in database
WARNING: changing to an Unknown tax_id 32644
WARNING: try updating your tax info tax_id database file
�,;ݓ7���S��G<��~Ok� is not found in databaseNITY_DN106828_c0_g1_i1p�R��4���
WARNING: changing to an Unknown tax_id 32644

I have followed each step for downloading and formatting the databases.
Do you have any suggestions for resolving these issues?
Thank you again!

Sincerely,

George
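The unreadable bytes in the log above suggest that at least part of the input was not plain text. Whether that is the actual cause here is not confirmed anywhere in this thread. As a quick check before running the script, a minimal sketch like the following reports whether a file looks binary (for example gzip-compressed). The function name `looks_binary` and the sample size are assumptions made for illustration and are not part of public_scripts.

```python
import codecs


def looks_binary(path, sample_size=4096):
    """Return True if the first bytes of `path` do not look like plain text."""
    with open(path, "rb") as handle:
        chunk = handle.read(sample_size)
    if chunk.startswith(b"\x1f\x8b"):  # gzip magic number
        return True
    if b"\x00" in chunk:  # NUL bytes rarely occur in tabular text output
        return True
    # Incremental decoding tolerates a multi-byte character cut off at the
    # end of the sample, so only genuinely invalid UTF-8 is flagged.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

If this returns True for the DIAMOND output or for one of the database files, the file probably needs to be decompressed or converted to tabular text first.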

Confirm that use of BLAST's `-max_target_seqs` is intentional

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

Based on the recently published report, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.

Thank you!
-- Arman (armish/blast-patrol)

ValueError: not enough values to unpack (expected 4, got 2)

Hi, I'm currently using your program, but I ran into an issue like this one:

INFO: Starting testing: Sat May 12 20:50:59 2018
Traceback (most recent call last):
  File "/pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 1034, in <module>
    logger)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 465, in parse_diamond_tab
    acc_to_tax_id = assign_taxon_to_dic(acc_taxid_prot)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 285, in assign_taxon_to_dic
    acc, acc_version, tax_id, GI = line.rstrip("\n").split()
ValueError: not enough values to unpack (expected 4, got 2)

Here is my script:

source /panhome/me//miniconda3/bin/activate
export PYTHONPATH=$PYTHONPATH:/panhome/me/miniconda3/lib/python3.6/site-packages
diamond_tab_output=/pandata/me/blast_database/matches.m8
Diamond_blast_to_taxid=/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py

taxid=/pandata/me/LEPIWASP/blast_database/gi_taxid_prot.dmp

categories=/pandata/me/blast_database/categories.dmp

names=/pandata/me/blast_database/names.dmp

description=/pandata/me/blast_database/acc_to_des.tab

$Diamond_blast_to_taxid -i $diamond_tab_output -t $taxid -c $categories -n $names -d $description -o outfile_sp1.tab

Do you know where the issue could be?
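The traceback shows the script unpacking four whitespace-separated columns per line (acc, acc_version, tax_id, GI), which matches the layout of NCBI's prot.accession2taxid file, whereas the legacy gi_taxid_prot.dmp passed with -t has two columns (GI and tax_id). That the script expects prot.accession2taxid is an inference from the variable names, not something stated in this thread. A minimal sketch of a parser that names the offending line instead of failing on the unpack, with the hypothetical function name `parse_acc_taxid`:

```python
def parse_acc_taxid(lines):
    """Map accession.version -> tax_id from four-column lines.

    Raises ValueError that names the offending line number rather than
    failing with a bare unpacking error.
    """
    acc_to_tax_id = {}
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split()
        # Skip blank lines and the header row of prot.accession2taxid.
        if not fields or fields[0] == "accession":
            continue
        if len(fields) != 4:
            raise ValueError(
                f"line {number}: expected 4 columns "
                f"(accession, accession.version, taxid, gi), got {len(fields)}"
            )
        _acc, acc_version, tax_id, _gi = fields
        acc_to_tax_id[acc_version] = int(tax_id)
    return acc_to_tax_id
```

Passing an open file handle as `lines` works, since the function only iterates over it.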

Error while detecting LGT on the test data

Hi Peter,
I'm using Lateral_gene_transfer_prediction_tool/Lateral_gene_transfer_predictor.py to detect LGTs. I'm testing my setup using the test dataset (/home/Thorpe_HGT/public_scripts/Lateral_gene_transfer_prediction_tool/tests/inputs/blast_tax.tab). I used the following command and got the following errors. I've installed Samtools, Biopython and Numpy. Did I miss anything?

Thank you for any suggestions!

YY

$ python /home/Thorpe_HGT/public_scripts/Lateral_gene_transfer_prediction_tool/Lateral_gene_transfer_predictor.py -p /home/Thorpe_HGT/NCBI_downloads/ -i blast_tax.tab --tax_filter_out 6656 --tax_filter_up_to 33208 -o LTG_results.out --tax_coloumn 14

INFO: sys.version_info(major=3, minor=7, micro=10, releaselevel='final', serial=0)
INFO: Command-line: /home/Thorpe_HGT/public_scripts/Lateral_gene_transfer_prediction_tool/Lateral_gene_transfer_predictor.py -p /home/Thorpe_HGT/ -i blast_tax.tab --tax_filter_out 6656 --tax_filter_up_to 33208 -o LTG_results.out --tax_coloumn 14
INFO: Starting testing: Thu Aug 19 16:57:37 2021
INFO: loading NCBI files from : /home/Thorpe_HGT/
INFO: parsing Blast file blast_tax.tab
Traceback (most recent call last):
  File "/home/Thorpe_HGT/public_scripts/Lateral_gene_transfer_prediction_tool/Lateral_gene_transfer_predictor.py", line 862, in <module>
    tax_coloumn)
  File "/home/Thorpe_HGT/public_scripts/Lateral_gene_transfer_prediction_tool/Lateral_gene_transfer_predictor.py", line 269, in parse_blast_tab_file
    staxids, scientific_name, scomnames, Kingdom = line.rstrip("\n").split("\t")
ValueError: not enough values to unpack (expected 17, got 1)
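"expected 17, got 1" means that at least one line in the input contained no tab character at all, for example a blank line or a space-separated line. A small diagnostic sketch that counts how many tab-separated fields each line has can help locate such lines; `field_count_histogram` is a hypothetical helper, not part of the tool.

```python
from collections import Counter


def field_count_histogram(lines, sep="\t"):
    """Count how many lines have each number of `sep`-separated fields.

    Blank lines are reported under key 0 so they stand out from
    non-empty lines that simply lack the separator (key 1).
    """
    counts = Counter()
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped:
            counts[0] += 1
        else:
            counts[len(stripped.split(sep))] += 1
    return counts
```

For a file the parser in the traceback can read, every line should fall under key 17; anything else points at the lines to inspect.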

add full taxonomy instead of only Kingdom

Dear Peter

Is it possible to add the full taxonomy instead of only the Kingdom?
I mean the full taxonomy as given in the file rankedlineage.dmp.

Thanks a lot.

Best wishes,
Lotte
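For reference, a minimal sketch of reading one line of rankedlineage.dmp into named ranks. It assumes the column order described in NCBI's new_taxdump readme (tax_id, tax_name, species, genus, family, order, class, phylum, kingdom, superkingdom) and the "tab, pipe, tab" field separator; newer dumps may carry additional or renamed columns, which this sketch silently ignores. The names `RANKED_COLUMNS` and `parse_rankedlineage_line` are made up for illustration.

```python
# Column order assumed from the new_taxdump readme; verify against your dump.
RANKED_COLUMNS = (
    "tax_id", "tax_name", "species", "genus", "family",
    "order", "class", "phylum", "kingdom", "superkingdom",
)


def parse_rankedlineage_line(line):
    """Split one rankedlineage.dmp line into a dict keyed by rank name."""
    text = line.rstrip("\n")
    if text.endswith("\t|"):  # every record ends with a trailing "\t|"
        text = text[:-2]
    fields = text.split("\t|\t")
    # zip stops at the shorter sequence, so extra columns are dropped.
    return dict(zip(RANKED_COLUMNS, fields))
```

Ranks that do not apply to a taxon come back as empty strings, so a caller can join only the non-empty ones into a lineage string.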

fail to run Diamond_blast_to_taxid.py

Dear Peter,

I feel so lucky to have found the script you wrote to solve the taxid problem with DIAMOND output. However, when I ran Diamond_blast_to_taxid.py, I got the following error:
Traceback (most recent call last):
  File "Diamond_blast_to_taxid.py", line 599, in <module>
    parse_diamond_tab(diamond_tab_output, path_files, gi_taxid_prot, categories, names, gi_to_des, outfile)
  File "Diamond_blast_to_taxid.py", line 170, in parse_diamond_tab
    taxon_to_kingdom = assign_cat_to_dic(categories)
  File "Diamond_blast_to_taxid.py", line 100, in assign_cat_to_dic
    kingdom_tax_id[int(tax_id)]= kingdom_dic[kingdom]
KeyError: 'O'

Do you know what kind of mistake I might have made?
Thanks for your time and help.
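The KeyError indicates that categories.dmp contained a category code 'O' that the script's kingdom_dic did not have a key for. To my knowledge, NCBI's taxcat readme lists 'O' (Other) alongside A, B, E, V and U. A minimal sketch of a lookup that falls back to a default instead of raising; the mapping values, the function name `assign_categories`, and the "Unknown" default are illustrative assumptions and not the script's own.

```python
# Illustrative labels; the values used by the real script are not shown
# in this thread.
KINGDOM_NAMES = {
    "A": "Archaea",
    "B": "Bacteria",
    "E": "Eukaryota",
    "V": "Viruses",
    "U": "Unclassified",
    "O": "Other",
}


def assign_categories(lines, names=KINGDOM_NAMES, default="Unknown"):
    """Map tax_id -> category label from categories.dmp-style lines.

    Each line is expected to hold three columns: category code,
    species-level tax_id, and tax_id. Unrecognised codes map to `default`.
    """
    tax_to_kingdom = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        code, _species_tax_id, tax_id = fields
        tax_to_kingdom[int(tax_id)] = names.get(code, default)
    return tax_to_kingdom
```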

gi_to_des.tab file creation issues

Hi Peter,

Thank you for your script - I believe it will work for my use case.

I am running a diamond blastx search of MGS files against a custom database of viral proteins from RefSeq (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/). I concatenated the viral.X.protein.faa files together and made a custom diamond database for my purposes.

I want to add taxonomy to my diamond output for use in MEGAN because it isn't properly processing my .daa output.

Back to your script - when I try to make the gi_to_des.tab file I get an error saying

Error: AssertionError: Error, gi_to_des.tab file is not formatted as expected. It wants Gi_number description. See help on how to make this file, or use the shell script.

Can you explain how to make the gi_to_des.tab file? I followed your instructions in the wiki:

makeblastdb -in viral.1.2.nr.protein.faa -out nr_viral

blastdbcmd -entry 'all' -db viral.1.2.nr.protein.faa > nr.faa

python /home/casey/diamond_blast_to_taxid/prepare_gi_to_description_databse.py -i nr_viral.faa -o gi_to_des.tab

Do you have an idea what I may be doing wrong?

Thank you,
Casey
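One thing that stands out in the commands above is that blastdbcmd writes to nr.faa while the Python script is given nr_viral.faa, which may or may not be a transcription slip. Separately, the error message says the table should hold an identifier followed by a description. A minimal sketch of producing such a two-column table directly from FASTA headers is shown below; it assumes a tab between the two columns and uses the first whitespace-delimited token of each header as the identifier. Both are guesses based only on the error message, and `headers_to_table` is a hypothetical name, not the script shipped in the repository.

```python
def headers_to_table(fasta_lines):
    """Yield 'identifier<TAB>description' for each FASTA header line."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        # Split the header at the first space: ID on the left, rest on the right.
        identifier, _, description = line[1:].strip().partition(" ")
        yield f"{identifier}\t{description}"
```

Writing the yielded strings to a file, one per line, gives a table that can be compared against what the script expects.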
