Comments (14)
Hi Calvin,
I guess the reaction is added during the gap-filling. There, reactions with "blad_blast" might be added if they complete a metabolic functionality, e.g. a pathway to use a specific compound as an energy source. By default, gapseq considers reactions with a bitscore > 50 for those gap fillings.
You can make the gap-filling step more strict by using the option "-b". E.g.
gapseq fill -m <genomeID>-draft.RDS -n <genomeID>-medium.csv -c <genomeID>-rxnWeights.RDS -g <genomeID>-rxnXgenes.RDS -b 100
This will reset the bitscore threshold during gap-filling to 100 instead of 50.
I hope this helps.
Silvio
from gapseq.
Hello Silvio,
Thank you very much for getting back to me so fast as it is much appreciated and means a lot so thank you. Also thank you for your help in my question!
However one thing I accidentally forgot to include in my question is I only ran "Gap-seq find" and I did not run gapseq fill or any other downstream steps.
Thanks and sorry for the confusion
from gapseq.
I see. Could you perhaps share the part of the table "...-all-Reactions.tbl" with the said reaction? Ideally all rows for the pathway, which contains the reaction.
from gapseq.
Of course here you go!
from gapseq.
Also here is a screen shot showing that the pathway is complete
from gapseq.
Thank you for reporting this issue! Can you please also post the output of
./gapseq find -p PWY-40 -b 60 -z 2 your.genome.fna
from gapseq.
You're welcome and thank you for your help and assistance it is much appreciated and of course! It is the first reaction 1) ARGDECARBOX-RXN arginine decarboxylase, that is being problematic (GapSeq says it is present despite having Bad_Blast!
from gapseq.
okay, thank you! That the reactions are found makes sense since you lowered the bitscore. However, it is still unclear why it is labeled as a "bad blast" hit.
from gapseq.
Okay so the problem lies in lowering the bit-score this does make sense so thank you. However, when I checked another protein/enzyme (ORNDECARBOX-RXN ornithine decarboxylase 4.1.1.17 ) that has a bit-score of 89.7 the output of GapSeq says it is absent. Please see attached photo
from gapseq.
Hi Calvin,
I tried to reproduce the issue with an Nitrosothermus koennekii assembly from NCBI. Everything was just as expected: There were no bad_blast hits for Pathway PWY-40.
Could you run the following commands and post the output?
gapseq find -p PWY-40 -t Archaea -b 60 Nitrosothermus_koennekii.faa
cat Nitrosothermus_koennekii-PWY-40-Reactions.tbl
cat Nitrosothermus-PWY-40-Pathways.tbl
For me, the output was:
/tmp/tmp.S1WcusPC8v
Protein fasta detected.
Checking updates for Archaea /home/silvio/Software/gapseq/src/../dat/seq/Archaea
Reference sequences are up-to-date.
metacyc,custom
Duplicated pathway IDs found: |ARGDEGRAD-PWY2| |NADPHOS-DEPHOS-PWY| will only use /home/silvio/Software/gapseq/src/../dat/custom_pwy.tbl
\|ARGDEGRAD-PWY2\||\|NADPHOS-DEPHOS-PWY\|
1
Checking for pathways and reactions in: Nitrosothermus.faa PWY-40
Number of pathways to be considered: 1
1/1: Checking for pathway |PWY-40| putrescine biosynthesis I with 2 reactions
(Biosynthesis,Polyamine-Biosynthesis,Putrescine-Biosynthesis)
1) ARGDECARBOX-RXN arginine decarboxylase, biosynthetic 4.1.1.19 speA speA SPE1 MADC2 MADC3 ADC ADC pdaD speA ADC adiA speA
--> Found sequences: /home/silvio/Software/gapseq/src/../dat/seq/Archaea/rev/4.1.1.19.fasta (32 sequences)
Final file: /tmp/tmp.S1WcusPC8v/tmp.BE8ElyVNwY (32 sequences)
Blast hit (2x)
bit=92.0 id=40.000 cov=80 hit=UniRef90_A8AAB6
bit=81.6 id=32.456 cov=85 hit=UniRef90_A8MBV3
Candidate reaction for import: 1
2) AGMATIN-RXN agmatinase 3.5.3.11 speB AGMAT speB
--> Found sequences: /home/silvio/Software/gapseq/src/../dat/seq/Archaea/rev/3.5.3.11.fasta (3 sequences)
Final file: /tmp/tmp.S1WcusPC8v/tmp.brVixrq3ku (3 sequences)
Blast hit (1x)
bit=146 id=35.433 cov=88 hit=UniRef90_Q5JI38
Candidate reaction for import: 1
Pathway completeness: 2/2 (100%)
Hits with candidate reactions in database: 2/2
Key reactions: 0/0
Pathways found:
putrescine biosynthesis I
Candidate reactions found: 2
%CPU %MEM COMMAND
4.0 0.0 /bin/bash /home/silvio/Software/gapseq/src/gapseq_find.sh -p PWY-40 -t
Running time: 1 s.
# gapseq version: 1.2 6225829
# Sequence DB md5sum: 748b938 (2022-12-19, Archaea)
# Genome format: prot
rxn name ec bihit qseqid pident evalue bitscore qcovs stitle sstart send pathway statuspathway.status dbhit complex exception complex.status
ARGDECARBOX-RXN arginine decarboxylase, biosynthetic 4.1.1.19 UniRef90_A8AAB6 40.000 2.30e-26 92.0 80 RMF31093.1 MAG: adenosylmethionine decarboxylase [Candidatus Nitrosothermus koennekii] 3 116 |PWY-40| good_blast full rxn00405 NA 0 1
ARGDECARBOX-RXN arginine decarboxylase, biosynthetic 4.1.1.19 UniRef90_A8MBV3 32.456 1.79e-22 81.6 85 RMF31093.1 MAG: adenosylmethionine decarboxylase [Candidatus Nitrosothermus koennekii] 3 114 |PWY-40| good_blast full rxn00405 NA 0 1
AGMATIN-RXN agmatinase 3.5.3.11 UniRef90_Q5JI38 35.433 7.94e-44 146 88 RMF31931.1 MAG: agmatinase [Candidatus Nitrosothermus koennekii] 24 272 |PWY-40| good_blast full rxn00858 NA 0 1
# gapseq version: 1.2 6225829
# Sequence DB md5sum: 748b938 (2022-12-19, Archaea)
# Genome format: prot
ID Name Prediction Completeness VagueReactions KeyReactions KeyReactionsFound ReactionsFound
|PWY-40| putrescine biosynthesis I true 100 0 0 0 ARGDECARBOX-RXN AGMATIN-RXN
from gapseq.
Hello!
Thank you for reproducing my data/.faa to try and find the source of the error. It is much appreciated!
As requested here are the photos of my output (.tsv instead of cat for more clarity for me) and like yourself I am getting no "bad_blast" for PWY-40. Which is incredible strange and odd I have likewise tried running GapSeq on the entire "Polyamine-Biosynthesis" and am still getting now a "good_blast" (in other words no "bad_blast")
Very odd but seems like in the end I think the major source of the problem (in why this protein was included when other analysis didn't) might be related to the low bit-score I set up.
I tried to reproduce the issue with an Nitrosothermus koennekii assembly from NCBI. Everything was just as expected: There were no bad_blast hits for Pathway PWY-40.
Could you run the following commands and post the output?
from gapseq.
However, if it is okay, I just have another confusing find in that ORNDECARBOX-RXN ornithine decarboxylase 4.1.1.17 (please see the attached photos) says it is has a "bad_blast" yet the bit-score is above my 60 threshold (bit-score 86.9), whereas the output says ARGINASE-RXN (arginase) is present.
One idea, I am thinking it is related to ORNDECARBOX-RXN ornithine decarboxylase having "bihit" as NA while ARGINASE-RXN (arginase) doesn't. Therefore do you think this the reason? If so, and I apologize (as I am still pretty new to bioinfomatics) but what does "bihit" {Bidirectional hit} mean?
Thanks again for all your help it is so very much appreciated!
from gapseq.
Hi Calvin,
unfortunately, your screenshot does not show the entire table (That is why I suggested the output of the cat
command). You will see that the column "exception" says 1 for EC 4.1.1.17. This means this enzyme requires a higher identity to be identified as "good_blast". The gapseq publication in the section "Pathway and subsystem prediction" explains the exception handling.
One small note: When posting code blocks or parts of command line log outputs, you could consider posting the text in code blocks (https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks) instead of screenshots.
from gapseq.
I am closing the issue for now as we can not reproduce the described problem. Please feel free to re-open if issue remains.
from gapseq.
Related Issues (20)
- How was dat/media/gut.csv formulated?
- stat: cannot stat '/scratch/users/nus/e0512805/gapseq/src/../dat/seq/Bacteria/rev/sequences.tar.gz': No such file or directory HOT 8
- Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached: [rest.uniprot.org] SSL connection timeout HOT 4
- Issue with options when calling subcommand for "gapseq doall". HOT 3
- Download sequences fails HOT 2
- HTML entities for special characters in reaction name causes incorrect uniprot queries HOT 1
- libsbml and libglpk not found while installed using conda HOT 4
- Reaction inferred from pseudogene regions when using gapseq on genome fasta file.
- CHNOSZ NOT FOUND HOT 1
- [Request Update] [Tutorial] For anyone who intend to use CPLEX in gapseq HOT 3
- Diamond not used for the transport-find command HOT 2
- Could not use find-transport function using '-m' option for specific metabolite. HOT 8
- subex.tbl issue HOT 1
- Any plans on updating MetaCyc database? HOT 2
- Using seed reaction database in Adapt function HOT 8
- adding pathways| Error: No model reactions found and Error: Error in !opt$sbml.no.output : invalid argument type HOT 6
- Inquiry on instllation HOT 2
- Inquiry on error HOT 5
- Missing exchange reactions are not being added in the "adapt" module HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gapseq.