Giter VIP home page Giter VIP logo

Comments (14)

Waschina avatar Waschina commented on August 24, 2024

Hi Calvin,

I guess the reaction is added during the gap-filling. There, reactions with "blad_blast" might be added if they complete a metabolic functionality, e.g. a pathway to use a specific compound as an energy source. By default, gapseq considers reactions with a bitscore > 50 for those gap fillings.

You can make the gap-filling step more strict by using the option "-b". E.g.

gapseq fill -m <genomeID>-draft.RDS -n <genomeID>-medium.csv -c <genomeID>-rxnWeights.RDS -g <genomeID>-rxnXgenes.RDS -b 100

This will reset the bitscore threshold during gap-filling to 100 instead of 50.

I hope this helps.
Silvio

from gapseq.

Calvin2077 avatar Calvin2077 commented on August 24, 2024

Hello Silvio,

Thank you very much for getting back to me so fast as it is much appreciated and means a lot so thank you. Also thank you for your help in my question!

However one thing I accidentally forgot to include in my question is I only ran "Gap-seq find" and I did not run gapseq fill or any other downstream steps.

Thanks and sorry for the confusion

from gapseq.

Waschina avatar Waschina commented on August 24, 2024

I see. Could you perhaps share the part of the table "...-all-Reactions.tbl" with the said reaction? Ideally all rows for the pathway, which contains the reaction.

from gapseq.

Calvin2077 avatar Calvin2077 commented on August 24, 2024

Of course here you go!

Screenshot 2023-11-09 at 2 35 17 PM

from gapseq.

Calvin2077 avatar Calvin2077 commented on August 24, 2024

Also here is a screen shot showing that the pathway is complete

Screenshot 2023-11-09 at 2 37 10 PM

from gapseq.

jotech avatar jotech commented on August 24, 2024

Thank you for reporting this issue! Can you please also post the output of

./gapseq find -p PWY-40 -b 60 -z 2 your.genome.fna

from gapseq.

Calvin2077 avatar Calvin2077 commented on August 24, 2024

You're welcome and thank you for your help and assistance it is much appreciated and of course! It is the first reaction 1) ARGDECARBOX-RXN arginine decarboxylase, that is being problematic (GapSeq says it is present despite having Bad_Blast!
Screenshot 2023-11-10 at 9 31 35 AM

from gapseq.

jotech avatar jotech commented on August 24, 2024

okay, thank you! That the reactions are found makes sense since you lowered the bitscore. However, it is still unclear why it is labeled as a "bad blast" hit.

from gapseq.

Calvin2077 avatar Calvin2077 commented on August 24, 2024

Okay so the problem lies in lowering the bit-score this does make sense so thank you. However, when I checked another protein/enzyme (ORNDECARBOX-RXN ornithine decarboxylase 4.1.1.17 ) that has a bit-score of 89.7 the output of GapSeq says it is absent. Please see attached photo
Screenshot 2023-11-10 at 2 27 37 PM

from gapseq.

Waschina avatar Waschina commented on August 24, 2024

Hi Calvin,

I tried to reproduce the issue with an Nitrosothermus koennekii assembly from NCBI. Everything was just as expected: There were no bad_blast hits for Pathway PWY-40.

Could you run the following commands and post the output?

gapseq find -p PWY-40 -t Archaea -b 60 Nitrosothermus_koennekii.faa
cat Nitrosothermus_koennekii-PWY-40-Reactions.tbl
cat Nitrosothermus-PWY-40-Pathways.tbl

For me, the output was:

/tmp/tmp.S1WcusPC8v
Protein fasta detected.
Checking updates for Archaea /home/silvio/Software/gapseq/src/../dat/seq/Archaea
Reference sequences are up-to-date.
metacyc,custom
Duplicated pathway IDs found: |ARGDEGRAD-PWY2| |NADPHOS-DEPHOS-PWY| will only use /home/silvio/Software/gapseq/src/../dat/custom_pwy.tbl
\|ARGDEGRAD-PWY2\||\|NADPHOS-DEPHOS-PWY\|
1
Checking for pathways and reactions in: Nitrosothermus.faa PWY-40
Number of pathways to be considered: 1

1/1: Checking for pathway |PWY-40| putrescine biosynthesis I with 2 reactions
(Biosynthesis,Polyamine-Biosynthesis,Putrescine-Biosynthesis)
	1) ARGDECARBOX-RXN arginine decarboxylase, biosynthetic 4.1.1.19 speA speA SPE1 MADC2 MADC3 ADC ADC pdaD speA ADC adiA speA
		--> Found sequences: /home/silvio/Software/gapseq/src/../dat/seq/Archaea/rev/4.1.1.19.fasta (32 sequences)
		Final file: /tmp/tmp.S1WcusPC8v/tmp.BE8ElyVNwY (32 sequences)
		Blast hit (2x)
			bit=92.0 id=40.000 cov=80 hit=UniRef90_A8AAB6
			bit=81.6 id=32.456 cov=85 hit=UniRef90_A8MBV3
		Candidate reaction for import: 1
	2) AGMATIN-RXN agmatinase 3.5.3.11 speB AGMAT speB
		--> Found sequences: /home/silvio/Software/gapseq/src/../dat/seq/Archaea/rev/3.5.3.11.fasta (3 sequences)
		Final file: /tmp/tmp.S1WcusPC8v/tmp.brVixrq3ku (3 sequences)
		Blast hit (1x)
			bit=146 id=35.433 cov=88 hit=UniRef90_Q5JI38
		Candidate reaction for import: 1
Pathway completeness: 2/2 (100%)
Hits with candidate reactions in database: 2/2
Key reactions: 0/0

Pathways found:
putrescine biosynthesis I

Candidate reactions found: 2 

%CPU %MEM COMMAND
 4.0  0.0 /bin/bash /home/silvio/Software/gapseq/src/gapseq_find.sh -p PWY-40 -t
Running time: 1 s.

# gapseq version: 1.2 6225829
# Sequence DB md5sum: 748b938 (2022-12-19, Archaea)
# Genome format: prot
rxn	name	ec	bihit	qseqid	pident	evalue	bitscore	qcovs	stitle	sstart	send	pathway	statuspathway.status	dbhit	complex	exception	complex.status
ARGDECARBOX-RXN	arginine decarboxylase, biosynthetic	4.1.1.19		UniRef90_A8AAB6	40.000	2.30e-26	92.0	80	RMF31093.1 MAG: adenosylmethionine decarboxylase [Candidatus Nitrosothermus koennekii]	3	116	|PWY-40|	good_blast	full	rxn00405 	NA	0	1
ARGDECARBOX-RXN	arginine decarboxylase, biosynthetic	4.1.1.19		UniRef90_A8MBV3	32.456	1.79e-22	81.6	85	RMF31093.1 MAG: adenosylmethionine decarboxylase [Candidatus Nitrosothermus koennekii]	3	114	|PWY-40|	good_blast	full	rxn00405 	NA	0	1
AGMATIN-RXN	agmatinase	3.5.3.11		UniRef90_Q5JI38	35.433	7.94e-44	146	88	RMF31931.1 MAG: agmatinase [Candidatus Nitrosothermus koennekii]	24	272	|PWY-40|	good_blast	full	rxn00858 	NA	0	1

# gapseq version: 1.2 6225829
# Sequence DB md5sum: 748b938 (2022-12-19, Archaea)
# Genome format: prot
ID	Name	Prediction	Completeness	VagueReactions	KeyReactions	KeyReactionsFound	ReactionsFound
|PWY-40|	putrescine biosynthesis I	true	100	0	0	0	ARGDECARBOX-RXN AGMATIN-RXN 

from gapseq.

Calvin2077 avatar Calvin2077 commented on August 24, 2024

Hello!

Thank you for reproducing my data/.faa to try and find the source of the error. It is much appreciated!

As requested here are the photos of my output (.tsv instead of cat for more clarity for me) and like yourself I am getting no "bad_blast" for PWY-40. Which is incredible strange and odd I have likewise tried running GapSeq on the entire "Polyamine-Biosynthesis" and am still getting now a "good_blast" (in other words no "bad_blast")

Very odd but seems like in the end I think the major source of the problem (in why this protein was included when other analysis didn't) might be related to the low bit-score I set up.

I tried to reproduce the issue with an Nitrosothermus koennekii assembly from NCBI. Everything was just as expected: There were no bad_blast hits for Pathway PWY-40.

Could you run the following commands and post the output?

Screenshot 2023-11-11 at 1 52 15 PM Screenshot 2023-11-11 at 1 50 42 PM Screenshot 2023-11-11 at 1 50 28 PM

from gapseq.

Calvin2077 avatar Calvin2077 commented on August 24, 2024

However, if it is okay, I just have another confusing find in that ORNDECARBOX-RXN ornithine decarboxylase 4.1.1.17 (please see the attached photos) says it is has a "bad_blast" yet the bit-score is above my 60 threshold (bit-score 86.9), whereas the output says ARGINASE-RXN (arginase) is present.

One idea, I am thinking it is related to ORNDECARBOX-RXN ornithine decarboxylase having "bihit" as NA while ARGINASE-RXN (arginase) doesn't. Therefore do you think this the reason? If so, and I apologize (as I am still pretty new to bioinfomatics) but what does "bihit" {Bidirectional hit} mean?

Thanks again for all your help it is so very much appreciated!

Screenshot 2023-11-11 at 2 11 57 PM Screenshot 2023-11-11 at 2 12 43 PM

from gapseq.

Waschina avatar Waschina commented on August 24, 2024

Hi Calvin,

unfortunately, your screenshot does not show the entire table (That is why I suggested the output of the cat command). You will see that the column "exception" says 1 for EC 4.1.1.17. This means this enzyme requires a higher identity to be identified as "good_blast". The gapseq publication in the section "Pathway and subsystem prediction" explains the exception handling.

One small note: When posting code blocks or parts of command line log outputs, you could consider posting the text in code blocks (https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks) instead of screenshots.

from gapseq.

Waschina avatar Waschina commented on August 24, 2024

I am closing the issue for now as we can not reproduce the described problem. Please feel free to re-open if issue remains.

from gapseq.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.