Comments (10)

Calvin2077 commented on August 26, 2024

Minor update: when I switched the taxonomic range (-t) from Archaea to Bacteria, gapseq successfully identified formate dehydrogenase in my archaeal species, which is great.

However, when I check the Pathways output file, the degree of completion for other metabolic pathways differs significantly depending on whether I run gapseq with the taxonomic range set to Archaea or to Bacteria. This makes sense, as the enzymes that make up these pathways can differ between Archaea and Bacteria.

Given this, may I ask how I should proceed, or which taxonomic range I should choose for my project?

from gapseq.

Waschina commented on August 26, 2024

Dear @Calvin2077 ,

If you know your organism is an archaeon, we recommend using the option `-t Archaea`.

The results from gapseq find are expected to depend on the chosen taxonomic range because the reference sequence databases for Archaea and Bacteria differ.

In your case, gapseq misses the identification of EC 1.17.1.9 because there is no closely related reference protein among the Archaea reference protein sequences. I will check whether the reference sequences for Archaea can be improved so that the presence of formate dehydrogenase is predicted correctly.
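
Because the find results depend on the taxonomic range, it can help to see exactly which pathways change between two runs. The following is a minimal shell sketch, not part of gapseq itself. It assumes that each run produced a tab-separated Pathways table with a header row containing columns named ID and Completeness, and that lines starting with # are comments. These column names and the function name compare_completeness are assumptions for illustration, so adjust them to match your actual output.

```shell
#!/bin/sh
# Print pathways whose completeness differs between two Pathways tables.
# Assumed layout (hypothetical): tab-separated, header row with "ID" and
# "Completeness" columns, optional comment lines starting with "#".
compare_completeness() {
    awk -F '\t' '
        FNR == 1 { idc = 0; cc = 0; filenum++ }   # new file: look for its header again
        /^#/     { next }                         # skip comment lines
        idc == 0 {                                # first non-comment line is the header
            for (i = 1; i <= NF; i++) {
                if ($i == "ID") idc = i
                if ($i == "Completeness") cc = i
            }
            if (idc == 0 || cc == 0) {
                print "expected ID and Completeness columns" > "/dev/stderr"
                exit 1
            }
            next
        }
        filenum == 1 { first[$idc] = $cc; next }  # remember values from the first table
        ($idc in first) && first[$idc] != $cc {   # report only pathways that differ
            printf "%s\t%s\t%s\n", $idc, first[$idc], $cc
        }
    ' "$1" "$2"
}
```

For example, `compare_completeness archaea/Pathways.tbl bacteria/Pathways.tbl` (hypothetical paths) prints one line per differing pathway: the pathway ID, followed by the completeness value from the first and from the second table.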

Waschina commented on August 26, 2024

I just checked, and the problem is that there is not yet a reviewed archaeal protein sequence for the enzyme EC 1.17.1.9 on UniProt.

A workaround in your case could be the gapseq command:

~/gapseq/gapseq find -p C1-COMPOUNDS -t Archaea -z 3 Nitrosocosmicus_hydrocola.faa

The option -z 3 includes both reviewed and unreviewed reference sequences from UniProt, so the prediction is more relaxed.

I hope this helps you.

Calvin2077 commented on August 26, 2024

Awesome, thank you very much for your help; it is much appreciated. However, is there a way to change the taxonomic range to include both Bacteria and Archaea (e.g., -t Archaea,Bacteria instead of only -t Archaea)?

I did note an option "-m Limit pathways to taxonomic range (default: )". However, it seems to lack any input word. If possible, I would love to turn this off and hence expand the homology searching of sequences across all tax

Waschina commented on August 26, 2024

Currently, the option -t cannot include both Archaea and Bacteria. However, I remember discussing this with @jotech. An option to include reference protein sequences from both Archaea and Bacteria would be a nice feature. I will add this as a feature request that we will try to implement in the future (although I can't promise a specific timeframe).

I did note an option "-m Limit pathways to taxonomic range (default: )". However, it seems to lack any input word ...

Thank you for pointing this out. I just corrected it. It now states:

-m Limit pathways to taxonomic range (default: all)

Calvin2077 commented on August 26, 2024

Fantastic, thank you for telling me and for your quick response; this is much appreciated and helps a lot!

Moreover, thank you both for making it a feature request and for updating this parameter!

If it is okay, I have one more quick question. Looking at the 'Status' column of Reactions.tbl, I find that some proteins have the value "no_blast" and others have "no_seq_data". Can you please tell me what these mean?

I think it means that in the reference database (e.g., Archaea), there isn't a corresponding sequence to compare against; therefore, I was wondering if there is some way to circumvent that and have gapseq search the entire database for sequences to compare to.

Thanks again!

Waschina commented on August 26, 2024

no_blast means that there was no hit between the reference protein database for that enzyme and the input genome sequence. no_seq_data means that the reference protein database does not contain any sequence for the corresponding enzyme.

At the moment, there is no way to make the search in those cases include sequences from a domain other than the one set with the option -t.
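
To get an overview of how many reactions fall into each of the status values described above, the status column can be tallied. This is a minimal shell sketch rather than a gapseq feature. It assumes that Reactions.tbl is tab-separated, that lines starting with # are comments, and that the header row contains a column named status (matched case-insensitively); the function name count_status is hypothetical.

```shell
#!/bin/sh
# Tally the values in the status column of a Reactions table.
# Assumed layout (hypothetical): tab-separated, header row with a "status"
# column (any capitalization), optional comment lines starting with "#".
count_status() {
    awk -F '\t' '
        /^#/ { next }                              # skip comment lines
        col == 0 {                                 # first non-comment line is the header
            for (i = 1; i <= NF; i++)
                if (tolower($i) == "status") col = i
            if (col == 0) {
                print "no status column found" > "/dev/stderr"
                exit 1
            }
            next
        }
        { n[$col]++ }                              # count each status value
        END { for (s in n) printf "%s\t%d\n", s, n[s] }
    ' "$1" | sort
}
```

Calling `count_status Reactions.tbl` prints one line per status value with its count, sorted by status name. A high no_seq_data count would suggest that many enzymes have no reference sequences for the taxonomic range set with -t.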

Calvin2077 commented on August 26, 2024

Awesome, thank you for explaining these to me; it is much appreciated and helps a lot. Is there a way to update the reference protein database to include these sequences?

Waschina commented on August 26, 2024

You can tell gapseq to retrieve the latest sequences from UniProt with the option -U. For instance:

gapseq find -p TRPSYN -U ecoli.faa.gz

But please keep in mind that this is very slow due to HTTP queries to UniProt's API. It will also only retrieve the sequences for the taxonomic range set by -t.

Calvin2077 commented on August 26, 2024

Makes sense and understandable! Thanks for telling me this; it is much appreciated. Overall, I really enjoy gapseq: it is easy to install and use and gives great, accessible results, so thank you so much for creating an amazing program.
