
Comments (7)

korseby commented on July 18, 2024

Hi Tom,

that's a good question and I don't remember anymore where the weights came from. Maybe @schymane has an idea? I think I just used some defaults provided by Emma some years ago... If not, I need to ask @sneumann. :)

Best, Kristian


Tomnl commented on July 18, 2024

Thanks Kristian


schymane commented on July 18, 2024

Thanks for linking me in! I don't recognise this weighting scheme; in our 2016 paper we had a different set of weightings that totalled 1, but in our internal MetFrag use we tend to now keep all scores weighted at 1 so that the total max score equals the number of terms used.
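In MetFrag CLI parameter-file terms the difference between the two conventions is only the weights line; a rough sketch follows (the four-term example is hypothetical, and MetFragScoreTypes/MetFragScoreWeights are the parameter names as I understand the CLI, so treat the details as assumptions):

```
# Hypothetical four-term example; one weight per entry in MetFragScoreTypes, in the same order.
# 2016-paper style: weightings chosen to sum to 1
MetFragScoreWeights = 0.25,0.25,0.25,0.25
# Current practice: every term weighted 1, so the maximum combined score
# equals the number of scoring terms used
MetFragScoreWeights = 1.0,1.0,1.0,1.0
```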

I am not sure what database and suspect list combination you are using, @Tomnl, but for e.g. PubChem or PubChemLite we would usually use:

  • FragmenterScore
  • Offline MoNA Exact Spectral Similarity (not MetFusion)
  • References / PubMed Count
  • Patent Score
  • Optional: additional terms, e.g. the suspect list count, or in PubChemLite the metadata categories if relevant. For biologically focused studies this could be the BioPathway, for instance.

We have found the exact spectral similarity score a lot easier to interpret (and integrate into automated scoring schemes) than the MetFusion score.
Hope this helps, let me know if you have any more questions!


Tomnl commented on July 18, 2024

Thank you @schymane and @korseby

This is really useful information!

It's unfortunate that we don't know the origin of the weightings that were used, but perhaps this is a good point to revise them in the Galaxy tool based on how MetFrag is practically used in the community.

In Birmingham we have been using MetFrag for a few different use cases, but a reasonably common setup for general annotation uses the FragmenterScore, a local PubChem database, a natural products inclusion list (provided by Kristian, derived from the Universal Natural Products Database), and the Offline MetFusion score. We then combine this with some other annotation approaches using different scoring schemes and weights. In some complicated workflows it does make a lot of sense to take the scores out of MetFrag (like @schymane has done with the spectral matching), and I will probably change to something similar in the future.
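For reference, a rough parameter-file sketch of that kind of run is below; the paths are placeholders, the weights of 1.0 are purely illustrative, and the suspect-list parameter and SuspectListScore term are my assumptions about the CLI names rather than checked settings:

```
# Rough sketch of the Birmingham-style run described above; paths are placeholders
# and the suspect-list parameter/score names are assumptions, not verified settings
PeakListPath = example_peaklist.txt
MetFragDatabaseType = LocalCSV
# local PubChem export (placeholder file name)
LocalDatabasePath = local_pubchem.csv
# UNPD-derived natural products inclusion list (assumed parameter name)
ScoreSuspectLists = unpd_natural_products.txt
MetFragScoreTypes = FragmenterScore,OfflineMetFusionScore,SuspectListScore
# illustrative equal weights; the Galaxy tool defaults under discussion differed
MetFragScoreWeights = 1.0,1.0,1.0
MetFragCandidateWriter = CSV
SampleName = example_sample
ResultsPath = results/
```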

@schymane - do you have general weightings that you use for these scores? (Would that be something you could share, or is it really dependent on your own analysis and evaluations?)

I wonder… although it could be very useful for users to have default weightings in the tool, perhaps for the Galaxy tool we should, for the moment, remove the default weightings and require the user to determine their own weightings based on their own preferences.

Does this seem OK to you, @korseby? It would follow the logic of MetFragCLI, which does not specifically provide default weightings for the scores (at least, I could not find any).


schymane commented on July 18, 2024

Honestly, over several years and many, many users we have found it best to keep each scoring term with a weighting of 1, so that the scores become additive. It seems to be much easier for people to understand intuitively when selecting their candidates afterwards (then max score is the "number of scoring terms chosen"). So this is what I would recommend as the preferred default behaviour - but then leave the option for people to tweak the scores (weighting) if they wish.
We used this in the PubChemLite evaluation and achieved very good results. So this would be something like:

Term                                      Weighting
FragmenterScore                           1
Offline MoNA Exact Spectral Similarity    1
References* / PubMed Count                1
Patent Score                              1
Optional Additional Terms                 1

* If you wish to use n reference counts and merge them, as previously done with ChemSpider, then the weighting is 1/n for each, giving a total of 1 if you only want one overall reference-count term. This is not necessary for PubChem, and is probably obsolete now that ChemSpider requires tokens for API access.
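As a concrete sketch, the table translates into something like the following CLI settings; FragmenterScore and the score-types/weights parameters are standard, but the identifiers for the MoNA similarity and the reference/patent terms depend on your database or local CSV columns, so take those names as assumptions:

```
# Equal-weight scheme from the table above; everything except FragmenterScore is
# an assumed identifier that depends on the database / local CSV columns you use
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,PubMed_Count,Patent_Count
MetFragScoreWeights = 1.0,1.0,1.0,1.0
# If merging n separate reference counts (old ChemSpider-style), weight each at 1/n,
# e.g. two reference counts contributing one combined reference term (hypothetical names):
# MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,RefCount1,RefCount2,Patent_Count
# MetFragScoreWeights = 1.0,1.0,0.5,0.5,1.0
```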

What is important then is to ensure that the two experimental terms (e.g. the MetFrag FragmenterScore and the MoNA score) are reported clearly (and separately) in the output along with the aggregated score, because these are the key terms that show whether the MetFrag and MoNA results really match the input spectrum, or whether a candidate is just the "best match" while explaining the experimental data only poorly. Frank also experimented with calculating a spectral match based on the MetFrag predicted fragments here.

If you want to see some of this pictorially, please see some examples here:
https://zenodo.org/record/6856116
(examples start at Slide 37, and were with PubChemLite, not PubChem ... but same idea)


Tomnl commented on July 18, 2024

Thanks @schymane for sharing, this is really useful.

I agree that simple additive scoring as the default for the Galaxy tool, as you describe, seems sensible and intuitive (while letting users change the weightings if they would like to).

I will update the Galaxy tool to use those defaults and check how the experimental terms are separated in the output.

Thanks for your help!

And thanks for including the links - I will have a read to explore a bit more.


korseby commented on July 18, 2024

Sorry for the delay. We had a workshop this week and I am incredibly busy right now. So, yes I am completely fine with this weighting.

We used my weighting a few years ago together with a suspect list of natural products. This way, natural products were ranked much higher and we were able to get a lot of candidates ranked first. We mostly applied this weighting scheme to non-model plant species. The use of libraries other than PubChem has rendered this weighting scheme obsolete anyway.

