Comments (7)
Hi Tom,
that's a good question and I don't remember anymore where the weights came from. Maybe @schymane has an idea? I think I just used some defaults provided by Emma some years ago... If not, I need to ask @sneumann. :)
Best, Kristian
from metfrag-galaxy.
Thanks Kristian
Thanks for linking me in! I don't recognise this weighting scheme; in our 2016 paper we had a different set of weightings that totalled 1, but in our internal MetFrag use we tend to now keep all scores weighted at 1 so that the total max score equals the number of terms used.
I am not sure which database and suspect-list combination you are using @Tomnl, but for e.g. PubChem or PubChemLite we would usually use:
- FragmenterScore
- Offline MoNA Exact Spectral Similarity (not MetFusion)
- References / PubMed Count
- Patent Score
- Optional: additional terms, e.g. the suspect list count or, in PubChemLite, the metadata categories if relevant. For biologically-focused studies this could be the BioPathway, for instance.
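For reference, a setup along these lines is normally expressed in a MetFrag CLI parameter file through the `MetFragScoreTypes` and `MetFragScoreWeights` entries. The score-term identifiers below are my assumptions based on the MetFrag documentation and may differ between MetFrag versions and databases, so please verify them before use:

```
# Hypothetical fragment of a MetFrag CLI parameter file.
# Verify the score type names against your MetFrag version and database.
MetFragDatabaseType = PubChem
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,PubChemNumberPubMedReferences,PubChemNumberPatents
MetFragScoreWeights = 1.0,1.0,1.0,1.0
```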
We have found the exact spectral similarity score a lot easier to interpret (and integrate into automated scoring schemes) than the MetFusion score.
Hope this helps, let me know if you have any more questions!
Thank you @schymane and @korseby
This is really useful information!
It's unfortunate we do not know the origin of the weightings used, but perhaps now is a good point to revise them in the Galaxy tool based on how MetFrag is practically used in the community.
In Birmingham we have been using MetFrag for a few different use cases. A reasonably common setup for general annotation uses the FragmenterScore, a local PubChem database, a list of natural products (provided by Kristian, derived from the Universal Natural Products Database) as the inclusion list, and the Offline MetFusion score. We then combine this with other annotation approaches that use different scoring schemes and weights. In some complicated workflows it does make a lot of sense to take the scores out of MetFrag (as @schymane has done with the spectral matching), and I will probably change to something similar in the future.
@schymane - do you have general weightings that you use for these scores? (Would that be something you could share, or is it really dependent on your own analysis and evaluations?)
I wonder… although it could be very useful for users to have default weightings in the tool, perhaps for the Galaxy tool we should, for the moment, remove the default weightings and let the user determine their own based on their preferences.
Does this seem OK to you @korseby? This seems to follow the logic of the MetFrag CLI, which does not specifically provide default weightings for the scores (at least I could not find any).
Honestly, over several years and many, many users we have found it best to keep each scoring term with a weighting of 1, so that the scores become additive. It seems to be much easier for people to understand intuitively when selecting their candidates afterwards (then max score is the "number of scoring terms chosen"). So this is what I would recommend as the preferred default behaviour - but then leave the option for people to tweak the scores (weighting) if they wish.
We used this in the PubChemLite evaluation and achieved very good results. So this would be something like:
Term | Weighting
---|---
FragmenterScore | 1
Offline MoNA Exact Spectral Similarity | 1
References* / PubMed Count | 1
Patent Score | 1
Optional additional terms | 1
\* If you wish to use n reference counts and merge them, as previously with ChemSpider, then the weighting is 1/n for each, giving a total score of 1 if you only want one overall reference count. This is not necessary for PubChem, and is probably obsolete now that the ChemSpider API requires tokens.
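To make the arithmetic concrete, here is a minimal Python sketch (not MetFrag code; the score values are hypothetical, and each term is assumed to be normalised to the range 0-1). It shows that with all weights at 1 the maximum combined score equals the number of terms, and how the 1/n weighting merges several reference counts into a single effective term:

```python
# Illustrative sketch of additive candidate scoring (not MetFrag code).
# Assumption: each term is already normalised to [0, 1], so with all
# weights at 1 the maximum possible score equals the number of terms.

def combined_score(terms: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-candidate scoring terms."""
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())

# Hypothetical candidate with four terms, all weighted at 1:
candidate = {
    "FragmenterScore": 0.92,
    "OfflineMoNAExactSimilarity": 0.75,
    "PubMedCount": 0.60,
    "PatentScore": 0.40,
}
weights = {name: 1.0 for name in candidate}
print(round(combined_score(candidate, weights), 2))  # 2.67, out of a maximum of 4

# Merging n = 2 separate reference counts into one effective term:
# weight each at 1/n so together they contribute at most 1.
ref_terms = {"RefCountA": 0.8, "RefCountB": 0.4}
ref_weights = {name: 1.0 / len(ref_terms) for name in ref_terms}
print(round(combined_score(ref_terms, ref_weights), 2))  # 0.6
```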
What is important to do then is to ensure that the two experimental terms (e.g. MetFrag Fragmenter Score and the MoNA score) are reported clearly (separately) in the output along with the aggregated score, because these two are the key terms that help you see if the MetFrag and MoNA results really match the input spectrum, or if it's just the "best match" but only explains the experimental data poorly. Frank also experimented with calculating a spectral match based on the MetFrag predicted fragments here.
If you want to see some of this pictorially, please see some examples here:
https://zenodo.org/record/6856116
(examples start at Slide 37, and were with PubChemLite, not PubChem ... but same idea)
Thanks @schymane for sharing, this is really useful.
I agree that simple additive scoring as the default for the Galaxy tool, as you describe, seems sensible and intuitive (while letting the user change the weightings if they would like).
I will update the Galaxy tool using those defaults and I will check the outputs to see how the experimental terms are separated in the output.
Thanks for your help!
And thanks for including the links - I will have a read to explore a bit more.
Sorry for the delay. We had a workshop this week and I am incredibly busy right now. So, yes, I am completely fine with this weighting.
We used my weighting a few years ago together with a suspect list of natural products. This way, natural products were ranked much higher and we were able to get a lot of candidates ranked first. We mostly applied this weighting scheme to non-model plant species. The use of libraries other than PubChem has rendered this weighting scheme obsolete anyway.