Comments (22)

aparamon commented on August 20, 2024

Validator script:

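import sys
from indigo import Indigo

indigo = Indigo()

# Check every record of the SDF file given as the first command-line argument.
for mol in indigo.iterateSDFile(sys.argv[1]):
    flags = 0      # number of suspicious peaks in this record
    prevmz = None  # m/z of the previous peak (None before the first peak)
    for peak in mol.getProperty('MASS SPECTRAL PEAKS').split('\n'):
        mz, intens = peak.split()
        mz = float(mz)
        # Ordering check, currently disabled:
        #if prevmz and mz < prevmz:
        #    print('{}: M/z={} Da out of order.'.format(mol.getProperty('ID'), mz),
        #          file=sys.stderr)
        #    flags += 1
        # Flag an m/z that repeats the previous peak.
        if prevmz and mz == prevmz:
            print('{}: Duplicate m/z={} Da.'.format(mol.getProperty('ID'), mz),
                  file=sys.stderr)
            flags += 1
        # Flag an m/z more than ten times larger than the previous peak.
        if prevmz and mz > 10*prevmz:
            print('{}: M/z jump at m/z={} Da.'.format(mol.getProperty('ID'), mz),
                  file=sys.stderr)
            flags += 1
        prevmz = mz
    # Flagged record IDs go to stdout; the per-peak messages go to stderr.
    if flags:
        print(mol.getProperty('ID'))
        print(file=sys.stderr)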

from massbank-data.

takaakin commented on August 20, 2024

aparamon commented on August 20, 2024

Dear Takaaki-san!

Thank you for the confirmation, and for your great contribution to the MS community. It is not possible to avoid all mistakes; what matters is that they can eventually be found and corrected.

I was wondering why my validator script didn't flag the m/z=1355 Da peak from your example. Searching MoNA-export-GC-MS_Positive_Mode.sdf manually for "1355.0" revealed a couple of spectra for which this m/z is sensible and in fact corresponds to the molecular ion (UO000022, UO000023), but not the example you mentioned (M*++ of a substance with the molecular mass 271 Da). What is the ID of your example?

takaakin commented on August 20, 2024

aparamon commented on August 20, 2024

Yes, my validator did report
JP005065: M/z jump at m/z=1495.0 Da.
Might it be possible that extra trailing zeros always indicate doubly-charged ions, for MSSJ data (so that my guess about left-over markers is correct)? It could be important to still keep this information then, somehow -- but of course in a less intrusive way.
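
The guess above can be tried out mechanically. The following is only a sketch under a stated assumption, not a confirmed description of how the MSSJ records were garbled: it assumes a doubly-charged peak with a half-integer m/z (such as 135.5 for the M++ of a 271 Da substance) lost its decimal point and now appears ten times too large. The function name `guess_doubly_charged` and the `jump_factor` parameter are hypothetical; the jump condition mirrors the one in the validator script.

```python
def guess_doubly_charged(mz_values, jump_factor=10):
    """Return (reported, guessed) pairs for peaks that may be garbled half-integer m/z.

    Assumption (hypothetical, not confirmed in this thread): a garbled peak
    is the true m/z multiplied by 10, so dividing it by 10 should give a
    value ending in .5, as expected for a doubly-charged ion of odd mass.
    """
    candidates = []
    prev = None
    for mz in mz_values:
        # Same jump test as the validator: more than jump_factor times the previous peak.
        if prev and mz > jump_factor * prev:
            guess = mz / 10
            if guess % 1 == 0.5:
                candidates.append((mz, guess))
        prev = mz
    return candidates


if __name__ == "__main__":
    # Illustrative peak list only, not taken from a real record.
    print(guess_doubly_charged([134.0, 135.0, 1355.0, 136.0]))
```

A jump whose tenth is a whole number (for example 2000.0 after 100.0) is not reported by this sketch, since it does not fit the half-integer pattern.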

sneumann commented on August 20, 2024

Dear aparamon,
thanks for your report, and especially
the code to verify records exactly for this issue.

I would like to support the idea of curating data
where the original authors confirm the changes,
or where changes are obvious from issues in the data processing
leading to the record if it is not possible to contact the authors.

The github revision control will allow us to trace back exactly to the original record,
and cross-link all changes done as curation at later points,
giving credit and reasoning for any changes.
Yours, Steffen

takaakin commented on August 20, 2024

takaakin commented on August 20, 2024

aparamon commented on August 20, 2024

@takaakin Thank you for your support!
I believe the original CDR data will do no harm in any case, so please do not hesitate to share it.

takaakin commented on August 20, 2024

aparamon commented on August 20, 2024

Dear Takaaki-san!
My email is [email protected], but please note that I do not have write access to MassBank (I became a MassBank user only recently, actually). I believe Steffen and Tobias will do best with the CDR data.

takaakin commented on August 20, 2024

aparamon commented on August 20, 2024

@takaakin Yes, please use the above script!
In order to run it please make sure to install Indigo Toolkit:
http://lifescience.opensource.epam.com/indigo
I'll be sincerely happy if this piece of code proves helpful.

aparamon commented on August 20, 2024

@takaakin Btw, current MassBank records are available at http://mona.fiehnlab.ucdavis.edu (you are interested in "GC-MS Spectra") and https://github.com/MassBank/MassBank-data (specifically, https://github.com/MassBank/MassBank-data/tree/master/MSSJ).

sneumann commented on August 20, 2024

Hi @aparamon , I am currently installing indigo and will try to run the script.
Otherwise, you can fork MassBank-data, apply the script, and send us a pull request
through GitHub. Yours, Steffen

sneumann commented on August 20, 2024

Hi @aparamon , still have not received indigo from the company, so if you create the PR
that would be great. Alternatively, send the fixed files my way and I can do it on your behalf.
Yours, Steffen
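
If Indigo is not at hand, the two properties the validator reads (`ID` and `MASS SPECTRAL PEAKS`) can in principle be pulled out of an SDF file with plain Python. The reader below is a rough sketch under simplifying assumptions, not a replacement for a real SDF parser: it assumes records end with a `$$$$` line, data items start with a `> <NAME>` header line, and each value ends at the first blank line. The function name `iter_sdf_properties` is hypothetical.

```python
def iter_sdf_properties(lines):
    """Yield one {property name: value} dict per SDF record.

    Simplified sketch: the molfile block is skipped, and any line starting
    with '>' that also contains '<' is treated as a data item header.
    """
    props = {}
    name = None        # name of the data item currently being read
    value_lines = []   # lines collected for that data item
    for raw in lines:
        line = raw.rstrip('\r\n')
        if line.startswith('$$$$'):
            # End of record: store any unfinished item and emit the record.
            if name is not None:
                props[name] = '\n'.join(value_lines)
            yield props
            props, name, value_lines = {}, None, []
        elif line.startswith('>') and '<' in line:
            # New data item header such as "> <MASS SPECTRAL PEAKS>".
            if name is not None:
                props[name] = '\n'.join(value_lines)
            name = line[line.index('<') + 1:line.rindex('>')]
            value_lines = []
        elif name is not None:
            if line == '':
                # A blank line terminates the current data item.
                props[name] = '\n'.join(value_lines)
                name = None
            else:
                value_lines.append(line)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for record in iter_sdf_properties(handle):
            print(record.get('ID'))
```

The peak checks from the validator script could then be run on `record['MASS SPECTRAL PEAKS'].split('\n')` in place of `mol.getProperty(...)`.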

aparamon commented on August 20, 2024

@sneumann Please note that my script only flags the likely mistakes, but doesn't automatically fix them. Although the fixes are usually seemingly obvious, I'm hesitant to propose them as undoubtedly correct. That's why original, ungarbled spectral data by @takaakin are of crucial value.

takaakin commented on August 20, 2024

sneumann commented on August 20, 2024

Dear @takaakin -san,
you could download a snapshot of the whole MassBank-data from github as ZIP, see the screenshot below.
For your convenience I have attached the subset of the MSSJ spectra as of today to this comment:
MSSJ-2018-09-26.zip

We can upload them on your behalf when fixed,
thanks for taking care of the corrections,
yours, Steffen

image

takaakin commented on August 20, 2024

takaakin commented on August 20, 2024

meier-rene commented on August 20, 2024

Thank you for providing the corrected records. All of them have been added to the repository. Additionally, we have implemented some automatic tests which flag errors in the sorting of the peak lists, to detect this kind of problem during submission in the future.
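
The actual tests are not shown in this thread. As a rough illustration of what a sorting check on a peak list could look like, here is a minimal sketch; the function name `unsorted_peak_positions` is hypothetical and the peak list is assumed to be already parsed into m/z values.

```python
def unsorted_peak_positions(mz_values):
    """Return the indices of peaks whose m/z is smaller than the preceding m/z."""
    return [
        i
        for i in range(1, len(mz_values))
        if mz_values[i] < mz_values[i - 1]
    ]


if __name__ == "__main__":
    # Illustrative values only: a 1355.0 sitting between 135.0 and 136.0
    # makes the following peak appear out of order.
    print(unsorted_peak_positions([134.0, 135.0, 1355.0, 136.0]))
```

An empty result means the list is in non-decreasing m/z order; equal neighbours are not reported here, since the validator script earlier in the thread handles duplicates separately.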
