This repo contains all MassBank records and uses GitHub Actions to validate the content of all records with the Validator from MassBank-web.
Documentation can be found at https://massbank.github.io/MassBank-documentation.
Official repository of open data MassBank records
Bug report from external user:
First of all, thank you for the massive effort in developing and maintaining MassBank! I was very pleased to see in the News that all the records were linked to Comptox (if registered), so I gave it a go: the first record I randomly tested was MSJ01067 (Acetamiprid; GC-EI-Q; MS; Positive; M+), I clicked the Comptox link (DTXSID60861331) and...the substance ID does not exist - Acetamiprid ID is DTXSID0034300.
I therefore tested many other records which were all ok, so I assume that I was really unlucky (or an excellent proof-reader) :-)
I don't know if it's an isolated case, but give it a check.
Follow-up:
Indeed that DTXSID doesn't appear to exist in the public Dashboard, nor do I get a match for that InChIKey. If this is a name match, it's wrong ...
https://massbank.eu/MassBank/RecordDisplay.jsp?id=MSJ01067
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID60861331
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=WCXDHFDTOYPNIE-UHFFFAOYSA-N
This is the correct match:
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID0034300
and is also found by name:
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=Acetamiprid
Any ideas what went wrong here @meier-rene @ChemConnector ?
PubChem link looks fine
https://pubchem.ncbi.nlm.nih.gov/compound/213021
#MetSoc2019, Towards FAIR Spectral Libraries workshop. There is a request to make raw mass spectral files associated with MassBank records available for the public via any repository. Should be vendor's format, not mzML.
The people from MoNA asked for a webhook some time ago. We should add one to make it easier to announce changes in the records.
Hi, my validator does not work correctly. Perhaps the problem is related to the illegal reflective access warnings?
root@massbank2:/MassBank-data# mvn -q -f .scripts/MassBank-web/MassBank-Project/MassBank-lib/pom.xml install
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/maven/lib/guice.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
root@massbank2:/MassBank-data# mc
MobaXterm X11 proxy: Unsupported authorisation protocol
root@massbank2:/MassBank-data# mc
MobaXterm X11 proxy: Unsupported authorisation protocol
root@massbank2:/MassBank-data#
root@massbank2:/MassBank-data# ./.scripts/validate.sh ./
Validating 43 files
Exception in thread "main" java.io.IOException: File 'AAFC' exists but is a directory
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:291)
at org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1805)
at massbank.Validator.main(Validator.java:230)
root@massbank2:/MassBank-data# ./.scripts/validate.sh /MassBank-data/
Validating 43 files
Exception in thread "main" java.io.IOException: File 'AAFC' exists but is a directory
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:291)
at org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1805)
at massbank.Validator.main(Validator.java:230)
UF402601, UF402602, UF402603 and UF402604 are simazine and not desethylterbutylazine, the SPLASHes are identical with the simazine spectra and Martin Krauss has confirmed the retention times also match simazine and not desethylterbutylazine. All (simazine and desethylterbutylazine) spectra were flagged as simazine by Herbert Oberacher.
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF402603
@meier-rene can you update the compound information (CH$ fields) of the UF40260X records with the compound information from here:
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF404103
Pls let me know if you need more information, or if you want me to do the updates instead, thanks!
With new API and token rules, it will be easier to add ChemSpider IDs to new records on the MassBank-data side rather than via RMassBank. Posting this as a result of discussions on MassBank/RMassBank#192
Note from Dave at RSC:
I believe that there is a way to ensure that Travis has access to an API token but it is encrypted so that the token is only accessible to the test runner: https://stackoverflow.com/questions/9338428/using-secret-api-keys-on-travis-ci
The validator cannot run some scripts, so the permissions in the script folder must be set, or sudo should be used to run the script.
Will be fixed with registration of VETIST
I'm not sure how this got through the validator but it doesn't look like these meet the Record Format requirements? @meier-rene can you look into this? Thanks!
Multiple names in the title field, including names that are clearly wrong (e.g. including the metal salt - sodium and lithium)
Would it be possible to search database by using m/z vs absolute intensity format? When I copy/paste a spectrum directly from MassLynx it gives me m/z intensity format which the database apparently does not support.
Type Exception Report
Message java.lang.ArrayIndexOutOfBoundsException
Description The server encountered an unexpected condition that prevented it from fulfilling the request.
Exception
org.apache.jasper.JasperException: java.lang.ArrayIndexOutOfBoundsException
org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:598)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:514)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
Root Cause
java.lang.ArrayIndexOutOfBoundsException
Note The full stack trace of the root cause is available in the server logs.
@meier-rene can we get a CC0 dump of MassBank Accession IDs with InChIKey mappings for @egonw to add to WikiData? He's registering this property now ;-)
The UF414901-04 spectra are not 17-beta-estradiol, as this does not ionise with the method used. Confirmed by Martin Krauss; flagged by Herbert Oberacher as potentially being 19-norandrostenedione. @tsufz can you confirm what compound information should be associated with this record, or indicate whether @meier-rene should deprecate?
Especially the 03 and 04 records appear good quality spectra, 01 and 02 are close to noise levels.
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF414904
There are 10 records having the tag
COMMENT: CONFIDENCE
without any confidence value. I think this should not be valid, so please create a rule for the validator and correct the confidence values.
This applies to:
AAFC/AC000433
AAFC/AC000427
AAFC/AC000428
AAFC/AC000432
AAFC/AC000429
AAFC/AC000425
AAFC/AC000431
AAFC/AC000430
AAFC/AC000434
AAFC/AC000426
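A validator rule for this case could look roughly like the sketch below. It is only an illustration in Python (the actual Validator lives in MassBank-web and is written in Java); the function name and the example confidence value in the usage note are hypothetical.

```python
import re

# Matches a "COMMENT: CONFIDENCE" line that carries no value after the tag.
EMPTY_CONFIDENCE = re.compile(r"^COMMENT:\s*CONFIDENCE\s*$")


def has_empty_confidence(record_text: str) -> bool:
    """Return True if any line is a COMMENT: CONFIDENCE tag without a value."""
    return any(EMPTY_CONFIDENCE.match(line) for line in record_text.splitlines())
```

Running this over the ten records listed above should flag each of them, while a line such as `COMMENT: CONFIDENCE standard compound` (a made-up value) would pass.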
Hi, the OpenMS team has a prototype for conversion of MassBank records to a single mzML file
in https://github.com/OpenMS/MassBankUpdate/
This can now be done with eba8b81d5819e996a4f115be45b127271bb4e6fa
in https://github.com/sneumann/MassBankUpdate/ and eventually https://github.com/OpenMS/MassBankUpdate
Yours, Steffen
I would like to develop some guidelines for new contributors on how to name their accessions and how and when to create new directories. This has become urgent due to some email discussion about new contributors and, in particular, some new contributions like #82.
There are different demands, for which we need to find a compromise.
As a maintainer of the whole project, I would like to keep the data compact and uncluttered. Some directories are desirable, but not too many. Technically, we only support one level of directories at the moment.
There are demands from contributors. They would like to separate their contributions by contributing group, but sometimes also by the specific project that supported the creation of the records. I expect that an entry in the COMMENT section does not suffice. For them it is most likely also a matter of public image. Sometimes this separation is not an issue, because there is just one contributing person/group at a particular institution. In other cases more "separation" or "distinguishability" is desired. You all know that everyone has to justify his/her projects somehow...
Possible solutions (technical view):
- Easiest for contributors: allow subdirectories in the institution directories. This creates major headaches on my side, because it would mean a lot of adjustments to the codebase.
- Use a directory naming scheme like the one that is already in the current data and that resulted in this discussion issue. Examples of the scheme could be RIKEN_IMS, RIKEN_NPDepo... This is an easy solution because it works right now. The only drawback is the increasing number of directories, which makes the project view a bit more confusing.
- Ease the requirements on accession naming. That is easy to implement but might not provide sufficient "distinguishability".
Besides directory naming we also have the question of accession naming.
The records mix voltage representations (2000 V, 2 kV). I suggest standardising from V to kV:
2000 V -> 2 kV
160 V -> 0.16 kV
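The conversion could be sketched as follows. This is a minimal Python illustration, not part of the MassBank tooling; the function name is hypothetical, and it assumes the value is a plain number followed by `V`. Anything else, including values already in kV, is returned unchanged.

```python
import re
from decimal import Decimal

# A plain number followed by "V" (not "kV"), e.g. "2000 V" or "160 V".
VOLT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*V\s*$")


def volts_to_kilovolts(value: str) -> str:
    """Convert 'N V' to the equivalent 'N/1000 kV'; leave other strings as-is."""
    match = VOLT.match(value)
    if match is None:
        return value
    kilovolts = Decimal(match.group(1)) / Decimal(1000)
    # normalize() drops trailing zeros; the "f" format avoids exponent notation.
    return f"{format(kilovolts.normalize(), 'f')} kV"
```

Decimal arithmetic is used here so that 160 V becomes exactly 0.16 kV rather than a binary floating-point approximation.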
The UF415701-04 records should be Cortisone and not Prednisolone:
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF415701
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF415702
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF415703
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF415704
@meier-rene can you update these records with the compound information from https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF415401 ?
Again, the analytical information is correct, just the CH$ entries are for the incorrect compound. The SPLASHes are identical for UF415701 and UF415401.
Dear MassBank team,
I would very much like to use the MassBank data in .mlb format so that I can use Bruker's Library Editor to review and manipulate the library. In the next step I then would like to use sub-libraries of compounds that are more likely to be contained in my samples when I process my data using Bruker's DataAnalysis and Metaboscape.
Thanks for considering my request.
Best regards,
Joerg
According to Mr. Katsutoshi Nagase, Waters Japan, Tokyo, Japan,
all of the 2,992 Waters_Japan records currently on
https://massbank.eu/MassBank/jsp/Result.jsp?type=rcdidx&idxtype=site&srchkey=14&sortKey=name&sortAction=1&pageNo=1&exec=
can be re-licensed to the CC License ("CC BY-NC") and integrated to MassBank-data.
Note: the -NC clause of the re-licensing. Yours, Steffen
Dear all,
after implementing an automatic test to check for problems similar to #13, I found a problem with JEOL_Ltd/JEL00034.txt. The peak list might contain two different spectra.
PK$PEAK: m/z int. rel.int.
14.10826 36475 13
15.09583 222688 81
26.10593 73559 27
27.11639 449105 163
28.11312 374331 136
29.13347 788534 286
29.80832 47011 17
30.12487 59097 21
38.12244 31758 12
39.13071 316878 115
40.12874 84849 31
41.14179 616729 224
42.13915 415303 151
43.1387 708455 257
44.14659 172260 63
45.08199 202944 74
46.0992 77463 28
47.10616 159675 58
53.11076 111360 40
54.13147 37750 14
55.14271 199477 72
56.13508 95786 35
57.17133 494598 180
58.15374 126469 46
59.12688 177267 64
67.13793 60353 22
68.13357 792919 288
69.15435 126760 46
70.14899 84057 31
71.17394 238975 87
72.14914 30018 11
73.25028 69003 25
74.11656 271765 99
83.17146 123791 45
85.10854 174341 63
91.13495 34416 12
93.12413 60088 22
95.15051 40090 15
96.16142 183119 66
97.18768 36897 13
99.11687 79397 29
105.66012 30252 11
110.13925 82023 30
111.16113 52938 19
113.15047 55551 20
113.67169 175718 64
116.12024 66879 24
123.15651 56371 20
127.08559 60564 22
136.15661 36760 13
138.14722 298408 108
139.15477 141469 51
140.13857 35555 13
142.10486 66036 24
143.10454 32046 12
143.73454 31880 12
144.13416 35561 13
155.11388 52262 19
156.11776 46508 17
157.10419 77936 28
158.10339 42136 15
166.17478 32350 12
170.10137 271991 99
171.10492 79761 29
180.17858 45502 17
184.11898 161572 59
185.12064 285129 104
186.11901 2751962 999
195.1689 202761 74
198.09722 94862 34
200.03948 47050 17
210.11173 60566 22
212.08434 73157 27
224.08631 45595 17
226.07326 619225 225
227.09087 87787 32 <-last line of first spectrum
129.02778 18900 7 <-first line of second spectrum
130.11347 42343 15
136.1427 26075 9
137.10019 25130 9
138.11953 677826 246
139.12352 80523 29
140.11027 61803 22
141.08029 95339 35
142.08115 73738 27
144.09759 159144 58
149.20379 26721 10
150.03273 24138 9
152.1384 108578 39
153.02915 38416 14
154.01107 18706 7
155.08419 422515 153
156.11311 78871 29
157.07738 86180 31
164.15452 56191 20
165.102 19225 7
166.13771 231682 84
167.14993 933539 339
168.13903 29477 11
169.04063 34261 12
170.06958 564067 205
171.04781 136229 49
179.9786 14559 5
182.03469 57553 21
183.02114 26768 10
184.0618 224532 82
185.06897 394963 143
186.03256 113767 41
189.03165 18790 7
196.06806 44105 16
198.04149 1279391 464
199.05639 342464 124
//
If I compare this peak list with JEOL_Ltd/JEL00033.txt, I find essentially the first spectrum (one peak is missing). That's why I suppose that this error is a copy-and-paste artifact and that only the second spectrum is valid for JEOL_Ltd/JEL00034.txt.
Is there any way to contact the owner of these records?
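One way such an automatic test might work is to look for places where the m/z value drops relative to the previous peak, which should not happen in a single ascending peak list. The sketch below is an assumption about the approach, not the test that was actually implemented; the function name is hypothetical.

```python
def find_mz_restarts(peaks):
    """Return the indices of peaks whose m/z is lower than that of the previous peak.

    `peaks` is a list of (mz, intensity, relative_intensity) tuples in file order.
    """
    return [i for i in range(1, len(peaks)) if peaks[i][0] < peaks[i - 1][0]]
```

For the four lines around the break in JEL00034 (226.07326, 227.09087, 129.02778, 130.11347), this returns `[2]`, the position where the second spectrum starts.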
User reported that SM858902 and SM858951 contain spectral data from acetylsulfamethoxazole but are labeled diphenhydramine (thank you!). Upon closer inspection we seem to have had an ID/Precursor&peaks mismatch for 3 IDs / 4 records in a series, surrounded by records that look OK; series "broken" due to missing IDs in the middle. We also need to find the cause in https://github.com/MassBank/RMassBank
This should not be passing any form of validation; a screening of the entire CASMI2016 database would be extremely useful for debugging the cause and flagging how and how many records to fix, thank you @meier-rene in advance if you can :-)
From what I can see:
**this one looks OK.
ACCESSION: SM858203
RECORD_TITLE: Cetirizine; LC-ESI-QFT; MS2; CE: 35 NCE; R=35000; [M+H]+
CH$FORMULA: C21H25ClN2O3
CH$EXACT_MASS: 388.15537
MS$FOCUSED_ION: PRECURSOR_M/Z 389.1626
389.1626 C21H26ClN2O3+ 1 389.1626 -0.05
**this one looks OK.
ACCESSION: SM858353
RECORD_TITLE: 2-Hydroxycarbamazepine; LC-ESI-QFT; MS2; CE: 35 NCE; R=35000; [M-H]-
CH$FORMULA: C15H12N2O2
CH$EXACT_MASS: 252.08988
MS$FOCUSED_ION: PRECURSOR_M/Z 251.0826
251.0827 C15H11N2O2- 1 251.0826 0.4
[no records with IDs between 8583 and 8588]
** here something has gone wrong
ACCESSION: SM858801
RECORD_TITLE: Finasteride; LC-ESI-QFT; MS2; CE: 35 NCE; R=35000; [M+H]+
CH$FORMULA: C23H36N2O2
CH$EXACT_MASS: 372.27768
MS$FOCUSED_ION: PRECURSOR_M/Z 256.1696
** here something has gone wrong
ACCESSION: SM858902
RECORD_TITLE: Diphenhydramine; LC-ESI-QFT; MS2; CE: 35 NCE; R=35000; [M+H]+
CH$FORMULA: C17H21NO
CH$EXACT_MASS: 255.16231
MS$FOCUSED_ION: PRECURSOR_M/Z 296.07
** still wrong ... it's using the same (wrong) exact mass to get equivalent wrong precursor
ACCESSION: SM858951
RECORD_TITLE: Diphenhydramine; LC-ESI-QFT; MS2; CE: 35 NCE; R=35000; [M-H]-
CH$FORMULA: C17H21NO
CH$EXACT_MASS: 255.16231
MS$FOCUSED_ION: PRECURSOR_M/Z 294.0554
** still wrong:
ACCESSION: SM859002
RECORD_TITLE: Acetyl-sulfamethoxazole; LC-ESI-QFT; MS2; CE: 35 NCE; R=35000; [M+H]+
CH$FORMULA: C12H13N3O4S
CH$EXACT_MASS: 295.06268
MS$FOCUSED_ION: PRECURSOR_M/Z 325.1711
325.171 C20H22FN2O+ 1 325.1711 -0.17 <= we have F annotations!!!!!
[no 8591]
** and now everything seems OK again ...
ACCESSION: SM859203
RECORD_TITLE: Amitriptyline; LC-ESI-QFT; MS2; CE: 35 NCE; R=35000; [M+H]+
CH$FORMULA: C20H23N
CH$EXACT_MASS: 277.18305
MS$FOCUSED_ION: PRECURSOR_M/Z 278.1903
278.1904 C20H24N+ 1 278.1903 0.42
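A screening of the kind requested could compare CH$EXACT_MASS with PRECURSOR_M/Z for the adduct named in the record title. The following Python sketch is only an illustration: the function name and the 0.01 tolerance are assumptions, and only the two adducts that occur in the excerpt are handled.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

# Mass shift applied to the neutral exact mass for each adduct type.
ADDUCT_SHIFT = {"[M+H]+": PROTON_MASS, "[M-H]-": -PROTON_MASS}


def precursor_matches(exact_mass, precursor_mz, adduct, tol=0.01):
    """Return True if the precursor m/z agrees with exact mass plus adduct shift."""
    expected = exact_mass + ADDUCT_SHIFT[adduct]
    return abs(expected - precursor_mz) <= tol
```

With the values quoted above, the check passes for SM858203, SM858353 and SM859203 and fails for SM858801, SM858902, SM858951 and SM859002. It also shows that the precursor listed under SM858801 (256.1696) fits the diphenhydramine exact mass, which is consistent with the IDs and the precursor/peak data having shifted relative to each other.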
Following instructions from @sneumann:
I just realised that there is a simpler way to upload files if the directory already exists. In that case, You need 1) and 2), but 3)-7) can be replaced by going to https://github.com/schymane/MassBank-data/upload/master/UniLu
and using the "Upload files" button :-)
I get:
Yowza, that’s a lot of files. Try again with fewer than 100 files.
Since I have 950 files and don't (yet) fancy doing 9.5 commits ... I am trying another way!
I'm trying to reconcile what I see on massbank.eu and massbank.jp with the directories in this repo. The following seem to be missing (num. records MBEU / num. records MBJP):
AAFC (292/0)
CASMI2016 (622/622)
Env Anal Chem, U Tuebingen (119/119)
European MassBank (1/0) <= not a major concern as this is a dummy entry
UPAO (12/12)
The following are present in this repo but not online:
CASMI_2012
The following folders are in the OpenData here: http://www.massbank.jp/SVN/OpenData/record/
CASMI_2016, Env_Anal_Chem_U_Tuebingen, UPAO
I cannot see AAFC there.
There appear to be some creatinine records that are noise only or close to it; we should also consider removing these poor records.
This one has good intensity, few peaks but looks fine, I've used it to compare:
https://massbank.eu/MassBank/RecordDisplay.jsp?id=SM868004&dsn=CASMI_2016
PK$PEAK: m/z int. rel.int.
57.0575 160462.5 2
58.0653 4760539.5 69
70.0653 149661.5 2
72.0445 776245.8 11
86.0713 4838806 71
114.0662 67953792 999
These ones are very low intensity: two have peaks that are clearly noise only, and two have peaks that are in the peak list above but are still close to noise and are missing other main peaks. I recommend removing all four UF records. Again, these failed QC by Herbert Oberacher.
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF412504&dsn=UFZ
one or two genuine peaks, rest noise, low I
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF412501&dsn=UFZ
noise only?
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF412503&dsn=UFZ
one or two genuine peaks, rest noise, low I
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF412502&dsn=UFZ
noise only?
I am comparing the MoNA record at http://mona.fiehnlab.ucdavis.edu/spectra/display/BSU00002 with the MassBank record at https://massbank.eu/MassBank/RecordDisplay.jsp?id=BSU00002
I see stereochem in the structure depiction on MoNA but not in the MassBank record. I assume that InChIs are the basis of the stereo on MoNA but the SMILES has no stereochem on MassBank. The inconsistency is confusing. Is there a StereoSMILES in MassBank that is not displayed?
Thanks for your efforts @meier-rene. However, we need a release policy. Changes and updates to the data occur from time to time rather than frequently. Uploads are often also related to reporting or to the publication of a paper.
Hence, a fixed release frequency is not an appropriate way to go. I suggest a very open release policy, meaning that we release on request or when a larger set of new spectra is uploaded. I would expect that, as a contributor, I want to see my spectra online ASAP and don't want to wait for weeks (as in paper publishing....).
Hi,
currently the last directory failing is
https://travis-ci.org/MassBank/MassBank-data/jobs/368060988
Incorrect number of peaks in peaklist. 16 peaks are declared in PK$NUM_PEAK line, but 13 peaks are found.
PK$PEAK: m/z int. rel.int.
^
Error in UA008702.txt
Incorrect number of peaks in peaklist. 13 peaks are declared in PK$NUM_PEAK line, but 11 peaks are found.
PK$PEAK: m/z int. rel.int.
^
Error in UA008703.txt
Incorrect number of peaks in peaklist. 8 peaks are declared in PK$NUM_PEAK line, but 7 peaks are found.
PK$PEAK: m/z int. rel.int.
^
Error in UA008704.txt
for e.g. https://github.com/MassBank/MassBank-data/blob/master/UFZ_Additional_Specs/UA008704.txt
Yours,
Steffen
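The check behind these messages can be reproduced locally before pushing. Below is a rough Python sketch, not the validator's own code; the function name is hypothetical, and it assumes that the peak lines are the non-empty lines between `PK$PEAK:` and the closing `//`.

```python
def count_peaks(record_text):
    """Return (declared, found) peak counts, or None if PK$NUM_PEAK is missing."""
    declared = None
    found = 0
    in_peak_list = False
    for line in record_text.splitlines():
        if line.startswith("PK$NUM_PEAK:"):
            declared = int(line.split(":", 1)[1].strip())
        elif line.startswith("PK$PEAK:"):
            in_peak_list = True
        elif line.startswith("//"):
            in_peak_list = False
        elif in_peak_list and line.strip():
            found += 1
    if declared is None:
        return None
    return declared, found
```

A record is consistent when both numbers in the returned tuple are equal.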
Here are the notes and to-dos from our web meeting with @laurentheirendt
Hi, the records of Nihon Water are not covered by an open access license. I suggest to remove them from the repository.
Suggestion by @egonw to change CC-BY to CC0 by default, as this is the more applicable and most flexible license. Agree?
@meier-rene are you able to produce a dump file with all InChIKeys in MassBank and, where they have them, the corresponding DTXSIDs? I need all the InChIKeys for one file, and all the DTXSIDs for another.
I've browsed and found several variants of such files, but not one containing exactly this information paired. If you have one already that I missed, please point me to it ;-)
Thanks!
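Such a dump could in principle be generated directly from the record files. The Python sketch below pulls the accession, InChIKey and DTXSID out of a single record; it assumes the identifiers are stored as `CH$LINK: INCHIKEY ...` and `CH$LINK: COMPTOX ...` lines, and the function name and the identifiers in the usage note are made up for illustration.

```python
def extract_ids(record_text):
    """Return a dict with the ACCESSION, INCHIKEY and COMPTOX values of a record.

    Values that are not present in the record are reported as None.
    """
    ids = {"ACCESSION": None, "INCHIKEY": None, "COMPTOX": None}
    for line in record_text.splitlines():
        if line.startswith("ACCESSION:"):
            ids["ACCESSION"] = line.split(":", 1)[1].strip()
        elif line.startswith("CH$LINK:"):
            parts = line.split(":", 1)[1].split()
            if len(parts) == 2 and parts[0] in ("INCHIKEY", "COMPTOX"):
                ids[parts[0]] = parts[1]
    return ids
```

For a record containing `ACCESSION: XX000001`, `CH$LINK: INCHIKEY AAAAAAAAAAAAAA-BBBBBBBBBB-C` and `CH$LINK: COMPTOX DTXSID0000000` (placeholder values), this yields all three identifiers, which can then be written to one or two output files as needed.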
It would be great if we could auto-create a file to deposit in PubChem with every stable release of MassBank-data.
To discuss: compound information only (=> relatively easy) or mappings with spectral IDs (slightly more info needed) or actual spectra as well (more work our side).
Shall we start with getting a deposit file for compound information only? Then we need e.g.:
PUBCHEM_EXT_DATASOURCE_REGID <= InChIKey, or any unique identifier our side
PUBCHEM_EXT_DATASOURCE_SMILES <= SMILES
PUBCHEM_EXT_DATASOURCE_CID <= PubChem CID (if available)
PUBCHEM_SUBSTANCE_COMMENT <= here we could e.g. provide accession IDs, collapsed
PUBCHEM_SUBSTANCE_SYNONYM <= any names our side (can have multiple columns, but maybe e.g. max 3 would be sensible)
@meier-rene @sneumann @tsufz what do you think?
If yes, who will look after the file?
I would contact PubChem to get us a MassBank login for deposition, so credit goes to MassBank(EU) and we can track our submissions.
Hi,
@Tomnl has updated his code to convert MassBank records to a sqlite database:
I have tidied up the MSP to SQLite python code and included it as
separate python package maintained in pip, see docs https://msp2db.readthedocs.io/en/latest/
and code https://github.com/computational-metabolomics/msp2db
The code can be used as CLI or API to create an SQLite database from MSP files
By default, it can work with either the MSP format found in the MassBank
GitHub repository or the one from MoNA. You just need to set the schema
parameter to either "massbank" or "mona".
...
I have updated the documentation https://bioconductor.org/packages/devel/bioc/vignettes/msPurity/inst/doc/msPurity-spectral-matching-vignette.html
for msPurity in Bioconductor (development branch)
Includes reference to msp2db documentation and details the databases
in more detail
I have created SQLite databases locally from MassBank and MoNA
I am in the process of getting a suitably sized updated SQLite file
for msPurityData Bioconductor data package.
Please let me know if you have any questions. And I will keep you
informed of any other developments.
It would be great to distribute snapshots of MassBank-data in such a format.
Yours, Steffen
Copy-paste from email received; @meier-rene are you able to follow-up? Thx!
Comparing data from different databases, I found some discrepancies in your data. For the mentioned entry of your database (https://massbank.eu/MassBank/RecordDisplay.jsp?id=OUF00136), the chemical structure indicates that the configuration of the double bond is not defined. This configuration is defined in other databases as InChIKey CWVRJTMFETXNAD-NCZKRNLISA-N:
See:
PubChem: https://pubchem.ncbi.nlm.nih.gov/compound/9476
ChEBI: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:95271
ChEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL3186431/
EPA: https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID3024786
Could you please check whether the definition of your entry is correct, and whether the chemical structure is the correct one or the structural identifiers are wrong?
The problem is the same for other entries like FIO00619, JP000136, FIO00623... where the chemical structure is not correct compared to the stereoconfiguration at the origin of InChIKey CWVRJTMFETXNAD-JUHZACGLSA-N. This InChIKey requires the definition of the 4 chiral carbons on the ring. Please see:
ChEBI: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:16112
CHEMBL: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL284616/
Reporting upstream as suggested in
https://bitbucket.org/fiehnlab/mona/issues/209/incorrect-m-z-values
Apparently, multiple MoNA GC/MS spectra have incorrect entries in MASS SPECTRAL PEAKS.
To reproduce:
http://mona.fiehnlab.ucdavis.edu/spectra/display/JP011674
Spectrum window displays multiple peaks with m/z > 1000 Da, which is definitely out-of-range (molecule mass is 436.2438).
Closer examination reveals that MASS SPECTRAL PEAKS contains multiple entries which are out of order, and exceed the previous entry by factor of >10:
...
145 3.2
1490 3.2
155 1.7
159 2.3
165 1.7
1670 2.3
169 2.6
173 1.3
175 2.3
1770 1.5
178 2.3
179 3.2
18 1.1
1810 7.4
182 1.7
183 1.3
191 2.3
1930 2.1
195 1.3
197 1.5
...
Luckily, all these should be easy to fix, by removing the extra trailing zeroes.
(Were these markers of some sort the operator failed to remove before submitting, or OCR bugs?)
The attached list.txt contains all automatically flagged suspicious records; more detailed information is available in log.txt.
Please note that there are some false positives due to rare strange ordering of MASS SPECTRAL PEAKS items:
http://mona.fiehnlab.ucdavis.edu/spectra/display/JP011672
(what natsort?!)
However, only <400 records have non-conventional ordering, so it might be feasible to review them all.
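The automatic flagging is not shown in the report, but the criterion described above (an entry exceeding the previous one by a factor of more than 10) can be sketched in a few lines of Python. The function name and the default factor are assumptions for illustration.

```python
def flag_suspicious_mz(mz_values, factor=10.0):
    """Return the m/z values that exceed the preceding value by more than `factor`."""
    flagged = []
    for previous, current in zip(mz_values, mz_values[1:]):
        if previous > 0 and current / previous > factor:
            flagged.append(current)
    return flagged
```

Applied to the excerpt from JP011674, this flags 1490, 1670, 1770, 1810 and 1930. Records with unusual ordering, such as JP011672, can still produce false positives, as noted above.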
@zzjl20, I found that synonyms are annotated as COMMENT: Synonyms:, for example in
https://massbank.eu/MassBank/RecordDisplay.jsp?id=NGA00625&dsn=RIKEN_NPDepo.
According to our specification, synonyms should be annotated in CH$NAME, with one entry per synonym. The first CH$NAME entry should be the preferred name.
For example:
CH$NAME: (S)-Luteanine
CH$NAME: Artabotrine
CH$NAME: Luteanine
Synonyms in COMMENT are not stored as chemical names, so a search for, e.g., Artabotrine returns no result.
May I ask you to edit the respective records and to resubmit them?
Thanks a lot and best wishes
Tobias
The following fields and entries do not comply with the latest Record Format and should be changed in the data for harmonisation.
These tags are used incorrectly, or their terms have changed in recent years:
AC$MASS_SPECTROMETRY: FRAGMENTATION_METHOD
-> AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE
AC$MASS_SPECTROMETRY: FRAGMENT_VOLTAGE
-> AC$MASS_SPECTROMETRY: IONIZATION_ENERGY
AC$MASS_SPECTROMETRY: IONIZATION_POTENTIAL
-> AC$MASS_SPECTROMETRY: IONIZATION_ENERGY
AC$MASS_SPECTROMETRY: RESOLUTION_SETTING
-> AC$MASS_SPECTROMETRY: RESOLUTION
AC$MASS_SPECTROMETRY: ION_SOURCE_TEMPERATURE
-> AC$MASS_SPECTROMETRY: SOURCE_TEMPERATURE
AC$CHROMATOGRAPHY: CAPILLARY_VOLTAGE
-> AC$MASS_SPECTROMETRY: CAPILLARY_VOLTAGE
AC$CHROMATOGRAPHY: INJECTION_TEMPERATURE
-> AC$MASS_SPECTROMETRY: SOURCE_TEMPERATURE
AC$CHROMATOGRAPHY: RETENTION_INDEX
-> AC$CHROMATOGRAPHY: KOVATS_RTI
I checked all entries; the values are in the thousands, so they are very probably Kovats retention indices (KOVATS_RTI).
AC$CHROMATOGRAPHY: OVEN_TEMPERATURE
-> AC$CHROMATOGRAPHY: COLUMN_TEMPERATURE_GRADIENT
MS$FOCUSED_ION: PRECURSOR_M/Z
-> MS$FOCUSED_ION: PRECURSOR_MZ
@meowcat mentioned that we should avoid slashes in the tags.
This is just a typo:
AC$CHROMATOGRAPHY: TRANSFARLINE_TEMPERATURE
-> AC$CHROMATOGRAPHY: TRANSFERLINE_TEMPERATURE
The following tags can be merged into AC$MASS_SPECTROMETRY: MASS_RANGE_MZ
AC$MASS_SPECTROMETRY: MASS_RANGE_M/Z
AC$MASS_SPECTROMETRY: SCAN_RANGE_M/Z
AC$MASS_SPECTROMETRY: SCANNING_RANGE
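The renames listed above could be applied in bulk with a small script. The Python sketch below only illustrates the idea: the mapping is copied from the list, the function name is hypothetical, and the RETENTION_INDEX entry reflects the inference stated above rather than a confirmed rule.

```python
# Old tag -> new tag, as proposed in the list above.
TAG_RENAMES = {
    "AC$MASS_SPECTROMETRY: FRAGMENTATION_METHOD": "AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE",
    "AC$MASS_SPECTROMETRY: FRAGMENT_VOLTAGE": "AC$MASS_SPECTROMETRY: IONIZATION_ENERGY",
    "AC$MASS_SPECTROMETRY: IONIZATION_POTENTIAL": "AC$MASS_SPECTROMETRY: IONIZATION_ENERGY",
    "AC$MASS_SPECTROMETRY: RESOLUTION_SETTING": "AC$MASS_SPECTROMETRY: RESOLUTION",
    "AC$MASS_SPECTROMETRY: ION_SOURCE_TEMPERATURE": "AC$MASS_SPECTROMETRY: SOURCE_TEMPERATURE",
    "AC$CHROMATOGRAPHY: CAPILLARY_VOLTAGE": "AC$MASS_SPECTROMETRY: CAPILLARY_VOLTAGE",
    "AC$CHROMATOGRAPHY: INJECTION_TEMPERATURE": "AC$MASS_SPECTROMETRY: SOURCE_TEMPERATURE",
    # Inferred from the value range, see the note in the list above.
    "AC$CHROMATOGRAPHY: RETENTION_INDEX": "AC$CHROMATOGRAPHY: KOVATS_RTI",
    "AC$CHROMATOGRAPHY: OVEN_TEMPERATURE": "AC$CHROMATOGRAPHY: COLUMN_TEMPERATURE_GRADIENT",
    "MS$FOCUSED_ION: PRECURSOR_M/Z": "MS$FOCUSED_ION: PRECURSOR_MZ",
    "AC$CHROMATOGRAPHY: TRANSFARLINE_TEMPERATURE": "AC$CHROMATOGRAPHY: TRANSFERLINE_TEMPERATURE",
    "AC$MASS_SPECTROMETRY: MASS_RANGE_M/Z": "AC$MASS_SPECTROMETRY: MASS_RANGE_MZ",
    "AC$MASS_SPECTROMETRY: SCAN_RANGE_M/Z": "AC$MASS_SPECTROMETRY: MASS_RANGE_MZ",
    "AC$MASS_SPECTROMETRY: SCANNING_RANGE": "AC$MASS_SPECTROMETRY: MASS_RANGE_MZ",
}


def rename_tag(line):
    """Return the line with an outdated tag replaced; other lines are unchanged."""
    for old, new in TAG_RENAMES.items():
        # Require a following space (or end of line) so longer tags are not hit.
        if line == old or line.startswith(old + " "):
            return new + line[len(old):]
    return line
```

For example, `rename_tag("MS$FOCUSED_ION: PRECURSOR_M/Z 389.1626")` gives `MS$FOCUSED_ION: PRECURSOR_MZ 389.1626`. The values themselves are left untouched, so any unit or format changes would need a separate pass.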
Just a reminder.
Hi @meier-rene @tsufz,
There is a project in the NORMAN joint programme of activities to upload GC-APCI-QTOF mass spectra to MassBank. I prepared one MassBank record for you (https://www.dropbox.com/s/93z76o9bx243lll/AU230117.txt?dl=0), so that you can apply any needed modifications to MassBank.
Let me know if all is okay with the sample record, so that I can give the signal for production of GC-APCI-QTOF records.
Thanks!
Nikiforos
I just stumbled over two records that seem to be duplicates. The metadata as well as the spectrum are exactly the same.
https://massbank.eu/MassBank/RecordDisplay.jsp?id=TY000228&dsn=Univ_Toyama
https://massbank.eu/MassBank/RecordDisplay.jsp?id=TY000237&dsn=Univ_Toyama
It may be worth searching MassBank globally for such cases.
I guess we will have to contact the contributors in any case.
How to tackle this? I suggest to introduce a "DEPRECATED" tag for records which are duplicated (this issue) or noisy (e.g. #51) or otherwise erroneous (#9).
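A global search for such cases could start by grouping records whose peak lists are identical. The Python sketch below assumes the peak lists have already been parsed into memory; the function name is hypothetical, and it only compares spectra, not metadata.

```python
from collections import defaultdict


def find_duplicate_spectra(spectra):
    """Group accessions that share an identical peak list.

    `spectra` maps an accession to its list of (mz, intensity) pairs.
    Returns a list of sorted accession groups with more than one member.
    """
    groups = defaultdict(list)
    for accession, peaks in spectra.items():
        # Tuples are hashable, so the whole peak list can serve as a dict key.
        key = tuple(tuple(peak) for peak in peaks)
        groups[key].append(accession)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Exact matching will miss near-duplicates that differ by rounding, so this is best seen as a first pass before contacting contributors.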
The metadata of the Kaempferol-7-O-glucoside spectra in MassBank are not consistent. We have a mix-up of the protonated and deprotonated forms in InChI, SMILES, molecular formula and exact mass.
Hi,
RMassBank, or one of the databases it uses, sometimes provides chemical names with different capitalisation. However, the validator is not case sensitive. I think the validator should be less picky and should not complain about duplicates that differ only in case.
Or we need general rules about capitalisation, which need to be implemented in both the validator and RMassBank.
Yours,
Tobias
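To see which names in a record differ only in capitalisation, something like the following Python sketch could be used on the RMassBank side or in a pre-check. It is an illustration only and does not reflect how the validator compares names internally; the function name is hypothetical.

```python
def case_insensitive_duplicates(names):
    """Return (first_seen, duplicate) pairs of names that differ only in case."""
    seen = {}
    duplicates = []
    for name in names:
        key = name.casefold()
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates
```

For the list `["Acetamiprid", "acetamiprid", "Simazine"]` this returns `[("Acetamiprid", "acetamiprid")]`, which could then be collapsed to a single entry according to whatever capitalisation rule is agreed.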
To update the links on ChemSpider to the new MassBank URL, and to include new records as well, we need a mapping file InChI <--> URL.
We currently have 28 Clarithromycin records and three records from UFZ appear to be noise only and I recommend we remove them:
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF408502&dsn=UFZ
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF408504&dsn=UFZ
https://massbank.eu/MassBank/RecordDisplay.jsp?id=UF408503&dsn=UFZ
All only have peaks around 522 incredibly close to the noise level of the Orbi and fail quality control checks by Herbert Oberacher, also with other MassBank entries.
PK$PEAK: m/z int. rel.int.
522.4671 2511.9 254
522.9462 9873.2 999
522.9945 3734.4 377
@meier-rene @Treutler the EPA have set up a basic service that should allow retrieval of DTXSIDs by InChIKey. Can you look into implementing this on the database end to add DTXSIDs to all records with matching entries for now? I will post a separate issue to get this into RMassBank and linked up in MassBank-web.
It's already in our Record format as
CH$LINK: COMPTOX DTXSID50274017
(https://github.com/MassBank/MassBank-web/blob/master/Documentation/MassBankRecordFormat.md)
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.json?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.xml?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
Any feedback re service to @ChemConnector
Thanks!
External comment to massbankEU mail:
I think that the following two data are mistakenly replaced.
Could you confirm it? Thank you.
https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=KO002063&dsn=Keio
L-Aspartic acid; LC-ESI-QQ; MS2; CE:10 V; [M+H]+
https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=KO002066&dsn=Keio
L-Aspartic acid; LC-ESI-QQ; MS2; CE:40 V; [M+H]+
ES comment: I have asked for more detail; the CE10 spectrum looks more like CE40 and vice versa, but this is not entirely clear, esp with one high mass peak (noise?).
Some "compounds" we measure are not single compounds, but mixtures of isomers or similar compounds. However, often the mixture is reported, for example Nystatin, which contains Nystatin A1, A2 and A3.
An example is https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=EQ314001&dsn=Eawag
The name is Nystatin (the mixture), but the shown compound is Nystatin A1 (https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID80872323), related to https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID80872323#related-substances.
For the measurement, proxy compounds are also used (for example in the case of surfactant mixtures with homologues, or nonylphenol).
However, from a pure data science / machine viewpoint this relation is wrong without additional information that the given compound is a proxy. In PubChem, too, only the proxy is given; DTX is better, of course.
Therefore, we should implement a structure to handle this situation:
@hunter-moseley suggested implementing a JSON representation of the records to enhance machine readability and to account for some future transitions.