dimspec's Introduction

Database Infrastructure for Mass Spectrometry (DIMSpec)

About

 Welcome to the Database Infrastructure for Mass Spectrometry project. This project is the result of work from the National Institute of Standards and Technology's Material Measurement Laboratory, Chemical Sciences Division. We seek to provide a comprehensive portable database toolkit supporting non-targeted analysis of high resolution mass spectrometry experiments for exposure-based analyte targets (e.g. per- and polyfluorinated alkyl substances (PFAS)) including descriptive metadata for analytical instrument method, quality analysis, and samples. If you would like to get involved, or just to keep track of the project, please give this repository a watch or star, or send an email to [email protected] to receive updates.

Latest News

2024 February (@jmr-nist-gov)

 A video tutorial series is now available for DIMSpec, discussing download and setup, file conversion to .mzML, and using the MSMatch application.

2024 February (@jmr-nist-gov)

 Minor changes were made to the quick install guide to clarify its language, especially with regard to which items are required, recommended, or merely suggested, and under which circumstances each applies.

 A bug was fixed in the molecule_picture function where invalid filenames were produced from InChI (and other) strings. Invalid filename characters are now replaced with descriptive substitutes; as a result, filenames no longer match molecular notation 1:1 in many cases, though most SMILES strings should remain intact. In addition, the show argument should be more intuitive and now displays the resulting picture in the system viewer.

 These changes will be included in the next release, but can be downloaded directly from the current repository.
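
 The filename fix described above can be illustrated with a minimal sketch. This is not the code used by molecule_picture: the helper name `sanitize_notation` and the substitution tokens are assumptions chosen for illustration, and the tokens DIMSpec actually uses may differ.

```r
# Hypothetical helper (not part of DIMSpec): replace characters that are
# invalid in filenames on common operating systems with readable tokens.
sanitize_notation <- function(notation) {
  # Illustrative substitution table; these tokens are assumptions.
  substitutions <- c(
    "/"  = "_fslash_",
    "\\" = "_bslash_",
    ":"  = "_colon_",
    "*"  = "_star_",
    "?"  = "_qmark_",
    "\"" = "_quote_",
    "<"  = "_lt_",
    ">"  = "_gt_",
    "|"  = "_pipe_"
  )
  for (ch in names(substitutions)) {
    # fixed = TRUE matches each character literally rather than as a regex.
    notation <- gsub(ch, substitutions[[ch]], notation, fixed = TRUE)
  }
  notation
}

# Trifluoroacetic acid in two notations.
inchi  <- "InChI=1S/C2HF3O2/c3-2(4,5)1(6)7/h(H,6,7)"
smiles <- "OC(=O)C(F)(F)F"

print(sanitize_notation(inchi))   # slashes are replaced, so the name differs from the InChI
print(sanitize_notation(smiles))  # no invalid characters, so the SMILES string is unchanged
```

 SMILES strings that encode stereochemistry with "/" or "\" would still be altered by a scheme like this, which is consistent with the note that most, but not all, SMILES strings remain intact.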


2024 January (@jmr-nist-gov)

 The DIMSpec project was featured as part of the SERDP Webinar Series on December 7, 2023. A recording of that webinar, the first half of which is dedicated to DIMSpec, is now available.


Older news items

2023 December (@jmr-nist-gov)

 This update provides quality-of-life improvements and minor bug fixes in MSMatch, and addresses certain functionality issues related to package versioning when installed on R v4.3 as of Nov 2023. If you are running R v4.1 with certain package combinations, you may run into an issue with logging and receive a console message regarding `log_formatter`. If so, either update your packages or turn off logging by setting `LOGGING_ON <- FALSE` in the `config/env_log.txt` file. Furthermore, this update (a) fixes certain instances of alert messages failing to render, (b) fixes a rare issue with uncertainty calculation inheriting NaN values, (c) adds support for advanced settings on the match uncertainty evaluation tool, and (d) fixes the location of alert messages, which could occasionally run past the bottom of the browser.

2023 July (@jmr-nist-gov)

 DIMSpec has been updated to its first release candidate version. Changes include schema tightening for annotated fragments, PFAS data updates (including consistency updates to analyte nomenclature and aliases), and other minor bug fixes.

Motivation

 In analytical chemistry, the objective of non-targeted analysis (NTA) is to detect and identify unknown (generally organic) compounds using a combination of advanced analytical instrumentation (e.g. high-resolution mass spectrometry) and computational tools. For NTA using mass spectrometry, reference libraries containing fragmentation mass spectra of known compounds are essential to successfully identifying unknown compounds in complex mixtures. However, due to the diversity of vendors of mass spectrometers and mass spectrometry software, it is difficult to share mass spectral data sets between laboratories using different instrument vendor software packages while maintaining the quality and detail of the complex data and metadata that make the mass spectra commutable and useful. This diversity can also alter fragmentation patterns, as instrument engineering and method settings can differ between analyses.

 This report describes a set of tools developed in the NIST Chemical Sciences Division to provide a database infrastructure for the management and use of NTA data and associated metadata. In addition, as part of a NIST-wide effort to make data more Findable, Accessible, Interoperable, and Reusable (FAIR), the database and affiliated tools were designed using only open-source resources that can be easily shared and reused by researchers within and outside of NIST. The information provided in this report includes guidance for the setup, population, and use of the database and its affiliated analysis tools. This effort has been primarily supported by the Department of Defense Strategic Environmental Research and Development Program (DOD-SERDP), project number ER20-1056. As that project focuses on per- and polyfluoroalkyl substances (PFAS), DIMSpec is distributed with mass spectra including compounds on the NIST Suspect List of Possible PFAS as collected using the Non-Targeted Analysis Method Reporting Tool.

Features

  • Portable and reusable database infrastructure for linking sample and method details to high resolution mass spectrometry data.
  • Easily extendable schema for new data extensions or views.
  • Open source from inception to delivery using only R, Python, and SQLite.
  • Application programming interface (API) support using the plumber framework.
  • Web applications for exploration and data processing, including a template web application to quickly build new GUI functionality using the shiny framework.
  • Development support through flexible logging and function argument validation frameworks.
  • Includes curated high resolution mass spectra for 132 per- and polyfluorinated alkyl substances from over 100 samples using ESI-, ESI+, and APCI- detection methods (as of 2023-03-16). The DIMSpec for PFAS database is provided here as an example, and is published on the NIST Public Data Repository at https://doi.org/10.18434/mds2-2905. If you use the DIMSpec for PFAS database, please cite both this repository and that file.

Getting Started

The only hard requirement for using DIMSpec is R version 4.1 or later. Packages are installed by the compliance script, though users on Windows systems should also install RTools. To get the most out of DIMSpec, users may want to install other software such as (but in no way limited to):

  • Java (with bit architecture matching that of R)
  • MSConvert >= 3.0.21050 (from ProteoWizard)
  • SQLite >= v3.32.0
  • Mini/Anaconda w/ Python >= 3.8 (if not already installed, R will install it as part of the compliance script, though advanced users may want to explicitly install this themselves)

Note: As of the December 2023 release, use of R v4.3 is encouraged as support for older versions of R will sunset in 2024.
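
As a rough pre-flight check for the requirements listed above, the following sketch reports the running R version and whether the optional tools can be found on the system PATH. The function name `check_prerequisites` is hypothetical, and the executable names (`java`, `msconvert`, `sqlite3`, `conda`) are assumptions that may differ on your system; a tool that is installed but not on the PATH will be reported as missing. The sketch does not check the minimum versions of the optional tools.

```r
# Hypothetical helper (not part of DIMSpec): report the R version and
# whether optional command-line tools are discoverable on the PATH.
check_prerequisites <- function(min_r = "4.1.0",
                                tools = c("java", "msconvert", "sqlite3", "conda")) {
  # Sys.which() returns an empty string for executables it cannot find.
  found <- nzchar(Sys.which(tools))
  list(
    r_version = as.character(getRversion()),
    r_ok      = getRversion() >= min_r,
    tools     = setNames(as.vector(found), tools)
  )
}

str(check_prerequisites())
```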

To get started in most cases from a blank slate:

  1. Ensure R v4.1+ is installed (download)
  2. Download the project by forking this repository or downloading the zip file.
    • If using Windows, ensure RTools (download) matching your R version is installed to build certain packages.
  3. Run the compliance script, which should install everything needed for the project.
    • The easiest way is to load the project using RStudio (download).
      • Open RStudio and click "File" > "Open Project..." and navigate to the location where you downloaded the project.
      • Either open the file at "R/compliance.R" from the "Files" pane and click the "Source" button, or enter the command source(file.path("R", "compliance.R")) in the console pane.
    • If not using RStudio, open an R terminal at the project directory (or call setwd(file.path("path", "to", "project"))) and enter the command source(file.path("R", "compliance.R")).
    • The first installation typically takes around half an hour from start to finish, depending on the speed of your internet connection and computer.
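
For those not using RStudio, step 3 can be wrapped in a small guard so that a wrong working directory or an outdated R version is caught before anything is installed. This is only a sketch: the wrapper `run_compliance` is hypothetical and not part of DIMSpec, and it assumes the working directory is the project root containing "R/compliance.R".

```r
# Hypothetical wrapper (not part of DIMSpec): verify the R version and the
# script location before sourcing the compliance script.
run_compliance <- function() {
  if (getRversion() < "4.1.0") {
    stop("DIMSpec requires R 4.1 or later; found ", as.character(getRversion()))
  }
  script <- file.path("R", "compliance.R")
  if (!file.exists(script)) {
    message("Could not find ", script, " under ", getwd(),
            "; set the working directory to the project root first.")
    return(invisible(FALSE))
  }
  # Sourcing installs the required packages and sets up the session.
  source(script)
  invisible(TRUE)
}

# Usage, after setwd() to the project directory:
# run_compliance()
```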

A quick guide is available describing the install process.

For evaluation and distribution purposes, DIMSpec is distributed with a populated database of per- and polyfluorinated alkyl substances (PFAS), but supporting functionality is present to easily create new databases. This enables DIMSpec to support multiple efforts simultaneously as research needs require.

Guides and Documentation

For a full description of the project and its different aspects, please see the DIMSpec User Guide.

A series of Quick Guides is available, each focusing on a particular aspect of the project.

In addition, a series of short video tutorials is available covering the following topics:

  • Download and installation
  • mzML conversion of instrument data files
  • Import files and process on MSMatch
  • Library searching and data mining
  • Fragmentation searching and data mining

Links

Several links can provide additional contextual information about this project. If any of the resource links below are broken, please report them so we may address them. The user guide is also available in running DIMSpec sessions through the user_guide() function, which loads a local version of the user guide if the web version is unavailable or your computer is offline.

Contacting Us

If you have any issues with any portion of the repository, please feel free to contact the NIST PFAS program at [email protected] directly or post an issue in the repository itself.

The main contributors to this project from NIST were members of the Material Measurement Laboratory's Chemical Sciences Division:

  1. Jared M. Ragland (@jmr-nist-gov), Chemical Informatics Group
  2. Benjamin J. Place (@benjaminplace), Organic Chemical Metrology Group

Contributing

NIST projects are provided as a public service, and we always appreciate feedback and contributions. If you have a contribution, feel free to fork this project, open a PR, or start a discussion. The authors hope this effort spurs further innovations in the NTA open data space for environmental mass spectrometry.

Disclaimer

Certain commercial equipment, instruments, software, or materials are identified in this documentation in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

This work is provided by NIST as a public service and is expressly provided "AS IS." Please see the license statement for details.

Funding Source

The work included in this repository has been funded in large part by the Department of Defense's Strategic Environmental Research and Development Program (SERDP), project number ER20-1056.

dimspec's Issues

"ModuleNotFoundError: No module named 'rdkit'" when running "_Plumber set up" in compliance.R

I am using RStudio 2023.12.1 Build 402 with R version 4.3.3 on an Ubuntu 22.04 laptop. I also installed conda 24.1.2 with auto_activate_base set to false.

While performing the quick install with compliance.R, which I was running line-by-line in RStudio to follow its progress, I ran into the following error while running the code for "_Plumber set up": ModuleNotFoundError: No module named 'rdkit'. "inst/plumber/env_plumb.R" contains a section called "Enable RDKit", so I ran the code for "_RDKit set up" before "_Plumber set up" (reversing their order in compliance.R), and the installation was successful. This suggests that their order should be switched in compliance.R.

Caveat: I do have to activate conda in the terminal before the code in "_RDKit set up" will run using conda environments, and did not try this until after the ModuleNotFoundError, so it is possible that activating conda in advance would also have prevented the error.

Blank subtraction

Good day
Was blank subtraction performed in this code in any way?
