Trace: Machine Learning of Signal Images for Trace-Sensitive Mass Spectrometry

Introduction

This repository contains the Python code for the manuscript

Liu, Z., et al. Trace: Machine Learning of Signal Images for Trace-Sensitive Mass Spectrometry – A Case Study from Single-Cell Metabolomics. Anal. Chem. 2019, 91 (9), 5768-5776.

Recent developments in high-resolution mass spectrometry (MS) technology have enabled ultrasensitive detection of biomolecules, even at the level of single cells. However, extraction of trace-abundance signals from complex MS datasets requires improved data and signal processing algorithms. To bridge this gap, we developed "Trace", a software package that incorporates machine learning (ML) to automate feature selection and optimization for the extraction of trace-level signals from MS data. The basic workflow is shown below:

Setup

Environment

Trace is implemented in Python with TensorFlow. For large MS data, we recommend 32+ GB of RAM for fast data processing, along with sufficient data storage capacity. A GPU is recommended (but not required) to speed up the initial training of the model.

To run Trace, the following software/libraries should be installed:

Other libraries may also need to be installed if they are not already present. Most of them can be installed with pip install.

Pre-trained models

To use Trace for signal detection on MS data, a pre-trained model is needed. You can download the pre_trained models here:

and put them under this directory.

While the pre-trained model is provided under default settings for our CE-ESI-MS data, users can also perform independent training for their customized datasets with the code provided. The details of training the model are discussed below.

Input data format

Trace calls on both the centroid and profile MS data to reduce data processing time. Export and convert the MS1 spectra from each raw (primary) MS data file into the open-access mzML file format in both centroid and profile mode.
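Because Trace needs both a centroid and a profile export of each file, it can be useful to confirm which mode a given mzML file was written in before processing. The following is a minimal sketch, not part of Trace itself: the function name spectrum_mode is hypothetical, and the check relies only on the standard PSI-MS controlled vocabulary terms MS:1000127 (centroid spectrum) and MS:1000128 (profile spectrum) that mzML files use to declare the spectrum representation.

```python
import xml.etree.ElementTree as ET

CENTROID = "MS:1000127"  # PSI-MS term "centroid spectrum"
PROFILE = "MS:1000128"   # PSI-MS term "profile spectrum"


def spectrum_mode(mzml_path):
    """Return 'centroid', 'profile', or None for an mzML file.

    Hypothetical helper: reports the first spectrum-representation term
    found while streaming through the file, so it does not load the whole
    file into memory. A file that mixes both modes is not detected.
    """
    with open(mzml_path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("start",)):
            # Drop the XML namespace so the check does not depend on it.
            if elem.tag.rsplit("}", 1)[-1] != "cvParam":
                continue
            accession = elem.get("accession")
            if accession == CENTROID:
                return "centroid"
            if accession == PROFILE:
                return "profile"
    return None
```

Calling spectrum_mode on each of the two exports of a raw file should give one "centroid" and one "profile" result; any other combination suggests that the conversion settings need another look.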

High-resolution MS data from our study is available at the NIH Common Fund's Metabolomics Data Repository and Coordinating Center website, the Metabolomics Workbench with Project ID PR000686. The data can be accessed directly via Project DOI:10.21228/M80Q2W

Usage

Signal Detection from MS Data

To process your MS data for signal detection (with all parameters and inputs set to their defaults), make sure you already have the pre_trained models (either by downloading the models provided above or by training your own model). Then simply run the main code:

python TRACE.py

To change parameters and input MS files, edit the TRACE.py file as needed according to the code comments or our User Manual.

After running the code, a folder called "Results" will be generated (if it does not already exist) and the result files will be saved under that folder. Three files will be generated:

Output File       Description
Initial_pks.txt   Initial scanning signal list. Contains (m/z, retention time, intensity, peak area, SNR) for each signal.
Images_pks.txt    Images of the potential signals found by the initial scanning (60x12=720 pixels for each image, one image per row, by default).
Final_pks.txt     Final signal list. Contains (m/z, retention time, intensity, peak area, SNR) for each signal per row.
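The two signal lists share the same five fields, so one small reader can handle both. The sketch below is an illustration under assumptions rather than code from Trace: it assumes one signal per row with the five values separated by whitespace or commas, and the names Signal and parse_signals are hypothetical. Check the delimiter in your own output files before relying on it.

```python
import re
from typing import Iterable, List, NamedTuple


class Signal(NamedTuple):
    mz: float
    retention_time: float
    intensity: float
    peak_area: float
    snr: float


def parse_signals(lines: Iterable[str]) -> List[Signal]:
    """Parse rows of (m/z, retention time, intensity, peak area, SNR).

    Assumption: fields are separated by whitespace and/or commas.
    Rows that do not hold exactly five numbers (blank lines, headers)
    are skipped rather than treated as errors.
    """
    signals = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 5:
            continue
        try:
            signals.append(Signal(*(float(f) for f in fields)))
        except ValueError:
            continue  # non-numeric row, e.g. a header line
    return signals
```

A typical use would be to open Results/Final_pks.txt, pass the file object to parse_signals, and then filter the returned list, for example by keeping only signals whose snr exceeds a threshold of your choice.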

Train Your Own Model

While the pre-trained model is provided by default with our CE-ESI-MS data, users are advised to perform independent training for customized datasets, particularly if different experimental conditions or technologies were used to acquire the data. For this purpose, besides the Python code provided (Training_Model.py), users need to prepare their own training data: both positive (true) and negative (false) signal sample images (imgs-train.txt) and their labels (label-train.txt).

In the image file, each line should hold one flattened signal image (the rows of the image joined, in order, into a single row). The label file should contain a single column in which each row indicates whether the signal image in the corresponding row of the image file is a true signal (1) or not (0). For example, if N samples are collected and labeled (N > 1000 is recommended for better model performance), then the data size should be (N x 720) for the image file and (N x 1) for the label file.

To run the program, execute the Training_Model.py code by

python Training_Model.py
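Before training, it may help to check that the image and label files agree with the layout described above. The following sketch is an assumption-laden illustration, not part of Training_Model.py: it assumes whitespace-separated pixel values, and the function names flatten_image and check_training_data are hypothetical.

```python
def flatten_image(image):
    """Join the rows of a 2-D image, in order, into one flat list."""
    return [pixel for row in image for pixel in row]


def check_training_data(image_lines, label_lines, n_pixels=720):
    """Validate that images are (N x n_pixels) and labels are (N x 1).

    Assumption: pixel values on each image line are whitespace-separated.
    Returns N, the number of samples, or raises ValueError on a mismatch.
    """
    images = [line.split() for line in image_lines if line.strip()]
    labels = [line.strip() for line in label_lines if line.strip()]
    if len(images) != len(labels):
        raise ValueError(
            f"{len(images)} images but {len(labels)} labels")
    for row, (pixels, label) in enumerate(zip(images, labels), start=1):
        if len(pixels) != n_pixels:
            raise ValueError(
                f"row {row}: expected {n_pixels} pixels, got {len(pixels)}")
        if float(label) not in (0.0, 1.0):
            raise ValueError(f"row {row}: label must be 0 or 1, got {label}")
    return len(images)
```

For the default image size, flattening a 60x12 image with flatten_image gives the 720 values expected on one line of imgs-train.txt. Which of the two dimensions is the row count is not stated here, so treat the orientation as something to confirm against the code comments or the User Manual.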

For more details about Trace, please go to the paper.

Contact us

If you have any questions or comments on Trace, please contact us:

[email protected]; [email protected]; [email protected].
