The trace-peak-picking-adjusted from yinyinghao

Trace: Machine Learning of Signal Images for Trace-Sensitive Mass Spectrometry

Introduction

This repository contains the Python code for the manuscript

Liu, Z., et al. Trace: Machine Learning of Signal Images for Trace-Sensitive Mass Spectrometry – A Case Study from Single-Cell Metabolomics. Anal. Chem. 2019 91 (9), 5768-5776

Recent developments in high-resolution mass spectrometry (MS) technology enabled ultrasensitive detection of biomolecules even at the level of single cells. However, extraction of trace-abundance signals from complex MS datasets requires improved data and signal processing algorithms. To bridge this gap, we here developed "Trace", a software package that incorporates machine learning (ML) to automate feature selection and optimization for the extraction of trace-level signals from MS data. The basic workflow is shown below:

Setup

Environment

Trace is implemented in Python with TensorFlow. For large MS data, we recommend 32+ GB RAM for fast data processing and sufficient data storage capacity. GPU is recommended (but not required) for speeding up the initial training of the model.

To run Trace, following software/libraries should be installed:

Anaconda
TensorFlow 1.15
joblib==0.11
matplotlib
numpy
scikit-learn
requests

Other libraries may also be installed if not existent beforehand. For most of them, pip install would work.

Pre-trained models

To use Trace for signal detection on MS data, a pre-trained model is needed. You can download the pre_trained models here:

pre-trained_models [90M]

and put it under this directory.

While the pre-trained model is provided under default settings for our CE-ESI-MS data, users can also perform independent training for their customized datasets with the code provided. The details of training the model is discussed below.

Input data format

Trace calls on both the centroid and profile MS data to reduce data processing time. Export and convert the MS1 spectra from each raw (primary) MS data file into the open-access mzML file format in both centroid and profile mode.

High-resolution MS data from our study is available at the NIH Common Fund's Metabolomics Data Repository and Coordinating Center website, the Metabolomics Workbench with Project ID PR000686. The data can be accessed directly via Project DOI:10.21228/M80Q2W

Usage

Signal Detection from MS Data

To process your MS data for signal detection (with all parameters and inputs by default), make sure you already have the pre_trained models (either by downloading the models provided above or training you own model). Then simply run the main code by:

python TRACE.py

To change parameters and input MS files, edit the TRACE.py file as needed according to the code comments or our User Manual.

After running the code, a folder called "Results" will be generated (if not existent) and the result files will be saved under that folder. Three files will be generated:

Output File	Description
`Initial_pks.txt`	Initial scanning signal list. Contains (m/z, retention time, intensity, peak area, SNR) for each signal.
`Images_pks.txt`	Images of the potential signals by initial scaning (60x12=720 pixels for each image per row by default).
`Final_pks.txt`	Final signal list. Contains (m/z, retention time, intensity, peak area, SNR) for each signal per row.

Train Your Own Model

While the pre-trained model is provided by default with our CE-ESI-MS data, users are advised to perform independent training for customized datasets, particularly if different types of experimental conditions or technologies were used to acquire the data. For this purpose, besides the python code provided (Training_Model.py), users need to prepare their own training data: both positive (true) and negative (false) signal sample images (imgs-train.txt) and their labels (label-train.txt). The image file should be in such a format that each line stands for a flatted signal image (rows connect to a single row in order). The label file should be in one column indicating whether (1) or not (0) the signal image stands for a true signal in the image file of corresponding row. For example, if N (>1000 recommended for better model performance) samples are collected and labeled, then the data size should be (N x 720) for image file and (N x 1 )for label file. To run the program, execute the Training_Model.py code by

python Training_Model.py

For more details about Trace, please go to the paper.

Contact us

If you have any questions or comments on Trace, please contact us:

[email protected]; [email protected]; [email protected].

yinyinghao / trace-peak-picking-adjusted Goto Github PK

trace-peak-picking-adjusted's Introduction

Trace: Machine Learning of Signal Images for Trace-Sensitive Mass Spectrometry

Introduction

Setup

Environment

Pre-trained models

Input data format

Usage

Signal Detection from MS Data

Train Your Own Model

Contact us

trace-peak-picking-adjusted's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent