metams's Introduction

Our project

Workflow4Metabolomics, W4M for short, is a French infrastructure offering software tools for processing, analyzing and annotating metabolomics data. It is based on the Galaxy platform.

In the context of a collaboration between metabolomics (MetaboHUB French infrastructure) and bioinformatics platforms (IFB: Institut Français de Bioinformatique), we have developed full LC/MS, GC/MS and NMR pipelines using the Galaxy framework for data analysis, including preprocessing, normalization, quality control, statistical analysis and annotation steps. These modular and extensible workflows are composed of existing components (XCMS and CAMERA packages, etc.) and a whole suite of complementary homemade tools. This implementation is accessible through a web interface, which guarantees that all parameters are recorded. The advanced features of Galaxy have made it possible to integrate components from different sources and of different types. Thus, an extensible Virtual Research Environment (VRE) is offered to the metabolomics community (platforms, end users, etc.), and enables the sharing of preconfigured workflows with new users as well as experts in the field.

Citation

Giacomoni F., Le Corguillé G., Monsoor M., Landi M., Pericard P., Pétéra M., Duperier C., Tremblay-Franco M., Martin J.-F., Jacob D., Goulitquer S., Thévenot E.A. and Caron C. (2014). Workflow4Metabolomics: A collaborative research infrastructure for computational metabolomics. Bioinformatics, http://dx.doi.org/10.1093/bioinformatics/btu813

Galaxy

Galaxy is an open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.

Homepage: https://galaxyproject.org/

How to contribute

Get our tools

All our tools are publicly available on GitHub and freely installable through the Galaxy ToolShed.

However, we would be glad to receive [good] feedback on their usage in order to motivate us (and our funders).

It will also be great if you can cite our papers:

Franck Giacomoni, Gildas Le Corguillé, Misharl Monsoor, Marion Landi, Pierre Pericard, Mélanie Pétéra, Christophe Duperier, Marie Tremblay-Franco, Jean-François Martin, Daniel Jacob, Sophie Goulitquer, Etienne A. Thévenot and Christophe Caron (2014). Workflow4Metabolomics: A collaborative research infrastructure for computational metabolomics. Bioinformatics

doi:10.1093/bioinformatics/btu813

Push your tools / W4M as a Showcase

Your tools can be installed, integrated and hosted within the main W4M instance Tools.

Quality standards

However, the tools must stick to the IUC standards in order to be easily integrated:

At first, your tools will be displayed in the Contribution section of the tool panel. Eventually, they should be promoted among the other tools.

Advanced mode

In order to be fully integrated in our reference workflows, your tools must follow our exchange formats between tools (for more information, contact us).

A collaboration should be established if help is needed!

Support / HelpDesk

In all cases, the tools must be maintained by the developers themselves. A tool can be removed if this support is not provided.

Guidelines

metams's People

Contributors

gallardoalba, jsaintvanne, lecorguille, pkrog, yguitton

metams's Issues

MSP file not recognized in User_defined mode

MSP files uploaded from the user's computer through Galaxy are considered txt files, and that format is not declared in the user_defined option. Please change this in line with what was done in GC_default.

Specific issue when compare to DB and stay only 1 unkn

When you run metaMS_runGC against a DB while also searching for unknowns, and only 1 unknown is found, you get an error:

Error in x[, settings$timeComparison] : incorrect number of dimensions
Calls: runGC ... match.unannot.patterns -> sapply -> sapply -> lapply -> FUN -> mean

This time it is not due to a variable that is a matrix when it should be a list; it is apparently due to a list with only 1 element in it... So we get something like:

$`./0205065.CDF`
        mz      maxo      rt
CP0113  84 0.2200641 26.6531
CP0742 267 0.2948930 26.6531
CP0760 282 0.2927542 26.6656
CP0761 285 0.4139154 26.6656
CP0776 300 0.7185312 26.6656
CP0803 342 0.3021456 26.6656

$`./0205065_1.CDF`
        mz      maxo      rt
CP0920  84 0.2200641 26.6531
CP1549 267 0.2948930 26.6531
CP1567 282 0.2927542 26.6656
CP1568 285 0.4139154 26.6656
CP1583 300 0.7185312 26.6656
CP1610 342 0.3021456 26.6656

whereas we should have something like:

$`./0205004.CDF`
$`./0205004.CDF`[[1]]
        mz       maxo       rt
CP0004  51 0.01793574 24.51583
CP0012  52 0.01412946 24.51583
CP0016  53 0.04252074 24.51583
CP0024  54 0.01917578 24.52208
CP0031  55 0.07008467 24.50958
CP0035  56 0.01895593 24.50958
CP0044  57 0.02532629 24.50958

$`./0205004.CDF`[[2]]
        mz      maxo       rt
CP0365 121 0.9064138 24.74080
CP0933 236 0.1357717 24.74080
CP1076 286 0.2732609 24.74080
CP1233 359 0.1131362 24.74080
CP1350 415 0.1342479 24.74703
CP1461 470 0.1736152 24.74080
CP1479 478 0.1189479 24.74703
CP1504 496 0.1009723 24.74703

$`./0205004.CDF`[[3]]
        mz      maxo       rt
CP0300 109 0.4692890 24.49708
CP0427 133 0.6552011 24.49083
CP0702 183 0.4586975 24.49708
CP0714 184 0.2844711 24.49708
CP0766 194 0.1315781 24.49708
CP0979 249 0.1245335 24.49708
CP1179 329 0.1040561 24.49083
CP1285 385 0.1245118 24.48458

With the name of file first, then the different pseudospectra.
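
The two structures above differ only in whether each sample holds a list of pseudospectra or a bare matrix. A minimal sketch of a guard follows. It is not the metaMS fix: `as_pseudospectra_list` is a hypothetical helper, and it assumes that a bare matrix or data frame always stands for a single pseudospectrum.

```r
# Hypothetical helper (not part of metaMS): make sure each sample holds a
# *list* of pseudospectra, even when only one was found and it arrived as a
# bare matrix.
as_pseudospectra_list <- function(x) {
  if (is.matrix(x) || is.data.frame(x)) list(x) else x
}

# One sample whose single pseudospectrum is a bare matrix (the failing shape).
single <- matrix(c(84, 267, 0.22, 0.29, 26.65, 26.65), ncol = 3,
                 dimnames = list(c("CP0113", "CP0742"), c("mz", "maxo", "rt")))
samples <- list("./0205065.CDF" = single)

fixed <- lapply(samples, as_pseudospectra_list)

# Column indexing now works on every pseudospectrum of every sample.
mean_rt <- sapply(fixed, function(ps) sapply(ps, function(m) mean(m[, "rt"])))
```

Without the wrapper, iterating over the bare matrix visits its individual cells, and column indexing on a single number fails with the "incorrect number of dimensions" error.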

Problem when files haven't any result

When the files have no results in common, we get an error when we try to run the sweep function:

Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) : 
  non-numeric argument to binary operator
Calls: runGC -> sweep

Reading tabular files

The file input for csv, txt, msp and other files needed by metaMS is not efficient and causes serious issues.
TODO: add tests for each read.table call to check for the right format (sep = "", ";", "\t", ...).
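
One way such a format check could look is sketched below. `read_table_checked` is a hypothetical helper, and the separator list and the "at least two columns" rule are assumptions, not what the wrapper currently does.

```r
# Hypothetical helper: try several separators and keep the first one that
# yields at least `min_cols` columns. sep = "" means "any whitespace".
read_table_checked <- function(path, min_cols = 2,
                               seps = c("\t", ";", ",", "")) {
  for (sep in seps) {
    tab <- tryCatch(
      read.table(path, header = TRUE, sep = sep, stringsAsFactors = FALSE,
                 check.names = FALSE, quote = "\"", comment.char = ""),
      error = function(e) NULL
    )
    if (!is.null(tab) && ncol(tab) >= min_cols) return(tab)
  }
  stop("Could not read '", path, "' with any of the expected separators")
}

# Usage on a small semicolon-separated file written to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("rt;RI", "5.2;1000", "7.9;1200"), tmp)
ri <- read_table_checked(tmp)
```

Failing early with an explicit message is the point of the sketch: a file read with the wrong separator otherwise ends up as a single-column table and only breaks much later in the run.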

Error in EICs names when MSP database used

When the user specifies an MSP library as input, the names of the target compounds found are not written in the EICs; only unknown X is given, and it should be compound name Y for file X.

checklinks to How To Pdf

For now the links to the How To PDFs point to a Pydio instance based in Roscoff. What is the new path in Toulouse?

Empty dataMatrix when file names contains - or too many .

With file names like XXX R&T - 1.cdf, I had an issue where dataMatrix.tsv contained only variable names and no area information. That information stayed in variableMetadata.tsv due to some kind of issue during file formatting (copy/paste of columns...).

varName after normalisation

After normalisation, Unknown X becomes Unknown.X, and this is a problem.
Can we avoid the name check or, for such a small difference, allow the tool to run?
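
A rename of this kind is what base R does when `check.names = TRUE` (the default of `read.table`) passes column names through `make.names()`. Whether that is the actual cause in the normalisation tool is an assumption; the sketch only shows the mechanism and the switch that disables it.

```r
# make.names() is what check.names = TRUE applies to column names.
renamed <- make.names("Unknown 1")

tmp <- tempfile(fileext = ".tsv")
writeLines(c("sample\tUnknown 1\tUnknown 2", "s1\t10\t20"), tmp)

# Default behaviour: spaces are replaced by dots.
checked <- read.table(tmp, header = TRUE, sep = "\t")

# check.names = FALSE keeps the names exactly as written in the file.
unchecked <- read.table(tmp, header = TRUE, sep = "\t", check.names = FALSE)

names(checked)
names(unchecked)
```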

[TODO] error when compare house DB with RI filter active

@yguitton @jsaintvanne

An error occurs when we try to compare to the in-house DB with the RI filter active:

error when using Use RI as filter option
Warning message:
replacing previous import 'xcms::plot' by 'graphics::plot' when loading 'CAMERA'
Note: you might want to set/adjust the 'sampclass' of the returned xcmSet object before proceeding with the analysis.
< -------- Experiment of 42 samples ------------------------------------ >
< -------- Instrument: GALAXY.GC --------------------------------------- >
< -------- Annotation using database of 1 spectra ---------------------- >
< -------- Using xcmsSet object - only doing annotation ---------------- >
< -------- Removing artefacts () --------------------------------------- >
< -------- Matching with database of standards ------------------------- >
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'which': non-numeric argument to binary operator
Calls: runGC ... lapply -> FUN -> which -> outer -> .handleSimpleError -> h
Execution halted

Have the output table as html

Currently the output table of unknowns is a fixed table, ranked by metaMS in no real order.
An interactive table in HTML format (like the GOLM tool in W4M Galaxy), where users can sort by RT, MZ or something else, could make it easier to find their compounds.

Combined with this one, this issue would let users find the IDs of their compounds and then run the EIC creation with them.

@yguitton @lecorguille

Problem when DB and files don't really match

We get an error when the DB and the files don't really match... It is probably related to something resolved in this PR: yguitton/metaMS#14

But something sometimes still goes wrong afterwards... The columns do not all have the same number of rows when we constructExpPseudoSpectra.

To be explored!

Error with CDF and DB

When running runGC with CDF files and comparing with a DB, we get this error:

Error in x[, settings$timeComparison] : incorrect number of dimensions
Calls: runGC ... match.unannot.patterns -> sapply -> sapply -> lapply -> FUN -> mean

https://github.com/rwehrens/metaMS/blob/master/R/matchSamples2Samples.R#L173
It looks like this is due to a list of 1 element that is not a list but a matrix (probably because there is only one element in it)...
So I should modify how the list is built here: https://github.com/rwehrens/metaMS/blob/master/R/matchSamples2Samples.R#L31

RI option not working nicely

When using the RI option with a tab- or comma-separated file, runGC crashes, saying that it needs a matrix, not a data.frame.
Idea: modify the RIarg<-read.table line to create a matrix with rt as the first column and RI as the second.
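
The idea above could be sketched as follows. The file layout (a header line, rt in the first column, RI in the second) and the column names are assumptions; only `read.table` and `as.matrix` are standard base R calls.

```r
# A small tab-separated RI table written to a temporary file for illustration.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("rt\tRI", "5.2\t1000", "7.9\t1200"), tmp)

ri_df <- read.table(tmp, header = TRUE, sep = "\t")

# Keep the first two columns and convert the data.frame to a numeric matrix.
RIarg <- as.matrix(ri_df[, 1:2])
colnames(RIarg) <- c("rt", "RI")
storage.mode(RIarg) <- "double"
```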

Add check MSP wrapper

Some msp files are not fully compatible with metaMS.
For example, msp files from AMDIS use ( mz int) instead of mz int; as the mass spectrum descriptor.

The idea for the wrapper is to first load the msp file with a new read.msp that can deal with more msp formats,
then use write.msp to create a converted msp file.
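
The peak-line part of such a conversion could be sketched with a regular expression. `convert_peak_line` is a hypothetical helper, and the sample line is an assumed illustration of the parenthesised style, not a line taken from a real AMDIS export.

```r
# Hypothetical converter for peak lines:
# "(mz int) (mz int)" becomes "mz int; mz int;"
convert_peak_line <- function(line) {
  gsub("\\([[:space:]]*([0-9.]+)[[:space:]]+([0-9.]+)[[:space:]]*\\)",
       "\\1 \\2;", line)
}

amdis_line <- "( 41 12) ( 43 100) ( 57 999)"
converted <- convert_peak_line(amdis_line)

# Lines without parenthesised pairs (e.g. header fields) pass through unchanged.
header <- convert_peak_line("Name: some compound")
```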

[TODO] Add tests for EICs

TODO: add tests on EICs (problem: gs is not found even though I added pip install ghostscript and bgs in .travis.yml)

xset.merge issues with metaMS:runGC

For the metaMS plotting functions we need access to the raw files, and when files come from the xcmsSet.merge process the filepaths do not include the Galaxy path.

from metaMS (or zip)
resGC$xset[[c]]@xcmsSet@filepaths
[1] "/work/project/w4m/galaxy4metabolomics/galaxy-dist/database/jobs_directory/000/107/107131/working/./FWS_100perNaCl/alg10.cdf"

from merge

xset@filepaths
[1] "./alg3.mzData" "./alg2.mzData" "./alg11.mzData" "./alg9.mzData"
[5] "./alg8.mzData" "./alg7.mzData"
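
One possible workaround is to resolve the relative entries against a known directory before plotting. This is only a sketch: `absolutize_paths` is a hypothetical helper, the base directory is a made-up example, and whether the raw files actually live there in a Galaxy job is an assumption.

```r
# Hypothetical helper: resolve relative "./file" entries against a base
# directory and leave absolute paths untouched.
absolutize_paths <- function(paths, base_dir) {
  is_abs <- grepl("^(/|[A-Za-z]:)", paths)
  rel <- sub("^\\./", "", paths)
  ifelse(is_abs, paths, file.path(base_dir, rel))
}

merged_paths <- c("./alg3.mzData", "./alg2.mzData", "/abs/alg10.cdf")
resolved <- absolutize_paths(merged_paths, "/data/job/working")
```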

please complete parameters

metaMS in Galaxy does not allow the user to define all the parameters, and too many stay at their defaults.
TODO: increase the number of available parameters.

Golm Metabolome Request Enhancement needed

Today the batch Golm database search does not take full advantage of the RI filter available on Golm: we currently have to trick the tool by setting a default RI of 1500 and adding a shift of +/-2500.
Can we add a function that reads, for each compound in the *.msp file, its own std.RI and uses it as the reference RI?

Use RI as filters

metaMS in R allows the use of RI values instead of rt for peak comparison/peak matching, but the option is not activated in the Galaxy metaMS tool.

Golm Metabolom as database

IDEA: can we create a tool that reads the Golm Metabolome database files and formats them to be usable in metaMS?

plotUnknown function add a trycatch and remove option

Sometimes the GCMS EIC.pdf generated by the plotUnknown function is empty because the function crashes. Can we add a boolean in the XML (plot EICs or not) and add a tryCatch(plotUnknown) in metams.r that generates an empty pdf in case of a crash?
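
The tryCatch part of this request could be sketched as below. `safe_plot_pdf` is a hypothetical wrapper, and it assumes the plotting function draws on the current graphics device; the real plotUnknown may manage its own PDF device, in which case the wrapper would need adapting.

```r
# Hypothetical wrapper: run a plotting function inside a PDF device and, if
# it fails, still leave a valid one-page blank PDF behind.
safe_plot_pdf <- function(plot_fun, pdf_path) {
  pdf(pdf_path)
  on.exit(dev.off(), add = TRUE)   # always close the device
  tryCatch({
    plot_fun()
    TRUE
  }, error = function(e) {
    message("Plotting failed, writing a blank page: ", conditionMessage(e))
    plot.new()                     # emit one blank page so the PDF is valid
    FALSE
  })
}

# Usage with a plotting function that crashes.
out <- tempfile(fileext = ".pdf")
ok <- safe_plot_pdf(function() stop("no peaks to plot"), out)
```

Returning TRUE or FALSE lets the calling script report that the EIC plot was skipped without stopping the whole job.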

Problem when first annotation has no match in DB

When the first file has no match in the DB, we get this table:

$`./0205065.CDF`
[1] pattern      alternatives
<0 rows> (or 0-length row.names)

But we would like something with an annotation column, like this one:

$`./0205003.CDF`
  pattern annotation alternatives
1       1         -1             
2       3         -2             
3       4         -3             
4       5         -4             
5       7         -5             

Because here we bind the first annotation and then new.annotation (which contains the new column):
https://github.com/rwehrens/metaMS/blob/master/R/matchSamples2Samples.R#L160

Just changing the first table in the binding might be enough to add this column... but does it work well with it?
Testing that...
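
A minimal sketch of that idea: give every table the same three columns before binding, including the zero-row case. `with_annotation` is a hypothetical helper and the example tables only mimic the shapes shown above. It is not the change made in metaMS.

```r
# Hypothetical helper: guarantee the pattern/annotation/alternatives columns,
# also for tables with zero rows.
with_annotation <- function(df) {
  n <- nrow(df)
  annotation <- if ("annotation" %in% names(df)) df$annotation else rep(NA_real_, n)
  data.frame(pattern = df$pattern, annotation = annotation,
             alternatives = df$alternatives, stringsAsFactors = FALSE)
}

# First file: no match in the DB, so zero rows and no annotation column.
first <- data.frame(pattern = integer(0), alternatives = character(0),
                    stringsAsFactors = FALSE)
# Later result that already carries the annotation column.
new_annotation <- data.frame(pattern = c(1, 3), annotation = c(-1, -2),
                             alternatives = c("", ""), stringsAsFactors = FALSE)

fixed_first <- with_annotation(first)
combined <- rbind(fixed_first, with_annotation(new_annotation))
```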
