nf-core / metaboigniter

Pre-processing of mass spectrometry-based metabolomics data with quantification and identification based on MS1 and MS2 data.

Home Page: https://nf-co.re/metaboigniter

License: MIT License

Languages: HTML 0.97%, Python 29.01%, Nextflow 66.23%, Shell 3.78%
Topics: workflow, metabolomics, identification, quantification, mass-spectrometry, nextflow, pipeline, nf-core, ms1, ms2

metaboigniter's People

Contributors

axelwalter, egonw, ewels, kevinmenden, maxulysse, nf-core-bot, payamemami


metaboigniter's Issues

Software in bioconda

Generally, we aim to use software packaged in Bioconda for nf-core pipelines. By doing so we get support for conda, Docker and Singularity (the latter two via https://biocontainers.pro). We also avoid taking on the maintenance of software packaging on top of pipeline maintenance.

Currently, this pipeline is using a suite of custom Docker containers. These are all built using dedicated repos at https://github.com/MetaboIGNITER

If possible, it would be great to switch from using these to using Bioconda. Here's my quick googling for them:

So nearly all seem to be available already on the face of it.

As you're currently using one container per process, the quickest way to use them is just to add them to the main script, e.g.:

process xcms {
    container "quay.io/biocontainers/bioconductor-xcms:3.12.0--r40h5f743cb_0"
    conda "bioconda::bioconductor-xcms=3.12.0"

    script:
    """
    normal nextflow stuff here
    """
}

However, if the Bioconda dependencies play well together, it might be nicer to add an environment.yml file back that lists them. That gives a few advantages:

  • We can make the get_software_versions process run each command in one process to get the software version numbers reported
  • Simpler administration - nf-core lint checks this file for available updates for example
  • Smaller total file size for Singularity users

If they don't work together then that's fine. Pretty soon we will be moving all pipelines to DSL2 and rewriting pipelines to use a central repository of software wrappers at https://github.com/nf-core/modules - then each process will have to have its own container. If we're not using the main pipeline docker image at all we should delete the Dockerfile though and remove mention of the top-level process.container attribute.

Let me know what you think!

Phil

Do not define "NULL" string as a default value at nextflow_schema.json

Description

Some fields in the nextflow_schema.json file define their default value as a string ("default": "NULL"). This will be a problem in the upcoming version of tower.nf: the "NULL" string will be set in the launchpad form and sent to Nextflow when launching the pipeline. The run will then fail because Nextflow interprets it as a string and not as an empty parameter.

Solution

In line with the discussion here about enforcing stricter rules for initialising params with no default value, I suggest simply setting these fields to null in the nextflow.config file and removing the default setting from the schema file. This will be compatible with the future tower.nf release.
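A minimal sketch of how the offending schema entries could be located and removed, assuming nextflow_schema.json has been loaded into a plain dictionary. The function name strip_null_string_defaults and the sample parameter names are hypothetical and not part of any nf-core tooling; the matching params would still have to be set to null in nextflow.config by hand.

```python
import json


def strip_null_string_defaults(node, path=""):
    """Recursively delete every "default": "NULL" entry.

    Returns the list of schema paths that were cleaned, so the same
    parameters can then be set to null in nextflow.config.
    """
    cleaned = []
    if isinstance(node, dict):
        if node.get("default") == "NULL":
            del node["default"]
            cleaned.append(path or "<root>")
        for key, child in node.items():
            cleaned.extend(strip_null_string_defaults(child, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            cleaned.extend(strip_null_string_defaults(child, f"{path}/{index}"))
    return cleaned


# Hypothetical schema fragment, for illustration only.
example_schema = {
    "definitions": {
        "library": {
            "properties": {
                "some_library_param": {"type": "string", "default": "NULL"},
                "some_numeric_param": {"type": "integer", "default": 5},
            }
        }
    }
}

for cleaned_path in strip_null_string_defaults(example_schema):
    print("removed string default at", cleaned_path)
print(json.dumps(example_schema, indent=2))
```

For a real file, the dictionary would come from json.load on nextflow_schema.json and be written back with json.dump after the call.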

Pipeline has no release but no UNDER CONSTRUCTION warning


Process to create library using MSnbase fails on a few files

Description of the bug

In the identification subpipeline, I am trying to perform identification using internal standards. I have a few library .mzML files and an associated library description file. The process process_create_library_pos_msnbase fails for some of the files but passes for others.

Steps to reproduce

Steps to reproduce the behaviour:

Sorry the data cannot be provided to reproduce the error :(

  1. Command line: nextflow run metaboigniter/main.nf -c metaboigniter/conf/custom.config -profile singularity

  2. System:

  • Hardware: HPC
  • Executor: slurm
  • OS: CentOS Linux
  • Version: 7
  3. Nextflow Installation:
  • Version: 20.10.0
  4. Container engine:
  • Engine: Singularity
  • version: 3.7.4-1.el7
  • Image tag: nfcore/metaboigniter:dev

Errors

Before the latest dev version, the process process_create_library_pos_msnbase failed on some of my library files, but not always with the same error. It was either this error (A):

Error in strsplit(x = hitTMP[, "parentmzs"], split = ";", fixed = T)[[1]] : 
  subscript out of bounds
Calls: createLibrary
Execution halted

or this error (B):

Error in parentMS2s[[p]] : subscript out of bounds
Calls: createLibrary
Execution halted

In the latest dev version (9c86f6f), which modifies the createLibrary.R file, the files that failed with error B now pass this process, but the files that failed with error A still fail.

If you have any ideas about this issue, that would be a great help 💪
Thanks in advance

Library search retention time tolerance is missing


"Path value cannot be null" error during featurelinkerunlabeledkd step

Description of the bug

During a run with the steps peakpickerhires -> featurefindermetabo -> mapalignerposecluster -> maprttransformer, the workflow stops after the mapalignerposecluster step with the error "Error executing process", caused by "Path value cannot be null".

Previously, I had to adjust the modules.config file to get the peakpickerhires step to work (adding .centroided to the filename.mzML on line 48), and I had to remove ${meta.id} from the paths on lines 137 and 175 to fix a "filename too long" error.

Command used and terminal output

command: nextflow run nf-core/metaboigniter -profile docker

output: 
WARN: Input tuple does not match input set cardinality declared by process `NFCORE_METABOIGNITER:METABOIGNITER:LINKER:OPENMS_FEATURELINKERUNLABELEDKD` -- offending value: [id:Linked_data]
ERROR ~ Error executing process > 'NFCORE_METABOIGNITER:METABOIGNITER:LINKER:OPENMS_FEATURELINKERUNLABELEDKD (1)'

Caused by:
  Path value cannot be null

Relevant files

files.zip

System information

Nextflow version: 23.10.1
Metaboigniter version: 2.0.0
Hardware: Desktop
Executor: local
Container engine: Docker
OS: Linux (Fedora 39)

Missing output file(s) error when centroiding data

Description of the bug

After centroiding the first data file, the workflow stops with an error saying it cannot find the centroided data file it just created.

It appears that changing line 48 in modules.config to

ext.prefix = { "${meta.id}.centroided" }

fixes the issue. The workflow appears to write the centroided file under the original name, filename.mzML, instead of filename.centroided.mzML, which is the name the later steps look for.

Command used and terminal output

No response

Relevant files

No response

System information

Nextflow version: 23.10.01
Metaboigniter version: 2.0.0
Hardware: Desktop
Executor: Local
Container Engine: Docker
OS: Linux (Fedora 39)

Filename too long error when aligning multiple files

Description of the bug

metaboigniter completes without error when running only a couple of files, but with a full batch (63 files) it crashes at the alignment step with a "File name too long" error while creating the output of the mapalign step. It looks as though it is trying to use an array of sample names as a filename in the /alignment/ folder.

Command used and terminal output

command: nextflow run nf-core/metaboigniter -profile docker

output: Mar-01 19:13:26.604 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: /home/laytox/projects/smoke/smoke_taint/99-output/alignment/[c3r1-r001, c3r1-r002, c1r3-r003, c1r3-r001, c3r1-r003, c1r1-r002, c1r1-r003, c1r1-r001, c1r3-r002, c3r3-r001, c3r3-r002, c4r1-r001, c3r3-r003, c4r1-r002, s1r1-r001, c4r1-r003, c4r3-r002, c4r3-r003, c4r3-r001, s1r1-r002, s1r2-r001, s1r1-r003, s1r2-r002, s1r2-r003, s1r3-r001, s2r1-r001, s1r3-r002, s1r3-r003, s2r3-r001, s2r1-r002, s2r1-r003, s2r2-r001, s2r2-r003, s2r2-r002, s3r1-r003, s2r3-r002, s2r3-r003, s3r1-r001, s3r3-r001, s3r1-r002, s3r2-r001, s3r2-r002, s3r2-r003, s3r3-r002, s4br2-r001, s3r3-r003, s4br1-r001, s4br1-r002, s4br1-r003, s4br3-r003, s4br2-r002, s4br2-r003, s4br3-r001, s4br3-r002, s4fr2-r001, s4fr1-r001, s4fr3-r001, s4fr1-r002, s4fr1-r003, s4fr2-r002, s4fr2-r003, s4fr3-r002, s4fr3-r003]: File name too long
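The size effect described above can be illustrated with a short sketch. It assumes the usual per-component limit of 255 bytes on common Linux filesystems and mimics how a whole list of sample IDs is rendered as "[a, b, c]" inside a single path component, as seen in the log. The helper names and the generated sample IDs are hypothetical; this is not pipeline code.

```python
# Typical limit for one path component on Linux filesystems such as ext4.
NAME_MAX = 255


def rendered_component(sample_ids):
    """Mimic a whole list of IDs being interpolated into one path component."""
    return "[" + ", ".join(sample_ids) + "]"


def exceeds_name_max(component, limit=NAME_MAX):
    """True if the component is longer (in bytes) than the filesystem allows."""
    return len(component.encode("utf-8")) > limit


# Hypothetical sample IDs shaped like the ones in the log above.
small_batch = ["c3r1-r001", "c3r1-r002"]
full_batch = [f"s{i}r1-r001" for i in range(1, 64)]  # 63 samples

for batch in (small_batch, full_batch):
    component = rendered_component(batch)
    print(len(batch), "samples:", len(component), "bytes,",
          "too long" if exceeds_name_max(component) else "ok")
```

This is consistent with the run succeeding for a couple of files and failing for 63: the rendered list only crosses the limit once enough sample names are joined.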

Relevant files

files.zip

System information

Nextflow version: 23.10.1
Metaboigniter version: 2.0.0
Hardware: desktop
Executor: local
container engine: docker
OS: Linux (Fedora 39)

Process to create library using MSnbase fails

Description of the bug

In the identification subpipeline, I am trying to perform identification using internal standards. I have a few library .mzML files and an associated library description file. The process process_create_library_pos_msnbase fails with different errors depending on the values set for the following parameters in the conf/parameters.config file:

raw_file_name_preparelibrary_pos_msnbase
compund_id_preparelibrary_pos_msnbase
compound_name_preparelibrary_pos_msnbase
mz_col_preparelibrary_pos_msnbase

Steps to reproduce

Sorry the data cannot be given to reproduce the error :(

  1. Command line: nextflow run metaboigniter/main.nf -c metaboigniter/conf/custom.config -profile singularity

  2. Log file:
    log.txt (.nextflow.log renamed in log.txt)

  3. System:

  • Hardware: HPC
  • Executor: slurm
  • OS: CentOS Linux
  • Version: 7
  4. Nextflow Installation:
  • Version: 20.10.0
  5. Container engine:
  • Engine: Singularity
  • version: 3.7.4-1.el7
  • Image tag: nfcore/metaboigniter:1.0.1

Errors

I found that when these four parameters are set to non-default values:

raw_file_name_preparelibrary_pos_msnbase = 'RAW_FILE'
compund_id_preparelibrary_pos_msnbase = 'IARC_ID'
compound_name_preparelibrary_pos_msnbase = 'NAME'
mz_col_preparelibrary_pos_msnbase = 'MZ'

the process process_create_library_pos_msnbase fails with the error:

Loading required package: stringr
  Error in `[.data.frame`(libraryInfo, , requiredHeader["mzCol"]) : 
    undefined columns selected
  Calls: createLibrary -> IntervalMerge -> [ -> [.data.frame
  Execution halted
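R raises "undefined columns selected" when a requested column name is not present in the data frame, so one thing worth ruling out is a mismatch between the configured column names and the header of the library description file. Below is a minimal pre-flight check, sketched in Python rather than the pipeline's R. The function name missing_columns, the comma delimiter and the sample row are assumptions for illustration only.

```python
import csv
import io


def missing_columns(library_text, required, delimiter=","):
    """Return the configured column names that are absent from the header row.

    The comparison is case-sensitive, so 'MZ' and 'mz' count as different.
    """
    header = next(csv.reader(io.StringIO(library_text), delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical library description file content.
library_text = (
    "RAW_FILE,IARC_ID,NAME,mz\n"
    "standard_01.mzML,ID0001,Example compound,195.0877\n"
)

configured = ["RAW_FILE", "IARC_ID", "NAME", "MZ"]
print("missing columns:", missing_columns(library_text, configured))
```

For a real file, the text would be read from disk and the delimiter adjusted to match the file.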

 
When the mz column parameter is set to its default, 'mz', but the other three are set to non-default values:

raw_file_name_preparelibrary_pos_msnbase = 'RAW_FILE'
compund_id_preparelibrary_pos_msnbase = 'IARC_ID'
compound_name_preparelibrary_pos_msnbase = 'NAME'
mz_col_preparelibrary_pos_msnbase = 'mz'

the process process_create_library_pos_msnbase also fails, but with a different error:

  Loading required package: stringr
  Error in data.frame(startRT = startRT, endRT = endRT, startMZ = startMZ,  : 
    arguments imply differing number of rows: 1, 0
  Calls: createLibrary -> IntervalMerge -> data.frame
  Execution halted

 
When we set all four parameters to their default values:

raw_file_name_preparelibrary_pos_msnbase = 'rawFile'
compound_id_preparelibrary_pos_msnbase = 'HMDB.YMDB.ID'
compound_name_preparelibrary_pos_msnbase = 'PRIMARY_NAME'
mz_col_preparelibrary_pos_msnbase = 'mz'

the process succeeds for a few tasks (that is, for a few identification files) but fails for others, giving the following error:

Loading required package: stringr
Error in strsplit(x = hitTMP[, "parentmzs"], split = ";", fixed = T)[[1]] : 
  subscript out of bounds
Calls: createLibrary
Execution halted

 
In the bin folder, I dug into the R scripts involved in process_create_library_pos_msnbase (createLibrary.R and createLibraryFun.R) and found that the failure is related to the MSlibrary data frame in createLibraryFun.R. For the failing tasks, the columns parentmzs, parentrts, parentInts and MS2s of MSlibrary are empty, so the line MSlibrary[MSlibrary[,"MS2s"]!="",] returns an empty data frame, which in turn produces an empty hitTMP data frame. For the succeeding tasks these columns are not empty, so hitTMP is not empty either.
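The failure mode described above (filter the library on non-empty MS2s, then index the first hit) can be sketched with an explicit guard for the empty case. This is a Python illustration of the logic, not the pipeline's R code. The function name first_parent_mzs and the example rows are hypothetical; only the column names MS2s and parentmzs come from the report.

```python
def first_parent_mzs(library_rows):
    """Return the parent m/z values of the first row that has MS2 data.

    Returns None when no row has MS2 data, instead of failing on an
    out-of-bounds index as the unguarded lookup would.
    """
    hits = [row for row in library_rows if row.get("MS2s", "") != ""]
    if not hits:
        return None
    return [float(value) for value in hits[0]["parentmzs"].split(";")]


# Hypothetical rows: one library file without MS2 data, one with.
without_ms2 = [{"MS2s": "", "parentmzs": ""}]
with_ms2 = [{"MS2s": "scan_12", "parentmzs": "100.5;200.25"}]

print(first_parent_mzs(without_ms2))
print(first_parent_mzs(with_ms2))
```

A guard like this would not explain why the MS2 columns are empty for some library files, but it would turn the crash into a condition that can be reported per file.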

I still can’t work out what could have led to this issue 😢

If you have any ideas, that would be great!
Once again, thank you so much in advance for your answer 💪
 

negative run error

Description of the bug

I have tried many times and always get the same error message when running negative-mode data.

Command used and terminal output

Command error:
  Adding neutral: ---------- Adduct -----------------
  Charge: 0
  Amount: 1
  MassSingle: -18.0106
  Formula: H-2O-1
  log P: -2.99573
  
  Adding neutral: ---------- Adduct -----------------
  Charge: 0
  Amount: 1
  MassSingle: 46.0055
  Formula: C1H2O2
  log P: -0.693147
  
  MassExplainer table size: 4
  Error: Unexpected internal error (WARNING!!! implicit number of default adduct is negative!!! left:-1 right: -1
  )
  Generating Masses with threshold: -2.99573 ...
  done

Relevant files

No response

System information

No response

METABOIGNITER: Migrate all docs to JSON parameter schema

Hi!

This is not necessarily an issue with the pipeline, but to streamline the work of the documentation group at next week's hackathon, I'm opening issues in all pipeline repositories that might need this update: switching from parameter docs to auto-generated documentation based on the JSON schema.

This will then supersede any further parameter documentation, thus making things a bit easier :-)

If this doesn't apply (anymore), please close the issue. Otherwise, I'm hoping to have some helping hands on this next week in the documentation team on Slack https://nfcore.slack.com/archives/C01QPMKBYNR

URGENT: pin nf-validation version

Description of the bug

To prevent breaking this pipeline in the near future, the nf-validation version should be pinned to version 1.1.3 like:

plugins {
    id 'nf-validation@1.1.3'
}

Command used and terminal output

No response

Relevant files

No response

System information

No response
