
Comments (50)

yvanlebras avatar yvanlebras commented on May 18, 2024 2

It is! ;)

I've checked with @cmonjeau and we can begin writing a MultiQC Galaxy tool using the bioconda package or a Docker dependency. We'll keep you informed!

from multiqc.

bgruening avatar bgruening commented on May 18, 2024 2

@ewels @yvanlebras I'm not following the problem closely, but don't forget that you can always change the input file name on disk to anything you like by simply creating a symlink to $input.

ln -s $input ./my_smart_name.bam

This could also be the name of the history element, which you can access by $input.name afaik.
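For a wrapper that prepares its inputs in Python rather than in shell, the symlink trick above might look like the following sketch. The helper name `link_with_name` and the example file names are hypothetical; only `os.symlink` and friends from the standard library are used.

```python
import os
import tempfile


def link_with_name(input_path, link_name, workdir):
    """Create a symlink to input_path inside workdir under a readable name."""
    link_path = os.path.join(workdir, link_name)
    # Point the link at the absolute path so it resolves from any directory.
    os.symlink(os.path.abspath(input_path), link_path)
    return link_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a Galaxy dataset file (hypothetical name).
        dataset = os.path.join(tmp, "dataset_58.dat")
        with open(dataset, "w") as handle:
            handle.write("example\n")
        link = link_with_name(dataset, "my_smart_name.bam", tmp)
        print(os.path.basename(link), "->", os.path.realpath(link))
```

The data itself is never copied, so this stays cheap even for large BAM files.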


ewels avatar ewels commented on May 18, 2024

Quite a lot of work as I need to set up a galaxy instance. Dropping this for now, but would be nice to come back to at some point. If anyone out there is already used to Galaxy, would be great to have some help!


yvanlebras avatar yvanlebras commented on May 18, 2024

Hi @ewels. Together with @abretaud, we have just begun to evaluate the possibility of adding MultiQC to a Galaxy instance... Maybe something based on an "Interactive Environment" can be investigated with @bgruening...


ewels avatar ewels commented on May 18, 2024

Hi @yvanlebras, that's great! I'll re-open this issue then. Do you think it will be a lot of work to write a galaxy wrapper for it?


yvanlebras avatar yvanlebras commented on May 18, 2024

I/We have to evaluate this! In fact, creating a "classical" Galaxy tool does not seem to be the best way... As mentioned, creating a MultiQC Interactive Environment (like in this screencast: https://www.youtube.com/watch?v=mKmXSN1G-Po) could be a good way... The idea would be to let MultiQC interact with the logs from a history in which tools like FastQC, STAR, Cutadapt, ... have been executed. @bgruening & @erasche will be best placed to evaluate this...


hexylena avatar hexylena commented on May 18, 2024

@yvanlebras is the output more than just an HTML dataset? I'm just seeing HTML + JS there. If so, then this would be fine as a classical galaxy tool, no IE needed


yvanlebras avatar yvanlebras commented on May 18, 2024

True... I don't know why I absolutely want to propose an IE... Oh indeed, because I love this functionality ;) More seriously, you're right. It is more that I had a feeling that MultiQC is something that can be applied to an entire QC history, transiently, and is about dynamic visualization... so more "usable" as an IE than as a Galaxy tool...


ewels avatar ewels commented on May 18, 2024

@erasche the main output from MultiQC is a HTML report as you say (a single file, everything is embedded). It does also save parsed data as tsv / yaml / json in a directory which can be helpful sometimes. It's easy to disable this if needed with a config option or command line flag.


ewels avatar ewels commented on May 18, 2024

The only thing that had worried me about running on Galaxy is that MultiQC needs to see all previous logs / standard out / stderr and so on (varies across tools). Not sure if different tasks are sandboxed in galaxy or not? Sorry for my unfamiliarity with it :)


hexylena avatar hexylena commented on May 18, 2024

@yvanlebras I understand the feeling, IEs are exciting, I want to turn lots of tools into them too. :) That's an interesting application though, run MultiQC on existing datasets. Interesting thought!

@ewels ok... If your tool needs access to stdout/stderr, I am absolutely sure we can find a way to make that possible if it isn't already. It would be easily possible to keep all of the tsv/yaml/json extra datasets.


ewels avatar ewels commented on May 18, 2024

@erasche Not exactly - it just needs access to files made by other tools. Some of these files will have come from the stdout/stderr of that tool. My point was that the files it needs could be seen as intermediate files and deleted before MultiQC runs..? Not sure if galaxy does that..


hexylena avatar hexylena commented on May 18, 2024

@ewels ah, ok, I see.

Yes, intermediate files can be deleted. I do something similar with the JBrowse tool I maintain -- a large number of tools are run and reported, much like your MultiQC tool does. I mark the intermediate datasets as being safe to delete.


ewels avatar ewels commented on May 18, 2024

@erasche sounds good. Though MultiQC doesn't run anything itself, so it won't have any control over what other Galaxy tool wrappers class as being safe to delete.. Anyway, these are details. How do we go about doing this? @yvanlebras are you intending to write anything? I guess I should try and get a local instance of galaxy running to test stuff on..


bgruening avatar bgruening commented on May 18, 2024

@yvanlebras @ewels let me know if you need any support.
@ewels if you want to start Galaxy tool dev, have a look at planemo. We also provide a VM and Docker containers to make dev life easier.


ewels avatar ewels commented on May 18, 2024

Thanks @bgruening - planemo looks brilliant! I'll take a look next week.


devengineson avatar devengineson commented on May 18, 2024

Hi everyone,

As mentioned before, we (@yvanlebras and @cmonjeau) have begun to work on a Galactic MultiQC tool.. yesterday morning ;)

Thank you very much @ewels for this beautiful tool, I really found it terrific and I think we have a lot to do with it, notably inside Galaxy and on our start-up project, EnginesOn !

So you will find:

Don't hesitate to make comments or ask for more information!

We will continue working on this integration over the next few days.

All the best,

Yvan

Integration difficulties

Galaxy

  • The Tool Shed featureCounts tool output names are too long, which may cause issues. We change this name to a shorter one for MultiQC.
  • The Tool Shed cutadapt tool version (1.6) is older than the version accepted by MultiQC (>1.8). We have added a function that tests the cutadapt version so that the Galaxy MultiQC tool can use older cutadapt versions.
  • We are using a repeat param for input files in the MultiQC Galaxy tool. This is not very usable when there are a lot of input files (for the FastQC and cutadapt tools, for example). As a data collection is a batch-mode input field, we plan to use the "multiple = true" option for the input parameter and then process the resulting list of files. _FIXED using a multi-file selection per module (FastQC / Tophat / cutadapt)_
  • Deactivating HTML sanitization is mandatory

MultiQC

  • Visualization bugs on Mozilla
  • The order of tools is quite strange, no? Why not go top-down: pre-processing tools such as cutadapt and FastQC, then alignment tools such as Tophat / STAR, then featureCounts?
  • MultiQC searches the cutadapt report file for the name in the command line to associate the right dataset name. In Galaxy, this is something like ".../dataset_58.dat". For Galaxy, it would be better if MultiQC used the cutadapt report file name, for example...
  • For featureCounts, if there are spaces in the sample names, MultiQC will fail to process the files.


bgruening avatar bgruening commented on May 18, 2024

awesome news!!! 🍺


ewels avatar ewels commented on May 18, 2024

Fantastic! Thank you very much @devengineson / @yvanlebras / @cmonjeau ! This looks awesome.

Some thoughts regarding your difficulties:

Galaxy

  • featureCounts tool output names are too long
    • Long sample names can be cleaned up using parameters saved in a MultiQC config file, if there are consistent pipeline-specific strings such as here. See these docs for instructions. Not sure if this is easier than changing the featureCounts tool itself though.
  • old version of cutadapt
    • I'm happy to try to modify the MultiQC module to handle output from v1.6 if you'd like.
    • I presume that this is a typical log file?
  • lots of input files
    • A recent addition by @lpantano was to give the option of supplying a file listing input files to be used, rather than on the command line. This was for bcbio, where they're trying to support CWL (maybe the same for Galaxy?).
    • Operation is now multiqc --file-list data/special_cases/file_list.txt. See #201 for more information.
    • Note that this is only in v0.7dev, not in v0.6.
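As a rough illustration of the file-list approach for a wrapper with many inputs, the sketch below writes the selected paths to a text file and builds the command quoted above. The helper name `write_file_list` is hypothetical, and the one-path-per-line layout is an assumption about the expected file format, not something confirmed in this thread.

```python
def write_file_list(paths, list_path):
    """Write one input path per line and return the multiqc argument list."""
    with open(list_path, "w") as handle:
        for path in paths:
            # Assumed format: one path per line.
            handle.write(path + "\n")
    # Command form as quoted in the thread: multiqc --file-list <file>
    return ["multiqc", "--file-list", list_path]
```

The returned list can be handed to `subprocess.run` as is, which avoids shell quoting problems with unusual dataset paths.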

MultiQC

  • Visualization bugs on Mozilla
    • You're right! You mean the general statistics table at the top? Thanks for noticing this - I'll take a look (#213).
  • order of tools
    • I guess this is a personal thing. Usually the thing I'm most interested in is the final result (eg. final QC step or whatever), and I dig back earlier and earlier to try to explain stuff. Whilst not chronological, I find this more intuitive for how I use the report ("biologically"). I don't have any plans to change this.
  • HTML sanitization deactivation mandatory
    • I don't understand what you mean by this sorry.
  • Take cutadapt sample name from filename
    • This is a tough one. Whilst I appreciate that in this case it would be better, it's difficult to change this behaviour for a single instance of MultiQC. Because cutadapt output is in stderr logs, other users may have results from multiple files concatenated within a single file (I often do). This is why I try to take sample name from the input filename where possible.
    • Going by this log file it looks like the sample name should be the same anyway? Here it would be dataset_39.dat as that's the input filename.
    • Slightly worrying that you're using .dat - I'm not sure how many modules assume .fastq / .fq file formats. But we can fix that problem when we find it I guess.
  • featureCounts breaks if space in filename
    • Great spot, thanks! I've added an issue to fix this (#214).


devengineson avatar devengineson commented on May 18, 2024

Thanks for your quick comments @ewels!

For the visualization bug on Mozilla, you're right! This is on the general statistics table at the top.

I understand the "biologically" order ;)

Concerning the HTML sanitization, this is a Galaxy related issue not MultiQC, sorry for the bad assignment ;)

For cutadapt v1.6 (and maybe older versions), the first line differs from that of versions > 1.8 and reads: "This is cutadapt 1.6 with Python 2.7.3"
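A version test of the kind described for the Galaxy wrapper could be sketched as follows, using only the banner line quoted above. The function names and the `MIN_SUPPORTED` threshold are illustrative assumptions; the thread says ">1.8" without stating whether 1.8 itself is included, and this is not the wrapper's actual code.

```python
import re

# Threshold taken from the thread (">1.8"); treating 1.8 as supported is an assumption.
MIN_SUPPORTED = (1, 8)


def parse_cutadapt_version(first_line):
    """Return the version in a cutadapt banner line as a tuple of ints, or None."""
    match = re.search(r"This is cutadapt (\d+(?:\.\d+)*)", first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_old_style(first_line):
    """True when the banner reports a version below MIN_SUPPORTED."""
    version = parse_cutadapt_version(first_line)
    return version is not None and version < MIN_SUPPORTED
```

Comparing integer tuples rather than strings keeps a version such as 1.10 from sorting below 1.8.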


ewels avatar ewels commented on May 18, 2024

Hi @devengineson,

I think most of these changes are done now - the firefox display bug should be fixed, featureCounts now tolerates having spaces in sample names and the Cutadapt module should work with the old style logs.

Let me know if there's anything else that I can do to help! Thanks again for your work on this.

Phil


devengineson avatar devengineson commented on May 18, 2024

Hi @ewels ,

Thanks for your speed!!! After another galactic integration day, several other comments below.

Cheers,

Yvan

FastQC: ok

Tophat: ok

samtools stats: ok

cutadapt: ok

featureCounts: ok

Bismark: ok

Picard:

  • markdups : OK, but MultiQC searches for 'picard.sam.MarkDuplicates' instead of 'picard.sam.markduplicates.MarkDuplicates'. Fixed with a sed on the command line.
  • insertsize : OK
  • gcbias : OK
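The sed workaround for MarkDuplicates amounts to a plain string substitution on the log before MultiQC reads it. A Python equivalent might look like this sketch; the function name is hypothetical and the two class strings are the ones quoted above.

```python
# Class name as it appears in the Galaxy Picard log, per the thread.
LOG_CLASS = "picard.sam.markduplicates.MarkDuplicates"
# Class name MultiQC 0.6 was searching for, per the thread.
EXPECTED_CLASS = "picard.sam.MarkDuplicates"


def rewrite_markdups_log(text):
    """Replace the newer Picard class path with the one MultiQC expects."""
    return text.replace(LOG_CLASS, EXPECTED_CLASS)
```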

Bowtie2:

  • MultiQC doesn't seem to work properly with the Bowtie2 file... Maybe a specific filename is required but is not in the docs?


ewels avatar ewels commented on May 18, 2024

Hah, the Picard MarkDups thing was reported by someone else earlier today and is already fixed in 3746e76 😉

Bowtie2 logs are horrible and really difficult to parse. I've been working on that module again this week actually. The module looks in the log for a bowtie command in the hope that a wrapper script around bowtie printed this, but if it can't find that it will take the log filename for the sample name. What happens for you exactly? Do you have an example log?

Phil


devengineson avatar devengineson commented on May 18, 2024

Here is the Galaxy Bowtie 2 example log that we obtain by activating the "Save the bowtie2 mapping statistics to the history" parameter on the Galaxy Bowtie2 form: bowtie2 galaxy mapping stat file example


devengineson avatar devengineson commented on May 18, 2024

Hi @bgruening, we encountered some issues with conda on planemo. Writing only the requirements with the conda package in the MultiQC tool.xml file like this:

<requirements>
    <requirement type="package" version="0.6">multiqc</requirement>
</requirements>

the planemo test --conda_dependency_resolution . command fails with the attached log
planemo_log_conda_error.txt

Apparently, there is a problem with the conda installation, so the multiqc command is not found...

Using the conda install . command in the folder with the MultiQC tool.xml file, the conda installation works fine... I think we have forgotten something... Do you have any idea?


bgruening avatar bgruening commented on May 18, 2024

I have tracked down the problem to: conda/conda#2035

Can you please do a

conda install conda=3.19.0

Hopefully this works for you.


yvanlebras avatar yvanlebras commented on May 18, 2024

Hi @bgruening !

This works for us. Thank you very much. Have a nice day.


bgruening avatar bgruening commented on May 18, 2024

Awesome!


ewels avatar ewels commented on May 18, 2024

Hi @devengineson,

I think I was too slow and the link you posted gives me a 404 error now. Sorry! Could you please send one through again?

Phil


yvanlebras avatar yvanlebras commented on May 18, 2024

@ewels, you're right, we have deployed a new cloud VM.. sorry ;)

You can find a new version here : Bowtie2 stat report

Cheers,

Yvan


ewels avatar ewels commented on May 18, 2024

Thanks! What is this file called? As there's nothing else in the log, the bowtie2 module should take the filename as the sample name, and try to clean it up. I'll add something to the docs about this now.

Phil


yvanlebras avatar yvanlebras commented on May 18, 2024

The dataset's name in the Galaxy history is " Bowtie2 on data 1, data 5, and data 4: mapping stats "

Not sure this can help you, because you're looking for the name this stat file has after the bowtie2 command line execution, aren't you? ;)


ewels avatar ewels commented on May 18, 2024

Yeah exactly, MultiQC is looking at the filename on the disk - I guess it won't know the name in the galaxy history if that's kept separately. At least, not without writing a Galaxy-specific plugin for MultiQC.
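Since MultiQC only sees the filename on disk, one option on the wrapper side is to turn the history name into a filesystem-safe name and use it for a symlink, as suggested earlier in the thread. The sketch below is one way to do that conversion; the function name, the allowed character set, and the `.txt` default are all assumptions made for illustration.

```python
import re


def history_name_to_filename(history_name, extension=".txt"):
    """Collapse runs of characters outside [A-Za-z0-9._-] into single underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", history_name.strip())
    # Drop any underscore left at either end, then add the extension.
    return cleaned.strip("_") + extension
```

Spaces, commas, and colons are all collapsed, so the resulting name needs no shell quoting.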


devengineson avatar devengineson commented on May 18, 2024

@ewels can you give us the expected file name ?
@bgruening we will not forget ;) Thanks!


ewels avatar ewels commented on May 18, 2024

Absolutely - it can be whatever you like really. MultiQC will truncate from anything in the config.fn_clean_exts list.

So, if you call it sample_name.txt then the MultiQC report will show the name as sample_name.
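The truncation behaviour described here can be pictured with a small sketch. The extension list below is a made-up stand-in for `config.fn_clean_exts` (the real list ships with MultiQC and may differ), and `clean_sample_name` is a hypothetical helper, not MultiQC's own function.

```python
# Hypothetical stand-in for config.fn_clean_exts.
FN_CLEAN_EXTS = (".txt", ".log", ".fastq", ".gz")


def clean_sample_name(filename, clean_exts=FN_CLEAN_EXTS):
    """Cut the name at the first occurrence of each listed string, in order."""
    name = filename
    for ext in clean_exts:
        # Keep only what precedes the extension, if it is present.
        name = name.split(ext, 1)[0]
    return name
```

Note that this is truncation, not suffix stripping: everything after the first match is dropped, which is why a name like `sample.fastq.gz` would be reduced to `sample` in one pass.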


devengineson avatar devengineson commented on May 18, 2024

Sorry @ewels for my unclear question (and for my very approximate Frenglish) ;) I understand that MultiQC will truncate sample_name.ext, but it seems that for Bowtie2, MultiQC is looking for a specific filename to parse (like bowtie2.log, for example)... Is that the case? If yes, we have to preprocess the bowtie2 log file to assign a suitable file name (using the @bgruening method, for example ;) ) before giving it to MultiQC... OR we don't have to change the name, because MultiQC looks at the content of the bowtie2 log file, and we just have to give the bowtie2 log file to MultiQC. We have tested this second approach but it doesn't seem to work...


ewels avatar ewels commented on May 18, 2024

Ah I see, sorry. MultiQC uses a config file called search_patterns.yaml to define the search parameters (these can be overwritten by the user, see the docs).

Bowtie 2 has no standardised filename for the output as it's just stderr, so instead MultiQC finds logs by searching for any file containing the string "reads; of these:" which is pretty rubbish, but the best I could manage. Other modules do search by filename as you say, these use fn: in the config instead of contents:. For example, FastQC uses fn: '*_fastqc.zip'.

So MultiQC should find the bowtie logs with any filename if they're there. Then sample names in the report will be chosen based on the filename of the file that is found.
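To make the difference between `fn:` and `contents:` matching concrete, here is a minimal sketch of the two strategies. The two patterns are the ones quoted above; the dictionary layout and the function names are illustrative only and do not reproduce MultiQC's internals or the exact structure of search_patterns.yaml.

```python
import fnmatch
import os

# Patterns quoted in the thread; the dict layout is an illustration only.
SEARCH_PATTERNS = {
    "bowtie2": {"contents": "reads; of these:"},
    "fastqc": {"fn": "*_fastqc.zip"},
}


def matches(path, pattern):
    """Return True if a file matches an 'fn' glob or a 'contents' string."""
    if "fn" in pattern:
        return fnmatch.fnmatch(os.path.basename(path), pattern["fn"])
    if "contents" in pattern:
        try:
            with open(path, "r", errors="ignore") as handle:
                return any(pattern["contents"] in line for line in handle)
        except OSError:
            return False
    return False


def find_logs(directory, pattern):
    """Walk a directory and collect every file matching the pattern."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if matches(path, pattern):
                found.append(path)
    return sorted(found)
```

With a contents-based pattern, a Galaxy file named `dataset_58.dat` is found just as readily as `bowtie2.log`; only the sample name shown in the report depends on the filename.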

Make sense?

Phil


ewels avatar ewels commented on May 18, 2024

ps. Two things to consider - MultiQC will overwrite samples if it gives them the same name, so if all bowtie 2 logs are called bowtie2.log then your report will contain only the last file that was found. Also, if something goes wrong with the way that MultiQC parses the logs then there may be no results. It's usually possible to get a better idea about both of these by running in verbose mode (-v) or looking at the contents of multiqc_data/.multiqc.log
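Because samples with the same derived name overwrite one another, a wrapper may want to check for collisions before calling MultiQC. The following sketch groups log paths by whatever naming function is supplied; both `find_name_collisions` and the `name_for` callback are hypothetical helpers, not part of MultiQC.

```python
from collections import defaultdict


def find_name_collisions(log_paths, name_for):
    """Group paths by derived sample name; return names that occur more than once."""
    by_name = defaultdict(list)
    for path in log_paths:
        by_name[name_for(path)].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Any name returned here would keep only the last file found in the report, so those files are the ones worth renaming or symlinking first.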


devengineson avatar devengineson commented on May 18, 2024

Ok @ewels, that makes sense ;) So, in our first test, we gave the previously mentioned bowtie 2 stat file to MultiQC, but it didn't do anything with it... maybe MultiQC doesn't like our file ;)


ewels avatar ewels commented on May 18, 2024

Hmm, you're right. Debugging now - MultiQC finds your log but is looking for a handful of lines which aren't there. My fault - I thought that it was always printed to the log but it seems the bowtie2 log is even more sparse than I thought.


ewels avatar ewels commented on May 18, 2024

Ok, change pushed - hopefully this version should recognise your bowtie2 logs now..


devengineson avatar devengineson commented on May 18, 2024

Thank you! We are using the MultiQC 0.6 conda recipe. Can you tell me whether an update can fix this problem?


ewels avatar ewels commented on May 18, 2024

Hi @devengineson,

No sorry, this update is currently in v0.7dev which is only on GitHub. It will end up in v0.7 on conda when I release it, but that may be a few weeks yet.

Phil


devengineson avatar devengineson commented on May 18, 2024

Ok! It was just to be sure ;)


ewels avatar ewels commented on May 18, 2024

Hi all,

I'm building up to a v0.7 release soon. How are you getting on? Is there anything I need to add to MultiQC? Can I claim on the readme that it works with Galaxy yet? 😉

Phil


yvanlebras avatar yvanlebras commented on May 18, 2024

Hi Phil,

Yes, we have finished the tests and it seems OK for the 0.6 version. We have created a dedicated Galaxy Tool Shed repository here: https://toolshed.g2.bx.psu.edu/view/engineson/multiqc/ff22ea7aa6bb

Cheers,

Yvan


ewels avatar ewels commented on May 18, 2024

Ok great! I'll add this to the changelog and readme files then if that's ok. Do you think you could write a sentence that I can copy describing how people can use it? Also maybe a slightly longer version that I can add to the docs or something.

Phil


ewels avatar ewels commented on May 18, 2024

Also, I just noticed that there is a section in multiqc.xml that describes citations. MultiQC has just been published in Bioinformatics, so that would be a better citation (Epigenomics of Common Disease was a conference poster). See http://dx.doi.org/10.1093/bioinformatics/btw354


devengineson avatar devengineson commented on May 18, 2024

For sure! Thanks for the info


ewels avatar ewels commented on May 18, 2024

Ok, I've mentioned the wrapper in the readme now. Let me know if you'd like me to change the text. If there's nothing else for me to do with MultiQC then I'll close this issue. Feel free to reopen it again if you feel the need.

Thanks again!

Phil

