Comments (50)
It is! ;)
I've checked with @cmonjeau and we can begin writing a MultiQC Galaxy tool using a Bioconda or Docker dependency... We'll keep you informed!
from multiqc.
@ewels @yvanlebras I'm not following the problem closely, but don't forget that you can always change the input file name on disk to anything you like by simply creating a symlink to $input:
ln -s $input ./my_smart_name.bam
This could also be the name of the history element, which you can access with $input.name, afaik.
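The symlink trick above can also be scripted. Here is a minimal Python sketch, assuming a hypothetical helper called link_with_name and made-up paths; in a Galaxy wrapper the real path would come from $input and the label from $input.name.

```python
import os
import re
import tempfile


def link_with_name(dataset_path, label, workdir, ext=".bam"):
    """Symlink dataset_path into workdir under a name derived from label.

    link_with_name is a hypothetical helper, not part of Galaxy or MultiQC.
    """
    # Replace anything that is awkward in a filename with an underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", label).strip("_")
    link = os.path.join(workdir, safe + ext)
    os.symlink(dataset_path, link)
    return link


# Demo with a temporary stand-in for a Galaxy dataset file.
workdir = tempfile.mkdtemp()
dataset = os.path.join(workdir, "dataset_58.dat")
with open(dataset, "w") as handle:
    handle.write("placeholder\n")

link = link_with_name(dataset, "my smart name", workdir)
print(os.path.basename(link))
```

Because only a link is created, the original dataset file is left untouched and any downstream tool simply sees the friendlier name.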
from multiqc.
Quite a lot of work as I need to set up a galaxy instance. Dropping this for now, but would be nice to come back to at some point. If anyone out there is already used to Galaxy, would be great to have some help!
from multiqc.
Hi @ewels, @abretaud and I have just begun to evaluate the possibility of adding MultiQC to a Galaxy instance... Maybe something based on an "Interactive Environment" could be investigated with @bgruening...
from multiqc.
Hi @yvanlebras, that's great! I'll re-open this issue then. Do you think it will be a lot of work to write a galaxy wrapper for it?
from multiqc.
I/We have to evaluate this! In fact, creating a "classical" Galaxy tool may not be the best way... As mentioned, creating a MultiQC Interactive Environment (like in this screencast: https://www.youtube.com/watch?v=mKmXSN1G-Po) could be a good way... The idea would be to let MultiQC interact with the logs from a history in which tools like FastQC, STAR, Cutadapt, ... have been executed. @bgruening & @erasche are best placed to evaluate this...
from multiqc.
@yvanlebras is the output more than just an HTML dataset? I'm just seeing HTML + JS there. If so, then this would be fine as a classical galaxy tool, no IE needed
from multiqc.
True... I don't know why I absolutely wanted to propose an IE... Oh yes, because I love this functionality ;) More seriously, you're right. It was more that I had a feeling that MultiQC is something that can be applied to an entire QC history, transiently, for dynamic visualization... so more "usable" as an IE than as a Galaxy tool...
from multiqc.
@erasche the main output from MultiQC is an HTML report as you say (a single file, everything is embedded). It also saves parsed data as tsv / yaml / json in a directory, which can be helpful sometimes. It's easy to disable this if needed with a config option or command line flag.
from multiqc.
The only thing that had worried me about running on Galaxy is that MultiQC needs to see all previous logs / standard out / stderr and so on (varies across tools). Not sure if different tasks are sandboxed in galaxy or not? Sorry for my unfamiliarity with it :)
from multiqc.
@yvanlebras I understand the feeling, IEs are exciting, I want to turn lots of tools into them too. :) That's an interesting application though, run MultiQC on existing datasets. Interesting thought!
@ewels ok... If your tool needs access to stdout/stderr, I am absolutely sure we can find a way to make that possible if it isn't already. It would be easily possible to keep all of the tsv/yaml/json extra datasets.
from multiqc.
@erasche Not exactly - it just needs access to files made by other tools. Some of these files will have come from the stdout/stderr of that tool. My point was that the files it needs could be seen as intermediate files and deleted before MultiQC runs..? Not sure if Galaxy does that..
from multiqc.
@ewels ah, ok, I see.
Yes, intermediate files can be deleted. I do something similar with the JBrowse tool I maintain -- a large number of tools are run and reported, much like your MultiQC tool does. I mark the intermediate datasets as being safe to delete.
from multiqc.
@erasche sounds good. Though MultiQC doesn't run anything itself, so it won't have any control over what other Galaxy tool wrappers class as being safe to delete.. Anyway, these are details. How do we go about doing this? @yvanlebras are you intending to write anything? I guess I should try and get a local instance of Galaxy running to test stuff on..
from multiqc.
@yvanlebras @ewels let me know if you need any support.
@ewels if you want to start Galaxy tool dev, have a look at planemo. We also provide a VM and Docker containers to make dev life easier.
from multiqc.
Thanks @bgruening - planemo looks brilliant! I'll take a look next week.
from multiqc.
Hi everyone,
As mentioned before, we, @yvanlebras and @cmonjeau, have begun to work on a Galactic MultiQC tool.. yesterday morning ;)
Thank you very much @ewels for this beautiful tool. I find it really terrific and I think we have a lot to do with it, notably inside Galaxy and in our start-up project, EnginesOn!
So you will find:
- a first result on an RNAseq Galaxy workflow on our dedicated MultiQC dev Galaxy instance, with a first MultiQC webpage result
- some comments here on the difficulties we faced integrating Galaxy and MultiQC.
Don't hesitate to comment or ask for more information!
We will continue to work on this integration task over the next few days.
All the best,
Yvan
Integration difficulties
Galaxy
- Tool Shed featureCounts tool output names are too long, which may cause issues. We change this name to a shorter one for MultiQC.
- The Tool Shed cutadapt tool version (1.6) is older than the accepted version for MultiQC (>1.8). We have added a function to test the cutadapt version so that the Galaxy MultiQC tool can use older cutadapt versions.
- We are using a repeat param for input files in the MultiQC Galaxy tool. This is not very usable when there are a lot of input files (for the FastQC and cutadapt tools, for example). As a data collection is a batch-mode input field, we plan to use the "multiple = true" option for the input parameter and then process the resulting list of files. FIXED using a multi-file selection per module (FastQC / Tophat / cutadapt).
- Deactivating HTML sanitization is mandatory
MultiQC
- Visualization bugs on Mozilla
- The order of tools is quite strange, no? Why not go top-down: pre-processing tools such as cutadapt and FastQC, then alignment tools such as Tophat / STAR, then featureCounts?
- MultiQC searches the cutadapt report file for the name of the command in order to assign the right dataset name. In Galaxy, this is something like ".../dataset_58.dat". For Galaxy, it would be better if MultiQC used the cutadapt report file name, for example...
- For featureCounts, if there are spaces in the sample names, MultiQC will fail to process the files.
from multiqc.
awesome news!!! 🍺
from multiqc.
Fantastic! Thank you very much @devengineson / @yvanlebras / @cmonjeau ! This looks awesome.
Some thoughts regarding your difficulties:
Galaxy
- featureCounts tool output names are too long
- Long sample names can be cleaned up using parameters saved in a MultiQC config file, if there are consistent pipeline-specific strings such as here. See these docs for instructions. Not sure if this is easier than changing the featureCounts tool itself though.
- old version of cutadapt
- I'm happy to try to modify the MultiQC module to handle output from v1.6 if you'd like.
- I presume that this is a typical log file?
- lots of input files
- A recent addition by @lpantano was to give the option of supplying a file listing input files to be used, rather than on the command line. This was for bcbio, where they're trying to support CWL (maybe the same for Galaxy?).
- Operation is now multiqc --file-list data/special_cases/file_list.txt. See #201 for more information.
- Note that this is only in v0.7dev, not in v0.6.
MultiQC
- Visualization bugs on Mozilla
- You're right! You mean the general statistics table at the top? Thanks for noticing this - I'll take a look (#213).
- order of tools
- I guess this is a personal thing. Usually the thing I'm most interested in is the final result (eg. final QC step or whatever), and I dig back earlier and earlier to try to explain stuff. Whilst not chronological, I find this more intuitive for how I use the report ("biologically"). I don't have any plans to change this.
- HTML sanitization deactivation mandatory
- I don't understand what you mean by this sorry.
- Take cutadapt sample name from filename
- This is a tough one. Whilst I appreciate that in this case it would be better, it's difficult to change this behaviour for a single instance of MultiQC. Because cutadapt output is in stderr logs, other users may have results from multiple files concatenated within a single file (I often do). This is why I try to take sample name from the input filename where possible.
- Going by this log file it looks like the sample name should be the same anyway? Here it would be dataset_39.dat as that's the input filename.
- Slightly worrying that you're using .dat - I'm not sure how many modules assume .fastq / .fq file formats. But we can fix that problem when we find it I guess.
- featureCounts breaks if space in filename
- Great spot, thanks! I've added an issue to fix this (#214).
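For the "lots of input files" case, the --file-list option mentioned above could be driven from a wrapper roughly as follows. This is only a sketch: it assumes the list file holds one path per line, the dataset paths are made up, and write_file_list is a hypothetical helper. The command is built but not executed.

```python
import os
import tempfile


def write_file_list(paths, list_path):
    """Write one input path per line and return the multiqc argument list.

    write_file_list is a hypothetical helper; the one-path-per-line layout
    is an assumption about the list file format.
    """
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    # Only the --file-list flag quoted in the thread is used here.
    return ["multiqc", "--file-list", list_path]


# Hypothetical Galaxy dataset paths, for illustration only.
inputs = ["/galaxy/files/dataset_39.dat", "/galaxy/files/dataset_58.dat"]
list_path = os.path.join(tempfile.mkdtemp(), "file_list.txt")
command = write_file_list(inputs, list_path)
print(" ".join(command))
```

The returned list can be passed to subprocess.run once a MultiQC version with this option (v0.7dev or later, according to the thread) is installed.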
from multiqc.
Thanks for your quick comments @ewels!
For the visualization bug on Mozilla, you're right! It is in the general statistics table at the top.
I understand the "biological" order ;)
Concerning the HTML sanitization, this is a Galaxy-related issue, not a MultiQC one. Sorry for filing it in the wrong place ;)
For cutadapt v1.6 (and maybe older versions), the first line differs from versions > 1.8 and reads: "This is cutadapt 1.6 with Python 2.7.3"
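A version check based on that header line could look like the sketch below. It only relies on the 1.6 header quoted above; the header format of newer cutadapt releases is not shown in this thread, so anything that does not match simply returns None. The name cutadapt_version is a hypothetical helper, not the function used in the Galaxy tool.

```python
import re

# Matches the old-style header quoted in the thread, e.g.
# "This is cutadapt 1.6 with Python 2.7.3".
HEADER = re.compile(r"^This is cutadapt (\d+(?:\.\d+)*)")


def cutadapt_version(first_line):
    """Return the version as a tuple of ints, or None if the line differs."""
    match = HEADER.match(first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


version = cutadapt_version("This is cutadapt 1.6 with Python 2.7.3")
print(version, version < (1, 8))
```

Comparing tuples of integers avoids the string-comparison trap where "1.10" would sort before "1.8".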
from multiqc.
Hi @devengineson,
I think most of these changes are done now - the firefox display bug should be fixed, featureCounts now tolerates having spaces in sample names and the Cutadapt module should work with the old style logs.
Let me know if there's anything else that I can do to help! Thanks again for your work on this.
Phil
from multiqc.
Hi @ewels ,
Thanks for being so quick!!! After another galactic integration day, here are several more comments.
Cheers,
Yvan
FastQC: ok
Tophat: ok
samtools stats: ok
cutadapt: ok
featureCounts: ok
Bismark: ok
Picard:
- markdups: OK, but MultiQC searches for 'picard.sam.MarkDuplicates' instead of 'picard.sam.markduplicates.MarkDuplicates'. Fixed with a sed in the command line.
- insertsize : OK
- gcbias : OK
Bowtie2:
- MultiQC doesn't seem to work properly with the Bowtie2 file... Maybe a specific filename is required but this is not in the docs?
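The sed workaround for the Picard MarkDuplicates class name amounts to a plain string substitution. A Python equivalent might look like this; patch_markdups_log is a hypothetical helper and the sample line is made up, not real Picard output.

```python
OLD = "picard.sam.markduplicates.MarkDuplicates"
NEW = "picard.sam.MarkDuplicates"


def patch_markdups_log(text):
    """Rewrite the newer Picard class name to the one MultiQC searched for."""
    return text.replace(OLD, NEW)


# Made-up line containing the class name, for illustration only.
line = "example header picard.sam.markduplicates.MarkDuplicates example"
print(patch_markdups_log(line))
```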
from multiqc.
Hah, the Picard MarkDups thing was reported by someone else earlier today and is already fixed in 3746e76 😉
Bowtie2 logs are horrible and really difficult to parse. I've been working on that module again this week actually. The module looks in the log for a bowtie command in the hope that a wrapper script around bowtie printed this, but if it can't find that it will take the log filename for the sample name. What happens for you exactly? Do you have an example log?
Phil
from multiqc.
Here is the Galaxy Bowtie 2 example log that we obtain by activating the "Save the bowtie2 mapping statistics to the history" parameter on the Galaxy Bowtie2 form: bowtie2 galaxy mapping stat file example
from multiqc.
Hi @bgruening, we encountered some issues with conda on planemo. With only the conda package written as a requirement in the MultiQC tool.xml file, like this:
<requirements>
    <requirement type="package" version="0.6">multiqc</requirement>
</requirements>
the planemo test --conda_dependency_resolution . command fails with the attached log:
planemo_log_conda_error.txt
Apparently there is a problem with the conda installation, and then the multiqc command is not found...
Using the conda install . command in the folder with the MultiQC tool.xml file, the conda installation works fine... I think we have forgotten something... Do you have any idea?
from multiqc.
I have tracked down the problem to: conda/conda#2035
Can you please do a
conda install conda=3.19.0
Hopefully this works for you.
from multiqc.
Hi @bgruening !
This works for us. Thank you very much. Have a nice day.
from multiqc.
Awesome!
from multiqc.
Hi @devengineson,
I think I was too slow and the link you posted gives me a 404 error now. Sorry! Could you please send one through again?
Phil
from multiqc.
@ewels, you're right, we have deployed a new cloud VM.. sorry ;)
You can find a new version here : Bowtie2 stat report
Cheers,
Yvan
from multiqc.
Thanks! What is this file called? As there's nothing else in the log, the bowtie2 module should take the filename as the sample name, and try to clean it up. I'll add something to the docs about this now.
Phil
from multiqc.
The dataset's name in the Galaxy history is " Bowtie2 on data 1, data 5, and data 4: mapping stats "
Not sure this helps, because you're looking for the name this stat file has on disk after the bowtie2 command line has run, aren't you? ;)
from multiqc.
Yeah exactly, MultiQC is looking at the filename on the disk - I guess it won't know the name in the galaxy history if that's kept separately. At least, not without writing a Galaxy-specific plugin for MultiQC.
from multiqc.
@ewels can you give us the expected file name ?
@bgruening we will not forget ;) Thanks!
from multiqc.
Absolutely - it can be whatever you like really. MultiQC will truncate from anything in the config.fn_clean_exts list.
So, if you call it sample_name.txt then the MultiQC report will show the name as sample_name.
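The truncation described above can be pictured with a small sketch. This is not MultiQC's actual code: FN_CLEAN_EXTS is a hypothetical stand-in for config.fn_clean_exts (not the real default list), and clean_sample_name is an illustrative helper.

```python
import os

# Hypothetical stand-in for config.fn_clean_exts, not MultiQC's default list.
FN_CLEAN_EXTS = [".txt", ".log", ".fastq", ".fq"]


def clean_sample_name(filename, clean_exts=FN_CLEAN_EXTS):
    """Drop the directory, then cut the name at each listed extension."""
    name = os.path.basename(filename)
    for ext in clean_exts:
        # Keep only what comes before the first occurrence of ext.
        name = name.split(ext, 1)[0]
    return name


print(clean_sample_name("sample_name.txt"))
```

With this list, a Galaxy-style name such as dataset_58.dat would pass through unchanged, since .dat is not one of the listed extensions.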
from multiqc.
Sorry @ewels for my unclear question (and for my very approximate frenglish) ;) I understand that MultiQC will truncate sample_name.ext, but it seems that for Bowtie2, MultiQC is trying to find a specific filename (like bowtie2.log for example)... Is that the case? If yes, we have to preprocess the bowtie2 log file to give it a suitable file name (using the @bgruening method for example ;) ) before giving it to MultiQC... OR we don't have to change the name, because MultiQC looks at the content of the bowtie2 log file and we just have to give the bowtie2 log file to MultiQC. We have tested this second approach but it doesn't seem to work...
from multiqc.
Ah I see, sorry. MultiQC uses a config file called search_patterns.yaml to define the search parameters (these can be overwritten by the user, see the docs).
Bowtie 2 has no standardised filename for the output as it's just stderr, so instead MultiQC finds logs by searching for any file containing the string "reads; of these:", which is pretty rubbish, but the best I could manage. Other modules do search by filename as you say, these use fn: in the config instead of contents: (for example, FastQC uses fn: '*_fastqc.zip').
So MultiQC should find the bowtie logs with any filename if they're there. Then sample names in the report will be chosen based on the filename of the file that is found.
Make sense?
Phil
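The difference between the fn: and contents: search patterns can be sketched like this. It mirrors only the two patterns quoted above; the PATTERNS dictionary and matching_modules are illustrative assumptions rather than MultiQC's real config or search code, and the log line written to the demo file is made up.

```python
import fnmatch
import os
import tempfile

# Mirrors the two pattern kinds described above; not MultiQC's real config.
PATTERNS = {
    "fastqc": {"fn": "*_fastqc.zip"},
    "bowtie2": {"contents": "reads; of these:"},
}


def matching_modules(path, patterns=PATTERNS):
    """Return the modules whose pattern matches the file at path."""
    found = []
    for module, pattern in patterns.items():
        if "fn" in pattern:
            # Filename patterns are compared against the basename only.
            if fnmatch.fnmatch(os.path.basename(path), pattern["fn"]):
                found.append(module)
        elif "contents" in pattern:
            # Content patterns ignore the filename and look inside the file.
            with open(path, errors="ignore") as handle:
                if pattern["contents"] in handle.read():
                    found.append(module)
    return found


# A Galaxy-style filename still matches bowtie2 because of its contents.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "dataset_58.dat")
with open(log_path, "w") as handle:
    handle.write("1000 reads; of these:\n")
print(matching_modules(log_path))
```

This is why a bowtie2 log can keep an uninformative name like dataset_58.dat and still be found, whereas a FastQC archive has to keep its _fastqc.zip suffix.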
from multiqc.
ps. Two things to consider - MultiQC will overwrite samples if it gives them the same name, so if all bowtie 2 logs are called bowtie2.log then your report will contain only the last file that was found. Also, if something goes wrong with the way that MultiQC parses the logs then there may be no results. It's usually possible to get a better idea about both of these by running in verbose mode (-v) or looking at the contents of multiqc_data/.multiqc.log
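A wrapper could check for the overwrite problem before calling MultiQC. The sketch below is an assumption-laden illustration: find_name_collisions is a hypothetical helper, the extension list is made up, and the name cleaning is a simplification of what MultiQC does.

```python
from collections import defaultdict
import os


def find_name_collisions(paths, clean_exts=(".log", ".txt")):
    """Group paths by cleaned sample name and keep names used more than once."""
    by_name = defaultdict(list)
    for path in paths:
        name = os.path.basename(path)
        for ext in clean_exts:
            name = name.split(ext, 1)[0]
        by_name[name].append(path)
    return {name: hits for name, hits in by_name.items() if len(hits) > 1}


# Hypothetical log paths: two of them would end up as the same sample.
paths = ["run1/bowtie2.log", "run2/bowtie2.log", "run3/sample_c.log"]
print(find_name_collisions(paths))
```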
from multiqc.
Ok @ewels, that makes sense ;) So, in our first test, we gave the previously mentioned bowtie 2 stat file to MultiQC but it didn't do anything with it... maybe MultiQC doesn't like our file ;)
from multiqc.
Hmm, you're right. Debugging now - MultiQC finds your log but is looking for a handful of lines which aren't there. My fault - I thought that it was always printed to the log but it seems the bowtie2 log is even more sparse than I thought.
from multiqc.
Ok, change pushed - hopefully this version should recognise your bowtie2 logs now..
from multiqc.
Thank you! We are using the MultiQC 0.6 conda recipe. Can you tell me whether an update would fix this problem?
from multiqc.
Hi @devengineson,
No sorry, this update is currently in v0.7dev which is only on GitHub. It will end up in v0.7 on conda when I release it, but that may be a few weeks yet.
Phil
from multiqc.
Ok! It was just to be sure ;)
from multiqc.
Hi all,
I'm building up to a v0.7 release soon. How are you getting on? Is there anything I need to add to MultiQC? Can I claim on the readme that it works with Galaxy yet? 😉
Phil
from multiqc.
Hi Phil,
Yes, we have finished the tests and it seems ok for the 0.6 version. We have created a dedicated Galaxy Tool Shed repository here: https://toolshed.g2.bx.psu.edu/view/engineson/multiqc/ff22ea7aa6bb
Cheers,
Yvan
from multiqc.
Ok great! I'll add this to the changelog and readme files then if that's ok. Do you think you could write a sentence that I can copy describing how people can use it? Also maybe a slightly longer version that I can add to the docs or something.
Phil
from multiqc.
Also, I just noticed that there is a section in multiqc.xml that describes citations. MultiQC has just been published in Bioinformatics, so that would be a better citation (Epigenomics of Common Disease was a conference poster). See http://dx.doi.org/10.1093/bioinformatics/btw354
from multiqc.
For sure! Thanks for the info
from multiqc.
Ok, I've mentioned the wrapper in the readme now. Let me know if you'd like me to change the text. If there's nothing else for me to do with MultiQC then I'll close this issue. Feel free to reopen it again if you feel the need.
Thanks again!
Phil
from multiqc.