multiqc / multiqc

Aggregate results from bioinformatics analyses across many samples into a single report.

Home Page: http://multiqc.info

License: GNU General Public License v3.0

Python 23.82% HTML 0.47% CSS 0.20% JavaScript 75.46% Shell 0.01% Dockerfile 0.01% Nix 0.01%

bioinformatics analysis pypi bioconda multiqc python data-visualization quality-control reporting seqera

multiqc's Introduction

Aggregate bioinformatics results across many samples into a single report

Find documentation and example reports at http://multiqc.info

MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples.

Reports are generated by scanning given directories for recognised log files. These are parsed and a single HTML report is generated summarising the statistics for all logs found. MultiQC reports can describe multiple analysis steps, multiple analysis tools and large numbers of samples within a single plot, making it ideal for routine, fast quality control.

MultiQC supports a very large number of bioinformatics tools. Please see the MultiQC website for a complete list. MultiQC can also easily parse data from custom scripts, provided the output is correctly formatted or configured. This feature is called Custom Content.

More modules are being written all the time. Please suggest any ideas as a new issue (please include example log files).

Installation

You can install MultiQC from PyPI using pip as follows:

pip install multiqc

Alternatively, you can install using Conda from Bioconda (set up your channels first):

conda install multiqc

If you would like the development version from GitHub instead, you can install it with pip:

pip install --upgrade --force-reinstall git+https://github.com/MultiQC/MultiQC.git

MultiQC is also available via Docker and Singularity images, Galaxy wrappers, and many more software distribution systems. See the documentation for details.

Usage

Once installed, you can use MultiQC by navigating to your analysis directory (or a parent directory) and running the tool:

multiqc .

That's it! MultiQC will scan the specified directory (. is the current dir) and produce a report detailing whatever it finds.

The report is created in multiqc_report.html by default. Tab-delimited data files are also created in multiqc_data/, containing extra information. These can be easily inspected using Excel (use --data-format to get yaml or json instead).
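
If you would rather inspect the tab-delimited files from a script than from Excel, something like the following should work. This is only a minimal sketch using Python's standard csv module; the helper name load_tsv and the example path in the comment are illustrative assumptions, so check your own multiqc_data/ directory for the actual file names.

```python
import csv


def load_tsv(path):
    """Read a tab-delimited file into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Example use (hypothetical file name, adjust to a file in your multiqc_data/):
#   rows = load_tsv("multiqc_data/some_table.txt")
#   for row in rows:
#       print(row)
```

All values come back as strings, so convert them to numbers yourself where needed.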

For more detailed instructions, run multiqc -h or see the documentation.

Citation

Please consider citing MultiQC if you use it in your analysis.

MultiQC: Summarize analysis results for multiple tools and samples in a single report.
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
Bioinformatics (2016)
doi: 10.1093/bioinformatics/btw354
PMID: 27312411

@article{doi:10.1093/bioinformatics/btw354,
 author = {Ewels, Philip and Magnusson, Måns and Lundin, Sverker and Käller, Max},
 title = {MultiQC: summarize analysis results for multiple tools and samples in a single report},
 journal = {Bioinformatics},
 volume = {32},
 number = {19},
 pages = {3047},
 year = {2016},
 doi = {10.1093/bioinformatics/btw354},
 URL = {http://dx.doi.org/10.1093/bioinformatics/btw354},
 eprint = {/oup/backfile/Content_public/Journal/bioinformatics/32/19/10.1093_bioinformatics_btw354/3/btw354.pdf}
}

Contributions & Support

Contributions and suggestions for new features are welcome, as are bug reports! Please create a new issue for any of these, including example reports where possible. Pull-requests for fixes and additions are very welcome. Please see the contributing notes for more information about how the process works.

MultiQC has extensive documentation describing how to write new modules, plugins and templates.

If in doubt, feel free to get in touch with the author directly: @ewels

Contributors

MultiQC is developed and maintained by Phil Ewels (@ewels) at Seqera Labs. It was originally written at the National Genomics Infrastructure, part of SciLifeLab in Sweden.

A huge thank you to all code contributors - there are a lot of you! See the Contributors Graph for details.

MultiQC is released under the GPL v3 or later licence.

multiqc's People

Contributors

guillermo-carrasco moonso robinandeer gitter-badger b97jre rghan jsobral khalidm fw1121 lpantano simexin xflicsu chapmanb slowkow robertoalvarezm roryk vd4mmind cometsong bihealth monashbioinformaticsplatform hlwiencko mlusignan simon-coetzee conerade67 woook ndswxy cwt1 hammarn boratonaj koelling pacifly felloumi vladsavelyev noelnamai vezzi t-neumann fmkerckhof edinburghgenomics iromeo asabjorklund pasted asadrahman-horizon epruesse psaxcode wwood thorellk bioproximity twbattaglia bubioinformaticshub galithil asalhab muslih14 alexanderscholz tarah28 moka-guys ahvigil viklund upscb runsheng cliu72 tanaes remiolsen nalcala palc flopezo jyh1 ishwarvh asetgem ddesvillechabrol tabwalsh iimog raonyguimaraes dennisschwartz peterjc lweasel lecorguille adam-rabinowitz rlegendre fishinwind ehsueh winni2k juliangehring igorhut dpryan79 dfornika rdali maplesond matthdsm chrisbarnettster rifius blthree boulund parlundin joachimwolff raramayo yvanlebras yixf-self abhisheksp smiduthuri almiheenko

multiqc's Issues

Make HTML file stand-alone

Now that the original images are going, there aren't actually that many assets required for the reports - just lots of CSS and Javascript.

It would be nice if we could embed all of this into the HTML file, so that it's easier to share. Preferably using a Python package to do this at run time to avoid maintenance issues.

Input filename line not present in my cutadapt log

The 'Input filename' string that seems to be used as an indicator of cutadapt logfile (i.e. https://github.com/ewels/MultiQC/blob/master/multiqc/cutadapt/__init__.py#L66) is not present in my cutadapt log... I used cutadapt 1.8. It gives the line "Command line parameters: ..." instead.

Is it because I run it in paired end mode, i.e. use two input files?

Cheers,
Johannes
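
One way to make detection tolerant of both log styles would be to accept either marker string. The sketch below is not the module's actual code: the function name is hypothetical, and the two markers are simply the strings quoted in this issue, with no claim that they cover every cutadapt version.

```python
# Marker strings quoted in the issue text. Whether these cover all
# cutadapt versions is an assumption that has not been verified here.
CUTADAPT_MARKERS = ("Input filename", "Command line parameters:")


def looks_like_cutadapt_log(text, markers=CUTADAPT_MARKERS):
    """Return True if the log text contains any of the known marker strings."""
    return any(marker in text for marker in markers)
```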

Make General Stats table sortable

Would be cool if you could click table headers and get the table to sort by that column.

Make modules export all parsed data to big JSON files

Instead of printing parsed data to the HTML as currently done, print the parsed data to a single JSON file per module, and then load this with ajax. Keeps the HTML cleaner and makes it easier for others to use the data.

Maybe also copy out the raw data file where applicable? eg. FastQC.
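
A rough sketch of the per-module export, using only the standard library. The function name, the output layout (one file named after the module) and the shape of parsed_data are assumptions for illustration, not an existing MultiQC API.

```python
import json
import os


def write_module_json(module_name, parsed_data, out_dir):
    """Write one module's parsed data to <out_dir>/<module_name>.json.

    Returns the path of the file that was written.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{module_name}.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(parsed_data, handle, indent=2, sort_keys=True)
    return path
```

The report page could then fetch each file on demand instead of carrying the data inline.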

Don't automatically generate plots if loads of samples

If reports are generated with mega-numbers of samples, the page can take forever to render and even hang.

It would be nice if the javascript plots could check the number of samples. If it's huge, display a button instead. Clicking the button generates the plot.

This should massively improve page load time but still retain all functionality.

Toolbox - Hide samples regex not saved

It seems that the hide samples regex on / off state isn't saved properly.

Ability to swap names

Sample names aren't always intuitive - it would be nice to have a tool to swap them out for something more useful. Needs to be easy to use with copy and paste.

Refactor plotting JavaScript

Merged three issues into this:

Currently, if samples are hidden in the saved config, the page loads then is immediately unresponsive whilst those samples are removed.
Highlight line graphs samples - save doesn't work (#54)
Speed up 'hide samples' (#53)

Most of the toolbox features currently work on rendered plots - looking through existing features and changing them. Instead, it might be better to replot everything when something changes.

Restructure plotting functions to work off a canonical data source that doesn't change
- Make a copy of this data at plot time
- Get highlight / rename / hide settings from page
- Recolour / rename / remove data from the copied data structure
- Create plot
Refactor highlight / rename / hiding to call plotting functions when changed, instead of modifying directly
Plotting functions should not fire on page load, instead be triggered by config load.
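
To make the idea concrete, here is a sketch of the copy-then-transform step. It is written in Python purely for illustration (the report code itself is JavaScript), and the function name, argument names and the series dict layout are all assumptions.

```python
import copy


def prepare_plot_data(canonical, hidden=(), renames=None, highlights=None):
    """Build a plot-ready copy of the data without touching the canonical source.

    canonical:  {sample_name: [values, ...]}
    hidden:     sample names to leave out of the plot
    renames:    {old_name: new_name}
    highlights: {sample_name: colour}, keyed by the original sample name
    """
    renames = renames or {}
    highlights = highlights or {}
    data = copy.deepcopy(canonical)  # plot-time copy; the original never changes
    series = []
    for name, values in data.items():
        if name in hidden:
            continue
        series.append({
            "name": renames.get(name, name),
            "data": values,
            "color": highlights.get(name),  # None means "use the default colour"
        })
    return series
```

Every toolbox change would then call this again and replot, rather than editing the rendered plot in place.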

This is sort of semi-implemented already in some places, for instance all of the code involving sample renaming. This is a fairly large refactoring, but I think that it will make the reports quite a bit more stable.

It could also lead to a potential fix for the mysterious slowness in #53, as hiding samples often seems slower than opening the page for the first time.

Finally - it should also fix #54 as the sample colouration will be set properly in the highcharts settings, so properly exported in other file formats (rather than a post-plotting hack).

Figure out how to extend with other pip modules

Get #10 to work, then write such an extension for the NGI (partly as an example for others).

Could include:

A new theme (softlinking files, new javascript file to power API to rename samples).
New super-specific module that checks the database for some kind of value?
Could also add a new click argument? To e-mail to a user or something..?

Add Qualimap support

Qualimap is widely used for data QC and would be a good addition to MultiQC. It has a number of data outputs that could be used, but I think it would be good to start with those plotted in ngi_visualizations...

General stats table:

Median coverage
Median insert size
% of genome >= 30x coverage
mapped reads avg. GC

Main sections:

Coverage Histogram
Insert Size Histogram
Genome Fraction Coverage
GC distribution

FastQC - no original images

Medium - large reports now have significant file sizes, due to saving all of the original FastQC plot images. Whilst not consuming vast resources yet, it's getting to the size where they can be annoying to send by e-mail.

Removing the original images means that the reports should be back to being only a few hundred KB. Not sure that their use justifies the problems that they cause? Feel free to comment below if you disagree. Will leave this open for discussion for a bit.

Design change: Move tools

The tools are currently a bit hidden off the bottom of the navigation, and as more modules are added / the tools get longer, they're going to get more and more obscured.

Maybe move them into a side-pull space on the right of the page?

Add option for prefixing directory name

To avoid overwriting sample names, give the option to prefix the directory name to every sample.
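
A minimal sketch of what the prefixing could look like. The function name and the " | " separator are arbitrary choices for illustration, not a description of how any option is implemented.

```python
import os


def prefixed_sample_name(file_path, sample_name, sep=" | "):
    """Prefix the sample name with the name of the directory holding the file."""
    dirname = os.path.basename(os.path.dirname(os.path.abspath(file_path)))
    return f"{dirname}{sep}{sample_name}" if dirname else sample_name
```

Two files called sample_fastqc.zip in run1/ and run2/ would then give distinct names instead of one overwriting the other.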

Add to Galaxy

See the tutorial. Maybe wait a bit for it to settle down and become more stable first..

Make general stats table sortable by custom highlight

It would be nice to be able to sort samples in the general stats table by highlight. Would help #28.

Add a button above the table to do this once highlights have been applied?

Filter visible samples

It would be cool to have a sidebar text box which you can type into with the option to either show only or hide any samples which match that text..

feature request - A barplot showing the proportion of bp/reads passing each step of a Methyl NGS pipeline from:

A barplot showing the proportion of bp/reads passing each step of a Methyl NGS pipeline from:

Initial fastq.gz PF reads/bases
Qscore and adaptor filtering
Mapping
Deduplication
Clip Overlap
5' or 3' trimming before methylation calling
Bases in CpG positions covered below 1x, above 1x, above 5x, above 30x

Fastqc reporting doesn't work for me (possibly due to my malpractice)

Hi again,

I seem to be naughty and rename the output .zip archives from fastqc (it's done by our workflow so that output files are named P1994_101_R1_fastqc.html instead of P1994_101_R1.fq_fastqc.html when running on .fq.gz files). This confuses multiqc, since the directory name within the zip archive is still the old one...

When multiqc tries to look for the fastqc_data.txt file within the .zip it results in a KeyError since the directory name within the zip is wrong. I guess one could argue this is malpractice or a problem with fastqc, but it would be nice if multiqc could be tolerant of renaming the zip archives. I think I can sketch a pull request for this if you can assist with comments and you think it's a good idea.

Thanks,
Johannes
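
One tolerant approach would be to search the archive for fastqc_data.txt by file name rather than building the inner path from the zip's own name. The sketch below uses only the standard zipfile module; the function name is hypothetical and this is not the fix that was actually merged.

```python
import zipfile


def read_fastqc_data(zip_path, target="fastqc_data.txt"):
    """Return the text of fastqc_data.txt from a FastQC zip archive.

    The inner directory name is ignored, so a renamed archive still works.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Zip member names always use "/" as the separator.
            if member.rsplit("/", 1)[-1] == target:
                return archive.read(member).decode("utf-8", errors="replace")
    raise FileNotFoundError(f"{target} not found in {zip_path}")
```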

General Statistics - Scrolling Header

Hi Phil,
just started using your tool, great stuff. I was wondering if it is possible to fix the header in the General Statistics section so that it follows you while scrolling down the page, since it is easy to get lost in the middle of the sample list and forget what is what.
Thanks a lot and all the best!
Cheers

Edo

Share chroma colour scales in table

Have the option to share a single colour scale. For instance, all columns with read counts (data-scalename='readcount'). Needs to work across modules and be compatible with one or many columns. Probably take the first colour scheme specified.

Advantage - makes it clearer where reads are being lost through a selection process (for example).
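
The shared limits could be worked out roughly as below. This is a sketch only: it computes one minimum and maximum per scale name across every column that declares it, and the function name and input shape are assumptions.

```python
def shared_scale_limits(columns):
    """Compute one (min, max) pair per scale name across all columns sharing it.

    columns: iterable of (scale_name, values) pairs.
    A scale_name of None means the column keeps its own scale.
    """
    limits = {}
    for scale_name, values in columns:
        if scale_name is None or not values:
            continue
        lo, hi = min(values), max(values)
        if scale_name in limits:
            prev_lo, prev_hi = limits[scale_name]
            lo, hi = min(lo, prev_lo), max(hi, prev_hi)
        limits[scale_name] = (lo, hi)
    return limits
```

Each column tagged with the same scale name would then be coloured against the same range, so a drop in read count between columns shows up as a change in colour.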

FastQC: colour plots by status

We have Pass / Warn / Fail statuses from FastQC for every sample in every plot. Currently, it's tricky working out which lines have which status though. Could be nice to have a button that colour codes the data series by their status.

Add option to copy original plots

The option to show the original plots is very useful, but it breaks if you move the MultiQC_report resulting directory to a location where you don't have access to the original plots. i.e I ran MultiQC in our cluster, downloaded the resulting zip file to my laptop for better visualization, and the FastQC plots (the original ones) are not there.

What about something like --copy-original?

Batch colour sample labels

It would be ace to have a sidebar widget where you could specify the colour of the label for all samples matching a text input.

This would enable easy bulk labelling of certain groups - eg. all trimmed (*_trimmed) samples.

FastQ Screen: Different plot if lots of samples

I like the FastQ Screen plot, as it mimics that made by the original tool. However, it gets a bit useless if there are lots of samples or species. So, in these cases, use a different plot..

Probably sum the various types of alignment score, then have a single bar per dataset, coloured by the species it aligns to.

I would say that the limit of usefulness is around 20 samples for 8 species. So, maybe count num_samples * num_species and if that is > 160 use the alternative plot style.
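
As a tiny sketch of that rule (the function name is made up, and 160 is just the 20 x 8 figure suggested above):

```python
def use_alternative_plot(num_samples, num_species, limit=160):
    """True when samples x species exceeds the suggested limit of 20 x 8 = 160."""
    return num_samples * num_species > limit
```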

Entry points for 3rd party modules

At DNA club, someone asked what the best way to share 3rd party modules would be.

Maybe people are getting sick of me recommending setuptools entry points, but I think they could prove a good fit in this case.

Basically it would allow other people to build modules as Python packages that can be installed using pip. Whatever packages are active in the current env are the ones used by MultiQC 😃
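
For anyone who has not used entry points, discovery on the MultiQC side could look something like this sketch. It uses importlib.metadata from the standard library (Python 3.8+); the group name in the usage comment is a placeholder, not an agreed convention.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return {name: entry_point} for every installed package advertising the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict of group -> entry points.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


# Example use (hypothetical group name):
#   for name, ep in discover_plugins("multiqc.modules").items():
#       module_class = ep.load()
```

A third party package would advertise its module under the same group in its own packaging metadata, and pip install is then all a user needs to do.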

Remove auto-save feature

It was a nice idea, but it annoys me quite frequently and I've never found it useful to date.

Instead, have two buttons to ask for a save - one for this report only, one for all reports.

Add Picard support

The Picard modules I would be interested in are

Any other (picard) modules that are interesting to have?

Hide plots if all samples hidden

Currently, when hiding samples it's possible to get a graph with no data. Instead of this, slot in an explanation saying that all samples have been hidden.

Add 'Key' modal for general stats table colour schemes

Would be nice to have a button which launches a modal showing the colour schemes used in the general stats table.

Suggestion: link samples together?

Hi, just a humble suggestion: can you start a new project named MetaMultiQC (suggested by @guillermo-carrasco) (or possibly MultiMultiQC)?

Just kidding. Do you think it's a good idea to support linking samples together?

I'll try to explain what I mean:
My use case is that I run fastqc for each sample at each step I take in the processing of the reads. Below you can see the results for three samples in two steps = 18 fastqc reports.

It would be nice to see a graph of how the stats change for a sample between the different steps, for example. In order to do that I need a way to link together the reports that belong to the same sample.

Thanks,
Johannes

Highlighting changes

Highlight all samples in light grey first if any highlights are created. Disallow blank highlight search strings.

Create system for plugin modules to tie into specified hooks

Could create another setuptools entrypoints class (multiqc.hooks.v1?) that could create a list of functions to run at different points in the code.

feature request - preseq lc_extrap curves

For example, for a paired-end bam file, preseq lc_extrap can be run like this:

Input file(s):  SAMPLE.bam
Output file(s):   SAMPLE.preseq.lc_extrap.tsv
Tool(s):        preseq lc_extrap

$ preseq lc_extrap \
    -B -P \
    -o SAMPLE.preseq.lc_extrap.tsv \
    SAMPLE.bam

The output of preseq lc_extrap can be plotted in a curve indicating the ratio of “number of molecules (M)” and “number of distinct molecules (M)”. See below a description of three examples from the preseq website (http://smithlabresearch.org/software/preseq/challenge/)

Instructions on how to install this software and its dependencies can be found at http://smithlabresearch.org/software/preseq

Highlight toolbox drawers that are active

It would be good if the toolbox buttons that have active options are highlighted in some way - a coloured line at the bottom of their button perhaps?

Would be even better still if this only appears when their filters are actually doing something. eg. if there's a renaming filter that has no effect, don't highlight. This is probably a bit tricky and unnecessary though.

Speed up 'hide samples'

With lots of samples, this can make the browser hang. Try to figure out what's taking it so long.

Extend FastQC modules

Make MultiQC support the plots that it currently ignores..

Per base sequence quality
~~Per tile sequence quality~~
Per sequence quality scores
Per base sequence content
Per sequence GC content
Per base N content
Sequence Length Distribution
Sequence Duplication Levels (needs thought)
~~Overrepresented sequences~~
Adapter Content
Kmer Content (needs thought)

Z-index issue with FastQC sequence quality plot

The toolbox doesn't float above the charts, some z-index thing:

Browser: Chrome Canary
Modules: General and FastQC

Add regex mode for renaming

There is currently no regex mode option for renaming samples. Seems a bit unfair.

Overwrite click defaults from config

Something like this:

# update the context with new defaults from the config file
context.default_map = context.obj

Add General Stats HTML markup generator to BaseModule

Currently, each module has to write its own HTML for the general statistics table, which is messy.

Instead, write a function in the BaseModule to handle this.

Modules ported:

Create child templates

At the moment, every template is stand alone. This means that if you just want to change the logo, or a bit of text in the report, then you have to copy everything from general and then make your change. Inevitably, this leads to drift away from the core package and theme.

Instead, allow templates to behave as 'child' themes. Could set a variable parent in the theme __init__.py with the name of another template and build off that. Then the parent template would be used, with any files in the child copied over the top of this. This could happen at run time, also allowing plugin packages to keep their code base clean.

To make this work, I need to split up the template into many more files instead of a single massive file. This is thankfully very easy with Jinja2 - see the docs.

Also, whilst the new stand-alone reports are very nice, we probably need to allow themes to create non stand-alone reports if they want to. Again, an option in the report's __init__.py file here I think.

Task list:

Split general Jinja template into multiple files, centred on a base.html with blocks
New optional variable in template __init__.py to specify parent theme
New method of report generation, involving copying template files to a temporary location
- See tempfile.mkdtemp()
New template __init__.py config option to prepend a directory name to generated report
New template __init__.py ability to copy files along with report generation.
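
The copy-parent-then-overlay-child step could be sketched like this, using tempfile.mkdtemp() as mentioned in the task list. The function name is hypothetical, and dirs_exist_ok needs Python 3.8 or newer.

```python
import shutil
import tempfile


def build_template_dir(parent_dir, child_dir):
    """Copy the parent template to a temp dir, then overlay the child's files.

    Files present in both are taken from the child. The caller is
    responsible for removing the returned directory when finished.
    """
    tmp_dir = tempfile.mkdtemp()
    shutil.copytree(parent_dir, tmp_dir, dirs_exist_ok=True)
    shutil.copytree(child_dir, tmp_dir, dirs_exist_ok=True)  # child files win
    return tmp_dir
```

A child theme that only wants a different logo then needs to ship just that one file.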

General stats table plotting

Have a collapsible plot (hidden by default) below the general stats table. Use a select drop-down box to choose a column from the table, then plot that data. Respect ordering of rows. Use data-max and data-min to set plotting limits (unless data exceeds them). Think about plot types - presumably use a bar plot.
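
The limit rule could be sketched as follows: start from the column's data-min and data-max, and widen only if the data falls outside them. The function name is made up, and a missing limit is assumed to fall back to the data range.

```python
def plot_limits(values, data_min=None, data_max=None):
    """Use the column's declared limits, widened if the data falls outside them."""
    lo = min(values) if data_min is None else min(data_min, min(values))
    hi = max(values) if data_max is None else max(data_max, max(values))
    return lo, hi
```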

Toolbox - Clear all option

It would be nice to have a "clear all" option for filters, especially when loading a page that has been played with previously. Maybe on the save page?

Module request: bwa-meth

parsing output/stats from bwameth:
https://github.com/brentp/bwa-meth

Add Trimmomatic support

Trimmomatic, yes. Also one of those tools that writes to stdout or a user-given log file.

Add support for SeqControl

Great initiative Phil!

I'm not going to work on this for now, but it would be interesting to integrate this byzantine mashup of R & Perl scripts in your mix:

http://labs.oicr.on.ca/Boutros-lab/software/SeqControl/

It generates fairly pretty QC plots and stuff ;)
