multiqc / multiqc

Aggregate results from bioinformatics analyses across many samples into a single report.

Home Page: http://multiqc.info

License: GNU General Public License v3.0

Languages: Python 23.82%, HTML 0.47%, CSS 0.20%, JavaScript 75.46%, Shell 0.01%, Dockerfile 0.01%, Nix 0.01%

Topics: bioinformatics, analysis, pypi, bioconda, multiqc, python, data-visualization, quality-control, reporting, seqera

multiqc's Introduction

Aggregate bioinformatics results across many samples into a single report

MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples.

Reports are generated by scanning given directories for recognised log files. These are parsed and a single HTML report is generated summarising the statistics for all logs found. MultiQC reports can describe multiple analysis steps and large numbers of samples within a single plot, and multiple analysis tools within a single report, making them ideal for routine, fast quality control.

A very large number of bioinformatics tools are supported by MultiQC. Please see the MultiQC website for a complete list. MultiQC can also easily parse data from custom scripts if it is correctly formatted or configured, a feature called Custom Content.

More modules are being written all the time. Please suggest any ideas as a new issue (please include example log files).

Installation

You can install MultiQC from PyPI using pip as follows:

pip install multiqc

Alternatively, you can install using Conda from Bioconda (set up your channels first):

conda install multiqc

If you would like the development version from GitHub instead, you can install it with pip:

pip install --upgrade --force-reinstall git+https://github.com/MultiQC/MultiQC.git

MultiQC is also available via Docker and Singularity images, Galaxy wrappers, and many more software distribution systems. See the documentation for details.

Usage

Once installed, you can use MultiQC by navigating to your analysis directory (or a parent directory) and running the tool:

multiqc .

That's it! MultiQC will scan the specified directory (. is the current dir) and produce a report detailing whatever it finds.

cd test_data/data/modules/fastqc/v0.10.1 && multiqc .

The report is created in multiqc_report.html by default. Tab-delimited data files are also created in multiqc_data/, containing extra information. These can be easily inspected using Excel (use --data-format to get yaml or json instead).
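
As an alternative to Excel, the tab-delimited files can be read from a script. The following is a minimal Python sketch, assuming a file with a header row whose first column identifies the sample. The helper name read_tsv_table and the column names and values in the example are made up for illustration and are not real MultiQC output.

```python
import csv
import io


def read_tsv_table(handle):
    """Read a tab-delimited table into {first_column_value: {column: value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    key = reader.fieldnames[0]  # assumption: the first column identifies the sample
    return {row[key]: {k: v for k, v in row.items() if k != key} for row in reader}


# Illustrative data only; real column names depend on which modules ran.
example = io.StringIO(
    "Sample\tpercent_gc\ttotal_sequences\n"
    "sample_1\t48.0\t1000\n"
    "sample_2\t51.0\t2500\n"
)
table = read_tsv_table(example)
print(table["sample_2"]["total_sequences"])
```

For a real report, pass a file opened from multiqc_data/ (with newline="") instead of the in-memory example. All values are returned as strings, so convert them before doing arithmetic.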

For more detailed instructions, run multiqc -h or see the documentation.

Citation

Please consider citing MultiQC if you use it in your analysis.

MultiQC: Summarize analysis results for multiple tools and samples in a single report.
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
Bioinformatics (2016)
doi: 10.1093/bioinformatics/btw354
PMID: 27312411

@article{doi:10.1093/bioinformatics/btw354,
 author = {Ewels, Philip and Magnusson, Måns and Lundin, Sverker and Käller, Max},
 title = {MultiQC: summarize analysis results for multiple tools and samples in a single report},
 journal = {Bioinformatics},
 volume = {32},
 number = {19},
 pages = {3047},
 year = {2016},
 doi = {10.1093/bioinformatics/btw354},
 URL = {http://dx.doi.org/10.1093/bioinformatics/btw354},
 eprint = {/oup/backfile/Content_public/Journal/bioinformatics/32/19/10.1093_bioinformatics_btw354/3/btw354.pdf}
}

Contributions & Support

Contributions and suggestions for new features are welcome, as are bug reports! Please create a new issue for any of these, including example reports where possible. Pull-requests for fixes and additions are very welcome. Please see the contributing notes for more information about how the process works.

MultiQC has extensive documentation describing how to write new modules, plugins and templates.

If in doubt, feel free to get in touch with the author directly: @ewels

Contributors

MultiQC is developed and maintained by Phil Ewels (@ewels) at Seqera Labs. It was originally written at the National Genomics Infrastructure, part of SciLifeLab in Sweden.

A huge thank you to all code contributors - there are a lot of you! See the Contributors Graph for details.

MultiQC is released under the GPL v3 or later licence.

multiqc's People

Contributors

andrei-seleznev, apeltzer, bnbowman, boulund, chrispyatt, chuan-wang, cpavanrun, erikdanielsson, erinyoung, ewels, fgvieira, gartician, hammarn, iimog, jchorl, jfy133, joachimwolff, juho0, just-roma, matthdsm, moonso, redmar-van-den-berg, remiolsen, sachalau, sstadick, subwaystation, t-neumann, tbooth, vladsavelyev, zwdzwd

multiqc's Issues

Make HTML file stand-alone

Now that the original images are going, there aren't actually that many assets required for the reports - just lots of CSS and Javascript.

It would be nice if we could embed all of this into the HTML file, so that it's easier to share. Preferably using a Python package to do this at run time to avoid maintenance issues.

Make modules export all parsed data to big JSON files

Instead of printing parsed data to the HTML as currently done, print the parsed data to a single JSON file per module, and then load this with ajax. Keeps the HTML cleaner and makes it easier for others to use the data.

Maybe also copy out the raw data file where applicable? eg. FastQC.
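
The per-module export could look roughly like the sketch below. It assumes each module holds its parsed data as a plain dictionary keyed by sample name. The function name write_module_data and the file naming scheme are hypothetical, not part of MultiQC.

```python
import json
import os
import tempfile


def write_module_data(output_dir, module_name, parsed_data):
    """Write one module's parsed data to <output_dir>/<module_name>.json."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{module_name}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(parsed_data, fh, indent=2, sort_keys=True)
    return path


# Illustrative values only.
with tempfile.TemporaryDirectory() as tmp:
    path = write_module_data(tmp, "fastqc", {"sample_1": {"percent_gc": 48.0}})
    with open(path, encoding="utf-8") as fh:
        print(json.load(fh))
```

One file per module keeps each file small and lets the report, or any other tool, load only the data it needs.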

Don't automatically generate plots if loads of samples

If reports are generated with mega-numbers of samples, the page can take forever to render and even hang.

It would be nice if the javascript plots could check the number of samples. If it's huge, display a button instead. Clicking the button generates the plot.

This should massively improve page load time but still retain all functionality.

Ability to swap names

Sample names aren't always intuitive - it would be nice to have a tool to swap them out for something more useful. Needs to be easy to use with copy and paste.
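
One copy-and-paste friendly approach is to accept two tab-separated columns (old name, new name), as you would get when copying from a spreadsheet. The sketch below shows that idea in Python purely to illustrate the logic. The function names and the input format are assumptions, not an existing MultiQC feature.

```python
def parse_rename_pairs(pasted_text):
    """Parse pasted 'old<TAB>new' lines into a dict, skipping blank or malformed lines."""
    mapping = {}
    for line in pasted_text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) == 2 and parts[0] and parts[1]:
            mapping[parts[0].strip()] = parts[1].strip()
    return mapping


def rename_samples(sample_names, mapping):
    """Replace names found in the mapping and leave all other names untouched."""
    return [mapping.get(name, name) for name in sample_names]


pasted = "P1994_101\tControl 1\nP1994_102\tTreated 1\n\nnot a valid line\n"
mapping = parse_rename_pairs(pasted)
print(rename_samples(["P1994_101", "P1994_103"], mapping))
```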

Refactor plotting JavaScript

Merged three issues into this:

  1. Currently, if samples are hidden in the saved config, the page loads then is immediately unresponsive whilst those samples are removed.
  2. Highlight line graphs samples - save doesn't work (#54)
  3. Speed up 'hide samples' (#53)

Most of the toolbox features currently work on rendered plots - looking through existing features and changing them. Instead, it might be better to replot everything when something changes.

  • Restructure plotting functions to work off a canonical data source that doesn't change
    • Make a copy of this data at plot time
    • Get highlight / rename / hide settings from page
    • Recolour / rename / remove data from the copied data structure
    • Create plot
  • Refactor highlight / rename / hiding to call plotting functions when changed, instead of modifying directly
  • Plotting functions should not fire on page load, instead be triggered by config load.
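
The data flow in the list above can be sketched as follows. The report code itself is JavaScript, so this Python version only illustrates the idea of replotting from an unchanging source. The function name, argument names and series layout are hypothetical.

```python
import copy


def prepare_plot_data(canonical, hidden=(), renames=None, highlights=None):
    """Return a modified copy of the canonical data; the source is never changed.

    canonical:  {sample_name: [values]}
    hidden:     sample names to drop
    renames:    {old_name: new_name}
    highlights: {sample_name: colour}, keyed by the original name
    """
    renames = renames or {}
    highlights = highlights or {}
    series = []
    for name, values in copy.deepcopy(canonical).items():
        if name in hidden:
            continue
        series.append({
            "name": renames.get(name, name),
            "data": values,
            "color": highlights.get(name),  # None means "use the default colour"
        })
    return series


canonical = {"s1": [1, 2], "s2": [3, 4]}
print(prepare_plot_data(canonical, hidden={"s2"}, renames={"s1": "Sample 1"}))
```

Because every toolbox change calls the same function on a fresh copy, hiding, renaming and highlighting cannot interfere with each other or leave stale state in a rendered plot.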

This is sort of semi-implemented already in some places, for instance all of the code involving sample renaming. This is a fairly large refactoring, but I think that it will make the reports quite a bit more stable.

It could also lead to a potential fix for the mysterious slowness in #53, as hiding samples often seems slower than opening the page for the first time.

Finally - it should also fix #54 as the sample colouration will be set properly in the highcharts settings, so properly exported in other file formats (rather than a post-plotting hack).

Figure out how to extend with other pip modules

Get #10 to work, then write such an extension for the NGI (partly as an example for others).

Could include:

  • A new theme (softlinking files, new javascript file to power API to rename samples).
  • New super-specific module that checks the database for some kind of value?
  • Could also add a new click argument? To e-mail to a user or something..?

Add Qualimap support

Qualimap is widely used for data QC and would be a good addition to MultiQC. It has a number of data outputs that could be used, but I think it would be good to start with those plotted in ngi_visualizations...

General stats table:

  • Median coverage
  • Median insert size
  • % of genome >= 30x coverage
  • mapped reads avg. GC

Main sections:

  • Coverage Histogram
  • Insert Size Histogram
  • Genome Fraction Coverage
  • GC distribution

FastQC - no original images

Medium - large reports now have significant file sizes, due to saving all of the original FastQC plot images. Whilst not consuming vast resources yet, it's getting to the size where they can be annoying to send by e-mail.

Removing the original images means that the reports should be back to being only a few hundred KB. Not sure that their use justifies the problems that they cause? Feel free to comment below if you disagree. Will leave this open for discussion for a bit.

Design change: Move tools

The tools are currently a bit hidden off the bottom of the navigation, and as more modules are added / the tools get longer, they're going to get more and more obscured.

Maybe move them into a side-pull space on the right of the page?

Filter visible samples

It would be cool to have a sidebar text box which you can type into with the option to either show only or hide any samples which match that text..
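
The show-only / hide logic can be sketched as below, using simple substring matching. This is an illustration in Python rather than the report's JavaScript, and the function name and mode values are assumptions.

```python
def filter_samples(sample_names, text, mode="hide"):
    """Return the sample names that stay visible.

    mode="hide" drops names containing `text`; mode="show" keeps only those names.
    An empty search string leaves the list unchanged.
    """
    if mode not in ("hide", "show"):
        raise ValueError("mode must be 'hide' or 'show'")
    if not text:
        return list(sample_names)
    if mode == "show":
        return [name for name in sample_names if text in name]
    return [name for name in sample_names if text not in name]


samples = ["a_R1", "a_R2", "b_R1"]
print(filter_samples(samples, "_R2", mode="hide"))
print(filter_samples(samples, "a_", mode="show"))
```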

Fastqc reporting doesn't work for me (possibly due to my malpractice)

Hi again,

I seem to be naughty and rename the output .zip archives from fastqc (it's done by our workflow so that output files are named P1994_101_R1_fastqc.html rather than P1994_101_R1.fq_fastqc.html when running on .fq.gz files). This causes confusion in multiqc, since the directory name within the zip archive is still the old one...

When multiqc tries to look for the fastqc_data.txt file within the .zip, it results in a KeyError since the directory name within the zip is wrong. I guess one could argue this is malpractice or a problem with fastqc, but it would be nice if multiqc could be tolerant of renamed zip archives. I think I can sketch a pull request for this if you can assist with comments and you think it's a good idea.

Thanks,
Johannes
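
One way to tolerate renamed archives is to search the zip for fastqc_data.txt instead of building its path from the archive name. The sketch below uses only the standard zipfile module. The function name and the example file contents are illustrative, and this is not necessarily how MultiQC handles it.

```python
import io
import zipfile


def read_fastqc_data(zip_file):
    """Return the text of fastqc_data.txt from a FastQC zip, whatever its folder is called."""
    with zipfile.ZipFile(zip_file) as zf:
        for member in zf.namelist():
            if member.split("/")[-1] == "fastqc_data.txt":
                return zf.read(member).decode("utf-8", errors="replace")
    raise FileNotFoundError("no fastqc_data.txt found in archive")


# Build a small in-memory archive whose inner folder keeps the old name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("P1994_101_R1.fq_fastqc/fastqc_data.txt", "##FastQC\t0.10.1\n")
buf.seek(0)
print(read_fastqc_data(buf))
```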

General Statistics - Scrolling Header

Hi Phil,
I just started using your tool, great stuff. I was wondering if it is possible to fix the header in the General Statistics section so that it follows you while scrolling down the page, since it is easy to get lost in the middle of the sample list and forget what is what.
Thanks a lot and all the best!
Cheers

Edo

Share chroma colour scales in table

Have the option to share a single colour scale. For instance, all columns with read counts (data-scalename='readcount'). Needs to work across modules and be compatible with one or many columns. Probably take the first colour scheme specified.

Advantage - makes it clearer where reads are being lost through a selection process (for example).
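
A shared scale amounts to computing one minimum and maximum across every column that carries the same scale name. The sketch below illustrates that calculation in Python. The column structure and function name are hypothetical, with "scalename" mirroring the data-scalename attribute mentioned above.

```python
def shared_scale_limits(columns):
    """Compute one (min, max) per scale name across all columns that share it.

    columns: list of dicts with keys "scalename" (str or None) and "values" (list of numbers).
    """
    limits = {}
    for col in columns:
        scale = col.get("scalename")
        if scale is None or not col["values"]:
            continue  # columns without a shared scale keep their own limits
        lo, hi = min(col["values"]), max(col["values"])
        if scale in limits:
            lo, hi = min(lo, limits[scale][0]), max(hi, limits[scale][1])
        limits[scale] = (lo, hi)
    return limits


# Illustrative numbers only.
columns = [
    {"scalename": "readcount", "values": [100, 200]},  # e.g. raw reads
    {"scalename": "readcount", "values": [80, 150]},   # e.g. reads after trimming
    {"scalename": None, "values": [40, 50]},           # e.g. %GC, not shared
]
print(shared_scale_limits(columns))
```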

FastQC: colour plots by status

We have Pass / Warn / Fail statuses from FastQC for every sample in every plot. Currently, it's tricky working out which lines have which status though. Could be nice to have a button that colour codes the data series by their status.

Add option to copy original plots

The option to show the original plots is very useful, but it breaks if you move the resulting MultiQC_report directory to a location where you don't have access to the original plots. For example, I ran MultiQC on our cluster and downloaded the resulting zip file to my laptop for better visualization, and the original FastQC plots are not there.

What about something like --copy-original?

Batch colour sample labels

It would be ace to have a sidebar widget where you could specify the colour of the label for all samples matching a text input.

This would enable easy bulk labelling of certain groups - eg. all trimmed (*_trimmed) samples.
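
Matching with a pattern such as *_trimmed can be sketched with shell-style wildcards from Python's fnmatch module. The function name and the rule format are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def colour_labels(sample_names, rules):
    """Map each sample to the colour of the first matching rule, or None.

    rules: list of (glob_pattern, colour) tuples, checked in order.
    """
    colours = {}
    for name in sample_names:
        colours[name] = next(
            (colour for pattern, colour in rules if fnmatchcase(name, pattern)), None
        )
    return colours


rules = [("*_trimmed", "#e41a1c")]
print(colour_labels(["a_trimmed", "a_raw"], rules))
```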

FastQ Screen: Different plot if lots of samples

I like the FastQ Screen plot, as it mimics that made by the original tool. However, it gets a bit useless if there are lots of samples or species. So, in these cases, use a different plot..

Probably sum the various types of alignment score, then have a single bar per dataset, coloured by the species it aligns to.

I would say that the limit of usefulness is around 20 samples for 8 species. So, maybe count num_samples * num_species and if that is > 160 use the alternative plot style.
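
The proposed rule is a single comparison. A sketch, with a hypothetical function name and the limit of 160 taken from the estimate above:

```python
def use_alternative_fastq_screen_plot(num_samples, num_species, limit=160):
    """Return True when samples x species exceeds the limit of usefulness."""
    return num_samples * num_species > limit


print(use_alternative_fastq_screen_plot(20, 8))  # exactly at the limit
print(use_alternative_fastq_screen_plot(21, 8))  # just over the limit
```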

Entry points for 3rd party modules

At DNA club, someone asked what the best way to share 3rd party modules would be.

Maybe people are getting sick of me recommending setuptools entry points, but I think they could prove a good fit in this case.

Basically it would allow other people to build modules as Python packages that can be installed using pip. Whatever packages are active in the current env are the ones used by MultiQC 😃
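
Discovery of such packages can be sketched with importlib.metadata from the standard library (Python 3.8+). The group name below is hypothetical and used only for this illustration. It is not claimed to be the group MultiQC uses.

```python
from importlib import metadata

# Hypothetical group name, used only for this illustration.
PLUGIN_GROUP = "example_multiqc.modules"


def discover_plugins(group=PLUGIN_GROUP):
    """Return {entry_point_name: loaded_object} for every installed plugin in `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # Python 3.8 / 3.9 return a dict of groups
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


print(sorted(discover_plugins()))
```

A third-party package would declare an entry point under the agreed group in its packaging metadata. Once it is installed with pip into the same environment, it appears in the result without any change to the core tool.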

Remove auto-save feature

It was a nice idea, but it annoys me quite frequently and I've never found it useful to date.

Instead, have two buttons to ask for a save - one for this report only, one for all reports.

Add Picard support

The Picard modules I would be interested in are

  • CollectInsertSizeMetrics
  • CollectGcBiasMetrics
  • CollectAlignmentSummaryMetrics
  • CalculateHsMetrics (see example from Pekka)
  • CollectRnaSeqMetrics
  • CollectOxoGMetrics
  • BaseDistributionByCycleMetrics
  • CollectWgsMetrics
  • RrbsSummaryMetrics
  • QualityByCycleMetrics
  • QualityDistributionMetrics

Any other (picard) modules that are interesting to have?

Hide plots if all samples hidden

Currently, when hiding samples it's possible to get a graph with no data. Instead of this, slot in an explanation saying that all samples have been hidden.

Suggestion: link samples together?

Hi, just a humble suggestion: can you start a new project named MetaMultiQC (suggested by @guillermo-carrasco), or possibly MultiMultiQC?

Just kidding. Do you think it's a good idea to support linking samples together?

I'll try to explain what I mean:
My use case is that I run fastqc for each sample at each step I take in the processing of the reads. Below you can see the results for three samples in two steps = 18 fastqc reports.
[Screenshot: screen shot 2015-09-18 at 16 22 08]

It would be nice to see a graph of how the stats change for a sample between the different steps, for example. In order to do that I need a way to link the reports that belong to the same sample together.

Thanks,
Johannes

Highlighting changes

Highlight all samples in light grey first if any highlights are created. Disallow blank highlight search strings.

feature request - preseq lc_extrap curves

For example, for a paired-end bam file, preseq lc_extrap can be run like this:

Input file(s):  SAMPLE.bam
Output file(s):   SAMPLE.preseq.lc_extrap.tsv
Tool(s):        preseq lc_extrap

$ preseq lc_extrap \
    -B -P \
    -o SAMPLE.preseq.lc_extrap.tsv \
    SAMPLE.bam

The output of preseq lc_extrap can be plotted in a curve indicating the ratio of “number of molecules (M)” and “number of distinct molecules (M)”. See below a description of three examples from the preseq website (http://smithlabresearch.org/software/preseq/challenge/)

Instructions on how to install this software and its dependencies can be found at http://smithlabresearch.org/software/preseq

Highlight toolbox drawers that are active

It would be good if the toolbox buttons that have active options are highlighted in some way - a coloured line at the bottom of their button perhaps?

Would be even better still if this only appears when their filters are actually doing something. eg. if there's a renaming filter that has no effect, don't highlight. This is probably a bit tricky and unnecessary though.

Speed up 'hide samples'

With lots of samples, this can make the browser hang. Try to figure out what's taking it so long.

Extend FastQC modules

Make MultiQC support the plots that it currently ignores..

  • Per base sequence quality
  • Per tile sequence quality
  • Per sequence quality scores
  • Per base sequence content
  • Per sequence GC content
  • Per base N content
  • Sequence Length Distribution
  • Sequence Duplication Levels (needs thought)
  • Overrepresented sequences
  • Adapter Content
  • Kmer Content (needs thought)

Add General Stats HTML markup generator to BaseModule

Currently, each module has to write its own HTML for the general statistics table, which is messy.

Instead, write a function in the BaseModule to handle this.

Modules ported:

  • bismark
  • bowtie1
  • bowtie2
  • cutadapt
  • (fastq_screen)
  • fastqc
  • featureCounts
  • picard
  • qualimap
  • star
  • tophat

Create child templates

At the moment, every template is stand alone. This means that if you just want to change the logo, or a bit of text in the report, then you have to copy everything from general and then make your change. Inevitably, this leads to drift away from the core package and theme.

Instead, allow templates to behave as 'child' themes. Could set a variable parent in the theme __init__.py with the name of another template and build off that. Then the parent template would be used, with any files in the child copied over the top of this. This could happen at run time, also allowing plugin packages to keep their code base clean.

To make this work, I need to split up the template into many more files instead of a single massive file. This is thankfully very easy with Jinja2 - see the docs.

Also, whilst the new stand-alone reports are very nice, we probably need to allow themes to create non stand-alone reports if they want to. Again, an option in the report's __init__.py file here I think.

Task list:

  • Split general Jinja template into multiple files, centred on a base.html with blocks
  • New optional variable in template __init__.py to specify parent theme
  • New method of report generation, involving copying template files to a temporary location
  • New template __init__.py config option to prepend a directory name to generated report
  • New template __init__.py ability to copy files along with report generation.
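
The parent and child overlay described above can be sketched as two copies into a temporary working directory: first the parent template, then the child's files on top. The function name and directory layout are hypothetical, and shutil.copytree with dirs_exist_ok requires Python 3.8 or later.

```python
import shutil
import tempfile
from pathlib import Path


def build_template_dir(parent_dir, child_dir, work_dir):
    """Copy the parent template into work_dir, then overlay the child's files on top."""
    work_dir = Path(work_dir)
    shutil.copytree(parent_dir, work_dir, dirs_exist_ok=True)
    shutil.copytree(child_dir, work_dir, dirs_exist_ok=True)  # child files win
    return work_dir


with tempfile.TemporaryDirectory() as tmp:
    parent, child, work = Path(tmp, "parent"), Path(tmp, "child"), Path(tmp, "work")
    parent.mkdir()
    child.mkdir()
    (parent / "base.html").write_text("parent base")
    (parent / "logo.txt").write_text("parent logo")
    (child / "logo.txt").write_text("child logo")
    build_template_dir(parent, child, work)
    print(sorted(p.name for p in work.iterdir()), (work / "logo.txt").read_text())
```

A child theme that only changes the logo then needs to ship a single file, and everything else follows the parent as it evolves.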

General stats table plotting

Have a collapsible plot (hidden by default) below the general stats table. Use a select drop-down box to choose a column from the table, then plot that data. Respect ordering of rows. Use data-max and data-min to set plotting limits (unless data exceeds them). Think about plot types - presumably use a bar plot.

Toolbox - Clear all option

It would be nice to have a "clear all" option for filters, especially when loading a page that has been played with previously. Maybe on the save page?

Add Trimmomatic support

Trimmomatic, yes. Also one of those tools that writes to stdout or a user-given log file.

Highlight line graphs samples - save doesn't work

When exporting line plots which have been highlighted, the line colours aren't respected. Maybe if we do this differently and replot it might work.

Doing non-standard stuff to highlight the bar plot, so don't think we'll get that to work any time soon.

Option to export config somehow

Have a think about how to get the page setup to persist. Could be browser-generated javascript file download? Instruct the user to save it in the report directory. Then always try to load this file in case it's there..

Also could use cookies, though doesn't transfer across users. Might be cleaner. Could do both..
