
checkqc's Introduction

checkQC

More documentation is available at http://checkqc.readthedocs.io/

CheckQC is a program designed to check a set of quality criteria against an Illumina runfolder.

This is useful as part of a pipeline, where one needs to evaluate a set of quality criteria after demultiplexing. CheckQC is fast, and should finish within a few seconds. It emits a warning when a warning criterion is breached and exits with a non-zero exit status if it finds any errors, which makes it easy to stop further processing if the run being evaluated needs troubleshooting.

CheckQC has been designed to be modular, and exactly which "qc handlers" are executed with which parameters for a specific run type (i.e. machine type and run length) is determined by a configuration file.

Instrument types supported in checkQC are the following:

  • HiSeqX
  • HiSeq2500
  • iSeq
  • MiSeq
  • NovaSeq
  • NovaSeq X Plus

Install instructions

CheckQC requires Python 3.10. CheckQC can be installed with pip.

pip install checkqc

Alternatively it can be installed with conda using the bioconda channel:

conda install -c bioconda checkqc

Running CheckQC

After installing CheckQC you can run it by specifying the path to the runfolder you want to analyze like this:

checkqc <RUNFOLDER>

This will use the default configuration file packaged with CheckQC. If you want to specify your own custom file, you can do so by adding a path to the config like this:

checkqc --config_file <path to your config> <RUNFOLDER>

When CheckQC starts and no path to a config file is specified, it prints the path to where the default file is located on your system. You can use that file as a template and customize it according to your own needs.

When you run CheckQC you can expect to see output similar to this:

checkqc  tests/resources/170726_D00118_0303_BCB1TVANXX/
INFO     ------------------------
INFO     Starting checkQC (1.1.2)
INFO     ------------------------
INFO     Runfolder is: tests/resources/170726_D00118_0303_BCB1TVANXX/
INFO     No config file specified, using default config from /home/MOLMED/johda411/workspace/checkQC/checkQC/default_config/config.yaml.
INFO     Run summary
INFO     -----------
INFO     Instrument and reagent version: hiseq2500_rapidhighoutput_v4
INFO     Read length: 125-125
INFO     Enabled handlers and their config values were:
INFO            ClusterPFHandler Error=unknown Warning=180
INFO            Q30Handler Error=unknown Warning=80
INFO            ErrorRateHandler Error=unknown Warning=2
INFO            ReadsPerSampleHandler Error=90 Warning=unknown
INFO            UndeterminedPercentageHandler Error=10 Warning=unknown
WARNING  QC warning: Cluster PF was to low on lane 1, it was: 117.93 M
WARNING  QC warning: Cluster PF was to low on lane 7, it was: 122.26 M
WARNING  QC warning: Cluster PF was to low on lane 8, it was: 177.02 M
ERROR    Fatal QC error: Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M
ERROR    Fatal QC error: Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M
INFO     Finished with fatal qc errors and will exit with non-zero exit status.

The program will summarize the type of run it has identified and output any warnings and/or errors it finds. If any QC errors were found, CheckQC will exit with a non-zero exit status. This means it can easily be used to decide whether further steps should run or not, e.g. in a workflow.
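As a minimal sketch of gating a workflow on that exit status, the wrapper below runs a command and reports whether later steps should proceed. The function name `qc_passed` is hypothetical and not part of CheckQC. The only behavior assumed is the one described above: an exit status of zero means no fatal QC errors were found.

```python
import subprocess


def qc_passed(command):
    """Run a QC command and return True only if it exits with status 0."""
    result = subprocess.run(command)
    return result.returncode == 0


# Hypothetical usage in a pipeline script (the runfolder path is a placeholder):
#
#     if not qc_passed(["checkqc", "<RUNFOLDER>"]):
#         raise SystemExit("QC failed, stopping before downstream steps")
```

Warnings do not affect the result here, since only errors change the exit status.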

In addition to the normal output, CheckQC has a JSON mode, enabled by adding --json to the command line. This writes the results normally shown in the log as JSON to stdout (while the log itself is written to stderr), so that the results can either be written to a file or piped to other programs for further parsing. In this example we use the Python json tool to pretty-print the JSON output:

checkqc --json tests/resources/170726_D00118_0303_BCB1TVANXX/  | python -m json.tool
INFO     ------------------------
INFO     Starting checkQC (1.1.2)
INFO     ------------------------
INFO     Runfolder is: tests/resources/170726_D00118_0303_BCB1TVANXX/
INFO     No config file specified, using default config from /home/MOLMED/johda411/workspace/checkQC/checkQC/default_config/config.yaml.
INFO     Run summary
INFO     -----------
INFO     Instrument and reagent version: hiseq2500_rapidhighoutput_v4
INFO     Read length: 125-125
INFO     Enabled handlers and their config values were:
INFO     	ClusterPFHandler Error=unknown Warning=180
INFO     	Q30Handler Error=unknown Warning=80
INFO     	ErrorRateHandler Error=unknown Warning=2
INFO     	ReadsPerSampleHandler Error=90 Warning=unknown
INFO     	UndeterminedPercentageHandler Error=10 Warning=unknown
WARNING  QC warning: Cluster PF was to low on lane 1, it was: 117.93 M
WARNING  QC warning: Cluster PF was to low on lane 7, it was: 122.26 M
WARNING  QC warning: Cluster PF was to low on lane 8, it was: 177.02 M
ERROR    Fatal QC error: Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M
ERROR    Fatal QC error: Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M
INFO     Finished with fatal qc errors and will exit with non-zero exit status.
{
    "exit_status": 1,
    "ClusterPFHandler": [
        {
            "type": "warning",
            "message": "Cluster PF was to low on lane 1, it was: 117.93 M",
            "data": {
                "lane": 1,
                "lane_pf": 117929896,
                "threshold": 180
            }
        },
        {
            "type": "warning",
            "message": "Cluster PF was to low on lane 7, it was: 122.26 M",
            "data": {
                "lane": 7,
                "lane_pf": 122263375,
                "threshold": 180
            }
        },
        {
            "type": "warning",
            "message": "Cluster PF was to low on lane 8, it was: 177.02 M",
            "data": {
                "lane": 8,
                "lane_pf": 177018999,
                "threshold": 180
            }
        }
    ],
    "ReadsPerSampleHandler": [
        {
            "type": "error",
            "message": "Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M",
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-27",
                "sample_reads": 6.893002,
                "threshold": 90
            }
        },
        {
            "type": "error",
            "message": "Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M",
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-28",
                "sample_reads": 7.10447,
                "threshold": 90
            }
        }
    ],
    "run_summary": {
        "instrument_and_reagent_type": "hiseq2500_rapidhighoutput_v4",
        "read_length": "125-125",
        "handlers": [
            {
                "handler": "ClusterPFHandler",
                "error": "unknown",
                "warning": 180
            },
            {
                "handler": "Q30Handler",
                "error": "unknown",
                "warning": 80
            },
            {
                "handler": "ErrorRateHandler",
                "error": "unknown",
                "warning": 2
            },
            {
                "handler": "ReadsPerSampleHandler",
                "error": 90,
                "warning": "unknown"
            },
            {
                "handler": "UndeterminedPercentageHandler",
                "error": 10,
                "warning": "unknown"
            }
        ]
    }
}
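A downstream program could consume this JSON roughly as follows. This is a sketch under the assumption that handler results are exactly the top-level keys whose values are lists, as in the example output above. The function name `collect_findings` is hypothetical, and the embedded report is an abbreviated copy of that output.

```python
import json


def collect_findings(report):
    """Split a CheckQC-style JSON report into lists of warnings and errors.

    Handler results are assumed to be the top-level keys whose values are
    lists; other keys such as "exit_status" and "run_summary" are skipped.
    """
    findings = {"warning": [], "error": []}
    for handler, entries in report.items():
        if not isinstance(entries, list):
            continue
        for entry in entries:
            findings.setdefault(entry["type"], []).append((handler, entry["message"]))
    return findings


# Abbreviated report in the shape shown above.
example = json.loads("""
{
    "exit_status": 1,
    "ClusterPFHandler": [
        {"type": "warning",
         "message": "Cluster PF was to low on lane 1, it was: 117.93 M",
         "data": {"lane": 1, "lane_pf": 117929896, "threshold": 180}}
    ],
    "ReadsPerSampleHandler": [
        {"type": "error",
         "message": "Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M",
         "data": {"lane": 7, "number_of_samples": 12, "sample_id": "Sample_pq-27",
                  "sample_reads": 6.893002, "threshold": 90}}
    ],
    "run_summary": {"instrument_and_reagent_type": "hiseq2500_rapidhighoutput_v4",
                    "read_length": "125-125", "handlers": []}
}
""")

findings = collect_findings(example)
print(len(findings["warning"]), "warning(s),", len(findings["error"]), "error(s)")
```

In practice the report would be read from the stdout of `checkqc --json` instead of an embedded string.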

Running CheckQC as a webservice

In addition to running like a commandline application, CheckQC can be run as a simple webservice.

To run it you simply need to provide the path to a directory where runfolders that you want to be able to check are located. This is given as MONITOR_PATH below. There are also a number of optional arguments that can be passed to the service.

$ checkqc-ws --help
Usage: checkqc-ws [OPTIONS] MONITOR_PATH

Options:
  --port INTEGER     Port which checkqc-ws will listen to (default: 9999).
  --config PATH      Path to the checkQC configuration file (optional)
  --log_config PATH  Path to the checkQC logging configuration file (optional)
  --debug            Enable debug mode.
  --help             Show this message and exit.

Once the webserver is running you can query the /qc/ endpoint and get any errors and warnings back as JSON. Here is an example of how to query the endpoint, and what type of results it will return:

$ curl -s -w'\n' localhost:9999/qc/170726_D00118_0303_BCB1TVANXX | python -m json.tool
{
    "ClusterPFHandler": [
        {
            "data": {
                "lane": 1,
                "lane_pf": 117929896,
                "threshold": 180
            },
            "message": "Cluster PF was to low on lane 1, it was: 117.93 M",
            "type": "warning"
        },
        {
            "data": {
                "lane": 7,
                "lane_pf": 122263375,
                "threshold": 180
            },
            "message": "Cluster PF was to low on lane 7, it was: 122.26 M",
            "type": "warning"
        },
        {
            "data": {
                "lane": 8,
                "lane_pf": 177018999,
                "threshold": 180
            },
            "message": "Cluster PF was to low on lane 8, it was: 177.02 M",
            "type": "warning"
        }
    ],
    "ReadsPerSampleHandler": [
        {
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-27",
                "sample_reads": 6.893002,
                "threshold": 90
            },
            "message": "Number of reads for sample Sample_pq-27 was too low on lane 7, it was: 6.893 M",
            "type": "warning"
        },
        {
            "data": {
                "lane": 7,
                "number_of_samples": 12,
                "sample_id": "Sample_pq-28",
                "sample_reads": 7.10447,
                "threshold": 90
            },
            "message": "Number of reads for sample Sample_pq-28 was too low on lane 7, it was: 7.104 M",
            "type": "warning"
        }
    ],
    "exit_status": 0,
    "version": "1.1.0"
}
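The same query can be made from Python with the standard library. The sketch below assumes the service is reachable at the host and default port used in the curl example. The helper names `qc_url` and `fetch_qc_report` are hypothetical and not part of CheckQC.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def qc_url(runfolder, host="localhost", port=9999):
    """Build the URL of the /qc/ endpoint for a runfolder name."""
    return f"http://{host}:{port}/qc/{quote(runfolder)}"


def fetch_qc_report(runfolder, host="localhost", port=9999, timeout=30):
    """Query a running checkqc-ws instance and return the decoded JSON."""
    with urlopen(qc_url(runfolder, host, port), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage, with checkqc-ws already running:
#
#     report = fetch_qc_report("170726_D00118_0303_BCB1TVANXX")
#     print(report["exit_status"])
```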

checkqc's People

Contributors

alvaannett, aratz, b97pla, cbrueffer, johandahlberg, mariya, matrulda, monikabrandt, nkongenelly, sarek928, slohse, withrocks

checkqc's Issues

Fix or remove CodeCov

Either set up CodeCov correctly (now showing 0% coverage) or remove that functionality.

checkQC can't handle total yield being zero

The output below is from a runfolder in which lane 7 failed during cluster generation and therefore contains no data.

INFO     ------------------------
INFO     Starting checkQC (1.4.0)
INFO     ------------------------
INFO     Runfolder is: /data/biotank13/runfolders/<runfolder>
INFO     Run summary
INFO     -----------
INFO     Instrument and reagent version: hiseqx_v2
INFO     Read length: 151-151
INFO     Enabled handlers and their config values were: 
INFO     	ClusterPFHandler Error=unknown Warning=400
INFO     	Q30Handler Error=unknown Warning=75
INFO     	ErrorRateHandler Error=unknown Warning=5
INFO     	ReadsPerSampleHandler Error=200 Warning=unknown
INFO     	UndeterminedPercentageHandler Error=10 Warning=unknown
WARNING  QC warning: Cluster PF was to low on lane 7, it was: 0.00 M
WARNING  QC warning: %Q30 0.00 was too low on lane: 7 for read: 1
WARNING  QC warning: %Q30 0.00 was too low on lane: 7 for read: 2
ERROR    Fatal QC error: Number of reads for sample NN was too low on lane 7, it was: 0.000 M
Traceback (most recent call last):
  File "/opt/checkqc/bin/checkqc", line 11, in <module>
    sys.exit(start())
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 38, in start
    app.run()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 94, in run
    reports = self.configure_and_run()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 76, in configure_and_run
    reports = qc_engine.run()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/qc_engine.py", line 61, in run
    reports = self._compile_reports()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/qc_engine.py", line 98, in _compile_reports
    handler_report = handler.report()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/handlers/qc_handler.py", line 240, in report
    sorted_errors_and_warnings = sorted(errors_and_warnings, key=lambda x: x.ordering)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/handlers/undetermined_percentage_handler.py", line 40, in check_qc
    percentage_undetermined = (undetermined_yield / total_yield)*100
ZeroDivisionError: division by zero
No JSON object could be decoded
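The failing line in the traceback divides by `total_yield`, which is zero for a lane without data. One possible guard is sketched below. It is an illustration only, not the fix adopted by the project, and returning `None` for an empty lane is an assumption about how callers would want to handle that case.

```python
def percentage_undetermined(undetermined_yield, total_yield):
    """Return the undetermined percentage, or None if the lane has no yield."""
    if total_yield == 0:
        # A lane without any data has no meaningful percentage.
        return None
    return (undetermined_yield / total_yield) * 100


print(percentage_undetermined(0, 0))
print(percentage_undetermined(1, 4))
```

A caller would then need to decide whether an empty lane should be reported as an error in its own right or skipped.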

API docs do not get generated on readthedocs.io

We are currently having some trouble generating the docs at readthedocs.io. This used to work, but then for reasons that are unknown to me it stopped working.

I previously added some code, which I found in readthedocs/readthedocs.org#1139, to make the API docs build automatically. However, since that seemed to be what was causing the readthedocs build errors, I "removed" it in this commit: ad9c7b5

The current state is that the normal documentation is up on readthedocs, while the API docs are missing. I don't have time to look further into this at the moment, so I'll leave this here as a note to myself for when I get back to it. However, should anyone else feel like picking this issue up, go ahead!

Who is using CheckQC?

I'm curious to know who out there might be using CheckQC. We at the SNP&SEQ Technology Platform are using it in our routine data processing pipelines. Outside of that I think there is some use, based on people reporting issues and requesting features, but I don't know who those users are. If you are using CheckQC and would feel comfortable listing your organization in the README, let me know.

Error in code for UndeterminedPercentageHandler

When running checkQC on a MiSeq run, I get the following error:

Traceback (most recent call last):
  File "/home/maleasy/miniconda3/envs/checkqc/bin/checkqc", line 8, in <module>
    sys.exit(start())
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/app.py", line 41, in start
    app.run()
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/app.py", line 104, in run
    reports = self.configure_and_run()
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/app.py", line 87, in configure_and_run
    reports = qc_engine.run()
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/qc_engine.py", line 61, in run
    reports = self._compile_reports()
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/qc_engine.py", line 103, in _compile_reports
    handler_report = handler.report()
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/handlers/qc_handler.py", line 240, in report
    sorted_errors_and_warnings = sorted(errors_and_warnings, key=lambda x: x.ordering)
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/handlers/undetermined_percentage_handler.py", line 75, in check_qc
    if self.error() != self.UNKNOWN and percentage_undetermined > compute_threshold(self.error()):
  File "/home/maleasy/miniconda3/envs/checkqc/lib/python3.9/site-packages/checkQC/handlers/undetermined_percentage_handler.py", line 66, in compute_threshold
    return value + mean_phix_per_lane[lane_nbr]
KeyError: 1

When commenting out the handler in the config file it works:

...
default_handlers:
#  - name: UndeterminedPercentageHandler
#    warning: unknown
#    error: 9 # <% Phix on lane> + < value as %>
...

Possibly related: I tried to install checkqc 3.6.5 with conda, which seemed to work, but running checkQC led to an error:

ModuleNotFoundError: No module named 'interop'

Conda, however, reports that illumina-interop is installed.
I then installed checkQC with pip in a new, clean conda env, which worked, but it produces the error described above.
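The `KeyError: 1` in the traceback is raised when `mean_phix_per_lane` has no entry for the lane being checked. The sketch below shows one way such a lookup could tolerate a missing lane. It is not the project's code: the original `compute_threshold` takes only `value` and reads the other names from its enclosing scope, and treating a missing PhiX value as zero is an assumption that may not be the right behavior for every run.

```python
def compute_threshold(value, mean_phix_per_lane, lane_nbr):
    """Add the lane's mean PhiX percentage to the threshold, if it is known."""
    # Assumption: a lane without PhiX data contributes 0 to the threshold.
    return value + mean_phix_per_lane.get(lane_nbr, 0.0)


print(compute_threshold(9, {}, 1))
print(compute_threshold(9, {6: 2.5}, 6))
```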

Display expected thresholds in normal mode

@senthil10 had a great suggestion that we should display the expected threshold when printing the output, or that we should output a flowcell summary for a run (i.e. type of flowcell, default thresholds for each criterion, etc). I think both are great ideas, and we should explore both to see what we think works best in practice.

installation on Ubuntu 14.04

Are there installation instructions for Ubuntu 14.04?

I tried, but:

sudo pip install -f https://github.com/Illumina/interop/releases/tag/v1.1.1 interop
Downloading/unpacking interop
  Could not find a version that satisfies the requirement interop (from versions: 1.1.1-Darwin-AppleClang, 1.1.1-Linux-GNU, 1.1.1-Windows-MSVC)
Cleaning up...
No distributions matching the version for interop
Storing debug log for failure in /home/avilella/.pip/pip.log

Feature request - e-mail reports

Hi,

We're considering implementing checkQC into our workflow, but we're missing a key feature. We'd need checkQC to e-mail the qc report for each run to the people responsible (who might not be very technologically inclined).
Nice graphs and tables would be a plus, but a basic e-mail functionality would be great.

Is this something you'd be willing to support?

Thanks
Matthias

Improve README

These improvements were suggested by @ewels.

  • Add output examples
  • Write more clearly about when exit status is 0 vs. 1, etc
  • Mention --json mode

Error when using lower bound in read length interval

I found a bug when processing a hiseq2500_rapidhighoutput_v4 with read length 50.

ERROR:

INFO     ------------------------
INFO     Starting checkQC (1.1.2)
INFO     ------------------------
INFO     Runfolder is: /data/biotank7/runfolders/180124_D00457_0238_BCB2B1ANXX
INFO     No config file specified, using default config from /opt/checkqc/lib/python3.6/site-packages/checkQC/default_config/config.yaml.
ERROR    Could not find a config entry for instrument 'hiseq2500_rapidhighoutput_v4' with read length '50'. Please check the provided config file 
Traceback (most recent call last):
  File "/opt/checkqc/bin/checkqc", line 11, in <module>
    sys.exit(start())
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/checkqc/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 36, in start
    app.run()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 66, in run
    reports = self.configure_and_run()
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/app.py", line 55, in configure_and_run
    handler_config = config.get_handler_config(instrument_and_reagent_version, read_length)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/config.py", line 90, in get_handler_config
    raise e
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/config.py", line 82, in get_handler_config
    handler_config = self._get_matching_handler(instrument_and_reagent_type, read_length)
  File "/opt/checkqc/lib/python3.6/site-packages/checkQC/config.py", line 64, in _get_matching_handler
    raise KeyError
KeyError

Config:

[...]
hiseq2500_rapidhighoutput_v4:
  50-70:
    handlers:
      - name: ClusterPFHandler
        warning: 180 # Millons of clusters
        error: unknown
      - name: Q30Handler
        warning: 80 # Give percentage for reads greater than Q30
        error: unknown # Give percentage for reads greater than Q30
      - name: ErrorRateHandler
        allow_missing_error_rate: False
        warning: 1.5
        error: unknown
      - name: ReadsPerSampleHandler
        warning: 90 # 50 % of threshold for clusters pass filter
        error: unknown
  100-110:
[...]

I think the problem is that we are defining the read length as number of cycles - 1:

read_lengths.append(int(read['@NumCycles']) - 1)

and then doing an exclusive check against the lower bound:

if low_break < int(read_length) <= high_break:

So in my case: 51 cycles were run for read1, it is adjusted to read length 50. Then we check if 50 < 50, which is not true.

This could be fixed by not defining read length as number of cycles minus one or by doing an inclusive check against the lower bound. What do you think is the best, @johandahlberg @monikaBrandt ?
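The difference between the two checks can be reproduced in isolation. In this sketch the function names are hypothetical, and the bounds 50 and 70 are taken from the `50-70` config entry above.

```python
def matches_exclusive(read_length, low_break, high_break):
    """Interval check as quoted above: the lower bound is excluded."""
    return low_break < int(read_length) <= high_break


def matches_inclusive(read_length, low_break, high_break):
    """Alternative check that also accepts the lower bound."""
    return low_break <= int(read_length) <= high_break


cycles = 51                # NumCycles reported for read 1
read_length = cycles - 1   # adjusted read length, as in the snippet above
print(matches_exclusive(read_length, 50, 70))  # False: 50 < 50 does not hold
print(matches_inclusive(read_length, 50, 70))  # True
```

With an inclusive lower bound, adjacent intervals in a config must not share an endpoint, otherwise a read length on the boundary would match two entries.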

percent_phix on undetermined_percentage_handler.py

Good morning.

I am opening this issue because I was trying to use CheckQC in a preliminary analysis at our company. The tool breaks when it tries to find "percent_phix" in our run information, since no field there corresponds to it. When I comment out the parts that use "percent_phix", the tool finishes correctly.

However, the MultiQC analysis that we run afterwards requires "percent_phix" to be present in the output in order to show the CheckQC results correctly in its final plots.

I think the reason we do not get this "percent_phix" field is that we demultiplex our samples without splitting them by lane (but that is just my guess, I am not completely sure). Would it be possible to release a new version that does not require that field?

Below are the parts of your original code that I changed, and I attach my data so you can work on it if you want to.

  def compute_threshold(value):
      return value # + mean_phix_per_lane[lane_nbr]

  def create_data_dict(value):
      return {"lane": lane_nbr,
              "percentage_undetermined": percentage_undetermined,
              "threshold": value,
              "computed_threshold": compute_threshold(value)}
              # "phix_on_lane": mean_phix_per_lane[lane_nbr]}

checkqc_trial.zip

Command I run:

checkqc --downgrade-errors UndeterminedPercentageHandler .

I also changed the config.yaml to:

parser_configurations:
  StatsJsonParser:
    # Path to where the bcl2fastq output (i.e. fastq files, etc) is located relative to
    # the runfolder
    bcl2fastq_output_path: Reports/legacy
  SamplesheetParser:
    samplesheet_name: Reports/SampleSheet.csv

Detect index switch + reverse complemented index

I encountered a case where CheckQC couldn't tell the whole story behind incorrectly specified indexes:

Index in sample sheet:

ACATAGCG+CTAGCTTG

Index found in undetermined indexes:

CAAGCTAG+ACATAGCG

Reported by CheckQC:

Index: CAAGCTAG+ACATAGCG on lane: 1 was significantly overrepresented (5.9%) at significance threshold of: 1%.
We found a possible match for the reverse complement of tag: CAAGCTAG, on: Lane: 1, for sample: Sample_A
The tag we found was: CTAGCTTG.
This originated from the dual index tag: CAAGCTAG+ACATAGCG

It would have been helpful if it had also been reported that ACATAGCG was found among index 1 in the sample sheet.
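A check that covers both observations could compare each half of the undetermined dual index against both sample sheet indexes, looking for exact and reverse-complement matches. The sketch below is an illustration using the sequences from this issue. It is not how CheckQC's handler is implemented, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def explain_dual_index(undetermined, samplesheet):
    """Compare each half of an undetermined dual index with both sample sheet indexes."""
    expected = dict(zip(("index1", "index2"), samplesheet.split("+")))
    matches = []
    for position, tag in zip(("index1", "index2"), undetermined.split("+")):
        for name, sheet_tag in expected.items():
            if tag == sheet_tag:
                matches.append((tag, position, "identical to", name))
            elif reverse_complement(tag) == sheet_tag:
                matches.append((tag, position, "reverse complement of", name))
    return matches


for match in explain_dual_index("CAAGCTAG+ACATAGCG", "ACATAGCG+CTAGCTTG"):
    print(match)
```

For this pair it reports both that CAAGCTAG is the reverse complement of the sample sheet's index 2 and that ACATAGCG, read as index 2, is identical to the sample sheet's index 1.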

Undetermined index handler: Evaluate dual indexes separately

The Undetermined Index Handler does not detect the case where only one index is e.g. reverse complemented. This is something that https://github.com/Molmed/sisyphus picks up and maybe something we want to implement?

Output from the same run:

CheckQC:

UndeterminedPercentageHandler:
    - data:
        computed_threshold: 11.165829420089722
        lane: 6
        percentage_undetermined: 99.99831577213324
        phix_on_lane: 2.1658294200897217
        threshold: 9
      message: 'The percentage of undetermined indexes was to high on lane 6, it was: 100.00%'
      type: error
    UnidentifiedIndexHandler:
    - data:
        lane: 6
        msg: 'Index: AAGAGGCA+ACTCTAGG on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      message: 'Index: AAGAGGCA+ACTCTAGG on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: ACTGAGCG+AGAGGATA on lane: 6 was significantly overrepresented (1.4%) at significance threshold of: 1%.'
      message: 'Index: ACTGAGCG+AGAGGATA on lane: 6 was significantly overrepresented (1.4%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: AGGCAGAA+CTAGTCGA on lane: 6 was significantly overrepresented (1.0%) at significance threshold of: 1%.'
      message: 'Index: AGGCAGAA+CTAGTCGA on lane: 6 was significantly overrepresented (1.0%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: AGGCAGAA+TCGCATAA on lane: 6 was significantly overrepresented (1.6%) at significance threshold of: 1%.'
      message: 'Index: AGGCAGAA+TCGCATAA on lane: 6 was significantly overrepresented (1.6%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CGAGGCTG+CTAGTCGA on lane: 6 was significantly overrepresented (1.2%) at significance threshold of: 1%.'
      message: 'Index: CGAGGCTG+CTAGTCGA on lane: 6 was significantly overrepresented (1.2%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CGTACTAG+CTAGTCGA on lane: 6 was significantly overrepresented (1.0%) at significance threshold of: 1%.'
      message: 'Index: CGTACTAG+CTAGTCGA on lane: 6 was significantly overrepresented (1.0%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CGTACTAG+TAAGGCTC on lane: 6 was significantly overrepresented (1.7%) at significance threshold of: 1%.'
      message: 'Index: CGTACTAG+TAAGGCTC on lane: 6 was significantly overrepresented (1.7%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CGTACTAG+TCGCATAA on lane: 6 was significantly overrepresented (1.2%) at significance threshold of: 1%.'
      message: 'Index: CGTACTAG+TCGCATAA on lane: 6 was significantly overrepresented (1.2%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CTCTCTAC+AGCTAGAA on lane: 6 was significantly overrepresented (1.3%) at significance threshold of: 1%.'
      message: 'Index: CTCTCTAC+AGCTAGAA on lane: 6 was significantly overrepresented (1.3%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CTCTCTAC+CTAGTCGA on lane: 6 was significantly overrepresented (1.9%) at significance threshold of: 1%.'
      message: 'Index: CTCTCTAC+CTAGTCGA on lane: 6 was significantly overrepresented (1.9%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CTCTCTAC+TAAGGCTC on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      message: 'Index: CTCTCTAC+TAAGGCTC on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CTCTCTAC+TCGCATAA on lane: 6 was significantly overrepresented (2.0%) at significance threshold of: 1%.'
      message: 'Index: CTCTCTAC+TCGCATAA on lane: 6 was significantly overrepresented (2.0%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: CTCTCTAC+TCTTACGC on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      message: 'Index: CTCTCTAC+TCTTACGC on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: GGACTCCT+CTAGTCGA on lane: 6 was significantly overrepresented (1.2%) at significance threshold of: 1%.'
      message: 'Index: GGACTCCT+CTAGTCGA on lane: 6 was significantly overrepresented (1.2%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TAAGGCGA+ATAGAGAG on lane: 6 was significantly overrepresented (1.0%) at significance threshold of: 1%.'
      message: 'Index: TAAGGCGA+ATAGAGAG on lane: 6 was significantly overrepresented (1.0%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TAGGCATG+ATAGCCTT on lane: 6 was significantly overrepresented (1.4%) at significance threshold of: 1%.'
      message: 'Index: TAGGCATG+ATAGCCTT on lane: 6 was significantly overrepresented (1.4%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TAGGCATG+CTAGTCGA on lane: 6 was significantly overrepresented (1.8%) at significance threshold of: 1%.'
      message: 'Index: TAGGCATG+CTAGTCGA on lane: 6 was significantly overrepresented (1.8%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TAGGCATG+TCGCATAA on lane: 6 was significantly overrepresented (1.6%) at significance threshold of: 1%.'
      message: 'Index: TAGGCATG+TCGCATAA on lane: 6 was significantly overrepresented (1.6%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TCCTGAGC+AGCTAGAA on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      message: 'Index: TCCTGAGC+AGCTAGAA on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TCCTGAGC+CTAGTCGA on lane: 6 was significantly overrepresented (1.3%) at significance threshold of: 1%.'
      message: 'Index: TCCTGAGC+CTAGTCGA on lane: 6 was significantly overrepresented (1.3%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TCCTGAGC+TCGCATAA on lane: 6 was significantly overrepresented (1.4%) at significance threshold of: 1%.'
      message: 'Index: TCCTGAGC+TCGCATAA on lane: 6 was significantly overrepresented (1.4%) at significance threshold of: 1%.'
      type: warning
    - data:
        lane: 6
        msg: 'Index: TCGACGTC+TATGCAGT on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      message: 'Index: TCGACGTC+TATGCAGT on lane: 6 was significantly overrepresented (1.1%) at significance threshold of: 1%.'
      type: warning

sisyphus:

There are 10 significant undetermined index counts for Lane 6 Index2: 
The reverse complement of index AGCTAGAA is present in SampleSheet among Index2.
Please investigate index AGCTAGAA. (1.140% of all reads in lane 6)
The reverse complement of index CTAGTCGA is present in SampleSheet among Index2.
Please investigate index CTAGTCGA. (1.007% of all reads in lane 6)
The reverse complement of index ATAGCCTT is present in SampleSheet among Index2.
Please investigate index ATAGCCTT. (1.411% of all reads in lane 6)
The reverse complement of index TCTTACGC is present in SampleSheet among Index2.
Please investigate index TCTTACGC. (1.123% of all reads in lane 6)
The reverse complement of index ACTCTAGG is present in SampleSheet among Index2.
Please investigate index ACTCTAGG. (1.108% of all reads in lane 6)
The reverse complement of index TATGCAGT is present in SampleSheet among Index2.
Please investigate index TATGCAGT. (1.137% of all reads in lane 6)
The reverse complement of index ATAGAGAG is present in SampleSheet among Index2.
Please investigate index ATAGAGAG. (1.021% of all reads in lane 6)
The reverse complement of index TCGCATAA is present in SampleSheet among Index2.
Please investigate index TCGCATAA. (1.156% of all reads in lane 6)
The reverse complement of index TAAGGCTC is present in SampleSheet among Index2.
Please investigate index TAAGGCTC. (1.098% of all reads in lane 6)
The reverse complement of index AGAGGATA is present in SampleSheet among Index2.
Please investigate index AGAGGATA. (1.418% of all reads in lane 6)

There are 11 significant undetermined index counts for Lane 6 Index1: 
Index ACTGAGCG is present in Samplesheet among Index1. OK!
Index CGTACTAG is present in Samplesheet among Index1. OK!
Index TCCTGAGC is present in Samplesheet among Index1. OK!
Index CTCTCTAC is one mismatch from being a correct index among Index2.
Index CTCTCTAC is present in Samplesheet among Index1. OK!
Index TAAGGCGA is present in Samplesheet among Index1. OK!
Index CGAGGCTG is present in Samplesheet among Index1. OK!
Index TCGACGTC is present in Samplesheet among Index1. OK!
Index AGGCAGAA is present in Samplesheet among Index1. OK!
Index GGACTCCT is present in Samplesheet among Index1. OK!
Index TAGGCATG is present in Samplesheet among Index1. OK!
Index AAGAGGCA is present in Samplesheet among Index1. OK!

Path to bcl2fastq output in config

CheckQC assumes that the output from bcl2fastq is located in a subdirectory called Unaligned, but by default bcl2fastq writes its output to <RUNFOLDER>/Data/Intensities/BaseCalls. It would be useful if the location of the bcl2fastq output could be specified in the config.

Issues for NovaSeq SP with interop==1.1.9

It looks like the latest release of interop (1.1.9) does not work for (at least) NovaSeq SP.


Traceback (most recent call last):
  File "/home/MOLMED/matas618/.conda/envs/checkqc/bin/checkqc", line 11, in <module>
    load_entry_point('checkQC', 'console_scripts', 'checkqc')()
  File "/home/MOLMED/matas618/.conda/envs/checkqc/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/MOLMED/matas618/.conda/envs/checkqc/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/MOLMED/matas618/.conda/envs/checkqc/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/MOLMED/matas618/.conda/envs/checkqc/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/app.py", line 41, in start
    app.run()
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/app.py", line 104, in run
    reports = self.configure_and_run()
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/app.py", line 87, in configure_and_run
    reports = qc_engine.run()
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/qc_engine.py", line 61, in run
    reports = self._compile_reports()
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/qc_engine.py", line 103, in _compile_reports
    handler_report = handler.report()
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/handlers/qc_handler.py", line 240, in report
    sorted_errors_and_warnings = sorted(errors_and_warnings, key=lambda x: x.ordering)
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/handlers/undetermined_percentage_handler.py", line 75, in check_qc
    if self.error() != self.UNKNOWN and percentage_undetermined > compute_threshold(self.error()):
  File "/home/MOLMED/matas618/workspace/checkQC/checkQC/handlers/undetermined_percentage_handler.py", line 66, in compute_threshold
    return value + mean_phix_per_lane[lane_nbr]
KeyError: 2

We might have to update the interop requirements, but this will be further investigated first.

The solution for now is to use interop 1.1.5.
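
The traceback ends in a bare `KeyError: 2`, which says little about the cause. The following is a minimal sketch, not the project's actual fix, of how a threshold helper could report which lanes have PhiX statistics before doing the lookup. Only the expression `value + mean_phix_per_lane[lane_nbr]` comes from the traceback; the standalone function signature and the error message are assumptions made for illustration.

```python
def compute_threshold(value, lane_nbr, mean_phix_per_lane):
    """Add the lane's mean PhiX percentage to a base threshold.

    Hypothetical standalone version of the helper seen in the traceback;
    the real one is a closure inside the handler.
    """
    if lane_nbr not in mean_phix_per_lane:
        # Fail with a message that names the missing lane and what was found,
        # instead of letting a bare KeyError propagate.
        known_lanes = sorted(mean_phix_per_lane)
        raise ValueError(
            "No PhiX statistics for lane {}; lanes with data: {}".format(
                lane_nbr, known_lanes
            )
        )
    return value + mean_phix_per_lane[lane_nbr]
```

A check like this does not repair the missing data, but it makes it clear that a lane is absent from the statistics rather than the threshold itself being wrong.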

`IndexError` with format string

The format string in the following lines expects two arguments but is given only one, so `.format()` raises an `IndexError` instead of the intended exception. This probably went unnoticed because it only happens when this exception is raised.

raise DemuxSummaryNotFound("Could not identify expected demux summary file: {}. "
                           "We expect to find {} files matching the pattern, "
                           "'DemuxSummaryF1L<Lane number>.txt'".format(path))

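The failure can be reproduced in isolation, and one possible fix is to drop the second placeholder. This is a sketch under assumptions: `DemuxSummaryNotFound` is redefined here as a plain `Exception` subclass, the helper names are hypothetical, and the reworded message is one option rather than the wording the maintainers chose.

```python
class DemuxSummaryNotFound(Exception):
    """Stand-in for the CheckQC exception of the same name (assumed)."""


def format_fails(template, *args):
    """Return True if str.format raises IndexError for these arguments."""
    try:
        template.format(*args)
    except IndexError:
        return True
    return False


def demux_summary_not_found(path):
    """Build the exception with one placeholder for the one argument (hypothetical helper)."""
    return DemuxSummaryNotFound(
        "Could not identify expected demux summary file: {}. "
        "We expect to find files matching the pattern "
        "'DemuxSummaryF1L<Lane number>.txt'".format(path)
    )


# Two placeholders with one argument reproduce the reported IndexError.
two_placeholders_fail = format_fails("{} and {}", "only-one-argument")
```

Because the `IndexError` is raised while the message is being built, the `DemuxSummaryNotFound` exception is never created, which hides the real problem from the user.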