
clockwork's Introduction

Clockwork

Pipelines for processing bacterial sequence data (Illumina only) and variant calling

Note: these pipelines were developed for the CRyPTIC project, which studies M. tuberculosis, but in principle they can be used on any bacteria.

Clockwork takes FASTQ files as input and outputs standard VCF files.

Please see the clockwork wiki page for documentation.


clockwork's Issues

multi-sample option?

Would take a list of samples, run them individually, collect sites, regenotype.
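
A minimal sketch of the flow in plain Python with made-up data structures; the real pipeline would read per-sample VCFs and re-examine the reads when regenotyping, and none of the names below are existing clockwork functions.

# Sketch only: per-sample calls are represented as {(chrom, pos): alt} dicts.
# Real regenotyping would go back to the reads; here it is just a lookup so
# that the run-individually / collect-sites / regenotype flow is visible.

def collect_sites(per_sample_calls):
    """Union of all variant sites seen in any sample."""
    sites = set()
    for calls in per_sample_calls.values():
        sites.update(calls.keys())
    return sorted(sites)


def regenotype(per_sample_calls, sites):
    """Site-by-sample matrix; '.' marks a sample with no call at that site."""
    return {
        site: {sample: calls.get(site, ".") for sample, calls in per_sample_calls.items()}
        for site in sites
    }


if __name__ == "__main__":
    calls = {
        "sample1": {("NC_000962.3", 1000): "A", ("NC_000962.3", 2000): "T"},
        "sample2": {("NC_000962.3", 2000): "T", ("NC_000962.3", 3000): "G"},
    }
    for site, genotypes in regenotype(calls, collect_sites(calls)).items():
        print(site, genotypes)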

singularity installation instructions

I'm installing Clockwork via Singularity as recommended. I think there have been some changes to Singularity since the instructions were written. When I ran:

sudo singularity create -s 8000 clockwork_container.img

It returned this warning:

WARNING: The create command is deprecated, and will be removed
WARNING: To create, use the image.create command.
WARNING: Use the build command to create and build an image in a single step.
Creating empty 8000MiB image file: clockwork_container.img
Formatting image with ext3 file system
Image is done: clockwork_container.img

Then, on the next step:

sudo singularity bootstrap clockwork_container.img clockwork_container.def

It failed with this error:

WARNING: The bootstrap command is deprecated and will be removed in a future release.
WARNING: Use the build command like so:
WARNING: singularity build clockwork_container.img clockwork_container.def
ERROR: clockwork_container.img is not an image file
Cleaning up...

There was no difference if I used this instead:

sudo singularity image.create -s 8000 clockwork_container.img

It works if I use this:

sudo singularity build clockwork_container.img clockwork_container.def

Adding base quality support for variants

I'm getting multiple (important) people asking whether it is possible to add, to the VCF, information about the FASTQ qualities of the bases supporting a call. We can't push this all the way through the pipeline, but, for example, we could have a clockwork command that you optionally run after variant calling, which remaps reads to the VCF calls and then calculates summary stats of the qualities of the bases in the alleles.

From a global efficiency point of view, the sensible place to do this is in gramtools, but I don't want to do that, as gramtools has enough on its plate, and the return on investment is low in terms of improved quality (IMO), though maybe high in terms of convincing people in the public health community.

So I think maybe we could have a command that says "get evidence" for a call, which pulls the right reads from the BAM, maps them to the alleles, and collects stats.

This is just an idea @martinghunt, feel free to push back.
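
A rough sketch of what such a "get evidence" step could look like using pysam; this is not an existing clockwork command, and the BAM name, position and allele in the usage comment are placeholders.

import statistics

import pysam  # assumed available; not part of clockwork's documented interface


def base_qualities_at(bam_path, chrom, pos):
    """Collect (base, quality) for every read covering a 1-based position."""
    evidence = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for column in bam.pileup(chrom, pos - 1, pos, truncate=True):
            for read in column.pileups:
                if read.is_del or read.is_refskip or read.query_position is None:
                    continue
                aln = read.alignment
                evidence.append(
                    (aln.query_sequence[read.query_position],
                     aln.query_qualities[read.query_position])
                )
    return evidence


def summarise(evidence, alt_allele):
    """Summary stats of the qualities of bases supporting the ALT allele."""
    quals = [qual for base, qual in evidence if base == alt_allele]
    if not quals:
        return {"depth": len(evidence), "alt_depth": 0}
    return {
        "depth": len(evidence),
        "alt_depth": len(quals),
        "mean_qual": statistics.mean(quals),
        "min_qual": min(quals),
    }


# Hypothetical usage; positions and alleles would come from the pipeline's VCF:
# print(summarise(base_qualities_at("sample.bam", "NC_000962.3", 761155), "T"))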

Documentation omission: nextflow workflow locations

Based on the example here I tried to run nextflow for the remove_contam step, but it ends with an error, as per:

$ nextflow run nextflow/remove_contam.nf   -with-singularity ~/clockwork/singularity/clockwork_container.img   --ref_fasta Reference.remove_contam/ref.fa   --ref_metadata_tsv Reference.remove_contam/remove_contam_metadata.tsv   --reads_in1 set1_1.fastq.gz --reads_in2 set1_2.fastq.gz --outprefix OUT
N E X T F L O W  ~  version 19.04.1
Pulling nextflow-io/nextflow ...
Not a valid Nextflow project -- The repository `https://github.com/nextflow-io/nextflow` must contain a the script `main.nf` or the file `nextflow.config`

Upon further inspection, I discovered that this is because the command assumes the working directory has the nextflow scripts available in the nextflow/ directory (whereas mine were elsewhere). I thought it could find the nextflow scripts inside the singularity container, but this is not the case.

Perhaps this arrangement can be clarified in the Wiki.

problem with reference_prepare for remove_contam

I'm running reference_prepare for remove_contam with the following command line:

singularity exec clockwork_container.img clockwork reference_prepare --contam_tsv /clockwork/reference_genomes/remove_contam.tsv --outdir Reference.remove_contam /clockwork/reference_genomes/remove_contam.fa.gz

I left it running for 2 or 3 hours before I cancelled it; this is the message received when cancelling (not sure how useful this is):

  File "/usr/local/bin/clockwork", line 4, in <module>
    __import__('pkg_resources').run_script('clockwork==0.5.2', 'clockwork')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 742, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1510, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.5.2-py3.6.egg/EGG-INFO/scripts/clockwork", line 500, in <module>
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.5.2-py3.6.egg/clockwork/tasks/reference_prepare.py", line 29, in run
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.5.2-py3.6.egg/clockwork/reference_dir.py", line 49, in make_index_files
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.5.2-py3.6.egg/clockwork/utils.py", line 18, in syscall
  File "/usr/lib/python3.6/subprocess.py", line 405, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 830, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt

Perhaps I'm doing something stupid?

Downloads from HMPDACC not working

The FTP URLs here

'Saliva.tar.bz2' => 'ftp://public-ftp.hmpdacc.org/HMBSA/Saliva.tar.bz2',
'Throat.tar.bz2' => 'ftp://public-ftp.hmpdacc.org/HMBSA/Throat.tar.bz2',
'Tongue_dorsum.tar.bz2' => 'ftp://public-ftp.hmpdacc.org/HMBSA/Tongue_dorsum.tar.bz2',
'Buccal_mucosa.tar.bz2' => 'ftp://public-ftp.hmpdacc.org/HMBSA/Buccal_mucosa.tar.bz2',
'Palatine_Tonsils.tar.bz2' => 'ftp://public-ftp.hmpdacc.org/HMBSA/Palatine_Tonsils.tar.bz2',
are not working. When trying to run wget or curl on them, it says "The server refuses login".

I have sent them a feedback request form mentioning this.

One alternative could be their data transfer tool: https://github.com/IGS/portal_client

API rate limit exceeded -- Provide your GitHub user name and password to get a higher rate limit

Hello, I have a question about running clockwork; could you give me a suggestion?

The command is as follows:

Singularity> nextflow run nextflow/variant_call.nf --ref_dir /tmp/singularity-3.8.0/refrence/ --reads_in1 /tmp/singularity-3.8.0/diyData/1.L350_BDMS190036238-1a_1.fq.clean.gz --reads_in2 /tmp/singularity-3.8.0/diyData/1.L350_BDMS190036238-1a_2.fq.clean.gz --output_dir result --sample_name sample
N E X T F L O W ~ version 21.10.0
Pulling nextflow-io/nextflow ...
API rate limit exceeded -- Provide your GitHub user name and password to get a higher rate limit

This has confused me for some time; I would be grateful if you could help me. Thank you.

Singularity Build gramtools issue

Hello:
The singularity image build command fails at the point of installing gramtools. Below is a snapshot of the command and error messages:

singularity build clockwork_container.img clockwork_container.def

________________________ gramtools _________________________#
pip3 install --process-dependency-links wheel git+https://github.com/iqbal-lab-org/gramtools@9313eceb606a6fc159e4a14c168b7a6f888c5ed2

.........

.........

SUCCESS: sdsl was installed successfully!
The sdsl include files are located in '/tmp/pip-041uygn9-build/cmake-build-release/libgramtools/include'.
The library files are located in '/tmp/pip-041uygn9-build/cmake-build-release/libgramtools/lib'.

Sample programs can be found in the examples-directory.
A program 'example.cpp' can be compiled with the command:
g++ -std=c++11 -DNDEBUG -O3 [-msse4.2] \
   -I/tmp/pip-041uygn9-build/cmake-build-release/libgramtools/include -L/tmp/pip-041uygn9-build/cmake-build-release/libgramtools/lib \
   example.cpp -lsdsl -ldivsufsort -ldivsufsort64

Tests in the test-directory
A cheat sheet in the extras/cheatsheet-directory.
Have fun!
[ 28%] No install step for 'sdsl'
[ 29%] No test step for 'sdsl'
[ 30%] Completed 'sdsl'
[ 30%] Built target sdsl
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
ERROR: gramtools backend compilation returned:  2

----------------------------------------

Command "/usr/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-041uygn9-build/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-o6jmmrip-record/install-record.txt --single-version-externally-managed --compile" failed with error code 255 in /tmp/pip-041uygn9-build/
FATAL: While performing build: while running engine: exit status 1

bufio.Scanner: token too long

Hello,

When I run sudo singularity build clockwork.img clockwork_container.def, it throws the error: 2021/11/13 15:54:03 bufio.Scanner: token too long FATAL: Unable to build from clockwork_container.def: while parsing definition: clockwork_container.def: bufio.Scanner: token too long.


But using the example from the official Singularity documentation, the build runs successfully.


Can you tell me what I should do?

thanks.

How to access the output?

clockwork was installed on my server with a database. It runs OK, but I would like to know the best way to access the pipeline outputs. In particular, I would like to be able to export:

  1. the variant table,
  2. the number or % of reads assigned to each reference genome (or any other report with the details of the contamination step),
  3. some FastQC-like output, in particular the 'Per base sequence quality' (is this possible?),
  4. mapping statistics: median/average depth and coverage breadth.

Thanks so much

QC report

  • Coverage stats
  • Any evidence of NTM contamination
  • % human
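
A rough sketch of the contamination side of such a report (the NTM flag and % human), assuming per-group read counts have already been extracted, e.g. from the remove contamination counts file; the group names and the 1% NTM threshold are placeholders, not clockwork's actual conventions.

def qc_summary(read_counts, human_group="Human", ntm_groups=("NTM",)):
    """Turn per-group read counts into the percentages the report needs.

    read_counts: dict of contamination group -> number of reads. The group
    names are assumptions; the real names come from the remove_contam
    metadata TSV.
    """
    total = sum(read_counts.values())
    pct = {group: 100.0 * n / total for group, n in read_counts.items()} if total else {}
    report = {
        "percent_human": pct.get(human_group, 0.0),
        "percent_ntm": sum(pct.get(g, 0.0) for g in ntm_groups),
    }
    report["ntm_contamination_flag"] = report["percent_ntm"] > 1.0  # arbitrary threshold
    return report


print(qc_summary({"TB": 950000, "Human": 30000, "NTM": 20000}))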

Summary report generator for first release of clockwork variants

  1. Stats on numbers of SNPs, indels etc (total number of SNPs, indels at different sizes, and then numbers of SNPs and indels per sample); see the sketch after this list
  2. Stats on number of known/unknown AMR mutations
  3. Number of samples with a deleted/broken AMR gene
  4. Plot on fixed tree (split by country) (Phelim has a tree and methods)
  5. Lineage breakdown for each country
  6. Number of mixed samples per country
  7. Scatter Plot of number of hets per sample
  8. Plot across chromosome of SNP density summing across all samples
  9. Frequency distribution of SNPs
  10. Summary of Mykrobe results: counts of each AMR SNP/indel per country?
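
As a sketch for item 1, SNP and indel counts can be pulled from a standard VCF by comparing REF/ALT lengths; the per-sample loop in the usage comment assumes one single-sample VCF per sample, which is an assumption rather than a statement about the pipeline's layout.

import gzip


def count_variants(vcf_path):
    """Count SNPs, and indels broken down by size, in one VCF file."""
    snps = 0
    indel_sizes = {}
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as f:
        for line in f:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            ref, alts = fields[3], fields[4].split(",")
            for alt in alts:
                if alt in {".", "*"}:
                    continue
                if len(ref) == 1 and len(alt) == 1:
                    snps += 1
                else:
                    size = len(alt) - len(ref)  # negative = deletion, positive = insertion
                    indel_sizes[size] = indel_sizes.get(size, 0) + 1
    return snps, indel_sizes


# Hypothetical usage, one VCF per sample:
# for sample, vcf in {"sample1": "sample1.final.vcf"}.items():
#     print(sample, *count_variants(vcf))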

code formatting

Run Black on all the Python code (e.g. black . from the repository root), so it's nice and consistent.

gVCF output

Specifically for PHE, they want to collect per-sample VCFs and use them for distance calculations without ever having to regenotype. Their hack/route is to dump a gVCF and fasta for each sample. At every position of the reference, you either give the called allele from a normal VCF, give the ref allele if uncalled and the pileup supports it, or give N if unsure.

Comments

  1. Given the high discovery power of clockwork, just assuming REF everywhere not called is not terrible, especially given we assume the use of a mask. So the cheap and dirty solution is simply to do that. Not super-keen.
  2. We could scan the pileup quickly and collect a list of confident REF sites, and then combine this with the normal VCF, automatically calling N at all the places that are neither confident REF nor in the normal VCF. A rough sketch of this follows below.
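
A very rough sketch of option 2 using pysam's pileup. The "confident REF" rule here (minimum depth and minimum fraction of reads matching the reference) is a placeholder rather than clockwork's actual criterion, and overlaying the called alleles from the normal VCF is left out.

import pysam  # assumed available


def per_position_status(bam_path, ref_fasta, chrom, min_depth=5, min_ref_frac=0.9):
    """Yield (1-based position, 'REF' or 'N') for every position of one chromosome.

    Called ALT positions from the normal VCF would be overlaid afterwards.
    The thresholds are placeholders, not clockwork's actual criteria.
    """
    ref_seq = pysam.FastaFile(ref_fasta).fetch(chrom).upper()
    depth = [0] * len(ref_seq)
    ref_support = [0] * len(ref_seq)
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for column in bam.pileup(chrom):
            i = column.reference_pos  # 0-based
            for read in column.pileups:
                if read.is_del or read.is_refskip or read.query_position is None:
                    continue
                depth[i] += 1
                if read.alignment.query_sequence[read.query_position] == ref_seq[i]:
                    ref_support[i] += 1
    for i in range(len(ref_seq)):
        confident = depth[i] >= min_depth and ref_support[i] >= min_ref_frac * depth[i]
        yield i + 1, ("REF" if confident else "N")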

Installation error

I got the error "ERROR : Failed to set process capabilities" at the command "rsync -a ../python ../nextflow ../scripts /tmp/sbuild-670860823/fs/clockwork/" while I was building clockwork_container.img.
I have been stuck at this step for a few days now; can anyone help me?

Run pipelines with multiple threads

I guess that you have parallelisation set up within the database way of organising jobs, but it might be nice to expose a threads option for e.g. qc.nf, variant_call.nf, etc.

Remove_contamination command throws up AssertionError

I tried running this command to remove contaminating reads, using the reference prepared following the instructions on your wiki:

clockwork remove_contam ref_data/OUT/remove_contam_metadata.tsv ERR552106_sort.bam ERR552106_count.txt ERR552106_1.fastq ERR552106_2.fastq

I ended up with this error message below:

Traceback (most recent call last):
  File "/usr/local/bin/clockwork", line 4, in <module>
    __import__('pkg_resources').run_script('clockwork==0.9.0', 'clockwork')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/EGG-INFO/scripts/clockwork", line 960, in <module>
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/tasks/remove_contam.py", line 17, in run
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/contam_remover.py", line 177, in run
AssertionError

I am not sure if I am doing something wrong, but the AssertionError suggests maybe something to check in the actual code?

Issue while running reference_prepare

Hi,
I'm trying to prepare the reference but encountered the following issue:

$singularity exec ../singularity/clockwork_container.img clockwork reference_prepare --contam_tsv reference_info.tsv --outdir OUT references.fasta
[2021-03-30T08:38:23 - clockwork reference_prepare - INFO] Run command: seqtk seq -C -l 60 references.fasta > /home/carolyn/clockwork/ref_data/OUT/ref.fa
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] stdout:
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] stderr:
[2021-03-30T08:38:33 - clockwork reference_prepare - INFO] Run command: samtools faidx /home/carolyn/clockwork/ref_data/OUT/ref.fa
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] stdout:
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] stderr:
[2021-03-30T08:38:50 - clockwork reference_prepare - INFO] Run command: bwa index -a bwtsw /home/carolyn/clockwork/ref_data/OUT/ref.fa
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stdout:
[bwt_gen] Finished constructing BWT in 882 iterations.
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stderr:
[bwa_index] Pack FASTA... 30.88 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=8140015626, availableWord=584761636
[BWTIncConstructFromPacked] 10 iterations done. 99999994 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999994 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 299999994 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 399999994 characters processed.
[BWTIncConstructFromPacked] 50 iterations done. 499999994 characters processed.
[BWTIncConstructFromPacked] 60 iterations done. 599999994 characters processed.
[BWTIncConstructFromPacked] 70 iterations done. 699999994 characters processed.
[BWTIncConstructFromPacked] 80 iterations done. 799999994 characters processed.
[BWTIncConstructFromPacked] 90 iterations done. 899999994 characters processed.
[BWTIncConstructFromPacked] 100 iterations done. 999999994 characters processed.
[BWTIncConstructFromPacked] 110 iterations done. 1099999994 characters processed.
[BWTIncConstructFromPacked] 120 iterations done. 1199999994 characters processed.
[BWTIncConstructFromPacked] 130 iterations done. 1299999994 characters processed.
[BWTIncConstructFromPacked] 140 iterations done. 1399999994 characters processed.
[BWTIncConstructFromPacked] 150 iterations done. 1499999994 characters processed.
[BWTIncConstructFromPacked] 160 iterations done. 1599999994 characters processed.
[BWTIncConstructFromPacked] 170 iterations done. 1699999994 characters processed.
[BWTIncConstructFromPacked] 180 iterations done. 1799999994 characters processed.
[BWTIncConstructFromPacked] 190 iterations done. 1899999994 characters processed.
[BWTIncConstructFromPacked] 200 iterations done. 1999999994 characters processed.
[BWTIncConstructFromPacked] 210 iterations done. 2099999994 characters processed.
[BWTIncConstructFromPacked] 220 iterations done. 2199999994 characters processed.
[BWTIncConstructFromPacked] 230 iterations done. 2299999994 characters processed.
[BWTIncConstructFromPacked] 240 iterations done. 2399999994 characters processed.
[BWTIncConstructFromPacked] 250 iterations done. 2499999994 characters processed.
[BWTIncConstructFromPacked] 260 iterations done. 2599999994 characters processed.
[BWTIncConstructFromPacked] 270 iterations done. 2699999994 characters processed.
[BWTIncConstructFromPacked] 280 iterations done. 2799999994 characters processed.
[BWTIncConstructFromPacked] 290 iterations done. 2899999994 characters processed.
[BWTIncConstructFromPacked] 300 iterations done. 2999999994 characters processed.
[BWTIncConstructFromPacked] 310 iterations done. 3099999994 characters processed.
[BWTIncConstructFromPacked] 320 iterations done. 3199999994 characters processed.
[BWTIncConstructFromPacked] 330 iterations done. 3299999994 characters processed.
[BWTIncConstructFromPacked] 340 iterations done. 3399999994 characters processed.
[BWTIncConstructFromPacked] 350 iterations done. 3499999994 characters processed.
[BWTIncConstructFromPacked] 360 iterations done. 3599999994 characters processed.
[BWTIncConstructFromPacked] 370 iterations done. 3699999994 characters processed.
[BWTIncConstructFromPacked] 380 iterations done. 3799999994 characters processed.
[BWTIncConstructFromPacked] 390 iterations done. 3899999994 characters processed.
[BWTIncConstructFromPacked] 400 iterations done. 3999999994 characters processed.
[BWTIncConstructFromPacked] 410 iterations done. 4099999994 characters processed.
[BWTIncConstructFromPacked] 420 iterations done. 4199999994 characters processed.
[BWTIncConstructFromPacked] 430 iterations done. 4299999994 characters processed.
[BWTIncConstructFromPacked] 440 iterations done. 4399999994 characters processed.
[BWTIncConstructFromPacked] 450 iterations done. 4499999994 characters processed.
[BWTIncConstructFromPacked] 460 iterations done. 4599999994 characters processed.
[BWTIncConstructFromPacked] 470 iterations done. 4699999994 characters processed.
[BWTIncConstructFromPacked] 480 iterations done. 4799999994 characters processed.
[BWTIncConstructFromPacked] 490 iterations done. 4899999994 characters processed.
[BWTIncConstructFromPacked] 500 iterations done. 4999999994 characters processed.
[BWTIncConstructFromPacked] 510 iterations done. 5099999994 characters processed.
[BWTIncConstructFromPacked] 520 iterations done. 5199999994 characters processed.
[BWTIncConstructFromPacked] 530 iterations done. 5299999994 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 5399999994 characters processed.
[BWTIncConstructFromPacked] 550 iterations done. 5499999994 characters processed.
[BWTIncConstructFromPacked] 560 iterations done. 5599999994 characters processed.
[BWTIncConstructFromPacked] 570 iterations done. 5699999994 characters processed.
[BWTIncConstructFromPacked] 580 iterations done. 5799999994 characters processed.
[BWTIncConstructFromPacked] 590 iterations done. 5899999994 characters processed.
[BWTIncConstructFromPacked] 600 iterations done. 5999999994 characters processed.
[BWTIncConstructFromPacked] 610 iterations done. 6099999994 characters processed.
[BWTIncConstructFromPacked] 620 iterations done. 6199999994 characters processed.
[BWTIncConstructFromPacked] 630 iterations done. 6299999994 characters processed.
[BWTIncConstructFromPacked] 640 iterations done. 6399999994 characters processed.
[BWTIncConstructFromPacked] 650 iterations done. 6499999994 characters processed.
[BWTIncConstructFromPacked] 660 iterations done. 6599999994 characters processed.
[BWTIncConstructFromPacked] 670 iterations done. 6699999994 characters processed.
[BWTIncConstructFromPacked] 680 iterations done. 6799999994 characters processed.
[BWTIncConstructFromPacked] 690 iterations done. 6899999994 characters processed.
[BWTIncConstructFromPacked] 700 iterations done. 6999999994 characters processed.
[BWTIncConstructFromPacked] 710 iterations done. 7099999994 characters processed.
[BWTIncConstructFromPacked] 720 iterations done. 7199999994 characters processed.
[BWTIncConstructFromPacked] 730 iterations done. 7299999994 characters processed.
[BWTIncConstructFromPacked] 740 iterations done. 7399999994 characters processed.
[BWTIncConstructFromPacked] 750 iterations done. 7499321674 characters processed.
[BWTIncConstructFromPacked] 760 iterations done. 7589896394 characters processed.
[BWTIncConstructFromPacked] 770 iterations done. 7670395434 characters processed.
[BWTIncConstructFromPacked] 780 iterations done. 7741939178 characters processed.
[BWTIncConstructFromPacked] 790 iterations done. 7805523434 characters processed.
[BWTIncConstructFromPacked] 800 iterations done. 7862033258 characters processed.
[BWTIncConstructFromPacked] 810 iterations done. 7912255322 characters processed.
[BWTIncConstructFromPacked] 820 iterations done. 7956888826 characters processed.
[BWTIncConstructFromPacked] 830 iterations done. 7996555162 characters processed.
[BWTIncConstructFromPacked] 840 iterations done. 8031806682 characters processed.
[BWTIncConstructFromPacked] 850 iterations done. 8063134298 characters processed.
[BWTIncConstructFromPacked] 860 iterations done. 8090974314 characters processed.
[BWTIncConstructFromPacked] 870 iterations done. 8115714554 characters processed.
[BWTIncConstructFromPacked] 880 iterations done. 8137699706 characters processed.
[bwa_index] 3524.45 seconds elapse.
[bwa_index] Update BWT... 23.89 sec
[bwa_index] Pack forward-only FASTA... 22.18 sec
[bwa_index] Construct SA from BWT and Occ... 1384.14 sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa index -a bwtsw /home/carolyn/clockwork/ref_data/OUT/ref.fa
[main] Real time: 5030.994 sec; CPU: 4985.536 sec
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] Run command: rsync reference_info.tsv /home/carolyn/clockwork/ref_data/OUT/remove_contam_metadata.tsv
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] Return code: 0
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stdout:
[2021-03-30T10:02:41 - clockwork reference_prepare - INFO] stderr:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/contam_remover.py", line 71, in _load_metadata_file
ValueError: not enough values to unpack (expected at least 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/clockwork", line 4, in <module>
    __import__('pkg_resources').run_script('clockwork==0.9.0', 'clockwork')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/EGG-INFO/scripts/clockwork", line 960, in <module>
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/tasks/reference_prepare.py", line 47, in run
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/reference_dir.py", line 73, in add_remove_contam_metadata_tsv
  File "/usr/local/lib/python3.6/dist-packages/clockwork-0.9.0-py3.6.egg/clockwork/contam_remover.py", line 73, in _load_metadata_file
clockwork.contam_remover.Error: Error parsing line:
Virus 1 NC_001802.1

I couldn't upload the references.fasta file since it's too large (3.9G), but it can be downloaded here:
https://1drv.ms/u/s!Ahu3aHGoa85BhpgJ54IOPupzotYjFA?e=MyObb2
(md5sum 0c3d0d79c6d5fa163423cfbea91917ad)

And the tsv file is this one:

reference_info.tsv.tar.gz

I used the latest versions of the genome sequences I could find and made the tsv file following the instructions from this link: https://github.com/iqbal-lab-org/clockwork/wiki/Preparing-remove-contamination-reference-data. Did I do something wrong? Thank you in advance!

Is minos/final.vcf the output to take forward?

Just a minor documentation thing really: I'm guessing that minos/final.vcf is the file to take forward for further analyses, but it would be good for this to be mentioned in the documentation somewhere.

Error when removing contamination

Hi,

I'm trying to remove contamination using the following commands:

$nextflow run ~/clockwork/nextflow/remove_contam.nf -with-singularity ~/clockwork/singularity/clockwork_container.img --ref_fasta ~/clockwork/ref_data/OUT/ref.fa --ref_metadata_tsv ~/clockwork/ref_data/OUT/remove_contam_metadata.tsv --reads_in1 1_1.fq --reads_in2 1_2.fq --outprefix remove_contam.out

and got the error message:

N E X T F L O W ~ version 20.10.0
Launching /home/carolyn/clockwork/nextflow/remove_contam.nf [lonely_church] - revision: b2f84f8ad4
executor > local (1)
[9d/405c46] process > make_jobs_tsv [ 0%] 0 of 1
[- ] process > map_reads -
[- ] process > sam_to_fastq_files -
executor > local (1)
[9d/405c46] process > make_jobs_tsv [100%] 1 of 1, failed: 1 ✘
[- ] process > map_reads -
[- ] process > sam_to_fastq_files -
Error executing process > 'make_jobs_tsv'

Caused by:
Process make_jobs_tsv terminated with an error exit status (1)

Command executed:

echo "reads_in1 reads_in2 counts_tsv reads_contam1 reads_contam2 reads_remove_contam1 reads_remove_contam2 sample_id seqrep_id isolate_id sequence_replicate_number reference_id ref_fasta contam_tsv" > tmp.tsv
echo "/media/carolyn/8T/fastq_rmcontam_clockwork/1_1.fq /media/carolyn/8T/fastq_rmcontam_clockwork/1_2.fq /media/carolyn/8T/fastq_rmcontam_clockwork/remove_contam.out.counts.tsv /media/carolyn/8T/fastq_rmcontam_clockwork/remove_contam.out.contam.1.fq.gz /media/carolyn/8T/fastq_rmcontam_clockwork/remove_contam.out.contam.2.fq.gz /media/carolyn/8T/fastq_rmcontam_clockwork/remove_contam.out.remove_contam.1.fq.gz /media/carolyn/8T/fastq_rmcontam_clockwork/remove_contam.out.remove_contam.2.fq.gz . . . . . /home/carolyn/clockwork/ref_data/OUT/ref.fa /home/carolyn/clockwork/ref_data/OUT/remove_contam_metadata.tsv" >> tmp.tsv
sed 's/ / /g' tmp.tsv > jobs_tsv
rm tmp.tsv

Command exit status:
1

Command output:
(empty)

Command error:
/bin/bash: line 0: cd: /media/carolyn/8T/fastq_rmcontam_clockwork/work/9d/405c46ed86f696e8fc74ab97c6cde7: No such file or directory
/bin/bash: .command.sh: No such file or directory

Work dir:
/media/carolyn/8T/fastq_rmcontam_clockwork/work/9d/405c46ed86f696e8fc74ab97c6cde7

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

The message says "/media/carolyn/8T/fastq_rmcontam_clockwork/work/9d/405c46ed86f696e8fc74ab97c6cde7: No such file or directory", but the directory is there. I'm not sure whether this is the problem or something else went wrong. Any idea?

Add a final report

Some stats we could add, requested on our WHO/UNITAID call:

  1. output a warning if any resistance gene has <99% of the gene at >5x depth
  2. warn if <95% of the genome has >5x depth

Summarise QC, Mykrobe, variant calling. A rough sketch of the two depth checks follows below.
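
A possible sketch of the two depth checks, assuming per-position depths are available in samtools depth -a style output (chromosome, position, depth per line) and that the resistance gene coordinates are supplied separately; the thresholds are the ones listed above.

def load_depths(depth_file):
    """Read 'samtools depth -a' style output: chrom, pos, depth, tab-separated."""
    depths = {}
    with open(depth_file) as f:
        for line in f:
            chrom, pos, depth = line.split()
            depths.setdefault(chrom, {})[int(pos)] = int(depth)
    return depths


def fraction_above(depths, chrom, start, end, min_depth=5):
    """Fraction of positions in [start, end] (1-based, inclusive) with depth > min_depth."""
    covered = sum(1 for p in range(start, end + 1) if depths.get(chrom, {}).get(p, 0) > min_depth)
    return covered / (end - start + 1)


def report_warnings(depths, chrom, genome_length, resistance_genes):
    """resistance_genes: {name: (start, end)}; the coordinates are assumptions here."""
    warnings = []
    for name, (start, end) in resistance_genes.items():
        if fraction_above(depths, chrom, start, end) < 0.99:
            warnings.append("WARNING: <99% of {} covered at >5x depth".format(name))
    if fraction_above(depths, chrom, 1, genome_length) < 0.95:
        warnings.append("WARNING: <95% of the genome covered at >5x depth")
    return warnings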

Check read names on import

Import checks that the fwd/rev FASTQ files are valid FASTQ and have the same number of reads. The read names can mismatch between the fwd/rev files but the files still import. Then the remove contam pipeline fails when BWA MEM throws an error ([mem_sam_pe] paired reads have different names...).

Could add a check during import that every N-th read name matches, as sketched below.
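
A sketch of the proposed check: compare the name of every N-th read between the forward and reverse FASTQ files, stripping a trailing /1 or /2 so mate names can be compared. The sampling interval is arbitrary.

import gzip
import itertools


def read_names(fastq_path):
    """Yield read names: header line minus '@', first whitespace-separated token."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as f:
        for header in itertools.islice(f, 0, None, 4):
            yield header[1:].split()[0]


def strip_pair_suffix(name):
    """Drop a trailing /1 or /2 so forward/reverse mate names can match."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_read_names(fwd_fastq, rev_fastq, check_every=1000):
    """Raise if sampled read names differ between the two files."""
    names = zip(read_names(fwd_fastq), read_names(rev_fastq))
    for i, (name1, name2) in enumerate(names):
        if i % check_every:
            continue
        if strip_pair_suffix(name1) != strip_pair_suffix(name2):
            raise ValueError(
                "Read name mismatch at read {}: {} vs {}".format(i, name1, name2)
            )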

Multi-thread remove contamination option?

Hello,

I'm running the remove_contam.nf script; is there a way to run this workflow with multiple threads? Since we are mapping to large ref genome(s), it takes some time.

Thanks,

Phil

Error building singularity container

I had this error after a while processing the singularity build command:

Error running: enaDataGet -d tmp.7933 -f fasta CVQQ00000000 at /clockwork/scripts/make_pipeline_reference_files.pl line 189.
ABORT: Aborting with RETVAL=255

I do not know what the issue is; it just says something unexpected went wrong, to try again, and if the problem persists, to ask for assistance.

Thanks for your time.

Option to not download ref files during singularity build

The script make_pipeline_reference_files.pl is having issues downloading genomes (see #49 and #52).

It is run at the end of the singularity build:
https://github.com/iqbal-lab-org/clockwork/blob/master/singularity/clockwork_container.def#L29
It downloads TB reference genomes and genomes used to check for contamination, and puts them inside the container, with the intention of making reproducibility easier.

I propose we make running that script optional at singularity build time. It could be run post-build, without putting the ref files in the container. The advantages of downloading after building the container are:

  1. the script can continue where it left off if it crashes (note that it recognises genomes from the ENA and doesn't get them again; however, it doesn't do that for hmpdacc, so ideally that could be fixed), unlike during a singularity build, where any error unrecoverably aborts the build.
  2. if clockwork isn't being used for Mtb, then none of these files are wanted anyway.

I have just had the script hang when running enaDataGet. I killed it and restarted and then it finished ok. Hence point 1.

Error when running qc.nf

To execute quality control on fastq files, the command supplied in the tutorial is:

nextflow run nextflow/qc.nf \
  -with-singularity clockwork_container.img \
  --ref_fasta Ref.QC_and_map/ref.fa \
  --reads_in1 reads_1.fastq --reads_in2 reads_2.fastq \
  --output_dir qc_out

(I am able to prepare the reference genome). However, I get the following error:

N E X T F L O W ~ version 22.04.0
Launching ../nextflow/qc.nf [elegant_carlsson] DSL2 - revision: 4bda30ed1d
No such variable: jobs_tsv_channel

-- Check script 'nextflow/qc.nf' at line: 154 or see '.nextflow.log' file for more details

I don't know if I have to specify something else when running nextflow. On the other hand, if I comment out that line, I get the following error:

N E X T F L O W ~ version 22.04.0
Launching ../nextflow/qc.nf [agitated_mestorf] DSL2 - revision: e5636d66a5
Missing workflow definition - DSL2 requires at least a workflow block in the main script

It seems that a workflow has not been defined in the qc.nf file.

If anyone knows how to solve the problem, please let me know. Thank you.

Commands run and command output

It would be nice if the clockwork command would output:

  • the full commands run
  • the output of those commands

I assume it's all sequential, so the output could be something like:

*** running command: bwa mem .....
stdout:
<command stdout here>
stderr:
<command stderr here>
*** running command: ...

This could be written either to stdout or to a file.
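
A minimal sketch of a wrapper that would produce roughly that output; this is not clockwork's actual utils.syscall, just an illustration using the standard library.

import subprocess
import sys


def run_and_log(command, log=sys.stdout):
    """Run one shell command, echoing the command and its stdout/stderr."""
    print("*** running command: {}".format(command), file=log)
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    print("stdout:", file=log)
    print(result.stdout, end="", file=log)
    print("stderr:", file=log)
    print(result.stderr, end="", file=log)
    result.check_returncode()  # raise CalledProcessError if the command failed
    return result


# Hypothetical usage; the pipeline's real commands (bwa mem, samtools, ...) would go here:
# run_and_log("samtools --version")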

Error connecting to database

Hello, I'm following your instructions for the "Walkthrough: using tracking database" and I seem to be encountering the error below when I try to run the following command:

singularity exec clockwork_v0.11.1.img clockwork make_empty_db db.ini

Error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.1-py3.8.egg/clockwork/db_maker.py", line 7, in __init__
  File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.1-py3.8.egg/clockwork/db_connection.py", line 7, in __init__
  File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.1-py3.8.egg/clockwork/db_connection.py", line 50, in _parse_config_file
  File "/usr/lib/python3.8/configparser.py", line 960, in __getitem__
    raise KeyError(key)
KeyError: 'db_login'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/clockwork", line 4, in <module>
    __import__('pkg_resources').run_script('clockwork==0.11.1', 'clockwork')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1470, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.1-py3.8.egg/EGG-INFO/scripts/clockwork", line 1019, in <module>
  File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.1-py3.8.egg/clockwork/tasks/make_empty_db.py", line 5, in run
  File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.1-py3.8.egg/clockwork/db_maker.py", line 9, in __init__
Exception: Error connecting to database

I’m pretty confident about the contents of my db.ini file. They really follow what you’ve given as an example here:

[db_login]
user = xxxxxxxx
password = xxxxxxxxx
host = locahost
db = clockwork_db

Would you have any idea what could be going on and what I can do to solve this? Is the problem on your end or mine? Any insight will be most appreciated, thank you!
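
This is not clockwork-specific, but the traceback goes through Python's configparser, which raises exactly this KeyError whenever the [db_login] section is absent from whatever it managed to parse, and ConfigParser.read() silently ignores a file it cannot find. A quick check, independent of clockwork and assuming db.ini is in the current directory, is:

import configparser

config = configparser.ConfigParser()
parsed = config.read("db.ini")      # list of files actually parsed
print("parsed:", parsed)            # [] means the file was not found or not readable
print("sections:", config.sections())

# config["db_login"] raises KeyError: 'db_login' both when the file was never
# read (wrong path, e.g. not visible inside the container) and when the
# section header is misspelled, so it is worth checking both.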
