hammerpede's Introduction

Hammerpede: training profile HMMs for primers from real Nanopore data

Hammerpede is a package for building strand-specific profile HMMs for a set of primers from real Oxford Nanopore Technologies reads. The resulting models can be used by the pychopper package to identify and orient full-length cDNA reads.

Getting Started

Dependencies

The required Python packages are installed by either pip or conda. The profile HMM alignment backend depends on the latest hmmer package, which can be installed using conda:

conda install -c bioconda hmmer

The package also requires the latest spoa, which is best installed from source following the developers' instructions.
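As a quick pre-flight check, the sketch below looks for the external executables on PATH before a long run is started. The executable names are assumptions: spoa is the command name that shows up in hammerpede's own error output, while hmmbuild is assumed here to be the hmmer program the package relies on. Adjust the list if your installation differs.

```python
import shutil

# Assumed executable names: "spoa" from the spoa package and "hmmbuild"
# from hmmer. Change these if your setup uses different tools.
REQUIRED_TOOLS = ("spoa", "hmmbuild")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

Running this inside the environment used for hammerpede shows which tools, if any, still need to be installed.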

Installation

Install via pip:

pip install git+https://github.com/nanoporetech/hammerpede.git

After installation, the tests can be run with:

make test

Run make help to list the available make targets.

Usage

usage: hp_bootstrap.py [-h] -f query_fasta -o outdir [-i input_format]
                       [-g aln_params] [-s min_score]
                       input_fastx

Tool train strand-specific profile HMMs of primers from real Nanopore reads.

positional arguments:
  input_fastx      Input read fastq.

optional arguments:
  -h, --help       show this help message and exit
  -f query_fasta   Fasta with primer sequences.
  -o outdir        Output directory.
  -i input_format  Input/output format (fastq).
  -g aln_params    Alignment parameters (match, mismatch,gap_open,gap_extend).
  -s min_score     Score cutoff (0.8).

Example usage (see also test/Makefile):

hp_bootstrap.py -f cDNA_SSP_VNP_full.fas -o test_output -s 0.75 SIRV_E0_pcs109_1k.fq

The profile HMMs produced can be visualized using Skylign. For example, the VNP primer logo might look like this:

[Figure: sequence logo of the VNP primer profile HMM]

Contributing

  • Please fork the repository and create a merge request to contribute.
  • Use bumpversion to manage package versioning.
  • The code should be PEP8 compliant, which can be tested by make lint.

Help

Licence and Copyright

(c) 2019 Oxford Nanopore Technologies Ltd.

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

FAQs and tips

References and Supporting Information

Research Release

Research releases are provided as technology demonstrators to give early access to features or to stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. Much as we would like to address every issue and piece of feedback users may have, the developers may have limited resources for supporting this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.

See the post announcing the tools at the Oxford Nanopore Technologies Community here.

hammerpede's People

Contributors

asaont, bsipos

hammerpede's Issues

Error message when trying out the test

Hi, I have tried to install the hammerpede tool. I tried out my installation using your test files, and I got an error.

This is the error output:

hp_bootstrap.py -f cDNA_SSP_VNP_full.fas -o test_output -s 0.75 SIRV_E0_pcs109_1k.fq
Traceback (most recent call last):
  File "/cluster/projects/nn9305k/src/miniconda/envs/hammerpede_v0/bin/hp_bootstrap.py", line 64, in <module>
    queries = seq_detect.load_queries(args.f, args.s, None, ALIGN_PARAMS)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/projects/nn9305k/src/miniconda/envs/hammerpede_v0/lib/python3.12/site-packages/hammerpede/seq_detect.py", line 32, in load_queries
    for sr in seu.read_seq_records(in_fas):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/projects/nn9305k/src/miniconda/envs/hammerpede_v0/lib/python3.12/site-packages/hammerpede/seq_utils.py", line 115, in read_seq_records
    handle = open(handle, "rU")
             ^^^^^^^^^^^^^^^^^^
ValueError: invalid mode: 'rU'

I am wondering if this is due to my installation or due to something else.

How I installed the tool

  • I first created a conda environment called hammerpede_v0 from a yml file.
    The yml file:
name: hammerpede_v0
channels:
 - conda-forge
 - bioconda
 - defaults
dependencies:
 - hmmer
 - spoa=4.1.4
 - pip
 - wheel=0.43.0

I installed the environment, like this:

conda env create --prefix /cluster/projects/nn9305k/src/miniconda/envs/hammerpede_v0 --file yaml/hammerpede_v0.yml

pip and wheel are needed in the environment; otherwise the system pip is used, and the tools from the next step are installed outside of the conda environment.

I then ran the command:

pip install git+https://github.com/nanoporetech/hammerpede.git

After that I have this:

 which hp_bootstrap.py
/cluster/projects/nn9305k/src/miniconda/envs/hammerpede_v0/bin/hp_bootstrap.py

and the tool can be called

 hp_bootstrap.py -h
usage: hp_bootstrap.py [-h] -f query_fasta -o outdir [-i input_format] [-g aln_params] [-s min_score] input_fastx

Tool train strand-specific profile HMMs of primers from real Nanopore reads.

positional arguments:
  input_fastx      Input read fastq.

options:
  -h, --help       show this help message and exit
  -f query_fasta   Fasta with primer sequences.
  -o outdir        Output directory.
  -i input_format  Input/output format (fastq).
  -g aln_params    Alignment parameters (match, mismatch,gap_open,gap_extend).
  -s min_score     Score cutoff (0.8).

Any idea where the error could be coming from?
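One likely explanation, offered as a sketch and not as a confirmed fix: the 'U' (universal newlines) flag of open() was deprecated for years and removed in Python 3.11, so open(handle, "rU") raises this ValueError on the Python 3.12 shown in the traceback, regardless of how the package was installed. Plain text mode "r" already translates newlines in Python 3. The function below is a hypothetical stand-in for the failing call, not the package's actual code. Creating the environment with an older Python (3.10 or earlier) would be the alternative workaround.

```python
import os
import tempfile


def open_text(path):
    """Open `path` for reading in text mode.

    Hypothetical replacement for open(path, "rU"): mode "r" already applies
    universal-newline translation in Python 3, so "U" is redundant.
    """
    return open(path, "r")


# Small demonstration using a file with Windows-style line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".fas", delete=False) as tmp:
    tmp.write(b">VNP\r\nACGT\r\n")
with open_text(tmp.name) as handle:
    lines = handle.readlines()
os.remove(tmp.name)
print(lines)  # each line ends in "\n" rather than "\r\n"
```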

Hammerpede is stuck, seemingly after starting the 'spoa' stage

I'm running the following command:

hp_bootstrap.py -f telo-prime_adaptors.fa -o hammerpede_output <(gunzip -c ../qscore7.fastq.gz)

It progresses normally at first and takes ~24 hours to make four fasta files, called hits_*.fasta (where * is SSP, -SSP, VNP, and -VNP). It produces the following output:

0it [00:00, ?it/s]Extracting primer regions from reads in /dev/fd/63
49153803783it [17:53:00, 763494.19it/s]
Aligning primer regions using spoa: hammerpede_output/spoa_aln_hits_VNP.fasta

It then appears to be stuck at this stage: nothing has progressed for ~96 hours. The node running this command shows a spoa job, but the spoa_aln_hits_VNP.fasta file that was created is still empty.
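The output above shows the spoa step starting on the full set of extracted primer regions, and partial order alignment gets slower as the number of input sequences grows. One thing to try, as a sketch and not an established recommendation, is training on a random subset of reads (the README example uses a small file, SIRV_E0_pcs109_1k.fq). The helper below reservoir-samples a FASTQ stream. It assumes four-line records with no wrapped sequence lines; the function name and sample size are illustrative.

```python
import random
import sys
from itertools import islice


def sample_fastq(lines, k, seed=0):
    """Reservoir-sample up to k four-line FASTQ records from an iterable of lines."""
    rng = random.Random(seed)
    reservoir = []
    it = iter(lines)
    n = 0  # number of complete records seen so far
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (an incomplete trailing record is dropped)
        n += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            # Keep the new record with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__" and len(sys.argv) > 1:
    # Reads FASTQ on stdin and writes the sampled records to stdout.
    for rec in sample_fastq(sys.stdin, int(sys.argv[1])):
        sys.stdout.writelines(rec)
```

Saved as a script (the file name is up to you), it could be used as gunzip -c ../qscore7.fastq.gz | python sample_fastq.py 100000 > subset.fq, with subset.fq then passed to hp_bootstrap.py as the input read file.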

spoa non-zero exit status 132

I was trying to generate an HMM file for later use in pychopper with the command:
hp_bootstrap.py -f /home/anshul1/scratch/seq/our_primers.fas -o $outDir $inDir/${filename}.fastq
(Input reads: 74M)
when I hit the following error:

Aligning primer regions using spoa: /home/anshul1/scratch/seq/pychopper_out5_hmm/spoa_aln_hits_REV.fasta
/bin/sh: line 1: 327333 Illegal instruction     (core dumped) spoa -l 1 -r 1 /home/anshul1/scratch/seq/pychopper_out5_hmm/hits_REV.fasta > /home/anshul1/scratch/seq/pychopper_out5_hmm/spoa_aln_hits_REV.fasta
Traceback (most recent call last):
  File "/home/anshul1/virtual_envs/pychopper/bin/hp_bootstrap.py", line 95, in <module>
    spoa.spoa_align(qd[1], aln)
  File "/home/anshul1/virtual_envs/pychopper/lib/python3.10/site-packages/hammerpede/spoa.py", line 8, in spoa_align
    sp.check_call(cmd, shell=True)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'spoa -l 1 -r 1 /home/anshul1/scratch/seq/pychopper_out5_hmm/hits_REV.fasta > /home/anshul1/scratch/seq/pychopper_out5_hmm/spoa_aln_hits_REV.fasta' returned non-zero exit status 132.

My primers were:

$ cat /home/anshul1/scratch/seq/our_primers.fas
>REV
TCTTTCCCTACACGACGCTCTTCCGATCT
>FRW
AAGCAGTGGTATCAACGCAGAGTGAATGGG

More details:

Complete script:

$ cat pychopperHMM_script.sh
#!/bin/bash -l
#SBATCH --error=/home/anshul1/scratch/seq/%x_err_%A_%a.txt
#SBATCH --time=1-10:00:00
#SBATCH --output=/home/anshul1/scratch/seq/%x_out_%A_%a.out
#SBATCH --cpus-per-task=2
#SBATCH --job-name=HMMpychopper
#SBATCH --array=1
#SBATCH --mem-per-cpu=30G
#SBATCH --account=def-noncodo

inDir="/home/anshul1/projects/def-noncodo/anshul1/2023-12-15_seq_JURK_HEK_1"
outDir="/home/anshul1/scratch/seq/pychopper_out5_hmm"
filename="seq_run1n2"

##activating PyChopper virtualenv
module load StdEnv/2020 gcc/9.3.0 parasail python/3.10 spoa/3.4.0
source ~/virtual_envs/pychopper/bin/activate
python -c 'import parasail'
python -c 'import pychopper'
python -c 'import tqdm'
##command:
hp_bootstrap.py -f /home/anshul1/scratch/seq/our_primers.fas -o $outDir $inDir/${filename}.fastq
##pychopper -m hmm -g ${outDir}/FRW_REV.hmm -c primer_config.txt -t 46 -r $outDir/${filename}_report.pdf -S $outDir/${filename}_statistics.tsv -u $outDir/${filename}_unclassified.fastq -w $outDir/${filename}_rescued.fastq $inDir/${filename}.fastq.gz $outDir/${filename}_full_output.fastq

Complete error file:

$ cat HMMpychopper_err_29241032_1.txt

Lmod is automatically replacing "intel/2020.1.217" with "gcc/9.3.0".


Due to MODULEPATH changes, the following have been reloaded:
  1) mii/1.1.2

The following have been reloaded with a version change:
  1) StdEnv/2023 => StdEnv/2020
  2) blis/0.9.0 => blis/0.8.1
  3) flexiblas/3.3.1 => flexiblas/3.0.4
  4) gcc/12.3 => gcc/9.3.0
  5) gcccore/.12.3 => gcccore/.9.3.0
  6) gentoo/2023 => gentoo/2020
  7) libfabric/1.18.0 => libfabric/1.10.1
  8) openmpi/4.1.5 => openmpi/4.0.3
  9) ucx/1.14.1 => ucx/1.8.0

  0%|          | 0/120138152530 [00:00<?, ?it/s]Extracting primer regions from reads in /home/anshul1/projects/def-noncodo/anshul1/2023-12-15_seq_JURK_HEK_1/seq_run1n2.fastq
100%|██████████| 120138152530/120138152530 [24:13:40<00:00, 1377404.48it/s]
Aligning primer regions using spoa: /home/anshul1/scratch/seq/pychopper_out5_hmm/spoa_aln_hits_REV.fasta
/bin/sh: line 1: 327333 Illegal instruction     (core dumped) spoa -l 1 -r 1 /home/anshul1/scratch/seq/pychopper_out5_hmm/hits_REV.fasta > /home/anshul1/scratch/seq/pychopper_out5_hmm/spoa_aln_hits_REV.fasta
Traceback (most recent call last):
  File "/home/anshul1/virtual_envs/pychopper/bin/hp_bootstrap.py", line 95, in <module>
    spoa.spoa_align(qd[1], aln)
  File "/home/anshul1/virtual_envs/pychopper/lib/python3.10/site-packages/hammerpede/spoa.py", line 8, in spoa_align
    sp.check_call(cmd, shell=True)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'spoa -l 1 -r 1 /home/anshul1/scratch/seq/pychopper_out5_hmm/hits_REV.fasta > /home/anshul1/scratch/seq/pychopper_out5_hmm/spoa_aln_hits_REV.fasta' returned non-zero exit status 132.

Output folder:

$ ls -lh
total 2.4G
-rw-rw----. 1 anshul1 anshul1  1.9G May 22 09:54 hits_-FRW.fasta
-rw-rw----. 1 anshul1 anshul1  2.1G May 22 09:54 hits_-REV.fasta
-rw-rw----. 1 anshul1 anshul1  2.0G May 22 09:54 hits_FRW.fasta
-rw-rw----. 1 anshul1 anshul1 1019M May 22 09:54 hits_REV.fasta
-rw-rw----. 1 anshul1 anshul1     0 May 22 09:54 spoa_aln_hits_REV.fasta

My virtual env & its installed packages:

$ pip list --local
Package         Version
--------------- -------------------------
biopython       1.81+computecanada
contourpy       1.2.0+computecanada
cycler          0.12.1+computecanada
edlib           1.3.9+computecanada
exceptiongroup  1.2.1
fonttools       4.51.0+computecanada
Hammerpede      0.1.0
iniconfig       2.0.0+computecanada
kiwisolver      1.4.5+computecanada
matplotlib      3.7.2+computecanada
numpy           1.25.2+computecanada
packaging       24.0
pandas          2.1.1+computecanada
parasail        1.2.4+computecanada
Pillow          10.1.0+computecanada
pip             24.0+computecanada
pluggy          1.5.0+computecanada
pychopper       2.7.9
pyparsing       3.0.9+computecanada
pysam           0.22.0+computecanada
pytest          8.2.0+computecanada
python_dateutil 2.9.0.post0+computecanada
pytz            2024.1+computecanada
setuptools      69.2.0
six             1.16.0+computecanada
tomli           2.0.1+computecanada
tqdm            4.26.0
tzdata          2024.1+computecanada
wheel           0.43.0

Any help in resolving this would be much appreciated.
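For context on the number itself: when a command run through a shell is terminated by a signal, the shell reports 128 plus the signal number, so 132 corresponds to signal 4 (SIGILL), which agrees with the "Illegal instruction" line in the log. That usually means the spoa binary uses CPU instructions that the compute node does not support, which is an inference from the message and not something confirmed for this cluster. The small helper below (a hypothetical name) does the arithmetic.

```python
import signal


def describe_exit_status(status):
    """Translate a shell exit status into a signal name when it encodes one."""
    if status > 128:
        try:
            # Shell convention: status = 128 + signal number.
            return signal.Signals(status - 128).name
        except ValueError:
            pass  # no signal with that number on this platform
    return f"exit code {status}"


print(describe_exit_status(132))  # SIGILL on Linux
```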

Can the process be resumed from the point where hits_-VNP.fasta (and the other three files) have been created?

I ran the following command:
hp_bootstrap.py -f adaptors.fa -o hammerpede_output <(gunzip -c ../qscore7.fastq.gz)
It ran for ~20 hours and finished creating the four hits_*.fasta files. At that point, however, the script was killed with exit code 137. Unfortunately I lost the error messages, but they mentioned spoa. Is it possible to resume the process from the hits_*.fasta files, this time with more memory allocated? If so, could you tell me which script to use?
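Nothing on this page documents a resume option. As a partial workaround sketch, the alignment step can be rerun by hand on an existing hits file: the command below mirrors the spoa -l 1 -r 1 <hits> > <alignment> invocation that appears in hammerpede's error output in another issue on this page. The helper names are hypothetical, and the later model-building steps are not covered. An exit code of 137 is 128 + 9, meaning the process received SIGKILL, which on a cluster frequently points to a memory limit, so requesting more memory for the rerun seems reasonable.

```python
import subprocess
from pathlib import Path


def spoa_command(hits_fasta):
    """Build the spoa argument list seen in hammerpede's error output."""
    return ["spoa", "-l", "1", "-r", "1", str(hits_fasta)]


def rerun_spoa(hits_fasta, out_dir):
    """Run spoa on one hits file and write spoa_aln_<name> into out_dir."""
    hits_fasta = Path(hits_fasta)
    out_path = Path(out_dir) / f"spoa_aln_{hits_fasta.name}"
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if spoa exits with an error.
        subprocess.run(spoa_command(hits_fasta), stdout=out, check=True)
    return out_path
```

For example, rerun_spoa("hammerpede_output/hits_VNP.fasta", "hammerpede_output") would write hammerpede_output/spoa_aln_hits_VNP.fasta, the same file name hammerpede reports when it starts the alignment.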
