uncalled's Introduction

UNCALLED

A Utility for Nanopore Current Alignment to Large Expanses of DNA

A read mapper which rapidly aligns raw nanopore signal to DNA references

Enables software-based targeted sequencing on Oxford Nanopore (ONT) MinION or GridION via adaptive sampling

Note that UNCALLED can only be applied to legacy r9.4.1 data. For r10.4.1 data try ReadFish or ONT's builtin adaptive sampling option.

Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED
Sam Kovaka, Yunfan Fan, Bohan Ni, Winston Timp, Michael C. Schatz
Nature Biotechnology (2020)

For accurate end-to-end nanopore signal alignment, visualization, and analysis see Uncalled4

Installation

> pip3 install git+https://github.com/skovaka/UNCALLED.git --user

OR

> git clone --recursive https://github.com/skovaka/UNCALLED.git
> cd UNCALLED
> pip3 install .

Requires python >= 3.6, read-until == 3.0.0, pybind11 >= 2.5.0, and GCC >= 4.8.1 (all except GCC are automatically downloaded and installed)

Other dependencies are included via submodules, so be sure to clone with git clone --recursive

We recommend running on a Linux machine. UNCALLED has been successfully installed and run on Mac computers, but real-time ReadUntil has not been tested on a Mac. Installing UNCALLED has not been attempted on Windows.

Indexing

Example:

> uncalled index -o E.coli E.coli.fasta

Positional arguments:

  • fasta-file reference genome(s) or other target sequences in the FASTA format

Optional arguments:

  • -o/--bwa_prefix output index prefix (default: same as input fasta)

Note that UNCALLED uses the BWA FM Index to encode the reference, and this command will use a previously built BWA index if all the required files exist with the specified prefix. Otherwise, a new BWA index will be automatically built.
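The reuse check described above can be approximated before running uncalled index. The sketch below assumes the index consists of the five files that bwa index normally writes (.amb, .ann, .bwt, .pac, .sa). The exact set of files UNCALLED looks for is not stated here, and has_bwa_index and missing_bwa_files are hypothetical helpers, not part of UNCALLED.

```python
from pathlib import Path

# Suffixes written by `bwa index`. Assumption: these are the "required files"
# mentioned above; UNCALLED itself may check a different set.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_files(prefix):
    """List the assumed BWA index files that do not exist for this prefix."""
    return [prefix + s for s in BWA_SUFFIXES if not Path(prefix + s).is_file()]


def has_bwa_index(prefix):
    """Return True if every assumed BWA index file exists for this prefix."""
    return not missing_bwa_files(prefix)
```

If has_bwa_index("E.coli") returns False, a new BWA index would presumably be built when the command runs.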

We recommend repeat masking your reference if it contains eukaryotic sequences. See masking for more details.

Fast5 Mapping

Example:

> uncalled map -t 16 E.coli fast5_list.txt > uncalled_out.paf
Loading fast5s
Mapping

> head -n 4 uncalled_out.paf
b84a48f0-9e86-47ef-9d20-38a0bded478e 3735 77 328 + Escherichia_coli_chromosome 4765434 2024611 2024838 66 228 255  ch:i:427 st:i:50085  mt:f:53.662560
77fe7f8c-32d6-4789-9d62-41ff482cf890 5500 94 130 + Escherichia_coli_chromosome 4765434 2333754 2333792 38 39  255  ch:i:131 st:i:238518 mt:f:19.497091
eee4b762-25dd-4d4a-8a59-be47065029be 2905     *      *      *      *      *      *      *      *      *       255  ch:i:44  st:i:302369 mt:f:542.985229
e175c87b-a426-4a3f-8dc1-8e7ab5fdd30d 8052 84 154 + Escherichia_coli_chromosome 4765434 1064550 1064614 41 65  255  ch:i:182 st:i:452368 mt:f:38.611683

Positional arguments:

  • bwa-prefix the BWA reference index prefix generated by uncalled index
  • fast5-files Reads to be mapped. Can be a directory which will be recursively searched for all files with the ".fast5" extension, a text file containing one fast5 filename per line, or a comma-separated list of fast5 file names.
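For the text-file form of fast5-files, a list can be generated with a few lines of Python. This is a minimal sketch: write_fast5_list is a hypothetical helper, not part of UNCALLED, and it simply mirrors the recursive ".fast5" search described above.

```python
from pathlib import Path


def write_fast5_list(fast5_dir, list_path):
    """Recursively collect *.fast5 files and write one path per line.

    Returns the number of files written to the list.
    """
    paths = sorted(
        str(p) for p in Path(fast5_dir).rglob("*.fast5") if p.is_file()
    )
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return len(paths)
```

For example, write_fast5_list("/path/to/fast5s", "fast5_list.txt") produces a file that can be passed as the second positional argument to uncalled map.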

Optional arguments:

  • -l/--read-list text file containing a list of read IDs. Only these reads will be mapped if specified
  • -n/--read-count maximum number of reads to map
  • -t/--threads number of threads to use for mapping (default: 1)
  • -e/--max-events-proc number of events to attempt mapping before giving up on a read (default 30,000). Note that there are approximately two events per nucleotide on average.

See example/ for a simple read and reference example.

Real-Time ReadUntil

Warning: in the latest MinKNOW version, an API bug may prevent UNCALLED from properly ejecting reads. You can identify this bug if you do not see a peak of small "adaptive sampling" reads in the read length histogram. If this occurs you should stop your sequencing run, briefly start a new sequencing run with MinKNOW's built-in version of adaptive sampling enabled, then stop that run and restart your UNCALLED run. We have found that this may initialize something in MinKNOW which allows UNCALLED to function properly.

Example:

> uncalled realtime E.coli --port 8000 -t 16 --enrich -c 3 > uncalled_out.paf 
Starting client
Starting mappers
Mapping

> head -n 4 uncalled_out.paf
81ba344d-60df-4688-b37f-9064e76a3eb8 1352 *     *     *     *      *      *      *      *      *   255 ch:i:68  st:i:29101 mt:f:375.93841 wt:f:1440.934 mx:f:0.152565
404113c1-6ace-4690-885c-9c4a47da6476 450  *     *     *     *      *      *      *      *      *   255 ch:i:106 st:i:29268 mt:f:63.272270 wt:f:1591.070 en:f:0.010086
d9acafe3-23dd-4a0f-83db-efe299ee59a4 1355 *     *     *     *      *      *      *      *      *   255 ch:i:118 st:i:29378 mt:f:239.50201 wt:f:1403.641 ej:f:0.120715
8a6ec472-a289-4c50-9a75-589d7c21ef99 451  98 369 + Escherichia_coli 4765434 3421845 3422097 56 253 255 ch:i:490 st:i:29456 mt:f:79.419411 wt:f:8.551202 kp:f:0.097424

We recommend that you try mapping fast5s via uncalled map before real-time enrichment, as runtime issues could occur if UNCALLED is not installed properly.

The command can generally be run at any time before or during a sequencing run, although an error may occur if UNCALLED is run before any sequencing run has been started in the current MinKNOW session. If this happens you should start UNCALLED after the run begins, ideally during the first mux scan. If you want to change the chunk size you must run the command before starting the run (see below).

Positional arguments:

  • bwa-prefix the BWA reference index prefix generated by uncalled index

Required arguments:

  • --enrich will keep reads that map to the reference if included, OR
  • --deplete will eject reads that map to the reference if included

Exactly one of --deplete or --enrich must be specified

Optional Arguments:

  • -c/--max-chunks number of chunks to attempt mapping before giving up on a read (default: 10).
  • --chunk-size size of chunks in seconds (default: 1). Note: this is a new feature and may not work as intended (see below)
  • -t/--threads number of threads to use for mapping (default: 1)
  • --port MinION device port.
  • --even will only eject reads from even channels if included
  • --odd will only eject reads from odd channels if included
  • --duration expected duration of sequencing run in hours (default: 72)

Altering Chunk Size

The ReadUntil API receives signal in "chunks", which by default are one second's worth of signal. This can be changed using the --chunk-size parameter. Note that --max-chunks-proc should also be changed to compensate for changes to chunk sizes. If the chunk size is changed, you must start running UNCALLED before sequencing begins. UNCALLED is unable to change the chunk size mid-sequencing-run. In general reducing the chunk size should improve enrichment, although previous work has found that the API becomes unreliable with chunk sizes less than 0.4 seconds. We have not thoroughly tested this feature, and recommend using the default 1 second chunk size for most cases. In the future this default size may be reduced.
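One way to compensate is to hold the total signal time per read constant and derive the chunk limit from it. The sketch below assumes that is the intended meaning of "compensate"; scaled_max_chunks and is_reliable_chunk_size are hypothetical helpers, and the 0.4 second threshold is the one quoted above.

```python
import math

# Chunk sizes below this were reported above to make the API unreliable.
MIN_RELIABLE_CHUNK_SEC = 0.4


def is_reliable_chunk_size(chunk_size_sec):
    """True if the chunk size is at or above the reported 0.4 s threshold."""
    return chunk_size_sec >= MIN_RELIABLE_CHUNK_SEC


def scaled_max_chunks(time_budget_sec, chunk_size_sec):
    """Chunks needed to cover the same signal time at a new chunk size.

    Assumption: the goal is to keep time_budget_sec of signal per read,
    e.g. 3 seconds for enrichment with the default 1 second chunks.
    """
    if chunk_size_sec <= 0:
        raise ValueError("chunk size must be positive")
    # Round first so floating-point noise does not push ceil() up by one.
    return math.ceil(round(time_budget_sec / chunk_size_sec, 9))
```

Under this assumption, halving the chunk size to 0.5 seconds while keeping a 3 second budget gives scaled_max_chunks(3, 0.5) == 6.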

Simulator

Example:

> uncalled sim E.coli.fasta /path/to/control/fast5s --ctl-seqsum /path/to/control/sequencing_summary.txt --unc-seqsum /path/to/uncalled/sequencing_summary.txt --unc-paf /path/to/uncalled/uncalled_out.paf -t 16 --enrich -c 3 --sim-speed 0.25 > uncalled_out.paf 2> uncalled_err.txt

> sim_scripts/est_genome_yield.py -u uncalled_out.paf --enrich -x E.coli -m mm2.paf -s sequencing_summary.txt --sim-speed 0.25

unc_on_bp       150.678033
unc_total_bp    6094.559395
cnt_on_bp       33.145022
cnt_total_bp    8271.651331

The simulator simulates a real-time run using data from two real runs: one control run and one UNCALLED run. Reads are simulated from the control run, and the pattern of channel activity is modeled after the control run. The simulator outputs a PAF file similar to the real-time mode, which can be interpreted using scripts found in sim_scripts/.

Example files which can be used as template UNCALLED sequencing summary and PAF files for the simulator can be found here. The control reads/sequencing summary can be from any sequencing run of your sample of interest, and it does not have to match the sample used in the provided examples.

The simulator can take up a large amount of memory (> 100Gb), and loading the fast5 reads can take quite a long time. To reduce the time/memory requirements you could truncate your control sequencing summary, and only the reads present in the summary will be loaded, although this may reduce the accuracy of the simulation. Also, unfortunately the fast5 loading portion of the simulator cannot be exited via a keyboard interrupt and must be hard-killed. I will work on fixing this in future versions.
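Truncating the control sequencing summary can be done with a short script. The sketch below assumes the summary is a text file with a single header line followed by one read per row; truncate_seqsum is a hypothetical helper, not part of UNCALLED.

```python
def truncate_seqsum(src_path, dst_path, max_reads):
    """Copy the header line plus the first max_reads rows of a summary.

    Assumes one header line and one read per subsequent line.
    Returns the number of read rows written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.write(src.readline())  # header line
        for line in src:
            if kept >= max_reads:
                break
            dst.write(line)
            kept += 1
    return kept
```

The truncated file would then be passed to --ctl-seqsum in place of the full summary.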

Arguments:

  • bwa-prefix the prefix of the index to align to. Should be a BWA index that uncalled index was run on
  • control-fast5-files path to the directory where control run fast5 files are stored, or a text file containing the path to one control fast5 per line
  • --ctl-seqsum sequencing summary of the control run. Read IDs must match the control fast5 files
  • --unc-seqsum sequencing summary of the UNCALLED run
  • --unc-paf PAF file output by UNCALLED from the UNCALLED run
  • --sim-speed scaling factor of simulation duration in the range (0.0, 1.0], where smaller values are faster. Setting below 0.125 may decrease accuracy.
  • -t/--threads number of threads to use for mapping (default: 1)
  • -c/--max-chunks-proc number of chunks to attempt mapping before giving up on a read (default: 10). Note that for the simulator, altering this changes how many chunks are loaded from each read, changing the memory requirements.
  • --enrich will keep reads that map to the reference if included
  • --deplete will eject reads that map to the reference if included
  • --even will only eject reads from even channels if included
  • --odd will only eject reads from odd channels if included

Exactly one of --deplete or --enrich must be specified

Output Format

UNCALLED outputs to stdout in a format similar to PAF. Unmapped reads are output with reference-location-dependent fields replaced with *s. Lines that begin with "#" are comments that are useful for debugging.

Query coordinates, residue matches, and block lengths are estimated assuming 450bp sequenced per second. This estimate can be significantly off depending on the sequencing run. UNCALLED attempts to map a read as early as possible, so the "query sequence length" and "query end" fields correspond to the leftmost position where UNCALLED was able to confidently map the read. In many cases this may only be 450bp or 900bp into the read, even if the read is many times longer than this. This differs from aligners such as minimap2, which attempt to map the full length of the read.

The "query sequence length" field currently does not correspond to the actual read length, rather an estimate of the number of bases that UNCALLED attempted to align. In most cases this will be equal to "query end". This may be changed to better reflect the full read length in future versions.
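The arithmetic behind these estimates can be sketched as follows. It uses the 450 bp per second figure above and the 4000 samples per second rate given for the st attribute; the function names are hypothetical, and this is only the unit conversion, not UNCALLED's own code.

```python
SAMPLES_PER_SEC = 4000  # signal sample rate quoted for the st attribute
BP_PER_SEC = 450        # assumed sequencing speed used for the estimates


def samples_to_seconds(n_samples):
    """Convert a count of raw signal samples to seconds."""
    return n_samples / SAMPLES_PER_SEC


def estimated_bases(signal_seconds):
    """Estimate the number of bases covered by a stretch of signal."""
    return round(signal_seconds * BP_PER_SEC)


def estimated_bases_from_samples(n_samples):
    """Estimate bases directly from a raw sample count."""
    return round(n_samples * BP_PER_SEC / SAMPLES_PER_SEC)
```

One or two seconds of signal correspond to the 450bp and 900bp values mentioned above.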

Both modes include the following custom attributes in each PAF entry:

  • mt: map time. Time in milliseconds it took to map the read.
  • ch: channel. MinION channel that the read came from.
  • st: start sample. Global sequencing start time of the read (in signal samples, 4000 samples/sec).

uncalled realtime also includes the following attributes:

  • ej: ejected. Time that the eject signal was sent, in milliseconds since last chunk was received.
  • kp: kept. Time that UNCALLED decided to keep the read, in milliseconds since last chunk was received.
  • en: ended. Time that UNCALLED determined the read ended, in milliseconds since last chunk was received.
  • mx: mux scan. Time that the read would have been ejected, had it not occurred within a mux scan.
  • wt: wait time. Time in milliseconds that the read was queued but was not actively being mapped, either due to thread delays or waiting for new chunks.
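The output described above can be read with a small parser. This is a minimal sketch that assumes the standard PAF column order for the first twelve fields and TAG:TYPE:VALUE formatting for the attributes; parse_paf_line is a hypothetical helper, not part of UNCALLED.

```python
def parse_paf_line(line):
    """Parse one line of UNCALLED PAF-like output into a dict.

    Returns None for blank lines and "#" comment lines. Reference-dependent
    fields are only filled in for mapped reads (unmapped reads carry "*").
    """
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return None
    rec = {
        "read_id": fields[0],
        "query_len": int(fields[1]),
        "mapped": fields[4] != "*",
    }
    if rec["mapped"]:
        rec.update(
            query_start=int(fields[2]),
            query_end=int(fields[3]),
            strand=fields[4],
            target=fields[5],
            target_len=int(fields[6]),
            target_start=int(fields[7]),
            target_end=int(fields[8]),
        )
    casts = {"i": int, "f": float}  # other tag types are kept as strings
    tags = {}
    for tag in fields[12:]:
        name, typ, value = tag.split(":", 2)
        tags[name] = casts.get(typ, str)(value)
    rec["tags"] = tags
    return rec
```

For a realtime record, rec["tags"] will contain whichever of ej, kp, en, or mx is present, which indicates what happened to the read.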

pafstats

We have included a command called uncalled pafstats which computes speed statistics from a PAF file output by UNCALLED. Accuracy statistics can also be included if a ground-truth PAF file is provided, for example based on [minimap2](https://github.com/lh3/minimap2) alignments of basecalled reads. There is also an option to output the original UNCALLED PAF annotated with comparisons to the ground truth.

Example:

> uncalled pafstats -r minimap2_alns.paf -n 5000 uncalled_out.paf
Summary: 5000 reads, 4373 mapped (89.46%)

Comparing to reference PAF
     P     N
T  88.74  6.80
F   0.60  3.74
NA: 0.12

Speed            Mean    Median
BP per sec:   4878.24   4540.50
BP mapped:     636.29    362.00
MS to map:     140.99     89.96

Positional arguments

  • infile PAF file output by UNCALLED

Optional arguments

  • -n/--max-reads maximum number of reads to parse
  • -r/--ref-paf ground-truth alignments (from minimap2) to compute TP/TN/FP/FN rates
  • -a/--annotate if used with -r, will output PAF with "rf:" tag indicating TP, TN, FP, or FN

Accuracy statistics:

  • TP: true positive - percent infile reads that overlap reference read locations
  • FP: false positive - percent infile reads that do not overlap reference read locations
  • TN: true negative - percent of reads which were not aligned in reference or infile
  • FN: false negative - percent of reads which were aligned in the reference but not in the infile
  • NA: not aligned/not applicable - percent of reads aligned in infile but not in reference. Could be considered a false positive, but the truth is unknown.
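The five categories above can be expressed as a small classifier. The sketch below assumes that "overlap" means the same target name and intersecting coordinate ranges, and it ignores strand; that definition, and the names classify_read and accuracy_percentages, are assumptions for illustration rather than the pafstats implementation.

```python
from collections import Counter


def classify_read(unc_hit, ref_hit):
    """Classify one read as TP, FP, TN, FN, or NA.

    Each hit is None (unmapped) or a (target_name, start, end) tuple.
    """
    if ref_hit is None:
        # Truth is unknown when only the infile has an alignment.
        return "TN" if unc_hit is None else "NA"
    if unc_hit is None:
        return "FN"
    same_target = unc_hit[0] == ref_hit[0]
    overlaps = unc_hit[1] < ref_hit[2] and ref_hit[1] < unc_hit[2]
    return "TP" if same_target and overlaps else "FP"


def accuracy_percentages(pairs):
    """Percent of reads in each category, given (unc_hit, ref_hit) pairs."""
    counts = Counter(classify_read(u, r) for u, r in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to classify")
    return {k: 100.0 * counts[k] / total for k in ("TP", "FP", "TN", "FN", "NA")}
```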

Practical Considerations

For ReadUntil sequencing, the first decision to make is whether to perform enrichment or depletion (--enrich or --deplete). In enrichment mode, UNCALLED will eject a read if it does not map to the reference, meaning your target should be the reference. In depletion mode, UNCALLED will eject a read if it does map to the reference, meaning your target should be everything except your reference.

Note that enrichment necessitates a quick decision as to whether or not a read maps, since you want to eject a read as fast as possible. Usually ~95% of reads can be mapped within three seconds for highly non-repetitive references, so setting -c/--max-chunks-proc to 3 generally works well for enrichment. The default value of 10 works well for depletion. Note these values assume --chunk-size is set to the default 1 second.
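The enrich/deplete logic described above can be summarized as a per-read decision. This is a sketch of one reading of the text, in which a read that has not mapped after the maximum number of chunks is treated as off-reference; decide is a hypothetical function and the "wait" state is an assumption, not UNCALLED's actual control flow.

```python
def decide(mode, mapped, chunks_seen, max_chunks):
    """Return "keep", "eject", or "wait" for a single read.

    mode        -- "enrich" or "deplete"
    mapped      -- True once the read has been confidently mapped
    chunks_seen -- number of chunks received for this read so far
    max_chunks  -- chunk limit before giving up on mapping
    """
    if mode not in ("enrich", "deplete"):
        raise ValueError("mode must be 'enrich' or 'deplete'")
    if mapped:
        # The read matches the reference: the target when enriching,
        # the unwanted sequence when depleting.
        return "keep" if mode == "enrich" else "eject"
    if chunks_seen < max_chunks:
        return "wait"  # assumed: keep trying while chunks remain
    # Gave up on mapping, so treat the read as not from the reference.
    return "eject" if mode == "enrich" else "keep"
```

With --enrich -c 3, an unmapped read would be ejected after three chunks under this reading: decide("enrich", False, 3, 3) returns "eject".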

UNCALLED currently does not support large (> ~1Gbp) or highly repetitive references. The speed and mapping rate both progressively drop as references become larger and more repetitive. Bacterial genomes or small collections of divergent bacterial genomes typically work well. Small segments of eukaryotic genomes can also be used, however the presence of any repetitive elements will harm the performance. Collections of highly similar genomes will not work well, as conserved sequences introduce repeats. See masking for repeat masking scripts and guidelines.

ReadUntil works best with longer reads. Maximize your read lengths for best results. You may also need to perform a nuclease flush and reloading to achieve the highest yield of on-target bases.

UNCALLED currently only supports reads sequenced with r9.4.1/r9.4 chemistry.

Release notes

  • v2.2: added event profiler which masks out pore stalls, and added compile-time debug options
  • v2.1: updated ReadUntil client for latest MinKNOW version, made uncalled index automatically build the BWA index, added hdf5 submodule, further automated installation by auto-building hdf5, switched to using setuptools, moved submodules to submods/
  • v2.0: released the ReadUntil simulator uncalled sim, which can predict how much enrichment UNCALLED could provide on a given reference, using a control and UNCALLED run as a template. Also changed the format of certain arguments: index prefix and fast5 list are now positional, and some flags have changed names.
  • v1.2: fixed indexing for particularly large or small references
  • v1.1: added support for altering chunk size
  • v1.0: pre-print release

uncalled's People

Contributors

alshai, krozzle, mr-c, mschatz, skovaka, yfan2012

uncalled's Issues

Implementation on ARM64 Raspberry Pi 4

Hey There!

I am trying to get Uncalled up and running in our lab on a new Raspberry Pi 4 but am running into a few problems when I go to run the setup.py. It is throwing this error when trying to compile ksw.c in the /submods/bwa:

fatal error: emmintrin.h: No such file or directory

From what I can tell the error has to do with GCC but I have not been able to find a solution yet. I was able to get it up and running on another machine, but it has a x86_64 architecture compared to the ARM64 on the Raspberry Pi.

I guess I'm asking if there has been any testing if it can be run with a ARM64 architecture, or more so on any Raspberry Pi devices. For reference I am running Ubuntu 20.10 on both machines. Any help would be appreciated!

Trouble indexing fasta-file.

Hi all! I'm trying to get uncalled installed and setup.

I am getting the following error and I'm not sure if it is my numpy version (currently using 1.18.1) or something else.

uncalled index -i HIV1_FLT_2016_genome_DNA.fa -x HIV1_index
Initializing parameter search
Writing default parameters
Traceback (most recent call last):
  File "/home/will/anaconda/envs/nanoporetest/bin/uncalled", line 534, in <module>
    index_cmd(args)
  File "/home/will/anaconda/envs/nanoporetest/bin/uncalled", line 154, in index_cmd
    p.add_preset("default", tgt_speed=115)
  File "/home/will/anaconda/envs/nanoporetest/lib/python3.7/site-packages/uncalled/index.py", line 134, in add_preset
    delta = self.get_fn_speed(fn_locs, fn_pcks) - tgt_speed
  File "/home/will/anaconda/envs/nanoporetest/lib/python3.7/site-packages/uncalled/index.py", line 113, in get_fn_speed
    pcks = np.interp(self.all_locs, fn_locs, fn_pcks)
  File "<__array_function__ internals>", line 6, in interp
  File "/home/will/anaconda/envs/nanoporetest/lib/python3.7/site-packages/numpy/lib/function_base.py", line 1412, in interp
    return interp_func(x, xp, fp, left, right)
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

Or is there some requirement about the reference file that's not documented?

It seems numpy is required, at least, for indexing yet it is not marked as a requirement. Can you pin a specific numpy version in a requirements file?

ImportError: libhdf5.so.103: cannot open shared object file: No such file or directory

Hi,
I'm getting this error when trying to run UNCALLED on ubuntu 18.04 on two different machines.
I'm installing HDF5 from source to /usr/local/hdf5 and building first with this command.

python3 setup.py build_ext --library-dirs /usr/local/hdf5/lib/ --include-dirs /usr/local/hdf5/include/
sudo python3 setup.py install

When I try to run uncalled it gives this error.

Traceback (most recent call last):
  File "/usr/local/bin/uncalled", line 25, in <module>
    from uncalled import mapping, index, pafstats
ImportError: libhdf5.so.103: cannot open shared object file: No such file or directory

Any help would be appreciated.
Thanks

Building reference gene panel, advice

Hi there,

I tried to use uncalled to improve the efficiency of cas9 targeted sequencing on low inputs and it seems the reference I generated rejected the reads after ~1KB instead of continuing to sequence the rest of the read.

uncalled realtime mouse_cas9panel.fa --port 8000 -t 8 --enrich -c 3 > uncalled_realtime.paf

I have not had the same issue with enriching in small genomes, like virus for example in a mixture of host DNA, or with a reference panel based on hg38. Have you seen this before or know of problems with generating references from mm10?

I am designing a panel of genes for a large signature in mouse (>150 genes) and I want to make sure that this doesn't happen. Do you have any advice or tips for making a reference fasta for a large panel of genes?

Ubuntu 20
Uncalled version 2.1
MinKnow GUI 4.1.22

Segmentation fault in 'uncalled sim'

Hello

I ran the uncalled simulation and got 278742 Segmentation fault (core dumped). I tried it on different HPCs and got the same problem. Any ideas to solve this?

I ran the simulation like this:

uncalled sim index/col_non_cen.fasta fast5/ --ctl-seqsum sequencing_summary.txt --unc-seqsum 20190809_zymo_seqsum.txt --unc-paf 20190809_zymo_uncalled.paf -t 4 --deplete --sim-speed 0.25 > sim_uncalled_out.paf 2> uncalled_err.txt

Resource usage summary:

    CPU time :                                   10876.19 sec.
    Max Memory :                                 40303 MB
    Average Memory :                             39679.36 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   1 MB
    Max Processes :                              4
    Max Threads :                                9
    Run time :                                   30601 sec.
    Turnaround time :                            30602 sec.

It returned a core file, core.278742:

$ gdb -c core.278742

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
[New LWP 278742]
[New LWP 280323]
[New LWP 280324]
[New LWP 280325]
[New LWP 280326]
Core was generated by `/scem/work/mowp/anaconda3/envs/uncalled/bin/python3 /scem/work/mowp/anaconda3/e'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002ae7e2ad8554 in ?? ()

The uncalled_err.txt is attached. (uncalled_err.txt)

Any help very much appreciated.
Weipeng

Segmentation error in simulation

Hello

I am running the simulation with an unrelated control experiment and keep getting the error 32589 Segmentation fault (core dumped). Any ideas how to solve this?

This is my input

uncalled sim SARS2 /fast5 --ctl-seqsum sequencing_summary.txt --unc-seqsum 20190809_zymo_seqsum.txt --unc-paf 20190809_zymo_uncalled.paf -t 4 --enrich -c 3 --sim-speed 0.25 > uncalled_out.paf 2> uncalled_err.txt

This is the uncalled_err.txt

Loading UNCALLED PAF............
'================================
Procesing run...................
Generating pattern..............
'================================

Loading control PAF.............
'================================
Procesing run...................
Ordering reads..................
================================0 loaded

Any help very much appreciated.

Karl

AttributeError: 'uncalled.minknow_client' has no attribute 'Client'

Hello,

I am trying to install UNCALLED on a Linux 64 bit system with Ubuntu 18.4 in an Anaconda environment. The uncalled index and uncalled map functions have been tested and are working fine. However, as soon as I want to test realtime enrichment for uncalled, I get the following error message:

uncalled realtime LTR12_DNA_sequence.fa --port 8000 -t 16 -c 3 --enrich > out.paf
Traceback (most recent call last):
  File "/home/nanopore/.local/bin/uncalled", line 196, in realtime_cmd
    client = unc.minknow_client.Client(conf.host, conf.port, conf.chunk_time, conf.num_channels)
AttributeError: module 'uncalled.minknow_client' has no attribute 'Client'

Expected behavior:

> uncalled realtime E.coli --port 8000 -t 16 --enrich -c 3 > uncalled_out.paf 
Starting client
Starting mappers
Mapping

All needed packages were installed using the latest release:

python3 -m pip list
Package     Version
----------- -------------------
certifi     2020.6.20
chardet     4.0.0
grpcio      1.36.1
idna        2.10
minknow-api 4.1.5
mkl-fft     1.3.0
mkl-random  1.1.1
mkl-service 2.3.0
numpy       1.19.2
packaging   20.9
pip         21.0.1
protobuf    3.15.5
pybind11    2.6.2
pyparsing   2.4.7
python-git  2018.2.1
read-until  3.0.0
requests    2.25.1
Send2Trash  1.5.0
setuptools  52.0.0.post20210125
six         1.15.0
uncalled    2.2
urllib3     1.26.3
wheel       0.36.2

Am I missing something during the installation of uncalled?
Any help would be very much appreciated!

Cheers,
Marvin

Elimination of false positive seed mappings and detection of indels

Hi! I found the following in your preprint:

"Due to the noisy nature of nanopore sequencing, UNCALLED must use very loose thresholds for event/k-mer matches, which produce many false positive seed mappings. We eliminate these false positives under the observation that they will usually map to random locations, while true positives will map to locations consistent with their position on the read."

How will this affect the use of UNCALLED for the detection of indels and other structural variants, considering that these mappings are inconsistent with their position on the read?

fast5 IO in simulate module

Hi @skovaka ,

after compilation of this tool I keep having trouble with loading fast5 files. For some reason the filename is not properly passed to the HDF5 loading binary. So when I call the simulate module

uncalled simulate --bwa-prefix some/prefix --fast5s some/fast5 --enrich

I get this error:

HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: /home/hdftest/snapshots-hdf5_1_10_5/current/src/H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: /home/hdftest/snapshots-hdf5_1_10_5/current/src/H5Fint.c line 1498 in H5F_open(): unable to open file: time = Fri Nov 15 17:19:56 2019
', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #002: /home/hdftest/snapshots-hdf5_1_10_5/current/src/H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0FD_sec2_open(): unable to open file: name = '�HDF
    major: File accessibilty
    minor: Unable to open file
Traceback (most recent call last):
  File "./build/scripts-2.7/uncalled", line 510, in <module>
    simulate_cmd(args)
  File "./build/scripts-2.7/uncalled", line 421, in simulate_cmd
    sim.add_fast5s(args.fast5s, args.read_count)
: error in H5FopenDF

I have made sure that the fast5 is not corrupted. I get the same error when I use the fast5 from the example directory. I have tried this with a bunch of different installations of HDF5. But in the end I keep getting this error.
I would really appreciate any idea of what could be the matter here.
Best,
Torsten

Error when running uncalled map

Hi,

I am currently getting an error when trying to run uncalled map.

uncalled map nif_data/NifH_data/uncalled_cdhit_ref/5k_nifH nanopore_data/single_fast5/0/8384d558-af58-4306-b661-34ed2ec7e3b7.fast5

Loading fast5s
Mapping
HDF5-DIAG: Error detected in HDF5 (1.8.21) thread 140484998567744:
#000: H5Dio.c line 223 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#1: H5Dio.c line 605 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#2: H5Dchunk.c line 2093 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#3: H5Dchunk.c line 3123 in H5D__chunk_lock(): data pipeline read failed
major: Data filters
minor: Filter operation failed
#4: H5Z.c line 1311 in H5Z_pipeline(): required filter 'vbz' is not registered
major: Data filters
minor: Read failed
#5: H5PL.c line 380 in H5PL_load(): search in paths failed
major: Plugin for dynamically loaded library
minor: Can't get value
#6: H5PL.c line 738 in H5PL__find(): can't open directory: /usr/local/hdf5/lib/plugin
major: Plugin for dynamically loaded library
minor: Can't open directory or file
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/uncalled", line 341, in <module>
    map_cmd(conf, args)
  File "/home/ubuntu/.local/bin/uncalled", line 157, in map_cmd
    for p in mapper.update():
RuntimeError: /Raw/Reads/Read_137128/Signal: error in H5Dread
terminate called without an active exception
Aborted (core dumped)

Here is the file version:
HDF5 "8384d558-af58-4306-b661-34ed2ec7e3b7.fast5" {
ATTRIBUTE "file_version" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SCALAR
DATA {
(0): 2
}
}
}

and the listing of the fast5 groups

/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_137128 Group
/Raw/Reads/Read_137128/Signal Dataset {14750/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

Do you have any ideas why this may be failing?

Thanks!

PC spec for realtime run?

What kind of computing resource do I need to run in realtime?
I'm working on a quad-core i7-4790 cpu, 32Gb ram, 1Tb ssd PC. How do I know if this is applicable for a realtime run?

I tried running uncalled map on 40k reads against a single 26 Mb chromosome of a 180 Mb genome, mimicking the enrichment mode. It took ~90 minutes to finish, which is much slower than the result in the paper (although the genome size is different).

$ uncalled pafstats test.out.paf
Summary: 40009 reads, 5109 mapped (12.77%)

Speed            Mean    Median
BP per sec:   4206.46   4065.79
BP mapped:     875.58    452.00
MS to map:     229.15    116.13

Regarding library prep for UNCALLED sequencing

Hi,

We are about to try UNCALLED for the first time, and I have a couple of questions regarding the library preparation.

  1. What is the recommended DNA input?
  2. In the paper, SQK-LSK109 was used. Would SQK-LSK110 work just as well?
  3. What is the reason you excluded the DNA CS and FFPE end repair in the library prep in the paper?

I'm very excited to try this method. Thanks for sharing such great work!

MemoryError: std::bad_alloc

Thank you for writing UNCALLED. It was a seamless install and worked quite well with small genomes.

Not sure if I am doing something I am not supposed to do :D - I was trying to run uncalled index on chr22, however, I ran into a MemoryError: std::bad_alloc. Seems like it runs out of memory even on a 380GB server. Could it be due to a memory leak or something?

(uncalled) hasindu@genometech-gpgpu:/data/hasindu$ /usr/bin/time -v uncalled index chr22.fa
Using previously built BWA index.
Note: to fully re-build the index delete files with the "chr22.fa.*" prefix.
Initializing parameter search
Traceback (most recent call last):
  File "/home/hasindu/uncalled/bin/uncalled", line 339, in <module>
    index_cmd(args)
  File "/home/hasindu/uncalled/bin/uncalled", line 56, in index_cmd
    p = unc.index.IndexParameterizer(args)
  File "/home/hasindu/uncalled/lib/python3.6/site-packages/uncalled/index.py", line 62, in __init__
    self.calc_map_stats(args)
  File "/home/hasindu/uncalled/lib/python3.6/site-packages/uncalled/index.py", line 82, in calc_map_stats
    fmlens = unc.self_align(args.bwa_prefix, sample_dist)
MemoryError: std::bad_alloc
Command exited with non-zero status 1
        Command being timed: "uncalled index chr22.fa"
        User time (seconds): 6625.89
        System time (seconds): 770.15
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:03:20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 380383136
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 481
        Minor (reclaiming a frame) page faults: 254323991
        Voluntary context switches: 1470
        Involuntary context switches: 408568
        Swaps: 0
        File system inputs: 286888
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1
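
To put the "Maximum resident set size" figure above into more familiar units, here is a small conversion sketch. It assumes that the "kbytes" reported by /usr/bin/time are units of 1024 bytes, which is an assumption on my part and worth checking for your platform. The helper names are made up for illustration.

```python
def kib_to_gib(kib: int) -> float:
    """Convert KiB (1024-byte units, assumed) to GiB."""
    return kib / 1024 ** 2


def kib_to_gb(kib: int) -> float:
    """Convert KiB (1024-byte units, assumed) to decimal GB."""
    return kib * 1024 / 1e9


# Value copied from the "Maximum resident set size (kbytes)" line above
max_rss_kib = 380383136
print(f"{kib_to_gib(max_rss_kib):.1f} GiB, {kib_to_gb(max_rss_kib):.1f} GB")
```

Under that assumption the peak usage is close to the full memory of the 380GB server mentioned in the post, which is consistent with the std::bad_alloc being a genuine out-of-memory condition.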

Minimum hardware requirements

Hi, I noticed that the UNCALLED manuscript used a rather beefy machine for the sequencing. I was curious whether someone could comment on the minimum hardware specs needed to run this with good performance. For instance, would a computer with the minimum specs for a MinION (i.e. 16 GB RAM, >4 core CPU, ~1 TB SSD) be sufficient to run this in real time? If not, what should be modified?

Cheers.

[QUESTION] 10X run-time for reads of 500 raw signals

Hello,
I'm running uncalled_map with 60 threads on a 100 kb reference, and I observe that the run time is approximately 10x higher when the average signal length is 500 instead of 1000. The reads are r9.4.1 DNA with adapters trimmed.
Do you think there is a reason for this?

uncalled failed to connect to minknow instance

Hi,

I have been running uncalled successfully for the past several months on Ubuntu 20.04. After a round of updates, I now get the following error message:

uncalled realtime reference_genomes/Ames/genome.fasta -t 10 --enrich --full -c 3 | tee test.paf
[15:23:21 - ReadUntil] Creating many chunk client with ReadCache data queue filtering to strand2, strand1, and strand read chunks.
[15:23:21 - ReadUntil] Using pre-defined read classification map.
[15:23:21 - ReadUntil] Creating rpc connection on port 8000.
[15:23:21 - minknow_api] Error received from rpc
[15:23:21 - minknow_api] Failed to connect to minknow instance (retry 1/5): failed to connect to all addresses
[15:23:22 - minknow_api] Error received from rpc
[15:23:22 - minknow_api] Failed to connect to minknow instance (retry 2/5): failed to connect to all addresses
[15:23:24 - minknow_api] Error received from rpc
[15:23:24 - minknow_api] Failed to connect to minknow instance (retry 3/5): failed to connect to all addresses
[15:23:26 - minknow_api] Error received from rpc
[15:23:26 - minknow_api] Failed to connect to minknow instance (retry 4/5): failed to connect to all addresses
[15:23:29 - minknow_api] Error received from rpc
[15:23:29 - minknow_api] Failed to connect to minknow instance (retry 5/5): failed to connect to all addresses
Traceback (most recent call last):
  File "/home/jason/.local/lib/python3.8/site-packages/uncalled-2.2-py3.8-linux-x86_64.egg/EGG-INFO/scripts/uncalled", line 195, in realtime_cmd
    client = unc.minknow_client.Client(conf.host, conf.port, conf.chunk_time, conf.num_channels)
  File "/home/jason/.local/lib/python3.8/site-packages/uncalled-2.2-py3.8-linux-x86_64.egg/uncalled/minknow_client.py", line 43, in __init__
    read_until.ReadUntilClient.__init__(
  File "/home/jason/anaconda3/envs/pima/lib/python3.8/site-packages/read_until/base.py", line 255, in __init__
    self.connection = minknow_api.Connection(host=self.mk_host, port=self.grpc_port)
  File "/home/jason/anaconda3/envs/pima/lib/python3.8/site-packages/minknow_api/__init__.py", line 327, in __init__
    raise error
  File "/home/jason/anaconda3/envs/pima/lib/python3.8/site-packages/minknow_api/__init__.py", line 299, in __init__
    self.instance.get_version_info()
  File "/home/jason/anaconda3/envs/pima/lib/python3.8/site-packages/minknow_api/instance_service.py", line 93, in get_version_info
    return run_with_retry(self._stub.get_version_info,
  File "/home/jason/anaconda3/envs/pima/lib/python3.8/site-packages/minknow_api/instance_service.py", line 37, in run_with_retry
    result = MessageWrapper(method(message, timeout=timeout), unwraps=unwraps)
  File "/home/jason/anaconda3/envs/pima/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/jason/anaconda3/envs/pima/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1652912609.827260489","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3008,"referenced_errors":[{"created":"@1652912609.827259173","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"

Is this an uncalled problem or a minknow problem? Thanks,

Chandler
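
One way to narrow down whether the problem sits on the UNCALLED side or the MinKNOW side is to check whether anything is accepting TCP connections on the port at all. The sketch below uses only the Python standard library. The port 8000 is taken from the log above, while the host 127.0.0.1 and the function name are assumptions for illustration. It does not speak gRPC, so a reachable port only shows that something is listening, not that it is a working MinKNOW instance.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    # Port from the log above; the local host address is an assumption
    print("port 8000 reachable:", port_is_open("127.0.0.1", 8000))
```

If the port is not reachable, the service may be stopped or listening on a different port after the update, which would point away from UNCALLED itself.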

uncalled list-ports - nothing listed

Hi,

I have installed UNCALLED and the Read Until API following the recommendations.

/opt/ont/minknow/ont-python/bin/python setup.py build_ext --library-dirs /usr/lib/x86_64-linux-gnu/hdf5/serial/lib/ --include-dirs /usr/lib/x86_64-linux-gnu/hdf5/serial/include
sudo /opt/ont/minknow/ont-python/bin/python setup.py install 

However, when I try to look for the port, nothing is listed.

uncalled list-ports --log-dir /var/log/minknow/

Do you have any idea how to get it to work?

Simulator bug?

When running the simulator, I ran the following command and got the following error:
uncalled sim ../../UNCALLED/example/e.coli.fasta . --ctl-seqsum 20190809_zymo_seqsum.txt --unc-seqsum 20191220_GM12878_seqsum.txt --unc-paf 20191220_GM12878_uncalled.paf --enrich -c 3 --sim-speed 0.25 > uncalled_out.paf 2> uncalled_err.txt

Loading control PAF.............

Procesing run...................
Ordering reads..................
Traceback (most recent call last):
  File "/usr/local/bin/uncalled", line 276, in realtime_cmd
    sim_utils.load_sim(client, conf)
  File "/usr/local/lib/python2.7/dist-packages/uncalled/sim_utils.py", line 387, in load_sim
    odr = np.flip(np.argsort(diff))
TypeError: flip() takes exactly 2 arguments (1 given)

I'm using Linux Mint, fresh install of everything.
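
For context on the TypeError above: in NumPy releases before 1.15, np.flip required an explicit axis argument, so calling it with one argument fails in exactly this way. The sketch below shows an equivalent way to get a descending argsort that does not depend on the NumPy version. The function name and the sample array are made up for illustration, and this is not a patch to sim_utils.py.

```python
import numpy as np


def descending_order(values):
    """Indices that sort `values` from largest to smallest."""
    # A reversing slice avoids np.flip, whose `axis` argument
    # only became optional in NumPy 1.15.
    return np.argsort(values)[::-1]


diff = np.array([3.0, 1.0, 2.0])
order = descending_order(diff)
print(order)  # indices of 3.0, then 2.0, then 1.0
```

Upgrading NumPy, or running under Python 3 with a recent NumPy, would presumably avoid the error as well.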

[Question] RNA sequencing

I really like your preprint and I would love to use UNCALLED for direct RNA sequencing.
Are you working on adapting UNCALLED for direct RNA sequencing? Given that an accurate RNA k-mer model is available, how difficult would it be to optimize the event detection parameters? I understand that by event detection parameters you mean the index probability threshold, am I right?

ModuleNotFoundError: file_read_backwards

I have installed uncalled into a new virtual environment using the pip install git+https://github.com/skovaka/UNCALLED.git installation method. When I then run uncalled index command I get the error text copied at the end of this issue indicating that the file_read_backwards module is not found. It looks like the use of the imported function, FileReadBackwards, is commented out in debug.py. It seems that the file-read-backwards package should be added as a dependency in setup.py or the import should be removed from debug.py.

Traceback (most recent call last):
  File "/venv/bin/uncalled", line 35, in <module>
    import uncalled as unc
  File "/venv/lib/python3.7/site-packages/uncalled/__init__.py", line 2, in <module>
    from uncalled import index, pafstats, sim_utils, args, debug, minknow_client
  File "/venv/lib/python3.7/site-packages/uncalled/debug.py", line 8, in <module>
    from file_read_backwards import FileReadBackwards
ModuleNotFoundError: No module named 'file_read_backwards'
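
A quick way to confirm which imports are missing from an environment, without triggering the full traceback, is to ask importlib whether each module can be found. This is only a diagnostic sketch. The function name is hypothetical, and the module name checked is the one from the traceback above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module name taken from the traceback above
print(missing_modules(["file_read_backwards"]))
```

An empty list means the module is importable in the interpreter that ran the check, so make sure it is the same interpreter that runs uncalled.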

Question about only the leftmost mapped position being reported

Hi, thanks for the great work and the comprehensive paper.

I noticed you mention that "UNCALLED attempts to map a read as early as possible, so the "query end" field corresponds to the leftmost position where UNCALLED was able to confidently map the read. In many cases this may only be 450bp or 900bp into the read", and I would like to know whether there is any way to report all the mapped and aligned reads using UNCALLED (e.g. can I add any arguments or make minor changes to the scripts?).

Your insights or advice would be highly appreciated! Thanks!

Undemultiplexed fast5

Hi all,
Just to make sure I understood correctly: I have a MinION run of a set of amplicon samples (which includes sequencing of different bacterial strains). To use UNCALLED a posteriori (not in real time) on the fast5 files, do I need to split them by barcode? Or can I generate one large multi-FASTA file and then split based on ... what?
Thanks in advance for any advice.

Unmapped reads not ejecting

Hello,

I started an instance of UNCALLED using the following command:

uncalled realtime Csojina.fasta --port 8000 -t 8 --enrich -c 3 > uncalled_out_soybeansojina2.paf

When I look at the .paf output, it appears that there are a lot of reads that do not map to the reference, but are not ejected. Example:

ed2e1457-91b1-4c27-8159-c01f08d39d55	450	*	*	*	*	*	*	*	*	*	255	ch:i:404	   st:i:24150816	qt:f:150.188721	mt:f:8.359437	wt:f:1441.921387	en:f:0.037671
8eeed2ad-1525-42a5-bff2-132c9f5e9d44	453	*	*	*	*	*	*	*	*	*	255	ch:i:357	st:i:24151447	  qt:f:142.428894	mt:f:7.929103	wt:f:1451.144775	en:f:0.037727
d3996ea4-37db-4649-b24f-f077ac9081d1	450	*	*	*	*	*	*	*	*	*	255	ch:i:268	st:i:24151338	qt:f:10.439973	mt:f:17.490294	wt:f:1628.886597	en:f:0.046620
a89bfc35-573d-4073-8cb3-8b04278bcb95	450	*	*	*	*	*	*	*	*	*	255	ch:i:156	st:i:24150719	qt:f:10.750773	mt:f:69.235641	wt:f:1931.648438	en:f:0.070621
c50db927-83f0-49bf-8f94-2ed193ccbd82	452	*	*	*	*	*	*	*	*	*	255	ch:i:487	st:i:24150905	qt:f:39.190941	mt:f:9.406738

Overall, it seems like roughly 0.75% of reads are ejected and 0.25% of reads are kept. But the remaining reads that don't have either designation are still present in the FASTQ files. Am I running the program or interpreting something incorrectly? Thanks for your help!

Jon
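
To see how the reads in a PAF file like the one above break down, it can help to tally which optional tags each record carries. The sketch below counts mapped and unmapped records (target name in column 6) and every tag key after the 12 standard PAF columns. If I read the README's tag list correctly (please treat this as an assumption), ej marks an eject signal, kp a keep decision, and en a read that ended before either. The function name and the file name in the usage note are illustrative only.

```python
from collections import Counter


def tally_paf(lines):
    """Count mapped/unmapped records and the optional tags they carry."""
    counts = Counter()
    for line in lines:
        # PAF is tab-separated; splitting on any whitespace also
        # tolerates the uneven spacing in pasted output.
        fields = line.split()
        if len(fields) < 12:
            continue  # not a complete PAF record
        counts["mapped" if fields[5] != "*" else "unmapped"] += 1
        for tag in fields[12:]:
            counts["tag:" + tag.split(":", 1)[0]] += 1
    return counts


# First record from the post above
sample = "\t".join([
    "ed2e1457-91b1-4c27-8159-c01f08d39d55", "450",
    "*", "*", "*", "*", "*", "*", "*", "*", "*", "255",
    "ch:i:404", "st:i:24150816", "qt:f:150.188721",
    "mt:f:8.359437", "wt:f:1441.921387", "en:f:0.037671",
])
print(tally_paf([sample]))
```

For a whole file, something like `with open("uncalled_out.paf") as fh: print(tally_paf(fh))` gives the totals per tag, which shows how many reads fall outside the ejected and kept groups.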

No acquisition run available: failed pre-condition error

Awesome tool, congrats on setting this up!

I am trying to run Uncalled realtime and I am getting some errors.
I created an environment and installed UNCALLED and read_until_api from git in it.

I set up the build using:

python3 setup.py build_ext --library-dirs /home/hh/UNCALLED/submods/hdf5/lib/ --include-dirs /home/hh/UNCALLED/submods/hdf5/include/
sudo python3 setup.py install

I was able to index my fasta and perform mapping of fast5s successfully, but when I attempt to run realtime using:

uncalled realtime /home/hh/Documents/refs/HBV_GA/HBV_GA_2.fasta --port 8000 -t 16 --enrich -c 3 > /home/hh/Documents/Basecalled/20200820/uncalled_realtime.paf

I get the following error:

Traceback (most recent call last):
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/uncalled-2.1-py3.6-linux-x86_64.egg/EGG-INFO/scripts/uncalled", line 308, in realtime_cmd
    if not client.run():
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/uncalled-2.1-py3.6-linux-x86_64.egg/uncalled/minknow_client.py", line 70, in run
    if not self._wait_for_start(steady_wait, refresh):
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/uncalled-2.1-py3.6-linux-x86_64.egg/uncalled/minknow_client.py", line 188, in _wait_for_start
    state = self._get_run_state()
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/uncalled-2.1-py3.6-linux-x86_64.egg/uncalled/minknow_client.py", line 170, in _get_run_state
    return self.connection.acquisition.get_acquisition_info().state
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/minknow_api-4.0.4-py3.6.egg/minknow_api/acquisition_service.py", line 231, in get_acquisition_info
    return run_with_retry(self._stub.get_acquisition_info, _message, _timeout, [], "minknow_api.acquisition.AcquisitionService")
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/minknow_api-4.0.4-py3.6.egg/minknow_api/acquisition_service.py", line 72, in run_with_retry
    result = MessageWrapper(method(message, timeout=timeout), unwraps=unwraps)
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/grpcio-1.32.0-py3.6-linux-x86_64.egg/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/hh/miniconda3/envs/UNCALLED/lib/python3.6/site-packages/grpcio-1.32.0-py3.6-linux-x86_64.egg/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.FAILED_PRECONDITION
	details = "No acquisition run available"
	debug_error_string = "{"created":"@1601990520.207969892","description":"Error received from peer ipv4:127.0.0.1:8000","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"No acquisition run available","grpc_status":9}"
>

"No acquisition run available": does that mean I have an issue with my MinKNOW install, or have I missed something?
Any advice or suggestions appreciated.
Thank you in advance:-)

Ubuntu 16.04
MinKnow 4.0.21

Uncalled in ubuntu 16

Hello. I am trying to test UNCALLED.
I followed the instructions to install it on my computer (Ubuntu 16, Python 3.7).
It seems that the compilation is failing. I tried installing multiple libraries to fix the problem, but still no success.
I also updated gcc.

It is failing with:
fatal error: Python.h: No such file or directory

error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1

Thanks
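
A missing Python.h usually means the C headers for the interpreter in use are not installed (on Debian and Ubuntu they come from the matching python3-dev style package). The sketch below asks the interpreter where it expects its headers and reports whether Python.h is there. The function name is made up for illustration, and the script should be run with the same Python that is used for the install.

```python
import os
import sysconfig


def python_header_path() -> str:
    """Path where this interpreter expects to find Python.h."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


path = python_header_path()
print(path, "exists" if os.path.isfile(path) else "is missing")
```

If the file is missing, installing the development package for that exact Python version is the usual fix; updating gcc alone would not help.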

Sorry for asking a stringtie2 question here

Hello, Dr Skovaka:

Sorry again for asking this question here; I could not find an issue page or your email on the stringtie2 page. You can close this issue whenever you like.
From your paper about stringtie2, I believe it can handle my ONT RNA reads well.
However, I get much larger transcripts (about 30-40k) from stringtie2, whether I use ONT reads or Illumina reads, while the actual length should be 3-4k. I ran everything with default parameters, and the input BAMs all came from hisat and minimap2.
Could you please give some tips about this, or advise how I should process my ONT reads for genome annotation?
Thanks a lot.

ZhangZhou from JXAU,Nanchang,China.

support for python 3.7

The upcoming MinKNOW software release, version 20.03, will work with Python 3.7. Does UNCALLED still work with this new Python? Or does it mean that UNCALLED runs on Python 2 and MinKNOW on Python 3?

Last channel is too large error

Hi,

When I try to run UNCALLED for read_until whilst using a FLONGLE, I get the following error ...

[14:59:17 - ReadUntil] Creating many chunk client with ReadCache data queue filtering to strand, strand2, and strand1 read chunks.
[14:59:17 - ReadUntil] Using pre-defined read classification map.
[14:59:17 - ReadUntil] Creating rpc connection on port 8008.
[14:59:17 - ReadUntil] Got rpc connection.
[14:59:17 - UNCALLED] Run already in progress
Traceback (most recent call last):
  File "./uncalled", line 287, in realtime_cmd
    cal = client.device.rpc.device.get_calibration(first_channel=1, last_channel=128)
...
File "/opt/ont/minknow/ont-python/lib/python2.7/site-packages/grpc/_channel.py", line 466, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
_Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "last_channel is too large"
	debug_error_string = "{"created":"@1593781157.102897145","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"last_channel is too large","grpc_status":3}"

Installation on Ubuntu 20.04.2

Hello,

I am new to working with nanopore sequencing data and I would like to try UNCALLED to map some public raw electrical signal data to a DNA reference. Using the command pip3 install git+https://github.com/skovaka/UNCALLED.git --user, I believe I was able to install UNCALLED successfully, as I get the following:

Collecting git+https://github.com/skovaka/UNCALLED.git
Cloning https://github.com/skovaka/UNCALLED.git to /tmp/pip-req-build-4gxwg_2g
Running command git clone -q https://github.com/skovaka/UNCALLED.git /tmp/pip-req-build-4gxwg_2g
Running command git submodule update --init --recursive -q
Requirement already satisfied (use --upgrade to upgrade): uncalled==2.2 from git+https://github.com/skovaka/UNCALLED.git in ./.local/lib/python3.8/site-packages/uncalled-2.2-py3.8-linux-x86_64.egg
Building wheels for collected packages: uncalled
Building wheel for uncalled (setup.py) ... done
Created wheel for uncalled: filename=uncalled-2.2-cp38-cp38-linux_x86_64.whl size=11639490 sha256=12580f68f59df2b4d21dcebe3dd5278e8dd545eb1bccf7309c88e2004bedf0f7
Stored in directory: /tmp/pip-ephem-wheel-cache-yjk3biym/wheels/b4/66/64/ee36746e3b4be0eea3b5c6579661fc526addfc626a5ed78e98
Successfully built uncalled

However, it seems the 'uncalled' command is not recognized yet: when I try to run some examples I get the following: uncalled: command not found.

Please, am I missing some step between installation and trying the examples?

Any help would be very much appreciated.

Cheers,
Enio
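
A common cause of "command not found" after a pip3 install --user is that the per-user script directory is not on PATH. The sketch below prints that directory and checks PATH for it. It assumes a Linux or macOS layout where user scripts go under the user base's bin directory. The helper names are hypothetical, and this may not be the cause in this particular case.

```python
import os
import shutil
import site


def user_script_dir() -> str:
    """Directory where pip --user is expected to place scripts (Linux/macOS)."""
    return os.path.join(site.getuserbase(), "bin")


def on_path(directory: str) -> bool:
    """True if `directory` is one of the entries in PATH."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    wanted = os.path.realpath(directory)
    return wanted in {os.path.realpath(e) for e in entries if e}


bin_dir = user_script_dir()
print("user script dir:", bin_dir)
print("on PATH:", on_path(bin_dir))
print("uncalled found at:", shutil.which("uncalled"))
```

If the directory is not on PATH, adding it in the shell profile and opening a new terminal should make the command visible, provided the script was actually written there.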

Updated Uncalled and Minknow No Longer Working

Hello,

I am a researcher working on enriching fungal pathogen genomes from mammal host tissues, which I have successfully done using your UNCALLED software in the past. However, after being forced to update MinKNOW and then updating UNCALLED so it would work with the newest version, UNCALLED has essentially stopped functioning.

In the .paf files, no reads are being kept even though I know there is target DNA in the sample, and mapping to the target afterward using minimap2 does show there are reads from the target. Additionally, although the .paf says that reads are being ejected, I don't see the characteristic "spike" in the read length distribution in MinKNOW at the length they say they are being ejected at. The size distribution is in fact identical to a run I performed without UNCALLED, suggesting to me that UNCALLED is simply not working or actually enriching for my target.

Do you have any guidance?

Thank you!

Installing UNCALLED4 error

Hi

I am trying to install UNCALLED4 to plot some signal alignments from Nanopolish/f5c. I cloned the relevant branch and ran pip3 install ., but I am getting the following compilation error.

    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPYBIND -I./submods -I./submods/hdf5/include -I./submods/fast5/include -I./submods/pdqsort -I./submods/toml11 -I/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/pybind11/include -I/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/pybind11/include -I/home/hasindu/UNCALLED/venv/include -I/home/hasindu/python-3.9.10/include/python3.9 -c src/read_buffer.cpp -o build/temp.linux-x86_64-3.9/src/read_buffer.o -fvisibility=hidden -g0 -std=c++11 -O3 -g
    src/dataframe.cpp:5:41: error: ‘constexpr’ needed for in-class initialization of static data member ‘const std::array<const char*, 3ul> AlnCoords::columns’ of non-integral type [-fpermissive]
     decltype(AlnCoords::columns) AlnCoords::columns;
                                             ^
    src/dtw.cpp: In function ‘pybind11::array_t<Coord, 16> get_guided_bands(PyArray<int>&, PyArray<int>&, PyArray<int>&, size_t, i32)’:
    src/dtw.cpp:17:26: warning: unused variable ‘b’ [-Wunused-variable]
         size_t i = 0, j = 0, b = 0;
                              ^
    src/mapper.cpp: In member function ‘void Mapper::update_seeds(Mapper::PathBuffer&, bool)’:
    src/mapper.cpp:737:14: warning: variable ‘clust’ set but not used [-Wunused-but-set-variable]
             auto clust = seed_tracker_.add_seed(
                  ^
    In file included from src/pybinder.cpp:5:0:
    src/fast5_py_read.hpp: In constructor ‘Fast5PyRead::Fast5PyRead(pybind11::object)’:
    src/fast5_py_read.hpp:97:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             for (size_t i = 0; i < raw_info.shape[0]; i++) {
                                  ^
    In file included from src/pybinder.cpp:15:0:
    src/compare.hpp: In member function ‘Compare::DistIter::Coefs Compare::DistIter::next_dist(int)’:
    src/compare.hpp:93:21: warning: unused variable ‘dist’ [-Wunused-variable]
                 int len,dist;
                         ^
    In file included from src/pybinder.cpp:15:0:
    src/compare.hpp: In lambda function:
    src/compare.hpp:134:47: warning: narrowing conversion of ‘c.Compare::rec.std::vector<_Tp, _Alloc>::size<Compare::Rec, std::allocator<Compare::Rec> >()’ from ‘std::vector<Compare::Rec>::size_type {aka long unsigned int}’ to ‘pybind11::ssize_t {aka long int}’ inside { } [-Wnarrowing]
                 return py::array_t<Rec>{c.rec.size(), c.rec.data()};
                                                   ^
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPYBIND -I./submods -I./submods/hdf5/include -I./submods/fast5/include -I./submods/pdqsort -I./submods/toml11 -I/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/pybind11/include -I/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/pybind11/include -I/home/hasindu/UNCALLED/venv/include -I/home/hasindu/python-3.9.10/include/python3.9 -c src/seed_tracker.cpp -o build/temp.linux-x86_64-3.9/src/seed_tracker.o -fvisibility=hidden -g0 -std=c++11 -O3 -g
    error: command '/usr/bin/gcc' failed with exit code 1
    src/pybinder.cpp: In instantiation of ‘void pybind_kmers(pybind11::module_&) [with long unsigned int ...Ks = {4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul}]’:
    src/pybinder.cpp:115:41:   required from here
    src/pybinder.cpp:47:10: warning: unused variable ‘_’ [-Wunused-variable]
         auto _ = {(pybind_kmer<Ks>(m))...};
              ^
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/hasindu/UNCALLED/venv/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-096oeoxz/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-096oeoxz/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-aco86b8v/install-record.txt --single-version-externally-managed --compile --install-headers /home/hasindu/UNCALLED/venv/include/site/python3.9/uncalled Check the logs for full command output.

Any thoughts on what I am doing wrong?

Computer requirement for UNCALLED

Hello everyone,
I used UNCALLED to enrich a set of genes (already repeat masked) from the human genome, 46 Mb in total. When I performed this experiment on a used flowcell with 118 usable pores, UNCALLED ran well, ejecting 7471 of 36449 reads (far fewer than expected). However, when I performed it on a new flowcell with >1200 usable pores, few reads were ejected (51 of 17410 reads in the first 15 minutes). I looked into the sequencing summary text and found that reads were no longer ejected after 6 minutes. Memory usage kept growing, and finally UNCALLED was killed by the system. It seemed UNCALLED was stuck in some process. The computer runs Ubuntu 16.04 with 16 cores, 16 GB of memory and 1 TB of storage. Is this hardware sufficient for this UNCALLED task, or did I do something wrong?

Thank you!

Computer Specifications

I am interested in using UNCALLED, can you give me an idea of the specification of computer required to run this to exclude the human genome in real time? Specifically, is a GPU required? Which Nanopore devices is it compatible with?

`sim` segmentation error

The error I got:

/var/spool/slurmd/job10576850/slurm_script: line 34: 22333 Segmentation fault      python UNCALLED/scripts/uncalled sim $bwa_prefix $path_ctl_fast5s --ctl-seqsum $path_ctl_seqsum --unc-seqsum $path_unc_seqsum --unc-paf $path_unc_paf -t 16 --enrich -c 3 --sim-speed 0.25 > uncalled_out.paf 2> uncalled_err.txt

In uncalled_err.txt:

Loading UNCALLED PAF............
================================
Procesing run...................
Generating pattern..............
================================

Loading control PAF.............
==========
Procesing run...................
Ordering reads..................
================================

I think the problem is that I provided a txt file storing all the paths to actual fast5 files, and not a directory of fast5 files.
Below is the script I run:

bwa_prefix="sim_data/viral_genome_ref"
ref_genome="sim_data/viral.1.1.genomic.fna.gz"

path_ctl_fast5s="sim_data/NA12878-DirectRNA_subset.files.txt"
path_ctl_seqsum="sim_data/NA12878-DirectRNA_subset_Guppy_4.2.2_sequencing_summary.txt"

path_unc_seqsum="sim_data/20191220_GM12878_seqsum.txt"
path_unc_paf="sim_data/20191220_GM12878_uncalled.paf"

python UNCALLED/scripts/uncalled sim $bwa_prefix $path_ctl_fast5s --ctl-seqsum $path_ctl_seqsum --unc-seqsum $path_unc_seqsum --unc-paf $path_unc_paf -t 16 --enrich -c 3 --sim-speed 0.25 > uncalled_out.paf 2> uncalled_err.txt

What should I do if I want to pass a txt file?
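
One possible workaround, assuming that uncalled sim only accepts a directory in this position (this is not confirmed in the thread), is to build a directory of symlinks from the text file and pass that directory instead. The function name and the output directory in the usage note are hypothetical.

```python
import os


def link_listed_files(list_path: str, out_dir: str):
    """Symlink every file named in `list_path` (one path per line) into `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    linked = []
    with open(list_path) as fh:
        for line in fh:
            src = line.strip()
            if not src:
                continue  # skip blank lines
            # Files sharing a basename would collide; only the first is linked
            dst = os.path.join(out_dir, os.path.basename(src))
            if not os.path.lexists(dst):
                os.symlink(os.path.abspath(src), dst)
            linked.append(dst)
    return linked
```

For example, `link_listed_files("sim_data/NA12878-DirectRNA_subset.files.txt", "sim_data/ctl_fast5s")` would create the links, after which the new directory could be used as the value of path_ctl_fast5s.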

[QUESTION] mapping to reference stringency

Hi,

Is it possible to increase the stringency of the chunk mapping to the reference for enrichment? We are getting a few thousand reads that map to our reference according to UNCALLED, but when we run the data through kraken we get very few reads that accurately match the bacterial genome we are enriching for. I realize we need to allow for errors, but can we adjust the number of mismatches allowed in a chunk? Thanks,

Chandler

Error running 'uncalled map'

Hello,

I've installed UNCALLED and am attempting to run through the example. Indexing via BWA and UNCALLED appeared to run normally, however, I am getting an error when trying to map. This is the command used:

uncalled map -t 8 example_ref.fa fast5_filename.txt > uncalled_out.paf

This is the error that I am receiving:

Loading fast5s
Mapping
HDF5-DIAG: Error detected in HDF5 (1.8.21) thread 0:
#000: H5D.c line 356 in H5Dopen2(): not found
major: Dataset
minor: Object not found
#001: H5Gloc.c line 428 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#002: H5Gtraverse.c line 859 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#003: H5Gtraverse.c line 639 in H5G_traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#004: H5Gloc.c line 383 in H5G_loc_find_cb(): object 'Signal' doesn't exist
major: Symbol table
minor: Object not found
Traceback (most recent call last):
  File "/usr/local/bin/uncalled", line 436, in <module>
    map_cmd(args)
  File "/usr/local/bin/uncalled", line 247, in map_cmd
    for p in mapper.update():
RuntimeError: /Raw/Reads/Signal: error in H5Dopen
terminate called without an active exception
Aborted

Any ideas? Thanks, I really appreciate it!

barcodes

Does UNCALLED handle barcoded samples? Meaning, does it exclude barcodes in k-mer search of the reference? Does it demultiplex reads into separate files? #32 had some info but these answers weren't clear to me

Floating point exception

Hi, I am trying to map some raw signals in FAST5 to a tiny reference of a few hundred bases. However, I am getting a floating-point exception. The following is what I did. Likely that I am doing something stupid :D Any help on this is appreciated.

uncalled index -o rfc1 rfc1.fa
uncalled map -t40 rfc1 f5.list > uncalled.paf

Loading fast5s
Mapping
Floating point exception (core dumped)

Error message when starting UNCALLED

I installed UNCALLED for all users on our Server.

  1. python3 setup.py install
  2. python3 and gcc both have the right versions:
    (base) [admin@ms-wiss2602001 ~]$ python3 --version && gcc --version
    Python 3.8.3
    gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)

uncalled was installed in /software/anaconda/anaconda3/bin while the git repo is in /software/UNCALLED

If I start UNCALLED, I get this error message:

(base) [admin@ms-wiss2602001 bin]$ pwd
/software/anaconda/anaconda3/bin

(base) [admin@ms-wiss2602001 bin]$ ./uncalled
Traceback (most recent call last):
  File "./uncalled", line 4, in <module>
    __import__('pkg_resources').run_script('uncalled==2.2', 'uncalled')
  File "/software/anaconda/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 665, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/software/anaconda/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/home/admin/.local/lib/python3.8/site-packages/uncalled-2.2-py3.8-linux-x86_64.egg/EGG-INFO/scripts/uncalled", line 35, in <module>
    import uncalled as unc
  File "/home/admin/.local/lib/python3.8/site-packages/uncalled-2.2-py3.8-linux-x86_64.egg/uncalled/__init__.py", line 1, in <module>
    from _uncalled import *
ImportError: /home/admin/.local/lib/python3.8/site-packages/uncalled-2.2-py3.8-linux-x86_64.egg/_uncalled.cpython-38-x86_64-linux-gnu.so: undefined symbol: H5Oget_info

What is going wrong?

Thank you for your help.

installation problem

Hi
I am trying to install UNCALLED on our cluster using my own python environment:
git clone --recursive https://github.com/skovaka/UNCALLED.git
python3 setup.py install
and I get this error:
ModuleNotFoundError: No module named 'commands'

I also get many warnings like the following:
cc1: warning: command line option ‘-std=c++11’ is valid for C++/ObjC++ but not for C [enabled by default]
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
cc1plus: warning: command line option ‘-std=gnu99’ is valid for C/ObjC but not for C++ [enabled by default]

python 3.8.3
gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
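
For background on the ModuleNotFoundError above: the commands module existed only in Python 2, and its helpers moved to subprocess in Python 3. The post does not show which file performs the import, so the sketch below only illustrates the Python 3 equivalent of a typical commands call rather than a specific patch.

```python
import subprocess

# Python 2 code would have looked like:
#   import commands
#   status, output = commands.getstatusoutput("echo hello")
# Python 3 keeps the same helper in the subprocess module:
status, output = subprocess.getstatusoutput("echo hello")
print(status, output)
```

Any build script that still imports commands would need a change along these lines, or the install would have to run under Python 2.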

Running Uncalled using Flongle flow cell

Hi @skovaka ,
Thank you for developing such a great tool for adaptive sampling. However, I have a question about using this tool with Flongle flow cells. I found an issue that suggests changing the last channel to 126, but I couldn't find the corresponding function in any of the scripts. The issue is mentioned here: #9
I am getting the same "last_channel is too large" error. Could you please tell me which parameter I should change when running UNCALLED with Flongle flow cells?
Thank you so much for your help.

Read lengths in .paf files

Hi,

I found a "weird" result in .paf files

  • unmapped reads have long Query sequence length (the 2nd column)
  • mapped reads have short Query sequence length
  • mapped reads have similar Query sequence length and Query end coordinate (the 4th column)

Is that a misplaced property for real-time mode? Where can I find the "real" read or event length?


For example:

uncalled map -t 16 E.coli my.fast5 > uncalled_out.paf
head -n 4 uncalled_out.paf

Output:

b84a48f0-9e86-47ef-9d20-38a0bded478e    256     125     256     -    ...
77fe7f8c-32d6-4789-9d62-41ff482cf890    92      66      92      -    ...
eee4b762-25dd-4d4a-8a59-be47065029be    4843    *       *       *    ...
e175c87b-a426-4a3f-8dc1-8e7ab5fdd30d    192     143     192     +    ...
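
The pattern described above can be checked with a short script. This is a minimal sketch, assuming the standard PAF column order (query name, query length, query start, query end, strand) and assuming unmapped reads carry `*` in the coordinate and strand columns, as in the output shown. The `PafRecord` type and `parse_paf_line` helper are hypothetical names, not part of UNCALLED.

```python
from typing import NamedTuple, Optional


class PafRecord(NamedTuple):
    """First five PAF columns; coordinates are None for unmapped reads."""
    name: str
    query_len: int
    query_start: Optional[int]
    query_end: Optional[int]
    strand: Optional[str]

    @property
    def mapped(self) -> bool:
        return self.strand is not None


def parse_paf_line(line: str) -> PafRecord:
    """Parse the first five whitespace-separated columns of one PAF line."""
    name, qlen, qstart, qend, strand = line.split()[:5]
    if strand == "*":  # unmapped: coordinates and strand are placeholders
        return PafRecord(name, int(qlen), None, None, None)
    return PafRecord(name, int(qlen), int(qstart), int(qend), strand)


# The four records from the output above, truncated to five columns.
lines = [
    "b84a48f0-9e86-47ef-9d20-38a0bded478e\t256\t125\t256\t-",
    "77fe7f8c-32d6-4789-9d62-41ff482cf890\t92\t66\t92\t-",
    "eee4b762-25dd-4d4a-8a59-be47065029be\t4843\t*\t*\t*",
    "e175c87b-a426-4a3f-8dc1-8e7ab5fdd30d\t192\t143\t192\t+",
]
records = [parse_paf_line(line) for line in lines]

for rec in records:
    status = "mapped" if rec.mapped else "unmapped"
    print(rec.name, status, rec.query_len, rec.query_end)

# For every mapped read here, the query end equals the reported query length.
ends_match = all(r.query_end == r.query_len for r in records if r.mapped)
print("query end equals query length for all mapped reads:", ends_match)
```

On these four records the check is true for all three mapped reads, which matches the observation in the third bullet.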

Generating sequencing summary from fast5 raw reads

Hi @skovaka
Thank you for developing UNCALLED.

I am wondering how to generate the sequencing summary file that is needed to run the "uncalled sim" command as described in the README: /path/to/control/fast5s --ctl-seqsum /path/to/control/sequencing_summary.txt. These files don't seem to be provided.
I have downloaded some E. coli fast5 raw reads, but unfortunately they don't come with a sequencing_summary.txt. As I understand it, the control fast5 files are only used to supply raw signal for the simulation, so I am also wondering why the command relies on fields such as template_duration, which are basecaller-specific.

Thank you.

Installation on macOS

When installing on a Mac, UNCALLED tends to use the gcc under /usr/gcc that came with Xcode, which is either gcc 4.2.1 or clang. My gcc5 was installed via MacPorts. Have you seen a similar issue on macOS (Mojave and Catalina), and are there instructions for installing UNCALLED on a Mac? Thank you!
