Comments (12)
Sorry about that, it looks like a compiler-specific error. Which version of GCC are you using (gcc -v
)? I just pushed a change which I'm hoping will fix the problem, but I can't be sure since I wasn't able to reproduce it.
from uncalled.
Thanks for that. Now I was able to install it successfully. I have come very close to plotting something, but the command to plot is giving me the following error.
File "/home/hasindu/UNCALLED/venv/bin/uncalled", line 33, in <module>
sys.exit(load_entry_point('uncalled==3.2.0', 'console_scripts', 'uncalled')())
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/__main__.py", line 392, in main
ret = fn(conf=conf)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 388, in dotplot
for read_id, fig in dotplots.iter_plots():
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 64, in iter_plots
yield read_id, self._plot(read_id, tracks)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 99, in _plot
Sigplot(tracks, conf=self.conf).plot(fig)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 41, in plot
self.plot_read(fig, read_id, row, col)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 88, in plot_read
dtw["model_current"] = track.model[dtw["kmer"]]
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/pore_model.py", line 163, in __getitem__
return self.means[self.kmer_array(idx)]
IndexError: index 1364 is out of bounds for axis 0 with size 1024
The commands I issued were:
uncalled index -o index/example_ref test/ecoli_2kb_region/draft.fa
./f5c index -d test/ecoli_2kb_region/fast5_files test/ecoli_2kb_region/reads.fasta -t16 --iop 16
./f5c eventalign -b test/ecoli_2kb_region/reads.sorted.bam -g test/ecoli_2kb_region/draft.fa -r test/ecoli_2kb_region/reads.fasta -t 16 --print-read-names --signal-index --scale-events > test/ecoli_2kb_region/results.tsv
uncalled convert nanopolish index/example_ref test/ecoli_2kb_region/fast5_files/ test/ecoli_2kb_region/results.tsv -x test/ecoli_2kb_region/reads.fasta.index.readdb -o output.db
uncalled dotplot output.db -o output_image1
Note that f5c output is compatible with nanopolish. Infact I tried running nanopolish [../nanopolish-arm/nanopolish eventalign -b test/ecoli_2kb_region/reads.sorted.bam -g test/ecoli_2kb_region/draft.fa -r test/ecoli_2kb_region/reads.fasta -t 16 --print-read-names --signal-index --scale-events > test/ecoli_2kb_region/results.tsv
] as well but still the same error.
If you need the results.tsv, you could launch the following commands to download f5c, test dataset and run eventalign.
VERSION=v1.0
wget "https://github.com/hasindu2008/f5c/releases/download/$VERSION/f5c-$VERSION-binaries.tar.gz" && tar xvf f5c-$VERSION-binaries.tar.gz && cd f5c-$VERSION/
mv ./f5c_x86_64_linux f5c
scripts/test.sh
./f5c eventalign -b test/ecoli_2kb_region/reads.sorted.bam -g test/ecoli_2kb_region/draft.fa -r test/ecoli_2kb_region/reads.fasta -t 16 --print-read-names --signal-index --scale-events > test/ecoli_2kb_region/results.tsv
from uncalled.
Ah, so haven't actually tested DNA nanopolish input yet! I've mostly been working with RNA. I don't think I'm handling DNA reverse complementing properly for nanopolish conversion, and it looks like I need to add a 6-mer DNA model. Sorry about that, I've been meaning to put together a test dataset for r9 DNA. Your f5c test dataset looks great, thanks for sharing it!
I'll work on this and try to get back to you sometime next week. In the meantime, I've added a couple notes in the README on how some functionalities have only been tested on RNA.
Thanks,
Sam
from uncalled.
Right thanks. This time I tried with some direct-RNA data and still got some errors. Any idea what I am doing wrong?
uncalled index -o index/example_ref test/rna/gencode.v35.transcripts.fa
./f5c eventalign -b test/rna/reads.sorted.bam -g test/rna/gencode.v35.transcripts.fa -r test/rna/reads.fastq -t 16 -K 256 -B 2M --print-read-names --signal-index --scale-events --rna > test/rna/result.tsv
uncalled convert nanopolish index/example_ref test/rna/fast5_files/0 test/rna/result.tsv -x test/rna/reads.fastq.index.readdb -o output.db
uncalled dotplot output.db -o outimage.png
Traceback (most recent call last):
File "/home/hasindu/UNCALLED/venv/bin/uncalled", line 33, in <module>
sys.exit(load_entry_point('uncalled==3.2.0', 'console_scripts', 'uncalled')())
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/__main__.py", line 392, in main
ret = fn(conf=conf)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 388, in dotplot
for read_id, fig in dotplots.iter_plots():
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 64, in iter_plots
yield read_id, self._plot(read_id, tracks)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 99, in _plot
Sigplot(tracks, conf=self.conf).plot(fig)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 41, in plot
self.plot_read(fig, read_id, row, col)
File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 109, in plot_read
samples = np.arange(samp_min, samp_max)[mask]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 112017 but corresponding boolean dimension is 111831
same error with:
nanopolish eventalign -b test/rna/reads.sorted.bam -g test/rna/gencode.v35.transcripts.fa -r test/rna/reads.fastq -t 16 --print-read-names --signal-index --scale-events > test/rna/result.tsv
The RNA testdataset that I am testing on is available as:
VERSION=v1.0
wget "https://github.com/hasindu2008/f5c/releases/download/$VERSION/f5c-$VERSION-binaries.tar.gz" && tar xvf f5c-$VERSION-binaries.tar.gz && cd f5c-$VERSION/
mv ./f5c_x86_64_linux f5c
wget https://raw.githubusercontent.com/hasindu2008/f5c/master/scripts/test_eventalign.sh && mv test_eventalign.sh scripts/ && chmod +x scripts/test_eventalign.sh
scripts/test_eventalign.sh -e
./f5c eventalign -b test/rna/reads.sorted.bam -g test/rna/gencode.v35.transcripts.fa -r test/rna/reads.fastq -t 16 -K 256 -B 2M --print-read-names --signal-index --scale-events --rna > test/rna/result.tsv
from uncalled.
You need to add a "--rna" option to uncalled convert
. I've been meaning to auto-detect that, which I'll see if I can do next week. I also noticed the long sequence names in your test data don't display well on my plots, which I'll work on too.
from uncalled.
Cool. I was finally able to get it plotted. This dot plot representation is pretty great and useful. I am trying to understand a bit on how you access the signal data. Do you access the fast5 and put all the information during the converting step or do you only access them during the dotplot command? I am interested in making UNCALLED and also getting a BLOW5 file as an input to get the signal from.
from uncalled.
For dotplots I'm loading reads from fast5s so I can plot the full signal scatterplot, although I'm planning to make the optional in the future. That and the dtw
command should be the only times I need to load fast5 data. Everything else is computed based on reference-collapsed statistics stored in the database. convert
only reads the fast5 paths, but doesn't load the raw signal data itself.
I am planning to integrate blow5 soon. Due to slow random access for fast5s, I'm currently loading the whole PAF file (soon to be BAM) into memory and iterating reads in fast5 order. Obviously that's not great for large datasets, so I'm hoping to get faster random access from blow5. Starting with the python API looks straightforward, but I'll let you know if I have any questions.
from uncalled.
I would be able to give you a hand in optimising the parameters for slow5 if that becomes necessary or getting it done at C-level. Random access to slow5 in C-level is also easy - here is an example. But Python API is good to get started and should give a decent performance for your use case, especially that now it allows fetching a batch of read IDs in one call where the underlying implementation does that using multiple threads which is opaque to the user. If am really excited to see your UNCALLED become something like the IGV for signals as this is something that would open the door to many possibilities with direct signal data.
I guess the best approach would be to load the alignment records from BAM or PAF for a given genomic region to memory, then collect the read IDs of those alignments and fetch those read IDs from the BLOW5 file. The reason you are going for BAM is the capability to fetch a region I believe. For PAF also this could be possible, if the PAF file is sorted based on genomic coordinates and bgzip
compressed and indexed using tabix
. Both bgzip
and tabix
come with htslib
.
Another question, in your dotplot, there is this thing called "basecall". How do I enable that option?Is this the one that require the move table to be available in the FAST5?
from uncalled.
Thanks for the tips! I'll definitely let you know if I have any questions.
What I call "projected basecalled alignments" are currently available by aligning basecalled fast5 files (generated by guppy --fast5_out option) using uncalled dtw
. ONT recently added an option to output the necessary "move" metadata to a BAM file, which is part of why I'm working on BAM support which will release soon. I should also add a standalone command to produce the projected basecalled alignments, and maybe think of better name to differentiate from the f5c mode you just released!
from uncalled.
I just pushed a bunch of changes, which includes a fix for importing nanopolish DNA. I added a number of input/output options too, which required changing a bunch of command line arguments, so check the documentation for those changes. Another significant change is I'm no longer supporting PAF input for the dtw
command, only BAM, because Guppy fast5 output is now deprecated and the necessary information is optionally available in Guppy's BAM output. Let me know if you have any other issues
from uncalled.
Yeh cool. What is the easiest way to update my existing installation of UNCALLED? I previously cloned and did a did pip install .
Another question, I tried to load a big dataset with RNA and the uncalled convert seem to take a considerable amount of time. Is this due to FAST5 access or is it for parsing the large tsv?
from uncalled.
Yeah, that definitely needs some optimization. I think TSV parsing is the main bottleneck, especially since I'm still using pandas for that, but I haven't profiled it thoroughly yet
from uncalled.
Related Issues (20)
- undemultiplex fast5 HOT 5
- Implementation on ARM64 Raspberry Pi 4 HOT 3
- barcodes HOT 2
- Read lengths in .paf files HOT 3
- Minimum hardware requirements HOT 4
- MemoryError: std::bad_alloc HOT 5
- Error when running uncalled map HOT 5
- Computer Specifications HOT 1
- [QUERY] Can I test UNCALLED to always try and map only 2000 raw samples from FAST5 HOT 1
- Uncalled in ubuntu 16 HOT 3
- Segmentation fault in 'uncalled sim' HOT 3
- Floating point exception HOT 8
- uncalled failed to connect to minknow instance HOT 3
- Running Uncalled using Flongle flow cell HOT 2
- Updated Uncalled and Minknow No Longer Working HOT 4
- Generating sequencing summary from fast5 raw reads HOT 1
- [QUESTION] mapping to reference stringency HOT 1
- [QUESTION] 10X run-time for reads of 500 raw signals HOT 12
- Visualising f5c resquiggle output in UNCALLED4 HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from uncalled.