Giter VIP home page Giter VIP logo

Comments (12)

skovaka avatar skovaka commented on July 17, 2024

Sorry about that, it looks like a compiler-specific error. Which version of GCC are you using (gcc -v)? I just pushed a change which I'm hoping will fix the problem, but I can't be sure since I wasn't able to reproduce it.

from uncalled.

hasindu2008 avatar hasindu2008 commented on July 17, 2024

Thanks for that. Now I was able to install it successfully. I have come very close to plotting something, but the command to plot is giving me the following error.

  File "/home/hasindu/UNCALLED/venv/bin/uncalled", line 33, in <module>
    sys.exit(load_entry_point('uncalled==3.2.0', 'console_scripts', 'uncalled')())
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/__main__.py", line 392, in main
    ret = fn(conf=conf)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 388, in dotplot
    for read_id, fig in dotplots.iter_plots():
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 64, in iter_plots
    yield read_id, self._plot(read_id, tracks)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 99, in _plot
    Sigplot(tracks, conf=self.conf).plot(fig)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 41, in plot
    self.plot_read(fig, read_id, row, col)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 88, in plot_read
    dtw["model_current"] = track.model[dtw["kmer"]]
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/pore_model.py", line 163, in __getitem__
    return self.means[self.kmer_array(idx)]
IndexError: index 1364 is out of bounds for axis 0 with size 1024

The commands I issued were:

uncalled index -o index/example_ref test/ecoli_2kb_region/draft.fa
./f5c index -d test/ecoli_2kb_region/fast5_files test/ecoli_2kb_region/reads.fasta -t16 --iop 16
 ./f5c eventalign -b test/ecoli_2kb_region/reads.sorted.bam -g test/ecoli_2kb_region/draft.fa -r test/ecoli_2kb_region/reads.fasta -t 16 --print-read-names --signal-index --scale-events  > test/ecoli_2kb_region/results.tsv
uncalled convert nanopolish index/example_ref test/ecoli_2kb_region/fast5_files/ test/ecoli_2kb_region/results.tsv  -x test/ecoli_2kb_region/reads.fasta.index.readdb -o output.db
uncalled dotplot output.db -o output_image1

Note that f5c output is compatible with nanopolish. Infact I tried running nanopolish [../nanopolish-arm/nanopolish eventalign -b test/ecoli_2kb_region/reads.sorted.bam -g test/ecoli_2kb_region/draft.fa -r test/ecoli_2kb_region/reads.fasta -t 16 --print-read-names --signal-index --scale-events > test/ecoli_2kb_region/results.tsv] as well but still the same error.

If you need the results.tsv, you could launch the following commands to download f5c, test dataset and run eventalign.

VERSION=v1.0
wget "https://github.com/hasindu2008/f5c/releases/download/$VERSION/f5c-$VERSION-binaries.tar.gz" && tar xvf f5c-$VERSION-binaries.tar.gz && cd f5c-$VERSION/
 mv ./f5c_x86_64_linux f5c
 scripts/test.sh
 ./f5c eventalign -b test/ecoli_2kb_region/reads.sorted.bam -g test/ecoli_2kb_region/draft.fa -r test/ecoli_2kb_region/reads.fasta -t 16 --print-read-names --signal-index --scale-events  > test/ecoli_2kb_region/results.tsv

from uncalled.

skovaka avatar skovaka commented on July 17, 2024

Ah, so haven't actually tested DNA nanopolish input yet! I've mostly been working with RNA. I don't think I'm handling DNA reverse complementing properly for nanopolish conversion, and it looks like I need to add a 6-mer DNA model. Sorry about that, I've been meaning to put together a test dataset for r9 DNA. Your f5c test dataset looks great, thanks for sharing it!

I'll work on this and try to get back to you sometime next week. In the meantime, I've added a couple notes in the README on how some functionalities have only been tested on RNA.

Thanks,
Sam

from uncalled.

hasindu2008 avatar hasindu2008 commented on July 17, 2024

Right thanks. This time I tried with some direct-RNA data and still got some errors. Any idea what I am doing wrong?

uncalled index -o index/example_ref test/rna/gencode.v35.transcripts.fa
 ./f5c eventalign -b test/rna/reads.sorted.bam -g test/rna/gencode.v35.transcripts.fa -r test/rna/reads.fastq -t 16 -K 256 -B 2M --print-read-names --signal-index --scale-events  --rna > test/rna/result.tsv
uncalled convert nanopolish index/example_ref test/rna/fast5_files/0 test/rna/result.tsv -x test/rna/reads.fastq.index.readdb -o output.db
uncalled dotplot output.db -o outimage.png

Traceback (most recent call last):
  File "/home/hasindu/UNCALLED/venv/bin/uncalled", line 33, in <module>
    sys.exit(load_entry_point('uncalled==3.2.0', 'console_scripts', 'uncalled')())
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/__main__.py", line 392, in main
    ret = fn(conf=conf)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 388, in dotplot
    for read_id, fig in dotplots.iter_plots():
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 64, in iter_plots
    yield read_id, self._plot(read_id, tracks)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/dotplot.py", line 99, in _plot
    Sigplot(tracks, conf=self.conf).plot(fig)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 41, in plot
    self.plot_read(fig, read_id, row, col)
  File "/home/hasindu/UNCALLED/venv/lib/python3.9/site-packages/uncalled/vis/sigplot.py", line 109, in plot_read
    samples = np.arange(samp_min, samp_max)[mask]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 112017 but corresponding boolean dimension is 111831

same error with:
nanopolish eventalign -b test/rna/reads.sorted.bam -g test/rna/gencode.v35.transcripts.fa -r test/rna/reads.fastq -t 16 --print-read-names --signal-index --scale-events > test/rna/result.tsv

The RNA testdataset that I am testing on is available as:

VERSION=v1.0
wget "https://github.com/hasindu2008/f5c/releases/download/$VERSION/f5c-$VERSION-binaries.tar.gz" && tar xvf f5c-$VERSION-binaries.tar.gz && cd f5c-$VERSION/
 mv ./f5c_x86_64_linux f5c
wget https://raw.githubusercontent.com/hasindu2008/f5c/master/scripts/test_eventalign.sh && mv test_eventalign.sh scripts/ && chmod +x scripts/test_eventalign.sh
scripts/test_eventalign.sh -e
 ./f5c eventalign -b test/rna/reads.sorted.bam -g test/rna/gencode.v35.transcripts.fa -r test/rna/reads.fastq -t 16 -K 256 -B 2M --print-read-names --signal-index --scale-events  --rna > test/rna/result.tsv
 

from uncalled.

skovaka avatar skovaka commented on July 17, 2024

You need to add a "--rna" option to uncalled convert. I've been meaning to auto-detect that, which I'll see if I can do next week. I also noticed the long sequence names in your test data don't display well on my plots, which I'll work on too.

from uncalled.

hasindu2008 avatar hasindu2008 commented on July 17, 2024

Cool. I was finally able to get it plotted. This dot plot representation is pretty great and useful. I am trying to understand a bit on how you access the signal data. Do you access the fast5 and put all the information during the converting step or do you only access them during the dotplot command? I am interested in making UNCALLED and also getting a BLOW5 file as an input to get the signal from.

from uncalled.

skovaka avatar skovaka commented on July 17, 2024

For dotplots I'm loading reads from fast5s so I can plot the full signal scatterplot, although I'm planning to make the optional in the future. That and the dtw command should be the only times I need to load fast5 data. Everything else is computed based on reference-collapsed statistics stored in the database. convert only reads the fast5 paths, but doesn't load the raw signal data itself.

I am planning to integrate blow5 soon. Due to slow random access for fast5s, I'm currently loading the whole PAF file (soon to be BAM) into memory and iterating reads in fast5 order. Obviously that's not great for large datasets, so I'm hoping to get faster random access from blow5. Starting with the python API looks straightforward, but I'll let you know if I have any questions.

from uncalled.

hasindu2008 avatar hasindu2008 commented on July 17, 2024

I would be able to give you a hand in optimising the parameters for slow5 if that becomes necessary or getting it done at C-level. Random access to slow5 in C-level is also easy - here is an example. But Python API is good to get started and should give a decent performance for your use case, especially that now it allows fetching a batch of read IDs in one call where the underlying implementation does that using multiple threads which is opaque to the user. If am really excited to see your UNCALLED become something like the IGV for signals as this is something that would open the door to many possibilities with direct signal data.

I guess the best approach would be to load the alignment records from BAM or PAF for a given genomic region to memory, then collect the read IDs of those alignments and fetch those read IDs from the BLOW5 file. The reason you are going for BAM is the capability to fetch a region I believe. For PAF also this could be possible, if the PAF file is sorted based on genomic coordinates and bgzip compressed and indexed using tabix. Both bgzip and tabix come with htslib.

Another question, in your dotplot, there is this thing called "basecall". How do I enable that option?Is this the one that require the move table to be available in the FAST5?

from uncalled.

skovaka avatar skovaka commented on July 17, 2024

Thanks for the tips! I'll definitely let you know if I have any questions.

What I call "projected basecalled alignments" are currently available by aligning basecalled fast5 files (generated by guppy --fast5_out option) using uncalled dtw. ONT recently added an option to output the necessary "move" metadata to a BAM file, which is part of why I'm working on BAM support which will release soon. I should also add a standalone command to produce the projected basecalled alignments, and maybe think of better name to differentiate from the f5c mode you just released!

from uncalled.

skovaka avatar skovaka commented on July 17, 2024

I just pushed a bunch of changes, which includes a fix for importing nanopolish DNA. I added a number of input/output options too, which required changing a bunch of command line arguments, so check the documentation for those changes. Another significant change is I'm no longer supporting PAF input for the dtw command, only BAM, because Guppy fast5 output is now deprecated and the necessary information is optionally available in Guppy's BAM output. Let me know if you have any other issues

from uncalled.

hasindu2008 avatar hasindu2008 commented on July 17, 2024

Yeh cool. What is the easiest way to update my existing installation of UNCALLED? I previously cloned and did a did pip install .

Another question, I tried to load a big dataset with RNA and the uncalled convert seem to take a considerable amount of time. Is this due to FAST5 access or is it for parsing the large tsv?

from uncalled.

skovaka avatar skovaka commented on July 17, 2024

Yeah, that definitely needs some optimization. I think TSV parsing is the main bottleneck, especially since I'm still using pandas for that, but I haven't profiled it thoroughly yet

from uncalled.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.