novoalab / nanorms
Prediction of RNA modifications and their stoichiometry from per-read features: current intensity, dwell time and trace (Begik*, Lucas* et al., Nature Biotech 2021)
I came across the following error while running the epinano_rms script, and I do not know why the tmp_splitted_base_freq.done_splitting file cannot be found. Can you give me some clues?
Traceback (most recent call last):
File "/home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py", line 354, in <module>
main()
File "/home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py", line 319, in main
touch (".{}.done_splitting".format(tmp_dir))
File "/home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py", line 25, in touch
open(fname, 'a').close()
FileNotFoundError: [Errno 2] No such file or directory: './home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_WT/bam/HMEC_WT_g.tmp_splitted_base_freq.done_splitting'
P.S. commands to run:
python3 /home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py
-R /home/share/yuxin/2021Fall/DATA/hm_tmp/hg38.fa
-b /home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_WT/bam/HMEC_WT_g.bam
-s /home/share/yuxin/nanoRMS/epinano_RMS/sam2tsv.jar -d -n 10
substitute
geom_text_repel(data=subset(subs, score>threshold), aes(Position, score, label=Position,size=3, color="red"), segment.size = 1,segment.color = "black")
with
geom_text_repel(data=subset(subs, score>threshold), aes(Position, score, label=Position), size=3, color="red", segment.size = 1, segment.color = "black")
to avoid failure to print the plot (both scatterplot and barplot) when there is no position above the threshold defined by the plot function
I came across a problem with get_features.py:
When I tested get_features.py with the test data (yeast data, as described in the README), the console showed:
....
4000 - 3999 read skipped: {'cannot convert float NAN to integer': 3588 'No alignment': 411}
....
Does anyone have suggestions for solving this problem?
Thanks in advance!
Hi,
Thanks for developing this software. I encountered some problems while testing it on the given test dataset during step 1 (epinano_rms.py).
Thanks in advance.
Chujie
Hello,
I want to predict psU modifications in my data using nanoRMS. To do so, I realized that I need a file containing the modified positions. I am a bit confused, since finding out which positions are modified is exactly what I am trying to do. Is the prediction reliable without this information?
Best regards,
Amina
Hello,
I tried nanoRMS on a human sample, so I need to make an RNA_Mod_Positions file for human, but I have two questions, as follows:
Thanks a lot in advance.
Hi,
First of all thanks for writing this piece of software.
I am getting an error when simply trying to run epinano_rms.py. Would you be able to help?
cmd1 = (f"samtools view -h -F 3860 {bam_file} | java -jar {sam2tsv} -r {reference_file} "
^
SyntaxError: invalid syntax
I am using python v 3.6.7 as specified in EpiNano protocol. Thank you!
-Elias
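For context, f-strings were introduced in Python 3.6, so a SyntaxError pointing at an f-string usually means the script is being executed by an older interpreter than the one expected (for example, python3 resolving to a different installation than the one checked). A minimal sketch for verifying which interpreter actually runs; the helper name supports_fstrings is hypothetical and not part of nanoRMS:

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True if the given interpreter version can parse f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    # Print the interpreter that is really in use, and whether it is new enough.
    print(sys.executable)
    print(sys.version)
    print("f-strings supported:", supports_fstrings())
```

Running this file with the same command used to launch epinano_rms.py shows whether the two interpreters match.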
Hi,
When I run python /my/path/to/epinano_RMS/epinano_rms.py -R /my/path/to/curlcakes_ref.fasta -b /my/path/to/UNM-rep1.sorted.bam -s epinano_RMS/sam2tsv, I get this error:
It seems that write() takes only one string, but it is being given two. My Python version is 3.8.8.
Thanks for your help in advance!
Hello!
I tried this step (step 3, RNA modification stoichiometry estimation using Tombo resquiggling) on the data (RNA574356_WT45C/workspace/batch0.fast5 and RNA574356_WT30C/workspace/batch0.fast5) with the command ~/software/nanoRMS/per_read/get_features.py --rna -f Saccharomyces_cerevisiae.R64-1-1_firstcolumn.ncrna.fa -t 25 -i batch0.fast5. The output BAM file contains nothing but the SAM header, and no warning was printed.
Could you tell me what happened?
Thank you in advance.
I am trying to run the epinano_rms.py script to analyze human transcriptome data. I generated the BAM files as per the instructions, but every time I run the script I get an out-of-memory error.
Following the latest discussion, I re-ran the script with 1 TB of total memory and it still failed. I am not sure what is going on or how to fix it. After searching for the errors online, I believe it has something to do with iteritems in the script. I am attaching the error file I got after running the script. Please let me know what I am missing here.
Please help me in fixing this issue.
Do you think I should map fewer fastq files at a time, run the epinano_rms.py script on each, and later merge the CSV files of the individual samples to obtain the final CSV file?
Hi,
I am trying to get RNA modification and stoichiometry from a bunch of files preprocessed with Nanopolish.
Everything works fine and I manage to get the window file in this format:
(tf_gpu2) labuser@JCSMR-049555LD:~/lib/nanoRMS-master/Part4/test_data_expected_outputs$ head WT_nanopolish_csv_window_file.ts
contig position reference_kmer read_index event_level_mean Pos sample modification reference
239861 25s 2873 GATGT 39727 72.25 25s_2873 data1 25s_2880 -7
239862 25s 2873 GATGT 40300 70.07 25s_2873 data1 25s_2880 -7
239863 25s 2873 GATGT 40441 80.5 25s_2873 data1 25s_2880 -7
239864 25s 2873 GATGT 40760 75.07 25s_2873 data1 25s_2880 -7
239865 25s 2873 GATGT 40837 76.17 25s_2873 data1 25s_2880 -7
239866 25s 2873 GATGT 41151 77.35 25s_2873 data1 25s_2880 -7
239867 25s 2873 GATGT 41452 80.45 25s_2873 data1 25s_2880 -7
239868 25s 2873 GATGT 41678 74.12 25s_2873 data1 25s_2880 -7
Now I want to run read_clustering.R but I think I am missing one step before running it because it gives me an error:
R --vanilla < read_clustering.R --args WT_nanopolish_csv_window_file.tsv KO_nanopolish_csv_window_file.tsv kmeans
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
header and 'col.names' are of different lengths
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
header and 'col.names' are of different lengths
I guess that read_clustering.R is expecting another format. I have seen some example files in test_data that look like:
unique Strain read_index -7 -6 -5 -4 -3 -2 -1 -0 1 2 3 4 5 6
295671 25s_2880 sn3 39302 NA 111.06 95.1 77.6 106.735 80.105 75.205 NA NA 75.23 74.17 72.29 74.66
295672 25s_2880 sn3 39303 NA 112.59 98.09 79.8022222222222 103.88 84.09 77.1575 NA NA NA 70.65
295673 25s_2880 sn3 39304 115.37 110.87 90.27 77.51 98.99 79.53 72.465 71.44 74 74.34 72.325 71.74 70.94
295674 25s_2880 sn3 39305 111.7525 105.035 88.47 75.84 104.79 77.87 72.875 NA NA NA 71.15 72.866
295675 25s_2880 sn3 39307 115.45 113.75 97.36 79.69 NA 87.09 71.32 68.67 75.6 75.54 73.95 71.95 69.656
How can I convert the first file example to the second? I have not seen that step in: https://github.com/novoalab/nanoRMS/blob/master/README_nanoRMS_nanopolish.md
Thanks a lot for your help in advance.
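Comparing the two examples above, the second file looks like a pivot of the first: one row per (modification, sample, read_index), with one column per value of the reference offset. The following is only a sketch of such a pivot under that assumption. It is not the authors' conversion step, the function name long_to_wide is hypothetical, and averaging several events that fall on the same offset is an assumption as well:

```python
from collections import defaultdict
from statistics import mean

# Assumed column names, taken from the header of the long-format file above.
KEY_COLUMNS = ("modification", "sample", "read_index")


def long_to_wide(rows, offsets=range(-7, 7)):
    """Pivot per-event rows into one row per (modification, sample, read).

    rows: iterable of dicts with the keys in KEY_COLUMNS plus "reference"
          (offset from the modified site) and "event_level_mean".
    Returns a list of lists: the key columns followed by one value per
    offset, with None where a read has no event at that offset.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[c] for c in KEY_COLUMNS)
        grouped[key][int(row["reference"])].append(float(row["event_level_mean"]))

    wide = []
    for key, per_offset in grouped.items():
        # Assumption: several events at the same offset are averaged.
        values = [mean(per_offset[o]) if o in per_offset else None for o in offsets]
        wide.append(list(key) + values)
    return wide


rows = [
    # First row mirrors the first data line shown above.
    {"modification": "25s_2880", "sample": "data1", "read_index": "39727",
     "reference": "-7", "event_level_mean": "72.25"},
    # Second row is made up purely to illustrate a second offset.
    {"modification": "25s_2880", "sample": "data1", "read_index": "39727",
     "reference": "-6", "event_level_mean": "80.5"},
]
wide = long_to_wide(rows)
```

Note that the long-format file shown above has an unnamed leading index column, so its header has one field fewer than its data rows; the header would need to be shifted accordingly before building the row dictionaries.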
Hello,
In the step "Estimate modification frequency difference between two samples" of "RNA modification stoichiometry estimation using Tombo resquiggling", you note that "KMEANS does not accurately assign directionality of the stoichiometry change, whereas KNN does". However, when I swap the input files (i.e. run "per_read/get_freq.py -f $ref -b $f.bed -o $f.bed.tsv.gz -1 a.bam -2 b.bam" and then "per_read/get_freq.py -f $ref -b $f.bed -o $f.bed.tsv.gz -1 b.bam -2 a.bam"), the mod_freq diff knn values in the output do not change.
Should I use the mismatch error to get the directionality for mod_freq diff knn, as for mod_freq diffkmeans?
Thank you in advance.
I used the command below:
nohup python3 /home/apps/nanoRMS/epinano_RMS/epinano_rms.py -R /Drive7/circ_nanopore_modification_analysis/NanoRMS/control/Homo_cdna.fa -b /Drive7/circ_nanopore_modification_analysis/NanoRMS/control/control_fullrepli.bam -s /home/apps/nanoRMS/epinano_RMS/sam2tsv.jar &
It gives the following errors:
bam file not indexed!
starting indexing it
...Traceback (most recent call last):
File "/home/aclab/apps/nanoRMS/epinano_RMS/epinano_rms.py", line 354, in <module>
main()
File "/home/aclab/apps/nanoRMS/epinano_RMS/epinano_rms.py", line 319, in main
touch (".{}.done_splitting".format(tmp_dir))
File "/home/aclab/apps/nanoRMS/epinano_RMS/epinano_rms.py", line 25, in touch
open(fname, 'a').close()
FileNotFoundError: [Errno 2] No such file or directory: './Drive7/circ_nanopore_modification_analysis/NanoRMS/control/control_fullrepli.tmp
Please help me sort out this issue. Thanks in advance.
When I run the /home/share/yuxin/nanoRMS/predict_rna_mod/Pseudou_prediction_singlecondition.R script, it prints the following. In addition, it produces an empty Sample1_predicted_y_sites.tsv. Can you give me some clues?
Attaching package: ‘dplyr’
The following objects are masked from ‘package:plyr’:
arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: grid
Loading required package: futile.logger
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, first, last
PS. the command to run:
Rscript --vanilla /home/share/yuxin/nanoRMS/predict_rna_mod/Pseudou_prediction_singlecondition.R
-f /home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_WT/bam/HMEC_WT_g.per.site.baseFreq.csv
-p /home/share/yuxin/nanoRMS/predict_rna_mod/positions/RNA_Mod_Positions_rRNAYeast.tsv
Hi,
I have a conda environment set up with the same dependencies/versions as described (and python version as described in original epinano GitHub).
I specify the script to use 10 cpus, and I can see 10 iterations of python3.6 come up in my activity monitor as expected, but the script only seems to use 3 of them at any given time when writing the .freq files, and then once it starts writing the output csv it only uses 1 cpu and takes a long time.
Is this normal behavior?
Noah
Edit: much faster in newer Python. Potentially an issue with the Rosetta emulator slowing things down (arm64 Mac). Running it natively works as long as I force multiprocessing start method as "fork"
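On macOS, Python 3.8 and later default to the "spawn" start method, which is why "fork" has to be requested explicitly. A minimal sketch of one way to do that; map_with_fork is a hypothetical helper, not code from epinano_rms.py, and "fork" is only available on Unix-like systems:

```python
import multiprocessing as mp


def map_with_fork(func, values, processes=2):
    """Run func over values in a worker pool that uses the "fork" start method."""
    # get_context leaves the global start method untouched, so it is safe
    # to call even if another library has already configured one.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        return pool.map(func, values)


if __name__ == "__main__":
    print(map_with_fork(abs, [-3, -2, 1]))
```

An alternative is to call multiprocessing.set_start_method("fork") once at the top of the script's entry point, which changes the default for every pool created afterwards.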
I have a question similar to the one GN Tanuj asked earlier. Why do we need the modpos file (nanoRMS/predict_rna_mod/positions) to run the Pseudou_pairedcondition_transcript.R script for predicting RNA modifications from our two EpiNano output CSV files? We are working with human PUS7 knockdown vs wild-type samples, but I cannot find any publication that specifically reports PUS7 mRNA targets in the human transcriptome. What should I do in that case? How can I find PUS7 target mRNAs in my samples in the absence of previously identified PUS7 targets in the human transcriptome?
The latest version of guppy (6.4.2) does not provide the Move and Trace tables. How can we proceed with analyses that relate signal to sequence in the absence of the Move and Trace tables?
Can we use the raw fast5 files as they are?
Hello,
While the previous issue with the TypeError got sorted out for me, the following has started coming up:
"yeast_ref.dict.dict needs to be created using picard.jar CreateSequenceDictionary"
This comes up even though I have created the sequence dictionary with picard.
Hello,
I ran epinano_rms.py on human sample data (4 GB BAM file, 3 GB reference FASTA file). After running for a day there is no output or warning, although I can see the process on my server; with the test data the program finishes in about a minute.
I noticed that the tmp_splitted_base_freq directory still contains small_.freq and CHUNK_.txt files, and the number of the last CHUNK_.txt file is that of the latest small_.freq file plus 1. Besides, the number of files in the tmp_splitted_base_freq directory changes when I try again.
I guess something breaks in the program, but I cannot find out why because there is no warning. Could you give me some help?
Thanks a lot for your help in advance.
Hello,
I have a question about the input fast5 files used for the per-read modification stoichiometry calculation:
When I used the fast5 files described below to calculate RNA modification stoichiometry with Tombo, I found that Ctrl vs F2 always gave similar KNN values of around 0.22 at every position (even though no modification has been reported at those positions by other methods in the literature), while Ctrl vs F1 showed the expected result.
Ctrl was base-called by guppy4.3.3 during sequencing and then re-called by guppy 4.5.3 later;
F1 was only base-called after sequencing by guppy 4.5.3;
F2 base-calling during sequencing was interrupted and ~85% reads remained raw. Base calling was done later with guppy 4.5.3.
Base calling command was:
~/guppy/bin/guppy_basecaller --input_path ~/Analysis/Nanopore/RawFast5/ --recursive --save_path ~/Analysis/Nanopore/CalledFast5/Test --flowcell FLO-MIN106 --kit SQK-RNA002 --disable_qscore_filtering --fast5_out --device cuda:0
The structure of those fast5 files is:
h5ls -r Ctrl.fast5 | less
/ Group
/read_00011303-a051-4af8-8835-3cf4813adbe6 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {14502}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {14502, 8}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_000 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_000/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_000/Summary/segmentation Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_001 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_001/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_001/Summary/segmentation Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Raw Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Raw/Signal Dataset {157561/Inf}
/read_00011303-a051-4af8-8835-3cf4813adbe6/channel_id Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/context_tags Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/tracking_id Group
h5ls -r F1.fast5 | less
/ Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000 Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template/Move Dataset {9154}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template/Trace Dataset {9154, 8}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/Summary Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Segmentation_000 Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Segmentation_000/Summary Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Segmentation_000/Summary/segmentation Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Raw Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Raw/Signal Dataset {100850/Inf}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/channel_id Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/context_tags Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/tracking_id Group
h5ls -r F2.fast5 | less
/ Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_000 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_000/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {13850}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {13850, 8}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_000 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_000/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_001 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_001/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_001/Summary/segmentation Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Raw Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Raw/Signal Dataset {144915/Inf}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/channel_id Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/context_tags Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/tracking_id Group
The rest of the analysis (get_features and get_freq) followed the README instructions.
Does anyone know how I can solve this problem?
Thanks in advance!
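One difference visible in the listings above is that the Move and Trace datasets sit under Basecall_1D_000 in F1 but under Basecall_1D_001 in Ctrl and F2. Whether this explains the KNN values is not established here, but it is easy to check systematically. The sketch below parses the text output of h5ls -r (so it needs no HDF5 library) and reports which basecall groups contain both datasets; the function name and the regular expression are illustrative assumptions based only on the paths shown above:

```python
import re
from collections import defaultdict

# Matches lines such as
# /read_<id>/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {14502}
PATTERN = re.compile(
    r"^/(read_[^/]+)/Analyses/(Basecall_1D_\d+)/BaseCalled_template/(Move|Trace)\s"
)


def groups_with_move_and_trace(h5ls_lines):
    """Return sorted (read_id, basecall_group) pairs that hold both Move and Trace."""
    found = defaultdict(set)
    for line in h5ls_lines:
        match = PATTERN.match(line.strip())
        if match:
            read_id, group, dataset = match.groups()
            found[(read_id, group)].add(dataset)
    return sorted(key for key, names in found.items() if names == {"Move", "Trace"})


# Lines copied from the Ctrl.fast5 listing above.
ctrl_listing = [
    "/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}",
    "/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {14502}",
    "/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {14502, 8}",
]
ctrl_groups = groups_with_move_and_trace(ctrl_listing)
```

For real files, the input lines can come from saving the h5ls -r output to a text file and iterating over it.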
I tried to read the get_features.py file and found that the trace feature is extracted using the Move dataset. So I ran guppy basecalling as indicated in the paper and can see Trace and Move datasets in the output fast5 files. However, I am not sure how to interpret these two datasets. How can I translate the Move and Trace datasets into the basecall result contained in the fastq file?
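The tracebacks quoted elsewhere on this page show get_features.py building move_pos from np.argwhere(move==1) plus len(trace), and then taking consecutive pairs from it as start and end. Read that way, each 1 in Move marks the Trace row at which a new called base starts. A pure-Python sketch of that bookkeeping follows; the function name base_blocks is hypothetical, and any strand-orientation handling for RNA reads is deliberately ignored:

```python
def base_blocks(move, n_trace_rows):
    """Return (start, end) Trace row ranges, one per called base.

    move: sequence of 0/1 flags, one per Trace row; a 1 starts a new base.
    n_trace_rows: number of rows in the Trace table, used to close the last block.
    """
    starts = [i for i, flag in enumerate(move) if flag == 1]
    bounds = starts + [n_trace_rows]
    # Pair each start with the next start (or the end of the trace).
    return list(zip(bounds[:-1], bounds[1:]))


# Toy Move table: three bases spread over six trace rows.
blocks = base_blocks([1, 0, 0, 1, 1, 0], 6)
```

With an empty or all-zero Move table the function returns no blocks at all, which corresponds to the situation where no (start, end) pair can be unpacked.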
Hi:)
I am interested in trying nanoRMS "RNA modification stoichiometry estimation using Tombo resquiggling" on my data but I am new here and I am having some issues that I would be thankful if you could help me with.
Thank you so much in advance :)
Don't know if this is just my environment, but to run epinano_rms.py, I had to remove the extra initial period from
".{}.done_splitting".format(tmp_dir)
The script ran fine when I did this, so I just wanted to put this out here.
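The reason this matters can be seen from the string formatting alone: when tmp_dir is an absolute path, prefixing it with a period turns it into a relative path that starts with "./", which matches the paths in the FileNotFoundError messages reported on this page. The sketch below demonstrates that, together with one possible alternative that keeps the marker file hidden but places it next to tmp_dir. The helper done_marker_path is hypothetical and not part of epinano_rms.py:

```python
import os


def done_marker_path(tmp_dir, suffix=".done_splitting"):
    """Build a hidden marker file path in the same directory as tmp_dir."""
    parent, name = os.path.split(os.path.normpath(tmp_dir))
    return os.path.join(parent, "." + name + suffix)


# What the original expression produces for an absolute path:
broken = ".{}.done_splitting".format("/data/bam/sample.tmp_splitted_base_freq")
# broken is now a relative path beginning with "./data/...", which does not exist.

# The alternative keeps the directory part intact and hides only the file name.
fixed = done_marker_path("/data/bam/sample.tmp_splitted_base_freq")
```

For a relative tmp_dir with no directory component, done_marker_path gives the same result as the original expression.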
Thanks for writing this tool. However, I am having difficulties using it. I am trying to run get_features.py and I am encountering the following error.
$ per_read/get_features.py --rna --fasta reference/reference.fa --input fast5/*.fast5
[2022-02-23 03:30:10] Processing 60 file(s)... [mem: 135 MB]
Traceback (most recent call last):
File "per_read/get_features.py", line 388, in <module>
main()
File "per_read/get_features.py", line 383, in main
bamfiles = mod_encode(o.input, o.fasta, o.threads, o.rna, o.sensitive)
File "per_read/get_features.py", line 344, in mod_encode
return list(p.starmap(process_fast5, args))
File "per_read/get_features.py", line 307, in process_fast5
tr = get_trace_for_reference_bases(a, res.read, rna) # this takes 189µs (>50%) of time!
File "per_read/get_features.py", line 258, in get_trace_for_reference_bases
s, e = move_pos[qi:qi+2]
ValueError: not enough values to unpack (expected 2, got 0)
When I attempted debugging, I noticed that there was only a single element in the array move_pos whose value was len(trace).
Also, move had a value "none" and np.argwhere(move==1).flatten() returned an empty array.
I appreciate your help in fixing this issue. Happy to provide any further info needed. Thanks for your time.
Hi, I am trying to use Pseudou_prediction_singlecondition.R to predict modified sites, but I am running into the issue of a missing "mod" file that is not described in the README and is not present in the GitHub files. Would you be able to point me to a description of this mod file or an example?
Hi
I have generated EpiNano files for two samples and I want to run Pseudou_pairedcondition_transcript.R to predict RNA modifications from my two EpiNano output CSV files. However, I get an error saying that the modified positions file (modpos) was not found. How can I generate this modpos file?
Best regards
I tested the script Pseudou_prediction_pairedcondition_transcript using your test files WT_ncRNA_Normal_Rep1_Epinano.csv and WT_ncRNA_HeatShock_Rep1_Epinano.csv and found no difference. However, you have shown that some of the locations have pseudouridine modifications.
I used the following commands (attached image).
Please see the comparison of my output (left) and yours (right) from the Pseudou_prediction_pairedcondition_transcript script, run on the same input data, in the screenshot. The numbers in all the rows are similar, but I do not see "YES" in my output.
Thank you,
Mohit
On line 87, it should have been
if o.mincov<5:
Also, the help description for mincov could be corrected too.
Hi,
this is my first time using nanoRMS, and I had a problem running the following command:
python3 epinano_rms.py -R ../RNA_ref.fa -b ../poke.sorted.bam -s ./epinano_RMS/sam2tsv.jar
The result is as follows:
Traceback (most recent call last):
File "epinano_rms.py", line 354, in <module>
main()
File "epinano_rms.py", line 319, in main
touch (".{}.done_splitting".format(tmp_dir))
File "epinano_rms.py", line 25, in touch
open(fname, 'a').close()
FileNotFoundError: [Errno 2] No such file or directory:
'.../poke.sorted.tmp_splitted_base_freq.done_splitting'
How can I solve this problem?
Thank you in advance.
Hi,
I am running this command:
./get_features.py --rna -f ./cc_yeast_rrna.fa -t 6 -i /home/labuser/lib/nanoRMS-master/per_read/test/guppy3.0.3.hac/RNA235629_WT45C/workspace/batch0.fast5
and it works fine, but when I try the same command with another reference and fast5 of my own I get this error:
[2021-07-04 11:35:59] Processing 1 file(s)... [mem: 108 MB]
Traceback (most recent call last):
File "./get_features.py", line 394, in <module>
main()
File "./get_features.py", line 389, in main
bamfiles = mod_encode(o.input, o.fasta, o.threads, o.rna, o.sensitive)
File "./get_features.py", line 350, in mod_encode
return list(p.starmap(process_fast5, args))
File "./get_features.py", line 313, in process_fast5
tr = get_trace_for_reference_bases(a, res.read, rna) # this takes 189µs (>50%) of time!
File "./get_features.py", line 247, in get_trace_for_reference_bases
move_pos = np.append(np.argwhere(move==1).flatten(), len(trace)) # add end of trace
TypeError: object of type 'NoneType' has no len()
When I compare the fast5 files, I see that mine do not have these two datasets:
dataset /read_ef2a501b-4935-4424-a003-0b2fdd1165c4/Analyses/Basecall_1D_000/BaseCalled_template/Move
dataset /read_ef2a501b-4935-4424-a003-0b2fdd1165c4/Analyses/Basecall_1D_000/BaseCalled_template/Trace
I think this is causing the error but not sure how to fix it. Do you have any suggestions?
Thanks.