novoalab / nanorms

Prediction of RNA modifications and their stoichiometry from per-read features: current intensity, dwell time and trace (Begik*, Lucas* et al., Nature Biotech 2021)


predict-rna-modifications rna-modification-stoichiometry tombo-resquiggling

nanorms's People

lpryszcz obegik septav soniacruciani wvandertoorn aminalem nagarkarsanket jessicam1 lkwhite hamzaib2

nanorms's Issues

FileNotFoundError: [Errno 2] No such file or directory

I ran into the following error while running the epinano_rms.py script, but I do not know why the tmp_splitted_base_freq.done_splitting file cannot be found. Can you give me some clues?

Traceback (most recent call last):
  File "/home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py", line 354, in <module>
    main()
  File "/home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py", line 319, in main
    touch (".{}.done_splitting".format(tmp_dir))
  File "/home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py", line 25, in touch
    open(fname, 'a').close()
FileNotFoundError: [Errno 2] No such file or directory: './home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_WT/bam/HMEC_WT_g.tmp_splitted_base_freq.done_splitting'

P.S. Command used:
python3 /home/share/yuxin/nanoRMS/epinano_RMS/epinano_rms.py \
  -R /home/share/yuxin/2021Fall/DATA/hm_tmp/hg38.fa \
  -b /home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_WT/bam/HMEC_WT_g.bam \
  -s /home/share/yuxin/nanoRMS/epinano_RMS/sam2tsv.jar -d -n 10
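The failing line builds the marker name as ".{}.done_splitting".format(tmp_dir). When tmp_dir is an absolute path, the leading dot turns /home/share/... into the relative path ./home/share/..., which only exists if the current directory happens to be /; that is exactly the path in the error above. Other reports on this page say that removing the dot fixes it. The sketch below is a hypothetical alternative, not the project's code: the helper name done_marker is ours, and it keeps the marker hidden by putting the dot on the directory's basename instead of on the whole path.

```python
import os


def done_marker(tmp_dir):
    """Return the path of a hidden '.done_splitting' marker next to tmp_dir.

    Hypothetical helper: the dot is prefixed to the basename only,
    so absolute paths stay absolute.
    """
    parent, name = os.path.split(os.path.normpath(tmp_dir))
    return os.path.join(parent, "." + name + ".done_splitting")


def touch(fname):
    """Create fname if it does not exist, then update its modification time."""
    with open(fname, "a"):
        pass
    os.utime(fname, None)


if __name__ == "__main__":
    tmp_dir = "/data/HMEC_WT_g.tmp_splitted_base_freq"
    # Original construction: yields the relative path './data/...'
    print(".{}.done_splitting".format(tmp_dir))
    # Sketch: '/data/.HMEC_WT_g.tmp_splitted_base_freq.done_splitting'
    print(done_marker(tmp_dir))
```

With this helper, the call on line 319 would become touch(done_marker(tmp_dir)). A relative tmp_dir such as "sample.tmp" still gives ".sample.tmp.done_splitting", the same as the original expression.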

ggplot error

Substitute

geom_text_repel(data=subset(subs, score>threshold), aes(Position, score, label=Position, size=3, color="red"), segment.size = 1, segment.color = "black")

with

geom_text_repel(data=subset(subs, score>threshold), aes(Position, score, label=Position), size=3, color="red", segment.size = 1, segment.color = "black")

(i.e. close aes() right after label=Position, so that size and color are set as fixed parameters instead of aesthetic mappings) to keep the plots (both the scatterplot and the barplot) from failing to print when no position is above the threshold defined in the plot function.

Epinano-RMS

The EpinanoRMS script shows the message "KILLED" after it finishes. I just want to confirm whether this message means the job completed or that an error occurred. Please see the attached image.
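On Linux, a shell prints "Killed" when a process is terminated by SIGKILL, and that signal is often sent by the kernel's out-of-memory (OOM) killer; it does not mean the job completed normally, and the kernel log (e.g. dmesg) usually records OOM kills. One way to tell the cases apart is the exit status. In the minimal sketch below, run_and_report is a hypothetical wrapper, not part of nanoRMS. It relies on Python's subprocess reporting death by a signal as a negative return code (-9 for SIGKILL), whereas a shell would report 137 (128 + 9).

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and describe how it ended (hypothetical helper)."""
    rc = subprocess.run(cmd).returncode
    if rc == 0:
        return "completed successfully"
    if rc < 0:
        # subprocess reports termination by signal N as returncode -N
        sig = signal.Signals(-rc)
        hint = " (often the OOM killer)" if sig == signal.SIGKILL else ""
        return "killed by " + sig.name + hint
    return "failed with exit code " + str(rc)


if __name__ == "__main__":
    # Simulate a job that receives SIGKILL (POSIX only)
    self_kill = [sys.executable, "-c",
                 "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_report(self_kill))  # killed by SIGKILL (often the OOM killer)
```

For a real run, pass the full epinano_rms.py command line as the list argument.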

get_features.py: cannot convert float NAN to integer

I came across a problem with get_features.py:

When I tested get_features.py with the test data (the yeast data described in the README), the console showed:
....
4000 - 3999 read skipped: {'cannot convert float NAN to integer': 3588, 'No alignment': 411}
....
Does anyone have suggestions for solving this problem?

Thanks in advance!
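In Python, int() raises "ValueError: cannot convert float NaN to integer" when it is given a NaN, so these skipped reads most likely hit a missing (NaN) value where a whole number was expected, such as a coordinate or an index; we have not traced which value in get_features.py is NaN. The sketch below shows a defensive conversion that records a skip reason instead of raising. The helper to_int_or_none and the skipped counter are hypothetical and only mimic the shape of the log above.

```python
import math
from collections import Counter


def to_int_or_none(x):
    """Convert x to int, returning None for NaN instead of raising ValueError."""
    if isinstance(x, float) and math.isnan(x):
        return None
    return int(x)


skipped = Counter()
converted = []
for value in [12.0, float("nan"), 7.9, float("nan")]:
    as_int = to_int_or_none(value)
    if as_int is None:
        skipped["NaN value"] += 1  # count the reason, like the log above
        continue
    converted.append(as_int)

print(converted)      # [12, 7]
print(dict(skipped))  # {'NaN value': 2}
```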

code in get_features.py

Hi!
Maybe the order of op_len and op is reversed on line 109?
And shouldn't a.alen on line 107 be the sum of num_match, num_mismatch and num_del, rather than just num_match?
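Whether the op/length order is reversed depends on which library produced the CIGAR: pysam's AlignedSegment.cigartuples yields (operation, length) pairs, while mappy's Alignment.cigar yields [length, operation] pairs, so code that mixes the two conventions reads them backwards. We have not checked which one line 109 uses. On the second point, the number of reference bases an alignment spans is the sum of the M, D, N, = and X operations; note that in SAM, M already covers both matches and mismatches. The stdlib sketch below parses a CIGAR string into (length, op) pairs (mappy's order) and computes both spans. The function names are ours.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read


def parse_cigar(cigar):
    """Parse e.g. '10M2D5M' into [(10, 'M'), (2, 'D'), (5, 'M')] (length first)."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def reference_span(cigar):
    """Reference bases covered: matches, mismatches, deletions and skips."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def query_span(cigar):
    """Read bases consumed, including insertions and soft clips."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)


print(reference_span("10M2D5M1I3X"))  # 20
print(query_span("10M2D5M1I3X"))      # 19
```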

ValueError: Columns must be same length as key

Hi,
thanks for developing this software. I encountered some problems while testing it on the provided test dataset during step 1 (epinano_rms.py).

First I got FileNotFoundError: [Errno 2] No such file or directory, so I deleted the dot in touch (".{}.done_splitting".format(tmp_dir)), changing it to touch ("{}.done_splitting".format(tmp_dir)), and that worked. Then I got the following error:
line 93, in proc_small_freq
    df[['A', 'C', 'G', 'T']] = df['bases'].str.split(pat=':', expand=True)
  File "envs/nanoRMS6/lib/python3.6/site-packages/pandas/core/frame.py", line 3041, in __setitem__
    self._setitem_array(key, value)
  File "/envs/nanoRMS6/lib/python3.6/site-packages/pandas/core/frame.py", line 3067, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

Thanks in advance.

Chujie
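With expand=True, pandas' str.split creates as many columns as the value with the most fields, so assigning the result to the four columns A, C, G, T fails whenever that maximum is not exactly four: for example, if any value has five ':'-separated fields, or if every value in a chunk has fewer than four. A quick way to inspect a suspect chunk is to tally the field counts. The helper below is a hypothetical, stdlib-only diagnostic; reading the bases column out of the .freq files is left to you.

```python
from collections import Counter


def field_count_report(values, sep=":", expected=4):
    """Tally field counts per value and list values that differ from expected.

    pandas' str.split(expand=True) would produce max(counts) columns here.
    """
    counts = Counter(v.count(sep) + 1 for v in values)
    offenders = [v for v in values if v.count(sep) + 1 != expected]
    return counts, offenders


if __name__ == "__main__":
    bases = ["10:0:2:1", "4:1:0:0:7"]  # second value has an extra field
    counts, bad = field_count_report(bases)
    print(max(counts), bad)  # 5 ['4:1:0:0:7']
```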

psU Prediction

Hello,

I want to predict psU modifications in my data using nanoRMS. To do so, I realized that I need a file containing the modified positions. I am a bit confused, since this is exactly what I am looking for: which positions are modified. Is the prediction reliable without this information?

Best regards,

Amina

About making an RNA_Mod_Positions file

Hello,
I tried nanoRMS on a human sample, so I need to make an RNA_Mod_Positions file for human, but I have two questions:

1. Could you provide a list of what the modification abbreviations in the mod column of RNA_Mod_Positions_rRNAYeast.tsv mean?
I am not sure I can write the modifications the same way you do.
2. Is it necessary to provide all types of modification for each site?
For example, if I provide only the pseudouridine sites, or the pseudouridine and m6A sites, will the second give a better result?

Thanks a lot in advance.

Syntax error in epinano_rms.py

Hi,

First of all, thanks for writing this piece of software.
I am getting an error just trying to run epinano_rms.py. Would you be able to help?

cmd1 = (f"samtools view -h -F 3860 {bam_file} | java -jar {sam2tsv} -r {reference_file} "
^
SyntaxError: invalid syntax

I am using Python v3.6.7, as specified in the EpiNano protocol. Thank you!

-Elias
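f-strings were added in Python 3.6 (PEP 498), so a genuine 3.6.7 interpreter should accept that line. A SyntaxError pointing at the end of the f-string usually means the script is actually being run by an older interpreter, for example when python resolves to Python 2. A version check inside epinano_rms.py itself would not help, because the whole file is compiled before any of it runs. The hypothetical check below is written without f-strings so that it also runs on old interpreters; run it with exactly the same command you use for epinano_rms.py.

```python
import sys


def supports_fstrings():
    """Return True if this interpreter can compile an f-string."""
    try:
        compile('f"{1 + 1}"', "<fstring-check>", "eval")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # Deliberately avoids f-strings so it also runs on old interpreters.
    print(sys.executable)
    print(sys.version.split()[0])
    print("f-strings supported: %s" % supports_fstrings())
```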

TypeError: write() takes exactly one argument (2 given)

Hi,

When I run python /my/path/to/epinano_RMS/epinano_rms.py -R /my/path/to/curlcakes_ref.fasta -b /my/path/to/UNM-rep1.sorted.bam -s epinano_RMS/sam2tsv, I get this error:

It seems that write() takes only one string, but two were given. My Python version is 3.8.8.

Thanks for your help in advance!
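A file object's write() accepts exactly one string, so a call such as out.write(a, b) raises this TypeError. We have not located the offending line in epinano_rms.py; the usual fix is to build a single string first, or to use print(..., file=handle), which accepts several arguments. A minimal sketch, with write_row as a hypothetical helper:

```python
import io


def write_row(handle, *fields, sep=","):
    """Write several fields as one line; file.write() accepts a single string."""
    handle.write(sep.join(str(f) for f in fields) + "\n")


buf = io.StringIO()
try:
    buf.write("chr1", "100")  # two arguments -> TypeError, nothing written
except TypeError as err:
    print("TypeError:", err)

write_row(buf, "chr1", 100, 0.25)
print("chr1", 100, 0.25, sep=",", file=buf)  # equivalent alternative
print(buf.getvalue())
```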

No output in BAM file when running step 3. RNA modification stoichiometry estimation using Tombo resquiggling

Hello!

I tried this step (step 3. RNA modification stoichiometry estimation using Tombo resquiggling) using the data (RNA574356_WT45C/workspace/batch0.fast5 and RNA574356_WT30C/workspace/batch0.fast5) with the command (~/software/nanoRMS/per_read/get_features.py --rna -f Saccharomyces_cerevisiae.R64-1-1_firstcolumn.ncrna.fa -t 25 -i batch0.fast5), but the output BAM file contains nothing but the SAM header, and no warning was shown.
Could you tell me what happened?

Thank you in advance.

CSV file for human transcriptome

I am trying to run the epinano_rms.py script to analyze human transcriptome data. I generated the BAM files following the instructions, but every time I run the script I get an out-of-memory error.

Following the latest discussion, I re-ran the script with 1 TB of total memory and it still failed. I am not sure what is going on or how to fix it, but from searching the error online, I believe it has something to do with iteritems in the script. I have attached the error file from that run. Please let me know what I am missing.

Please help me fix this issue.

Do you think I should map fewer FASTQ files at a time, run epinano_rms.py on each, and later merge the CSV files of the individual samples to obtain the final CSV file?

modification stoichiometry estimation using Nanopolish resquiggling

Hi,
I am trying to get RNA modifications and stoichiometry from a set of files preprocessed with Nanopolish.
Everything works fine and I get the window file in this format:

(tf_gpu2) labuser@JCSMR-049555LD:~/lib/nanoRMS-master/Part4/test_data_expected_outputs$ head WT_nanopolish_csv_window_file.ts
contig	position	reference_kmer	read_index	event_level_mean	Pos	sample	modification	reference
239861	25s	2873	GATGT	39727	72.25	25s_2873	data1	25s_2880	-7
239862	25s	2873	GATGT	40300	70.07	25s_2873	data1	25s_2880	-7
239863	25s	2873	GATGT	40441	80.5	25s_2873	data1	25s_2880	-7
239864	25s	2873	GATGT	40760	75.07	25s_2873	data1	25s_2880	-7
239865	25s	2873	GATGT	40837	76.17	25s_2873	data1	25s_2880	-7
239866	25s	2873	GATGT	41151	77.35	25s_2873	data1	25s_2880	-7
239867	25s	2873	GATGT	41452	80.45	25s_2873	data1	25s_2880	-7
239868	25s	2873	GATGT	41678	74.12	25s_2873	data1	25s_2880	-7

Now I want to run read_clustering.R but I think I am missing one step before running it because it gives me an error:

R --vanilla < read_clustering.R --args WT_nanopolish_csv_window_file.tsv KO_nanopolish_csv_window_file.tsv kmeans

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  header and 'col.names' are of different lengths
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  header and 'col.names' are of different lengths

I guess read_clustering.R expects a different format. I have seen some example files in test_data that look like this:


unique  Strain  read_index      -7      -6      -5      -4      -3      -2      -1      -0      1       2       3       4       5       6
295671  25s_2880        sn3     39302   NA      111.06  95.1    77.6    106.735 80.105  75.205  NA      NA      75.23   74.17   72.29   74.66
295672  25s_2880        sn3     39303   NA      112.59  98.09   79.8022222222222        103.88  84.09   77.1575 NA      NA      NA      70.65
295673  25s_2880        sn3     39304   115.37  110.87  90.27   77.51   98.99   79.53   72.465  71.44   74      74.34   72.325  71.74   70.94
295674  25s_2880        sn3     39305   111.7525        105.035 88.47   75.84   104.79  77.87   72.875  NA      NA      NA      71.15   72.866
295675  25s_2880        sn3     39307   115.45  113.75  97.36   79.69   NA      87.09   71.32   68.67   75.6    75.54   73.95   71.95   69.656

How can I convert the first file format into the second? I have not seen that step in: https://github.com/novoalab/nanoRMS/blob/master/README_nanoRMS_nanopolish.md
Thanks a lot for your help in advance.
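The two files appear to hold the same information in different shapes: the first is long format (one row per nanopolish event, with the offset from the modified site in the last column), while the example input has one row per site, sample and read, with one column per offset from -7 to 6. The sketch below does that pivot with the standard library. It is our assumption, not the nanoRMS conversion step: it takes the column names from the header above (modification as the site, sample as the strain, reference as the offset), drops the unnamed leading row-index column, and averages several events at the same offset (fractional values such as 79.8022222222222 in the example suggest averaging, but we have not verified this). Missing offsets are written as NA.

```python
import csv
import io
from collections import defaultdict

OFFSETS = list(range(-7, 7))  # -7 .. 6, matching the example file's columns


def long_to_wide(text):
    """Pivot a nanopolish window TSV to one row per (site, sample, read).

    Assumed columns: 'modification' = site, 'sample' = strain,
    'reference' = offset from the site, 'event_level_mean' = current.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header, data = rows[0], rows[1:]
    events = defaultdict(list)  # (site, sample, read, offset) -> event means
    for fields in data:
        if len(fields) == len(header) + 1:  # drop the unnamed row-index column
            fields = fields[1:]
        rec = dict(zip(header, fields))
        key = (rec["modification"], rec["sample"], rec["read_index"],
               int(rec["reference"]))
        events[key].append(float(rec["event_level_mean"]))

    out = [["unique", "Strain", "read_index"] + [str(o) for o in OFFSETS]]
    for site, sample, read in sorted({k[:3] for k in events}):
        row = [site, sample, read]
        for o in OFFSETS:
            vals = events.get((site, sample, read, o))
            # assumption: several events at one offset are averaged
            row.append(str(sum(vals) / len(vals)) if vals else "NA")
        out.append(row)
    return out


if __name__ == "__main__":
    # First two data rows of the window file shown above
    text = (
        "contig\tposition\treference_kmer\tread_index\tevent_level_mean"
        "\tPos\tsample\tmodification\treference\n"
        "239861\t25s\t2873\tGATGT\t39727\t72.25\t25s_2873\tdata1\t25s_2880\t-7\n"
        "239862\t25s\t2873\tGATGT\t40300\t70.07\t25s_2873\tdata1\t25s_2880\t-7\n"
    )
    for row in long_to_wide(text):
        print("\t".join(row))
```

The offset column is written as 0 rather than -0, so adjust the header if read_clustering.R expects the literal -0 label; read_index values are sorted as strings.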

Nothing output to bam file when running step 3. RNA modification stoichiometry estimation using Tombo resquiggling

Hello!

Thank you in advance.

About the directionality of the stoichiometry change

Hello,
In the step "Estimate modification frequency difference between two samples" of "RNA modification stoichiometry estimation using Tombo resquiggling", you note that "KMEANS does not accurately assign directionality of the stoichiometry change, whereas KNN does". However, when I swap the input files (e.g. running "per_read/get_freq.py -f $ref -b $f.bed -o $f.bed.tsv.gz -1 a.bam -2 b.bam" and "per_read/get_freq.py -f $ref -b $f.bed -o $f.bed.tsv.gz -1 b.bam -2 a.bam"), the mod_freq diff knn output does not change at all.
Should I use the mismatch error to get the directionality of mod_freq diff knn, as for mod_freq diffkmeans?
Thank you in advance.

Error occurred during epinano_rms.py command

I used the command below:
nohup python3 /home/apps/nanoRMS/epinano_RMS/epinano_rms.py -R /Drive7/circ_nanopore_modification_analysis/NanoRMS/control/Homo_cdna.fa -b /Drive7/circ_nanopore_modification_analysis/NanoRMS/control/control_fullrepli.bam -s /home/apps/nanoRMS/epinano_RMS/sam2tsv.jar &

It gives the following errors:

bam file not indexed!
starting indexing it
...Traceback (most recent call last):
  File "/home/aclab/apps/nanoRMS/epinano_RMS/epinano_rms.py", line 354, in <module>
    main()
  File "/home/aclab/apps/nanoRMS/epinano_RMS/epinano_rms.py", line 319, in main
    touch (".{}.done_splitting".format(tmp_dir))
  File "/home/aclab/apps/nanoRMS/epinano_RMS/epinano_rms.py", line 25, in touch
    open(fname, 'a').close()
FileNotFoundError: [Errno 2] No such file or directory: './Drive7/circ_nanopore_modification_analysis/NanoRMS/control/control_fullrepli.tmp

Please help me sort out this issue. Thanks in advance.

Object mask error

When I run the /home/share/yuxin/nanoRMS/predict_rna_mod/Pseudou_prediction_singlecondition.R script, it returns the following, and an empty Sample1_predicted_y_sites.tsv is written. Can you give me some clues?

Attaching package: ‘dplyr’

The following objects are masked from ‘package:plyr’:

arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Loading required package: grid
Loading required package: futile.logger

Attaching package: ‘data.table’

The following objects are masked from ‘package:dplyr’:

between, first, last

PS. The command used:
Rscript --vanilla /home/share/yuxin/nanoRMS/predict_rna_mod/Pseudou_prediction_singlecondition.R \
  -f /home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_WT/bam/HMEC_WT_g.per.site.baseFreq.csv \
  -p /home/share/yuxin/nanoRMS/predict_rna_mod/positions/RNA_Mod_Positions_rRNAYeast.tsv

Writing csv from epinano_rms.py takes a long time

Hi,
I have a conda environment set up with the same dependencies/versions as described (and the Python version described in the original EpiNano GitHub repository).
I set the script to use 10 CPUs, and I can see 10 python3.6 processes in my activity monitor as expected, but the script only seems to use 3 of them at any given time while writing the .freq files, and once it starts writing the output CSV it uses only 1 CPU and takes a long time.
Is this normal behavior?

Noah

Edit: it is much faster with a newer Python. The slowdown was possibly caused by the Rosetta emulator (arm64 Mac). Running it natively works as long as I force the multiprocessing start method to "fork".
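Since Python 3.8, multiprocessing on macOS defaults to the "spawn" start method, which starts a fresh interpreter for each worker and re-imports the script; the Python documentation also notes that "fork" should be considered unsafe on macOS because it can crash the child process, so treat forcing it as a workaround. One way to force "fork" without changing the global default is to request a fork context explicitly. This is a sketch with a placeholder worker function, not a patch to epinano_rms.py; "fork" is not available on Windows.

```python
import multiprocessing as mp


def count_bases(seq):
    """Placeholder worker: length of one sequence."""
    return len(seq)


def run_parallel(items, processes=4):
    # An explicit "fork" context leaves the global default untouched,
    # unlike mp.set_start_method("fork"). POSIX only.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        return pool.map(count_bases, items)


if __name__ == "__main__":
    print(run_parallel(["ACGU", "AC", "GGGUUU"]))  # [4, 2, 6]
```

The alternative the reporter describes, calling mp.set_start_method("fork") once at the start of the script's main block, changes the default for the whole program and must run before any pool is created.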

modpos

I have a similar question to the one GN Tanuj asked earlier. Why do we need the modpos file (nanoRMS/predict_rna_mod/positions) to run the Pseudou_pairedcondition_transcript.R script to predict RNA modifications from our two EpiNano output CSV files? We are working with human PUS7 knockdown vs. wild-type samples, but I cannot find any publication reporting PUS7 mRNA targets specifically in the human transcriptome. What should I do in that case? How can I find PUS7 target mRNAs in my samples without previously identified PUS7 targets in the human transcriptome?

Move and trace files

The latest version of Guppy (6.4.2) does not provide the Move and Trace tables. How can we proceed with analyses that link signal and sequence without them?

Can we use the raw fast5 files as they are?

Create sequence dictionary

Hello,
While the previous TypeError issue has been sorted out for me, the following message started appearing:

"yeast_ref.dict.dict needs to be created using picard.jar CreateSequenceDictionary"

This appears even though I have created the sequence dictionary with Picard.

No output after running for a day

Hello,
I ran epinano_rms.py on human sample data (4 GB BAM file, 3 GB reference FASTA file). After a day there is still no output or warning, although I can see the process running on my server; with the test data, the program finishes in about a minute.
I noticed that the tmp_splitted_base_freq directory contains only small_*.freq files and CHUNK_*.txt files, and the number of the last CHUNK_*.txt is one more than that of the latest small_*.freq. Also, the number of files in the tmp_splitted_base_freq directory changes when I try again.
I guess something breaks in the program, but I can't find out why because there is no warning. Could you give me some help?
Thanks a lot for your help in advance.

per_read/get_freq.py shows constant high KNN values across all positions

Hello,

I have a question about the input fast5 files used for the per-read modification stoichiometry calculation:

When I used the fast5 files described below to calculate RNA modification stoichiometry with Tombo, Ctrl vs F2 always had similar KNN values, around 0.22, at every position (even at positions where other methods in the literature found no modification), while Ctrl vs F1 showed the expected result.

Ctrl was basecalled by Guppy 4.3.3 during sequencing and then re-basecalled by Guppy 4.5.3 later;
F1 was only basecalled after sequencing, by Guppy 4.5.3;
F2 basecalling during sequencing was interrupted and ~85% of reads remained raw. Basecalling was done later with Guppy 4.5.3.

The basecalling command was:
~/guppy/bin/guppy_basecaller --input_path ~/Analysis/Nanopore/RawFast5/ --recursive --save_path ~/Analysis/Nanopore/CalledFast5/Test --flowcell FLO-MIN106 --kit SQK-RNA002 --disable_qscore_filtering --fast5_out --device cuda:0

The structure of those fast5 files is as follows:

h5ls -r Ctrl.fast5 | less
/ Group
/read_00011303-a051-4af8-8835-3cf4813adbe6 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {14502}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {14502, 8}
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_000 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_000/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_000/Summary/segmentation Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_001 Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_001/Summary Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Analyses/Segmentation_001/Summary/segmentation Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Raw Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/Raw/Signal Dataset {157561/Inf}
/read_00011303-a051-4af8-8835-3cf4813adbe6/channel_id Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/context_tags Group
/read_00011303-a051-4af8-8835-3cf4813adbe6/tracking_id Group

h5ls -r F1.fast5 | less
/ Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000 Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template/Move Dataset {9154}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/BaseCalled_template/Trace Dataset {9154, 8}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/Summary Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Segmentation_000 Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Segmentation_000/Summary Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Analyses/Segmentation_000/Summary/segmentation Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Raw Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/Raw/Signal Dataset {100850/Inf}
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/channel_id Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/context_tags Group
/read_00012b00-01f5-49c6-a44e-c6d27ea4623e/tracking_id Group

h5ls -r F2.fast5 | less
/ Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_000 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_000/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {13850}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {13850, 8}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_000 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_000/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_001 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_001/Summary Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Segmentation_001/Summary/segmentation Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Raw Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Raw/Signal Dataset {144915/Inf}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/channel_id Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/context_tags Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/tracking_id Group

All the remaining analyses (get_features and get_freq) followed the README instructions.

Does anyone know how I can solve this problem?

Thanks in advance!
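In the listings above, Move and Trace sit under different basecall groups: in Ctrl and F2 they exist only under Basecall_1D_001 (Basecall_1D_000 has none), whereas F1 has them under Basecall_1D_000. We have not checked which group get_features.py reads, and Ctrl and F2 share the same layout, so this is only a diagnostic, but with ~85% of F2 reads basecalled later it may be worth confirming that every read follows the same layout. The stdlib sketch below parses h5ls -r output in the format shown above and reports, per read and basecall group, which of Move and Trace are present; basecall_tables is a hypothetical helper.

```python
import re

GROUP_RE = re.compile(r"^/(read_[^/]+)/Analyses/(Basecall_1D_\d+)\s+Group")
TABLE_RE = re.compile(
    r"^/(read_[^/]+)/Analyses/(Basecall_1D_\d+)/BaseCalled_template/(Move|Trace)\s+Dataset"
)


def basecall_tables(h5ls_lines):
    """Map (read, basecall group) -> set of tables present ('Move', 'Trace')."""
    found = {}
    for line in h5ls_lines:
        line = line.strip()
        m = GROUP_RE.match(line)
        if m:
            found.setdefault(m.groups(), set())  # register group, even if empty
            continue
        m = TABLE_RE.match(line)
        if m:
            read, group, table = m.groups()
            found.setdefault((read, group), set()).add(table)
    return found


if __name__ == "__main__":
    # A few lines copied from the F2 listing above
    listing = """/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_000 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001 Group
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {13850}
/read_0026764a-0da0-463d-822c-bfc44adfb79c/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {13850, 8}"""
    for (read, group), tables in sorted(basecall_tables(listing.splitlines()).items()):
        print(read, group, sorted(tables) or "no Move/Trace")
```

To run it on a whole file, save the listing first (e.g. h5ls -r F2.fast5 > F2.txt) and pass the lines of that file; counting the groups without Move/Trace across all reads shows whether the layout is consistent.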

What do the Trace and Move datasets in Guppy's output fast5 files mean?

I tried to read get_features.py and found that the trace feature is extracted using the move dataset. So I ran Guppy basecalling as described in the paper and saw Trace and Move datasets in the output fast5 files. However, I am not sure how to interpret these two datasets. How can I relate the Move and Trace datasets to the basecalled sequence in the FASTQ file?
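Our reading of the get_features.py lines quoted in other issues on this page (move_pos = np.append(np.argwhere(move==1).flatten(), len(trace)) and s, e = move_pos[qi:qi+2]) is as follows. Move has one entry per signal block, and a 1 marks a block where the basecaller emitted a new base. Trace has one row per block with 8 columns (the {N, 8} shape in the listings above); we believe these are per-block base probabilities from Guppy's flip-flop model, but treat that as an assumption. Base i of the called sequence therefore covers Trace rows move_pos[i] up to move_pos[i+1]. For direct RNA the signal is read 3' to 5', so the block order may be reversed relative to the FASTQ sequence; we assume the rna argument of get_trace_for_reference_bases handles this, but have not verified it. A pure-Python restatement:

```python
def base_blocks(move, n_trace_rows):
    """Split trace rows into one (start, end) block per called base.

    Mirrors the numpy lines quoted from get_features.py: positions where
    move == 1 start a new base, and len(trace) is appended as the final end.
    """
    move_pos = [i for i, m in enumerate(move) if m == 1]
    move_pos.append(n_trace_rows)  # add end of trace
    return list(zip(move_pos[:-1], move_pos[1:]))


if __name__ == "__main__":
    move = [1, 0, 0, 1, 1, 0, 1, 0]
    print(base_blocks(move, len(move)))  # [(0, 3), (3, 4), (4, 6), (6, 8)]
    # With no 1s in move, move_pos == [len(trace)] and there are no blocks;
    # slicing such an array is what produces
    # "not enough values to unpack (expected 2, got 0)".
    print(base_blocks([0, 0, 0], 3))  # []
```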

Getting an error at "RNA modification stoichiometry estimation using Tombo resquiggling"

Hi :)

I am interested in trying nanoRMS "RNA modification stoichiometry estimation using Tombo resquiggling" on my data, but I am new to it and have run into some issues; I would be thankful for your help.

1. I started by downloading the test data.
Is this exactly the command I should use?
(cd per_read && wget https://public-docs.crg.es/enovoa/public/lpryszcz/src/nanoRMS/per_read/guppy3.0.3.hac -q --show-progress -r -c -nc -np -nH --cut-dirs=6 --reject="index.html*")
Here is what I get when I run it from the nanoRMS folder:

2. When that did not work, I ran only this part of the command: wget https://public-docs.crg.es/enovoa/public/lpryszcz/src/nanoRMS/per_read/guppy3.0.3.hac -q, and I did not get any error. But I could not find any fast5 files to test in guppy3.0.3.hac. Is this because the data were not transferred properly?

Since I could not find any .fast5 files to test, I used one of my own .fast5 files and reference. Here are the commands I used:
per_read/get_features.py --rna -f GRCh38.p10.genome.fa -t 6 -i FAL60812_pass_b3a7a5d6_9.fast5
But I keep getting this error:

How can I solve this problem?

Thank you so much in advance :)

Extra period in .{}.done_splitting?

I don't know if this is just my environment, but to run epinano_rms.py I had to remove the extra leading period from

".{}.done_splitting".format(tmp_dir)

The script ran fine after that change; I just wanted to put this out here.

Error in get_trace_for_reference_bases while running get_features.py

Thanks for writing this tool. However, I am having difficulties using it: when I run get_features.py, I get the following error.

$ per_read/get_features.py --rna --fasta reference/reference.fa --input fast5/*.fast5
[2022-02-23 03:30:10] Processing 60 file(s)... [mem: 135 MB]
Traceback (most recent call last):
  File "per_read/get_features.py", line 388, in <module>
    main()
  File "per_read/get_features.py", line 383, in main
    bamfiles = mod_encode(o.input, o.fasta, o.threads, o.rna, o.sensitive)
  File "per_read/get_features.py", line 344, in mod_encode
    return list(p.starmap(process_fast5, args))
  File "per_read/get_features.py", line 307, in process_fast5
    tr = get_trace_for_reference_bases(a, res.read, rna) # this takes 189µs (>50%) of time!
  File "per_read/get_features.py", line 258, in get_trace_for_reference_bases
    s, e = move_pos[qi:qi+2]
ValueError: not enough values to unpack (expected 2, got 0)

While debugging, I noticed that the array move_pos had only a single element, whose value was len(trace).
Also, move had the value None, and np.argwhere(move==1).flatten() returned an empty array.

I would appreciate your help in fixing this issue and am happy to provide any further information needed. Thanks for your time.

mod file missing from Pseudou_prediction_singlecondition.R

Hi, I am trying to use Pseudou_prediction_singlecondition.R to predict modified sites, but I am running into an issue with a missing "mod" file, which is not described in the README and is not present in the GitHub repository. Could you point me to a description or an example of this mod file?

modpos file missing

Hi
I have generated EpiNano files for 2 samples and I want to run Pseudou_pairedcondition_transcript.R to predict RNA modifications from my two EpiNano output CSV files, but I get an error that the modified positions file (modpos) is not found. How can I generate this modpos file?

Best regards

Error in Running Epinano-RMS

Hi.
Thanks for developing this software.
I was able to run epinano_rms.py on one of my BAM files without any error. However, when I ran the same steps on another BAM file, I got these errors, which I am not sure how to solve.

Thanks for your help in advance.

Output from Pseudou_prediction_pairedcondition_transcript.R

I tested the Pseudou_prediction_pairedcondition_transcript script with your test files WT_ncRNA_Normal_Rep1_Epinano.csv and WT_ncRNA_HeatShock_Rep1_Epinano.csv and found no differences. However, you have shown that some of these positions carry pseudouridine modifications.

I used the following commands (attached image).

Please see the comparison of my output (left) and yours (right) from the Pseudou_prediction_pairedcondition_transcript script, run on the same input data, in the screenshot. The numbers in all rows are similar, but I do not see "YES" in my output.

Thank you,
Mohit

per_read/get_freq.py: The error message says "[ERROR] -m/--mincov must be at least 5!", but the check is for at least 10.

On line 87, it should be

if o.mincov<5:

The help description for mincov could also be corrected.
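One way to keep the check, the error message and the help text from drifting apart is to derive all three from a single constant. A sketch with argparse follows; only the -m/--mincov option name comes from get_freq.py, while MIN_COV = 5 follows this issue's suggestion and the default of 25 is a hypothetical placeholder, not the script's actual default.

```python
import argparse

MIN_COV = 5  # single source of truth for the lower bound (value per this issue)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="mincov check sketch")
    parser.add_argument(
        "-m", "--mincov", type=int, default=25,  # placeholder default
        help="minimum coverage per position, at least %d [%%(default)s]" % MIN_COV,
    )
    o = parser.parse_args(argv)
    if o.mincov < MIN_COV:
        # parser.error prints the message and exits with status 2
        parser.error("-m/--mincov must be at least %d!" % MIN_COV)
    return o


if __name__ == "__main__":
    print(parse_args(["-m", "5"]).mincov)  # 5
```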

Problem with epinano_rms.py

Hi,
this is my first time using nanoRMS, and I ran into a problem running the following command:
python3 epinano_rms.py -R ../RNA_ref.fa -b ../poke.sorted.bam -s ./epinano_RMS/sam2tsv.jar
The output is as follows:
Traceback (most recent call last):
  File "epinano_rms.py", line 354, in <module>
    main()
  File "epinano_rms.py", line 319, in main
    touch (".{}.done_splitting".format(tmp_dir))
  File "epinano_rms.py", line 25, in touch
    open(fname, 'a').close()
FileNotFoundError: [Errno 2] No such file or directory:
'.../poke.sorted.tmp_splitted_base_freq.done_splitting'

How can I solve this problem?
Thank you in advance.

RNA modification stoichiometry estimation using Tombo resquiggling

Hi,

I am running this command:
./get_features.py --rna -f ./cc_yeast_rrna.fa -t 6 -i /home/labuser/lib/nanoRMS-master/per_read/test/guppy3.0.3.hac/RNA235629_WT45C/workspace/batch0.fast5
and it works fine, but when I run the same command with my own reference and fast5 file, I get this error:

[2021-07-04 11:35:59] Processing 1 file(s)... [mem:   108 MB]
Traceback (most recent call last):
  File "./get_features.py", line 394, in <module>
    main()
  File "./get_features.py", line 389, in main
    bamfiles = mod_encode(o.input, o.fasta, o.threads, o.rna, o.sensitive)
  File "./get_features.py", line 350, in mod_encode
    return list(p.starmap(process_fast5, args))    
  File "./get_features.py", line 313, in process_fast5
    tr = get_trace_for_reference_bases(a, res.read, rna) # this takes 189µs (>50%) of time!
  File "./get_features.py", line 247, in get_trace_for_reference_bases
    move_pos = np.append(np.argwhere(move==1).flatten(), len(trace)) # add end of trace
TypeError: object of type 'NoneType' has no len()

When I compare the fast5 files, I see that mine do not have these two datasets:
dataset /read_ef2a501b-4935-4424-a003-0b2fdd1165c4/Analyses/Basecall_1D_000/BaseCalled_template/Move
dataset /read_ef2a501b-4935-4424-a003-0b2fdd1165c4/Analyses/Basecall_1D_000/BaseCalled_template/Trace
I think this is causing the error, but I am not sure how to fix it. Do you have any suggestions?
Thanks.
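Both this TypeError (move is None, so np.argwhere(move==1) cannot be computed on it) and the ValueError in the earlier get_trace_for_reference_bases issue (move contains no 1s) come from reads without usable Move/Trace data. A guard could skip such reads with a clear reason instead of crashing the whole run. The sketch below is hypothetical and not part of get_features.py; it only changes how the failure is reported, and the missing datasets would still have to come from a basecalling run that writes them (see the Guppy 6.4.2 issue above).

```python
from collections import Counter


class MissingMoveTable(ValueError):
    """Raised when a read lacks usable Move/Trace data (hypothetical)."""


def checked_move_positions(move, trace):
    """Return base start positions plus the trace end, or raise MissingMoveTable."""
    if move is None or trace is None:
        raise MissingMoveTable("no Move/Trace datasets in fast5")
    starts = [i for i, m in enumerate(move) if m == 1]
    if not starts:
        raise MissingMoveTable("Move table has no base transitions")
    return starts + [len(trace)]


skipped = Counter()
reads = {
    "r1": ([1, 0, 1], [[0] * 8] * 3),  # usable
    "r2": (None, None),                # datasets missing, as in this issue
    "r3": ([0, 0], [[0] * 8] * 2),     # no 1s, as in the earlier ValueError
}
for name, (move, trace) in reads.items():
    try:
        checked_move_positions(move, trace)
    except MissingMoveTable as err:
        skipped[str(err)] += 1
print(dict(skipped))
```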