
m6anet's Issues

Pre-trained model provided by m6anet for predicting m6A sites

Hi,
Thanks for releasing the model for m6anet. The pre-trained model provided by m6anet for predicting m6A sites was trained on a single replicate (replicate 2, run 1) of the HCT116 cell line. Is that correct? Looking forward to your reply. Thank you very much.

xu

m6anet-train on demo data: index out of bounds error

Hi developers,
After following the installation, dataprep, and run_inference steps based on the documentation and your demo data, there is an error in the training step which indicates a common index out of bounds issue. I have checked the relevant scripts and run the procedure several times, but cannot figure out what is wrong. Could you please help to solve this problem?
Many thanks
[Screenshot 2022-11-19 20:20:55]

cannot convert float NaN to integer

Dear developer:
First, I ran the command:
nanopolish eventalign --reads *.fastq --bam *.sorted.bam --genome *.transcript.fa --scale-events --signal-index --summary sequencing_summary.txt --threads 20 > TAIR.eventalign.txt

And I got results like this:
AT5G67640.1 730 CAAAT 1075535 t 755 73.10 1.352 0.00266 NNNNN 0.00 0.00 inf 53612 53620
AT5G67640.1 730 CAAAT 1075535 t 756 72.66 1.133 0.00465 NNNNN 0.00 0.00 inf 53598 53612
AT5G67640.1 733 ATTAT 1075535 t 757 84.22 1.281 0.00631 ATTAT 85.44 2.46 -0.45 53579 53598
AT5G67640.1 733 ATTAT 1075535 t 758 86.30 1.625 0.00398 ATTAT 85.44 2.46 0.32 53567 53579
AT5G67640.1 733 ATTAT 1075535 t 759 83.94 2.413 0.00764 ATTAT 85.44 2.46 -0.55 53544 53567
AT5G67640.1 733 ATTAT 1075535 t 760 85.55 2.191 0.00199 ATTAT 85.44 2.46 0.04 53538 53544
AT5G67640.1 733 ATTAT 1075535 t 761 88.30 1.888 0.00332 ATTAT 85.44 2.46 1.05 53528 53538
AT5G67640.1 733 ATTAT 1075535 t 762 104.53 3.602 0.00896 NNNNN 0.00 0.00 inf 53501 53528
AT5G67640.1 733 ATTAT 1075535 t 763 111.75 3.878 0.01129 NNNNN 0.00 0.00 inf 53467 53501
AT5G67640.1 733 ATTAT 1075535 t 764 120.04 2.636 0.00299 ATTAT 85.44 2.46 12.66 53458 53467
Then, when I run the command "m6anet-dataprep --eventalign TAIR.eventalign.txt --out_dir ./output --n_processes 20", I get the following error:
Process Consumer-9:
Traceback (most recent call last):
  File "/home/wwkong/miniconda3/envs/m6anet/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/wwkong/miniconda3/envs/m6anet/lib/python3.8/site-packages/m6anet-1.1.1-py3.8.egg/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/wwkong/miniconda3/envs/m6anet/lib/python3.8/site-packages/m6anet-1.1.1-py3.8.egg/m6anet/scripts/dataprep.py", line 103, in index
    f_index.write('%s,%d,%d,%d\n' %(transcript_id,read_index,pos_start,pos_end))
ValueError: cannot convert float NaN to integer

Can you help me? Best wishes.
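A hedged workaround sketch, assuming the NaN is triggered by the unaligned events visible in the excerpt above (model kmer NNNNN, standardized level inf): filter those rows out of eventalign.txt before running dataprep. This is not a confirmed fix; column 10 of nanopolish eventalign output is the model kmer.

# Sketch: drop eventalign rows whose model kmer is "NNNNN" (assumption: these
# rows, which carry "inf" standardized levels, cause the NaN in dataprep).
with open("TAIR.eventalign.txt") as src, open("TAIR.eventalign.filtered.txt", "w") as dst:
    dst.write(src.readline())  # keep the header line
    for line in src:
        if line.split("\t")[9] != "NNNNN":  # column 10 = model_kmer
            dst.write(line)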

NUM_NEIGHBORING_FEATURES

Hey Christoph,
Thanks a lot for the tool.
Have you experimented with different NUM_NEIGHBORING_FEATURES values during dataprep?
Did they yield interesting results?

Best

-Alex

m6anet-run_inference error

Dear m6Anet
Thank you, and congratulations on getting m6Anet published in Nat. Methods.
I tried m6anet-run_inference on a single replicate and it worked well. But when I gave it three replicates of a sample (wildtype), an error occurred, similar to #38.
Below is my script:

module load m6anet

m6anet-run_inference --input_dir /scratch/project_mnt/S0077/xPore/rawdata/xpore_data/m6anet/dataprep/wt-rep1 \
                     /scratch/project_mnt/S0077/xPore/rawdata/xpore_data/m6anet/dataprep/wt-rep2 \
                     /scratch/project_mnt/S0077/xPore/rawdata/xpore_data/m6anet/dataprep/wt-rep3 \
                     --out_dir /scratch/project_mnt/S0077/xPore/rawdata/xpore_data/m6anet/m6a_infer \
                     --infer_mod_rate \
                     --n_processes 1

A screenshot of the error is given below:
[screenshot of the error]

I ran the job on an HPC cluster using a PBS script, and m6anet was loaded through module load m6anet. I do not know how to apply your suggestion from issue #38: "Inserting the torch.multiprocessing.set_sharing_strategy('file_system') line in the /home/stakatis/miniconda3/envs/m6anet/lib/python3.9/site-packages/m6anet/scripts/run_inference.py".
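For reference, the suggested change from #38 would look roughly like this at the top of run_inference.py (a sketch of the workaround, not the shipped code):

# First lines of m6anet/scripts/run_inference.py, with the workaround from
# issue #38 added immediately after the torch import (sketch only).
import torch
import torch.multiprocessing

# Share tensors through the file system instead of file descriptors, which
# can avoid worker-communication failures on clusters with low fd limits.
torch.multiprocessing.set_sharing_strategy('file_system')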

Could you please advise any solution to this problem?
Kind regards
Reza

Previously generated xpore-dataprep files suitable for m6anet inference

This seems like a great tool, and I could work through the demo data with no issues. I had actually generated some data using the xpore repository on my own data. Am I right in saying that the data-prep there is essentially the same as in the m6anet tool set? If so, can I use the files I've already generated with m6anet-inference? Or is the m6anet dataprep installation different from the xpore version?

Is it possible to perform site-specific differential analysis after running m6anet-inference?

May I ask which guppy version you used for training? I currently have lots of DRS data basecalled with v3.4.5, but I think the accuracy was updated slightly around v3.6. Would you expect similar accuracy for m6A calls using different guppy versions?

Dataprep. PerformanceWarning: indexing past lexsort depth may impact performance

Question: How to avoid or silence this warning?
Issue: Slow m6anet-dataprep process
Error 1, which occurred 6181 times:

/home/stakatis/miniconda3/envs/m6anet/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:101: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()

Error 2, which occurred 1 time:

/home/stakatis/miniconda3/envs/m6anet/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Command: m6anet-dataprep --eventalign ${EVENTALIGN} --out_dir ${DATAPREP_DIR} --n_processes ${CPUS} --readcount_min 2 --min_segment_count 2
File size in ${EVENTALIGN}:

  1. 565G; run6_U87.eventalign.txt

Additional info:
m6anet version: 1.1.0 from pypi
nanopolish_version: 0.13.2 from bioconda
OS: CentOS Linux 7 (Core)
Experiment: the whole flowcell was used for 1 sample (dRNA-seq run), which generated 216 .fast5 files
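On the first question, one hedged option for silencing the warning is a standard warnings filter at the top of dataprep.py (or of a wrapper script). Note this only hides the message; it does not address the underlying lexsort-depth slowdown:

# Suppress the pandas PerformanceWarning (cosmetic only; the indexing stays
# slow until the DataFrame index is lexsorted, e.g. via sort_index()).
import warnings
import pandas as pd

warnings.simplefilter("ignore", pd.errors.PerformanceWarning)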

data.result & biological replicates

Hi @chrishendra93

I performed the latest m6anet on all my samples, and the results are quite useful for me to design further experiments and compare the results to other software.

However, I had some questions:

  1. minimum read count threshold (in m6anet-run_inference):

We sequenced 6 Nanopore DRS libraries and obtained 2~2.5 million aligned reads for each library. The median number of aligned reads per gene is about 25 in our samples. So, in our case, about half of the expressed genes would be directly excluded from the final results under the default criteria, just because these genes have fewer than 20 aligned reads. This would bias the interpretation of transcriptome-wide m6A sites, since only the sites in "abundant genes" could pass the threshold. (The problem might be easily solved by improved throughput in the future.)

Further, the "aligned reads" is largely affected by the throughput of libraries. If gene_A1 has 21 reads in Replicate.1; 18 reads in Replicate.2; 19 reads in Replicate.3. It's obvious that only the sites in Replicate.1 would pass the threshold, while, all the other m6A sites in Replicate.2 & Replicate.3 would be lost in the final results. (We had encounter such an issue for some critical genes)

I have read issue #13 and know it is hard to implement such a setting, because the model was trained with a minimum read count threshold of 20. So, is it possible (and is it proper?) to take all the biological replicates into account at the same time? (e.g. all reads in gene_A = 21+18+19, then using these 58 reads for analysis)

  2. DRACH motif

In mammals, the DRACH motif is the most conserved consensus sequence for m6A sites; however, the RRACH motif is reported to be the most common in plants.

So, it would be great for plant biologists (like me) if there were a column recording the type of motif (GGACA, AAACT, etc.) in data.result.csv.
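In the meantime, a rough sketch for deriving such a column from the existing output (assuming data.result.csv has a kmer column of 5-mers, and using the standard IUPAC codes D = A/G/T, R = A/G, H = A/C/T):

# Annotate each reported 5-mer with its motif class (sketch; str.fullmatch
# requires pandas >= 1.1).
import pandas as pd

df = pd.read_csv("data.result.csv")
df["is_DRACH"] = df["kmer"].str.fullmatch(r"[AGT][AG]AC[ACT]")
df["is_RRACH"] = df["kmer"].str.fullmatch(r"[AG][AG]AC[ACT]")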

Feel free to let me know if the questions above are not reasonable.

Many thanks

YCCHEN

PerformanceWarning & AssertionError in m6anet-dataprep

Hi,

I performed m6anet-dataprep a few days ago with the following command:

(base) ycc@ycc-UBUNTU:~$ m6anet-dataprep --eventalign ~/Desktop/WT_R2/WT_Dark_R2_eventalign.txt \
>                 --out_dir ~/Desktop/WT_R2/m6anet/ \
>                 --n_processes 8

Then I got the following error messages; the script paused with an AssertionError:

/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:142: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chunk_split['line_length'] = np.array(lines)
Process Consumer-10:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError
Process Consumer-16:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError
Process Consumer-12:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError
Process Consumer-9:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError
Process Consumer-13:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError
Process Consumer-15:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError
Process Consumer-14:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError
Process Consumer-11:
Traceback (most recent call last):
  File "/home/ycc/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/helper.py", line 113, in run
    result = self.task_function(*next_task_args,self.locks)
  File "/home/ycc/anaconda3/lib/python3.8/site-packages/m6anet/scripts/dataprep.py", line 304, in preprocess_tx
    assert(len(kmer) == 1)
AssertionError

The script was stuck there for ~24 hours, so I interrupted the program. And I only got 25 genes in my data.log.

[Screenshot from 2021-10-12 10:42:26]

Willing to provide more details if needed, thanks!

Best regards
YCCHEN

What are these kmers?

I have been using m6Anet a lot, and I get the output table with position, probability, kmer, etc. I recently pulled out the information for just one transcript (randomly chosen), and when I look at the positions in that transcript that are methylated, they do not correspond to the kmer output in the table. I have done this for multiple transcripts. Additionally, if I sum up all the kmers over one transcript and count how often each one appears, this does not match the number of times the kmer actually appears in the gene/transcript. Am I missing something?
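One common culprit worth ruling out (a diagnostic sketch under the assumption that a coordinate-convention mismatch explains the discrepancy, since transcript positions may be 0- or 1-based and the 5-mer may be anchored differently relative to the reported position):

# Compare the reported kmer against a few plausible coordinate conventions;
# `seq` is the transcript sequence, `pos` the reported transcript_position.
def candidate_kmers(seq: str, pos: int) -> dict:
    return {
        "0-based, kmer centred on pos": seq[pos - 2 : pos + 3],
        "0-based, kmer starting at pos": seq[pos : pos + 5],
        "1-based, kmer centred on pos": seq[pos - 3 : pos + 2],
    }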

How can I get the m6a sites data please?

Dear developers, sorry to bother you. I have some questions about m6anet.
I was recently doing an analysis of m6A and want to get m6A site modification data for different cell lines (K562, MCF7, HepG2, HCT116, A549), which you mentioned in "Detection of m6A from direct RNA sequencing using a multiple instance learning framework". I want to know how I can get the "data.readcount.labelled" data as provided in the demo folder of your GitHub project?

Looking forward to your reply.
Best Regards!

genomic position

Hello,

I have used m6anet to identify m6A modifications in direct RNA reads of honey bees, and I got a table like this:

[screenshot of the output table]

I added the column id_pos.

I want to see the distribution of my m6A sites across the 5' UTR, CDS, and 3' UTR. I found the code used in your article, which is in the CodeOcean capsule. The table used to create the figure needs a transcript annotation file and a .csv file with genomic positions. However, when running m6anet, the .csv file only has transcript positions.

Is there any way to obtain the genomic positions as well? And how can I create the transcript annotation file that contains the end of the 5' UTR, the end of the CDS, and the end of the transcript?

Thank you and I'm looking forward to hearing from you.

Best,
Camila

m6anet-dataprep doesn't generate data.readcount.labelled file

Dear developers,

when running m6anet-train I get the following error:

m6anet-train --model_config /home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/model/configs/model_configs/prod_pooling.toml --train_config /home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/model/configs/training_configs/oversampled.toml --save_dir ./m6anet_train_sham --device cpu --lr 0.0001 --seed 25 --epochs 30 --num_workers 32 --save_per_epoch 1 --num_iterations 5
Saving training information to ./m6anet_train_sham
Traceback (most recent call last):
  File "/home/diaz/anaconda3/envs/xpore2.1/bin/m6anet-train", line 8, in <module>
    sys.exit(main())
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/scripts/train.py", line 111, in main
    train_and_save(args)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/scripts/train.py", line 69, in train_and_save
    train_dl, val_dl, test_dl = build_dataloader(train_config, num_workers)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/utils/builder.py", line 43, in build_dataloader
    train_ds = build_dataset(train_config["dataset"], mode='Train')
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/utils/builder.py", line 24, in build_dataset
    return NanopolishDS(**config, mode=mode)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/utils/data_utils.py", line 37, in __init__
    self.initialize_data_info(root_dir, min_reads)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/m6anet/utils/data_utils.py", line 92, in initialize_data_info
    read_count = pd.read_csv(os.path.join(fpath, "data.readcount.labelled"))
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/home/diaz/anaconda3/envs/xpore2.1/lib/python3.10/site-packages/pandas/io/common.py", line 789, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/nanopore/xpore_analysis/m6anet_dataprep_sham/data.readcount.labelled'

The content of the m6anet-dataprep output folder is:
data.index data.json data.log data.readcount eventalign.index

So there isn't any file called data.readcount.labelled.

Could you be so kind as to help me understand why I am getting this issue?

Thank you so much in advance,
Núria

Error occurred during installation with pip and setup.py

Hi developer!

Thank you for developing a useful tool.
I have a question about the installation of m6anet.

Based on the documentation (https://m6anet.readthedocs.io/en/latest/installation.html), as m6Anet requires Python 3.8 or higher to run, I loaded the python module (python3/3.9.6) and attempted to install it with pip.
However, I got the error message below.
From the error message, it appears that the program has numerous dependencies on software that is not mentioned in the installation instructions or documentation that I can find. These appear to include lapack, openblas, flame, atlas, and accelerate. It's not clear to me whether it just needs the libraries themselves, or if it also needs additional python packages to interact with those libraries.
Could you please give me an idea of how to fix this problem and install the tool?

[ek81w@c40b06 m6Anet]$ module load python3/3.9.6
python3 3.9.6 is located under /share/pkg/python3/3.9.6
[ek81w@c40b06 m6Anet]$ pip install m6anet
Defaulting to user installation because normal site-packages is not writeable
Collecting m6anet
Downloading m6anet-1.1.0-py3-none-any.whl (119 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 119.7/119.7 kB 3.1 MB/s eta 0:00:00
Collecting scipy>=1.4.1
Using cached scipy-1.8.1.tar.gz (38.2 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [220 lines of output]
setup.py:486: UserWarning: Unrecognized setuptools command ('dist_info --egg-base /tmp/pip-modern-metadata-iyxsjlrm'), proceeding with generating Cython sources and expanding templates
warnings.warn("Unrecognized setuptools command ('{}'), proceeding with "
Running from SciPy source directory.
Running scipy/stats/_generate_pyx.py
Running scipy/special/_generate_pyx.py
Running scipy/linalg/_generate_pyx.py
Processing scipy/sparse/_csparsetools.pyx.in
Processing scipy/sparse/csgraph/_matching.pyx
Processing scipy/sparse/csgraph/_flow.pyx
Processing scipy/sparse/csgraph/_tools.pyx
Processing scipy/sparse/csgraph/_shortest_path.pyx
Processing scipy/sparse/csgraph/_traversal.pyx
Processing scipy/sparse/csgraph/_reordering.pyx
Processing scipy/sparse/csgraph/_min_spanning_tree.pyx
Processing scipy/spatial/_hausdorff.pyx
Processing scipy/spatial/_voronoi.pyx
Processing scipy/spatial/_qhull.pyx
Processing scipy/spatial/_ckdtree.pyx
Processing scipy/spatial/transform/_rotation.pyx
Processing scipy/_lib/_test_deprecation_def.pyx
Processing scipy/_lib/_ccallback_c.pyx
Processing scipy/_lib/_test_deprecation_call.pyx
Processing scipy/_lib/messagestream.pyx
Processing scipy/cluster/_hierarchy.pyx
Processing scipy/cluster/_vq.pyx
Processing scipy/cluster/_optimal_leaf_ordering.pyx
Processing scipy/stats/_sobol.pyx
Processing scipy/stats/_qmc_cy.pyx
Processing scipy/stats/_stats.pyx
Processing scipy/stats/_biasedurn.pyx
Processing scipy/stats/_unuran/unuran_wrapper.pyx
Processing scipy/stats/_boost/src/binom_ufunc.pyx
Processing scipy/stats/_boost/src/nbinom_ufunc.pyx
Processing scipy/stats/_boost/src/beta_ufunc.pyx
Processing scipy/stats/_boost/src/hypergeom_ufunc.pyx
Processing scipy/fftpack/convolve.pyx
Processing scipy/optimize/_bglu_dense.pyx
Processing scipy/optimize/_group_columns.pyx
Processing scipy/optimize/_highs/cython/src/_highs_wrapper.pyx
Processing scipy/optimize/_highs/cython/src/_highs_constants.pyx
Processing scipy/optimize/_trlib/_trlib.pyx
Processing scipy/optimize/cython_optimize/_zeros.pyx.in
Processing scipy/optimize/tnc/_moduleTNC.pyx
Processing scipy/optimize/_lsq/givens_elimination.pyx
Processing scipy/special/_test_round.pyx
Processing scipy/special/_ufuncs.pyx
Processing scipy/special/_comb.pyx
Processing scipy/special/_ellip_harm_2.pyx
Processing scipy/special/_ufuncs_cxx.pyx
Processing scipy/special/cython_special.pyx
Processing scipy/signal/_peak_finding_utils.pyx
Processing scipy/signal/_sosfilt.pyx
Processing scipy/signal/_spectral.pyx
Processing scipy/signal/_upfirdn_apply.pyx
Processing scipy/signal/_max_len_seq_inner.pyx
Processing scipy/linalg/cython_lapack.pyx
Processing scipy/linalg/cython_blas.pyx
Processing scipy/linalg/_solve_toeplitz.pyx
Processing scipy/linalg/_cythonized_array_utils.pyx
Processing scipy/linalg/_matfuncs_sqrtm_triu.pyx
Processing scipy/linalg/_decomp_update.pyx.in
warning: unuran_wrapper.pyx:470:21: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:470:28: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:470:36: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:515:21: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:515:28: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:515:36: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:1469:21: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:1469:28: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: unuran_wrapper.pyx:1469:36: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
Processing scipy/io/matlab/_streams.pyx
Processing scipy/io/matlab/_mio_utils.pyx
Processing scipy/io/matlab/_mio5_utils.pyx
Processing scipy/ndimage/src/_ni_label.pyx
Processing scipy/ndimage/src/_cytest.pyx
Processing scipy/interpolate/interpnd.pyx
Processing scipy/interpolate/_bspl.pyx
Processing scipy/interpolate/_ppoly.pyx
Cythonizing sources
lapack_opt_info:
lapack_mkl_info:
customize UnixCCompiler
libraries mkl_rt not found in ['/share/pkg/python3/3.9.6/lib', '/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib']
NOT AVAILABLE

openblas_lapack_info:
libraries openblas not found in ['/share/pkg/python3/3.9.6/lib', '/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib']
NOT AVAILABLE

openblas_clapack_info:
libraries openblas,lapack not found in ['/share/pkg/python3/3.9.6/lib', '/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib']
NOT AVAILABLE

flame_info:
libraries flame not found in ['/share/pkg/python3/3.9.6/lib', '/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib']
NOT AVAILABLE

atlas_3_10_threads_info:
Setting PTATLAS=ATLAS
libraries lapack_atlas not found in /share/pkg/python3/3.9.6/lib
libraries tatlas,tatlas not found in /share/pkg/python3/3.9.6/lib
libraries lapack_atlas not found in /usr/local/lib64
libraries tatlas,tatlas not found in /usr/local/lib64
libraries lapack_atlas not found in /usr/local/lib
libraries tatlas,tatlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib64/atlas
libraries tatlas,tatlas not found in /usr/lib64/atlas
libraries lapack_atlas not found in /usr/lib64/sse2
libraries tatlas,tatlas not found in /usr/lib64/sse2
libraries lapack_atlas not found in /usr/lib64
libraries tatlas,tatlas not found in /usr/lib64
libraries lapack_atlas not found in /usr/lib
libraries tatlas,tatlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
NOT AVAILABLE

atlas_3_10_info:
libraries lapack_atlas not found in /share/pkg/python3/3.9.6/lib
libraries satlas,satlas not found in /share/pkg/python3/3.9.6/lib
libraries lapack_atlas not found in /usr/local/lib64
libraries satlas,satlas not found in /usr/local/lib64
libraries lapack_atlas not found in /usr/local/lib
libraries satlas,satlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib64/atlas
libraries satlas,satlas not found in /usr/lib64/atlas
libraries lapack_atlas not found in /usr/lib64/sse2
libraries satlas,satlas not found in /usr/lib64/sse2
libraries lapack_atlas not found in /usr/lib64
libraries satlas,satlas not found in /usr/lib64
libraries lapack_atlas not found in /usr/lib
libraries satlas,satlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_info'>
NOT AVAILABLE

atlas_threads_info:
Setting PTATLAS=ATLAS
libraries lapack_atlas not found in /share/pkg/python3/3.9.6/lib
libraries ptf77blas,ptcblas,atlas not found in /share/pkg/python3/3.9.6/lib
libraries lapack_atlas not found in /usr/local/lib64
libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64
libraries lapack_atlas not found in /usr/local/lib
libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib64/atlas
libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/atlas
libraries lapack_atlas not found in /usr/lib64/sse2
libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/sse2
libraries lapack_atlas not found in /usr/lib64
libraries ptf77blas,ptcblas,atlas not found in /usr/lib64
libraries lapack_atlas not found in /usr/lib
libraries ptf77blas,ptcblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_threads_info'>
NOT AVAILABLE

atlas_info:
libraries lapack_atlas not found in /share/pkg/python3/3.9.6/lib
libraries f77blas,cblas,atlas not found in /share/pkg/python3/3.9.6/lib
libraries lapack_atlas not found in /usr/local/lib64
libraries f77blas,cblas,atlas not found in /usr/local/lib64
libraries lapack_atlas not found in /usr/local/lib
libraries f77blas,cblas,atlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib64/atlas
libraries f77blas,cblas,atlas not found in /usr/lib64/atlas
libraries lapack_atlas not found in /usr/lib64/sse2
libraries f77blas,cblas,atlas not found in /usr/lib64/sse2
libraries lapack_atlas not found in /usr/lib64
libraries f77blas,cblas,atlas not found in /usr/lib64
libraries lapack_atlas not found in /usr/lib
libraries f77blas,cblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_info'>
NOT AVAILABLE

accelerate_info:
NOT AVAILABLE

lapack_info:
libraries lapack not found in ['/share/pkg/python3/3.9.6/lib', '/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib']
NOT AVAILABLE

/tmp/pip-build-env-_eel5sfk/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1748: UserWarning:
Lapack (http://www.netlib.org/lapack/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [lapack]) or by setting
the LAPACK environment variable.
return getattr(self, 'calc_info{}'.format(name))()
lapack_src_info:
NOT AVAILABLE

/tmp/pip-build-env-_eel5sfk/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1748: UserWarning:
Lapack (http://www.netlib.org/lapack/) sources not found.
Directories to search for the sources can be specified in the
numpy/distutils/site.cfg file (section [lapack_src]) or by setting
the LAPACK_SRC environment variable.
return getattr(self, 'calc_info{}'.format(name))()
NOT AVAILABLE

Traceback (most recent call last):
File "/share/pkg/python3/3.9.6/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in
main()
File "/share/pkg/python3/3.9.6/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/share/pkg/python3/3.9.6/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-_eel5sfk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 174, in prepare_metadata_for_build_wheel
self.run_setup()
File "/tmp/pip-build-env-_eel5sfk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 267, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-_eel5sfk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 158, in run_setup
exec(compile(code, file, 'exec'), locals())
File "setup.py", line 628, in
setup_package()
File "setup.py", line 624, in setup_package
setup(**metadata)
File "/tmp/pip-build-env-_eel5sfk/overlay/lib/python3.9/site-packages/numpy/distutils/core.py", line 135, in setup
config = configuration()
File "setup.py", line 526, in configuration
raise NotFoundError(msg)
numpy.distutils.system_info.NotFoundError: No BLAS/LAPACK libraries found.
To build Scipy from sources, BLAS & LAPACK libraries need to be installed.
See site.cfg.example in the Scipy source directory and
https://docs.scipy.org/doc/scipy/reference/building/index.html for details.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.


Also, I tried the 'Installation from our GitHub repository' instructions, but that also produced an error message.

[ek81w@ghpcc06 Euijin]$ git clone https://github.com/GoekeLab/m6anet.git
[ek81w@ghpcc06 Euijin]$ cd m6anet/
[ek81w@ghpcc06 m6anet]$ python setup.py install
Traceback (most recent call last):
File "setup.py", line 3, in
from setuptools import setup,find_packages
File "/share/pkg/python3/3.9.6/lib/python3.9/site-packages/setuptools/init.py", line 132
for k in set(_incl) & set(attrs)
^
SyntaxError: invalid syntax

If you could provide a conda environment, it would likely be significantly easier and less trouble than trying to figure out all the required dependencies through trial and error.

I really appreciate your help in advance!

Can m6anet pick up m1A signal?

Is the tool able to tell apart m1A from m6A? In our organism, we are seeing some interesting signal from m6Anet inference that is enriched at the 5' end of transcripts. Similarly, our rRNA transcripts also showed a strong inferred signal at the 5' end. We greatly appreciate your help!

How do I get the files in the 'm6Anet requires eventalign.txt from nanopolish' step?

Hello,
I want to detect m6A modifications in my direct RNA sequencing sample. Looking at the quickstart page (https://m6anet.readthedocs.io/en/latest/quickstart.html#quickstart), m6Anet requires eventalign.txt from nanopolish (nanopolish eventalign --reads reads.fastq --bam reads.sorted.bam --genome transcript.fa --scale-events --signal-index --summary /path/to/summary.txt --threads 50 > /path/to/eventalign.txt), but I do not fully understand the files required for this step. How do I obtain the 'reads.sorted.bam', 'transcript.fa', and '/path/to/summary.txt' inputs in this line of code? Can you provide sample file information?

Looking forward to receiving your reply.

Thanks.

Xiao tong

m6anet-run_inference, IndexError: single positional indexer is out-of-bounds

Dear developers,

Thanks for the nice tool! I tried to run m6anet-run_inference, but there is an error saying "IndexError: single positional indexer is out-of-bounds". Could you please help me solve this problem?

I used below code to run:

m6anet-run_inference --input_dir ./apo/dataprep --out_dir ./apo/inference --infer_mod_rate --n_processes 4

Please let me know if you need anything from my side.
Thank you in advance for your help!

Best,
Huawen

nf-m6anet NextFlow pipeline

Dear all,
I just created a NextFlow pipeline called nf-m6anet, which performs alignment of sequencing reads in fastq format to the transcriptome with minimap2, resquiggling of raw fast5 files with the f5c re-implementation of Nanopolish, and m6A detection with m6anet. Compared to nanoseq, it also filters high-quality m6A+ sites based on modification probability and performs lift-over of transcriptome-based coordinates to genome-based coordinates. Feel free to use it and provide me with any feedback or suggestions you may have!
Best

m6anet-dataprep error

[screenshot of the error]

m6anet-dataprep --eventalign /binf-isilon/PBgrp/jfb841/nanopore/data_extract/20180227_1832_20180227_FAH59351_vir1_2922_DRS/basecalled_data/reads-ref.eventalign.vir1_1.txt --out_dir /binf-isilon/PBgrp/jfb841/nanopore/data_extract/m6anet_output.vir1_1 --n_processes 20

Hi,
I performed m6anet-dataprep with the script above, and the run gets stuck there. I am wondering what could be causing this, and what does the "AssertionError" mean? Thanks.

Bing

output the read level modification probability

Hi,
thanks a lot for developing m6anet!
I have successfully run my data with m6anet. But I have a question: can I output the read-level modification probability for specific sites?

Best wishes.

Few m6a sites detected

Hi,
I downloaded the HEK293T-WT data, which includes an eventalign result from xPore, and its preparation is the same as that in your article.
I performed m6anet-dataprep and got this warning:

/home/yuan/.local/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:142: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
/home/yuan/.local/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()

Then I tried m6anet-run_inference and only found 33 m6A sites.
Here is my commands:

m6anet-dataprep --eventalign eventalign.txt --out_dir m6a --n_processes 20
m6anet-run_inference --input_dir m6a --out_dir m6a --n_processes 10

Could you give me some suggestions about this issue? Thank you very much!

Error while trying to run the m6anet-dataprep

Hello,

I ran the nanopolish to get the files necessary to run m6anet on my data. I ran the following script.

m6anet-dataprep --eventalign sample1.eventalign.txt \
    --summary sample1_summary.txt \
    --out_dir /scratch/user/data/m6anet/dataprep \
    --n_processes 4

I get the following error.

Traceback (most recent call last):
  File "/home/bhargavam/anaconda3/bin/m6anet-dataprep", line 11, in <module>
    load_entry_point('m6anet==0.0.1', 'console_scripts', 'm6anet-dataprep')()
  File "/home/bhargavam/anaconda3/lib/python3.7/site-packages/m6anet-0.0.1-py3.7.egg/m6anet/scripts/dataprep.py", line 245, in main
  File "/home/bhargavam/anaconda3/lib/python3.7/site-packages/m6anet-0.0.1-py3.7.egg/m6anet/scripts/helper.py", line 67, in is_successful
  File "/home/bhargavam/anaconda3/lib/python3.7/site-packages/m6anet-0.0.1-py3.7.egg/m6anet/scripts/helper.py", line 60, in read_last_line
OSError: [Errno 22] Invalid argument

Any insights on what the problem might be and how to resolve this are much appreciated. Thank you very much.

Setting minimum read count

Dear m6anet Team,

Would you recommend setting the "minimum read counts per gene" argument to something like 10 or 100 for a first try, in order to see really confident positions? My worry here is that I might lose a lot of sites, since with direct RNA sequencing I have very low read numbers for most of the genes.

Many thanks for your thoughts.
Best regards, Sophia

probability_modified column is empty

Hi there,

I have tried running m6anet run-inference to predict m6A modifications on my nanopore data.

This was done based on the quick start tutorial https://m6anet.readthedocs.io/en/latest/quickstart.html

However, when I look at the output file, instead of indicating a 0 or a 1 as shown in the examples, mine turns out to have a blank column.

For your reference I have attached a snippet of my output file:

transcript_id, transcript_position, n_reads, probability_modified, kmer
2L, 74989, 31, ,TGACT
2L, 75008, 32, ,AAACT
2L, 75100, 33, ,TAACA
2L, 75106, 31, ,TGACT
2L, 75151, 35, ,AAACT

If you can provide me some guidance with this that will be great. Thank you!

m6anet queries

Hi Chris,

Thanks for releasing the new model for m6anet.

So far we have managed to install and run this tool (m6anet-dataprep & m6anet-run_inference) on 4 Nanopore-sequenced datasets obtained from 2 WT and 2 ime4-knockout yeast strains (the latter of which should be devoid of any m6As). The data we are employing is the data derived in this paper by the Maria-Novoa lab. I have 3 queries:

  1. Does this tool allow directly comparing a WT sample to a KO counterpart?
  2. For some reason we do not get a quantification of the prediction quality: the “probability_modified” column in the final output file contains only NAs for all predicted m6A locations. How can this be resolved?

3. From our preliminary results it seems, surprisingly, that the predicted m6A locations are very similar between the WT and the KO. We obtain a similar number of predicted sites for both groups, with few m6A sites unique to the WT relative to the KO.

We wondered if changing some of the default arguments might help us perform a more refined prediction. Some arguments that might be relevant (but which we haven't yet played around with, given that we could not find documentation on how best to modulate them) include the --n_neighbors argument (in m6anet-dataprep), and potentially also --model_config and --model_state_dict in m6anet-run_inference. If you could provide a short explanation of these arguments, that would be great.

Thanks,

Vadim

received 0 items of ancdata

I have some direct RNA data, when I run the following command,
m6anet-run_inference --input_dir /ourdisk/hpc/rnafold/dywang/dont_archive/On_process/mocka59/m6A_v27 --out_dir /ourdisk/hpc/rnafold/dywang/dont_archive/On_process/mocka59/m6A_v27 --infer_mod_rate --n_processes 4

I got the error report
Traceback (most recent call last):
File "/home/dywang/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/opt/oscer/software/Python/3.8.2-GCCcore-9.3.0/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/dywang/.local/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/opt/oscer/software/Python/3.8.2-GCCcore-9.3.0/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/opt/oscer/software/Python/3.8.2-GCCcore-9.3.0/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
return recvfds(s, 1)[0]
File "/opt/oscer/software/Python/3.8.2-GCCcore-9.3.0/lib/python3.8/multiprocessing/reduction.py", line 164, in recvfds
raise RuntimeError('received %d items of ancdata' %
RuntimeError: received 0 items of ancdata

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/dywang/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 803, in _try_get_data
fs = [tempfile.NamedTemporaryFile() for i in range(fds_limit_margin)]
File "/home/dywang/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 803, in
fs = [tempfile.NamedTemporaryFile() for i in range(fds_limit_margin)]
File "/opt/oscer/software/Python/3.8.2-GCCcore-9.3.0/lib/python3.8/tempfile.py", line 541, in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/opt/oscer/software/Python/3.8.2-GCCcore-9.3.0/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpkjiqmh_s'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/dywang/.local/bin/m6anet-run_inference", line 11, in
load_entry_point('m6anet==1.1.1', 'console_scripts', 'm6anet-run_inference')()
File "/home/dywang/.local/lib/python3.8/site-packages/m6anet-1.1.1-py3.8.egg/m6anet/scripts/run_inference.py", line 72, in main
File "/home/dywang/.local/lib/python3.8/site-packages/m6anet-1.1.1-py3.8.egg/m6anet/scripts/run_inference.py", line 67, in run_inference
File "/home/dywang/.local/lib/python3.8/site-packages/m6anet-1.1.1-py3.8.egg/m6anet/utils/training_utils.py", line 225, in infer_mod_ratio
File "/home/dywang/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 363, in next
data = self._next_data()
File "/home/dywang/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
idx, data = self._get_data()
File "/home/dywang/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
success, data = self._try_get_data()
File "/home/dywang/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 806, in _try_get_data
raise RuntimeError(
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using ulimit -n in the shell or change the sharing strategy by calling torch.multiprocessing.set_sharing_strategy('file_system') at the beginning of your code
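Besides the two remedies named in the error message (ulimit -n, or the file_system sharing strategy), a hedged in-process alternative is to raise the soft open-file limit before inference starts (Unix only):

# Raise this process's soft open-file limit toward the hard limit
# (the in-Python equivalent of `ulimit -n` in the shell).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))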

Segmentation fault in running m6anet-run_inference

Hi Developer!

I was running m6anet with demo data.

m6anet-dataprep appeared to run successfully though it had a warning message.
m6anet-dataprep --eventalign demo/eventalign.txt --out_dir dataprep_out --n_processes 4

/home/ek81w/.conda/envs/m6anet/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
/home/ek81w/.conda/envs/m6anet/lib/python3.8/site-packages/m6anet/scripts/dataprep.py:101: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()

However, when I ran m6anet-run_inference, it showed a Segmentation fault message.
m6anet-run_inference --input_dir dataprep_out --out_dir inference_out --infer_mod-rate --n_processes 4
I tried increasing the memory request, but it didn't work.
If you have an idea of how to fix this problem, please let me know.

Thank you.

EuiJin

ImportError: cannot import name 'TypeAlias' from 'typing_extensions' (/usr/lib/python3/dist-packages/typing_extensions.py)

Hi, I was running m6anet and got to the inference step when I encountered this error. Here's the output.

  Traceback (most recent call last):
    File "/home/matthew/.local/bin/m6anet-run_inference", line 5, in <module>
      from m6anet.scripts.run_inference import main
    File "/home/matthew/.local/lib/python3.8/site-packages/m6anet/scripts/run_inference.py", line 2, in <module>
      import torch
    File "/home/matthew/.local/lib/python3.8/site-packages/torch/__init__.py", line 753, in <module>
      from .serialization import save, load
    File "/home/matthew/.local/lib/python3.8/site-packages/torch/serialization.py", line 18, in <module>
      from typing_extensions import TypeAlias
  ImportError: cannot import name 'TypeAlias' from 'typing_extensions' (/usr/lib/python3/dist-packages/typing_extensions.py)

It looked like a simple import error, and sure enough, when I installed cast_control, the error disappeared.

sudo python3 -m pip install cast_control

Just letting you know so you can update your pip build configuration with this library.

Supplementary Table 5

Dear developer,

Hi, I am running m6anet tools to detect m6a modification sites. After generating the output data, I want to compare my result with your data to make sure I run correctly.
In the ENA database, I saw there are three replicates of the HEK293T WT and KO samples. However, Supplementary Table 5 only has probability_modified_WT and probability_modified_KO.
I was wondering, in order to get Supplementary Table 5, whether you used m6anet-run_inference --input_dir demo_data_1 demo_data_2 ... --out_dir demo_data --infer_mod-rate --n_processes 4 to combine all three replicates from each condition (WT and KO), after running m6anet-dataprep for the 6 samples (3 replicates per condition)?

Also, when I ran m6Anet with the demo data, I got a different mod_ratio in each run, even though I did not change the command line between runs. Is it possible to get a different mod_ratio value?

I am looking forward to hearing from you.
Thank you for your help!

Inference identifying Inosine as m6A?

Hey dev team,

I am wondering whether the built-in inference model will identify inosine signals as m6A signals? In your professional opinion, could m6Anet be trained to pick up inosine signals? I don't think inosine would pass through the pores and generate exactly guanine-like signals, since 5mC or 6mA can already alter the signals relative to unmodified C and A.

PAR_Y or not

Hi Developer!

In your latest version, some transcript IDs include both a PAR_Y and a normal version. But in the old version, those suffixes were cut off. So what does the old ID mean?

Old version:
ENST00000381192 166 29 0.08692147
Latest version:
ENST00000381192.10|ENSG00000002586.20|OTTHUMG00000021073.12|OTTHUMT00000055624.3|CD99-205|CD99|1129|protein_coding| 220 20 0.103975390 TGACT 0.05000000
ENST00000381192.10_PAR_Y|ENSG00000002586.20_PAR_Y|OTTHUMG00000021073.12|OTTHUMT00000055624.3|CD99-205|CD99|1129|protein_coding| 226 20 0.279130700 TGACT 0.10000000
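The old IDs appear to be the new ones truncated at the first "." or "|", which collapses the chrX and PAR_Y copies into a single ID. A sketch for shortening the new headers while keeping the _PAR_Y tag (assuming the long form is a pipe-delimited GENCODE FASTA header):

# Strip a GENCODE-style FASTA header down to the accession, preserving _PAR_Y.
import re

def short_id(header: str) -> str:
    tx = header.split("|")[0]        # e.g. ENST00000381192.10_PAR_Y
    return re.sub(r"\.\d+", "", tx)  # -> ENST00000381192_PAR_Y

print(short_id("ENST00000381192.10_PAR_Y|ENSG00000002586.20_PAR_Y|..."))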

Kmer

What are possible reasons why a DRACH motif is not processed?

Kind regards

Alex

MIN_READS setting

Hi developer! If I want to include more sites from lowly expressed genes, can I just modify the MIN_READS setting (from 20 to 5) in run_inference.py?

Error in m6anet-inference step

Hi Chris,

Thanks for releasing the new model for m6anet. I have run the nanopolish and data-prep steps successfully, but I am encountering an error in the inference step. Here's the command I used and the error that I got.

[Screenshot 2021-05-13 at 12:50:52 PM]

Here are the contents of the m6anet data-prep output directory.

data.index  data.json  data.log  data.readcount  eventalign.index 

and a few lines from the beginning of each file.
data.index

transcript_id,transcript_position,start,end
gnl|X|GEFLOABG_1,495,0,241
gnl|X|GEFLOABG_1,554,241,805
gnl|X|GEFLOABG_1,566,805,1074
gnl|X|GEFLOABG_1,609,1074,1460
gnl|X|GEFLOABG_1,641,1460,1702
gnl|X|GEFLOABG_1,794,1702,1935
gnl|X|GEFLOABG_1,929,1935,2312
gnl|X|GEFLOABG_1,1276,2312,2415
gnl|X|GEFLOABG_1,2798,2415,2768

data.json

{"gnl|X|GEFLOABG_1":{"495":{"AAAACCA":[[0.01184074074074074,2.4917777777777776,110.4,0.00564,3.395,103.8,0.00752,2.2332285714285716,83.8],[0.01029,3.492,106.7,0.0156,4.1739999999999995,106.2,0.008974782608695654,1.8518405797101452,85.8]]}}}
{"gnl|X|GEFLOABG_1":{"554":{"CGAACTT":[[0.003320000000000001,6.228,113.0,0.003519512195121951,2.2558536585365854,98.7,0.00548,1.915878787878788,94.2],[0.0046864,5.01176,111.9,0.005838703703703704,2.561796296296296,98.3,0.004930512820512821,1.81474358974359,93.4],[0.010001636363636365,4.834199999999999,116.8,0.00299,2.9989999999999997,96.2,0.010620000000000001,2.162,91.2],[0.008946176470588235,5.474882352941177,116.5,0.018920000000000003,3.4560000000000004,102.8,0.00465,1.288,92.7],[0.00498,3.87,112.3,0.00598,2.3705,100.1,0.00299,1.9419999999999997,93.7]]}}}
{"gnl|X|GEFLOABG_1":{"566":{"GGGACTC":[[0.00465,2.62,125.0,0.011222000000000001,2.3234,126.6,0.0070901960784313725,4.266843137254902,91.4],[0.00232,3.6639999999999993,118.4,0.023049523809523808,8.358790476190476,126.5,0.0033529999999999996,3.2972499999999996,91.1]]}}}
{"gnl|X|GEFLOABG_1":{"609":{"AAAACTT":[[0.0069680930232558155,2.397232558139535,110.0,0.006640000000000002,3.582,112.1,0.00465,3.0060000000000002,97.6],[0.01727975,3.17975,112.6,0.004538723404255319,2.678808510638298,111.9,0.005993611111111112,1.6218055555555557,90.4],[0.010791319148936171,2.3994510638297872,111.8,0.00465,2.253,108.8,0.0073202272727272725,2.005272727272727,95.2]]}}}
{"gnl|X|GEFLOABG_1":{"641":{"AAAACAT":[[0.01029,3.805,104.3,0.0166,4.933,96.9,0.004095833333333333,2.7229166666666664,86.7],[0.00797,1.8795,111.6,0.009260961538461539,2.988711538461539,101.2,0.0063100000000000005,3.5839999999999996,88.1]]}}}
{"gnl|X|GEFLOABG_1":{"794":{"AAAACTG":[[0.00996,5.541,111.1,0.007640000000000001,6.316,107.9,0.005605757575757576,2.3050909090909095,91.1],[0.00365,3.0589999999999993,109.3,0.00266,2.226,100.3,0.0037576000000000003,2.78356,93.4]]}}}
{"gnl|X|GEFLOABG_1":{"929":{"CGAACTG":[[0.005724387755102042,4.217897959183674,103.9,0.006640000000000002,4.3839999999999995,100.0,0.003927636363636363,2.333509090909091,94.9],[0.010001636363636365,5.474981818181819,116.8,0.00232,1.39,102.4,0.00797,1.676,97.8],[0.011672112676056338,7.767183098591549,118.6,0.00299,2.261,104.7,0.008669074074074076,2.5933148148148146,94.3]]}}}
{"gnl|X|GEFLOABG_1":{"1276":{"ATAACAT":[[0.00365,1.327,87.2,0.00299,1.148,91.1,0.00365,2.305,96.9]]}}}
{"gnl|X|GEFLOABG_1":{"2798":{"CTGACAT":[[0.016830176991150442,3.0475840707964603,107.3,0.00498,13.562000000000001,115.8,0.0234909649122807,2.8029298245614034,81.8],[0.00365,2.01,105.9,0.0093,8.011000000000001,113.2,0.0083,2.9219999999999997,78.3],[0.010674254545454544,3.1419200000000003,106.3,0.0049552,9.74708,109.6,0.007640000000000001,3.4,83.2]]}}}
{"gnl|X|GEFLOABG_1":{"2872":{"GTGACAC":[[0.006475897435897436,3.2191025641025637,98.5,0.006076666666666667,9.85124,115.7,0.0052285185185185195,2.837111111111111,81.5],[0.008934782608695653,4.1631847826086945,103.2,0.011549135514018692,5.914228971962616,110.3,0.005172222222222222,4.274511111111111,84.8]]}}}

data.log

gnl|X|GEFLOABG_1: Data preparation ... Done.

data.readcount

transcript_id,transcript_position,n_reads
gnl|X|GEFLOABG_1,495,2
gnl|X|GEFLOABG_1,554,5
gnl|X|GEFLOABG_1,566,2
gnl|X|GEFLOABG_1,609,3
gnl|X|GEFLOABG_1,641,2
gnl|X|GEFLOABG_1,794,2
gnl|X|GEFLOABG_1,929,3
gnl|X|GEFLOABG_1,1276,1
gnl|X|GEFLOABG_1,2798,3

eventalign.index

transcript_id,read_index,pos_start,pos_end
gnl|X|GEFLOABG_1,9,172,40517
gnl|X|GEFLOABG_1,17,40517,124637
gnl|X|GEFLOABG_1,4,124637,184631
gnl|X|GEFLOABG_1,15,184631,306199
gnl|X|GEFLOABG_1,12,306199,361953
gnl|X|GEFLOABG_1,27,361953,440434
gnl|X|GEFLOABG_1,5,440434,546348
gnl|X|GEFLOABG_1,16,546348,605586
gnl|X|GEFLOABG_1,21,605586,686048

Happy to provide more details if needed. Please let me know how to resolve this error.

reference genome or transcriptome

Dear m6anet Team,

I am using m6anet for the first time with mouse data from ONT directRNA sequencing.
I preprocessed the reads with guppy (basecalling), minimap2 (alignment to the reference genome and to the reference transcriptome, from Gencode), and nanopolish, as described in #9.
However, I would like to make sure about the reference:
Do you usually use the reference genome, or a reference transcriptome (one sequence = one transcript)?
Many thanks for your help. Best regards, Sophia

Error in m6anet-run_inference step

Hi ,
Thanks for releasing the new model for m6anet. I have run the nanopolish and data-prep steps successfully, but I am encountering an error in the m6anet-inference step. Here's the command I used and the error that I got. Could you give me some suggestions about this issue? Thank you very much!

[screenshot of the error]

There are the output files of the m6anet data-prep output directory.
data.index data.json data.log data.readcount eventalign.index
A few lines for each file as follows:
data.index
transcript_id,transcript_position,start,end
cc6m_2244_T7_ecorv,1965,0,130
cc6m_2244_T7_ecorv,1983,130,236
cc6m_2244_T7_ecorv,2030,236,352
cc6m_2459_T7_ecorv,333,352,470
cc6m_2459_T7_ecorv,338,470,578
cc6m_2459_T7_ecorv,419,578,695
cc6m_2459_T7_ecorv,431,695,824
cc6m_2459_T7_ecorv,496,824,942
cc6m_2459_T7_ecorv,565,942,1048
cc6m_2459_T7_ecorv,589,1048,1168
cc6m_2459_T7_ecorv,641,1168,1284
cc6m_2459_T7_ecorv,647,1284,1406
data.json
{"cc6m_2244_T7_ecorv":{"1965":{"AGGACTT":[[0.0059211111,3.8680915033,119.6,0.00266,3.98,121.4,0.0066841026,1.6050512821,85.8]]}}} {"cc6m_2244_T7_ecorv":{"1983":{"TGAACCG":[[0.00299,3.341,117.7,0.00465,7.216,90.4,0.00498,4.315,82.8]]}}} {"cc6m_2244_T7_ecorv":{"2030":{"ATAACCA":[[0.003588125,1.84678125,84.1,0.00299,2.173,90.9,0.003386,2.1706,83.0]]}}} {"cc6m_2459_T7_ecorv":{"333":{"GGGACTT":[[0.0078971429,1.9767142857,118.4,0.00531,5.515,119.4,0.00498,2.502,82.8]]}}} {"cc6m_2459_T7_ecorv":{"338":{"TTAACAA":[[0.00664,3.418,89.2,0.00299,1.14,85.6,0.0056425,1.290125,83.1]]}}} {"cc6m_2459_T7_ecorv":{"419":{"AAAACAT":[[0.01494,5.865,102.8,0.01627,4.07,101.1,0.0044143478,1.0262173913,87.8]]}}} {"cc6m_2459_T7_ecorv":{"431":{"CTAACTT":[[0.0080698246,1.8048596491,91.4,0.0136877778,3.0567777778,102.1,0.00232,1.771,91.2]]}}} {"cc6m_2459_T7_ecorv":{"496":{"CGGACCC":[[0.00232,1.516,117.8,0.0059936111,6.8690555556,110.8,0.00332,1.729,71.5]]}}} {"cc6m_2459_T7_ecorv":{"565":{"TTGACAT":[[0.00664,1.587,103.3,0.01461,4.278,108.9,0.00266,2.246,81.8]]}}} {"cc6m_2459_T7_ecorv":{"589":{"ATAACAA":[[0.0029082353,1.64,83.4,0.00232,1.157,93.0,0.0087265347,2.4270792079,86.7]]}}} {"cc6m_2459_T7_ecorv":{"641":{"ATAACTC":[[0.0064280769,1.6935096154,85.6,0.00232,1.491,88.8,0.00299,1.384,85.5]]}}} {"cc6m_2459_T7_ecorv":{"647":{"CAAACTT":[[0.0093738182,2.9810545455,104.1,0.00365,3.917,102.8,0.0049552,0.87816,91.3]]}}}

data.log
cc6m_2244_T7_ecorv: Data preparation ... Done.
cc6m_2459_T7_ecorv: Data preparation ... Done.
data.readcount
transcript_id,transcript_position,n_reads
cc6m_2244_T7_ecorv,1965,1
cc6m_2244_T7_ecorv,1983,1
cc6m_2244_T7_ecorv,2030,1
cc6m_2459_T7_ecorv,333,1
cc6m_2459_T7_ecorv,338,1
cc6m_2459_T7_ecorv,419,1
cc6m_2459_T7_ecorv,431,1
cc6m_2459_T7_ecorv,496,1
cc6m_2459_T7_ecorv,565,1
cc6m_2459_T7_ecorv,589,1
cc6m_2459_T7_ecorv,641,1
cc6m_2459_T7_ecorv,647,1

eventalign.index
transcript_id,read_index,pos_start,pos_end
cc6m_2244_T7_ecorv,0,172,84090
cc6m_2459_T7_ecorv,1,84090,190941
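
A side note for readers of this thread: the start and end columns in data.index look like byte offsets into data.json, so a single site can be fetched with a seek instead of reading the whole file. A minimal sketch under that assumption (the offsets 0 and 130 are taken from the data.index excerpt above):

import json

# Assumption: data.index rows are (transcript_id, position, start, end),
# with start/end being byte offsets into data.json.
with open("data.json", "rb") as fh:
    fh.seek(0)                             # 'start' for cc6m_2244_T7_ecorv, position 1965
    record = json.loads(fh.read(130 - 0))  # read exactly end - start bytes
print(record["cc6m_2244_T7_ecorv"]["1965"]["AGGACTT"])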

Where to find read-level probability?

Hi, following this thread #47
I turned on the --infer_mod_rate flag, and it now gives a mod_ratio entry with the modification ratio, but I still can't find the read-level probabilities. I did find a data.json file that contains an N-by-9 matrix per site: is the read-level probability in this matrix, and if so, how should I interpret it?

Best,
Teng
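
For readers with the same question: the N-by-9 rows in data.json are the dataprep input features, not probabilities. A minimal sketch for inspecting them, assuming data.json is JSON Lines and each row holds, per read, (dwell time, signal standard deviation, signal mean) for the 5-mers at positions -1, 0 and +1; the exact column order is an assumption here:

import json

# Assumption: each line of data.json is one site:
# {transcript: {position: {7-mer: [[9 features], ... one row per read]}}}
with open("data.json") as fh:
    for line in fh:
        for tx, positions in json.loads(line).items():
            for pos, kmers in positions.items():
                for kmer, reads in kmers.items():
                    for row in reads:
                        # columns 3-5 would be the centre 5-mer, if the
                        # blocks are ordered (-1, 0, +1)
                        dwell, sd, mean = row[3:6]
                        print(tx, pos, kmer, dwell, sd, mean)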

m6anet publication

Hi developer,

I have only found the m6anet paper on bioRxiv. I was wondering whether m6anet has been published in a peer-reviewed journal.

Thank you!

Some Feedback and Questions

Hello, really looking forward to giving this a try with our lab's dataset. Detecting m6A modifications has been quite a challenge to get up and running, but m6Anet looks like a great alternative to other already established methods.

After getting things set up, I just have a few questions. I'm pretty sure I know the answers to these, but I just want to verify:

  • For input data, you must always use transcriptome aligned data, correct? It would not be valid to attempt to basecall and align nanopore raw data to the genome and then attempt to put that data through m6Anet?
  • For basecalling requirements, previous m6a detection methods processing single samples required the utilized basecalling algorithms to match those that were used for the trained models (for instance, EpiNano-SVM requires data basecalled with Guppy 3.1.5, which is very outdated). Since .fast5 files generated from more recent, accurate, and faster versions of Guppy are not backwards compatible with older Guppy versions (see the nanopore thread here), I'm curious if there are any 'basecalling algorithm' requirements for using m6anet? Can your input data be basecalled with any Guppy version?

Also, a little feedback:

  • The command found in the documentation for m6anet-run_inference --input_dir demo_data --out_dir demo_data --infer_mod-rate --n_processes 4 contains a typo. --infer_mod-rate should actually be --infer_mod_rate, otherwise it throws an error.
  • I had segmentation fault issues when doing an installation of m6anet using the setup.py method, which was due to running the installation within an already existing conda environment. I instead created a new empty conda environment (conda create --name m6anet) prior to the install which fixed the issue.
  • (This is not related to m6anet, but useful for people here) Guppy caused a few headaches for me when attempting to run nanopolish index and eventalign, because of folder paths and file names in the sequencing_summary.txt file. Nanopolish originally thought I had corrupted or missing data, but it was simply looking for it in the wrong place. Although slower, if you have trouble with the first pre-processing step, try again without including the sequencing_summary.txt file in your pipeline and see if that gets you through to the next step.

Thank you so much!

Errors while running m6anet-train

Hi,
Thanks a lot for developing m6anet!
I am trying to run m6anet-train with my own data, and I have extracted features from two bases flanking each candidate site. I modified 'num_neighboring_features = 2' in m6anet/m6anet/model/configs/model_configs/prod_pooling.toml and m6anet/m6anet/model/configs/training_configs/oversampled.toml. However, I got this error:


There are 2861 train sites
There are 675 val sites
There are 785 test sites
Traceback (most recent call last):
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/bin/m6anet-train", line 33, in
sys.exit(load_entry_point('m6anet==1.0.0', 'console_scripts', 'm6anet-train')())
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/scripts/train.py", line 108, in main
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/scripts/train.py", line 75, in train_and_save
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/utils/training_utils.py", line 38, in train
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/utils/training_utils.py", line 100, in train_one_epoch
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/model/model.py", line 81, in forward
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/model/model.py", line 66, in get_site_probability
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/model/model.py", line 63, in get_site_representation
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/model/model.py", line 56, in get_read_representation
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/m6anet-1.0.0-py3.9.egg/m6anet/model/model_blocks/blocks.py", line 78, in forward
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/home/guowb/software/anaconda/ENTER/envs/machine_learning/lib/python3.9/site-packages/torch/nn/functional.py", line 2044, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self


Do I need to modify other parameters? Could you give me some suggestions about this issue? Thank you very much!
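
A note for anyone debugging the same traceback: the failure comes from the k-mer embedding layer, and PyTorch raises exactly this IndexError whenever an input index is greater than or equal to num_embeddings. Widening the window to two flanking positions increases the number of distinct 5-mers the model has to encode, so the embedding vocabulary size in the model config presumably needs to grow to match. A minimal reproduction of the failure mode:

import torch
import torch.nn as nn

# nn.Embedding only accepts indices in [0, num_embeddings); anything
# larger reproduces the "index out of range in self" error above.
emb = nn.Embedding(num_embeddings=66, embedding_dim=2)
emb(torch.tensor([70]))  # IndexError: index out of range in self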

Inference. RuntimeError: Too many open files

Question: Where should I add the suggested "torch.multiprocessing.set_sharing_strategy('file_system')" call? And will it work if I don't have administrative privileges on my cluster? I do not have permission to increase the limit through "ulimit -n".

Issue: Inference step fails
Error:

RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using "ulimit -n" in the shell or change the sharing strategy by calling "torch.multiprocessing.set_sharing_strategy('file_system')" at the beginning of your code

Command: m6anet-run_inference --input_dir ${DATAPREP_DIR} --out_dir ${INFERENCE_DIR} --infer_mod_rate --n_processes ${CPUS}
File sizes in ${DATAPREP_DIR}:

  1. eventalign.index: 128M, 2611156 lines (wc -l)
  2. data.readcount: 42M, 1710279 lines
  3. data.log: 2.2M, 48222 lines
  4. data.json: 3.8G, 1710278 lines
  5. data.index: 73M, 1710279 lines

Additional info:
m6anet version: 1.1.0 from pypi
OS: CentOS Linux 7 (Core)
Experiment: the whole flowcell was used for 1 sample (dRNA-seq run), which generated 216 .fast5 files
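
For anyone hitting the same wall: the sharing-strategy call has to run before any DataLoader workers are spawned, which means wrapping or patching the inference entry point rather than passing a flag; and raising the soft file-descriptor limit up to the existing hard limit does not require administrator rights. A minimal sketch of both workarounds, where the wrapper-script approach is an assumption about how to inject the call:

import resource

import torch.multiprocessing

# 1) Must happen before any DataLoader is created, i.e. before m6anet's
#    inference code runs, so place it at the very top of a wrapper script.
torch.multiprocessing.set_sharing_strategy("file_system")

# 2) No root needed: a process may raise its own soft limit up to the
#    hard limit configured on the cluster.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"open-file limit raised from {soft} to {hard}")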

unmodified control library

Hi Christopher,
Thanks for releasing the new method for mapping m6A with direct RNA sequencing. I would like to know whether an unmodified control library is still needed with this method, as my understanding from the paper is that the model is trained so that an unmodified library is no longer required to map m6A. Since I am about to deliver samples for sequencing, this is important to know. Thanks.

Regards
Bing

m6anet-dataprep gives NaN as output

Hello,

I ran nanopolish and m6anet-dataprep but the data.json contains NaN:

{"68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000":{"315":{"CAGACAG":[[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]]}}}
{"68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000":{"322":{"GGGACTT":[[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]]}}}
{"68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000":{"349":{"GAGACAT":[[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]]}}}
{"68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000":{"381":{"GTGACAG":[[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN],[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]]}}}
{"68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000":{"488":{"TGGACAA":[[NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]]}}}

EventAlign output doesn’t seem to be wrong:

contig  position        reference_kmer  read_index      strand  event_index     event_level_mean        event_stdv      event_length    model_kmer      model_mean      model_stdv      standardized_level
68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000 273     AGCTG   0       t       8       109.36  7.133   0.00498 AGCTG   117.44  3.55    -1.99
68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000 274     GCTGT   0       t       9       87.27   1.491   0.00432 GCTGT   84.35   2.85    0.89
68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000 275     CTGTT   0       t       10      102.96  2.044   0.00398 CTGTT   102.18  3.78    0.18
68ebd370-cef0-4b97-bd1b-825d623446e2_chr1:18000 276     TGTTA   0       t       11      111.84  6.158   0.00664 TGTTA   106.43  7.49    0.63

I tried to run m6anet-run_inference, but the output is missing the probability_modified values:

transcript_id,transcript_position,n_reads,probability_modified
316b811d-5a1e-46ad-bcc6-969d81d03759_chr1:566000,19,40,
316b811d-5a1e-46ad-bcc6-969d81d03759_chr1:566000,78,23,
316b811d-5a1e-46ad-bcc6-969d81d03759_chr1:566000,86,847,
316b811d-5a1e-46ad-bcc6-969d81d03759_chr1:566000,147,821,
316b811d-5a1e-46ad-bcc6-969d81d03759_chr1:566000,156,867,

Here the commands I used:

$nanopolish eventalign \
    --reads $FASTQ \
    --bam $BAM \
    --genome $REF \
    --scale-events -t $NCPUS > $WORKDIR/$id.eventalign.txt
m6anet-dataprep --eventalign $WORKDIR/$id.eventalign.txt \
                --out_dir $WORKDIR --n_processes $NCPUS

m6anet-inference --input_dir $WORKDIR --out_dir $WORKDIR --n_processes $NCPUS
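
One thing that may be worth checking (an observation, not a confirmed diagnosis): the eventalign call above omits the --signal-index and --summary options that appear in m6anet's documented preprocessing command, so regenerating eventalign.txt with those options before rerunning dataprep could be a useful first test.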

I got no errors during the installation process.

Could you help me understand this error please?
Thank you in advance for your help

data.json file

Hey,
thanks for your tool! It works great!
I'm struggling a bit with the .json file: I want to extract the k-mers for the sites in the results file. Do you have any suggestions on how to do it? I tried the json and ujson packages in Python, but it's not really working.
Thanks in advance!
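
An editorial note for readers: a single json.load() on the whole file typically fails here because data.json is JSON Lines, one object per line, rather than one JSON document. A minimal sketch for collecting the k-mers under that assumption:

import json

# Assumption: each line of data.json is one self-contained object:
# {transcript: {position: {kmer: [[features], ...]}}}
kmers = set()
with open("data.json") as fh:
    for line in fh:
        for positions in json.loads(line).values():
            for kmer_dict in positions.values():
                kmers.update(kmer_dict.keys())
print(sorted(kmers))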

Understanding embedding layer dimensions

Dear authors,

Congrats on the publication!

I am a bit confused about the 66-dimensional k-mer embedding layer. How do we go from DRACH motifs to an encoding of 66 dimensions? Thanks, and keep up the good work!
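
One plausible reading (an assumption, not a statement from the paper text): 66 is the size of the 5-mer vocabulary, i.e. the number of distinct 5-mers that can occur at positions -1, 0 and +1 of an N-DRACH-N 7-mer, and each of those 66 tokens gets its own learned embedding vector. A quick enumeration bears the count out:

from itertools import product

# DRACH: D in {A,G,T}, R in {A,G}, then A, C, H in {A,C,T}; N is any base.
D, R, A, C, H, N = "AGT", "AG", "A", "C", "ACT", "ACGT"
sevenmers = {"".join(b) for b in product(N, D, R, A, C, H, N)}
fivemers = {s[i:i + 5] for s in sevenmers for i in range(3)}
print(len(fivemers))  # 66 = 24 (NDRAC) + 18 (DRACH) + 24 (RACHN)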

Genome mapping

Hello,
thanks for your tool for detecting m6A!
Is it possible to map the modifications to the genome (as in xpore, via the genome flag in the dataprep step)?
Or is it only possible at the isoform level?

Big thanks!
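
While waiting for an authoritative answer: transcript-level positions can be lifted to genomic coordinates with the annotation used to build the transcriptome. A minimal sketch (not an m6anet feature), assuming the transcript's exons have already been read from a GTF and are given in transcription order:

# Exons are (start, end), 1-based inclusive genomic coordinates, listed in
# transcription order (descending coordinates for minus-strand transcripts).
def transcript_to_genome(exons, tx_pos, strand):
    remaining = tx_pos  # 0-based position along the spliced transcript
    for start, end in exons:
        length = end - start + 1
        if remaining < length:
            return start + remaining if strand == "+" else end - remaining
        remaining -= length
    raise ValueError("position beyond transcript length")

# e.g. a plus-strand transcript with two exons: position 150 falls 50 nt
# into the second exon.
print(transcript_to_genome([(1000, 1099), (2000, 2099)], 150, "+"))  # 2050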
