
ccsmeth's Introduction

Hi there 👋

ccsmeth's People

Contributors

pengni


ccsmeth's Issues

model for CHG or CHH with HiFi reads

Hi, @PengNi

Thanks for the great tool!! It runs very smoothly. But can we predict CHG or CHH methylation from HiFi reads, in the same way the 5mCpG model was trained? This is very important for plant researchers. Are there any limitations related to subreads, or do we just need some ground truth, e.g. WGBS or ONT?

how about cpg_tools of Pacbio

Hello,

What is the difference between PacBio's CpG tools and your ccsmeth software? And where can I get the models to use with --model_file /path/to/ccsmeth/models/model.ckpt?

Thank you!

Question: types of input, old data or just HiFi?

This looks exactly like what I have been searching for, for some time. I have old Sequel II data, but these are not the more recent HiFi reads. Before I invest my time in installing and testing this, can anyone tell me if it accepts normal, old Sequel II output files (i.e. non-HiFi files)?

ๅกๅœจ11%ไธๅŠจไบ†

ไฝ ๅฅฝ๏ผ
ๆˆ‘ๅšไบ†call_mod
็„ถๅŽerror_output็ป™ไบ†ไธ‹้ข่ฟ™ไธช๏ผš

split_holebatches process-19085 starts
extract_features process-19087 starts
[W::hts_idx_load3] The index file is older than the data file: /gpfs/data/pirontilab/Students/data/RobPacBio/121A/mapped.bam.bai
extract_features process-19095 starts
/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/cuda/__init__.py:145: UserWarning: 
NVIDIA A100 80GB PCIe with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA A100 80GB PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
extract_features process-19110 starts
/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/cuda/__init__.py:145: UserWarning: 
NVIDIA A100 80GB PCIe with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA A100 80GB PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
extract_features process-19144 starts
extract_features process-19152 starts
split_holebatches process-19085 generates 9340 hole/read batches(50)

batch_reader:   0%|          | 0/9340 [00:00<?, ?it/s][W::hts_idx_load3] The index file is older than the data file: /gpfs/data/pirontilab/Students/data/RobPacBio/121A/mapped.bam.bai
Process Process-8:
Process Process-5:

batch_reader:   0%|          | 10/9340 [00:00<11:41, 13.30it/s]Traceback (most recent call last):
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/extract_features.py", line 531, in worker_extract_features_from_holebatches
    features_batch = _batch_feature_list2s(features_batch)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/extract_features.py", line 480, in _batch_feature_list2s
    fipdms.append(np.array(kmer_ipdm, dtype=np.float))
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
[identical np.float tracebacks and NumPy messages repeated for the other extract_features worker processes]

batch_reader:   0%|          | 27/9340 [00:01<08:04, 19.20it/s]
...
batch_reader:  11%|█         | 988/9340 [00:08<00:46, 179.26it/s]
batch_reader:  11%|█         | 1006/9340 [00:19<00:46, 179.26it/s]

็„ถๅŽๅฐฑไธ€็›ดๅœจ่ฟ™้‡Œๅก็€๏ผŒไธ็Ÿฅ้“ๅ’‹ๅŠž
ๆ„Ÿ่ฐขไฝ 
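
For context, this AttributeError shows up when extract_features.py runs under NumPy 1.24+, where the long-deprecated np.float alias was removed. A minimal sketch of the failing pattern and the replacements the NumPy message itself suggests (pinning numpy<1.24 in the environment is the other common workaround):

import numpy as np

kmer_ipdm = [0.12, 0.34, 0.56]               # toy values, just to illustrate the dtype issue

# fails on NumPy >= 1.24:  np.array(kmer_ipdm, dtype=np.float)
a = np.array(kmer_ipdm, dtype=float)         # use the builtin float
b = np.array(kmer_ipdm, dtype=np.float64)    # or the explicit NumPy scalar type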

CCSMETH TRAIN

Hello @PengNi

I want clarity on using ccsmeth train to train a new model. Please, what should go into --train_file TRAIN_FILE? This was not well described in the manual. Kindly help by explaining what this train file should be made up of. Thanks.

A question about the output file

Hi, I used this command:
ccsmeth call_freqb --input_bam /home/lyx/wangaokai/data/pacbio_analyse/tden-mas/03.call_mods/r64114_20201106_074159.hifi.pbmm2.call_mods.modbam.bam --ref /home/lyx/wangaokai/data/genome/tden/Tden.genome.fasta --output ./r64114_20201106_074159.hifi.pbmm2.call_mods.modbam.feaq --threads 20 --sort
I got:
a1f6b147fc12b0996429a47f9c2b7f9
21389c1987c446652b1add0198f19df
This is different from the reference format in the README. How should I interpret it?

interpret features.zscore.fb.depth1.call_mods.tsv

Hi Peng, I ran this command:

/softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/ccsmeth call_mods \
  --input ./1_extract_res/chr22.pbmm2.features.zscore.fb.depth1.tsv \
  --model_file /ccsmeth/models/model_cpg_attbigru2s_hg002_15kb_s2.b21_epoch7.ckpt \
  --output ./2_prediction_res/chr22.pbmm2.features.zscore.fb.depth1.call_mods.tsv \
  --threads 10 --threads_call 2 --model_type attbigru2s

I got these results:

chr22	10513906	+	m64200e_210430_130607/71764808	2	0.590554	0.409446	0	CTCGT
chr22	10514601	+	m64200e_210430_130607/71764808	2	0.463456	0.536544	1	AACGG
chr22	10514791	+	m64200e_210430_130607/71764808	2	0.15142	0.84858	1	GGCGG
chr22	10515266	+	m64200e_210430_130607/71764808	2	0.401638	0.598362	1	ACCGT
chr22	10515380	+	m64200e_210430_130607/71764808	2	0.817427	0.182573	0	GTCGT
chr22	10515535	+	m64200e_210430_130607/71764808	2	0.159198	0.840802	1	ACCGT
chr22	10515734	+	m64200e_210430_130607/71764808	2	0.630529	0.369471	0	AACGG

I don't know the meaning of each column. Do you have a file to describe every column?

Thank you for your nice reply!
Best wishes,
Shuhua
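
In case it helps while waiting for an answer, here is a small sketch that loads the per-read output with pandas; the column names are only guesses from the values shown above (position, strand, hole id, two probabilities, a 0/1 call, the k-mer) and should be checked against the README:

import pandas as pd

# hypothetical column names -- verify against the ccsmeth documentation
cols = ["chrom", "pos", "strand", "holeid", "depth",
        "prob_unmethylated", "prob_methylated", "called_label", "kmer"]
df = pd.read_csv("chr22.pbmm2.features.zscore.fb.depth1.call_mods.tsv",
                 sep="\t", header=None, names=cols)
print(df.head())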

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

The NumPy issue seems to be solved now; this time it got to 21%, haha.
But now a cuDNN error appears. It looks like a PyTorch error, and I really don't know how to handle it.
↓ my code

#!/bin/bash
#SBATCH --mail-type=END,FAIL 
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2
#SBATCH --time=02:00:00
#SBATCH --mem=48G
#SBATCH --gres=gpu:a100:1
#SBATCH -o %A_%a_output.txt
#SBATCH -e %A_%a_error.txt

CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods \
  --input 121A/mapped.bam \
  --ref 121A/assembly.rotated.polished.renamed.fsa \
  --model_file /ccsmeth/models/model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v2.ckpt \
  --output output.hifi.pbmm2.call_mods \
  --threads 10 --threads_call 2 --model_type attbigru2s \
  --rm_per_readsite --mode align 

↓ error.txt

batch_reader:  21%|██        | 1941/9340 [03:23<24:18,  5.07it/s]
...
batch_reader:  21%|██▏       | 1996/9340 [03:35<23:36,  5.18it/s]
batch_reader:  21%|██▏       | 2001/9340 [03:36<24:16,  5.04it/s]Process Process-6:
Process Process-4:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 340, in _call_mods_q
    pred_str, accuracy, batch_num = _call_mods2s(features_batch, model, args.batch_size, device)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 246, in _call_mods2s
    voutputs, vlogits = model(FloatTensor(b_fkmers, device), FloatTensor(b_fpasss, device),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 340, in _call_mods_q
    pred_str, accuracy, batch_num = _call_mods2s(features_batch, model, args.batch_size, device)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/models.py", line 118, in forward
    out1, n_states1 = self.rnn(out1, self.init_hidden(out1.size(0),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 246, in _call_mods2s
    voutputs, vlogits = model(FloatTensor(b_fkmers, device), FloatTensor(b_fpasss, device),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 942, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/models.py", line 118, in forward
    out1, n_states1 = self.rnn(out1, self.init_hidden(out1.size(0),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 942, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

o(╥﹏╥)o
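
The sm_80 warning from the earlier run suggests the installed PyTorch wheel was built without support for the A100 (compute capability sm_80), and that mismatch can surface later as cuDNN execution failures. A possible remedy, assuming a newer PyTorch release is acceptable for ccsmeth, is to reinstall a CUDA 11.8 build inside the conda environment (check pytorch.org/get-started for the exact command matching the torch version you want):

# inside the ccsmeth conda environment
pip uninstall -y torch
pip install torch --index-url https://download.pytorch.org/whl/cu118
# then verify the A100 is usable
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"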

IndexError while running ccsmeth call_mods

I'm getting the following error, and after it the job seems to be idling. It does not exit with an error, but keeps running without producing an output. The error is below:

===============================================
2024-01-10 13:37:00 - INFO - [main]call_mods starts
2024-01-10 13:37:00 - INFO - cuda availability: True
2024-01-10 13:37:02 - INFO - format_features process-10612 starts
2024-01-10 13:37:02 - INFO - call_mods process-10615 starts
2024-01-10 13:37:02 - INFO - write_process-10617 starts
2024-01-10 13:37:02 - INFO - read_features process-10602 starts
2024-01-10 13:37:02 - INFO - format_features process-10608 starts
2024-01-10 13:37:02 - INFO - format_features process-10607 starts
2024-01-10 13:37:02 - INFO - call_mods process-10616 starts
Process Process-1:
Traceback (most recent call last):
File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/site-packages/ccsmeth/_call_modifications_txt.py", line 74, in _read_features_file_to_str
h_num_total = _count_holenum(features_file)
File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/site-packages/ccsmeth/_call_modifications_txt.py", line 61, in _count_holenum
holeid = words[3]
IndexError: list index out of range
2024-01-10 13:37:02 - INFO - format_features process-10614 starts
2024-01-10 13:37:02 - INFO - format_features process-10603 starts
2024-01-10 13:37:02 - INFO - format_features process-10609 starts
2024-01-10 13:37:02 - INFO - format_features process-10606 starts
2024-01-10 13:37:02 - INFO - format_features process-10613 starts
2024-01-10 13:37:02 - INFO - format_features process-10604 starts
2024-01-10 13:37:02 - INFO - format_features process-10605 starts
2024-01-10 13:37:02 - INFO - format_features process-10611 starts

How do I go about fixing this IndexError?

thanks,
Suvi

call_mods error in version 0.5.0

Hello, @PengNi , I ran into this problem when running call_mods in version 0.5.0, but not in version 0.4.1.

This is my script:
CUDA_VISIBLE_DEVICES=1 ccsmeth call_mods \
  --input ./CD.hifi.CG_chr10.r10k_tdata.tsv \
  --output ./testing_res \
  --model_file ./trian_model/attbigru2s.betterthanlast.b21_epoch7.ckpt \
  --model_type attbigru2s --threads 2 --threads_call 2 &

(screenshot of the error attached)

install error

Hi Peng Ni,
When I installed methccs, I got this error report:

FileNotFoundError: [Errno 2] No such file or directory: '/home/wangbo/softwares/methccs-master/deepsmrt/_version.py'

Could you please help me?

Yours,
Bo

setting temp folder for samtools

I am running this in a shared HPC environment. It looks like samtools is using /tmp, which has limited space. I am aware there is a --tmp-dir setting for samtools, but I do not see how to set it up for the ccsmeth pipeline (an environment variable?).

Thanks for your help.

==stderr:
samtools sort: fail to open "/tmp/samtools.3547891.6686.tmp.0000.bam": No such file or directory
terminate called after throwing an instance of 'PacBio::Utility::AlarmException'
/bin/sh: line 1: 3547891 Aborted (core dumped) pbmm2 align --preset HIFI -j 8 --sort
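
There does not seem to be a ccsmeth option for this, but samtools and pbmm2 commonly fall back to the directory named by the TMPDIR environment variable for temporary files (worth verifying for your versions), so pointing TMPDIR at a roomier location before the align step may work. An untested sketch, with a placeholder scratch path:

# placeholder path: any directory with enough free space
export TMPDIR=/scratch/$USER/tmp
mkdir -p "$TMPDIR"
# then launch the ccsmeth align / pbmm2 step from the same shell as before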

Methylation Data Labels Problems

Hi! I am currently reviewing the performance of various PacBio methylation detection tools, and I am in the process of replicating your model and code for further understanding. May I kindly inquire whether the methylation data labels in your zebrafish study were derived using the Bismark tool? Additionally, could you guide me to where I might find the labels for the publicly available NA12898 data you mentioned in your research? If it's more convenient, would you consider sharing the labels for both the zebrafish and NA12898 datasets? Thank you in advance for your assistance :)

ccsmeth call_mods

Dear Peng,
I ran the 'ccsmeth call_mods' command and got these results:

[chenshuhua@xnode07 hg38_hla]$ /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/ccsmeth call_mods \
>   --input ./1_extract_res/${BAM}.pbmm2.features.zscore.fb.depth1.tsv \
>   --model_file /software/ccsmeth/models/model_cpg_attbigru2s_hg002_15kb_s2.b21_epoch7.ckpt \
>   --output ./2_prediction_res/${BAM}.pbmm2.features.zscore.fb.depth1.call_mods.tsv \
>   --threads 16--threads_call 2 --model_type attbigru2s
# ===============================================
## parameters:
input:
	./1_extract_res/chr22.CKCG-00309-CLR.clr.aln.bam.pbmm2.features.zscore.fb.depth1.tsv
holes_batch:
	50
model_file:
	/software/ccsmeth/models/model_cpg_attbigru2s_hg002_15kb_s2.b21_epoch7.ckpt
model_type:
	attbigru2s
seq_len:
	21
is_stds:
	yes
class_num:
	2
dropout_rate:
	0
batch_size:
	512
n_vocab:
	16
n_embed:
	4
layer_rnn:
	3
hid_rnn:
	256
layer_tfe:
	6
d_model_tfe:
	256
nhead_tfe:
	4
nhid_tfe:
	512
output:
	./2_prediction_res/chr22.CKCG-00309-CLR.clr.aln.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
ref:
	None
holeids_e:
	None
holeids_ne:
	None
motifs:
	CG
mod_loc:
	0
methy_label:
	1
mapq:
	20
identity:
	0.8
two_strands:
	False
comb_strands:
	False
depth:
	1
norm:
	zscore
no_decode:
	False
num_subreads:
	0
path_to_samtools:
	None
seed:
	1234
threads:
	16
threads_call:
	2
tseed:
	1234
# ===============================================
[main]call_mods starts..
format_features process-767315 starts
format_features process-767309 starts
format_features process-767316 starts
format_features process-767308 starts
read_features process-767307 starts
format_features process-767313 starts
format_features process-767319 starts
format_features process-767312 starts
format_features process-767311 starts
format_features process-767314 starts
format_features process-767318 starts
format_features process-767310 starts
format_features process-767317 starts
write_process-767322 starts
call_mods process-767321 starts
call_mods process-767320 starts
read_features process-767307 ending, read 5292 holes
format_features process-767311 ending, read 9 batches
format_features process-767314 ending, read 7 batches
format_features process-767317 ending, read 9 batches
format_features process-767310 ending, read 6 batches
format_features process-767309 ending, read 10 batches
format_features process-767318 ending, read 7 batches
format_features process-767315 ending, read 9 batches
format_features process-767313 ending, read 9 batches
format_features process-767316 ending, read 8 batches
format_features process-767319 ending, read 11 batches
format_features process-767308 ending, read 11 batches
format_features process-767312 ending, read 10 batches

i can get some results from chr22:

[chenshuhua@login04 2_prediction_res]$ wc -l chr22.CKCG-00309-CLR.clr.aln.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
73499 chr22.CKCG-00309-CLR.clr.aln.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
[chenshuhua@login04 2_prediction_res]$ tail chr22.CKCG-00309-CLR.clr.aln.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
chr22	17985464	+	m64252e_210528_083148/45615225	1	0.048914	0.951086	1	TCCGG
chr22	17985737	+	m64252e_210528_083148/45615225	1	0.64279	0.35721	0	ATCGA
chr22	17981642	+	m64252e_210528_083148/34866157	1	0.052538	0.947462	1	AGCGC
chr22	17981666	+	m64252e_210528_083148/34866157	1	0.065927	0.934073	1	TGCGC
chr22	17981725	+	m64252e_210528_083148/34866157	1	0.12836	0.87164	1	GGCGA
chr22	17981806	+	m64252e_210528_083148/34866157	1	0.174896	0.825104	1	GGCGG
chr22	17982403	+	m64252e_210528_083148/34866157	1	0.056801	0.943199	1	CACGG
chr22	17982495	+	m64252e_210528_083148/34866157	1	0.538114	0.461886	0	AACGC
chr22	17982530	+	m64252e_210528_083148/34866157	1	0.201508	0.798492	1	GGCGC
chr22	17982584	+	m64252e_210528_083148/34866157	1	0.332635	0.667365	1	ATCGC

But it seems like the result just stopped at chr22:17982584 for more than 10 hours.
I checked the process status and it is "S", i.e. sleeping.

top - 09:08:58 up 34 days, 13:44,  2 users,  load average: 1.22, 1.20, 1.23
Tasks: 864 total,   2 running, 860 sleeping,   0 stopped,   2 zombie
%Cpu(s):  2.0 us,  1.3 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 385166.6 total, 171320.6 free,  20672.6 used, 193173.4 buff/cache
MiB Swap:  51200.0 total,  51097.5 free,    102.5 used. 353409.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 693939 chenshu+  20   0   68060   6128   4292 S   0.7   0.0   7:18.51 top
1061537 chenshu+  20   0   65104   5652   3976 R   0.7   0.0   0:00.26 top
 693088 chenshu+  20   0   25172   5952   3400 S   0.0   0.0   0:00.06 /bin/bash
 693850 chenshu+  20   0  172208   6020   4660 S   0.0   0.0   0:02.32 sshd: chenshuhua@pts/2
 693851 chenshu+  20   0   25184   5980   3476 S   0.0   0.0   0:00.05 -bash
 766982 chenshu+  20   0 6323640 119824   6456 S   0.0   0.0   0:04.87 softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/ccsmeth call_mods --input ./1_extract_res/chr2+
 767306 chenshu+  20   0   55196   8924   2464 S   0.0   0.0   0:00.03 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.resource_tracker import main;main(12)
 767308 chenshu+  20   0 6802004 519536   6372 S   0.0   0.1   0:07.72 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=21) --multiprocessing-fork
 767309 chenshu+  20   0 6754884 465660   6352 S   0.0   0.1   0:06.64 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=23) --multiprocessing-fork
 767310 chenshu+  20   0 6660628 382556   6388 S   0.0   0.1   0:05.01 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=26) --multiprocessing-fork
 767311 chenshu+  20   0 6702432 424536   6492 S   0.0   0.1   0:06.56 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=29) --multiprocessing-fork
 767312 chenshu+  20   0 6720200 437152   6480 S   0.0   0.1   0:06.44 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=31) --multiprocessing-fork
 767313 chenshu+  20   0 6710220 377140   6340 S   0.0   0.1   0:06.41 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=34) --multiprocessing-fork
 767314 chenshu+  20   0 6747560 423872   6440 S   0.0   0.1   0:05.63 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=37) --multiprocessing-fork
 767315 chenshu+  20   0 6706640 372192   6408 S   0.0   0.1   0:06.33 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=39) --multiprocessing-fork
 767316 chenshu+  20   0 6717492 436604   6436 S   0.0   0.1   0:06.16 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=41) --multiprocessing-fork
 767317 chenshu+  20   0 6743736 414116   6368 S   0.0   0.1   0:06.23 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=43) --multiprocessing-fork
 767318 chenshu+  20   0 6640932 360924   6364 S   0.0   0.1   0:05.28 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=45) --multiprocessing-fork
 767319 chenshu+  20   0 6770504 483744   6368 S   0.0   0.1   0:06.85 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=47) --multiprocessing-fork
 767320 chenshu+  20   0       0      0      0 Z   0.0   0.0   0:08.16 [python3] <defunct>
 767321 chenshu+  20   0       0      0      0 Z   0.0   0.0   0:08.17 [python3] <defunct>
 767322 chenshu+  20   0 6323268 123896   6640 S   0.0   0.0   0:06.85 /softwares/ccsmeth_v0.1.1_d20220329_python3venv/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=13, pipe_handle=53) --multiprocessing-fork
1061449 chenshu+  20   0  172208   5636   4272 S   0.0   0.0   0:00.00 sshd: chenshuhua@pts/3
1061450 chenshu+  20   0   25184   5904   3400 S   0.0   0.0   0:00.06 -bash

I was wondering if the problem is from my data, my GPU, or your software?

Extract features file

Dear,
I aligned the ccs.bam to the genome using ccsmeth align, and the output was fed into ccsmeth extract as follows. But I get an empty file and no error on stdout.
(screenshot attached: 微信图片_20231219165505)

Index error during align_reads in denovo mode

Hi there,

I'm getting the error below after call_mods has generated the bam file and I use it to align to the reference. It generated the pbmm2.bam file but could not create the BAI index.

2023-12-06 17:30:58 - INFO - stdout:

2023-12-06 17:30:58 - INFO - stderr:
>|> 20231206 16:33:04.637 -|- WARN -|- AlignSettings -|- 0x2b9fefede340|| -|- Requested more threads for alignment 
(160) than system-wide available (40)
>|> 20231206 16:33:04.689 -|- WARN -|- operator() -|- 0x2b9fefede340|| -|- Input is aligned reads. Only primary 
alignments will be respected to allow idempotence!
[E::hts_idx_check_range] Region 536851006..536880338 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'm64164_220611_190014/99878424/ccs' with ref_name='Chr1At', ref_length=614431332, 
flags=0, pos=536851007 cannot be indexed
>|> 20231206 17:30:58.187 -|- FATAL -|- Close -|- 0x2b9fefede340|| -|- pbmm2 align ERROR: BAI Index Generation: 
Failed to create index for output.hifi.call_mods.modbam.pbmm2.bam

I tried to run pbmm2 directly using the command below, which allows a CSI index to be created by pbmm2 and samtools:

pbmm2 align --preset CCS -j 40 --sort --bam-index CSI ref.fasta output.hifi.call_mods.modbam.bam output.hifi.call_mods.modbam.pbmm2.bam && samtools index -@ 40 -c output.hifi.call_mods.modbam.pbmm2.bam

I proceeded to the call-modification-frequency step using the aggregate mode with the above bam and index files, and got the error below, which means the CSI index file is not being recognised: samtools tries to create a BAI index again and fails.

2023-12-06 23:40:49 - INFO - [main]call_freq_bam starts
2023-12-06 23:40:49 - INFO - indexing bam file-output.call_mods.modbam.pbmm2.bam

[E::hts_idx_check_range] Region 536851006..536880338 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'm64164_220611_190014/99878424/ccs' with ref_name='Chr1At', ref_length=614431332, flags=0, pos=536851007 cannot be indexed
Traceback (most recent call last):
  File "//miniconda3/envs/ccsmeth/bin/ccsmeth", line 10, in <module>
    sys.exit(main())
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/ccsmeth.py", line 795, in main
    args.func(args)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/ccsmeth.py", line 44, in main_call_freqb
    call_mods_frequency_from_bamfile(args)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/call_mods_freq_bam.py", line 688, in call_mods_frequency_from_bamfile
    index_bam_if_needed2(inputpath, args.threads)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/utils/process_utils.py", line 280, in index_bam_if_needed2
    pysam.index("-@", str(threads), inputfile)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/pysam/utils.py", line 83, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "output.hifi.call_mods.modbam.pbmm2.bam": Numerical result out of range\n'

What is the workaround for this please?
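
As a possible (untested) local workaround: the traceback shows that index_bam_if_needed2 in ccsmeth/utils/process_utils.py indexes via pysam.index, which simply forwards its arguments to samtools index, so requesting a CSI index there should avoid the bai size limit. A standalone sketch of the same call with the "-c" flag added:

import pysam

bam = "output.hifi.call_mods.modbam.pbmm2.bam"
threads = 40
# "-c" makes samtools index write a .csi index, which supports
# chromosomes longer than the ~512 Mb limit of .bai indexes
pysam.index("-c", "-@", str(threads), bam)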

about vocab_size

self.embed = nn.Embedding(vocab_size, embedding_size) # for dna/rna base

Hi, nice project! But I am confused about this parameter. From my understanding, there are only four types of bases, i.e. A/T/C/G. Why do you set the vocabulary size to 16? Thanks!
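
Not an authoritative answer, but as an illustration: vocab_size in nn.Embedding only sets how many distinct integer token ids the layer accepts, so a value of 16 simply leaves room for symbols beyond A/C/G/T (for example N or other ambiguity codes appearing in k-mers). A minimal sketch:

import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=16, embedding_dim=4)   # token ids 0..15 are valid
kmer_ids = torch.tensor([[0, 1, 2, 3, 4]])                 # e.g. A, C, G, T plus one extra symbol
print(embed(kmer_ids).shape)                               # torch.Size([1, 5, 4])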

Training error

Hi,

I wanted to train ccsmeth on my data (screenshot: Screenshot from 2023-08-25 10-50-41). I changed batch_size several times, but I still get this error after some internal epochs. Could you please help me out?
Thank you so much.
Raheleh

Results based on count and model at the genome level is different

Hi PengNi
I'm trying to test ccsmeth with HG002, and I'm noticing a big difference in the number of results produced by the count method versus the model method; is this normal?
The HG002 labels are based on bisulfite data, and the count method seems to report more calls, close to twice as many as the model.

Normalization

Hi Peng,
Thanks again for sharing your code. I was wondering, did you normalize the data during training? I noticed that you normalize the data when extracting the features.

A question about the frequency threshold determination

I used call_freqb as described and output the BED file; column 11 of the file is the methylation frequency/likelihood at each site. I would like to ask whether there is a recommended filter value (how high the frequency should be) for treating a site as a trusted methylation site.

A question about BedTools

Hello PengNi, while running call_freqb I encountered the following problem. How do I deal with it? Thank you.

Error was:

            Traceback (most recent call last):
      File "/home/lyx/wangaokai/anaconda3/envs/ccsmeth-mas/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
        self.run()
      File "/home/lyx/wangaokai/anaconda3/envs/ccsmeth-mas/lib/python3.10/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/home/lyx/wangaokai/anaconda3/envs/ccsmeth-mas/lib/python3.10/site-packages/ccsmeth/call_mods_freq_bam.py", line 531, in _worker_write_bed_result
        ori_bed.sort().moveto(bedfile)
      File "/home/lyx/wangaokai/anaconda3/envs/ccsmeth-mas/lib/python3.10/site-packages/pybedtools/bedtool.py", line 923, in decorated
        result = method(self, *args, **kwargs)
      File "/home/lyx/wangaokai/anaconda3/envs/ccsmeth-mas/lib/python3.10/site-packages/pybedtools/bedtool.py", line 402, in wrapped
        stream = call_bedtools(
      File "/home/lyx/wangaokai/anaconda3/envs/ccsmeth-mas/lib/python3.10/site-packages/pybedtools/helpers.py", line 460, in call_bedtools
        raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
    pybedtools.helpers.BEDToolsError:
    Command was:

            bedtools sort -i ./r64114_20201106_074159.hifi.pbmm2.call_mods.modbam.feaq.count.all.bed
    
    Error message was:
    Error: malformed BED entry at line 12023531. Start Coordinate detected that is < 0. Exiting.
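
While the negative start coordinate itself probably deserves a look upstream, a quick way past the bedtools sort failure is to inspect and drop the offending line(s) before sorting. A sketch assuming the standard tab-separated BED layout:

# show the malformed line(s)
awk -F'\t' '$2 < 0' ./r64114_20201106_074159.hifi.pbmm2.call_mods.modbam.feaq.count.all.bed
# write a filtered copy that bedtools sort will accept
awk -F'\t' '$2 >= 0' ./r64114_20201106_074159.hifi.pbmm2.call_mods.modbam.feaq.count.all.bed > filtered.all.bed
bedtools sort -i filtered.all.bed > filtered.all.sorted.bed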

allele-specific methylation

Hi, very good tool. I came here specially to ask you a question.

In the article (DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing), allele-specific methylation is obtained by combining phased SNPs with PacBio CCS reads.

If I already have a haplotype-resolved genome, how can I get allele-specific methylation?

  1. Use the diploid genome instead of the haplotype-resolved genome as the reference genome?
  2. If the haplotype-resolved genome is used as the reference, how should ccsmeth be run?
    Extract a single set of reads per haplotype, and then analyze them separately?

Hope to hear from you, many thanks.

Installing ccsmeth

Hello

I could not install ccsmeth. I ran the commands on a Linux cluster and tried both of the install commands given for ccsmeth.

python setup.py install

No local packages or working download links found for torch<=1.7.0,>=1.2.0
error: Could not find suitable distribution for Requirement.parse('torch<=1.7.0,>=1.2.0')
pip install ccsmeth
  Building wheel for pysam (setup.py) ... error
  ERROR: Command errored out with exit status 1:

Has anyone come across such issues?

Thanks
Priya
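
For what it's worth, the setup.py failure is easy_install being unable to resolve the torch pin on its own. A common pattern (a sketch, not the official instructions) is to create a fresh environment, install the heavyweight dependencies from pip first, and only then install ccsmeth:

conda create -n ccsmethenv python=3.8
conda activate ccsmethenv
# install torch (version range taken from the error message) and pysam first
pip install "torch>=1.2.0,<=1.7.0" pysam
pip install ccsmeth        # or: python setup.py install from the cloned repo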

Import error

Hello

I used the command below to call methylation modifications:

CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods \
  --input /scicore/home/cichon/GROUP/PacBio_workflow/methylation/002-016/Xt4Y5lka_mapped.bam \
  --ref /scicore/home/cichon/GROUP/PacBio_workflow/workflow/workflow/reference/human_GRCh38_no_alt_analysis_set.fasta \
  --model_file /scicore/home/cichon/GROUP/PacBio_workflow/methylation/002-016/ccsmeth/build/lib/ccsmeth/ \
  --output /scicore/home/cichon/GROUP/PacBio_workflow/methylation/002-016/output_ccsmeth/ \
  --threads 10 --threads_call 2 --model_type attbigru2s

I got the below error:

Traceback (most recent call last):
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/bin/ccsmeth", line 33, in <module>
    sys.exit(load_entry_point('ccsmeth==0.2.3', 'console_scripts', 'ccsmeth')())
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/ccsmeth-0.2.3-py3.7.egg/ccsmeth/ccsmeth.py", line 769, in main
    args.func(args)
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/ccsmeth-0.2.3-py3.7.egg/ccsmeth/ccsmeth.py", line 27, in main_call_mods
    from .call_modifications import call_mods
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/ccsmeth-0.2.3-py3.7.egg/ccsmeth/call_modifications.py", line 14, in <module>
    from sklearn import metrics
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/sklearn/__init__.py", line 64, in <module>
    from .base import clone
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/sklearn/base.py", line 11, in <module>
    from scipy import sparse
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/scipy/sparse/__init__.py", line 231, in <module>
    from .csr import *
  File "/scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/scipy/sparse/csr.py", line 15, in <module>
    from ._sparsetools import csr_tocsc, csr_tobsr, csr_count_blocks, \
ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /scicore/home/cichon/thirun0000/miniconda3/envs/ccsmethenv/lib/python3.7/site-packages/scipy/sparse/_sparsetools.cpython-37m-x86_64-linux-gnu.so)

I checked using the grep command

sparse/_sparsetools.cpython-37m-x86_64-linux-gnu.so | grep CXXABI
 
It found no CXXABI entry. Should I do the installation again?

Thanks
Priya
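
The traceback shows scipy being resolved against the old system /lib64/libstdc++.so.6 rather than a newer libstdc++. A workaround I have seen for conda environments (an assumption, not ccsmeth-specific advice) is to install a recent libstdc++ into the environment and make sure its lib directory is searched first:

conda activate ccsmethenv
conda install -c conda-forge libstdcxx-ng
# if the system library still shadows the env's copy, prepend it for this session
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"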

Extract features file structure

Hi, I used the ccsmeth extract command to extract features. How is the resulting TSV file structured when opened in Python?
Could you please give some information about the file structure?
Thank you so much...
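
I do not know the official column definitions either, but for simply opening and inspecting the TSV in Python, a plain tab-split read works; a minimal sketch with a placeholder file name and no assumptions about what each column means:

# placeholder file name
with open("features.zscore.fb.depth1.tsv") as fh:
    for i, line in enumerate(fh):
        fields = line.rstrip("\n").split("\t")
        print(len(fields), fields[:6])   # column count and a short preview
        if i == 4:                       # only peek at the first few records
            break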

calling modifications on Pacbio Revio platform

Dear authors,
thanks for your great tool ccsmeth. I tried calling modifications on a hifi.bam generated on the PacBio Revio platform. However, I only get an empty bam file.
I have noticed the provided models were trained using PacBio Sequel II reads; I wonder whether differences between the two platforms lead to my result.
I look forward to your reply!

Call modification

Hi Peng,
Thanks again for ccsmeth. You suggested using the "Quick start"; I have a question about the call-modifications step and the train step. What is the difference between the two? If I skip the modification calling and, after the first and second steps, just extract the features from HiFi reads, would that be correct? Why should I do the modification-calling step?

Thanks a lot for your time and help.

ccsmeth train

Hi Peng,
I would like to retrain your ccsmeth model using another CCS or CLR dataset with corresponding WGBS data. I am not sure about your "train_file": is it an aligned bam file, or a feature .tsv file formatted by step 2 'ccsmeth extract'? How can I label the methylation level of every site or every ZMW?

ccsmeth train -h
usage: ccsmeth train [-h] --train_file TRAIN_FILE --valid_file VALID_FILE
                     --model_dir MODEL_DIR
  --train_file TRAIN_FILE
  --valid_file VALID_FILE

Can you give me an example command?
Thanks!

Best wishes,
Shuhua
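
Not an official answer, but strictly from the usage text above, a minimal invocation would look like the sketch below. The file names are placeholders, and my understanding (to be confirmed by the author) is that the train/valid files are feature TSVs produced by ccsmeth extract with the methylation label attached:

ccsmeth train \
  --train_file train.features.tsv \
  --valid_file valid.features.tsv \
  --model_dir ./trained_model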

All reads are skipped/failed

Hi PengNi!
I recently installed ccsmeth and encountered a problem when using 'call_mods'.

The command I used is
CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods --input test.hifi_reads.aligned.bam --ref reference.fasta --model_file model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v2.ckpt --output test.hifi_reads.aligned.call_mods --threads 60 --threads_call 4 --model_type attbigru2s --mode align

The program exits without error, but the result (modbam.bam) is empty.
Looking at the log, I found the following problem:

2023-07-20 14:48:25 - INFO - extract_features process-47563 ending, proceed 2040 hole_batches(50): 102000 holes/reads in total, 102000 skipped/failed.
2023-07-20 14:48:25 - INFO - extract_features process-47286 ending, proceed 2079 hole_batches(50): 103950 holes/reads in total, 103950 skipped/failed.
...
2023-07-20 14:48:26 - INFO - extract_features process-45955 ending, proceed 2199 hole_batches(50): 109950 holes/reads in total, 109950 skipped/failed.
2023-07-20 14:48:26 - INFO - call_mods process-42482 ending, proceed 0 batches(512)
2023-07-20 14:48:26 - INFO - call_mods process-41961 ending, proceed 0 batches(512)
2023-07-20 14:48:26 - INFO - wrote 0 reads, in which 0 were added mm tags

0 reads were written because all reads were skipped or failed.
I cannot figure out why. Could you give me some hints?

optimize ccsmeth call_mods procedure

Currently ccsmeth call_mods works as: call mods -> save as per_readsite.tsv -> gather per-read mod info -> save in modbam format.

Maybe call_mods -> directly save in modbam format would be a better way.

ccsmeth call_mods: indexError and TypeError

Hi!
We tested ccsmeth on the data included in the paper and confirmed that its accuracy improved over primrose.

Now we are testing ccsmeth with data from Revio.
After aligning with the hifi_reads.bam file containing the kinetics signal, call_mods was performed.
The modbam.bam file was created, but the following error was shown in the log.
What could be the problem?

(screenshot: call_mods_error)

Output reference CGs or not when there are SNVs at corresponding sites of reads

(screenshot attached)

Should those CGs be output when they are CGs in the reference, but not CGs in the reads because of an SNV? ccsmeth won't give predictions for those sites, since ccsmeth only predicts CGs present in reads.

For now, ccsmeth does not output those sites. Another solution is, like primrose, to output those sites as unmethylated (modification freq = 0).

TypeError: 'tuple' object does not support item assignment

Hi, Peng.
When I ran this command:

ccsmeth extract --input chr22.ccs.subreads.bam --ref hg38.fa --threads 5 --norm zscore --comb_strands --depth 1 --output chr22.ccs.subreads.features.zscore.fb.depth1.tsv 

I got these messages:

## parameters:
input:
	chr22.ccs.subreads.bam
ref:
	hg38.fa
holeids_e:
	None
holeids_ne:
	None
output:
	chr22.ccs.subreads.bam.features.zscore.fb.depth1.tsv
seq_len:
	21
motifs:
	CG
mod_loc:
	0
methy_label:
	1
mapq:
	20
identity:
	0.8
two_strands:
	False
comb_strands:
	True
depth:
	1
norm:
	zscore
no_decode:
	False
num_subreads:
	0
path_to_samtools:
	None
holes_batch:
	50
seed:
	1234
threads:
	10
# ===============================================
[extract_features]start..
read_input process-520151 starts
cmd to view input: samtools view -@ 3 -h chr22.ccs.subreads.bam
extrac_features process-520153 starts
extrac_features process-520156 starts
extrac_features process-520161 starts
extrac_features process-520167 starts
extrac_features process-520168 starts
extrac_features process-520169 starts
extrac_features process-520170 starts
extrac_features process-520171 starts
write_process-520172 started
Process Process-9:
Traceback (most recent call last):
  File "/home/chenshuhua/.local/opt/miniconda_v4.9.2_d20210310/envs/ccsmethenv/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/chenshuhua/.local/opt/miniconda_v4.9.2_d20210310/envs/ccsmethenv/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/chenshuhua/.local/opt/miniconda_v4.9.2_d20210310/envs/ccsmethenv/lib/python3.6/site-packages/ccsmeth-0.1.0-py3.6.egg/ccsmeth/extract_features.py", line 543, in _worker_extract
    feature_list += handle_one_hole2(hole_aligninfo, contigs, motifs, args)
  File "/home/chenshuhua/.local/opt/miniconda_v4.9.2_d20210310/envs/ccsmethenv/lib/python3.6/site-packages/ccsmeth-0.1.0-py3.6.egg/ccsmeth/extract_features.py", line 451, in handle_one_hole2
    feature_list += _comb_fb_features(fwd_features, bwd_features)
  File "/home/chenshuhua/.local/opt/miniconda_v4.9.2_d20210310/envs/ccsmethenv/lib/python3.6/site-packages/ccsmeth-0.1.0-py3.6.egg/ccsmeth/extract_features.py", line 362, in _comb_fb_features
    ffea[4] = max(ffea[4], bfea[4])
TypeError: 'tuple' object does not support item assignment

Looking forward to your reply.

Best wishes,
Shuhua
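
For reference, the TypeError itself is plain Python: _comb_fb_features tries to assign into a feature record that is a tuple, and tuples are immutable. A minimal illustration of the failure and the usual fix (convert to a list, or build a new tuple), not necessarily how the author patched ccsmeth:

ffea = ("chr22", 100, "+", "hole/1", 0.3)    # toy feature record
bfea = ("chr22", 100, "-", "hole/1", 0.7)

# ffea[4] = max(ffea[4], bfea[4])            # TypeError: 'tuple' object does not support item assignment
ffea = list(ffea)                             # work on a mutable copy
ffea[4] = max(ffea[4], bfea[4])
ffea = tuple(ffea)                            # convert back if a tuple is required downstream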

empty output of ccsmeth extract command

Dear,
Firstly, I aligned the subreads to the genome using ccsmeth align and fed the output into ccsmeth extract as follows. But I get an empty file m64081.pbmm2.features.zscore.fb.depth1.tsv and no error on stdout.
The bam file input into ccsmeth align was produced using the command ccs input output --hifi-kinetics.

ccsmeth extract --input m64081.pbmm2.bam --ref genome.fasta --threads 1 --norm zscore --comb_strands --depth 1 --output m64081.pbmm2.features.zscore.fb.depth1.tsv

And the stdout is shown as follows:

# ===============================================
## parameters: 
input:
	m64081.pbmm2.bam
ref:
	genome.fasta
holeids_e:
	None
holeids_ne:
	None
output:
	m64081.pbmm2.features.zscore.fb.depth1.tsv
seq_len:
	21
motifs:
	CG
mod_loc:
	0
methy_label:
	1
mapq:
	20
identity:
	0.8
two_strands:
	False
comb_strands:
	True
depth:
	1
norm:
	zscore
no_decode:
	False
num_subreads:
	0
path_to_samtools:
	None
holes_batch:
	50
seed:
	1234
threads:
	10
# ===============================================
[extract_features]start..
extrac_features process-370857 starts
extrac_features process-370857, 200 hole_batches(50) proceed
extrac_features process-370866 starts
extrac_features process-370866, 200 hole_batches(50) proceed
extrac_features process-370863 starts
extrac_features process-370863, 200 hole_batches(50) proceed
extrac_features process-370864 starts
extrac_features process-370864, 200 hole_batches(50) proceed
extrac_features process-370861 starts
extrac_features process-370861, 200 hole_batches(50) proceed
extrac_features process-370865 starts
extrac_features process-370865, 200 hole_batches(50) proceed
extrac_features process-370860 starts
extrac_features process-370860, 200 hole_batches(50) proceed
extrac_features process-370859 starts
extrac_features process-370859, 200 hole_batches(50) proceed
extrac_features process-370866, 400 hole_batches(50) proceed
extrac_features process-370857, 400 hole_batches(50) proceed
...
extrac_features process-370857, 3400 hole_batches(50) proceed
extrac_features process-370859, 2200 hole_batches(50) proceed
extrac_features process-370861, 3000 hole_batches(50) proceed
extrac_features process-370860, 3400 hole_batches(50) proceed
extrac_features process-370863, 3400 hole_batches(50) proceed
extrac_features process-370864, 3400 hole_batches(50) proceed
extrac_features process-370865, 3400 hole_batches(50) proceed
extrac_features process-370866, 3400 hole_batches(50) proceed
extrac_features process-370860, 3600 hole_batches(50) proceed
extrac_features process-370857, 3600 hole_batches(50) proceed
extrac_features process-370861, 3200 hole_batches(50) proceed
extrac_features process-370863, 3600 hole_batches(50) proceed
extrac_features process-370859, 2400 hole_batches(50) proceed
extrac_features process-370864, 3600 hole_batches(50) proceed
extrac_features process-370865, 3600 hole_batches(50) proceed
extrac_features process-370866, 3600 hole_batches(50) proceed
extrac_features process-370857, 3800 hole_batches(50) proceed
extrac_features process-370863, 3800 hole_batches(50) proceed
extrac_features process-370861, 3400 hole_batches(50) proceed
extrac_features process-370860, 3800 hole_batches(50) proceed
extrac_features process-370864, 3800 hole_batches(50) proceed
extrac_features process-370865, 3800 hole_batches(50) proceed
extrac_features process-370859, 2600 hole_batches(50) proceed
extrac_features process-370863, 4000 hole_batches(50) proceed
extrac_features process-370866, 3800 hole_batches(50) proceed
extrac_features process-370857, 4000 hole_batches(50) proceed
extrac_features process-370864, 4000 hole_batches(50) proceed
extrac_features process-370860, 4000 hole_batches(50) proceed
extrac_features process-370865, 4000 hole_batches(50) proceed
extrac_features process-370861, 3600 hole_batches(50) proceed
extrac_features process-370866, 4000 hole_batches(50) proceed
extrac_features process-370857, 4200 hole_batches(50) proceed
extrac_features process-370863, 4200 hole_batches(50) proceed
extrac_features process-370864, 4200 hole_batches(50) proceed
extrac_features process-370865, 4200 hole_batches(50) proceed
extrac_features process-370859, 2800 hole_batches(50) proceed
extrac_features process-370860, 4200 hole_batches(50) proceed
extrac_features process-370861, 3800 hole_batches(50) proceed
extrac_features process-370857, 4400 hole_batches(50) proceed
extrac_features process-370864, 4400 hole_batches(50) proceed
extrac_features process-370865, 4400 hole_batches(50) proceed
extrac_features process-370863, 4400 hole_batches(50) proceed
extrac_features process-370866, 4200 hole_batches(50) proceed
extrac_features process-370860, 4400 hole_batches(50) proceed
extrac_features process-370857, 4600 hole_batches(50) proceed
extrac_features process-370859, 3000 hole_batches(50) proceed
extrac_features process-370861, 4000 hole_batches(50) proceed
extrac_features process-370864, 4600 hole_batches(50) proceed
extrac_features process-370863, 4600 hole_batches(50) proceed
extrac_features process-370866, 4400 hole_batches(50) proceed
extrac_features process-370865, 4600 hole_batches(50) proceed
extrac_features process-370857, 4800 hole_batches(50) proceed
extrac_features process-370860, 4600 hole_batches(50) proceed
extrac_features process-370861, 4200 hole_batches(50) proceed
read_input process-370856 starts
cmd to view input: samtools view -@ 3 -h m64081.pbmm2.bam
read_input process-370856 ending, read 1794590 holes, with return_code-0
extrac_features process-370861 ending, proceed 4236 hole_batches(50)
extrac_features process-370864 ending, proceed 4794 hole_batches(50)
extrac_features process-370863 ending, proceed 4774 hole_batches(50)
extrac_features process-370857 ending, proceed 4900 hole_batches(50)
extrac_features process-370859 ending, proceed 3175 hole_batches(50)
extrac_features process-370865 ending, proceed 4763 hole_batches(50)
extrac_features process-370860 ending, proceed 4674 hole_batches(50)
extrac_features process-370866 ending, proceed 4576 hole_batches(50)
write_process-370872 started
write_process-370872 finished
[extract_features]costs 1367.2 seconds

Any help is much appreciated.
Thanks.

Best regards,
Zhongping Xu

Modify the number of the layers

Hi,
Thanks for your code again. I changed the number of RNN layers from 3 to 1 for my data, and when I tried to test the retrained model I got this error: "RuntimeError: Error(s) in loading state_dict for ModelAttRNN:
Missing key(s) in state_dict: "rnn.weight_ih_l1", "rnn.weight_hh_l1", "rnn.bias_ih_l1", "rnn.bias_hh_l1", "rnn.weight_ih_l1_reverse", "rnn.weight_hh_l1_reverse", "rnn.bias_ih_l1_reverse", "rnn.bias_hh_l1_reverse".
Unexpected key(s) in state_dict: "fc.weight", "fc.bias". "

It seems that the model doesn't have an fc layer. Could you please help me figure out how to solve this problem? Thanks a lot.
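Not speaking for the author's own scripts, but in plain PyTorch this kind of error usually means the model object constructed at test time does not match the architecture stored in the checkpoint, so the same layer-count setting used for retraining has to be supplied again when loading. A minimal, generic sketch for inspecting the mismatch, assuming a plain state_dict checkpoint; MyModel, the hyperparameters, and the file name are placeholders, not ccsmeth code:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    # stand-in for the real network; only the RNN layer count matters for this check
    def __init__(self, num_rnn_layers=1, input_size=16, hidden_size=32):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, num_layers=num_rnn_layers,
                          bidirectional=True, batch_first=True)

ckpt = torch.load("retrained_model.ckpt", map_location="cpu")  # hypothetical path
# some training scripts wrap the weights, e.g. {"state_dict": ...}; unwrap if so
state_dict = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt

model = MyModel(num_rnn_layers=1)  # must match the layer count used for training
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys (model expects, checkpoint lacks):", list(missing))
print("unexpected keys (checkpoint has, model lacks):", list(unexpected))

Printing the two key lists makes it easy to see whether the constructed model has more layers than the checkpoint, or vice versa, before deciding which side to change.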

No local packages or working download links found for ('torch<=1.7.0,>=1.2.0')

Trying to install a working version of ccsmeth on Ubuntu running in VirtualBox on an iMac, I got an error message about PyTorch ("No local packages or working download links found for ('torch<=1.7.0,>=1.2.0')"). The PyTorch page offers a stable 1.1 release and a 1.8 developer release, both outside the range set in ccsmeth's requirements. Searching the PyTorch site turns up archived versions below 1.8, but I don't know which of these are OK.
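One possible way around the pin (a sketch only, assuming a CPU-only build is acceptable and that PyTorch's stable wheel index still hosts these versions) is to install an archived torch release inside the required range before installing ccsmeth:

# install an archived torch build that satisfies torch<=1.7.0,>=1.2.0
pip install "torch==1.7.0" -f https://download.pytorch.org/whl/torch_stable.html
pip install ccsmeth

Any version between 1.2.0 and 1.7.0 available from that index should satisfy the requirement; GPU users would instead pick the wheel matching their CUDA version from the same archive.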

confusion about the traditional ccsmeth procedure

Hi Peng,
Thanks for developing ccsmeth. I am reading your paper "DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing" and trying out the tool provided here, but I am a little confused about the intended ccsmeth procedure.
In the paper, Fig. 1 and Fig. 4 clearly show that subreads are used for alignment and methylation calling by ccsmeth, while the CCS (HiFi) reads are used for SNV calling and haplotype phasing. That makes sense, since the kinetics-related tags of the raw-subread PacBio BAM ("ipd" and "pw") are used for training and testing.
Here, however, it seems that HiFi reads (produced by CCS calling from the subreads) are the input to ccsmeth.
I would like to confirm whether ccsmeth can be applied to both subreads and CCS/HiFi reads for 5mCpG detection, and if so, what the difference between the two approaches is. As I understand it, CCS/HiFi reads derived from subreads carry averaged kinetics, recorded as the "fi", "ri", "fp", and "rp" tags in the ccs.bam.
Additionally, I found that primrose --> pbmm2 --> pb-CpG-tools is the HiFi-read-based methylation workflow provided by PacBio. What are the advantages of ccsmeth when it is applied to HiFi-read-based methylation extraction?
Thank you in advance.
Wenchao
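For anyone who wants to see which kinetics tags their BAM actually carries, here is a generic pysam sketch (not part of ccsmeth; the file name is a placeholder, and subread BAMs typically carry per-base "ip"/"pw" while CCS/HiFi BAMs carry the averaged "fi"/"ri"/"fp"/"rp" tags):

import pysam

# check_sq=False lets pysam open unaligned PacBio BAMs as well as aligned ones
bam = pysam.AlignmentFile("hifi_or_subreads.bam", "rb", check_sq=False)
kinetics_tags = {"ip", "pw", "fi", "ri", "fp", "rp"}
for read in bam:
    present = {tag for tag, _ in read.get_tags()}
    print(read.query_name, "kinetics tags:", sorted(present & kinetics_tags))
    break  # one record is enough for a quick check
bam.close()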

never ending run

Hi @PengNi,

I recently tried to run ccsmeth and hit an unexpected issue: the run never ends. I run ccsmeth from a Docker image on an entire human genome (the T2T v1.0 release), with a 42X Sequel II coverage dataset as input.

Here is the command:

ccsmeth call_mods --tseed 20221101 --input all.ccs.chm13v1_0.bam --ref chm13.draft_v1.0.fasta --model_file /ccsmeth/models/model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v1.ckpt --output all.ccs.chm13v1_0.CpG-5mC.call_mods --threads 30 --threads_call 3 --model_type attbigru2s --rm_per_readsite --mode align

The genome file is of regular size, 2.9 GB. The BAM file is 353 GB, corresponding to a coverage of ~40X. It is sorted and indexed, so normally nothing should be wrong on that end.

What happens:

The program starts and I see 30 processes spawning as expected. I also see the creation of an all.ccs.chm13v1_0.CpG-5mC.call_mods.per_readsite.tsv file, which is actively written to and keeps growing. Memory consumption steadily grows until it reaches a huge amount, then some reallocation happens: the memory used by the process suddenly drops and the Docker container enters a "brain death" state. Hardly anything happens in it after that, and it never resumes or finishes (I waited up to one week).

What I expect:

To get an error message or to complete the run instead of this "death" state.

As a test, I sub-sampled the BAM file heavily to obtain a final 1X coverage. In that case the run finishes and I get a modbam file. Is this simply a matter of input size? Should we run ccsmeth on separate chromosomes, or avoid big datasets?
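Not an official recommendation, but one workaround for very large inputs is to split the aligned, indexed BAM per chromosome and run ccsmeth call_mods on each piece; a sketch reusing the command above (the chr1 region name is an assumption about the reference):

samtools view -@ 8 -b all.ccs.chm13v1_0.bam chr1 > all.ccs.chm13v1_0.chr1.bam
samtools index all.ccs.chm13v1_0.chr1.bam
ccsmeth call_mods --input all.ccs.chm13v1_0.chr1.bam --ref chm13.draft_v1.0.fasta \
  --model_file /ccsmeth/models/model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v1.ckpt \
  --output all.ccs.chm13v1_0.chr1.call_mods --threads 30 --threads_call 3 \
  --model_type attbigru2s --rm_per_readsite --mode align

The per-chromosome outputs could then be merged afterwards, at the cost of some extra bookkeeping.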

Thank you in advance

Process stuck when splitting the data.

Hi PengNi!
I recently installed ccsmeth and encountered a problem when using it. When I call modifications, the process gets stuck at 8%, and I cannot figure out why.
The command I used is:

CUDA_VISIBLE_DEVICES=0,1,2,3 ccsmeth call_mods \
  --input /home/user/ydliu/pacbio/ccs_data/output.hifi.bam \
  --model_file /home/user/ydliu/cos/ccsmeth/models/model_ccsmeth_5mCpG_aggregate_attbigru_b11.v2p.ckpt \
  --output /home/user/ydliu/pacbio/ccs_data/output.hifi.call_mods \
  --threads 20 --threads_call 6 --model_type attbigru2s \
  --mode denovo

The log is as follows:
[screenshot of the call_mods log]
And after I reinstalled numpy, a new problem occurred:
[screenshot of the new error]
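As a generic first check (not a ccsmeth-specific diagnostic), one can confirm that the PyTorch build inside the environment actually sees the GPUs selected by CUDA_VISIBLE_DEVICES before digging further:

# prints the torch version, whether CUDA is usable, and how many devices are visible
CUDA_VISIBLE_DEVICES=0,1,2,3 python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"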
