marbl / moddotplot
License: MIT License
Related to #29, I decided to investigate using test sequences.
I took a sequence from Arabidopsis chr1 and copy-pasted a segment of it three times:
%%writefile ../tmp/seq1.fa
>seq1
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
CTTTAAATCCTACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTT
CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT
ATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTTGGACATTTATTGTCATTCTT
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
CCCCCCCCCCACCCCCCAAATTGAGAAGTCAATTTTATATAATTTAATCAAATAAATAAG
TTTATGGTTAAGAGTTTTTTACTCTCTTTATTTTTCTTTTTCTTTTTGAGACATACTGAA
AAAAGTTGTAATTATTAATGATAGTTCTGTGATTCCTCCATGAATCACATCTGCTTGATT
TTTCTTTCATAAATTTATAAGTAATACATTCTTATAAAATGGTCAGAGAAACACCAAAGA
TCCCGAGATTTCTTCTCACTTACTTTTTTTCTATCTATCTAGATTATATAAATGAGATGT
TGAATTAGAGGAACCTTTGATTCAATGATCATAGAAAAATTAGGTAAAGAGTCAGTGTCG
TTATGTTATGGAAGATGTGAATGAAGTTTGACTTCTCATTGTATATGAGTAAAATCTTTT
CTTACAAGGGAAGTCCCCAATTGGTCAACATGTGAAAGCACGTGTCATGTTCTTACTTTT
GTTTGGGTAATCTTCTAATTACTGTATATGGAAGATGTGAATGAAGTTTTGGTCCTGAAT
GTGGCCAAGGTTCCGTCATTTGGAGATACGAAATCAAATCTCCTTTAAGATTTTGTTTTT
ATAATGTGTTCTTCCATCCACATCTATCTCCATATGATATGGACCATATCATACATCATC
ATTTGTCCAAATGCATGAATGAATTTGGAAATAGGTACGAGAATGCCAACAATGACAAGA
AGGGATCAAAGACAGTTTTTAAAACAATATTTTACAGGGTTTTAATCTAATTCTAAGTTT
TGGTCACTCACTTTGTTAAAAGAATAATTCAGTGTCTGGACACTAAAATCTTCCAAAAAC
CCCATATACATATATGCTATTTCGATACTTATATTTATTTACTCAGCATAAAAAATATTA
ACCATGTATTCATAGTAAAATGTTTCATGTGATATCAAACCAGCGACAACAAAAGTATTA
TTCCCCTCATTATGTTTGACTCCTATTATATTTTTATTTTAATTTTTTTCACTATCATCT
TTCTTGCAATGAAAGTCCCATATATTGGTCAACATTTCAAACCACTTGTTCTCTTTTATG
TTTTGGTAAGAGCTATCTTCTAAATTTATAATACGCATAAATTCAAAAGTAAAAGAAAAT
TTTGGTCATGAATGTTGTTTAAGTCATTTGGAGATACGAAATCAAATCTCCTTGTAGATT
%%writefile ../tmp/seq2.fa
>seq2
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
CTTTAAATCCTACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTT
CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT
ATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTTGGACATTTATTGTCATTCTT
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
CCCCCCCCCCACCCCCCAAATTGAGAAGTCAATTTTATATAATTTAATCAAATAAATAAG
TTTATGGTTAAGAGTTTTTTACTCTCTTTATTTTTCTTTTTCTTTTTGAGACATACTGAA
AAAAGTTGTAATTATTAATGATAGTTCTGTGATTCCTCCATGAATCACATCTGCTTGATT
TTTCTTTCATAAATTTATAAGTAATACATTCTTATAAAATGGTCAGAGAAACACCAAAGA
TCCCGAGATTTCTTCTCACTTACTTTTTTTCTATCTATCTAGATTATATAAATGAGATGT
TGAATTAGAGGAACCTTTGATTCAATGATCATAGAAAAATTAGGTAAAGAGTCAGTGTCG
TTATGTTATGGAAGATGTGAATGAAGTTTGACTTCTCATTGTATATGAGTAAAATCTTTT
CTTACAAGGGAAGTCCCCAATTGGTCAACATGTGAAAGCACGTGTCATGTTCTTACTTTT
GTTTGGGTAATCTTCTAATTACTGTATATGGAAGATGTGAATGAAGTTTTGGTCCTGAAT
GTGGCCAAGGTTCCGTCATTTGGAGATACGAAATCAAATCTCCTTTAAGATTTTGTTTTT
ATAATGTGTTCTTCCATCCACATCTATCTCCATATGATATGGACCATATCATACATCATC
ATTTGTCCAAATGCATGAATGAATTTGGAAATAGGTACGAGAATGCCAACAATGACAAGA
AGGGATCAAAGACAGTTTTTAAAACAATATTTTACAGGGTTTTAATCTAATTCTAAGTTT
TGGTCACTCACTTTGTTAAAAGAATAATTCAGTGTCTGGACACTAAAATCTTCCAAAAAC
CCCATATACATATATGCTATTTCGATACTTATATTTATTTACTCAGCATAAAAAATATTA
ACCATGTATTCATAGTAAAATGTTTCATGTGATATCAAACCAGCGACAACAAAAGTATTA
TTCCCCTCATTATGTTTGACTCCTATTATATTTTTATTTTAATTTTTTTCACTATCATCT
TTCTTGCAATGAAAGTCCCATATATTGGTCAACATTTCAAACCACTTGTTCTCTTTTATG
TTTTGGTAAGAGCTATCTTCTAAATTTATAATACGCATAAATTCAAAAGTAAAAGAAAAT
TTTGGTCATGAATGTTGTTTAAGTCATTTGGAGATACGAAATCAAATCTCCTTGTAGATT
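The construction described above can be sketched in a few lines of Python. This is only an illustration: the function names `repeat_segment` and `to_fasta` are hypothetical, and the toy input stands in for the real sequence. In the records above, the block that recurs in seq2 appears to be the 660 bases at offsets 420 to 1080 of seq1 (0-based, end-exclusive), so `repeat_segment(seq1, 420, 1080, 3)` would be the corresponding call.

```python
def repeat_segment(seq, start, end, copies):
    """Return seq with seq[start:end] occurring `copies` times in a row."""
    return seq[:start] + seq[start:end] * copies + seq[end:]


def to_fasta(name, seq, width=60):
    """Format one FASTA record with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy stand-in for the real sequence: the middle "CCC" is tripled.
toy = "AAACCCGGG"
expanded = repeat_segment(toy, 3, 6, 3)
print(to_fasta("toy", expanded, width=5))
```

Writing the result of `to_fasta` to a file gives a test input with a known tandem duplication at a known position, which makes the expected off-diagonal signal in the dot plot easy to predict.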
Hi Alex,
While trying this tool, I ran into several problems.
I installed it with
python -m venv venv
source venv/bin/activate
python setup.py install
and everything ran successfully:
Using /public/home/lanlan/software/ModDotPlot/venv/lib/python3.11/site-packages/MarkupSafe-2.1.2-py3.11-linux-x86_64.egg
Finished processing dependencies for moddotplot==0.4.2
Then I tried the example
moddotplot -i test/Chr1_cen.fa
It then failed with the following error:
File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/public/home/lanlan/software/ModDotPlot/moddotplot/__main__.py", line 7, in <module>
sys.exit(main())
^^^^^^
File "/public/home/lanlan/software/ModDotPlot/moddotplot/moddotplot.py", line 154, in main
paired_bed_file(
TypeError: paired_bed_file() missing 1 required positional argument: 'k'
I thought it could be fixed by specifying the k-mer size:
moddotplot -i test/Chr1_cen.fa --kmer 21
but it still returns an error message:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/public/home/lanlan/software/ModDotPlot/moddotplot/__main__.py", line 7, in <module>
sys.exit(main())
^^^^^^
File "/public/home/lanlan/software/ModDotPlot/moddotplot/moddotplot.py", line 127, in main
kmer_list = read_kmers_from_file(i, args.kmer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/lanlan/software/ModDotPlot/moddotplot/parse_fasta.py", line 103, in read_kmers_from_file
all_kmers.append(report_all_kmers(seq.fetch(seq_id), ksize))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/lanlan/software/ModDotPlot/moddotplot/parse_fasta.py", line 85, in report_all_kmers
for kmer in kmers:
File "/public/home/lanlan/software/ModDotPlot/moddotplot/parse_fasta.py", line 21, in generate_kmers
mask = (1 << (3*k)) - 1
~~^^~~~~~~
How can I fix it?
Thanks,
Lan
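The second traceback is cut off before the exception line, so its cause is not confirmed here. The markers under `mask = (1 << (3*k)) - 1` point at the `<<` operator. One way that expression can fail in Python is when `k` is a string rather than an integer. The sketch below only demonstrates that behaviour. Whether it is what happens in moddotplot 0.4.2 is an assumption, and `kmer_mask` is a hypothetical wrapper around the quoted expression.

```python
def kmer_mask(k):
    # Same expression as the line quoted in the traceback.
    return (1 << (3 * k)) - 1


# With an integer k the mask covers 3*k bits.
print(bin(kmer_mask(2)))  # six 1-bits

# With a string, 3 * "21" is the string "212121", and shifting by a
# string raises TypeError.
try:
    kmer_mask("21")
except TypeError as err:
    print("TypeError:", err)

# Converting first avoids the problem.
print(kmer_mask(int("21")) == 2**63 - 1)
```

If this is the cause, converting the command-line value to `int` before it reaches the k-mer code would be the kind of fix to look for.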
Hi Alex,
I installed the tool and when I tried to run it, I encountered the error shown:
Traceback (most recent call last):
File "/home/Tools/software/venv/bin/moddotplot", line 33, in <module>
sys.exit(load_entry_point('moddotplot==0.3.0', 'console_scripts', 'moddotplot')())
File "/home/Tools/software/venv/bin/moddotplot", line 25, in importlib_load_entry_point
return next(matches).load()
File "/home/mambaforge/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
module = import_module(match.group('module'))
File "/home/mambaforge/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/Tools/software/ModDotPlot/moddotplot/__main__.py", line 4, in <module>
from moddotplot.moddotplot import main
File "/home/Tools/software/ModDotPlot/moddotplot/moddotplot.py", line 4, in <module>
from moddotplot.interactive import run_dash
File "/home/Tools/software/ModDotPlot/moddotplot/interactive.py", line 2, in <module>
from moddotplot.estimate_identity import (
ImportError: cannot import name 'poisson_distance' from 'moddotplot.estimate_identity' (/home/Tools/software/ModDotPlot/moddotplot/estimate_identity.py)
Please kindly help me look into this.
Thank you.
I was running in static mode and attempting to compare two sequences of different sizes and couldn't figure out why I was only getting an indexing error sometimes. I realized that when the query sequence is longer than the reference sequence everything works just fine, but if the query is shorter I get an indexing error.
I was able to replicate it just by truncating one of the example sequences in the sequences directory. The chr21_segment.tiny.fa file referenced below is just the first 500 lines of chr21_segment.fa:
moddotplot static -f sequences/chr21_segment.tiny.fa sequences/chr15_segment.fa --compare-only
__ __ _ _____ _ _____ _ _
| \/ | | | | __ \ | | | __ \| | | |
| \ / | ___ __| | | | | | ___ | |_ | |__) | | ___ | |_
| |\/| |/ _ \ / _` | | | | |/ _ \| __| | ___/| |/ _ \| __|
| | | | (_) | (_| | | |__| | (_) | |_ | | | | (_) | |_
|_| |_|\___/ \__,_| |_____/ \___/ \__| |_| |_|\___/ \__|
v0.8.2
Running ModDotPlot in static mode
Retrieving k-mers from Chr21....
Progress: |████████████████████████████████████████| 100.0% Completed
Chr21 k-mers retrieved!
Retrieving k-mers from Chr15....
Progress: |████████████████████████████████████████| 100.0% Completed
Chr15 k-mers retrieved!
Computing pairwise identity matrix for Chr21 and Chr15...
Sequence length Chr21: 34930
Sequence length Chr15: 6000000
Window size w: 35
Modimizer sketch size: 35
Plot Resolution r: 1000
Traceback (most recent call last):████████████████-| 99.0% Complete
File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/bin/moddotplot", line 5, in <module>
from moddotplot.__main__ import main
File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/__main__.py", line 11, in <module>
sys.exit(main())
^^^^^^
File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/moddotplot.py", line 940, in main
pair_mat = createPairwiseMatrix(
^^^^^^^^^^^^^^^^^^^^^
File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/estimate_identity.py", line 96, in createPairwiseMatrix
matrix = pairwiseContainmentMatrix(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/estimate_identity.py", line 385, in pairwiseContainmentMatrix
containment_matrix[w, q] = (
~~~~~~~~~~~~~~~~~~^^^^^^
IndexError: index 998 is out of bounds for axis 0 with size 998
If I switch the order however, it works just fine:
moddotplot static -f sequences/chr15_segment.fa sequences/chr21_segment.tiny.fa --compare-only
__ __ _ _____ _ _____ _ _
| \/ | | | | __ \ | | | __ \| | | |
| \ / | ___ __| | | | | | ___ | |_ | |__) | | ___ | |_
| |\/| |/ _ \ / _` | | | | |/ _ \| __| | ___/| |/ _ \| __|
| | | | (_) | (_| | | |__| | (_) | |_ | | | | (_) | |_
|_| |_|\___/ \__,_| |_____/ \___/ \__| |_| |_|\___/ \__|
v0.8.2
Running ModDotPlot in static mode
Retrieving k-mers from Chr15....
Progress: |████████████████████████████████████████| 100.0% Completed
Chr15 k-mers retrieved!
Retrieving k-mers from Chr21....
Progress: |████████████████████████████████████████| 100.0% Completed
Chr21 k-mers retrieved!
Computing pairwise identity matrix for Chr15 and Chr21...
Sequence length Chr15: 6000000
Sequence length Chr21: 34930
Window size w: 6000
Modimizer sketch size: 1500
Plot Resolution r: 1000
Progress: |████████████████████████████████████████| 100.0% Completed
Saved bed file to Chr15.bed
None
5999
Creating plots and saving to ./Chr15_Chr21...
./Chr15_Chr21.pdf, ./Chr15_Chr21.png, ./Chr15_Chr21_HIST.pdf and ./Chr15_Chr21_HIST.png saved sucessfully.
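The numbers in the two logs fit a simple pattern: the window size looks as if it is derived from the first sequence alone. The arithmetic below is a rough sketch under that assumption, taking w = ceil(first length / r). This reproduces w = 35 and w = 6000 from the logs but is not taken from the ModDotPlot source, and the helper names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def window_size(first_len, resolution):
    # Assumption: w is set from the first sequence only.
    return ceil_div(first_len, resolution)


def n_windows(seq_len, w):
    return ceil_div(seq_len, w)


chr21, chr15, r = 34930, 6_000_000, 1000

# Short sequence first (the failing order).
w = window_size(chr21, r)
print(w, n_windows(chr21, w), n_windows(chr15, w))

# Long sequence first (the working order).
w = window_size(chr15, r)
print(w, n_windows(chr15, w), n_windows(chr21, w))
```

Under this assumption the short-first order gives 998 windows for Chr21, which is the axis size in the IndexError, while Chr15 is cut into far more windows than the plot resolution. That fits a matrix sized for one sequence being indexed with a counter that runs past 998, though the exact loop involved is not shown in the traceback.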
With v0.8.0, I get an error when comparing across two chromosomes (moddotplot interactive -f two_chromosomes.fa --compare-only):
f"Building pairwise matrices for {seq_list[i]} and {seq_list[j]}, using a minimum window size of {window_lengths[0]}.... \n"
~~~~~~~~^^^
TypeError: list indices must be integers or slices, not str
The error comes from these lines: ModDotPlot/src/moddotplot/moddotplot.py, lines 655 to 661 in ed190c7.
Removing the print statement allowed the plots to finish up correctly.
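The message says a list was indexed with a string. A minimal reproduction of that class of error is sketched below, assuming the loop variables held sequence names rather than integer positions. The referenced source lines are not shown above, so this is a guess at the pattern and not a copy of the ModDotPlot code. `describe_pair` is a hypothetical name.

```python
def describe_pair(seq_list, i, j, window_lengths):
    # Mirrors the f-string from the traceback.
    return (
        f"Building pairwise matrices for {seq_list[i]} and {seq_list[j]}, "
        f"using a minimum window size of {window_lengths[0]}.... \n"
    )


seq_list = ["chr1", "chr2"]
window_lengths = [1000]

# Integer positions work.
print(describe_pair(seq_list, 0, 1, window_lengths))

# Passing the names themselves reproduces the reported TypeError.
try:
    describe_pair(seq_list, "chr1", "chr2", window_lengths)
except TypeError as err:
    print("TypeError:", err)
```

If the loop already holds the names, formatting them directly instead of indexing back into the list would avoid the error without dropping the message.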
Ran ModDotPlot with the yeast reference genome.
moddotplot interactive -f ~/Data/assemblies/yeast-ref/yeast_genome.fna --port 8989
__ __ _ _____ _ _____ _ _
| \/ | | | | __ \ | | | __ \| | | |
| \ / | ___ __| | | | | | ___ | |_ | |__) | | ___ | |_
| |\/| |/ _ \ / _` | | | | |/ _ \| __| | ___/| |/ _ \| __|
| | | | (_) | (_| | | |__| | (_) | |_ | | | | (_) | |_
|_| |_|\___/ \__,_| |_____/ \___/ \__| |_| |_|\___/ \__|
v0.8.0
...
And ran into the following error:
17 sequences were detected, however interactive mode can only load two sequences at a time.
Interactive mode will proceed with NC_001133.9 and NC_001134.8
Traceback (most recent call last):
File "/home/Users/blk6/Contribute/ModDotPlot/venv/bin/moddotplot", line 5, in <module>
from moddotplot.__main__ import main
File "/home/Users/blk6/Contribute/ModDotPlot/venv/lib/python3.10/site-packages/moddotplot/__main__.py", line 11, in <module>
sys.exit(main())
File "/home/Users/blk6/Contribute/ModDotPlot/venv/lib/python3.10/site-packages/moddotplot/moddotplot.py", line 557, in main
raise ValueError(
ValueError: Minimum window size must be greater than or equal to the modimizer sketch size
Here are the .fai entries for the relevant sequences:
NC_001133.9 230218 76 80 81
NC_001134.8 813184 233249 80 81
I'm guessing these sequences are too short for the default params. Would there be a way for the params to automatically adjust based on the shortest query sequence?
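The automatic adjustment asked about above could look roughly like this. It is a hypothetical helper, not part of ModDotPlot: the name `adjust_params`, the default resolution of 1000, and the default sketch size of 1000 are all assumptions made for illustration.

```python
def adjust_params(seq_lengths, resolution=1000, sketch_size=1000):
    """Pick a window size from the shortest sequence and cap the sketch size.

    Hypothetical helper. The defaults are placeholders, not ModDotPlot's.
    """
    shortest = min(seq_lengths)
    # Smallest window that still fits `resolution` windows on the shortest sequence.
    window = max(1, -(-shortest // resolution))
    # Keep the sketch no larger than the window so the check cannot fail.
    sketch = min(sketch_size, window)
    return window, sketch


# Lengths of NC_001133.9 and NC_001134.8 from the .fai entries above.
print(adjust_params([230218, 813184]))
```

Capping the sketch size at the window size is one way to satisfy the "minimum window size must be greater than or equal to the modimizer sketch size" condition. Lowering the resolution for short inputs would be another.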
Hi! Thanks for the interesting and fast tool!
I was testing it on the GCF_000001735.4 A. thaliana assembly and I noticed a mismatch between the sequence names in the output files.
This is the fasta index (.fai) for the input genomes after extracting only the five chromosomes:
NC_003070.9 30427671 56 60 61
NC_003071.7 19698289 30934920 60 61
NC_003074.8 23459830 50961579 60 61
NC_003075.7 18585056 74812472 60 61
NC_003076.8 26975502 93707344 60 61
And with ModDotPlot (@ commit c1388eb) I ran the following command:
moddotplot static -f GCF_000001735.4_TAIR10.1_genomic.chrs.fna
Unexpectedly, this was the output. Notice the mismatch between the filename and the content of the files. Which one is correct?
head NC_*bed
==> NC_003070.9.bed <==
#query_name query_start query_end reference_name reference_start reference_end perID_by_events
NC_003070.9 1 30428 NC_003070.9 1 30428 100.0
NC_003070.9 1 30428 NC_003070.9 30429 60856 96.89179977979816
NC_003070.9 30429 60856 NC_003070.9 30429 60856 100.0
NC_003070.9 30429 60856 NC_003070.9 60857 91284 96.86273970525768
NC_003070.9 60857 91284 NC_003070.9 60857 91284 100.0
NC_003070.9 60857 91284 NC_003070.9 91285 121712 96.8066448041969
NC_003070.9 91285 121712 NC_003070.9 91285 121712 100.0
NC_003070.9 91285 121712 NC_003070.9 121713 152140 96.88899802419402
NC_003070.9 121713 152140 NC_003070.9 121713 152140 100.0
==> NC_003071.7.bed <==
#query_name query_start query_end reference_name reference_start reference_end perID_by_events
NC_003075.7 1 19699 NC_003075.7 1 19699 100.0
NC_003075.7 1 19699 NC_003075.7 19700 39398 96.34241311605368
NC_003075.7 1 19699 NC_003075.7 39399 59097 95.02161679397038
NC_003075.7 1 19699 NC_003075.7 59098 78796 92.86765095697956
NC_003075.7 1 19699 NC_003075.7 1615319 1635017 88.19960026756749
NC_003075.7 1 19699 NC_003075.7 1635018 1654716 88.19960026756749
NC_003075.7 1 19699 NC_003075.7 2541172 2560870 91.12687268394012
NC_003075.7 1 19699 NC_003075.7 2560871 2580569 93.41097733509187
NC_003075.7 1 19699 NC_003075.7 3210938 3230636 92.00089775836454
==> NC_003074.8.bed <==
#query_name query_start query_end reference_name reference_start reference_end perID_by_events
NC_003074.8 1 23460 NC_003074.8 1 23460 100.0
NC_003074.8 1 23460 NC_003074.8 23461 46920 96.83259585797633
NC_003074.8 23461 46920 NC_003074.8 23461 46920 100.0
NC_003074.8 23461 46920 NC_003074.8 46921 70380 96.68315315308905
NC_003074.8 46921 70380 NC_003074.8 46921 70380 100.0
NC_003074.8 46921 70380 NC_003074.8 70381 93840 96.97521648973213
NC_003074.8 70381 93840 NC_003074.8 70381 93840 100.0
NC_003074.8 70381 93840 NC_003074.8 93841 117300 96.70998860321123
NC_003074.8 93841 117300 NC_003074.8 93841 117300 100.0
==> NC_003075.7.bed <==
#query_name query_start query_end reference_name reference_start reference_end perID_by_events
NC_003076.8 1 18586 NC_003076.8 1 18586 100.0
NC_003076.8 1 18586 NC_003076.8 18587 37172 96.95608358007378
NC_003076.8 18587 37172 NC_003076.8 18587 37172 100.0
NC_003076.8 18587 37172 NC_003076.8 37173 55758 96.87871811713131
NC_003076.8 37173 55758 NC_003076.8 37173 55758 100.0
NC_003076.8 37173 55758 NC_003076.8 55759 74344 96.87827825075271
NC_003076.8 55759 74344 NC_003076.8 55759 74344 100.0
NC_003076.8 55759 74344 NC_003076.8 74345 92930 96.91004434964874
NC_003076.8 74345 92930 NC_003076.8 74345 92930 100.0
==> NC_003076.8.bed <==
#query_name query_start query_end reference_name reference_start reference_end perID_by_events
NC_003071.7 1 26976 NC_003071.7 1 26976 100.0
NC_003071.7 1 26976 NC_003071.7 26977 53952 96.89687811961811
NC_003071.7 26977 53952 NC_003071.7 26977 53952 100.0
NC_003071.7 26977 53952 NC_003071.7 53953 80928 96.7788487007729
NC_003071.7 53953 80928 NC_003071.7 53953 80928 100.0
NC_003071.7 53953 80928 NC_003071.7 80929 107904 96.86344796571336
NC_003071.7 53953 80928 NC_003071.7 8902081 8929056 90.20730812484314
NC_003071.7 53953 80928 NC_003071.7 8929057 8956032 90.20730812484314
NC_003071.7 53953 80928 NC_003071.7 12085249 12112224 91.56204318214218
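One way to narrow down which side is wrong is to compare the window size in each file with the sequence lengths in the .fai. The sketch below assumes the window is ceil(length / 1000), that is, a plot resolution of 1000. That assumption matches every first row above but is not confirmed from the source. The helper names are hypothetical and the data are copied from the listings above.

```python
# Sequence lengths from the .fai above.
fai = {
    "NC_003070.9": 30427671,
    "NC_003071.7": 19698289,
    "NC_003074.8": 23459830,
    "NC_003075.7": 18585056,
    "NC_003076.8": 26975502,
}

# (file stem, query_name in first data row, query_end of first data row)
first_rows = [
    ("NC_003070.9", "NC_003070.9", 30428),
    ("NC_003071.7", "NC_003075.7", 19699),
    ("NC_003074.8", "NC_003074.8", 23460),
    ("NC_003075.7", "NC_003076.8", 18586),
    ("NC_003076.8", "NC_003071.7", 26976),
]

RESOLUTION = 1000  # assumption


def expected_window(length, resolution=RESOLUTION):
    return -(-length // resolution)  # ceiling division


def infer_sequences(window, lengths):
    """Names whose expected window size equals `window`."""
    return [name for name, n in lengths.items() if expected_window(n) == window]


for stem, label, window in first_rows:
    fits = infer_sequences(window, fai)
    status = "ok" if label == stem else "MISMATCH"
    print(f"{stem}.bed: label={label}, coordinates fit {fits} -> {status}")
```

With these numbers the coordinates in every file fit the sequence named in the filename, which suggests (but does not prove) that the filenames and coordinates are right and the name columns are shifted in three of the five files.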
Hi, thanks for the great software!
I've been trying out the static plots and noticed that there are no color legends. Would it be possible to add such legends to these plots?
Hi, Alex
It's so nice to see tons of updates on ModDotPlot. I have a question about the interactive mode. For most plant genomes, the centromere/HOR region is unknown, so it would be great to have an interactive mode to explore across the whole genome. However, on most HPC systems it is difficult to use over an SSH port. So could ModDotPlot have these features?
Like StainedGlass, could ModDotPlot save the raw matrix with cooler? It's easier to play around with in HiGlass (identity, loading annotation tracks such as ChIP-Seq and gene/TE annotation).
Best regards,
Zhigui
Hi,
awesome tool and superfast :)
As mentioned in #5, the histogram is helpful, but how can we generate a scale legend for static mode similar to the one in interactive mode? Could there be an additional text file with this information that can be used for plotting separately, or could the color code be added to the bed file?
Thank you
Hi,
When I try to run an a vs b plot using --compare, it produces this error
TypeError: paired_bed_file_a_vs_b() takes 15 positional arguments but 16 were given
It looks like there's a disconnect between the definition of the function in static_plots.py and how it's called in moddotplot.py (line 305)?
Is it possible to generate an All vs All dotplot with ModDotPlot?
When I run moddotplot -i Asm_A.fasta Asm_B.fasta --compare
I'm just getting a single dotplot of contig_1 vs contig_2 from the first fasta file only.
Hi Alex,
Great tool! Congratulations!!
While using it, I found several places that could benefit from more clarification and slight modifications.
- The -w and -d options are not very clear.
- query_name and reference_name are all replaced by the -p value. Can it be the original sequence names?
- mod_identity.py is hard coded to perID*100 >= 80; can you make it customizable to allow reporting lower identity regions?
- The mod_identity.py script can only run on one sequence? Can it take the entire genome?
- The plot.r script requires a folder named results in the working directory; otherwise, it will run into errors.
- The plot.r script requires the ggplot2, dplyr, scales, RColorBrewer, data.table, cowplot, glue, and argparse packages installed in R.
Thanks!
Shujun
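The customizable cutoff requested in the list above could be as small as turning the constant into a parameter. The sketch below is an illustration only: `keep_row` and `min_identity` are hypothetical names, and the only thing taken from the report is the quoted test `perID*100 >= 80`.

```python
def keep_row(per_id, min_identity=80.0):
    """Return True if a window pair should be reported.

    per_id is a fraction between 0 and 1, as in the quoted
    perID*100 >= 80 test; min_identity is a percentage.
    """
    return per_id * 100 >= min_identity


pairs = [("w1", "w2", 0.97), ("w1", "w3", 0.75), ("w2", "w3", 0.62)]

# Default behaviour matches the hard-coded threshold.
print([p for p in pairs if keep_row(p[2])])

# A lower cutoff also reports the 75% pair.
print([p for p in pairs if keep_row(p[2], min_identity=70)])
```

Exposing `min_identity` as a command-line option would keep the current default while allowing lower identity regions to be reported.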
Hi thanks for the great tool!
I am testing it and noticed a difference that confused me. I looked at #22, but I downloaded moddotplot last week, so that should not be the cause.
I have a 144kb sequence that I compare to a 239kb sequence using the command
for sample in sample17 sample18; do
  moddotplot static \
    -f sample16.fa $sample \
    -o sample16_vs_${sample} --compare-only \
    -w 100 --identity 70 --breakpoints 70 80 90 95 100 --palette Spectral_4 --dpi 1200
done
The strange thing is that I know sample 16 is 144kb, sample 17 is 238kb and sample 18 is 172kb, also from looking at their fasta sequences.
When I look at the bed file it looks like they are swapped, as the header is
#query_name query_start query_end reference_name reference_start reference_end perID_by_events
Can this be a bug? Should the labels be switched?
Hi,
I was wondering if there was a way to make the static plot for two separate sequences? As I understand it right now, it checks the identity of a sequence with itself.