
moddotplot's People

Contributors

alexsweeten

moddotplot's Issues

test sequence not as expected?

Related to #29, I decided to investigate using test sequences.

I took a sequence from Arabidopsis chr1 and, in the second file, copy-pasted a stretch of it so that it appears three times:

%%writefile ../tmp/seq1.fa
>seq1
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
CTTTAAATCCTACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTT
CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT
ATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTTGGACATTTATTGTCATTCTT
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
CCCCCCCCCCACCCCCCAAATTGAGAAGTCAATTTTATATAATTTAATCAAATAAATAAG
TTTATGGTTAAGAGTTTTTTACTCTCTTTATTTTTCTTTTTCTTTTTGAGACATACTGAA
AAAAGTTGTAATTATTAATGATAGTTCTGTGATTCCTCCATGAATCACATCTGCTTGATT
TTTCTTTCATAAATTTATAAGTAATACATTCTTATAAAATGGTCAGAGAAACACCAAAGA
TCCCGAGATTTCTTCTCACTTACTTTTTTTCTATCTATCTAGATTATATAAATGAGATGT
TGAATTAGAGGAACCTTTGATTCAATGATCATAGAAAAATTAGGTAAAGAGTCAGTGTCG
TTATGTTATGGAAGATGTGAATGAAGTTTGACTTCTCATTGTATATGAGTAAAATCTTTT
CTTACAAGGGAAGTCCCCAATTGGTCAACATGTGAAAGCACGTGTCATGTTCTTACTTTT
GTTTGGGTAATCTTCTAATTACTGTATATGGAAGATGTGAATGAAGTTTTGGTCCTGAAT
GTGGCCAAGGTTCCGTCATTTGGAGATACGAAATCAAATCTCCTTTAAGATTTTGTTTTT
ATAATGTGTTCTTCCATCCACATCTATCTCCATATGATATGGACCATATCATACATCATC
ATTTGTCCAAATGCATGAATGAATTTGGAAATAGGTACGAGAATGCCAACAATGACAAGA
AGGGATCAAAGACAGTTTTTAAAACAATATTTTACAGGGTTTTAATCTAATTCTAAGTTT
TGGTCACTCACTTTGTTAAAAGAATAATTCAGTGTCTGGACACTAAAATCTTCCAAAAAC
CCCATATACATATATGCTATTTCGATACTTATATTTATTTACTCAGCATAAAAAATATTA
ACCATGTATTCATAGTAAAATGTTTCATGTGATATCAAACCAGCGACAACAAAAGTATTA
TTCCCCTCATTATGTTTGACTCCTATTATATTTTTATTTTAATTTTTTTCACTATCATCT
TTCTTGCAATGAAAGTCCCATATATTGGTCAACATTTCAAACCACTTGTTCTCTTTTATG
TTTTGGTAAGAGCTATCTTCTAAATTTATAATACGCATAAATTCAAAAGTAAAAGAAAAT
TTTGGTCATGAATGTTGTTTAAGTCATTTGGAGATACGAAATCAAATCTCCTTGTAGATT
%%writefile ../tmp/seq2.fa
>seq2
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
CTTTAAATCCTACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTT
CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT
ATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTTGGACATTTATTGTCATTCTT
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
ACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG
GATGGTCCTTTAGCATTTATTCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAA
AGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTATCGCCTCGACGATGCTCTAT
TTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC
TTCTTTGAAGTAGTTTCTCTTTGCAAAATTCCTCTTTTTTTAGAGTGATTTGGATGATTC
AAGACTTCTCGGTACTGCAAAGTTCTTCCGCCTGATTAATTATCCATTTTACCTTTGTCG
TAGATATTAGGTAATCTGTAAGTCAACTCATATACAACTCATAATTTAAAATAAAATTAT
GATCGACACACGTTTACACATAAAATCTGTAAATCAACTCATATACCCGTTATTCCCACA
ATCATATGCTTTCTAAAAGCAAAAGTATATGTCAACAATTGGTTATAAATTATTAGAAGT
TTTCCACTTATGACTTAAGAACTTGTGAAGCAGAAAGTGGCAACACCCCCCACCTCCCCC
CCCCCCCCCCACCCCCCAAATTGAGAAGTCAATTTTATATAATTTAATCAAATAAATAAG
TTTATGGTTAAGAGTTTTTTACTCTCTTTATTTTTCTTTTTCTTTTTGAGACATACTGAA
AAAAGTTGTAATTATTAATGATAGTTCTGTGATTCCTCCATGAATCACATCTGCTTGATT
TTTCTTTCATAAATTTATAAGTAATACATTCTTATAAAATGGTCAGAGAAACACCAAAGA
TCCCGAGATTTCTTCTCACTTACTTTTTTTCTATCTATCTAGATTATATAAATGAGATGT
TGAATTAGAGGAACCTTTGATTCAATGATCATAGAAAAATTAGGTAAAGAGTCAGTGTCG
TTATGTTATGGAAGATGTGAATGAAGTTTGACTTCTCATTGTATATGAGTAAAATCTTTT
CTTACAAGGGAAGTCCCCAATTGGTCAACATGTGAAAGCACGTGTCATGTTCTTACTTTT
GTTTGGGTAATCTTCTAATTACTGTATATGGAAGATGTGAATGAAGTTTTGGTCCTGAAT
GTGGCCAAGGTTCCGTCATTTGGAGATACGAAATCAAATCTCCTTTAAGATTTTGTTTTT
ATAATGTGTTCTTCCATCCACATCTATCTCCATATGATATGGACCATATCATACATCATC
ATTTGTCCAAATGCATGAATGAATTTGGAAATAGGTACGAGAATGCCAACAATGACAAGA
AGGGATCAAAGACAGTTTTTAAAACAATATTTTACAGGGTTTTAATCTAATTCTAAGTTT
TGGTCACTCACTTTGTTAAAAGAATAATTCAGTGTCTGGACACTAAAATCTTCCAAAAAC
CCCATATACATATATGCTATTTCGATACTTATATTTATTTACTCAGCATAAAAAATATTA
ACCATGTATTCATAGTAAAATGTTTCATGTGATATCAAACCAGCGACAACAAAAGTATTA
TTCCCCTCATTATGTTTGACTCCTATTATATTTTTATTTTAATTTTTTTCACTATCATCT
TTCTTGCAATGAAAGTCCCATATATTGGTCAACATTTCAAACCACTTGTTCTCTTTTATG
TTTTGGTAAGAGCTATCTTCTAAATTTATAATACGCATAAATTCAAAAGTAAAAGAAAAT
TTTGGTCATGAATGTTGTTTAAGTCATTTGGAGATACGAAATCAAATCTCCTTGTAGATT

So I would expect a dotplot like this:
image

However, I get this
image
image

image
Why?
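
For anyone reproducing this, building the two test files from a script makes the construction explicit. This is only a sketch: `make_repeat_test` and `format_fasta` are hypothetical helper names, and the short sequences below are placeholders rather than the Arabidopsis sequence above.

```python
def make_repeat_test(left: str, unit: str, right: str, copies: int = 3) -> str:
    """Return the left flank, `copies` tandem copies of `unit`, then the right flank."""
    return left + unit * copies + right


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Placeholder sequences, chosen only so the example is small.
left = "ACGTTGCA" * 5
unit = "GATTACAGGC" * 3
right = "TTGACCAT" * 5

seq1 = make_repeat_test(left, unit, right, copies=1)  # segment present once
seq2 = make_repeat_test(left, unit, right, copies=3)  # segment present three times

print(format_fasta("seq1", seq1), end="")
print(format_fasta("seq2", seq2), end="")
```

Writing the output of `format_fasta` to `seq1.fa` and `seq2.fa` gives a pair of inputs whose only difference is the number of copies of `unit`.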

Error question

Hi Alex,

While trying this tool, I ran into a few problems.
I installed it by

python -m venv venv
source venv/bin/activate
python setup.py install

and everything runs successfully.

Using /public/home/lanlan/software/ModDotPlot/venv/lib/python3.11/site-packages/MarkupSafe-2.1.2-py3.11-linux-x86_64.egg
Finished processing dependencies for moddotplot==0.4.2

Then I tried the example

moddotplot -i test/Chr1_cen.fa
It then failed with this error:

  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/public/home/lanlan/software/ModDotPlot/moddotplot/__main__.py", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/public/home/lanlan/software/ModDotPlot/moddotplot/moddotplot.py", line 154, in main
    paired_bed_file(
TypeError: paired_bed_file() missing 1 required positional argument: 'k'

I thought it could be fixed by specifying the k-mer size:
moddotplot -i test/Chr1_cen.fa --kmer 21
but it still returns an error message:

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/public/home/lanlan/software/ModDotPlot/moddotplot/__main__.py", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/public/home/lanlan/software/ModDotPlot/moddotplot/moddotplot.py", line 127, in main
    kmer_list = read_kmers_from_file(i, args.kmer)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/home/lanlan/software/ModDotPlot/moddotplot/parse_fasta.py", line 103, in read_kmers_from_file
    all_kmers.append(report_all_kmers(seq.fetch(seq_id), ksize))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/home/lanlan/software/ModDotPlot/moddotplot/parse_fasta.py", line 85, in report_all_kmers
    for kmer in kmers:
  File "/public/home/lanlan/software/ModDotPlot/moddotplot/parse_fasta.py", line 21, in generate_kmers
    mask = (1 << (3*k)) - 1
            ~~^^~~~~~~

How can I fix it?

Thanks,
Lan
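
The second traceback is cut off before the exception message, so the cause can only be guessed. The markers under `mask = (1 << (3*k)) - 1` sit on the `<<` operator, which is where Python points when the shift itself fails, and that would happen if `k` arrived as a string rather than an integer. A minimal sketch of that hypothesis, assuming (not confirmed from the ModDotPlot source) that `--kmer` was parsed without `type=int`:

```python
import argparse


def kmer_mask(k: int) -> int:
    """Bit mask with 3 bits per base, mirroring the line shown in the traceback."""
    return (1 << (3 * k)) - 1


# Hypothetical parser: without type=int, argparse hands back the string "21".
untyped = argparse.ArgumentParser()
untyped.add_argument("--kmer", default=21)
k_str = untyped.parse_args(["--kmer", "21"]).kmer

# Same option with type=int, the assumed fix.
typed = argparse.ArgumentParser()
typed.add_argument("--kmer", type=int, default=21)
k_int = typed.parse_args(["--kmer", "21"]).kmer

try:
    kmer_mask(k_str)  # 3 * "21" is the string "212121"; 1 << "212121" is invalid
    shift_failed = False
except TypeError:
    shift_failed = True

print(shift_failed, kmer_mask(k_int).bit_length())
```

If the real exception turns out to be something other than a `TypeError`, this hypothesis does not apply.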

Error using

Hi Alex,

I installed the tool and when I tried to run it, I encountered the error shown:

Traceback (most recent call last):
  File "/home/Tools/software/venv/bin/moddotplot", line 33, in <module>
    sys.exit(load_entry_point('moddotplot==0.3.0', 'console_scripts', 'moddotplot')())
  File "/home/Tools/software/venv/bin/moddotplot", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/mambaforge/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/home/mambaforge/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/Tools/software/ModDotPlot/moddotplot/__main__.py", line 4, in <module>
    from moddotplot.moddotplot import main
  File "/home/Tools/software/ModDotPlot/moddotplot/moddotplot.py", line 4, in <module>
    from moddotplot.interactive import run_dash
  File "/home/Tools/software/ModDotPlot/moddotplot/interactive.py", line 2, in <module>
    from moddotplot.estimate_identity import (
ImportError: cannot import name 'poisson_distance' from 'moddotplot.estimate_identity' (/home/Tools/software/ModDotPlot/moddotplot/estimate_identity.py)

Please kindly help me look into this.

Thank you.

Error when query is shorter than the reference

I was running in static mode, comparing two sequences of different sizes, and couldn't figure out why I only got an indexing error some of the time. I then realized that when the query sequence is longer than the reference sequence everything works fine, but when the query is shorter I get an indexing error.
I was able to replicate it just by truncating one of the example sequences in the sequences directory. The chr21_segment.tiny.fa file referenced below is simply the first 500 lines of chr21_segment.fa.

moddotplot static -f sequences/chr21_segment.tiny.fa sequences/chr15_segment.fa --compare-only

  __  __           _   _____        _     _____  _       _   
 |  \/  |         | | |  __ \      | |   |  __ \| |     | |  
 | \  / | ___   __| | | |  | | ___ | |_  | |__) | | ___ | |_ 
 | |\/| |/ _ \ / _` | | |  | |/ _ \| __| |  ___/| |/ _ \| __|
 | |  | | (_) | (_| | | |__| | (_) | |_  | |    | | (_) | |_ 
 |_|  |_|\___/ \__,_| |_____/ \___/ \__| |_|    |_|\___/ \__|

v0.8.2 

Running ModDotPlot in static mode

Retrieving k-mers from Chr21.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr21 k-mers retrieved! 

Retrieving k-mers from Chr15.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr15 k-mers retrieved! 

Computing pairwise identity matrix for Chr21 and Chr15... 

        Sequence length Chr21: 34930

        Sequence length Chr15: 6000000

        Window size w: 35

        Modimizer sketch size: 35

        Plot Resolution r: 1000

Traceback (most recent call last):████████████████-| 99.0% Complete
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/bin/moddotplot", line 5, in <module>
    from moddotplot.__main__ import main
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/__main__.py", line 11, in <module>
    sys.exit(main())
             ^^^^^^
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/moddotplot.py", line 940, in main
    pair_mat = createPairwiseMatrix(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/estimate_identity.py", line 96, in createPairwiseMatrix
    matrix = pairwiseContainmentMatrix(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/estimate_identity.py", line 385, in pairwiseContainmentMatrix
    containment_matrix[w, q] = (
    ~~~~~~~~~~~~~~~~~~^^^^^^
IndexError: index 998 is out of bounds for axis 0 with size 998

If I switch the order however, it works just fine:

moddotplot static -f sequences/chr15_segment.fa sequences/chr21_segment.tiny.fa --compare-only

  __  __           _   _____        _     _____  _       _   
 |  \/  |         | | |  __ \      | |   |  __ \| |     | |  
 | \  / | ___   __| | | |  | | ___ | |_  | |__) | | ___ | |_ 
 | |\/| |/ _ \ / _` | | |  | |/ _ \| __| |  ___/| |/ _ \| __|
 | |  | | (_) | (_| | | |__| | (_) | |_  | |    | | (_) | |_ 
 |_|  |_|\___/ \__,_| |_____/ \___/ \__| |_|    |_|\___/ \__|

v0.8.2 

Running ModDotPlot in static mode

Retrieving k-mers from Chr15.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr15 k-mers retrieved! 

Retrieving k-mers from Chr21.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr21 k-mers retrieved! 

Computing pairwise identity matrix for Chr15 and Chr21... 

        Sequence length Chr15: 6000000

        Sequence length Chr21: 34930

        Window size w: 6000

        Modimizer sketch size: 1500

        Plot Resolution r: 1000

Progress: |████████████████████████████████████████| 100.0% Completed


Saved bed file to Chr15.bed

None
5999
Creating plots and saving to ./Chr15_Chr21...

./Chr15_Chr21.pdf, ./Chr15_Chr21.png, ./Chr15_Chr21_HIST.pdf and ./Chr15_Chr21_HIST.png saved sucessfully. 
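
The failing run reports a sequence length of 34930 and a window size of 35, which gives exactly 998 windows, yet the loop reaches index 998. The cause has not been traced in `pairwiseContainmentMatrix` here. The sketch below only illustrates one way to keep the loop range and the matrix shape from drifting apart, by deriving both from the same list of windows. `window_bounds` and `empty_matrix` are hypothetical names, and the reference window size of 6000 is borrowed from the working run purely to keep the example small.

```python
import math


def window_bounds(seq_len: int, window: int) -> list[tuple[int, int]]:
    """Split [0, seq_len) into consecutive windows; the last one may be shorter."""
    n_windows = math.ceil(seq_len / window)
    return [(i * window, min((i + 1) * window, seq_len)) for i in range(n_windows)]


def empty_matrix(query_windows, ref_windows):
    """Allocate rows and columns from the window lists, so indices cannot overrun."""
    return [[0.0] * len(ref_windows) for _ in query_windows]


query = window_bounds(34930, 35)         # shorter sequence from the failing run
ref = window_bounds(6_000_000, 6000)     # window size taken from the working run
matrix = empty_matrix(query, ref)

print(len(query), len(ref), len(matrix), len(matrix[0]))
```

Iterating with `enumerate(query)` and `enumerate(ref)` then guarantees that every index used to fill `matrix` exists, regardless of which input is shorter.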

Cross-chromosome comparison errors out due to string integer in print statement

With v0.8.0, I get an error when comparing across two chromosomes (moddotplot interactive -f two_chromosomes.fa --compare-only)

    f"Building pairwise matrices for {seq_list[i]} and {seq_list[j]}, using a minimum window size of {window_lengths[0]}.... \n"
                                                        ~~~~~~~~^^^
TypeError: list indices must be integers or slices, not str

From these lines

    print(
        f"Quickly building pairwise matrices for {seq_list[i]} and {seq_list[j]}, using a window size of {window_lengths[0]}.... \n"
    )
else:
    print(
        f"Building pairwise matrices for {seq_list[i]} and {seq_list[j]}, using a minimum window size of {window_lengths[0]}.... \n"
    )

Removing the print statement allowed the plots to finish up correctly.
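
The error says that an index into `seq_list` was a string, which suggests that `i` or `j` held a sequence name rather than a position by the time the message was formatted. Instead of deleting the message, one option is to loop over the names themselves so no indexing is needed. A sketch, where `pair_messages` is a hypothetical helper and the inputs are made-up values:

```python
from itertools import combinations


def pair_messages(seq_list, window_lengths):
    """Build one progress message per unordered pair of sequence names."""
    messages = []
    for name_a, name_b in combinations(seq_list, 2):
        messages.append(
            f"Building pairwise matrices for {name_a} and {name_b}, "
            f"using a minimum window size of {window_lengths[0]}...."
        )
    return messages


for message in pair_messages(["chrA", "chrB"], [1000]):
    print(message)
```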

ValueError Crash (sequences too small?)

Ran ModDotPlot with the yeast reference genome.

moddotplot interactive -f ~/Data/assemblies/yeast-ref/yeast_genome.fna --port 8989

  __  __           _   _____        _     _____  _       _
 |  \/  |         | | |  __ \      | |   |  __ \| |     | |
 | \  / | ___   __| | | |  | | ___ | |_  | |__) | | ___ | |_
 | |\/| |/ _ \ / _` | | |  | |/ _ \| __| |  ___/| |/ _ \| __|
 | |  | | (_) | (_| | | |__| | (_) | |_  | |    | | (_) | |_
 |_|  |_|\___/ \__,_| |_____/ \___/ \__| |_|    |_|\___/ \__|

v0.8.0

...

And ran into the following error:

17 sequences were detected, however interactive mode can only load two sequences at a time.

Interactive mode will proceed with NC_001133.9 and NC_001134.8

Traceback (most recent call last):
  File "/home/Users/blk6/Contribute/ModDotPlot/venv/bin/moddotplot", line 5, in <module>
    from moddotplot.__main__ import main
  File "/home/Users/blk6/Contribute/ModDotPlot/venv/lib/python3.10/site-packages/moddotplot/__main__.py", line 11, in <module>
    sys.exit(main())
  File "/home/Users/blk6/Contribute/ModDotPlot/venv/lib/python3.10/site-packages/moddotplot/moddotplot.py", line 557, in main
    raise ValueError(
ValueError: Minimum window size must be greater than or equal to the modimizer sketch size

Here's the .fai entries for the relevant sequences:

NC_001133.9     230218  76      80      81
NC_001134.8     813184  233249  80      81

I'm guessing these sequences are too short for the default params. Would there be a way for the params to automatically adjust based on the shortest query sequence?
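
One possible shape for such an adjustment, as a sketch only: derive the window size from the shortest sequence and the plot resolution, then cap the sketch size at that window. The function name `adjust_params` and the default sketch size of 1000 are assumptions for illustration, not ModDotPlot's actual code or defaults.

```python
import math


def adjust_params(seq_lengths, resolution=1000, sketch_size=1000):
    """Return (window, sketch) such that sketch never exceeds window."""
    shortest = min(seq_lengths)
    window = max(1, math.ceil(shortest / resolution))
    return window, min(sketch_size, window)


# Lengths taken from the .fai entries above.
window, sketch = adjust_params([230218, 813184])
print(window, sketch)
```

Capping the sketch size keeps the run from aborting, but a smaller sketch samples fewer k-mers per window, so identity estimates may be noisier on short sequences.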

Mismatch sequence names in output

Hi! Thanks for the interesting and fast tool!

I was testing it on the GCF_000001735.4 A. thaliana assembly and I noticed a mismatch between the sequence names in the output files.

This is the fasta index (.fai) for the input genomes after extracting only the five chromosomes:

NC_003070.9	30427671	56	60	61
NC_003071.7	19698289	30934920	60	61
NC_003074.8	23459830	50961579	60	61
NC_003075.7	18585056	74812472	60	61
NC_003076.8	26975502	93707344	60	61

And with ModDotPlot (@ commit c1388eb) I ran the following command:

moddotplot static -f GCF_000001735.4_TAIR10.1_genomic.chrs.fna

Unexpectedly, this was the output. Notice the mismatch between the filename and the content of the files. Which one is correct?

head NC_*bed
==> NC_003070.9.bed <==
#query_name	query_start	query_end	reference_name	reference_start	reference_end	perID_by_events
NC_003070.9	1	30428	NC_003070.9	1	30428	100.0
NC_003070.9	1	30428	NC_003070.9	30429	60856	96.89179977979816
NC_003070.9	30429	60856	NC_003070.9	30429	60856	100.0
NC_003070.9	30429	60856	NC_003070.9	60857	91284	96.86273970525768
NC_003070.9	60857	91284	NC_003070.9	60857	91284	100.0
NC_003070.9	60857	91284	NC_003070.9	91285	121712	96.8066448041969
NC_003070.9	91285	121712	NC_003070.9	91285	121712	100.0
NC_003070.9	91285	121712	NC_003070.9	121713	152140	96.88899802419402
NC_003070.9	121713	152140	NC_003070.9	121713	152140	100.0

==> NC_003071.7.bed <==
#query_name	query_start	query_end	reference_name	reference_start	reference_end	perID_by_events
NC_003075.7	1	19699	NC_003075.7	1	19699	100.0
NC_003075.7	1	19699	NC_003075.7	19700	39398	96.34241311605368
NC_003075.7	1	19699	NC_003075.7	39399	59097	95.02161679397038
NC_003075.7	1	19699	NC_003075.7	59098	78796	92.86765095697956
NC_003075.7	1	19699	NC_003075.7	1615319	1635017	88.19960026756749
NC_003075.7	1	19699	NC_003075.7	1635018	1654716	88.19960026756749
NC_003075.7	1	19699	NC_003075.7	2541172	2560870	91.12687268394012
NC_003075.7	1	19699	NC_003075.7	2560871	2580569	93.41097733509187
NC_003075.7	1	19699	NC_003075.7	3210938	3230636	92.00089775836454

==> NC_003074.8.bed <==
#query_name	query_start	query_end	reference_name	reference_start	reference_end	perID_by_events
NC_003074.8	1	23460	NC_003074.8	1	23460	100.0
NC_003074.8	1	23460	NC_003074.8	23461	46920	96.83259585797633
NC_003074.8	23461	46920	NC_003074.8	23461	46920	100.0
NC_003074.8	23461	46920	NC_003074.8	46921	70380	96.68315315308905
NC_003074.8	46921	70380	NC_003074.8	46921	70380	100.0
NC_003074.8	46921	70380	NC_003074.8	70381	93840	96.97521648973213
NC_003074.8	70381	93840	NC_003074.8	70381	93840	100.0
NC_003074.8	70381	93840	NC_003074.8	93841	117300	96.70998860321123
NC_003074.8	93841	117300	NC_003074.8	93841	117300	100.0

==> NC_003075.7.bed <==
#query_name	query_start	query_end	reference_name	reference_start	reference_end	perID_by_events
NC_003076.8	1	18586	NC_003076.8	1	18586	100.0
NC_003076.8	1	18586	NC_003076.8	18587	37172	96.95608358007378
NC_003076.8	18587	37172	NC_003076.8	18587	37172	100.0
NC_003076.8	18587	37172	NC_003076.8	37173	55758	96.87871811713131
NC_003076.8	37173	55758	NC_003076.8	37173	55758	100.0
NC_003076.8	37173	55758	NC_003076.8	55759	74344	96.87827825075271
NC_003076.8	55759	74344	NC_003076.8	55759	74344	100.0
NC_003076.8	55759	74344	NC_003076.8	74345	92930	96.91004434964874
NC_003076.8	74345	92930	NC_003076.8	74345	92930	100.0

==> NC_003076.8.bed <==
#query_name	query_start	query_end	reference_name	reference_start	reference_end	perID_by_events
NC_003071.7	1	26976	NC_003071.7	1	26976	100.0
NC_003071.7	1	26976	NC_003071.7	26977	53952	96.89687811961811
NC_003071.7	26977	53952	NC_003071.7	26977	53952	100.0
NC_003071.7	26977	53952	NC_003071.7	53953	80928	96.7788487007729
NC_003071.7	53953	80928	NC_003071.7	53953	80928	100.0
NC_003071.7	53953	80928	NC_003071.7	80929	107904	96.86344796571336
NC_003071.7	53953	80928	NC_003071.7	8902081	8929056	90.20730812484314
NC_003071.7	53953	80928	NC_003071.7	8929057	8956032	90.20730812484314
NC_003071.7	53953	80928	NC_003071.7	12085249	12112224	91.56204318214218
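
One way to check which side is right is to compare the window size in each file against the sequence lengths in the .fai. The sketch below assumes the window size is the sequence length divided by a resolution of 1000, rounded up, which holds for the two files whose names and contents agree (30428 and 23460). `names_matching_window` is a hypothetical helper.

```python
import math

# Sequence lengths from the .fai listing above.
FAI_LENGTHS = {
    "NC_003070.9": 30427671,
    "NC_003071.7": 19698289,
    "NC_003074.8": 23459830,
    "NC_003075.7": 18585056,
    "NC_003076.8": 26975502,
}

# File name -> (sequence name inside the file, end of its first window).
OBSERVED = {
    "NC_003070.9.bed": ("NC_003070.9", 30428),
    "NC_003071.7.bed": ("NC_003075.7", 19699),
    "NC_003074.8.bed": ("NC_003074.8", 23460),
    "NC_003075.7.bed": ("NC_003076.8", 18586),
    "NC_003076.8.bed": ("NC_003071.7", 26976),
}


def names_matching_window(window, lengths, resolution=1000):
    """Names whose length would produce this window size (assumed ceil rule)."""
    return [n for n, length in lengths.items() if math.ceil(length / resolution) == window]


for fname, (inner_name, window) in OBSERVED.items():
    print(fname, inner_name, names_matching_window(window, FAI_LENGTHS))
```

Under that assumption, every window size matches the sequence named in the file name, which would point to the names written inside three of the files being the wrong ones. This is an inference from the numbers in this report, not a confirmed diagnosis.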

Legends for static plots?

Hi, thanks for the great software!
I've been trying out the static plots and noticed that there are no color legends. Would it be possible to add such legends to these plots?

Save in html or export the matrix in mcool format

Hi, Alex

It's so nice to see tons of updates on ModDotPlot. I have a question about the interactive mode. For most plant genomes, the centromere/HOR regions are unknown, so it would be great to have an interactive mode for exploring the whole genome. However, on most HPC systems it is difficult to use the interactive mode through an SSH port. Could ModDotPlot offer the following features?

  1. Save the output in plotly html format? It's easier to download and explore.
  2. Similar to StainedGlass, could ModDotPlot save the raw matrix with cooler? It's easier to play around with HiGlass (identity, load annotation tracks, such as ChIP-Seq and gene/TE annotation)

Best regards
Zhigui

similarity scale legend in static mode

Hi,

awesome tool and superfast :)

As mentioned in #5, the histogram is helpful, but is there a way to generate a scale legend in static mode similar to the one in interactive mode? Could there be an additional text file with this information that can be used for plotting separately, or could the color code be added to the bed file?

Thank you
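
Until such an option exists, a color column could be added to the bed file after the fact. The sketch below assumes the identity value is the seventh tab-separated column (`perID_by_events`, as in the bed header quoted in another report on this page). The breakpoints and hex colors are invented for illustration and are not ModDotPlot's palette.

```python
from bisect import bisect_right

# Hypothetical bins: each color applies from its breakpoint up to the next one.
BREAKPOINTS = [70.0, 80.0, 90.0, 95.0]
COLORS = ["#d7191c", "#fdae61", "#abdda4", "#2b83ba"]


def color_for(per_id: float):
    """Return the hex color for an identity value, or None below the lowest bin."""
    if per_id < BREAKPOINTS[0]:
        return None
    return COLORS[bisect_right(BREAKPOINTS, per_id) - 1]


def add_color_column(bed_line: str) -> str:
    """Append the bin color (or NA) as an extra tab-separated column."""
    fields = bed_line.rstrip("\n").split("\t")
    color = color_for(float(fields[6]))
    return "\t".join(fields + [color or "NA"])


row = "NC_003070.9\t1\t30428\tNC_003070.9\t30429\t60856\t96.89179977979816"
print(add_color_column(row))
```

The `BREAKPOINTS` and `COLORS` lists double as the legend: writing them to a small text file gives the separate lookup table asked about above.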

a vs b dotplot error

Hi,

When I try to run an a vs b plot using --compare, it produces this error
TypeError: paired_bed_file_a_vs_b() takes 15 positional arguments but 16 were given

It looks like there's a disconnect between the definition of the function in static_plots.py and how it's called in moddotplot.py (line 305)?

All vs All interactive plot

Is it possible to generate an All vs All dotplot with ModDotPlot?

When I run moddotplot -i Asm_A.fasta Asm_B.fasta --compare I'm just getting a single dotplot of contig_1 vs contig_2 from the first fasta file only.

Some feedback

Hi Alex,

Great tool! Congratulations!!
While using it, I found several places that could benefit from more clarification and slight modifications.

  1. The relationship between -w and -d is not very clear.
  2. In the bed file, query_name and reference_name are all replaced by the -p value. Can it be the original sequence names?
  3. Line 34 of mod_identity.py is hard coded to perID*100 >= 80; can you make it customizable to allow reporting lower identity regions?
  4. The mod_identity.py script can only run on one sequence? Can it take the entire genome?
  5. The plot.r script requires a folder named results in the working directory; otherwise, it will run into errors.
  6. The plot.r script requires ggplot2, dplyr, scales, RColorBrewer, data.table, cowplot, glue, argparse packages installed in R.

Thanks!
Shujun
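
For point 3, the hard-coded cutoff could become a command-line option. A minimal sketch, not taken from `mod_identity.py`: the option name `--min-identity` and the helper `keep_window` are hypothetical, and the default of 80 simply preserves the behavior described above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with a configurable identity cutoff (hypothetical option name)."""
    parser = argparse.ArgumentParser(description="Sketch of a configurable identity cutoff")
    parser.add_argument(
        "--min-identity",
        type=float,
        default=80.0,
        help="report windows with percent identity at or above this value",
    )
    return parser


def keep_window(per_id: float, min_identity: float) -> bool:
    """Same comparison as the hard-coded check, with the threshold passed in."""
    return per_id * 100 >= min_identity


args = build_parser().parse_args(["--min-identity", "70"])
print(keep_window(0.75, args.min_identity))
```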

Swapped labels a vs b plot?

Hi thanks for the great tool!

I am testing it and noticed something that confused me. I looked at #22, but I downloaded moddotplot only last week, so that should not be the cause.

I have a 144kb sequence that I compare to a 239kb sequence using the command

for sample in sample17 sample18; do
        moddotplot static \
                -f sample16.fa $sample \
                -o sample16_vs_${sample} --compare-only \
                -w 100 --identity 70 --breakpoints 70 80 90 95 100 --palette Spectral_4 --dpi 1200
done

The output I get is this.
image

The strange thing is that I know sample 16 is 144 kb, sample 17 is 238 kb, and sample 18 is 172 kb, which I confirmed by looking at their FASTA sequences:
image

When I look at the bed file it looks like they are swapped, as the header is
#query_name query_start query_end reference_name reference_start reference_end perID_by_events

image

Can this be a bug? Should the labels be switched?

Adding separate query and test sequences

Hi,
I was wondering if there was a way to make the static plot for two separate sequences? As I understand it right now, it checks the identity of a sequence with itself.
