segatalab / hclust2 Goto Github PK

Hclust2 is a handy tool for plotting heatmaps, with several useful options for producing high-quality figures that can be used in publications.

License: MIT License

Python 100.00%

hclust2's Introduction


Installation

$ conda install -c bioconda hclust2

$ pip install hclust2

Examples

Below is the heatmap produced by Hclust2 from the MetaPhlAn2 abundance profiles of the HMP and HMP1-phase2 samples (both the microbial species and the samples are hierarchically clustered).

Usage


usage: hclust2.py [-h] [-i [INPUT_FILE]] [-o [OUTPUT_FILE]]
                  [--legend_file [LEGEND_FILE]] [-t INPUT_TYPE] [--sep SEP]
                  [--out_table OUT_TABLE] [--fname_row FNAME_ROW]
                  [--sname_row SNAME_ROW] [--metadata_rows METADATA_ROWS]
                  [--skip_rows SKIP_ROWS] [--sperc SPERC] [--fperc FPERC]
                  [--stop STOP] [--ftop FTOP] [--def_na DEF_NA]
                  [--f_dist_f F_DIST_F] [--s_dist_f S_DIST_F]
                  [--load_dist_matrix_f LOAD_DIST_MATRIX_F]
                  [--load_dist_matrix_s LOAD_DIST_MATRIX_S]
                  [--load_pickled_dist_matrix_f LOAD_PICKLED_DIST_MATRIX_F]
                  [--load_pickled_dist_matrix_s LOAD_PICKLED_DIST_MATRIX_S]
                  [--save_pickled_dist_matrix_f SAVE_PICKLED_DIST_MATRIX_F]
                  [--save_pickled_dist_matrix_s SAVE_PICKLED_DIST_MATRIX_S]
                  [--no_fclustering] [--no_sclustering] [--flinkage FLINKAGE]
                  [--slinkage SLINKAGE] [--dpi DPI] [-l] [--title TITLE] [-s]
                  [--no_slabels] [--minv MINV] [--maxv MAXV] [--no_flabels]
                  [--max_slabel_len MAX_SLABEL_LEN]
                  [--max_flabel_len MAX_FLABEL_LEN]
                  [--flabel_size FLABEL_SIZE] [--slabel_size SLABEL_SIZE]
                  [--fdend_width FDEND_WIDTH] [--sdend_height SDEND_HEIGHT]
                  [--metadata_height METADATA_HEIGHT]
                  [--metadata_separation METADATA_SEPARATION]
                  [--image_size IMAGE_SIZE]
                  [--cell_aspect_ratio CELL_ASPECT_RATIO]
                  [-c {Accent,Blues,BrBG,BuGn,BuPu,Dark2,GnBu,Greens,Greys,OrRd,Oranges,PRGn,Paired,Pastel1,Pastel2,PiYG,PuBu,PuBuGn,PuOr,PuRd,Purples,RdBu,RdGy,RdPu,RdYlBu,RdYlGn,Reds,Set1,Set2,Set3,Spectral,YlGn,YlGnBu,YlOrBr,YlOrRd,afmhot,autumn,binary,bone,brg,bwr,cool,copper,flag,gist_earth,gist_gray,gist_heat,gist_ncar,gist_rainbow,gist_stern,gist_yarg,gnuplot,gnuplot2,gray,hot,hsv,jet,ocean,pink,prism,rainbow,seismic,spectral,spring,summer,terrain,winter,bbcyr,bbcry,bcry}]
                  [--bottom_c BOTTOM_C] [--top_c TOP_C] [--nan_c NAN_C]

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FILE], --inp [INPUT_FILE], --in [INPUT_FILE]
                        The input matrix
  -o [OUTPUT_FILE], --out [OUTPUT_FILE]
                        The output image file [image on screen if not
                        specified]
  --legend_file [LEGEND_FILE]
                        The output file for the legend of the provided
                        metadata
  -t INPUT_TYPE, --input_type INPUT_TYPE
                        The input type can be a data matrix or distance matrix
                        [default data_matrix]

Input data matrix parameters:
  --sep SEP
  --out_table OUT_TABLE
                        Write processed data matrix to file
  --fname_row FNAME_ROW
                        row number containing the names of the features
                        [default 0, specify -1 if no names are present in the
                        matrix]
  --sname_row SNAME_ROW
                        column number containing the names of the samples
                        [default 0, specify -1 if no names are present in the
                        matrix]
  --metadata_rows METADATA_ROWS
                        Row numbers to use as metadata [default None, meaning
                        no metadata]
  --skip_rows SKIP_ROWS
                        Row numbers to skip (0-indexed, comma separated) from
                        the input file [default None, meaning no rows skipped]
  --sperc SPERC         Percentile of sample value distribution for sample
                        selection
  --fperc FPERC         Percentile of feature value distribution for feature
                        selection
  --stop STOP           Number of top samples to select (ordering based on
                        percentile specified by --sperc)
  --ftop FTOP           Number of top features to select (ordering based on
                        percentile specified by --fperc)
  --def_na DEF_NA       Set the default value for missing values [default None,
                        meaning no replacement]

Distance parameters:
  --f_dist_f F_DIST_F   Distance function for features [default correlation]
  --s_dist_f S_DIST_F   Distance function for samples [default euclidean]
  --load_dist_matrix_f LOAD_DIST_MATRIX_F
                        Load the distance matrix to be used for features
                        [default None].
  --load_dist_matrix_s LOAD_DIST_MATRIX_S
                        Load the distance matrix to be used for samples
                        [default None].
  --load_pickled_dist_matrix_f LOAD_PICKLED_DIST_MATRIX_F
                        Load the distance matrix to be used for features,
                        previously saved as a pickle file by hclust2 itself
                        [default None].
  --load_pickled_dist_matrix_s LOAD_PICKLED_DIST_MATRIX_S
                        Load the distance matrix to be used for samples,
                        previously saved as a pickle file by hclust2 itself
                        [default None].
  --save_pickled_dist_matrix_f SAVE_PICKLED_DIST_MATRIX_F
                        Save the distance matrix for features to a pickle
                        file [default None].
  --save_pickled_dist_matrix_s SAVE_PICKLED_DIST_MATRIX_S
                        Save the distance matrix for samples to a pickle file
                        [default None].

Clustering parameters:
  --no_fclustering      Do not cluster features
  --no_sclustering      Do not cluster samples
  --flinkage FLINKAGE   Linkage method for feature clustering [default
                        average]
  --slinkage SLINKAGE   Linkage method for sample clustering [default average]


Heatmap options:
  --dpi DPI             Image resolution in dpi [default 150]
  -l, --log_scale       Log scale
  --title TITLE         Title of the plot
  -s, --sqrt_scale      Square root scale
  --no_slabels          Do not show sample labels
  --minv MINV           Minimum value to display in the color map [default
                        None meaning automatic]
  --maxv MAXV           Maximum value to display in the color map [default
                        None meaning automatic]
  --no_flabels          Do not show feature labels
  --max_slabel_len MAX_SLABEL_LEN
                        Max number of chars to report for sample labels
                        [default 15]
  --max_flabel_len MAX_FLABEL_LEN
                        Max number of chars to report for feature labels
                        [default 15]
  --flabel_size FLABEL_SIZE
                        Feature label font size [default 10]
  --slabel_size SLABEL_SIZE
                        Sample label font size [default 10]
  --fdend_width FDEND_WIDTH
                        Width of the feature dendrogram [default 1 meaning
                        100% of default heatmap width]
  --sdend_height SDEND_HEIGHT
                        Height of the sample dendrogram [default 1 meaning
                        100% of default heatmap height]
  --metadata_height METADATA_HEIGHT
                        Height of the metadata panel [default 0.05 meaning 5%
                        of default heatmap height]
  --metadata_separation METADATA_SEPARATION
                        Distance between the metadata and data panels.
                        [default 0.001 meaning 0.1% of default heatmap height]
  --image_size IMAGE_SIZE
                        Size of the larger of the image width and height
  --cell_aspect_ratio CELL_ASPECT_RATIO
                        Aspect ratio between width and height for the cells of
                        the heatmap [default 1.0]
  -c {Accent,Blues,BrBG,BuGn,BuPu,Dark2,GnBu,Greens,Greys,OrRd,Oranges,PRGn,Paired,Pastel1,Pastel2,PiYG,PuBu,PuBuGn,PuOr,PuRd,Purples,RdBu,RdGy,RdPu,RdYlBu,RdYlGn,Reds,Set1,Set2,Set3,Spectral,YlGn,YlGnBu,YlOrBr,YlOrRd,afmhot,autumn,binary,bone,brg,bwr,cool,copper,flag,gist_earth,gist_gray,gist_heat,gist_ncar,gist_rainbow,gist_stern,gist_yarg,gnuplot,gnuplot2,gray,hot,hsv,jet,ocean,pink,prism,rainbow,seismic,spectral,spring,summer,terrain,winter,bbcyr,bbcry,bcry}, --colormap {Accent,Blues,BrBG,BuGn,BuPu,Dark2,GnBu,Greens,Greys,OrRd,Oranges,PRGn,Paired,Pastel1,Pastel2,PiYG,PuBu,PuBuGn,PuOr,PuRd,Purples,RdBu,RdGy,RdPu,RdYlBu,RdYlGn,Reds,Set1,Set2,Set3,Spectral,YlGn,YlGnBu,YlOrBr,YlOrRd,afmhot,autumn,binary,bone,brg,bwr,cool,copper,flag,gist_earth,gist_gray,gist_heat,gist_ncar,gist_rainbow,gist_stern,gist_yarg,gnuplot,gnuplot2,gray,hot,hsv,jet,ocean,pink,prism,rainbow,seismic,spectral,spring,summer,terrain,winter,bbcyr,bbcry,bcry}
  --bottom_c BOTTOM_C   Color to use for cells below the minimum value of the
                        scale [default None, meaning the bottom color of the scale]
  --top_c TOP_C         Color to use for cells above the maximum value of the
                        scale [default None, meaning the top color of the scale]
  --nan_c NAN_C         Color to use for NaN cells [default None]


hclust2's Issues

TypeError: '<' not supported between instances of 'float' and 'str'

Hello hclust2 community,
I am running Ubuntu 20 on Windows Subsystem for Linux. hclust2 was installed using conda:
conda install -c bioconda hclust2
conda update hclust2

$ hclust2.py -i abundance_table_species.txt -o abundance_heatmap_species.png --ftop 25 --f_dist_f braycurtis --s_dist_f braycurtis --cell_aspect_ratio 0.5 -l --flabel_size 6 --slabel_size 6 --max_flabel_len 100 --max_slabel_len 100 --minv 0.1 --dpi 300
Traceback (most recent call last):
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 784, in hclust2_main
    dm = DataMatrix( args.inp, args )
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 174, in __init__
    select( self.args.fperc, self.args.ftop )
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 159, in select
    self.table['perc'] = self.table.apply(lambda x: stats.scoreatpercentile(x,perc),axis=1)
  File "/home/renekat/anaconda3/envs/phlan/lib/python3.7/site-packages/pandas/core/frame.py", line 6878, in apply
    return op.get_result()
  File "/home/renekat/anaconda3/envs/phlan/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File "/home/renekat/anaconda3/envs/phlan/lib/python3.7/site-packages/pandas/core/apply.py", line 296, in apply_standard
    values, self.f, axis=self.axis, dummy=dummy, labels=labels
  File "pandas/_libs/reduction.pyx", line 620, in pandas._libs.reduction.compute_reduction
  File "pandas/_libs/reduction.pyx", line 128, in pandas._libs.reduction.Reducer.get_result
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 159, in <lambda>
    self.table['perc'] = self.table.apply(lambda x: stats.scoreatpercentile(x,perc),axis=1)
  File "/home/renekat/anaconda3/envs/phlan/lib/python3.7/site-packages/scipy/stats/stats.py", line 1891, in scoreatpercentile
    sorted_ = np.sort(a, axis=axis)
  File "<__array_function__ internals>", line 6, in sort
  File "/home/renekat/anaconda3/envs/phlan/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 989, in sort
    a.sort(axis=axis, kind=kind, order=order)
TypeError: '<' not supported between instances of 'float' and 'str'

Any help with this error would be appreciated!
Best,
Rene
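The TypeError is raised while hclust2 computes a per-feature percentile (`stats.scoreatpercentile` in the traceback), which suggests that some cells of the table are strings rather than numbers. Below is a minimal pre-processing sketch, assuming a tab-separated, MetaPhlAn-style merged table with a leading `#` comment line and a non-numeric `NCBI_tax_id` column (the cleanup @fbeghini recommended in #1, as reported in a later issue on this page). The function `clean_merged_table` is hypothetical and not part of hclust2.

```python
import io

import pandas as pd


def clean_merged_table(text):
    """Return a purely numeric DataFrame from a MetaPhlAn-style merged table.

    Hypothetical helper; assumes tab-separated input whose column header
    line does not itself start with '#'.
    """
    # Drop blank lines and comment lines such as "#mpa_v30_CHOCOPhlAn_201901".
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    table = pd.read_csv(io.StringIO("\n".join(lines)), sep="\t", index_col=0)
    # The taxonomy-ID column holds strings such as "2|1239", not abundances.
    table = table.drop(columns=["NCBI_tax_id"], errors="ignore")
    # Any remaining non-numeric cell becomes NaN instead of staying a string.
    return table.apply(pd.to_numeric, errors="coerce")
```

For example, `clean_merged_table(open("abundance_table_species.txt").read()).to_csv("cleaned.txt", sep="\t")` writes a file that can then be passed to hclust2.py with `-i`. Cells coerced to NaN can be filled with the documented `--def_na` option.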

EDITED:
I am not sure if the two errors are related, but I tried to run on the example data and got this:

$ /mnt/c/Users/Student/github/hclust2/examples/HMP-MetaPhlAn/run.sh
Traceback (most recent call last):
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 803, in hclust2_main
    cl.shcluster()
  File "/home/renekat/anaconda3/envs/phlan/bin/hclust2.py", line 380, in shcluster
    self.shclusters = sph.linkage(self.s_dm, method=self.args.slinkage)
  File "/home/renekat/anaconda3/envs/phlan/lib/python3.7/site-packages/scipy/cluster/hierarchy.py", line 1057, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

Citation

Hello hclust2 developers,
I have found your tool very helpful and would like to cite you.
Could you add a "Cite this repository" option to your page?
Thank you!
René
https://academia.stackexchange.com/questions/14010/how-do-you-cite-a-github-repository

Question regarding some HClust options

Could I get a clarification of what the --sperc and --fperc options do? What does "Percentile of feature/sample value distribution for sample selection" mean? I would also appreciate an explanation of what --ftop and --stop do: using the --ftop option does not seem to pull out the most abundant features in my dataset.

Thank you, any explanation would be greatly appreciated.
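Judging from the traceback in the first issue above (`self.table.apply(lambda x: stats.scoreatpercentile(x,perc),axis=1)`), --fperc scores each feature by a percentile of its values across samples, and --ftop then keeps the highest-scoring features; --sperc and --stop presumably do the same for samples. The sketch below reproduces that idea with `numpy.percentile`, whose default linear interpolation matches the default of `scoreatpercentile`. It is an illustration, not hclust2's code, and `select_top_features` is a hypothetical name. It shows why --ftop need not return the features with the highest mean abundance.

```python
import numpy as np
import pandas as pd


def select_top_features(table, perc, top):
    """Keep the `top` rows (features) with the highest `perc`-th percentile."""
    # One score per feature: the perc-th percentile of its values across samples.
    scores = table.apply(lambda row: np.percentile(row.to_numpy(dtype=float), perc), axis=1)
    # Rank features by score, highest first, and keep the first `top` of them.
    keep = scores.sort_values(ascending=False).index[:top]
    return table.loc[keep]


# Feature A: abundant in one sample only (mean 20); B: 30 everywhere (mean 30).
table = pd.DataFrame(
    {"s1": [0, 30, 1], "s2": [0, 30, 1], "s3": [0, 30, 1], "s4": [0, 30, 1], "s5": [100, 30, 1]},
    index=["A", "B", "C"],
)
print(list(select_top_features(table, 90, 1).index))  # ['A']
print(list(select_top_features(table, 50, 1).index))  # ['B']
```

With `perc=90`, feature A scores 60 and outranks B (score 30) even though B has the higher mean; with `perc=50` the order flips, because A's median is 0.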

The condensed distance matrix must contain only finite values. in hclust2.py

I am trying to make a heatmap with hclust2.py on a Slurm cluster using conda. The working directory contains the following files:
1:merged_metagenome.txt
2:hclust2.py

The command was:
hclust2.py -i hclust2.py -i merged_metagenome.txt -o heatmap.png -o heat_legends.pdf
The error is:
  File "/hpcfs/home/cursos/bcom4102/.local/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/hpcfs/home/cursos/bcom4102/.local/bin/hclust2.py", line 805, in hclust2_main
    cl.fhcluster()
  File "/hpcfs/home/cursos/bcom4102/.local/bin/hclust2.py", line 386, in fhcluster
    self.fhclusters = sph.linkage(self.f_dm, method=self.args.flinkage)
  File "/hpcfs/apps/anaconda/3.9/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1065, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

Do you know how to solve this problem? I have attached an image of merged_metagenome.txt.
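Separately from the distance-matrix error, the command above passes -i and -o twice each. Assuming hclust2's options use argparse's default "store" action (its usage text is argparse-style), only the last value of a repeated option is kept: the input is merged_metagenome.txt, the heatmap is written to heat_legends.pdf, and heatmap.png is never created. The usage above documents --legend_file for the legend. The stand-in parser below mirrors only those three documented flags; it is not hclust2's actual parser.

```python
import argparse


def build_parser():
    # Stand-in mirroring three options from the hclust2 usage text; not hclust2's parser.
    p = argparse.ArgumentParser(prog="hclust2.py")
    p.add_argument("-i", "--inp", "--in", nargs="?", dest="inp")
    p.add_argument("-o", "--out", nargs="?", dest="out")
    p.add_argument("--legend_file", nargs="?")
    return p


parser = build_parser()

# The command from the issue: -i and -o are each repeated, so the last value wins.
bad = parser.parse_args(
    ["-i", "hclust2.py", "-i", "merged_metagenome.txt",
     "-o", "heatmap.png", "-o", "heat_legends.pdf"]
)
print(bad.inp, bad.out, bad.legend_file)  # merged_metagenome.txt heat_legends.pdf None

# Intended form: one input, one image, and the legend via --legend_file.
good = parser.parse_args(
    ["-i", "merged_metagenome.txt", "-o", "heatmap.png",
     "--legend_file", "heat_legends.pdf"]
)
print(good.inp, good.out, good.legend_file)  # merged_metagenome.txt heatmap.png heat_legends.pdf
```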

hclust2 reported the matplotlib warning

Hello, when I use hclust2 to draw a heatmap, a problem occurs. Can you please help me solve it? Here is the warning:

/data/workdir/huwa/software/miniconda3/envs/mpa/bin/hclust2.py:588: MatplotlibDeprecationWarning: You are modifying the state of a globally registered colormap. This has been deprecated since 3.3 and in 3.6, you will not be able to modify a registered colormap in-place. To remove this warning, you can make a copy of the colormap first. cmap = mpl.cm.get_cmap("bbcry").copy()
  cm.set_under( bottom_col )
/data/workdir/huwa/software/miniconda3/envs/mpa/bin/hclust2.py:594: MatplotlibDeprecationWarning: You are modifying the state of a globally registered colormap. This has been deprecated since 3.3 and in 3.6, you will not be able to modify a registered colormap in-place. To remove this warning, you can make a copy of the colormap first. cmap = mpl.cm.get_cmap("bbcry").copy()
  cm.set_over( top_col )
Traceback (most recent call last):
  File "/data/workdir/huwa/software/miniconda3/envs/mpa/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/data/workdir/huwa/software/miniconda3/envs/mpa/bin/hclust2.py", line 822, in hclust2_main
    hm.draw()
  File "/data/workdir/huwa/software/miniconda3/envs/mpa/bin/hclust2.py", line 684, in draw
    norm = norm_f( vmin=minv if minv > 0.0 else None, vmax=maxv)
  File "/data/workdir/huwa/software/miniconda3/envs/mpa/lib/python3.7/site-packages/matplotlib/_api/deprecation.py", line 459, in wrapper
    return func(*args, **kwargs)
  File "/data/workdir/huwa/software/miniconda3/envs/mpa/lib/python3.7/site-packages/matplotlib/__init__.py", line 1414, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/data/workdir/huwa/software/miniconda3/envs/mpa/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 5492, in imshow
    im._scale_norm(norm, vmin, vmax)
  File "/data/workdir/huwa/software/miniconda3/envs/mpa/lib/python3.7/site-packages/matplotlib/cm.py", line 381, in _scale_norm
    "Passing parameters norm and vmin/vmax simultaneously is "
ValueError: Passing parameters norm and vmin/vmax simultaneously is not supported. Please pass vmin/vmax directly to the norm when creating it.
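Two separate things are reported here. The deprecation warnings come from calling `set_under`/`set_over` on a globally registered colormap in place, and the ValueError comes from `imshow` receiving both a `norm` and `vmin`/`vmax`, which the installed matplotlib rejects. Below is a minimal sketch of the pattern both messages ask for: copy the colormap first, and give the limits only to the norm. The function `draw_heatmap`, the `viridis` colormap and the sample values are illustrative assumptions, not hclust2's drawing code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np


def draw_heatmap(data, minv, maxv, log_scale=True, bottom_c="white", top_c="black"):
    # Work on a copy: modifying a globally registered colormap in place is deprecated.
    cmap = plt.get_cmap("viridis").copy()
    cmap.set_under(bottom_c)  # color for values below vmin
    cmap.set_over(top_c)      # color for values above vmax
    norm_cls = mcolors.LogNorm if log_scale else mcolors.Normalize
    # vmin/vmax go into the norm only; imshow must not receive them as well.
    norm = norm_cls(vmin=minv, vmax=maxv)
    fig, ax = plt.subplots()
    im = ax.imshow(data, cmap=cmap, norm=norm, aspect="auto", interpolation="nearest")
    fig.colorbar(im, ax=ax, extend="both")
    return fig, im


fig, im = draw_heatmap(np.array([[0.05, 1.0], [10.0, 200.0]]), minv=0.1, maxv=100.0)
```

Calling `fig.savefig("heatmap.png", dpi=300)` afterwards writes the figure; values outside 0.1 to 100 are drawn in the under/over colors.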

IndexError with hclust2 conda environment

Hello,

I am using hclust2 to visualize merged data from MetaPhlAn. I am running it in a conda environment called app2 on Ubuntu. The conda installation is in /opt/conda/. There were no errors when I installed hclust2 in the environment.

This is the command I am running:
hclust2.py --in merged_abundance_table.txt -l --out heatmap.png

Here is the error:

/opt/conda/envs/app2/bin/hclust2.py:152: FutureWarning: read_table is deprecated, use read_csv instead.
  index_col = self.args.sname_row if self.args.sname_row > -1 else None
Traceback (most recent call last):
  File "/opt/conda/envs/app2/bin/hclust2.py", line 781, in <module>
    dm = DataMatrix( args.inp, args )
  File "/opt/conda/envs/app2/bin/hclust2.py", line 152, in __init__
    index_col = self.args.sname_row if self.args.sname_row > -1 else None
  File "/opt/conda/envs/app2/lib/python2.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/opt/conda/envs/app2/lib/python2.7/site-packages/pandas/io/parsers.py", line 435, in _read
    data = parser.read(nrows)
  File "/opt/conda/envs/app2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1139, in read
    ret = self._engine.read(nrows)
  File "/opt/conda/envs/app2/lib/python2.7/site-packages/pandas/io/parsers.py", line 2033, in read
    values = data.pop(self.index_col[i])
IndexError: list index out of range

Here is how I installed hclust2 using conda:

conda create -n app2
conda activate app2
conda config --env --add channels bioconda
conda install --yes hclust2

Here is the merged_abundance text file. During the metaphlan runs I saw this warning:

WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant. An additional column listing the merged species is added to the MetaPhlAn output.

Could this have something to do with the error?
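Whether the extra MetaPhlAn column matters can be checked directly. If some rows carry more tab-separated fields than the header, pandas can treat the surplus leading columns as an implicit index, which is one plausible (unconfirmed) route to an index error while parsing. The dependency-free check below reports lines whose field count differs from the header's; `ragged_lines` is a hypothetical helper that assumes the first non-comment line is the header.

```python
def ragged_lines(text, sep="\t", comment="#"):
    """Return (line_number, n_fields) for rows whose field count differs from the header."""
    # Keep 1-based line numbers so results point at lines in the original file.
    rows = [
        (n, line.split(sep))
        for n, line in enumerate(text.splitlines(), start=1)
        if line.strip() and not line.startswith(comment)
    ]
    if not rows:
        return []
    expected = len(rows[0][1])  # field count of the header line
    return [(n, len(fields)) for n, fields in rows[1:] if len(fields) != expected]
```

For example, `print(ragged_lines(open("merged_abundance_table.txt").read()))` prints an empty list when every row matches the header.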

ValueError: The condensed distance matrix must contain only finite values

Hello,

This error is related to #1. Once that issue was solved and @fbeghini closed it, I reinstalled hclust2 in a conda environment, as follows:

conda create -n hclust2
conda activate hclust2
conda config --env --add channels bioconda
conda install --yes hclust2

Using the same merged abundance file mentioned in #1 (created using metaphlan3), I ran the following command:

$ hclust2.py --in merged_abundance_table.txt -l --out heatmap.png

And got the following error:

Traceback (most recent call last):
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 805, in hclust2_main
    cl.fhcluster()
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 386, in fhcluster
    self.fhclusters = sph.linkage(self.f_dm, method=self.args.flinkage)
  File "/opt/conda/envs/hclust2/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 1057, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

Since this is a different error from before, I assume that hclust2 on the bioconda channel has been updated to fix issue #1. In case I was wrong, I followed the advice @fbeghini posted in #1 to manually remove the first line (containing the string #mpa_v30_CHOCOPhlAn_201901) and the NCBI_tax_id column, but got the same error.

Looking over the matrix in the merged abundance file, it is not immediately clear why the matrix would contain non-finite values.
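One common way for a table of finite abundances to yield non-finite distances, assuming the default correlation distance for features (--f_dist_f, documented above): a feature whose value is identical in every sample, for example all zeros, has zero variance, so its correlation distance to any other feature is undefined (0/0). The sketch below drops such constant rows before computing correlation distances with scipy and checks that the condensed matrix is finite before linkage. `drop_constant_rows` is a hypothetical helper, not an hclust2 option.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist


def drop_constant_rows(table):
    """Remove features (rows) whose values are the same in every sample."""
    # nunique counts distinct values per row; 1 (or 0 for all-NaN) means no variation.
    return table[table.nunique(axis=1) > 1]


table = pd.DataFrame(
    [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [5.0, 5.0, 4.0]],
    index=["always_zero", "f1", "f2", "f3"],
    columns=["s1", "s2", "s3"],
)

filtered = drop_constant_rows(table)
dists = pdist(filtered.to_numpy(), metric="correlation")  # condensed distance matrix
assert np.isfinite(dists).all()  # the condition that linkage() enforces
links = hierarchy.linkage(dists, method="average")  # average linkage, as in hclust2's default
print(list(filtered.index))  # ['f1', 'f2', 'f3']
```

If the table also contains NaN cells, the distances involving them will be NaN as well, so fill or drop missing values first (for example with --def_na).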

Italicize labels

Hello,

Is there a way to italicize the labels on the taxonomic profile (e.g. name of the species) heatmap using hclust2.py?

Thank you in advance!
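The usage text above lists no option for italic labels. If you draw or post-process the heatmap yourself in matplotlib, tick labels are ordinary `Text` objects, and extra keyword arguments to `set_yticklabels` are applied to them as text properties. The sketch below assumes your own matplotlib figure (changing hclust2 itself is not shown), and the species names are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

species = ["Escherichia coli", "Bacteroides fragilis"]  # placeholder row labels
data = np.arange(6).reshape(2, 3)

fig, ax = plt.subplots()
ax.imshow(data, aspect="auto")
ax.set_yticks(range(len(species)))
# Extra keyword arguments are applied to each tick label as Text properties.
ax.set_yticklabels(species, fontstyle="italic")
```

The figure can then be saved with `fig.savefig(...)` as usual.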

Metadata color palette

Hello, I am generating two heatmaps with the same metadata categories; however, each time I generate a graph, the color legend changes, which makes it hard to compare the two heatmaps.

Is there a way to manually select colors in a given order so that both heatmaps show the same colors for my metadata categories?

For example: red for Ohio, orange for Indiana, yellow for Illinois, green for Wisconsin, and so on.

Best,
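The usage text above documents no option for fixing metadata colors. When building figures yourself, one way to keep colors stable is to define the category-to-color mapping explicitly, with a deterministic fallback for categories you did not list. `metadata_colors` and the fallback palette below are hypothetical, not part of hclust2.

```python
FALLBACK = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def metadata_colors(categories, fixed=None, fallback=FALLBACK):
    """Map each category to a color that does not depend on input order.

    Categories in `fixed` keep their given color; the others are sorted by
    name and assigned fallback colors in turn, so the same set of categories
    always gets the same colors.
    """
    fixed = dict(fixed or {})
    present = set(categories)
    mapping = {c: fixed[c] for c in present if c in fixed}
    for i, cat in enumerate(sorted(present - set(fixed))):
        mapping[cat] = fallback[i % len(fallback)]
    return mapping


user_colors = {"Ohio": "red", "Indiana": "orange", "Illinois": "yellow", "Wisconsin": "green"}
plot1 = metadata_colors(["Ohio", "Wisconsin", "Indiana"], user_colors)
plot2 = metadata_colors(["Indiana", "Ohio", "Michigan"], user_colors)
print(plot1["Ohio"], plot2["Ohio"])  # red red
```

Sorting the unlisted categories, rather than using their order of appearance, is what keeps the assignment identical between two figures built from the same categories.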

hclust2-1.0.0 :: test failure

Hello,

I just installed hclust2 from tagged release 1.0.0 in order to be able to generate heatmaps from MetaPhlAn outputs.

To test it, I ran the example described at https://github.com/SegataLab/hclust2/blob/master/examples/HMP-MetaPhlAn/run.sh using the provided HMP.species.txt file,

and got the same error as #3 (see also biobakery forum #1732 and #1705).

Any insight?

python/3.8.1
cycler==0.10.0
hclust2==1.0.0
kiwisolver==1.3.1
matplotlib==3.4.2
numpy==1.21.0
pandas==1.3.0
Pillow==8.3.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
scipy==1.7.0
six==1.16.0

regards

Eric
