lsys / forestplot


A Python package to make publication-ready but customizable coefficient plots.

Home Page: http://forestplot.rtfd.io

License: MIT License

Python 24.08% Makefile 0.32% Jupyter Notebook 75.61%

data-science forestplot python visualization dataviz matplotlib coefficientplot data-visualization

forestplot's Introduction

Forestplot

Easy API for forest plots.
A Python package to make publication-ready but customizable forest plots.

This package makes publication-ready forest plots easy to make out-of-the-box. Users provide a dataframe (e.g. from a spreadsheet) where rows correspond to a variable/study with columns including estimates, variable labels, and lower and upper confidence interval limits. Additional options allow easy addition of columns in the dataframe as annotations in the plot.



Installation

Quick Start

Some Examples with Customizations

Gallery and API Options

Multi-models

Known Issues

Background and Additional Resources

Contributing

Installation

Install from PyPI

pip install forestplot

Install from conda-forge

conda install -c conda-forge forestplot

Install from source

git clone https://github.com/LSYS/forestplot.git
cd forestplot
pip install .

Developer installation

git clone https://github.com/LSYS/forestplot.git
cd forestplot
pip install -r requirements_dev.txt

make lint
make test

(back to top)

Quick Start

import forestplot as fp

df = fp.load_data("sleep")  # companion example data
df.head(3)

	var	r	moerror	label	group	ll	hl	n	power	p-val
0	age	0.0903729	0.0696271	in years	age	0.02	0.16	706	0.671578	0.0163089
1	black	-0.0270573	0.0770573	=1 if black	other factors	-0.1	0.05	706	0.110805	0.472889
2	clerical	0.0480811	0.0719189	=1 if clerical worker	occupation	-0.03	0.12	706	0.247768	0.201948

(* This is a toy example of how certain factors correlate with the amount of sleep one gets. See the notebook that generates the data.)

The example input dataframe above has the following key columns:

Column	Description	Required
`var`	Variable name	✓
`r`	Correlation coefficients (estimates to plot)	✓
`label`	Variable labels	✓
`group`	Variable grouping labels
`ll`	Conf. int. lower limits
`hl`	Conf. int. higher limits
`n`	Sample size
`power`	Statistical power
`p-val`	P-value

(See Gallery and API Options for more details on required and optional arguments.)
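If you are assembling the input by hand rather than loading the companion data, a small pandas dataframe with a label, an estimate, and the two interval limits is enough to get started. The sketch below copies the first three rows shown above; the final sanity check is an illustration and is not something the package is known to run for you.

```python
import pandas as pd

# Values copied from the first three rows of the example data above.
df = pd.DataFrame(
    {
        "label": ["in years", "=1 if black", "=1 if clerical worker"],
        "r": [0.0903729, -0.0270573, 0.0480811],
        "ll": [0.02, -0.10, -0.03],
        "hl": [0.16, 0.05, 0.12],
    }
)

# Sanity check (not part of the package): each estimate should lie
# inside its own confidence interval.
inside = df["r"].between(df["ll"], df["hl"])
print(inside.all())
```

A check like this catches swapped `ll`/`hl` columns before they show up as odd-looking whiskers in the plot.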

Make the forest plot

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits
              varlabel="label",  # column containing variable label
              ylabel="Confidence interval",  # y-label title
              xlabel="Pearson correlation",  # x-label title
              )

Save the plot

import matplotlib.pyplot as plt
plt.savefig("plot.png", bbox_inches="tight")

(back to top)

Some Examples With Customizations

Add variable groupings, add group order, and sort by estimate size.

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits              
              varlabel="label",  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              groupvar="group",  # Add variable groupings 
              # group ordering
              group_order=["labor factors", "occupation", "age", "health factors", 
                           "family factors", "area of residence", "other factors"],
              sort=True  # sort in ascending order (sorts within group if group is specified)               
              )
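To see the row order that `group_order` combined with `sort=True` is meant to produce, the ordering can be imitated in plain pandas. This is only an illustration of the resulting order, not the package's internal code; the rows are taken from the example data and the three-group order is a shortened, hypothetical version of the list above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "label": ["in years", "Years of schooling",
                  "=1 if clerical worker", "Mins worked per week"],
        "group": ["age", "labor factors", "occupation", "labor factors"],
        "r": [0.0903729, -0.0950039, 0.0480811, -0.321384],
    }
)
group_order = ["labor factors", "occupation", "age"]

# An ordered categorical makes sort_values follow group_order instead of
# alphabetical order; rows within a group are then sorted by the estimate.
df["group"] = pd.Categorical(df["group"], categories=group_order, ordered=True)
ordered = df.sort_values(["group", "r"]).reset_index(drop=True)
print(ordered)
```

Groups appear in the requested order, and within "labor factors" the more negative estimate comes first.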

Add p-values on the right and color alternate rows gray

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits
              varlabel="label",  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              groupvar="group",  # Add variable groupings 
              # group ordering
              group_order=["labor factors", "occupation", "age", "health factors", 
                           "family factors", "area of residence", "other factors"],
              sort=True,  # sort in ascending order (sorts within group if group is specified)               
              pval="p-val",  # Column of p-value to be reported on right
              color_alt_rows=True,  # Gray alternate rows
              ylabel="Est.(95% Conf. Int.)",  # ylabel to print
              **{"ylabel1_size": 11}  # control size of printed ylabel
              )

Customize annotations and make it a table

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              pval="p-val",  # column containing p-values to be formatted
              annote=["n", "power", "est_ci"],  # columns to report on left of plot
              annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"],  # ^corresponding headers
              rightannote=["formatted_pval", "group"],  # columns to report on right of plot 
              right_annoteheaders=["P-value", "Variable group"],  # ^corresponding headers
              xlabel="Pearson correlation coefficient",  # x-label title
              table=True,  # Format as a table
              )

Strip down all bells and whistles

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              ci_report=False,  # Turn off conf. int. reporting
              flush=False,  # Turn off left-flush of text
              **{'fontfamily': 'sans-serif'}  # revert to sans-serif                              
              )

Example with more customizations

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              pval="p-val",  # column containing p-values to be formatted
              annote=["n", "power", "est_ci"],  # columns to report on left of plot
              annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"],  # ^corresponding headers
              rightannote=["formatted_pval", "group"],  # columns to report on right of plot 
              right_annoteheaders=["P-value", "Variable group"],  # ^corresponding headers
              groupvar="group",  # column containing group labels
              group_order=["labor factors", "occupation", "age", "health factors", 
                           "family factors", "area of residence", "other factors"],                   
              xlabel="Pearson correlation coefficient",  # x-label title
              xticks=[-.4,-.2,0, .2],  # x-ticks to be printed
              sort=True,  # sort estimates in ascending order
              table=True,  # Format as a table
              # Additional kwargs for customizations
              **{"marker": "D",  # set marker symbol as diamond
                 "markersize": 35,  # adjust marker size
                 "xlinestyle": (0, (10, 5)),  # long dash for x-reference line 
                 "xlinecolor": "#808080",  # gray color for x-reference line
                 "xtick_size": 12,  # adjust x-ticker fontsize
                }  
              )

Allowed annotation arguments include:

ci_range: Confidence interval range (e.g. (-0.39 to -0.25)).
est_ci: Estimate and CI (e.g. -0.32(-0.39 to -0.25)).
formatted_pval: Formatted p-values (e.g. 0.01**).

To confirm what processed columns are available as annotations, you can do:

processed_df, ax = fp.forestplot(df, 
                                 ...  # other arguments here
                                 return_df=True  # return processed dataframe with processed columns
                                )
processed_df.head(3)

	label	group	n	r	CI95%	p-val	BF10	power	var	hl	ll	moerror	formatted_r	formatted_ll	formatted_hl	ci_range	est_ci	formatted_pval	formatted_n	formatted_power	formatted_est_ci	yticklabel	formatted_formatted_pval	formatted_group	yticklabel2
0	Mins worked per week	Labor factors	706	-0.321384	[-0.39 -0.25]	1.99409e-18	1.961e+15	1	totwrk	-0.25	-0.39	0.0686165	-0.32	-0.39	-0.25	(-0.39 to -0.25)	-0.32(-0.39 to -0.25)	0.0***	706	1	-0.32(-0.39 to -0.25)	Mins worked per week 706 1.0 -0.32(-0.39 to -0.25)	0.0***	Labor factors	0.0*** Labor factors
1	Years of schooling	Labor factors	706	-0.0950039	[-0.17 -0.02]	0.0115515	1.137	0.72	educ	-0.02	-0.17	0.0749961	-0.1	-0.17	-0.02	(-0.17 to -0.02)	-0.10(-0.17 to -0.02)	0.01**	706	0.72	-0.10(-0.17 to -0.02)	Years of schooling 706 0.72 -0.10(-0.17 to -0.02)	0.01**	Labor factors	0.01** Labor factors
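The formatted columns in the processed dataframe can be imitated with a few lines of plain Python, which may help when deciding what to pass to `annote` or `rightannote`. The functions below are a sketch and not the package's implementation; the thresholds `(0.01, 0.05, 0.1)` and the matching symbols are assumptions chosen because they reproduce the two rows shown above.

```python
def est_ci(est, ll, hl, decimals=2):
    """Estimate followed by its interval, e.g. '-0.32(-0.39 to -0.25)'."""
    return f"{est:.{decimals}f}({ll:.{decimals}f} to {hl:.{decimals}f})"


def star_pval(p, thresholds=(0.01, 0.05, 0.1),
              symbols=("***", "**", "*"), decimals=2):
    """Rounded p-value plus the symbol of the first threshold it falls under."""
    stars = ""
    for cutoff, symbol in zip(thresholds, symbols):
        if p < cutoff:
            stars = symbol
            break
    return f"{round(p, decimals)}{stars}"


print(est_ci(-0.321384, -0.39, -0.25))   # -0.32(-0.39 to -0.25)
print(star_pval(0.0115515))              # 0.01**
print(star_pval(1.99409e-18))            # 0.0***
```

The last line shows why very small p-values print as `0.0***`: rounding to two decimals happens before the value is turned into text.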

(back to top)

Multi-models

For coefficient plots where each variable can have multiple estimates (each model has one).

import forestplot as fp
import pandas as pd

df_mmodel = pd.read_csv("../examples/data/sleep-mmodel.csv").query(
    "model=='all' | model=='young kids'"
)
df_mmodel.head(3)

	var	coef	se	T	pval	r2	adj_r2	ll	hl	model	group	label
0	age	0.994889	1.96925	0.505213	0.613625	0.127289	0.103656	-2.87382	4.8636	all	age	in years
3	age	22.634	15.4953	1.4607	0.149315	0.178147	-0.0136188	-8.36124	53.6293	young kids	age	in years
4	black	-84.7966	82.1501	-1.03222	0.302454	0.127289	0.103656	-246.186	76.5925	all	other factors	=1 if black

fp.mforestplot(
    dataframe=df_mmodel,
    estimate="coef",
    ll="ll",
    hl="hl",
    varlabel="label",
    capitalize="capitalize",
    model_col="model",
    color_alt_rows=True,
    groupvar="group",
    table=True,
    rightannote=["var", "group"],
    right_annoteheaders=["Source", "Group"],
    xlabel="Coefficient (95% CI)",
    modellabels=["Have young kids", "Full sample"],
    xticks=[-1200, -600, 0, 600],
    mcolor=["#CC6677", "#4477AA"],
    # Additional kwargs for customizations
    **{
        "markersize": 30,
        # override default vertical offset between models (0.0 to 1.0)
        "offset": 0.35,  
        "xlinestyle": (0, (10, 5)),  # long dash for x-reference line
        "xlinecolor": ".8",  # gray color for x-reference line
    },
)

Please note: This module is still experimental. See this jupyter notebook for more examples and tweaks.
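The multi-model data is in long format: one row per variable and model, with the model name in its own column. If your results live in one table per model, a layout like that can be assembled with `pandas.concat`. The sketch below uses the three rows from the preview above; the per-model tables `all_sample` and `young_kids` are hypothetical names for illustration.

```python
import pandas as pd

# One small result table per model; numbers copied from the preview above.
all_sample = pd.DataFrame(
    {"var": ["age", "black"], "coef": [0.994889, -84.7966],
     "ll": [-2.87382, -246.186], "hl": [4.8636, 76.5925]}
)
young_kids = pd.DataFrame(
    {"var": ["age"], "coef": [22.634], "ll": [-8.36124], "hl": [53.6293]}
)

# Stack the tables and record which model each row came from.
df_long = pd.concat(
    [all_sample.assign(model="all"), young_kids.assign(model="young kids")],
    ignore_index=True,
)
print(df_long)
```

The resulting `model` column is what would be named in `model_col`.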

Gallery and API Options

Check out this jupyter notebook for a gallery of forest plot variations possible out-of-the-box. The table below lists the arguments users can pass in. More fine-grained control over base plot options (e.g. font sizes, marker colors) can be inferred from the example notebook gallery.

Option	Description	Required
`dataframe`	Pandas dataframe where rows are variables (or studies for meta-analyses) and columns include estimated effect sizes, labels, and confidence intervals, etc.	✓
`estimate`	Name of column in `dataframe` containing the estimates.	✓
`varlabel`	Name of column in `dataframe` containing the variable labels (study labels if meta-analyses).	✓
`ll`	Name of column in `dataframe` containing the conf. int. lower limits.
`hl`	Name of column in `dataframe` containing the conf. int. higher limits.
`logscale`	If True, make the x-axis log scale. Default is False.
`capitalize`	How to capitalize strings. Default is None. One of "capitalize", "title", "lower", "upper", "swapcase".
`form_ci_report`	If True (default), format the confidence interval as a string.
`ci_report`	If True (default), report the estimates and confidence interval beside the variable labels.
`groupvar`	Name of column in `dataframe` containing the variable grouping labels.
`group_order`	List of group labels indicating the order of groups to report in the plot.
`annote`	List of columns to add as annotations on the left-hand side of the plot.
`annoteheaders`	List of column headers for the left-hand side annotations.
`rightannote`	List of columns to add as annotations on the right-hand side of the plot.
`right_annoteheaders`	List of column headers for the right-hand side annotations.
`pval`	Name of column in `dataframe` containing the p-values.
`starpval`	If True (default), format p-values with stars indicating statistical significance.
`sort`	If True, sort variables by `estimate` values in ascending order.
`sortby`	Name of column to sort by. Default is `estimate`.
`flush`	If True (default), left-flush variable labels and annotations.
`decimal_precision`	Number of decimal places to print. (Default = 2)
`figsize`	Tuple indicating core figure size. Default is (4, 8)
`xticks`	List of xticklabels to print on x-axis.
`ylabel`	Y-label title.
`xlabel`	X-label title.
`color_alt_rows`	If True, shade out alternating rows in gray.
`preprocess`	If True (default), preprocess the `dataframe` before plotting.
`return_df`	If True, return the preprocessed `dataframe`.

(back to top)

Known Issues

Variable labels coinciding with group variables may lead to unexpected formatting issues in the graph.
Left-flushing of annotations relies on the monospace font.
Plot may behave strangely with few rows of data (six rows or fewer; see this issue).
Plot can get cluttered with too many variables/rows (~30 onwards)
Not tested with PyCharm (#80) nor Google Colab (#110).
Duplicated varlabel may lead to unexpected results (see #76, #81). mplot for grouped models could be useful for such cases (see #59, WIP).

(back to top)

Background and Additional Resources

More about forest plots

Forest plots have many aliases (h/t Chris Alexiuk). Other names include coefplots, coefficient plots, meta-analysis plots, dot-and-whisker plots, blobbograms, margins plots, regression plots, and ropeladder plots.

Forest plots in the medical and health sciences literature are plots that report results from different studies as a meta-analysis. Markers are centered on the estimated effect and horizontal lines running through each marker depict the confidence intervals.

The simplest version of a forest plot has two columns: one for the variables/studies, and the second for the estimated coefficients and confidence intervals. This layout is similar to coefficient plots (coefplots) and is thus useful for more than meta-analyses.

More resources about forest plots

[1] Chang, Y., Phillips, M.R., Guymer, R.H. et al. The 5 min meta-analysis: understanding how to read and interpret a forest plot. Eye 36, 673–675 (2022).
[2] Lewis, S., Clarke, M. Forest plots: trying to see the wood and the trees. BMJ 322, 1479 (2001).

More about this package

The package is lightweight, built on pandas, numpy, and matplotlib.

It is slightly opinionated in that the aesthetics of the plot inherit some of my sensibilities about what makes a nice figure. You can, however, easily override most defaults for the look of the graph. This is possible via **kwargs in the forestplot API (see Gallery and API options) and the matplotlib API.

Planned enhancements include forest plots where each row can have multiple coefficients (e.g. from multiple models).

Related packages

[1] [Stata] Jann, Ben (2014). Plotting regression coefficients and other estimates. The Stata Journal 14(4): 708-737.
[2] [Python] Meta-Analysis in statsmodels
[3] [Python] Matt Bracher-Smith's Forestplot
[4] [R] Solt, Frederick and Hu, Yue (2021) dotwhisker: Dot-and-Whisker Plots of Regression Results
[5] [R] Bounthavong, Mark (2021) Forest plots. RPubs by RStudio

(back to top)

Contributing

Contributions are welcome, and they are greatly appreciated!

Potential ways to contribute:

Raise issues/bugs/questions
Write tests for missing coverage
Add features (see examples notebook for a survey of existing features)
Add example datasets with companion graphs
Add your graphs with companion code

Issues

Please submit bugs, questions, or issues you encounter to the GitHub Issue Tracker. For bugs, please provide a minimal reproducible example demonstrating the problem (it may help me troubleshoot if I have a version of your data).

Pull Requests

Please feel free to open an issue on the Issue Tracker if you'd like to discuss potential contributions via PRs.

(back to top)


forestplot's Issues

change font to support Chinese

Currently, Chinese characters do not show in the plot, and the same is true for Korean, Japanese, and so on. How can I change the font setting to support Chinese? Thanks in advance.

Save SVG version of the forest plot

Thanks for this great library! I prefer to write my plots in SVG since it has a much better resolution than PNG or JPEG formats. I tried to use plt.savefig() function but it writes an empty file. Is it not possible to save the plot outputs in svg? It would be great if this feature is added.
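Matplotlib picks the output format from the file extension, so SVG output should work through the figure that owns the plotted Axes. An empty file from `plt.savefig()` often means the call ran after the figure had already been shown and closed, for example in a later notebook cell; that is a guess about this report, not a confirmed diagnosis. The sketch below uses a plain matplotlib Axes as a stand-in for the one returned by the plotting call.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in for the Axes returned by the plotting call.
fig, ax = plt.subplots(figsize=(4, 3))
ax.errorbar([0.09, -0.03, 0.05], [0, 1, 2],
            xerr=[0.07, 0.08, 0.07], fmt="o")

# Save through the figure that owns the Axes; ".svg" selects the format.
svg_path = os.path.join(tempfile.mkdtemp(), "plot.svg")
ax.get_figure().savefig(svg_path, bbox_inches="tight")
plt.close(fig)

print(os.path.getsize(svg_path) > 0)
```

Calling `savefig` on the figure object, in the same cell that creates the plot, avoids depending on which figure pyplot considers current.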

Allow no drawing of CI

Would be nice to allow no drawing of the confidence intervals directly from the API. (A hack is to convert the ll and hl to be the same as the estimate.)

One solution is to allow ll and hl to be None.

Another solution is a new option drawci that is True by default but can be set to False.

~~Add new option~~
Update args validation
Update docs: ll and hl no longer required(?)
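The hack mentioned above, setting both limits equal to the estimate, can be written as a one-line pandas transformation on a copy of the data. The column names follow the example dataframe from the README; the two rows are copied from it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "label": ["in years", "=1 if black"],
        "r": [0.0903729, -0.0270573],
        "ll": [0.02, -0.10],
        "hl": [0.16, 0.05],
    }
)

# Collapse each interval onto its point estimate so no whisker is visible.
# assign() returns a new dataframe, so the original limits are preserved.
df_no_ci = df.assign(ll=df["r"], hl=df["r"])
print(df_no_ci)
```

Passing `df_no_ci` instead of `df` would then draw zero-length intervals; any annotation that reports the interval would also show the collapsed limits, so this is a workaround rather than a clean solution.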

Plotting of estimates on a log-scale

At present, the estimates ("r" in the example dataframe) can only be plotted on a linear scale. Certain estimates (e.g. Odds Ratios (ORs)) are typically plotted onto log-scales.
For example, in forestplot:

Equivalent in 'R':

It would be great if forestplot allowed users to request a log scale and (ideally) to customize aspects of this.
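The reason odds ratios are usually drawn on a log axis can be shown with a little arithmetic: an interval that is symmetric for the log-odds becomes lopsided after exponentiation, and only a log scale restores the symmetry. The coefficient and standard error below are hypothetical, and 1.96 is the usual normal approximation for a 95% interval.

```python
import math

# Hypothetical log-odds coefficient with a symmetric 95% interval.
log_odds, se = 0.5, 0.2
ll_log = log_odds - 1.96 * se
hl_log = log_odds + 1.96 * se

# Exponentiate to get the odds ratio and its interval limits.
odds_ratio, ll, hl = (math.exp(v) for v in (log_odds, ll_log, hl_log))

# Whisker lengths differ on the linear scale ...
print(odds_ratio - ll, hl - odds_ratio)
# ... but match again after taking logs, which is what a log axis does.
print(math.log(odds_ratio) - math.log(ll), math.log(hl) - math.log(odds_ratio))
```

The exponentiated `odds_ratio`, `ll`, and `hl` are the values one would place in the estimate and limit columns before asking for a log-scaled x-axis.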

Horizontal space between the chart and variable column

Hello,
I followed your instructions to install the package and then tried the sample code here, but it creates a big space between the chart and the column showing the variables. I tried some other examples of my own, but the outputs look the same. Do you have any idea why? Thank you very much.

Requesting an option to change the location of x-reference line

When plotting hazard or odds ratios in forest plots it would be helpful to be able to change the location of x-reference line--to move it from x=0 to x=1.

Better backend for Confidence Intervals

There seems to be a better backend for CIs: https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.errorbar.html#matplotlib.axes.Axes.errorbar.

This should allow for asymmetrical CIs common in Odds ratios (#28).

Should affect only the backend.
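For reference, `Axes.errorbar` accepts asymmetric horizontal errors as a pair of sequences holding the distances to the left and to the right of each marker. The sketch below is a standalone matplotlib illustration, not a patch for the package; the odds ratios and limits are illustrative numbers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

estimates = [1.761471, 1.443660, 1.525227]
lows = [0.926563, 0.517059, 0.603459]
highs = [3.022285, 3.130794, 3.082003]

# errorbar takes distances from the marker as a (2, N) pair: [left, right].
left = [e - lo for e, lo in zip(estimates, lows)]
right = [hi - e for e, hi in zip(estimates, highs)]

fig, ax = plt.subplots(figsize=(4, 3))
ax.errorbar(estimates, list(range(len(estimates))), xerr=[left, right],
            fmt="o", capsize=3)
ax.axvline(1.0, linestyle="--", color="gray")  # reference line for ratios
ax.set_xscale("log")
plt.close(fig)
```

Because the left and right distances are passed separately, nothing forces the interval to be symmetric around the estimate.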

Update docs for RTD

project title is still "pyforestplot"
Table of Contents internal linking not working in RTD
Check that docstrings are consistent

See https://myst-parser.readthedocs.io/en/stable/syntax/optional.html#syntax-header-anchors

Change group variable font size

Dear all,

Thanks so much for such an amazing tool.
My question is: how can I change the font size for groupvar?
When I change the fontsize value, nothing happens to the groupvar font size.
Any suggestions, please?

best

Add tests for mplot

Add docstrings for mplot

Add option to show P-values in scientific notation

Hi @LSYS,

Thank you for this wonderful package!

I was wondering if it is possible to show P-values in scientific notation as very small P-values are displayed as 0.

Thank you!
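One possible workaround is to pre-format the p-values as strings in scientific notation and report that column as an annotation instead of the rounded values. This assumes string columns are printed as given, which the package may or may not guarantee; the column name `p_sci` is hypothetical and the two p-values come from the README example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "label": ["Mins worked per week", "Years of schooling"],
        "p-val": [1.99409e-18, 0.0115515],
    }
)

# Pre-format p-values as strings so very small values are not rounded to 0.
df["p_sci"] = df["p-val"].map(lambda p: f"{p:.2e}")
print(df["p_sci"].tolist())  # ['1.99e-18', '1.16e-02']
```

The new column could then be listed in `annote` or `rightannote` with a header such as "P-value".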

N column is also formatted to have decimals

Thanks for this neat package again!

When I add the "N" column in the table, it is automatically converting it to have one decimal point (example: 100 -> 100.0)

Tried multiple workarounds to convert it to an integer but couldn't work it out. In the example shown in the manual, it does seem to work fine.

Any help is greatly appreciated.

Update:

data['n'] = data['n'].astype("string")

Fixed it.
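If the sample-size column is stored as floats (for example after a merge that introduced missing values elsewhere), casting straight to strings would keep the trailing ".0". A slightly more defensive variant of the fix above casts to integers first; the dataframe below is a hypothetical stand-in for the reporter's data.

```python
import pandas as pd

data = pd.DataFrame({"n": [100.0, 250.0, 706.0]})  # floats, e.g. after a merge

# Cast to integers first so the text has no trailing ".0", then to strings.
data["n"] = data["n"].astype(int).astype(str)
print(data["n"].tolist())  # ['100', '250', '706']
```

Note that `astype(int)` raises an error if the column contains missing values, so those would need to be handled beforehand.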

Update `readme-examples.ipynb`

BUG failing when running on matplotlib 3.8

Thanks for your great work,

I tried running your code with the newest version of matplotlib (3.8) but unfortunately it fails.

Error from matplotlib:

*** AttributeError: 'YTick' object has no attribute 'label'

originating at forestplot/graph_utils.py line: pad = max( T.label.get_window_extent(renderer=fig.canvas.get_renderer()).width for T in yax.majorTicks )

Downgrading to matplotlib 3.7.3 solves the issue
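The error is consistent with matplotlib 3.8 dropping the long-deprecated `Tick.label` alias; `Tick.label1` refers to the same primary tick label and exists in both older and newer releases. The sketch below shows the same width computation written against `label1` on a standalone figure. It illustrates the idea only and is not the fix actually applied in the package.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_yticks([0, 1, 2])
ax.set_yticklabels(["in years", "=1 if black", "=1 if clerical worker"])
fig.canvas.draw()  # lay out the text so the extents are meaningful

renderer = fig.canvas.get_renderer()
# Tick.label1 is the primary (left-hand) label of each y tick.
pad = max(
    tick.label1.get_window_extent(renderer=renderer).width
    for tick in ax.yaxis.get_major_ticks()
)
print(pad > 0)
plt.close(fig)
```

Using `get_major_ticks()` instead of the `majorTicks` attribute keeps the code on the public Axis API.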

`mforestplot`: Prep data for `mforestplot`

Prepare example data for the planned mforestplot utility to plot forestplots where each row can have multiple models.

For example:

Reorganize mplot

Freeze matplotlib-inline dependency in setup.py

Freeze matplotlib-inline dependency to <= 0.1.3 in setup.py.

Forgot to do this in previous patch.

See #40.

Tidy imports using isort

Top of the plot gets cut off

Hello! Thank you for building this awesome package.

I'm trying to visualize some odds-ratios and their confidence intervals, and I'm running into an issue where the top of the plot seems to get cut off.

I installed forestplot today via conda with conda install -c conda-forge forestplot, and am running it on Python 3.9.6.

Data

>>> df
  varname        or       low      high
0       a  1.761471  0.926563  3.022285
1       b  1.443660  0.517059  3.130794
2       c  1.525227  0.603459  3.082003
3       d  0.895682  0.463703  1.660016
4       e  4.025670  1.196941  8.932753
5       f  1.086105  0.555171  2.045735

Plotting code

fp.forestplot(df,  # the dataframe with results data
    estimate='or',  # col containing estimated effect size 
    ll='low', hl='high',  # columns containing conf. int. lower and higher limits
    varlabel='varname',  # column containing variable label
    ylabel='Confidence interval',  # y-label title
    xlabel='Odds Ratio',  # x-label title
    color_alt_rows=True,
    figsize=(4,8),
    **{
        'xline' : 1.,
        "xlinestyle": (0, (10, 5)),  # long dash for x-reference line 
    }
)

plt.savefig('test_forest_plot.png', bbox_inches='tight')

What I'm seeing is the attached image, where the top of the plot (where variable "a" should be) is getting cut off. I've tried adjusting the plot size but the issue persists. Any insight or advice would be appreciated!

Init regression tests

Start some regression tests to ensure that old bugs do not resurface in new releases (e.g., the issue raised in #47).

Include the relevant data needed to test plot

Maintain label character formatting (making no string normalisation the default)

Label "IMD quintile" is represented as "Imd quintile"
I would propose that the default be to keep label strings as given (perhaps with an option to switch on "string normalisation").
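The behaviour reported here matches what Python's built-in `str.capitalize()` does: it upper-cases the first character and lower-cases everything else. Assuming the `capitalize` options map onto the string methods of the same names (an inference from the option names, not confirmed from the source code), the effect of each choice on this label would be:

```python
label = "IMD quintile"

# Built-in str methods with the same names as the documented options.
variants = {
    "capitalize": label.capitalize(),
    "title": label.title(),
    "lower": label.lower(),
    "upper": label.upper(),
    "swapcase": label.swapcase(),
}
for option, text in variants.items():
    print(f"{option:>10}: {text}")
```

None of these preserves an acronym such as "IMD" inside mixed-case text, so leaving the labels untouched is the only way to keep them exactly as written.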

Fix spacing issue at top of plot

Issue raised at #47

Excess Whitespace Error in Jupyter

I followed the install and quick start instructions in the README, but I'm having an issue with excess whitespace being added between the variable labels and the plot. Below are screenshots of the example as well as example with customization 5. The whitespace is added regardless of the width of the overall forest plot.

I am using Jupyter version 6.5.2 and Python 3.9.15 via Anaconda.

How do I move/remove the line on the yaxis?

There is a line along the y-axis where x=0. I want this line to be where x=1 or to remove it completely. Is there a way to do this? I've been trying to access it using spines set_position but it didn't work.

Confidence interval and p-value labels have different height and fontsize

The "Confidence interval" ylabel and the "P-value" headers have different height and fontsize:

import pandas as pd
import forestplot as fp

df = fp.load_data("sleep")

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits
              varlabel="label",  # column containing variable label
              pval="p-val",  # Column of p-value to be reported on right
              ylabel="Confidence interval",  # ylabel to print
              )

`mforestplot`: Init src code

Add src to plot mforestplot: having multiple estimates in one row. See #59.

add a wheel to pypi in addition to a tar.gz file

Got an install warning:

DEPRECATION: forestplot is being installed using the legacy 'setup.py install' method, 
because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 
23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-
pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559

It would be good to also push wheels to pypi; since this package is pure python it should be fairly straightforward. I would be happy to contribute to a PR if you would accept.

Update example #1 readme

Example with customization #1 is outdated:

import forestplot as fp

df = fp.load_data("sleep")

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              moerror="moerror",  # columns containing conf. int. margin of error
              varlabel="label",  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              groupvar="group",  # Add variable groupings 
              # group ordering
              group_order=["labor factors", "occupation", "age", "health factors", 
                           "family factors", "area of residence", "other factors"],
              sort=True,  # sort in ascending order (sorts within group if group is specified)               
              )

Thresholds and symbols are not getting passed through

kwargs for thresholds and symbols are not getting passed through to the star_pval formatter.

For example,

import pandas as pd
import forestplot as fp

df = fp.load_data("sleep")

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # columns containing conf. int. lower and higher limits
              varlabel="label",  # column containing variable label
              pval="p-val",  # Column of p-value to be reported on right
              color_alt_rows=True,  # Gray alternate rows
              ylabel="Est.(95% Conf. Int.)",  # ylabel to print
              decimal_precision=3,
              **{"thresholds":(0.001, 0.01, 0.05)}
              )

Update Workflow to test package from Conda

Add test from conda-forge installation to https://github.com/LSYS/forestplot/blob/patch/.github/workflows/nb-pkg.yml.

Feature Request: Customizable Column Name Option in forestplot

Summary:

I would like to propose a feature enhancement for the forestplot package that allows users to specify custom column names instead of the default "Variable" label. This feature would be handy for those looking to compare models or other items that require more descriptive column naming.

Current Behavior:

Currently, the forestplot package automatically assigns the name "Variable" to one of its columns. While this works well for general purposes, it needs more flexibility for cases where a different label might be more appropriate or informative (I could change it for my use cases, but it can be difficult for some users).
For example, in one of your sample codes, we can see that the first column of the forestplot is Variable:

fp.forestplot(df,  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              pval="p-val",  # column containing p-values to be formatted
              annote=["n", "power", "est_ci"],  # columns to report on left of plot
              annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"],  # ^corresponding headers
              rightannote=["formatted_pval", "group"],  # columns to report on right of plot 
              right_annoteheaders=["P-value", "Variable group"],  # ^corresponding headers
              xlabel="Pearson correlation coefficient",  # x-label title
              table=True,  # Format as a table
              )

Proposed Feature:

I suggest adding an option allowing users to specify their column names. This could be implemented as an additional argument in the relevant function(s), allowing users to choose a custom name that better fits their data or the context of their analysis.

Use Case:

For instance, in scenarios where users are comparing different models (e.g., prediction models like 'Regression', 'Random Forest', 'SVM'), having the ability to label the column as "Prediction Models" or a similar custom name would enhance the readability and relevance of the forestplot.

Benefits:

Enhanced Customization: Users can tailor the forestplot to fit the context of their data better.
Increased Clarity: Custom column names can make the plots more intuitive and informative, especially for presentations or publications.
Broader Applicability: This feature could broaden the use cases for the forestplot package, making it more versatile for various data analyses.

Thank you for considering this feature request. I believe it would be a valuable addition to the forestplot package and enhance its utility for a wide range of users.

Best regards,
@kamalakbari7

Remove whitespaces at top of the plot

Remove whitespaces at top of the plot (vertical whitespace) using the y-axis limits.

Related to #37.
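The idea of trimming vertical whitespace through the y-axis limits can be sketched on a plain matplotlib Axes. The sketch assumes rows are drawn at integer y positions starting from 0, which is a simplification for illustration and may not match how the package lays out its rows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

n_rows = 6
fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter([0.1] * n_rows, list(range(n_rows)))

# Rows sit at y = 0 .. n_rows - 1; half a row of padding on each side
# keeps the first and last markers fully visible without extra blank space.
ax.set_ylim(-0.5, n_rows - 0.5)
print(ax.get_ylim())  # (-0.5, 5.5)
plt.close(fig)
```

Tying the limits to the number of rows means the padding stays constant whether the plot has six rows or sixty.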

Known Issue: Table headers don't work as expected with 6 (or fewer) rows of data

See issue48-table-does-not-work-6rows-or-fewer.ipynb.

df = fp.load_data("sleep")

fp.forestplot(df.head(6),  # the dataframe with results data
              estimate="r",  # col containing estimated effect size 
              ll="ll", hl="hl",  # lower & higher limits of conf. int.
              varlabel="label",  # column containing the varlabels to be printed on far left
              capitalize="capitalize",  # Capitalize labels
              pval="p-val",  # column containing p-values to be formatted
              annote=["n", "power", "est_ci"],  # columns to report on left of plot
              annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"],  # ^corresponding headers
              rightannote=["formatted_pval", "group"],  # columns to report on right of plot 
              right_annoteheaders=["P-value", "Variable group"],  # ^corresponding headers
              xlabel="Pearson correlation coefficient",  # x-label title
              table=True,  # Format as a table
              )

Not including first rows of dataset, row shifting, incorrectly annotating row as left-hand labels, whereas labels on the right are correct

review_example.csv

import forestplot as fp
import pandas as pd

df = pd.read_csv("review_example.csv",sep=";")  # companion example data


fp.forestplot(df,  # the dataframe with results data
              estimate='PCSA_Men_mean',  # col containing estimated effect size 
              ll= 'PCSA_Men_Lower', hl='PCSA_Men_Upper',  # columns containing conf. int. lower and higher limits
              varlabel='Abbreviation',  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              annote=["Source", "Image modality", 'Sample_size',"Method", 'Position'],   # columns to report on left of plot
              annoteheaders=["Ref", "Modality", 'N',"PCSA", 'Pose'],  # ^corresponding headers
              rightannote=['Age', 'Height', 'Weight', 'Fiber_length', 'Pennation', "Info"],  # columns to report on right of plot 
              right_annoteheaders=['Age[y]', 'Height[cm]', 'Weight[kg]', 'Fiber_length[cm]', 'Pennation[Deg]', "Note"],  #corresponding headers
              
              groupvar= "Agegroup",  # column containing group labels
              group_order=["Reference","Young Adults","Adults"], 
              xlabel="PCSA Ratio",  # x-label title
              xticks=[0,30,60],  # x-ticks to be printed
              table=True,  # Format as a table
              color_alt_rows=True,  # Gray alternate rows
              # Additional kwargs for customizations
              **{"marker": "D",  # set marker symbol as diamond
                 "markersize": 35,  # adjust marker size
                 "xtick_size": 12,  # adjust x-ticker fontsize
                })
#plt.savefig("plot.jpg", bbox_inches="tight")

Update documentation to reflect new changes and fix errors

Update the documentation to reflect new changes and fix errors:

Add capitalize option to the examples
Add capitalize option to Table of API options
Remove moerror option
Add binder badge
Add citation metadata
Add example of "multi-forestplot"
Fix Example 5 with error "family factors"
Update examples/readme-examples.ipynb
Document logscale option

Coloring Alternate Rows text

Hello, I'm really enjoying this package but have one issue. The 'color_alt_rows' setting colors only the plot and not the text, which still makes it difficult to match up rows and read them easily. Would it be possible to shade the text as well? Thanks!

Known Issue: Left-flushing of annotations relies on the monospace font

I would like to use different fonts (e.g. Helvetica, Arial) to match journal font requirements for publication-ready figures.

A potential fix could be to use matplotlib.pyplot.table to render the table text to allow left-align with any font.

Setting 95% confidence interval limits

I have about 20 rows of data. The confidence intervals lie between 0 and 5, except for two rows whose confidence intervals range from 1.5 to 40. Plotting them all together makes most intervals very narrow because they are compressed by the wide intervals of those two rows. To make the plot readable, I wanted to limit the x-axis range, so I used xticks=[0,1,5, 10] and set_xlim(0, 10), but the resulting plot does not render properly.
[Figure: the plot without set_xlim(0, 10)]

[Figure: the plot after adding set_xlim(0, 10)]

Pass axis to forestplot call

Is there any way to pass an existing axis to forestplot? This would enable subplots.

Add note to save using `bbox_inches="tight"`

Add note in readme to save figures with the bbox_inches="tight" option.

Related to #38.
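
A minimal sketch of what such a note could show, using plain matplotlib only. The figure below is a stand-in for a forest plot, and the output path is a hypothetical temporary file; the only point is the bbox_inches="tight" argument, which asks matplotlib to fit the saved bounding box around everything drawn so that labels outside the default figure area are not clipped.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Stand-in figure; in practice this would be the figure holding the forest plot.
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_ylabel("A long label that might otherwise be clipped")

# Hypothetical output location, used here only to keep the example self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "plot.png")

# bbox_inches="tight" crops or expands the saved area to fit all artists.
fig.savefig(out_path, bbox_inches="tight")
```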

Adding lines in X axis

Is there a way to move the x-axis reference line from 0 to 1, or to add a new line on the x-axis?
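
If the line in question is a vertical reference line, one possible approach with plain matplotlib is sketched below. It assumes the plot is available as a matplotlib Axes object; the axis and data here are stand-ins, not output of this package, and whether the package's own zero line can be moved through its options is not covered by this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Stand-in axis with two estimates and horizontal error bars.
fig, ax = plt.subplots()
ax.errorbar([0.8, 1.3], [0, 1], xerr=[0.2, 0.3], fmt="o")

# Draw an additional vertical line at x = 1 spanning the full height of the axis.
line = ax.axvline(x=1, color="gray", linestyle="--", linewidth=1)
```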

Changing the line color indicating statistical significance

Great package!

Where in the code should I look if I want to modify the plotting so that I can change the color of a line when the corresponding row is statistically significant? I don't want to show the p-values or the stars, I just want to change the line color.

Plot hides left hand side with variable names and confidence intervals

When running the following code I get a plot without variable names and confidence intervals. I'm running matplotlib 3.6.2, numpy 1.22.4 and pandas 1.5.1.

import csv
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import forestplot as fp

N = 10

ids = ["Number is: " + str(x) for x in range(N) ]

coef = np.random.uniform(0.4,2, size=N)
lower = coef - np.random.uniform(size=N)
upper = coef + np.random.uniform(size=N)

data = pd.DataFrame({"c":coef, "lower": lower, "upper": upper, "name": ids})
fig = fp.forestplot(data, estimate="c", ll="lower", hl="upper", varlabel="name")
plt.savefig("debug.png")

Add readme for mplot

Switch `append` to `concat`

The pandas DataFrame.append method has been deprecated, so calls to it should be replaced with pandas.concat:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html#pandas-dataframe-append
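
A minimal sketch of the switch, assuming rows are accumulated as separate dataframes. The frame and column names are hypothetical and are not taken from the package's source; DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical frames standing in for whatever is being appended.
results = pd.DataFrame({"var": ["age"], "r": [0.09]})
new_row = pd.DataFrame({"var": ["black"], "r": [-0.03]})

# Before (deprecated in pandas 1.4, removed in pandas 2.0):
#   results = results.append(new_row, ignore_index=True)

# After: pd.concat takes a list of frames and returns a new frame.
results = pd.concat([results, new_row], ignore_index=True)
```

Both forms return a new dataframe rather than modifying the original in place, so the assignment pattern stays the same.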

Passing x-tick labels

Hi,

Thanks for this great package. I was wondering if there is an argument to pass the x-tick labels. I'd like to plot odds ratios on the log scale, but label them with their actual OR values. For example, I'd like to mark the ticks at [math.log(0.5), math.log(1), math.log(1.5)] and label them as [0.5,1,1.5].
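
One way this could be done outside the package, sketched with plain matplotlib under the assumption that the plot is available as a matplotlib Axes object. The axis below is a stand-in, and this is a workaround, not a documented argument of the package.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

odds_ratios = [0.5, 1, 1.5]                           # values to print on the axis
tick_positions = [math.log(v) for v in odds_ratios]   # where they sit on the log scale

# Stand-in axis; in practice this would be the axis the estimates were drawn on.
fig, ax = plt.subplots()
ax.set_xlim(math.log(0.4), math.log(2))

# Place ticks at the log-scale positions, then label them with the raw odds ratios.
ax.set_xticks(tick_positions)
ax.set_xticklabels([str(v) for v in odds_ratios])
```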

Duplicate values in separate groupings bug

There seems to be a bug where if you have duplicate values in separate groupings the plot does not show some of the rows.

import sys
import forestplot as fp
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

print(
    f"python version: {sys.version}",
    f"pandas version: {pd.__version__}",
    f"matplotlib version: {matplotlib.__version__}",
    f"forestplot version: {fp.__version__}",
    sep='\n'
)
# python version: 3.8.1 (default, Feb  3 2020, 12:44:18) 
# [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
# pandas version: 1.4.3
# matplotlib version: 3.4.2
# forestplot version: 0.3.1

def create_data():
    group_a = pd.DataFrame({'name': ['name_a', 'name_b'], 'estimate': [1.1, 1.0]})
    group_a['Lower CI'] = group_a['estimate'] - 0.05
    group_a['Upper CI'] = group_a['estimate'] + 0.05
    group_a['group'] = "group_a"

    group_b = group_a.copy()
    group_b['group'] = 'group_b'
    groups = pd.concat([group_a, group_b], axis=0) 
    # group_a["group"] = "group_a"
    return groups

df = create_data()
display(df)

print("Missing part of the plot")
fp.forestplot(df,
              estimate='estimate', varlabel='name', ll="Lower CI", hl="Upper CI", groupvar="group")
plt.show()

# print("Still missing part of the plot")
# df.loc[df['group'] == 'group_b', ['estimate', "Lower CI", "Upper CI"]] += 0.001
# fp.forestplot(df,
#               estimate='estimate', varlabel='name', ll="Lower CI", hl="Upper CI", groupvar="group")
# plt.show()

print("Now it works")
df.loc[df['group'] == 'group_b', ['estimate', "Lower CI", "Upper CI"]] += 0.01
fp.forestplot(df,
              estimate='estimate', varlabel='name', ll="Lower CI", hl="Upper CI", groupvar="group")
plt.show()

Add Contributing.md and Code_of_conduct.md

These two files are considered an important part of any project, so we should consider adding them.

Remove excessive whitespace between labels and plot

Remove the excessive horizontal whitespace between labels and plot by freezing matplotlib-inline to version <=0.1.3.

Related to #37 and #38.
