
ckdckd145 / statmanager-kr


Open-source statistical package in Python based on Pandas

Home Page: https://cslee145.notion.site/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4

License: MIT License

Python 92.38% Jupyter Notebook 6.85% TeX 0.77%
research social-science statistical-analysis statistics clinical-trials correlation-analysis null-hypothesis pandas pandas-python statistical-methods

statmanager-kr's Introduction


Statmanager-kr

Open-source statistical package for Python based on Pandas.

statmanager-kr is intended especially for researchers, data scientists, psychologists, students, and anyone interested in conducting hypothesis testing. It aims to be convenient to use, uncomplicated, and easy to read results from. The end goal of statmanager-kr is to be a simple and useful package that can be used by people who don't know much about Python and Pandas.

Currently, Korean and English are supported as language settings.

Documentation

Official Documentation (Korean)
Official Documentation (English)

Notifications:

Source code is available in the Github repository.

For updates, please see the notice in the documentation or the Github release.

Please use Github Discussion to let me know about any questions, bugs you encounter, suggestions, etc. Of course, you can also email the developer directly.

  • Quick Start with sample Jupyter Notebook file

  • Available functions

    • Read detailed instructions
      1. Normality assumption
      2. Homoskedasticity assumption
      3. Reliability
      4. Frequency analysis
      5. Correlation analysis
      6. Comparison (2)
      7. Comparison (3)
      8. Regression
  • Available functions to make figures or graphs

    • P-P plot
    • Q-Q plot
    • Histogram
    • Histogram (cumulative)
    • Pointplot (within differences)
    • Boxplot (between group difference)

Dependency

  • pandas
  • statsmodels
  • scipy
  • numpy
  • matplotlib
  • seaborn
  • XlsxWriter

Recommendation

Using Jupyter Notebook is strongly recommended, although statmanager-kr works just as well in a plain Python environment.

Installing statmanager-kr

pip install statmanager-kr

Updating statmanager-kr

pip install statmanager-kr --upgrade

Quick Start

Import

import pandas as pd
from statmanager import Stat_Manager

df = pd.read_csv('testdf.csv', index_col = 'name')
sm = Stat_Manager(df, language = 'eng')

Independent Samples T-test

sm.progress(method = 'ttest_ind', vars = 'age', group_vars = 'sex').figure()
Output:

          female   male
n          15.00  15.00
mean       27.33  28.00
median     26.00  26.00
sd          4.88   6.94
min        21.00  20.00
max        39.00  39.00

dependent variable   t-value   degree of freedom   p-value   95% CI            Cohen's d
age                  -0.304    28                  0.763     [-5.153, 3.820]   -0.111
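
The t-value and Cohen's d above can be checked by hand from the descriptive table. The following is a minimal sketch, assuming a pooled-variance (Student's) t-test and a pooled-SD Cohen's d. The function name is hypothetical and not part of statmanager-kr, and whether the package computes these values in exactly this way is an assumption. Because the inputs are the rounded values from the table, the results agree with the reported ones only to about two decimal places.

```python
import math


def pooled_t_and_d(n1, mean1, sd1, n2, mean2, sd2):
    """Student's t, degrees of freedom and Cohen's d from summary statistics.

    Hypothetical helper for illustration; not a statmanager-kr function.
    """
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    # Standard error of the difference between the two group means.
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff / se, df, diff / pooled_sd


# Rounded values taken from the descriptive table (female first, then male).
t_value, df, cohens_d = pooled_t_and_d(15, 27.33, 4.88, 15, 28.00, 6.94)
print(f"t = {t_value:.2f}, df = {df}, d = {cohens_d:.2f}")
```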


Dependent Samples T-test

sm.progress(method = 'ttest_rel', vars = ['prescore', 'postscore']).figure()
Output:

          prescore  postscore
n
mean          5.13       4.23
median        5.50       4.00
sd            2.85       2.91
min
max

variables                   t-value   degree of freedom   p-value   95% CI            Cohen's d
['prescore', 'postscore']   1.198     29                  0.24      [-0.636, 2.436]   0.313
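
The 95% CI in this table is consistent with the usual paired-samples interval, mean difference ± t_crit × standard error. A minimal sketch follows, assuming that formula. The helper name is hypothetical, the standard error is recovered from the reported t-value, and the critical value 2.045 (two-sided 95%, 29 degrees of freedom) is taken from a standard t-table rather than computed.

```python
def paired_ci(mean_diff, t_value, t_crit):
    """Confidence interval for a paired mean difference.

    Hypothetical helper for illustration; not a statmanager-kr function.
    """
    # t = mean_diff / se, so the standard error can be recovered from t.
    se = mean_diff / t_value
    half_width = t_crit * se
    return mean_diff - half_width, mean_diff + half_width


# Two-sided 95% critical value of Student's t with 29 degrees of freedom
# (standard table value, hard-coded here as an assumption of this sketch).
T_CRIT_DF29 = 2.045

# Mean difference from the descriptive table: prescore minus postscore.
low, high = paired_ci(5.13 - 4.23, 1.198, T_CRIT_DF29)
print(f"95% CI = [{low:.2f}, {high:.2f}]")
```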


Pearson's Correlation

sm.progress(method = 'pearsonr', vars = ['income', 'prescore', 'age']).figure()
Output:

                    n    Pearson's r   p-value   95%_confidence_interval
income & prescore   30   -0.103        0.588     [-0.447, 0.267]
income & age        30   -0.051        0.789     [-0.404, 0.315]
prescore & age      30   -0.044        0.816     [-0.398, 0.321]

           income  prescore     age
income      1.000    -0.103  -0.051
prescore   -0.103     1.000  -0.044
age        -0.051    -0.044   1.000
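
The confidence intervals in the first table are consistent with the Fisher z-transformation, a common way to build an interval around Pearson's r. The sketch below assumes that method. The function name is hypothetical, and whether statmanager-kr computes the interval in exactly this way is an assumption.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, confidence=0.95):
    """Confidence interval for Pearson's r via the Fisher z-transformation.

    Hypothetical helper for illustration; not a statmanager-kr function.
    """
    z = math.atanh(r)                # Fisher transform of r
    se = 1 / math.sqrt(n - 3)        # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = pearson_ci(-0.103, 30)   # income & prescore
print(f"95% CI = [{low:.3f}, {high:.3f}]")
```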


One-way ANOVA with Post-hoc test

sm.progress(method = 'f_oneway', vars = 'age', group_vars = 'condition', posthoc = True).figure()
Output:

          test_group  sham_group  control_group
n                 10          10             10
mean            28.5        28.3           26.2
median            27          29           25.5
sd              6.57        5.56           5.88
min
max

               sum_sq   df   F         p-value   partial eta squared
Intercept      6864.4   1    189.469   0         0.872
C(condition)   32.467   2    0.448     0.644     0.004
Residual       978.2    27   NaN       NaN       0.124

Test Multiple Comparison ttest_ind FWER=0.05 method=bonf alphacSidak=0.02, alphacBonf=0.

group1          group2       stat      pval     pval_corr   reject
control_group   sham_group   -0.8204   0.4227   1           FALSE
control_group   test_group   -0.8246   0.4204   1           FALSE
sham_group      test_group   -0.0735   0.9422   1           FALSE
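
The F value for C(condition) can be reproduced from the sum_sq and df columns of the ANOVA table, since F is the ratio of the effect mean square to the residual mean square. A minimal sketch, with a hypothetical helper name that is not part of statmanager-kr:

```python
def f_statistic(ss_effect, df_effect, ss_resid, df_resid):
    """F ratio: effect mean square divided by residual mean square."""
    ms_effect = ss_effect / df_effect
    ms_resid = ss_resid / df_resid
    return ms_effect / ms_resid


# Values from the C(condition) and Residual rows of the table above.
f_value = f_statistic(32.467, 2, 978.2, 27)
print(f"F = {f_value:.3f}")
```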


One-way Repeated Measure ANOVA with Post-hoc test

sm.progress(method = 'f_oneway_rm', vars = ['prescore','postscore','fupscore'], posthoc = True).figure()
Output:

          prescore  postscore  fupscore
n            30.00      30.00     30.00
mean          5.13       4.23      4.37
median        5.50       4.00      4.00
sd            2.85       2.91      2.62
min
max

           F Value   Num DF   Den DF   p-value   partial eta squared
variable   1.079     2        58       0.347     0.02

Test Multiple Comparison ttest_ind FWER=0.05 method=bonf alphacSidak=0.02, alphacBonf=0.

group1      group2      stat      pval     pval_corr   reject
fupscore    postscore   0.1866    0.8526   1           FALSE
fupscore    prescore    -1.0849   0.2824   0.8473      FALSE
postscore   prescore    -1.2106   0.231    0.6929      FALSE
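
The pval_corr column follows the Bonferroni rule named in the table header (method=bonf): each raw p-value is multiplied by the number of comparisons and capped at 1. A minimal sketch with a hypothetical helper name follows. Small differences in the fourth decimal arise because the raw p-values in the table are already rounded.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Raw p-values from the pval column of the post-hoc table above.
corrected = bonferroni([0.8526, 0.2824, 0.231])
print([round(p, 3) for p in corrected])
```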


Development: Changseok Lee

Copyright (C) 2023 Changseok Lee

statmanager-kr's People

Contributors

ckdckd145

statmanager-kr's Issues

JOSS Reviewer Issue

Hi!
nice software you have there. I will use this issue to keep track of problems I found on the way to use it.

  • the differences to pingouin are mentioned in the paper, but I would also add them to the readme under "related software" with a brief discussion of the strengths of each
  • a compatible python version should be provided. I'm running my tests with python 3.12, the current version
  • clarification: "with df that have a structure of WIDE-RANGE. " => is WIDE-RANGE the same as a tidy dataframe in R?
  • while the error message is generally good (it tells me what I need to do), I don't understand why I need to specify an index for the DF.
  • Search sm.howtouse("fgiure") for the function to draw pictures and graphs! => typo in figure
  • in the howtouse() output it would be nice to get an example of how to actually use a method, e.g. something like sm.progress(method = 'ttest_ind', vars = 'age', group_vars = 'sex').figure() - (not for every method, just one general example to get an idea how to use it)
  • if I define an id="species", I cannot use it anymore as grouping variable
  • if I specify more than 2 groups in ttest_ind, I get a very cryptic error: AxisError: axis must be an integer, a tuple of integers, or None.
  • typo: Indenpendent in howtouse
  • if I run ttest_rel I get a KeyError: 's' if I use the wrong call syntax (which is impossible to find out from the REPL/jupyter notebook, you have to look into the actual manual).
  • automated testing: While in the docs in notion there are some printed comparisons against e.g. scipy, it is not clear whether these are run automatically (I didn't find the code for the documentation), or whether the author checked each for equivalence
  • in a spot-check I saw that the Cohen's d calculation was implemented by the author, but I cannot confirm that there exists a test for these functions.
  • The JOSS Part " Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support" is not completed as of now (or I missed it)

I will continue later, but the unittesting is really important, especially in a stats-package
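
As an illustration of the kind of unit test meant here, the following is a minimal sketch. The cohens_d function is a hypothetical stand-in written for this example (pooled-SD definition); it is not the package's own implementation, whose name and signature are not shown in this issue.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Pooled-SD Cohen's d. Hypothetical stand-in, not statmanager-kr code."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)   # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


def test_cohens_d_known_value():
    # Both samples have SD 1 and the means differ by 1, so d must be -1.
    assert math.isclose(cohens_d([1, 2, 3], [2, 3, 4]), -1.0)


def test_cohens_d_is_antisymmetric():
    # Swapping the two groups should only flip the sign of d.
    a, b = [2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 5.0]
    assert math.isclose(cohens_d(a, b), -cohens_d(b, a))


test_cohens_d_known_value()
test_cohens_d_is_antisymmetric()
```

Tests of this shape can be collected by pytest or run directly, and the same pattern extends to comparing a package result against a reference value from scipy.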

Linking back to (openjournals/joss-reviews/issues/6642)

JOSS review issue2

I've been exploring your package and appreciate its capabilities. However, I encountered a couple of issues that I would like to bring to your attention:

  1. The use of index_col="name" as suggested in the Quickstart guide does not work with the provided testdata/testdf.csv. It functions correctly when I switch to another variable, such as "id".

  2. The bar plot generated by the command sm.progress(method='ttest_ind', vars='age', group_vars='sex').figure() appears misaligned. Is there a specific version of the plotting library that I should be using to ensure proper alignment?


Planned refactoring into functional units (this will take a significant amount of time)

Currently, statmanager-kr is coded so that every analysis proceeds in a prescribed manner within manager.py.
It functions as intended and most bugs appear to have been fixed, but updating or modifying individual features is somewhat inconvenient.

So I want to restructure the code before adding more statistical analysis features.
As a result, updates may be delayed a bit. Thanks for your patience!
