- Implementation of ChiMerge (https://www.aaai.org/Papers/AAAI/1992/AAAI92-019.pdf)
- Works the `sklearn` way
- Supervised discretization using the `target`, Chi2 statistics & test
- Can be configured to multiprocess (`n_jobs`)
from discretization.chi_merge import ChiMerge
chi_merge = ChiMerge(con_features=X.columns, significance_level=0.1, n_jobs=-3)
chi_merge.fit_transform(X)
It follows the rules below.
- When a continuous feature is discretized:
  - Within an interval, the class frequencies should be stable.
  - Two adjacent intervals should not have similar class frequencies.
    - This is tested with the Chi2 test.
- Open question: what if k adjacent intervals are considered instead of 2?
  The statistic should then be normalized, and the formula in the paper adjusted to reflect that.
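
The rules above can be illustrated with a minimal, single-feature sketch of the merge loop described in the ChiMerge paper. This is not the code in this repository: the names `chi2_adjacent` and `chi_merge_1d` are hypothetical, the threshold is passed in directly instead of being derived from `significance_level`, and skipping classes that are absent from both intervals is an assumption made here to avoid division by zero.

```python
def chi2_adjacent(counts_a, counts_b):
    """Chi2 statistic for two adjacent intervals.

    counts_a and counts_b hold the per-class sample counts of each interval.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            # Assumption: a class absent from both intervals contributes nothing.
            continue
        for observed, row in ((a, n_a), (b, n_b)):
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_merge_1d(values, labels, threshold):
    """Return the lower bound of each interval after merging.

    Starts with one interval per distinct value and repeatedly merges the
    adjacent pair with the lowest Chi2 statistic until every adjacent pair
    reaches the threshold.
    """
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}

    # Per-class counts for each distinct value.
    table = {}
    for v, y in zip(values, labels):
        table.setdefault(v, [0] * len(classes))[index[y]] += 1

    bounds = sorted(table)
    counts = [table[v] for v in bounds]

    while len(counts) > 1:
        stats = [chi2_adjacent(counts[i], counts[i + 1])
                 for i in range(len(counts) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # all adjacent pairs differ enough in class frequency
        # Merge interval i + 1 into interval i.
        counts[i] = [a + b for a, b in zip(counts[i], counts[i + 1])]
        del counts[i + 1]
        del bounds[i + 1]
    return bounds


# 2.706 is the Chi2 critical value for significance level 0.1 with
# 1 degree of freedom (2 classes).
values = [1, 2, 3, 4, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(chi_merge_1d(values, labels, threshold=2.706))
```

In this toy data the class changes only between 4 and 10, so same-class neighbours are merged first and the loop stops once the two remaining intervals have clearly different class frequencies.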
- ChiMerge: Discretization of Numeric Attributes (https://www.aaai.org/Papers/AAAI/1992/AAAI92-019.pdf)
- Discretization: An Enabling Technique (https://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/liu_dmkd02.pdf)
TODO
- dataset to s3