EC503

20190331

#EC503 Minutes of Meeting

First, we tried several algorithms to classify the data: DRO, Logistic Regression, KNN, SVM, and Decision Tree. 🌲 The results were poor. Although the accuracy was high for every algorithm, that was only because the dataset is heavily imbalanced: 90% of the samples are labeled 0 and only 10% are labeled 1. A classifier that predicts 0 for every test sample therefore already reaches 90% accuracy. The AUC was very low, only 0.5, which is no better than random guessing. Normalizing the data did not change the results: they were still very bad.
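To make the accuracy-versus-AUC point concrete, here is a minimal sketch in plain Python. The labels are hypothetical and only mimic the 90/10 split described above, and the helper names `accuracy` and `roc_auc` are illustrative, not taken from the project code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney U formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels with the same 90/10 split as in the notes.
y_true = [0] * 90 + [1] * 10

# A "classifier" that ignores its input and always predicts class 0.
y_pred = [0] * len(y_true)
scores = [0.0] * len(y_true)  # the same score for every sample

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

The constant predictor reaches 90% accuracy while its AUC stays at 0.5, so on data this skewed accuracy by itself says little about whether the minority class is being recognized.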

Plan for the second week:

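- Extract the samples labeled 1 and increase their proportion in the training data, then check whether the results improve.
- Find a way to detect and remove outliers (identify outliers).
- Do we need dimensionality reduction?
- Try optimizing not for overall accuracy but for "predict all the 1s correctly first, then worry about the 0s".
- Look at the kernels on Kaggle: https://www.kaggle.com/c/santander-customer-transaction-prediction/kernels
- Look up detection methods for the case where the data is biased.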
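The first item of the plan, raising the share of label-1 samples, could be tried with plain random oversampling. The sketch below is one possible approach, not the method the team settled on. The function name `oversample_minority`, the `target_ratio` parameter, and the toy data are all assumptions made for illustration.

```python
import random


def oversample_minority(X, y, minority_label=1, target_ratio=0.5, seed=0):
    """Duplicate minority rows at random until they make up roughly
    `target_ratio` of the returned dataset."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    if not minority:
        raise ValueError("no minority samples to oversample")
    # Solve n_min / (n_min + n_maj) = target_ratio for n_min.
    wanted = int(round(target_ratio * len(majority) / (1.0 - target_ratio)))
    n_extra = max(0, wanted - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    combined = majority + minority + extra
    rng.shuffle(combined)
    X_new = [x for x, _ in combined]
    y_new = [t for _, t in combined]
    return X_new, y_new


# Toy data with a 90/10 split; the feature values are made up.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = oversample_minority(X, y, target_ratio=0.5)
print(len(y_bal), sum(y_bal) / len(y_bal))
```

Resampling should be applied to the training split only. If duplicated rows leak into the test split, the evaluation no longer reflects the original class distribution.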

Suggestions:

- Survey the problem of dataset bias:
  - Undoing the Damage of Dataset Bias
  - Unbiased Look at Dataset Bias
- Survey outlier detection problems.
- Imbalanced Learn

20190422

Reference paper: ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. It uses synthetic data generation to produce new data points for training, and it applies to all machine learning methods.

For SVM, see the following two papers, which combine active learning with SVM:

- S. Ertekin, J. Huang, and C. L. Giles, “Active Learning for Class Imbalance Problem,” in Proc. Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 823-824, Amsterdam, Netherlands, 2007.
- S. Ertekin, J. Huang, L. Bottou, and C. L. Giles, “Learning on the Border: Active Learning in Imbalanced Data Classification,” in CIKM’07, November 6-8, 2007, Lisboa, Portugal.

For NN, see the following paper:

- Z. H. Zhou and X. Y. Liu, “Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 63-77, 2006.
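The synthetic data generation idea from the ADASYN note above can be sketched roughly as follows. This is a simplified, ADASYN-style illustration in plain Python under stated assumptions, not a faithful reimplementation of the paper. In particular, it draws each interpolation partner from the `k` nearest minority samples, which may differ from the paper's exact procedure. The function name `adasyn_sketch`, its parameters, and the toy data are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def adasyn_sketch(X, y, minority_label=1, k=5, beta=1.0, seed=0):
    """Generate synthetic minority points, placing more of them near
    minority samples whose neighbourhoods are dominated by the majority."""
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == minority_label]
    n_min, n_maj = len(min_idx), len(y) - len(min_idx)
    total = int(round((n_maj - n_min) * beta))  # synthetic points wanted
    if total <= 0 or n_min < 2:
        return []

    def nearest(i, candidates):
        # Indices of the k candidates closest to X[i], excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: euclidean(X[i], X[j]))
        return others[:k]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for i in min_idx:
        nn = nearest(i, range(len(X)))
        ratios.append(sum(y[j] != minority_label for j in nn) / len(nn))
    norm = sum(ratios)
    if norm == 0:  # no minority point has a majority neighbour
        weights = [1.0 / n_min] * n_min
    else:
        weights = [r / norm for r in ratios]

    synthetic = []
    for i, w in zip(min_idx, weights):
        neighbours = nearest(i, min_idx)  # minority-only neighbours
        for _ in range(int(round(w * total))):
            j = rng.choice(neighbours)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy 2-D data with a 90/10 split; the values are made up for illustration.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

new_points = adasyn_sketch(X, y)
print(len(new_points))
```

The generated points would be appended to the training set with label 1. Because each minority sample's share is rounded separately, the number of synthetic points is only approximately the class gap.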
