
EC503

20190331

# EC503 Minutes of Meeting

First, we tried several algorithms to classify the data: DRO, Logistic Regression, KNN, SVM, and Decision Tree 🌲. The results were poor. Every algorithm reached high accuracy, but only because the dataset is heavily imbalanced: 90% of the samples are labeled 0 and only 10% are labeled 1. A classifier that predicts 0 for every test sample therefore already scores 90% accuracy. The AUC, however, was only 0.5, which is no better than random guessing. Normalizing the data changed nothing, and the results were still very poor.
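The following minimal Python sketch (standard library only) makes the gap between accuracy and AUC concrete. It uses a hypothetical 90/10 label vector with an "always predict 0" classifier that gives every sample the same score. The helper names `accuracy` and `roc_auc` are illustrative and do not come from the project. AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels with the same 90/10 split described above.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100    # "always predict 0" baseline
scores = [0.0] * 100  # constant score: no ranking information at all

print(accuracy(y_true, y_pred))  # 0.9
print(roc_auc(y_true, scores))   # 0.5
```

High accuracy paired with an AUC of 0.5 is exactly the signature of a classifier that has learned nothing beyond the majority class.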

Plan for the second week:

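  1. Extract the samples labeled 1 and increase their proportion in the data, then check whether the results improve.
  2. Find a way to detect and remove outliers (identify outliers).
  3. Do we need dimensionality reduction?
  4. Could we try optimizing not for overall accuracy, but for “get all the 1s right first, then worry about the 0s”?
  5. Look at the Kaggle kernels: https://www.kaggle.com/c/santander-customer-transaction-prediction/kernels
  6. Look up classification methods for biased (class-imbalanced) data.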
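Plan item 1 could start from plain random oversampling of the label-1 rows. The sketch below is an assumption and not the team's code. The function `random_oversample` and its `target_ratio` parameter are hypothetical, and rows are represented as plain Python lists.

```python
import random

def random_oversample(X, y, minority_label=1, target_ratio=0.5, seed=0):
    """Append randomly duplicated minority rows until they form `target_ratio` of the data.

    `target_ratio` must lie strictly between 0 and 1; its value is a hypothetical choice.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority)
    # Solve n_min / (n_min + n_majority) = target_ratio for n_min.
    n_min_target = round(target_ratio * n_majority / (1 - target_ratio))
    n_extra = max(0, n_min_target - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    X_new = list(X) + [X[i] for i in extra]
    y_new = list(y) + [y[i] for i in extra]
    return X_new, y_new

# Hypothetical 90/10 data with a single feature per row.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
X_bal, y_bal = random_oversample(X, y)
print(sum(y_bal), len(y_bal))  # 90 180
```

Oversample only the training split. If duplicated label-1 rows also end up in the test set, the evaluation will be optimistically biased.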

Suggestions:

20190422

  • Reference: ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. It uses synthetic data generation to produce new data points for training and works with any machine learning method.

  • For SVM, see the following two papers, which combine active learning with SVM:

  • S. Ertekin, J. Huang, and C. L. Giles, “Active Learning for Class Imbalance Problem,” in Proc. Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 823-824, Amsterdam, Netherlands, 2007.

  • S. Ertekin, J. Huang, L. Bottou, and C. L. Giles, “Learning on the Border: Active Learning in Imbalanced Data Classification,” in CIKM’07, November 6-8, 2007, Lisboa, Portugal.

  • For NN, see the following paper:

  • Z. H. Zhou and X. Y. Liu, “Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 63-77, 2006.
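The ADASYN suggestion can be illustrated with a compact standard-library sketch. Each minority point receives a number of synthetic samples proportional to the share of majority points among its `k` nearest neighbours. Each synthetic point is a random interpolation toward one of that point's `k` nearest minority neighbours. This is a simplified reading of the paper and not a faithful reimplementation. It skips the imbalance-degree threshold check and rounds the per-point counts, so the total can differ slightly from G. The function `adasyn` and its defaults are hypothetical. The neighbour search is brute force, which is fine for illustration but too slow for large datasets. For real experiments, the `imbalanced-learn` package provides an `ADASYN` class.

```python
import math
import random

def _knn_indices(i, X, pool, k):
    """Indices (taken from `pool`) of the k rows nearest to X[i], excluding i itself."""
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]

def adasyn(X, y, minority_label=1, k=5, beta=1.0, seed=0):
    """Simplified ADASYN sketch; returns only the synthetic minority rows."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    minority = [i for i in all_idx if y[i] == minority_label]
    n_majority = len(X) - len(minority)
    # Total number of synthetic samples: G = (m_l - m_s) * beta.
    G = (n_majority - len(minority)) * beta

    # r_i: fraction of majority points among the k nearest neighbours of minority point i.
    r = []
    for i in minority:
        neigh = _knn_indices(i, X, all_idx, k)
        r.append(sum(y[j] != minority_label for j in neigh) / k)
    total = sum(r)
    if total == 0:
        return []  # no minority point has majority neighbours, so there is nothing to adapt to

    synthetic = []
    for i, r_i in zip(minority, r):
        g_i = round(r_i / total * G)  # harder-to-learn points get more synthetic samples
        minority_neigh = _knn_indices(i, X, minority, k)
        if not minority_neigh:
            continue  # a lone minority point has no partner to interpolate toward
        for _ in range(g_i):
            z = rng.choice(minority_neigh)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return synthetic

# Hypothetical 1-D data: 15 majority points (0..14) and 5 minority points (15..19).
X = [[float(i)] for i in range(20)]
y = [0] * 15 + [1] * 5
synthetic = adasyn(X, y, k=5, seed=1)
print(len(synthetic))
```

As in random oversampling, the synthetic rows should be added to the training split only.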
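Plan item 4 and the cost-sensitive NN reference both come down to making errors on label 1 cost more than errors on label 0. One generic way to do that is a class-weighted loss, which is not necessarily the method of the Zhou and Liu paper. The sketch assumes binary labels 0/1 and predicted probabilities, and both function names are hypothetical. `balanced_class_weights` uses the heuristic n_samples / (n_classes * count_c), which is the formula scikit-learn documents for `class_weight='balanced'`.

```python
import math
from collections import Counter

def balanced_class_weights(y):
    """Per-class weight n_samples / (n_classes * count_c): rarer classes get larger weights."""
    counts = Counter(y)
    n, n_classes = len(y), len(counts)
    return {c: n / (n_classes * cnt) for c, cnt in counts.items()}

def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Mean binary cross-entropy with each sample's term scaled by the weight of its true class."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip probabilities to avoid log(0)
        loss = -(t * math.log(p) + (1 - t) * math.log(1 - p))
        total += weights[t] * loss
    return total / len(y_true)

y = [0] * 90 + [1] * 10
w = balanced_class_weights(y)
print(w[1])  # 5.0
# Each class now carries the same total weight (about 50 here), so the ten label-1
# rows count as much as the ninety label-0 rows in the loss.
print(weighted_log_loss(y, [0.1] * 100, w))
```

The same weights can be passed to any learner that accepts per-sample or per-class weights, so one weighting scheme can be compared across the algorithms tried in the first week.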

Contributors: zwc662, yinzhusu, qipanyang, liangxy8
