Clustering_InterestsSurvey

Dimensionality reduction on 217 interest features, followed by cluster analysis to identify communities.

Summary

Assignment for a machine learning course.

  • Dataset: 217 interests (features), 6,340 people (samples).
  • Data pre-processing:
    • Handle missing values and outliers so that they do not distort the cluster analysis.
    • For example, exclude people with too many or too few interests.
  • Dimensionality reduction:
    1. Low variance filter (subset selection)
      • For clustering, a feature whose values are concentrated on a single value carries little information for telling samples apart, so it contributes little to the result.
      • Every feature is binary (1 or 0), so each one follows a Bernoulli distribution with variance p*(1-p), where p is the share of samples with value 1.
      • Filtering on the variance removes features that take the same value (0 or 1) in more than 80% of the samples, i.e. features with variance below 0.8*(1-0.8) = 0.16.
    2. Principal Component Analysis
      • Because the dataset is unlabeled, PCA is more suitable than LDA, which requires class labels.
      • PCA finds the projection axes along which the projected data has the maximum variance.
  • Clustering:
    • K-means was selected as the clustering algorithm.
    • Reason:
      • Because the data is unlabeled, unsupervised learning is required.
      • Outliers were excluded during pre-processing, so K-means, which is sensitive to noisy data, is not distorted by them.
    • Choice of K: when K < 4, the curve drops sharply; when K > 4, the curve flattens out, so the inflection point K = 4 is taken as the best value.
  • Result Interpretation:
    • There are four groups in total.
    • The interests of each group are analyzed according to the group's pattern on the principal components.
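The pre-processing rule and the low variance filter from the summary can be sketched as follows. This is a minimal illustration with NumPy, not the assignment's actual code: the function names, the `min_count`/`max_count` bounds, and the toy matrix are assumptions made for the example. Only the 80% cut-off comes from the summary.

```python
import numpy as np


def filter_people_by_interest_count(X, min_count, max_count):
    """Keep rows (people) whose number of selected interests lies in [min_count, max_count].

    The bounds are hypothetical; the summary does not state the ones actually used.
    """
    counts = X.sum(axis=1)  # number of 1s per person
    keep = (counts >= min_count) & (counts <= max_count)
    return X[keep], keep


def low_variance_filter(X, dominance=0.8):
    """Drop binary features whose variance is below dominance * (1 - dominance)."""
    threshold = dominance * (1.0 - dominance)  # about 0.16 for the 80% rule
    p = X.mean(axis=0)                         # share of 1s per feature
    variances = p * (1.0 - p)                  # Bernoulli variance p*(1-p)
    keep = variances >= threshold
    return X[:, keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for the survey matrix: 200 people x 5 binary interests,
    # where column j is 1 with probability probs[j].
    probs = np.array([0.5, 0.95, 0.05, 0.4, 0.7])
    X = (rng.random((200, 5)) < probs).astype(int)

    X_rows, kept_rows = filter_people_by_interest_count(X, min_count=1, max_count=4)
    X_reduced, kept_cols = low_variance_filter(X_rows)
    print("people kept:", int(kept_rows.sum()), "of", len(X))
    print("features kept:", np.flatnonzero(kept_cols).tolist())
```

In the toy data, the interests chosen by about 95% or about 5% of people are the ones the filter is meant to discard, since almost everyone gives the same answer for them.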
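To make the PCA step concrete, here is a small sketch that projects data onto its directions of maximum variance using a singular value decomposition in NumPy. The function name `pca_project`, the number of components, and the toy data are assumptions; the summary does not say how many components were kept or which library was used.

```python
import numpy as np


def pca_project(X, n_components):
    """Project X onto its first n_components principal axes.

    Returns (scores, components, explained_variance_ratio).
    Assumes X is not constant, so the total variance is non-zero.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T             # coordinates on the axes
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return scores, components, ratio[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy binary matrix standing in for the filtered survey data.
    X = (rng.random((300, 20)) < 0.4).astype(int)
    scores, components, ratio = pca_project(X, n_components=3)
    print("projected shape:", scores.shape)
    print("explained variance ratio:", np.round(ratio, 3))
```

Each row of `components` holds one weight per original interest, which is what makes it possible to read a group's position on a principal component back in terms of interests.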
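The choice of K can be illustrated with an elbow curve. The sketch below assumes that "the curve" in the summary is the within-cluster sum of squared errors (SSE) plotted against K, which the summary does not state explicitly. The K-means implementation, its k-means++ seeding, the restart count, and the synthetic four-blob data are all choices made for this example, not the assignment's code.

```python
import numpy as np


def _sq_dists(X, centers):
    """Squared Euclidean distance from every point to every center, shape (n, k)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def _kmeanspp_init(X, k, rng):
    """k-means++ seeding: pick each new center with probability proportional to
    its squared distance from the nearest center chosen so far."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = _sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centers, sse)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = _kmeanspp_init(X, k, rng)
    for _ in range(n_iter):
        labels = _sq_dists(X, centers).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = _sq_dists(X, centers)
    labels = d2.argmin(axis=1)
    sse = float(d2[np.arange(len(X)), labels].sum())
    return labels, centers, sse


def elbow_curve(X, k_values, n_restarts=5):
    """Best SSE over several restarts for each candidate K."""
    return {k: min(kmeans(X, k, seed=s)[2] for s in range(n_restarts))
            for k in k_values}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data with four well-separated groups (not the survey data).
    blob_centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
    X = np.vstack([c + 0.3 * rng.standard_normal((25, 2)) for c in blob_centers])
    for k, sse in elbow_curve(X, range(1, 8)).items():
        print(f"K={k}: SSE={sse:.1f}")
```

On data with four clear groups, the SSE falls steeply up to K = 4 and changes little afterwards, which is the pattern the summary describes. On real survey data the elbow is usually less pronounced and may need to be judged from a plot.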

Contributors

ryanhsin98
