dynCluster: Dynamic Clustering Algorithm

Overview

dynCluster implements a new dynamic clustering method that can effectively summarize massive amounts of granular dyadic flow data. For more details of the method and applications, see:

Installation

We recommend installing dynCluster on Amazon Web Services (AWS), which lets users scale up easily to accommodate larger datasets. For step-by-step instructions on installing dynCluster on AWS, see our Wiki page.

Usage: A Toy Example on AWS

  1. Once dynCluster is installed on AWS, we create a small simulated dataset following the data generating process described in our paper. For details, see our Wiki page.

    • The data covers 10 countries (90 directed-dyads) trading 40 products over 10 time periods.
    ##   year cty1 cty2 product_1 product_2 product_3  product_4 product_5 ...
    ## 1    1    1    2         0         0         0        0.0         0     
    ## 2    1    1    3         0         0         0   664344.6         0  
    ## 3    1    1    4         0         0         0        0.0         0  
    ## 4    1    1    5    372390         0         0        0.0         0  
    ## 5    1    1    6   3171746   2797487   4872051   981809.8   2497946 
    ## ...
    
    • In each time period, every dyad belongs to one of 3 clusters (types of international trade). The table below records these "true" dyadic cluster memberships, with columns z1 through z10 giving the cluster in periods 1 through 10. The goal of this example is to see how well dynCluster can use the bilateral trade data above to recover the three clusters and the dyadic cluster memberships.
    ##   cty1 cty2 dyad z1 z2 z3 z4 z5 z6 z7 z8 z9 z10
    ## 1    1   10 1_10  2  2  2  2  2  2  2  2  2   2
    ## 2   10    1 1_10  2  2  2  2  2  2  2  2  2   2
    ## 3    1    2  1_2  1  1  1  1  1  1  1  1  1   1
    ## 4    2    1  1_2  1  1  1  1  1  1  1  1  1   1
    ## 5    1    3  1_3  2  2  2  2  2  2  2  2  2   2
    ## 6    3    1  1_3  2  2  2  2  2  2  2  2  2   2
    ## 7    1    4  1_4  2  2  2  2  2  1  1  1  1   1
    ## 8    4    1  1_4  2  2  2  2  2  1  1  1  1   1
    ## 9    1    5  1_5  2  2  2  2  2  2  2  2  2   2
    ## 10   5    1  1_5  2  2  2  2  2  2  2  2  2   2
    ## ...
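
The dyad counts in this step can be checked with a short R sketch. It assumes, based on the dyad column in the membership table, that both directions of a country pair share one undirected dyad label (e.g., 1_10). The variable names are illustrative and are not part of dynCluster.

```r
# Illustrative sketch (not dynCluster code): count dyads in a
# 10-country, 10-period panel.
n_cty <- 10
n_period <- 10

# All ordered country pairs, excluding self-pairs (directed dyads).
pairs <- expand.grid(cty1 = seq_len(n_cty), cty2 = seq_len(n_cty))
directed <- pairs[pairs$cty1 != pairs$cty2, ]

# Assumed labeling: smaller country id first, so 1 -> 10 and 10 -> 1
# both map to "1_10".
directed$dyad <- paste(pmin(directed$cty1, directed$cty2),
                       pmax(directed$cty1, directed$cty2), sep = "_")

n_directed <- nrow(directed)                   # directed dyads
n_undirected <- length(unique(directed$dyad))  # undirected dyads
n_dyad_periods <- n_undirected * n_period      # undirected dyad-periods
```

Under that assumption, 45 undirected dyads over 10 periods give 450 dyad-periods, which matches the total in the cross-tabulation in step 3.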
    
  2. We then run dynCluster in R using the function mainZTM, which wraps and calls C++ functions (e.g., mainRcpp) from dynCluster. Note that this toy example runs on t2.micro instances in AWS, which are available under the free tier.

    # load library
    library(dynCluster)
        
    # run and time dynCluster
    ptm <- proc.time() # start the clock
    mainZTM("./example/toy", comeBack=TRUE)
    proc.time() - ptm # stop the clock

  3. To assess the performance of dynCluster, we create product-trade heatmaps based on the "true" cluster membership data above and the estimated cluster membership from dynCluster. For details, see our Wiki page.

    • A side-by-side comparison of the two heatmaps below shows that the composition of product trade is very similar. This suggests that dynCluster did well in recovering the original clusters.

      [Heatmaps: True Product Proportion | Estimated Product Proportion]
    • The table below cross-tabulates the true vs. estimated cluster membership for each dyad-period. The cells in the diagonal show the number of dyad-periods correctly classified. Overall, dynCluster correctly recovered 98.4% of the true dyadic cluster memberships.

                                  Estimated
                       Cluster 1  Cluster 2  Cluster 3  Total
      True  Cluster 1        167          0          0    167
            Cluster 2          1        187          6    194
            Cluster 3          0          0         89     89
            Total            168        187         95    450
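
The reported recovery rate can be reproduced from the cross-tabulation above. The sketch below re-enters the table by hand (rows are true clusters, columns are estimated clusters) and divides the diagonal sum by the grand total. The object names are illustrative and are not dynCluster output.

```r
# Cross-tabulation copied from the table above:
# rows = true cluster, columns = estimated cluster.
confusion <- matrix(
  c(167,   0,  0,
      1, 187,  6,
      0,   0, 89),
  nrow = 3, byrow = TRUE,
  dimnames = list(true = paste("Cluster", 1:3),
                  estimated = paste("Cluster", 1:3))
)

correct <- sum(diag(confusion))  # dyad-periods on the diagonal
total <- sum(confusion)          # all dyad-periods
accuracy <- correct / total

round(100 * accuracy, 1)         # percentage correctly classified
```

This gives 443 of 450 dyad-periods, or 98.4%. The calculation assumes the estimated cluster labels have already been matched to the true labels, since the numbering produced by a clustering method is arbitrary.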

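For readers who want a feel for what the product-proportion heatmaps in step 3 summarize, here is a minimal sketch of one plausible calculation: the share of each product in a cluster's total trade. This is an assumption for illustration only. The authors' actual procedure is described on the Wiki page, and the data frame below is made-up toy data, not the simulated dataset.

```r
# Hypothetical toy data: 4 dyad-period rows, 3 products, and a cluster label.
trade <- data.frame(
  cluster   = c(1, 1, 2, 2),
  product_1 = c(10, 30, 0, 0),
  product_2 = c(0, 0, 5, 15),
  product_3 = c(10, 10, 20, 0)
)

# Total trade per product within each cluster (one row per cluster).
totals <- rowsum(trade[, -1], group = trade$cluster)

# Convert totals to within-cluster shares so that each row sums to 1.
shares <- as.matrix(totals) / rowSums(totals)
```

A matrix like shares, computed once with the true memberships and once with the estimated ones, could then be passed to a plotting function such as base R's heatmap() for a side-by-side comparison.
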
Getting help

For any questions or problems when using dynCluster, please e-mail the authors.

Contributors

stevenliaotw, hj08003
