CryptoClustering

I used Python and unsupervised learning to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes.

Prepared the Data:

  • I used the StandardScaler class from scikit-learn to standardize the data from the CSV file.

  • I created a DataFrame with the scaled data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
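
The two preparation steps above can be sketched roughly as follows. The coin names and values are made up for illustration, since the CSV file itself is not reproduced here; only the "coin_id" index and the two price-change column names come from this README.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the market data CSV: made-up coins and values.
rng = np.random.default_rng(0)
market_df = pd.DataFrame(
    {
        "coin_id": [f"coin_{i}" for i in range(12)],
        "price_change_percentage_24h": rng.normal(0, 3, 12),
        "price_change_percentage_7d": rng.normal(0, 8, 12),
    }
).set_index("coin_id")

# Standardize every column to zero mean and unit variance.
scaled_values = StandardScaler().fit_transform(market_df)

# Rebuild a DataFrame that reuses the original column names and "coin_id" index.
scaled_df = pd.DataFrame(
    scaled_values, columns=market_df.columns, index=market_df.index
)
```

StandardScaler returns a plain NumPy array, which is why the column names and the index have to be reattached by hand.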

To Find the Best Value for k Using the Original Scaled DataFrame:

  • I used the elbow method to find the best value for k using the following steps:
  • Created a list of k values from 1 to 11.
  • Created an empty list to store the inertia values.
  • Created a for loop to compute the inertia with each possible value of k.
  • Created a dictionary with the data to plot the elbow curve.
  • Plotted a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.

  • Answered the following question in my notebook: What is the best value for k? Answer: 4
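
A minimal sketch of the elbow-method steps listed above, assuming scikit-learn's KMeans. The data here is synthetic (four well-separated groups) and only stands in for the scaled DataFrame; the `n_init` and `random_state` settings are assumptions, not values taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the scaled data: four synthetic, well-separated groups.
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [6, 6], [-6, 6], [6, -6]])
points = np.vstack([c + rng.normal(0, 0.5, size=(10, 2)) for c in centers])
scaled_df = pd.DataFrame(
    points, columns=["price_change_percentage_24h", "price_change_percentage_7d"]
)

# k values from 1 to 11 inclusive, as described in the steps above.
k_values = list(range(1, 12))

# Empty list to collect one inertia value per k.
inertia = []

# Fit a K-means model for each k and record its inertia.
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(scaled_df)
    inertia.append(model.inertia_)

# Dictionary (and DataFrame) holding the data for the elbow curve.
elbow_data = {"k": k_values, "inertia": inertia}
elbow_df = pd.DataFrame(elbow_data)

# With hvPlot installed, elbow_df.hvplot.line(x="k", y="inertia") would draw the curve.
```

On this synthetic data the inertia falls steeply up to k = 4 and flattens afterwards, which is the "elbow" shape the line chart is meant to reveal.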

Cluster Cryptocurrencies with K-means Using the Original Scaled Data:

  • I used the following steps to cluster the cryptocurrencies for the best value for k on the original scaled data:
  • Initialized the K-means model with the best value for k.
  • Fit the K-means model using the original scaled DataFrame.
  • Predicted the clusters to group the cryptocurrencies using the original scaled DataFrame.
  • Created a copy of the original data and added a new column with the predicted clusters.
  • Created a scatter plot using hvPlot as follows: Set the x-axis as "price_change_percentage_24h" and the y-axis as "price_change_percentage_7d".
  • Colored the graph points with the labels found using K-means.
  • Added the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
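
The clustering steps above might look like the sketch below. The data is synthetic, the column name `cluster` and the helper `plot_clusters` are hypothetical, and the plot axes are two of the scaled feature columns named in this README. The hvPlot import sits inside the helper so the clustering part runs even where hvPlot is not installed.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the scaled DataFrame, indexed by "coin_id".
rng = np.random.default_rng(2)
centers = np.array([[0, 0], [6, 6], [-6, 6], [6, -6]])
points = np.vstack([c + rng.normal(0, 0.5, size=(10, 2)) for c in centers])
scaled_df = pd.DataFrame(
    points,
    columns=["price_change_percentage_24h", "price_change_percentage_7d"],
    index=pd.Index([f"coin_{i}" for i in range(40)], name="coin_id"),
)

# Initialize the model with the chosen k, fit it, then predict the clusters.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
model.fit(scaled_df)
clusters = model.predict(scaled_df)

# Work on a copy so the scaled data itself stays unchanged.
clustered_df = scaled_df.copy()
clustered_df["cluster"] = clusters  # "cluster" is an assumed column name


def plot_clusters(df):
    """Scatter plot colored by cluster; requires hvPlot to be installed."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.scatter(
        x="price_change_percentage_24h",
        y="price_change_percentage_7d",
        by="cluster",
        hover_cols=["coin_id"],
    )
```

Calling `plot_clusters(clustered_df)` in a notebook would render the interactive scatter plot.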

Optimized the Clusters with Principal Component Analysis:

  • I used the original scaled DataFrame to perform a PCA and reduce the features to three principal components.
  • I retrieved the explained variance to determine how much information can be attributed to each principal component.
  • Created a new DataFrame with the PCA data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
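
As a rough illustration of the PCA steps above, using scikit-learn's PCA on random stand-in data. The feature names are invented, and "PC3" is an assumed name that follows the "PC1"/"PC2" pattern used elsewhere in this README.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the scaled DataFrame: 20 coins, 5 made-up features.
rng = np.random.default_rng(3)
scaled_df = pd.DataFrame(
    rng.normal(size=(20, 5)),
    columns=[f"feature_{i}" for i in range(5)],
    index=pd.Index([f"coin_{i}" for i in range(20)], name="coin_id"),
)

# Reduce the features to three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(scaled_df)

# Share of the total variance captured by each component, and their sum.
explained = pca.explained_variance_ratio_
total_explained = explained.sum()

# New DataFrame with the PCA data, reusing the "coin_id" index.
pca_df = pd.DataFrame(
    components, columns=["PC1", "PC2", "PC3"], index=scaled_df.index
)
```

The sum of `explained_variance_ratio_` indicates how much of the original variance the three components retain together.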

To Find the Best Value for k Using the PCA Data:

  • I used the elbow method on the PCA data to find the best value for k using the following steps:
  • Created a list of k values from 1 to 11.
  • Created an empty list to store the inertia values.
  • Created a for loop to compute the inertia with each possible value of k.
  • Created a dictionary with the data to plot the elbow curve.
  • Plotted a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.

Answered the following questions in my notebook:

  • What is the best value for k when using the PCA data?
  • Does it differ from the best k value found using the original data?
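
One way to approach these two questions is to compute the inertia values for both datasets and view them side by side. The sketch below does this on synthetic data; the helper `elbow_inertia` and the column names `inertia_original` and `inertia_pca` are hypothetical, and the result says nothing about the actual cryptocurrency data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def elbow_inertia(data, k_values):
    """Return the K-means inertia for each k in k_values."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
        for k in k_values
    ]


# Hypothetical stand-in for the scaled data: four groups across five features.
rng = np.random.default_rng(4)
centers = 8 * np.eye(4, 5)
points = np.vstack([c + rng.normal(0, 0.5, size=(10, 5)) for c in centers])
scaled_df = pd.DataFrame(points, columns=[f"feature_{i}" for i in range(5)])

# Reduce the same data to three principal components.
pca_df = pd.DataFrame(
    PCA(n_components=3).fit_transform(scaled_df), columns=["PC1", "PC2", "PC3"]
)

# Inertia for k = 1..11 on both datasets, side by side.
k_values = list(range(1, 12))
comparison_df = pd.DataFrame(
    {
        "k": k_values,
        "inertia_original": elbow_inertia(scaled_df, k_values),
        "inertia_pca": elbow_inertia(pca_df, k_values),
    }
)
```

If both inertia columns flatten at the same k, the elbow did not move when the features were reduced.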

Cluster Cryptocurrencies with K-means Using the PCA Data:

  • I used the following steps to cluster the cryptocurrencies for the best value for k on the PCA data:
  • Initialized the K-means model with the best value for k.
  • Fit the K-means model using the PCA data.
  • Predicted the clusters to group the cryptocurrencies using the PCA data.
  • Created a copy of the DataFrame with the PCA data and added a new column to store the predicted clusters.
  • Created a scatter plot using hvPlot as follows:
  • Set the x-axis as "PC1" and the y-axis as "PC2".
  • Colored the graph points with the labels found using K-means.
  • Added the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
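
A compact sketch of these steps on PCA data follows. The values are synthetic, and `cluster` and `plot_pca_clusters` are hypothetical names. The axes are the first two principal-component columns, since those are the columns a PCA DataFrame contains.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the PCA DataFrame (three components per coin).
rng = np.random.default_rng(5)
centers = np.array([[5, 0, 0], [-5, 0, 0], [0, 5, 0], [0, -5, 0]])
points = np.vstack([c + rng.normal(0, 0.4, size=(10, 3)) for c in centers])
pca_df = pd.DataFrame(
    points,
    columns=["PC1", "PC2", "PC3"],
    index=pd.Index([f"coin_{i}" for i in range(40)], name="coin_id"),
)

# Initialize, fit, and predict with the chosen k on the PCA data.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
model.fit(pca_df)
pca_clusters = model.predict(pca_df)

# Copy the PCA data and store the predicted clusters in a new column.
pca_clustered_df = pca_df.copy()
pca_clustered_df["cluster"] = pca_clusters  # assumed column name


def plot_pca_clusters(df):
    """Scatter plot of PC1 against PC2, colored by cluster; requires hvPlot."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.scatter(
        x="PC1", y="PC2", by="cluster", hover_cols=["coin_id"]
    )
```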

Answered the following question:

  • What is the impact of using fewer features to cluster the data using K-Means?

Answer: Using fewer features did not change the number of clusters (k) indicated by the elbow curve plots. Among the cryptocurrency clusters, groups 1 and 3 are very distinct from the rest of the cryptocurrencies, while groups 0 and 2 are similar. Clustering on fewer features with K-means and PCA helped reduce the noise in the cryptocurrency clusters, making the grouping of the data clearer and more readable.
