Giter VIP home page Giter VIP logo

data-science-portfolio's Introduction

Data Science Portfolio

This is a repository of the projects I worked on or currently working on. It is updated regularly. The projects are either written in R (R markdown) or Python (Jupyter Notebook). The goal of the projects is to use data science/statistical modelling techniques to find something that is interesting. A typical project consist of finding and cleaning data, analysis, visualization and conclusion. Click on the projects to see full analysis and code.

Projects:

  • Cross correlation analysis between Bitcoin Price and S&P500 price over time.
  • Granger causality test between Bitcoin and stock prices
  • Fitted ARIMA model on Bitcoin prices to forecast Bitcoin range of movement.
  • Keywords(R, Time Series, Causality, Quandl API)


  • Predicted US (2016) election victories as the voting results of each region becomes available.
  • Regressed states with results against polling data and predicted results for the remaining states
  • Monte Carlos simulation used to simulate the winner of the election.
  • Compared simulated results with exchange rates fluctuations to see if market is efficient.
  • Keywords(Python, Linear Regression, Monte Carlos Simulation)


  • Fitted power-law and log-normal distribution to US baby names data since 1960.
  • Use bootstrapping techniques to find a distribution of the power-law parameters
  • Crawled Twitter to find 20000 random user and fitted power law distribution to users' friends count and followers count.
  • Keywords(R, Power-law, Bootstrapping, Log-normal)


  • Plotted scatter-plot matrix to visualize the data
  • Fitted polynomial linear regression on wine quality vs wine chemical properties.
  • Used ridge and lasso regularization to tackle overfitting and compared result
  • Used cross validation to select the optimal regularization strength
  • Keywords(Python, Linear Regression, Ridge and Lasso Regularization, Cross Validation)


  • Parsed a few GB of Tweets to select all the tweets in UK and in English.
  • Used 'qdap' package to analyze the emotion of the Tweets
  • Plotted the emotions over the day and over the week and analysed the interesting results.
  • Keywords(R, Twitter API, Time Series, Sentiment Analysis, ggplot)

  • Downloaded economic indicators data using World Bank API, and cleaned data
  • Downloaded search query of next and last year in Google for each country
  • Fitted linear regression between GDP and future orientation
  • Keywords(R, World Bank API, Google API, Data Cleaning, Linear regression)

  • Predicted UK (2017) election victories as the voting results as it happened.
  • retrieved from Tweets of result announcement and extracted time of announcement for each region.
  • Regressed regions with results against polling data and predicted results for the remaining regions
  • Monte Carlos simulation used to simulate the winner of the election.
  • Keywords(Python, Twitter API, Merging Data)

data-science-portfolio's People

Contributors

alexhuang1117 avatar

Watchers

James Cloos avatar Bai Feng avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.