
sumansid / google-search-trends-and-s-p-500



Jupyter Notebook 100.00%
stock-analysis data-science price-action correlation linear-regression


Google-Search-Trends-and-S-P-500

In this repository, I fetch Google search trends data for search terms related to the stock market and fit a linear regression model to see which search terms are highly significant.

How to run the files?

The files can be uploaded to a cloud-hosted Jupyter Notebook (no installation needed) or opened using Anaconda Navigator. After uploading the files, recheck the file paths under the "Paths" markdown cell in the notebook, especially for the 2008 model.

Findings:

It turns out that price has the highest correlation with the search terms "debt", "bull market" and "economic boom". Unsurprisingly, it also has a high correlation with DummyTime, because the price mostly goes up with time.
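The correlation ranking above can be illustrated with a minimal sketch. The numbers are made up for illustration and are not the repository's data; the `pearson` helper and the `columns` dictionary are hypothetical names, and the notebooks may well compute correlations with a library instead.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, NOT the repository's data.
price = [100.0, 104.0, 103.0, 110.0, 115.0, 121.0]
columns = {
    "debt": [30.0, 33.0, 31.0, 38.0, 41.0, 45.0],
    "bull market": [12.0, 15.0, 11.0, 18.0, 17.0, 22.0],
    "DummyTime": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
}

# Correlate every column with the price and list the strongest first.
correlations = {name: pearson(values, price) for name, values in columns.items()}
for name, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} r = {r:+.3f}")
```

Because DummyTime simply counts time steps, any series that trends upward will correlate strongly with it, which is why a high correlation there says little on its own.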

To find out which search terms are highly significant and which are not, I decided to run a multivariable regression.

Single Variable Regression: y = mx + b

Multi Variable Regression: y = m1x1 + m2x2 + m3x3 + … + b

Here, each x represents a search term (an explanatory variable) with its own slope m, and y represents the price (the response variable).
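As a rough sketch of what fitting the multivariable equation involves, the following solves the least-squares normal equations in plain Python. It is not the notebook's code: `fit_linear_regression` is a hypothetical helper, and the data is made up so that it follows y = 2*x1 + 0.5*x2 + 10 exactly, which makes the recovered slopes easy to check.

```python
def fit_linear_regression(rows, y):
    """Least-squares fit of y = m1*x1 + ... + mk*xk + b.

    rows: list of observations, each a list of k explanatory values.
    Returns (slopes, intercept).
    """
    # Append a constant 1.0 column so the intercept b is fitted like a slope.
    design = [list(row) + [1.0] for row in rows]
    k = len(design[0])

    # Build the normal equations (X^T X) w = X^T y as an augmented matrix.
    aug = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in design) for j in range(k)]
        row.append(sum(r[i] * target for r, target in zip(design, y)))
        aug.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    weights = [aug[i][k] / aug[i][i] for i in range(k)]
    return weights[:-1], weights[-1]

# Made-up data generated from y = 2*x1 + 0.5*x2 + 10 (no noise).
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [13.0, 14.5, 18.0, 19.5, 23.0]

slopes, intercept = fit_linear_regression(rows, y)
print("slopes:", slopes, "intercept:", intercept)
```

The fit raises an error when two explanatory columns are linearly dependent, which is the extreme case of the multicollinearity problem discussed under the improvements.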

For the 2004 - Present model:

The R squared value is high for the training data and low for the test data. This is because the model was fitted to the training data, so it matches that data closely. The test data is data the model has not seen before, so the fit is weaker. This is logical, because real-world data always has variance and may not follow the regression exactly. For the training data, 92% of the variation in the price action is explained by the search terms and DummyTime. For the test data, 52% is explained by the search terms and DummyTime.
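The gap between training and test R squared can be reproduced on a toy example. The sketch below assumes a simple hold-out split (first six points for training, last four for testing) and a single explanatory variable; how the notebook actually splits its data is not stated here, and the values are made up.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Single-variable least squares: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Made-up values: the last four points are noisier than the first six.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 15.5, 14.0, 19.5, 18.0]

x_train, y_train = x[:6], y[:6]
x_test, y_test = x[6:], y[6:]

# Fit on the training data only, then score both splits.
slope, intercept = fit_line(x_train, y_train)
train_r2 = r_squared(y_train, [slope * v + intercept for v in x_train])
test_r2 = r_squared(y_test, [slope * v + intercept for v in x_test])

print(f"train R squared: {train_r2:.3f}")
print(f"test R squared:  {test_r2:.3f}")
```

The line is chosen to minimise error on the training points, so the training score is the best this model can achieve on that data, while the test score depends on how well the pattern carries over.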

For the 2008 weekly model:

Training data R squared: 0.8816992296795523

Testing data R squared: 0.6010514258458983

Improvements that could be made:

First, the R squared for the test dataset is too low. This could be improved by using larger datasets with more search terms.

Check for multicollinearity: Some search terms could be highly correlated with each other while having less correlation with the price itself. If such search terms are removed after considering their p values, I believe the model will fit the test data better.

Search trends from other search engines could also be used to get a bigger picture and to make important distinctions.

Web traffic on trading sites and news sources could also be used for a better prediction of the price.
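One simple way to start the multicollinearity check mentioned above is to flag pairs of search terms that are strongly correlated with each other. This is only a sketch: the 0.9 threshold, the `correlated_pairs` helper and the data are all assumptions made for illustration, and it does not replace looking at p values before dropping a term.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Made-up search-interest values, NOT the repository's data.
search_terms = {
    "bull market": [10.0, 12.0, 15.0, 18.0, 20.0, 25.0],
    "economic boom": [21.0, 25.0, 29.0, 37.0, 41.0, 49.0],
    "debt": [40.0, 35.0, 42.0, 30.0, 38.0, 33.0],
}

flagged = correlated_pairs(search_terms)
for name_a, name_b, r in flagged:
    print(f"{name_a} / {name_b}: r = {r:+.3f}")
```

A flagged pair means the two terms carry largely the same information, so keeping both mostly adds noise to the coefficient estimates.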
