Stock-Market-Prediction-Using-Scala-and-Spark

This is the final project for CSYE7200 Big Data Systems Engineering Using Scala, Team 8, Spring 2018.

Team Members:

Purva Bundela

Dishank Shah

Final Presentation

https://prezi.com/view/Lr7sPPoyMfhfscPKTesr/

Abstract

This project examines several forecasting techniques that predict future stock returns from past returns and numerical news indicators, with the aim of constructing a portfolio of multiple stocks that diversifies risk. We apply supervised learning methods to stock price forecasting in order to interpret the seemingly chaotic market data.

Methodology

Data cleaning and parsing

  1. Data from each company's CSV file was loaded into DataFrames and converted to the format required by the ARIMA model.
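The parsing step can be sketched in plain Scala, without Spark, to show the shape of the data a time series model expects: numeric observations ordered by date. The README does not list the eight column names, so the two-column layout `date,close`, the `PricePoint` type, and the `CsvPrices` object below are all hypothetical.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical record type: only a date and a closing price are modelled,
// because the actual column names are not given in this README.
final case class PricePoint(date: LocalDate, close: Double)

object CsvPrices {
  /** Parses lines of the assumed form "yyyy-MM-dd,close".
    * The first line is treated as a header, malformed rows are skipped,
    * and the result is sorted by date. */
  def parse(lines: Seq[String]): Seq[PricePoint] =
    lines.drop(1).flatMap { line =>
      line.split(",").map(_.trim) match {
        case Array(d, c) =>
          Try(PricePoint(LocalDate.parse(d), c.toDouble)).toOption
        case _ => None
      }
    }.sortBy(_.date.toEpochDay)

  /** Extracts the ordered closing prices as a plain numeric series. */
  def closes(points: Seq[PricePoint]): Array[Double] =
    points.map(_.close).toArray
}
```

Skipping malformed rows instead of failing is a design choice for the sketch; a real pipeline might prefer to count or log them.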

Spark Timeseries Methodology

  1. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time.

  2. Company name and date were used as features when training the ARIMA model.

  3. The DataFrames of all the companies were joined and loaded into an RDD.

  4. The ARIMA model is trained on this data and then used to forecast future values.

  5. The ARIMA model's forecast method is used to forecast stock prices for 30 days.
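To make the fit-then-forecast flow concrete, here is a minimal sketch in plain Scala. It is not the project's ARIMA code: it swaps in an AR(1) model, the simplest member of the ARIMA family (ARIMA(1,0,0)), fitted by ordinary least squares. The `Ar1Sketch` object and its method names are hypothetical.

```scala
object Ar1Sketch {
  /** Least-squares fit of x(t) = c + phi * x(t-1); returns (c, phi). */
  def fit(xs: Array[Double]): (Double, Double) = {
    require(xs.length >= 3, "need at least three observations")
    val prev = xs.init // x(t-1)
    val next = xs.tail // x(t)
    val n = prev.length.toDouble
    val meanPrev = prev.sum / n
    val meanNext = next.sum / n
    val cov = prev.zip(next).map { case (p, q) => (p - meanPrev) * (q - meanNext) }.sum
    val varPrev = prev.map(p => (p - meanPrev) * (p - meanPrev)).sum
    require(varPrev > 0.0, "series must not be constant")
    val phi = cov / varPrev
    (meanNext - phi * meanPrev, phi)
  }

  /** Iterates the fitted recurrence `steps` times, starting from the last observation. */
  def forecast(xs: Array[Double], steps: Int): Array[Double] = {
    val (c, phi) = fit(xs)
    Iterator.iterate(xs.last)(x => c + phi * x).drop(1).take(steps).toArray
  }
}
```

Calling `Ar1Sketch.forecast(closes, 30)` on an ordered price series returns 30 values, mirroring the 30-day horizon described above. A full ARIMA model adds differencing and moving-average terms that this sketch leaves out.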

Twitter Sentiment Analysis

  1. Tweets acquired through the Search API are in JSON format, with a maximum of 100 per request. A JSON parser was built to parse the tweets and filter out the attributes that are not required.

  2. Special characters are removed to increase the accuracy of the sentiment scores.

  3. Stanford NLP is used to calculate the sentiment score, which indicates whether a particular tweet is positive or negative.

  4. Spark Streaming is used to receive the stream of tweets and perform the analysis for the past 7 days.
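The special-character removal step could look roughly like the following. The exact cleaning rules the project used are not stated, so the rules here (drop URLs, replace every remaining non-alphanumeric character with a space, collapse whitespace) are assumptions, and `TweetCleaner` is a hypothetical name.

```scala
object TweetCleaner {
  // Assumed rules, not taken from the project source.
  private val Url = "https?://\\S+".r
  private val NonAlnum = "[^A-Za-z0-9\\s]".r
  private val Spaces = "\\s+".r

  /** Returns the tweet text with URLs and special characters removed. */
  def clean(text: String): String = {
    val noUrls = Url.replaceAllIn(text, " ")
    val alnumOnly = NonAlnum.replaceAllIn(noUrls, " ")
    Spaces.replaceAllIn(alnumOnly, " ").trim
  }
}
```

The cleaned text would then be passed to the sentiment scorer. Note that this rule set also strips emoticons and punctuation, which a sentiment model might otherwise use as signal, so the rules are worth tuning against real tweets.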

Web Application and Visualization

  1. The web application is built with the Play framework.

  2. Data is sent from the controller to the view using the Akka framework.

  3. Users must either sign up or log in to view the dashboard.

  4. d3.js is used to display the analysis.

Steps to run the project on the local machine

Run on Windows and Mac

  1. Download sbt 0.13.17.

  2. Configure Java 1.8 on your machine.

  3. Configure Scala 2.11.8 on your machine.

  4. To run from the terminal, go to the Stock-Market-Prediction directory and run sbt run.

  5. Configure your database in the config file so that you can sign up and log in to the application.

Dataset

SNAP Twitter7 dataset

Yahoo Finance Stocks dataset

The dataset was taken from Kaggle and had data for around 500 companies.

Each data file had 8 columns.

We trained the model on the data of 10 companies, 15,000 rows in total.

Details

Used HDFS to store the dataset.

Used Spark to read and pre-process the dataset.

Used the Play framework for the web application.

Applied Stanford NLP to perform sentiment analysis on Twitter data.

Applied Spark-ML to train a time series model.

Assessed accuracy with R-squared and RMSE.

Used d3.js to display the analysis and visualization.
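The two accuracy measures named above can be computed from paired actual and predicted values as follows. This is a plain-Scala sketch using the standard definitions, not the project's evaluation code; the `Metrics` object is a hypothetical name.

```scala
object Metrics {
  /** Sum of squared differences between actual and predicted values. */
  private def sumSquaredError(actual: Seq[Double], predicted: Seq[Double]): Double =
    actual.zip(predicted).map { case (a, p) => (a - p) * (a - p) }.sum

  /** Root mean squared error: sqrt(mean((actual - predicted)^2)). */
  def rmse(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length, "need equal, non-empty inputs")
    math.sqrt(sumSquaredError(actual, predicted) / actual.length)
  }

  /** Coefficient of determination: 1 - SS_res / SS_tot. */
  def rSquared(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length, "need equal, non-empty inputs")
    val mean = actual.sum / actual.length
    val ssTot = actual.map(a => (a - mean) * (a - mean)).sum
    require(ssTot > 0.0, "actual values must not all be equal")
    1.0 - sumSquaredError(actual, predicted) / ssTot
  }
}
```

RMSE is expressed in the same units as the stock price, while R-squared is unitless and equals 1 for a perfect fit and 0 for a model that always predicts the mean.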

Continuous Integration

This project uses Travis CI as its continuous integration tool.
