lussierc / stockstoryscraper

The Stock Story Scraper (SSS) is a text mining tool that gathers a stock's relevant articles and performs extensive sentiment analysis on them. Fall 2020 Independent Study - Allegheny College.

Languages: Python 94.92%, Shell 4.06%, Dockerfile 0.78%, Batchfile 0.24%
Topics: predict-stock-swings, text-mining, stock-news, sentiment, stocks

stockstoryscraper's Introduction

StockStoryScraper [SSS]

Scrapes new articles from highly rated stock news websites through Google News, analyzes them for sentiment (among other things), then scores them and provides an overall rating of the stock's sentiment and well-being. This gives users a quick look at how a stock is performing in the news.

Fall 2020 Independent Study - Allegheny College.

Tool Overview

  • Scrapes articles from highly rated stock news websites specified by the user. The user also enters their chosen stocks, corresponding ticker symbols, date range for scraping, and an export file (.csv) that can be read back into the tool.
    • With this, users can read back in their previously exported CSVs of article information to view their results again.
      • Code to compare results from different runs will be added in future updates.
  • Uses vaderSentiment to perform textual sentiment analysis.
  • Scores the articles, gathers their price information, and generates results about each stock's overall sentiment and well-being.
  • The tool is used through either a web application UI built with Streamlit or a command-line interface.
  • More features to be added soon.
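The sentiment step from the list above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper names (label_from_compound, score_articles, make_vader_analyzer) are hypothetical, and the ±0.05 cut-offs are the thresholds suggested in the vaderSentiment documentation, not values taken from this repository.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_articles(texts, analyzer):
    """Return a (compound, label) pair for each text.

    `analyzer` is any object with a polarity_scores(text) method that
    returns a dict containing a "compound" key, as vaderSentiment's
    SentimentIntensityAnalyzer does.
    """
    results = []
    for text in texts:
        compound = analyzer.polarity_scores(text)["compound"]
        results.append((compound, label_from_compound(compound)))
    return results


def make_vader_analyzer():
    """Build the real analyzer (requires the vaderSentiment package)."""
    # Imported lazily so the helpers above work without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()
```

With vaderSentiment installed, calling score_articles(article_texts, make_vader_analyzer()) would score a list of article strings. Passing the analyzer in as an argument keeps the scoring logic easy to test with a stand-in object.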

What's the point?

It takes a lot of time to read every available news article about a stock, whether you are a professional trader or an amateur. The tool quickly gathers all the relevant articles from highly rated stock websites in the user's defined date range, then analyzes their textual sentiments. The user can read the articles that were scraped if they choose or look at the numerous graphs on the tool depicting article sentiments and overall stock feelings/performance.

This saves a lot of time and gives users a one-stop shop for stock news and its automatic analysis.

View the Program in Action

Here is an example of the project's web app UI (using Streamlit) in action. The example shows the settings being configured for a fresh run of the project:

[Screenshot: Run screen]

Running the Project

There are a few ways to run the project: with your own local Python 3 installation, with Pipenv, or with Docker.

Running with Docker

The program can be run within a Docker container using Docker Desktop. For more information on how to install Docker Desktop, view this resource.

There are builder scripts for each type of machine. First ensure you are in the src directory. To run the macOS version, for instance, you would use the following commands:

  1. sh ./docker/build_macOS.sh -- builds the container
  2. sh ./docker/run_macOS.sh -- enters the container
  3. python3 run_tool.py -- run the program

OS-specific scripts to build and run containers

The following scripts simplify building and running the container.

OS       Building           Running
macOS    ./build_macOS.sh   ./run_macOS.sh
Linux    ./build_linux.sh   ./run_linux.sh
Windows  build_win.bat      run_win.bat

These files are located in the docker/ directory. The build scripts need the Dockerfile, which is in the src directory, so these commands should be run from the src directory as in the example above.

Running with Pipenv

Make sure Python 3 and Pipenv are installed on your machine. Find information on installing Pipenv here.

The project comes with a Pipfile in the src directory that will install the necessary packages for the program, making it easy for users with Pipenv to run the project on their machines.

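First navigate to the src directory using cd src. Then run the command pipenv install to install the necessary Python packages from the Pipfile (pipenv lock on its own only generates Pipfile.lock and does not install anything).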

You can then run the command pipenv run python3 run_tool.py to run the program. You will be presented with the option to run either the UI web interface or the Command Line Interface.

Running with Python

First ensure Python and Pip are installed on your machine. Then navigate to the src directory.

You can install the required packages for the project using Pip by running pip3 install -r requirements.txt or pip install -r requirements.txt depending on your machine's Pip installation.

Then, you can run the program by using the command python3 run_tool.py.

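Note: if you run into spaCy issues while running the program, you may have to run the command python3 -m spacy download en. On spaCy 3 and later the en shortcut is no longer supported, so the full model name is needed instead, for example python3 -m spacy download en_core_web_sm.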
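Before launching the tool, it can help to confirm that the packages installed correctly. The snippet below is a small sketch of such a check. The function name missing_modules is hypothetical, and REQUIRED_MODULES lists only the packages named in this README. The authoritative list is requirements.txt or the Pipfile in the src directory.

```python
import importlib.util

# Import names of packages mentioned in this README (an assumption, not the
# full dependency list; see requirements.txt or the Pipfile for that).
REQUIRED_MODULES = ["vaderSentiment", "streamlit", "spacy"]


def missing_modules(names):
    """Return the top-level module names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

If anything is reported as missing, re-running the install command for your chosen method (Pip or Pipenv) from the src directory is the first thing to try.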

Problems, Ideas, or Praise

Please leave an issue in the Issue Tracker if you encounter errors, have ideas, or anything of the like!

Future Work

View the Issue Tracker to see the tasks planned for the near future.

stockstoryscraper's Issues

Refactor Code

As I near the completion of the basic version of the tool, I will need to refactor the code of the project to make it more efficient, syntactically correct, easier to test, and readable for others.

With this, there are a few things I know I need to refactor now:

  • CML code
  • Create standalone interface (contains options to run CML or UI)
  • Refactor results calculation code
  • etc.

Add More Results Calculations

Need to improve & refactor current calculations in addition to adding a few new ones.

Possible calculations to be added:

  • Create instability status calculation
  • Allow users to scrape multiple date ranges at once and then compare them
  • Create "Buy Status" calculation which will advise users to either buy, sell, or pass/hold on trading a stock
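As an illustration of what a "Buy Status" calculation might look like, here is a deliberately simple sketch that averages per-article sentiment scores and compares the result to two cut-offs. The function name buy_status and the ±0.25 thresholds are assumptions made for this example only. The issue does not specify how the status would be computed.

```python
from statistics import mean


def buy_status(compound_scores, buy_above=0.25, sell_below=-0.25):
    """Turn per-article sentiment scores in [-1, 1] into 'buy', 'sell', or 'hold'.

    The thresholds are arbitrary placeholders, not tuned values.
    """
    if not compound_scores:
        return "hold"  # no articles means no signal either way
    average = mean(compound_scores)
    if average >= buy_above:
        return "buy"
    if average <= sell_below:
        return "sell"
    return "hold"
```

A real version would probably also weigh price information and how recent each article is, which is why the thresholds are exposed as parameters here.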

Write Report

Draft a basic report/accompanying document for the project that describes the tool, the project motivations, the work completed, future work that could be completed, the accuracy of the tool, and possible shortcomings of the tool.

Improve Display of Results in Web App

Need to improve the display of metrics in the web app. Some of the graphs and the flow of information just don't feel right.

Will add more comments in the future about what work this issue would entail.

Clean up CML Printing

The way the CML is currently printing content needs to be improved. This would include:

  • Removing extraneous print statements of program data that the user should not see.
  • Cleaning up prompts and other print statements.

Investigate Refactoring Opportunities

Investigate what areas of the code (and project as a whole) can & should be refactored. Make tickets for these areas. These improvements will be included in Version 2.

I already looked into it a bit when creating #12. Start with those code areas first & then look elsewhere. #12 will likely be closed once the other tickets are created & this research is complete.

Also, investigate new feature opportunities.

Implement Interface

Once I finalize the backend, which includes calculating some more results and outputting/inputting them, I will need to implement my Streamlit interface.

This interface will include a main/welcome page which asks the user to input the stocks and websites they want to use. Once this information is downloaded, an overview of each stock will be displayed on the screen, giving insights into its overall "health". Users can then go to different pages which display the articles and their information for each stock. More info from the results will also be displayed.

Fix Version Issues

Define versions for Python packages in the requirements.txt file and fix issues in the web app that came up with the introduction of a new Streamlit version.

Create .gitignore

I need to create a .gitignore file that ignores things like CSV files, pycache files, .DS_Store files, and more.

It should exclude all files that are not necessary on the remote repository, to save space and reduce the time needed for new users to clone the repository.

Testing

Implement a basic test suite using Pytest.

Tests could include:

  • Testing that different articles are properly analyzed
  • Testing calculations are correct using sample data for results
  • Ensure sample articles can be properly scraped
  • etc. (more to be added to this issue/the PR for testing as time elapses)
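A calculation test of the kind listed above could be sketched as follows. The function average_sentiment is a hypothetical stand-in for one of the tool's result calculations, not code from the repository. The test functions use plain assert statements, which is the style Pytest expects.

```python
def average_sentiment(scores):
    """Hypothetical result calculation: mean of per-article sentiment scores."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def test_average_sentiment_with_sample_data():
    # Sample values chosen so the expected mean is exactly representable.
    assert average_sentiment([0.5, 0.25, 0.0]) == 0.25


def test_average_sentiment_with_no_articles():
    # An empty run should not raise; it falls back to a neutral score.
    assert average_sentiment([]) == 0.0
```

Saved in a file whose name starts with test_ (for example test_results.py, a name chosen here for illustration), these functions would be collected automatically when pytest is run from that directory.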

Improve Stock Well Being Prediction

Try using approaches such as neural networks or linear models to calculate the stock well-being prediction. It needs to be updated from the basic calculation it makes now.
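To make the linear-model idea concrete, here is a minimal sketch of a one-feature ordinary least squares fit written in plain Python. It assumes the feature is an average sentiment score and the target is a later percentage price change. Both the framing and the numbers are invented for illustration and do not come from the project.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted target value for a new feature value x."""
    return slope * x + intercept


# Toy, made-up numbers purely for illustration (not real market data).
sentiment = [-0.5, 0.0, 0.5]
price_change_pct = [-1.0, 0.0, 1.0]
slope, intercept = fit_line(sentiment, price_change_pct)
```

In practice a library such as scikit-learn would likely replace the hand-written fit, and the model would need to be validated against held-out data before its predictions were shown to users.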

Finalize Program Run Methods

Allow users to run the program using:

  • Regular Python
  • Pipenv
  • Docker

Include all the commands necessary in the README.

Finalize Documentation

Finalize the README with information about the tool, how to use it, and its results/accuracy.

Include the run commands for the run methods discussed in #9.
