DWL-Project_SCJ

Repository for the DWL project of group SCJ. It is expected that the user of this repository has some basic knowledge of:

  • Python
  • AWS Services RDS, Lambda, S3
  • Apache Airflow Docs
  • (Docker Docs, depending on the operating system)

!! DWL_02 !!

For details on the contents of the second part of the project for the second module of Data Warehouse and Data Lake Systems, please refer to the folder DWL_02.

About the Project

The idea that you can add value to your finances through a smart investment strategy is something most individuals understand relatively early in life. Nevertheless, many individuals do not decide to invest a portion of their assets until later. The main purpose of the project is to analyse and present the performance of various investment opportunities over the last few months, in order to subsequently provide newcomers with an overview of investment strategies and opportunities.

Data Source

Data was extracted from four APIs:

  • Binance: prices and other key figures of the top 10 cryptocurrencies (according to market cap)
  • YahooFinance: prices and other key figures of four indices and two precious metals
  • Reddit: posts in which the investment assets are mentioned
  • Twitter: count of tweets that carry hashtags of the investment assets

TwitterAPI_HistoricalData - File

Code for extracting and loading the data of the Twitter API for the period 1/1/2021 - 5/4/2022. The goal of this script was to load the data into the RDS from the beginning of the analysis period until the start of the daily data load. The code was executed once; the database tables were created as part of the script. A rough sketch of this flow follows the requirements list below.

Requirements:

  • Packages mentioned in the first cell should be installed
  • Access to a Twitter Academic research account
  • Database is prepared
  • Database credentials and the Twitter Bearer token are stored in an .env file
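
Since the Twitter source only tracks tweet counts per hashtag, the historical load boils down to querying a counts endpoint day by day and inserting the results into RDS. The following is a minimal sketch of that flow, assuming the Twitter API v2 full-archive counts endpoint (available with Academic Research access), psycopg2 for the database connection, and credentials loaded from the .env file. The table name tweet_counts, its columns, and the query string are illustrative assumptions, not taken from the repository.

# Hypothetical sketch: pull daily tweet counts for one hashtag and write them to RDS.
# Table/column names and the query string are assumptions for illustration only.
import os

import requests
import psycopg2
from dotenv import load_dotenv

load_dotenv()  # expects BEARER_TOKEN, DB_HOST, DB_NAME, DB_USER, DB_PASSWORD in the .env file

COUNTS_URL = "https://api.twitter.com/2/tweets/counts/all"  # full-archive counts (Academic Research access)


def fetch_daily_counts(hashtag, start_time, end_time):
    """Return a list of {start, end, tweet_count} dicts, one per day, following pagination."""
    headers = {"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"}
    params = {"query": f"#{hashtag}", "start_time": start_time,
              "end_time": end_time, "granularity": "day"}
    counts = []
    while True:
        resp = requests.get(COUNTS_URL, headers=headers, params=params)
        resp.raise_for_status()
        payload = resp.json()
        counts.extend(payload.get("data", []))
        next_token = payload.get("meta", {}).get("next_token")
        if not next_token:
            return counts
        params["next_token"] = next_token  # continue with the next page of the period


def load_counts(hashtag, counts):
    """Create the target table if needed and insert one row per hashtag and day."""
    conn = psycopg2.connect(host=os.environ["DB_HOST"], dbname=os.environ["DB_NAME"],
                            user=os.environ["DB_USER"], password=os.environ["DB_PASSWORD"])
    with conn, conn.cursor() as cur:
        cur.execute("""CREATE TABLE IF NOT EXISTS tweet_counts (
                           hashtag TEXT, day DATE, tweet_count INTEGER,
                           PRIMARY KEY (hashtag, day))""")
        for row in counts:
            cur.execute("INSERT INTO tweet_counts VALUES (%s, %s, %s) ON CONFLICT DO NOTHING",
                        (hashtag, row["start"][:10], row["tweet_count"]))
    conn.close()


if __name__ == "__main__":
    data = fetch_daily_counts("bitcoin", "2021-01-01T00:00:00Z", "2022-04-05T00:00:00Z")
    load_counts("bitcoin", data)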

TwitterAPI_Lambda_dailyload - File

Code of an AWS Lambda function that extracts the previous day's data from the Twitter API and loads it into the RDS on a daily basis. The function is executed every day; a sketch of such a handler follows the requirements list below.

Requirements:

  • Packages mentioned in the first cell should be part of a layer in the Lambda function
  • Access to a Twitter developer account
  • Database is prepared
  • Database credentials and the Twitter Bearer token are stored as environment variables of the Lambda function
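
A minimal handler sketch under those assumptions: credentials come from the Lambda environment variables, the previous UTC day is queried via the recent counts endpoint, and the rows go into the same (assumed) tweet_counts table as above. The hashtag list and variable names are illustrative.

# Hypothetical Lambda handler: load yesterday's tweet counts for each tracked hashtag.
import os
from datetime import datetime, timedelta, timezone

import requests
import psycopg2  # shipped to the function via a Lambda layer

HASHTAGS = ["bitcoin", "ethereum"]  # example subset of the tracked assets


def lambda_handler(event, context):
    end = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)  # the previous full UTC day
    headers = {"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"}

    conn = psycopg2.connect(host=os.environ["DB_HOST"], dbname=os.environ["DB_NAME"],
                            user=os.environ["DB_USER"], password=os.environ["DB_PASSWORD"])
    with conn, conn.cursor() as cur:
        for tag in HASHTAGS:
            resp = requests.get(
                "https://api.twitter.com/2/tweets/counts/recent",
                headers=headers,
                params={"query": f"#{tag}", "start_time": start.isoformat(),
                        "end_time": end.isoformat(), "granularity": "day"},
            )
            resp.raise_for_status()
            for row in resp.json().get("data", []):
                cur.execute("INSERT INTO tweet_counts VALUES (%s, %s, %s) ON CONFLICT DO NOTHING",
                            (tag, row["start"][:10], row["tweet_count"]))
    conn.close()
    return {"status": "ok", "loaded_day": start.date().isoformat()}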

YahoofinanceAPI_HistoricalData - File

Code for extracting and loading the data of the YahooFinance API for the period 1/1/2021 - 31/3/2022. The goal of this script was to load the data into the RDS from the beginning of the analysis period until the start of the daily data load. The code was executed once; the database tables were created as part of the script. A rough sketch follows the requirements list below.

Requirements:

  • Packages mentioned in the first cell should be installed
  • Database is prepared
  • Database credentials are stored in an .env file
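
A minimal sketch of the historical load, assuming the yfinance package for the download and pandas/SQLAlchemy for writing to RDS. The ticker symbols and the table name yahoofinance_prices are illustrative assumptions.

# Hypothetical sketch: download daily price history per ticker and append it to RDS.
import os

import pandas as pd
import yfinance as yf
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()  # expects DB_HOST, DB_NAME, DB_USER, DB_PASSWORD in the .env file

TICKERS = ["^GSPC", "^GDAXI", "GC=F", "SI=F"]  # example: two indices, gold and silver futures

engine = create_engine(
    f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}/{os.environ['DB_NAME']}"
)

frames = []
for ticker in TICKERS:
    # end is exclusive in yfinance, so 2022-04-01 covers data up to and including 2022-03-31
    df = yf.Ticker(ticker).history(start="2021-01-01", end="2022-04-01", interval="1d")
    df["ticker"] = ticker
    frames.append(df.reset_index())

# to_sql creates the table on the first load, mirroring "tables were created by the script"
pd.concat(frames).to_sql("yahoofinance_prices", engine, if_exists="append", index=False)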

YahoofinanceAPI_Lambda_dailyload - File

Code of an AWS Lambda function that extracts the previous day's data from the YahooFinance API and loads it into the RDS on a daily basis. The function is executed every day; a handler sketch follows the requirements list below.

Requirements:

  • Packages mentioned in the first cell should be part of a layer in the Lambda function
  • Database is prepared
  • Database credentials are stored as environment variables of the Lambda function
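
A minimal handler sketch for the daily case, reusing the (assumed) yahoofinance_prices table from the historical sketch: fetch only the previous day's bar per ticker and append it. Ticker list and variable names are again illustrative.

# Hypothetical Lambda handler: append yesterday's daily bar for each ticker to RDS.
import os
from datetime import date, timedelta

import yfinance as yf
from sqlalchemy import create_engine

TICKERS = ["^GSPC", "^GDAXI", "GC=F", "SI=F"]  # example subset


def lambda_handler(event, context):
    engine = create_engine(
        f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}/{os.environ['DB_NAME']}"
    )
    yesterday = date.today() - timedelta(days=1)
    for ticker in TICKERS:
        # end is exclusive, so start=yesterday / end=today returns exactly one daily bar
        df = yf.Ticker(ticker).history(start=str(yesterday), end=str(date.today()), interval="1d")
        df["ticker"] = ticker
        df.reset_index().to_sql("yahoofinance_prices", engine, if_exists="append", index=False)
    return {"status": "ok", "loaded_day": str(yesterday)}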

BinanceAPI_HistoricalData - File

Code for extracting and loading historical cryptocurrency data from Binance for the period 1/1/2017 - today. The goal of this script is the same as for YahooFinance: load the data into the RDS from the beginning of the period and create the database tables. A rough sketch follows the requirements list below.

Requirements:

  • Packages mentioned in the first cells of code may need to be installed
  • Database is prepared
  • A Binance account is necessary, incl. an API connection created in the Binance account
  • Credentials are stored in an .env file
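
A minimal sketch of the historical load, assuming the python-binance client and pandas/SQLAlchemy for writing to RDS. The symbol list, the column subset, and the table name binance_prices are illustrative assumptions.

# Hypothetical sketch: pull daily candles since 2017 per symbol and load them into RDS.
import os

import pandas as pd
from binance.client import Client
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()  # expects BINANCE_API_KEY, BINANCE_API_SECRET and the DB credentials

client = Client(os.environ["BINANCE_API_KEY"], os.environ["BINANCE_API_SECRET"])
engine = create_engine(
    f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}/{os.environ['DB_NAME']}"
)

SYMBOLS = ["BTCUSDT", "ETHUSDT"]  # example subset of the top-10 coins

for symbol in SYMBOLS:
    klines = client.get_historical_klines(symbol, Client.KLINE_INTERVAL_1DAY, "1 Jan, 2017")
    df = pd.DataFrame(klines, columns=["open_time", "open", "high", "low", "close", "volume",
                                       "close_time", "quote_volume", "trades",
                                       "taker_base", "taker_quote", "ignore"])
    df["symbol"] = symbol
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    df[["open", "high", "low", "close", "volume"]] = df[["open", "high", "low", "close", "volume"]].astype(float)
    df.to_sql("binance_prices", engine, if_exists="append", index=False)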

BinanceAPI_Lambda_dailyload - File

Code of an AWS Lambda function that extracts the previous day's data from the Binance API and loads it into the RDS on a daily basis. The function is executed every day; a handler sketch follows the requirements list below.

Requirements:

  • Packages mentioned in the first cell should be part of a layer in the Lambda function
  • Database is prepared
  • A Binance account is necessary, incl. an API connection created in the Binance account
  • Database credentials are stored as environment variables of the Lambda function
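
A minimal handler sketch for the daily case: fetch yesterday's candle per symbol via get_klines with an explicit millisecond time window and insert it into the (assumed) binance_prices table.

# Hypothetical Lambda handler: load yesterday's daily candle for each symbol into RDS.
import os
from datetime import datetime, timedelta, timezone

import psycopg2  # shipped to the function via a Lambda layer
from binance.client import Client

SYMBOLS = ["BTCUSDT", "ETHUSDT"]  # example subset


def lambda_handler(event, context):
    client = Client(os.environ["BINANCE_API_KEY"], os.environ["BINANCE_API_SECRET"])
    end = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)  # the previous full UTC day

    conn = psycopg2.connect(host=os.environ["DB_HOST"], dbname=os.environ["DB_NAME"],
                            user=os.environ["DB_USER"], password=os.environ["DB_PASSWORD"])
    with conn, conn.cursor() as cur:
        for symbol in SYMBOLS:
            klines = client.get_klines(symbol=symbol, interval=Client.KLINE_INTERVAL_1DAY,
                                       startTime=int(start.timestamp() * 1000),
                                       endTime=int(end.timestamp() * 1000) - 1)  # exclude today's open candle
            for k in klines:
                cur.execute("INSERT INTO binance_prices (symbol, open_time, open, high, low, close, volume) "
                            "VALUES (%s, to_timestamp(%s / 1000.0), %s, %s, %s, %s, %s)",
                            (symbol, k[0], k[1], k[2], k[3], k[4], k[5]))
    conn.close()
    return {"status": "ok", "loaded_day": start.date().isoformat()}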

Reddit_HistoricalData - File

Code for extracting, lightly transforming, and loading comment data from Reddit to an S3 bucket for the period 1/1/2021 to circa 5/4/2022. A rough sketch of the flow follows the requirements list below.

Requirements:

  • Packages mentioned in the first cells of code may need to be installed. Alternative:
  • An .env file in the same folder as the script, with at least the AWS credentials (Reddit credentials are only necessary if the praw library is used, e.g. for looking up certain subreddits). The naming of the variables can be taken from the script.
  • An S3 bucket on AWS (IMPORTANT: put the right bucket name in Line 106)
  • Enough time if you're looking for bitcoin comments over a long period 😉
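
A minimal sketch of the extract, light-transform, load pattern described above. It assumes the Pushshift API via the pmaw package for the historical comment search (one possible approach; the repository may use a different client) and boto3 for the upload; bucket name, S3 key, and the selected fields are placeholders.

# Hypothetical sketch: fetch keyword-matching Reddit comments, keep a few fields, upload to S3.
import json
import os
from datetime import datetime, timezone

import boto3
from dotenv import load_dotenv
from pmaw import PushshiftAPI

load_dotenv()  # expects AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (and optionally a session token)

BUCKET = "my-dwl-reddit-bucket"  # placeholder: put the project's real bucket name here

api = PushshiftAPI()
after = int(datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp())   # epoch seconds
before = int(datetime(2022, 4, 5, tzinfo=timezone.utc).timestamp())

comments = api.search_comments(q="bitcoin", after=after, before=before, limit=10000)

# light transformation: keep only the fields needed downstream
slim = [{"id": c["id"], "created_utc": c["created_utc"],
         "subreddit": c["subreddit"], "body": c["body"]} for c in comments]

# boto3 reads the AWS credentials from the environment variables loaded above
s3 = boto3.client("s3")
s3.put_object(Bucket=BUCKET,
              Key="reddit/historical/bitcoin_comments.json",
              Body=json.dumps(slim).encode("utf-8"))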

ApacheAirflow / Reddit_PeriodicalData_Airflow - Folder

All necessary files for getting periodic Reddit data with Apache Airflow. The current configuration was run on a Windows 10 operating system inside a Docker container. It is also possible to run the DAGs outside of a Docker container.
The code is intended to run every three days. If a different period is desired, Line 25 and Line 116 have to be changed.

Requirements (assuming Apache Airflow is run in Docker on Windows 10): the installation of Docker + Airflow is explained very well here (text) and here (video).

  • Install Docker Engine (incl. Docker Compose)
  • Copy the project folder airflow-docker to the desired location
  • Run the command below to ensure the container and host computer have matching file permissions:
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
  • Run docker-compose with the following command -> Apache Airflow will be started and missing packages should be installed (using these two files: requirements.txt & Dockerfile)
docker-compose up
  • Specify the following variables in Apache Airflow:
    • AWS credentials (ACCESS_KEY, SECRET_KEY, SESSION_TOKEN)
    • S3 bucket name (bucket = 'Bucketname of S3-Bucket')
  • Run the DAG Reddit_PeriodicalData_Airflow.py (a skeleton of such a DAG is sketched below)
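
As an illustration of the last two steps, a minimal DAG skeleton under those assumptions: it runs every three days, reads the AWS credentials and the bucket name from Airflow Variables, and passes them to a Python callable that would extract the comments and write them to S3. The callable body and DAG metadata are illustrative; the repository's actual DAG is Reddit_PeriodicalData_Airflow.py.

# Hypothetical DAG skeleton: periodic Reddit extraction to S3, scheduled every three days.
from datetime import datetime, timedelta

import boto3
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator


def extract_and_upload(**_):
    # credentials and bucket name come from the Airflow Variables named above
    s3 = boto3.client("s3",
                      aws_access_key_id=Variable.get("ACCESS_KEY"),
                      aws_secret_access_key=Variable.get("SECRET_KEY"),
                      aws_session_token=Variable.get("SESSION_TOKEN"))
    bucket = Variable.get("bucket")
    # ... extract the last three days of Reddit comments here, then upload the result:
    s3.put_object(Bucket=bucket, Key="reddit/placeholder.json", Body=b"[]")


with DAG(
    dag_id="Reddit_PeriodicalData_Airflow",
    start_date=datetime(2022, 4, 1),
    schedule_interval=timedelta(days=3),  # change this if a different period is desired
    catchup=False,
) as dag:
    PythonOperator(task_id="reddit_to_s3", python_callable=extract_and_upload)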
