FastF1 Project

Extraction and modeling of F1 data

Tools

  • FastF1 - A Python library for accessing F1 data
  • Airflow - A workflow orchestration tool
  • PostgreSQL - A relational database

Recreating the project

  • Go to the airflow directory:
    cd airflow/
    
  • If you have run Airflow in Docker before, run these commands first to start afresh:
    docker-compose down
    docker volume rm airflow_postgres-db-volume
    rm -r logs
    
  • Run these commands:
    echo -e "AIRFLOW_UID=$(id -u)" > .env
    mkdir logs/
    mkdir ../fastf1_cache/
    docker-compose build
    docker-compose up airflow-init
    docker-compose up -d
    
  • Check periodically with docker ps whether the web server is healthy.
  • Once the web server is healthy, go to localhost:8080 in your browser. Use airflow as both the username and the password.
  • Trigger the DAG in the Airflow UI.
  • To check results in the database while the DAG is running, log in with psql or any other Postgres client. Postgres listens on port 5432 inside the container; the command below assumes it is published on host port 5434. The username is airflow and the database name is fastf1:
    psql -h localhost -p 5434 -U airflow -d fastf1
    
  • If you cannot connect from the host terminal, open a shell in the Postgres container and run psql there:
    docker exec -it <container_id> bash
    
    psql -U airflow -d fastf1
    
  • Once the 2019 run has finished in the Airflow UI, you can run these queries to confirm that the data was loaded and everything is working:
    select * from dim_sessions;
    select * from dim_drivers;
    select * from fact_results;
    select * from fact_lap_statistics;
    select * from fact_lap_telemetry_statistics;
  • To rerun a year that stopped unexpectedly:
    • Open a shell in the Airflow container:
    docker exec -it <container_id> bash
    
    • For 2019:
    airflow dags backfill -s 2019-01-01 -e 2019-12-31 --reset-dagruns fastf1_extract
    
    • For 2020:
    airflow dags backfill -s 2020-01-01 -e 2020-12-31 --reset-dagruns fastf1_extract
    
    • For 2021:
    airflow dags backfill -s 2021-01-01 -e 2021-12-31 --reset-dagruns fastf1_extract
    
    • For 2022:
    airflow dags backfill -s 2022-01-01 -e 2022-12-31 --reset-dagruns fastf1_extract
    
    • The pipeline is already set up not to repeat work that has run before.
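The four backfill commands above differ only in the year, so they can be generated instead of typed out. The following is a minimal sketch, not part of the project: the helper name `backfill_command` is hypothetical, and only the DAG id, dates, and flags come from the commands listed above.

```python
import shlex

DAG_ID = "fastf1_extract"  # DAG name taken from the commands above


def backfill_command(year: int, dag_id: str = DAG_ID) -> list[str]:
    """Build the `airflow dags backfill` argument list for one calendar year."""
    return [
        "airflow", "dags", "backfill",
        "-s", f"{year}-01-01",   # start of the year
        "-e", f"{year}-12-31",   # end of the year
        "--reset-dagruns",       # clear earlier runs for this range
        dag_id,
    ]


if __name__ == "__main__":
    # Print one command per season covered by the project (2019 to 2022).
    for year in range(2019, 2023):
        print(shlex.join(backfill_command(year)))
```

Run inside the Airflow container, each printed line can be pasted into the shell, or the argument list can be passed to `subprocess.run` if you prefer to execute it directly.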

Project flow

  • When the containers first start, the database is created by the Postgres entry point script. The script runs before any other container starts.
  • The DAG's sole job is to run the Python extraction script for each year from 2019 to 2022.
  • The script gets the event schedule for each year and structures it according to the database schema.
  • For each session, the results and lap statistics are fetched and sent to the database as well.
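The schedule step above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual script: the function names and the output column names are hypothetical, and it assumes the FastF1 event schedule exposes `RoundNumber`, `EventName`, and `Session1` to `Session5` columns, as recent FastF1 versions do. FastF1 is imported lazily so the pure restructuring function can be tried without it.

```python
from typing import Iterable, Iterator

SESSION_SLOTS = range(1, 6)  # assumed schedule columns Session1..Session5


def structure_sessions(year: int, events: Iterable[dict]) -> Iterator[dict]:
    """Flatten one row per event into one row per session.

    The output keys are hypothetical stand-ins for dim_sessions columns.
    """
    for event in events:
        for slot in SESSION_SLOTS:
            name = event.get(f"Session{slot}")
            # Empty slots may be None, NaN, or an empty string; skip them all.
            if not isinstance(name, str) or not name:
                continue
            yield {
                "year": year,
                "round_number": event["RoundNumber"],
                "event_name": event["EventName"],
                "session_name": name,
            }


def fetch_events(year: int) -> list[dict]:
    """Fetch one season's schedule with FastF1 (needs the fastf1 package)."""
    import fastf1  # lazy import: only required when actually fetching

    schedule = fastf1.get_event_schedule(year, include_testing=False)
    return schedule.to_dict("records")


if __name__ == "__main__":
    # Hand-written sample row so the sketch runs without network access.
    sample = [{
        "RoundNumber": 1,
        "EventName": "Australian Grand Prix",
        "Session1": "Practice 1",
        "Session4": "Qualifying",
        "Session5": "Race",
    }]
    for row in structure_sessions(2019, sample):
        print(row)
```

With FastF1 installed, `structure_sessions(2019, fetch_events(2019))` would produce the rows for a real season; loading them into Postgres is left out of this sketch.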

Database Structure

  • FastF1 library data is sent to the database following a star schema.
  • The database contains 2 dimension tables, dim_sessions and dim_drivers, which hold information about sessions and drivers.
  • There are 2 main fact tables, fact_results and fact_lap_statistics, which contain race results and drivers' lap statistics respectively.
  • fact_lap_statistics is further normalized into another table, fact_lap_telemetry_statistics, which contains telemetry information for each lap.
  • Note that a classic star schema has a single fact table; because of the data's complexity, splitting the facts across several tables made them easier to reason about.
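To make the relationships between these tables concrete, here is a small self-contained sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and every column name, key, and data row is an assumption made for illustration; only the five table names come from the description above.

```python
import sqlite3

# Hypothetical columns: the real project schema may differ.
SCHEMA = """
CREATE TABLE dim_sessions (
    session_id INTEGER PRIMARY KEY, year INTEGER,
    event_name TEXT, session_name TEXT
);
CREATE TABLE dim_drivers (driver_id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE fact_results (
    session_id INTEGER REFERENCES dim_sessions(session_id),
    driver_id INTEGER REFERENCES dim_drivers(driver_id),
    finishing_position INTEGER,
    PRIMARY KEY (session_id, driver_id)
);
CREATE TABLE fact_lap_statistics (
    lap_id INTEGER PRIMARY KEY,
    session_id INTEGER REFERENCES dim_sessions(session_id),
    driver_id INTEGER REFERENCES dim_drivers(driver_id),
    lap_number INTEGER, lap_time_seconds REAL
);
CREATE TABLE fact_lap_telemetry_statistics (
    lap_id INTEGER PRIMARY KEY REFERENCES fact_lap_statistics(lap_id),
    max_speed_kph REAL
);
"""

# Join one lap back to its session, driver, and telemetry summary.
QUERY = """
SELECT s.event_name, d.full_name, l.lap_number,
       l.lap_time_seconds, t.max_speed_kph
FROM fact_lap_statistics AS l
JOIN dim_sessions AS s ON s.session_id = l.session_id
JOIN dim_drivers AS d ON d.driver_id = l.driver_id
JOIN fact_lap_telemetry_statistics AS t ON t.lap_id = l.lap_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema and insert one placeholder row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_sessions VALUES (1, 2019, 'Example Grand Prix', 'Race')")
    conn.execute("INSERT INTO dim_drivers VALUES (1, 'Driver A')")
    conn.execute("INSERT INTO fact_results VALUES (1, 1, 1)")
    conn.execute("INSERT INTO fact_lap_statistics VALUES (10, 1, 1, 1, 92.5)")
    conn.execute("INSERT INTO fact_lap_telemetry_statistics VALUES (10, 310.0)")
    return conn


if __name__ == "__main__":
    for row in build_demo_db().execute(QUERY):
        print(row)
```

The point of the sketch is the shape of the joins: both fact tables reference the two dimension tables, while the telemetry table hangs off fact_lap_statistics with one row per lap.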

Contributors

toludaree
