robguilarr / brawlstars-retention-pipeline

End-to-end pipeline that extracts battle logs from the Brawl Stars API, uses a KMeans model to classify players and create cohorts, and produces parametrized bounded retention metrics, which are loaded into a Google Cloud Storage bucket.
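The description does not define "bounded" retention. A common definition, assumed here and not taken from the repository, counts a player as retained on day N only if they were active on exactly day N after their first session. Unbounded retention, by contrast, counts any activity on day N or later. The following is a minimal plain-Python sketch of that metric. It assumes activity is already aggregated into per-player sets of active dates and that cohort labels come from a clustering step such as KMeans. The function and argument names are hypothetical, and the pipeline itself is built with Kedro and PySpark.

```python
from collections import defaultdict
from datetime import date, timedelta


def bounded_retention(
    activity: dict[str, set[date]],
    cohorts: dict[str, int],
    days: list[int],
) -> dict[int, dict[int, float]]:
    """Share of each cohort's players active exactly N days after their first session.

    Hypothetical helper (not from the repository):
    activity: player_id -> set of dates with at least one battle
    cohorts:  player_id -> cohort label (e.g. a KMeans cluster id)
    days:     the day offsets N to evaluate (the metric's parameters)
    """
    # Group players by cohort, ignoring players with no recorded activity
    members: dict[int, list[str]] = defaultdict(list)
    for player, label in cohorts.items():
        if activity.get(player):
            members[label].append(player)

    result: dict[int, dict[int, float]] = {}
    for label, players in members.items():
        result[label] = {}
        for n in days:
            # Bounded: a player counts only if active on (first active day + n) exactly
            retained = sum(
                1 for p in players
                if min(activity[p]) + timedelta(days=n) in activity[p]
            )
            result[label][n] = retained / len(players)
    return result


if __name__ == "__main__":
    activity = {
        "a": {date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 8)},
        "b": {date(2023, 1, 1), date(2023, 1, 3)},
        "c": {date(2023, 1, 5), date(2023, 1, 6)},
    }
    print(bounded_retention(activity, {"a": 0, "b": 0, "c": 1}, [1, 7]))
```

In the example, player "b" returns on day 2 but not on day 1. Bounded day-1 retention therefore does not count them, although an unbounded definition would.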

Home Page: https://www.robguilar.com/posts/brawlstars_retention_pipeline/

Languages: Python 80.09%, Jupyter Notebook 17.80%, Dockerfile 2.12%
Topics: gaming, kedro, pyspark

brawlstars-retention-pipeline's Introduction

Repository Overview

โš ๏ธ Important: Any views, material or statements expressed are mines and not those of my employer

Hi there! Thanks for visiting my repository. Here you'll find my most recent personal projects, which were also published on my portfolio website.

๐Ÿ‘จ๐Ÿผโ€๐Ÿ’ป About me

As a Data Scientist and Engineer, I specialize in transforming complex data into tangible business outcomes, particularly for the retail and consumer packaged goods sectors.

My expertise lies in deploying advanced analytics algorithms, deep learning solutions, and machine learning approaches.

🚀 Interests

My passion for exploring the convergence of graphics technology platforms and analytics solutions, particularly in areas such as gaming and the metaverse, has been a driving force in my career. In the coming years, this unique intersection will offer numerous innovative solutions that can make a tangible impact.

My journey in this field began some years ago with a self-taught venture into Unity programming, which not only laid the foundation for my technical background but also sparked this deep interest.

If you want to contact me, you can find more information in the next section.


Tech Stack

This repository is primarily built to showcase the technologies I use to develop my projects, including current and in-progress work.

Python, Kedro, Spark, NVIDIA CUDA, NVIDIA Omniverse, Databricks, GCP, Unity, FastAPI, Gemini, OpenAI


brawlstars-retention-pipeline's People

Contributors: robguilarr

brawlstars-retention-pipeline's Issues

metadata_request_preprocess assembly

Create a node similar to battlelog_request_preprocess:

```python
# General dependencies
import asyncio  # async processes
import logging
import time

import brawlstats
import pandas as pd

# To load the configuration (https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#credentials)
from kedro.config import ConfigLoader
from kedro.framework.project import settings

conf_loader_local = ConfigLoader(conf_source=settings.CONF_SOURCE, env='local')
conf_credentials = conf_loader_local['credentials']

# Logging
log = logging.getLogger(__name__)


def battlelogs_request(player_tags: str) -> pd.DataFrame:
    '''
    Extracts battlelogs from the Brawl Stars API by running an async event loop over a list of
    task objects. Each task runs the blocking api_request call in a worker thread, because the
    brawlstats client is synchronous and would otherwise block the event loop.
    Args:
        player_tags: Comma-separated string of player tags
    Returns:
        All players' battlelogs concatenated into a single structured DataFrame
    '''
    # Get the key and validate that it exists; the client cannot work without it
    api_key = conf_credentials.get('brawlstars_api', {}).get('API_KEY')
    if api_key is None:
        raise ValueError("No API key has been defined. Request one at https://developer.brawlstars.com/")

    # Create the client from the brawlstats API wrapper. For large requests, be aware of the
    # rate limit (review prevent_ratelimit in the brawlstats source code)
    client = brawlstats.Client(token=api_key)

    # Build the list of player tags from the catalog string
    tags = [tag.strip() for tag in player_tags.split(',') if tag.strip()]

    def api_request(tag: str) -> pd.DataFrame:
        '''Request battlelogs for one player from the Brawl Stars API and give them a structured format'''
        try:
            # Extract the list of the 25 most recent battle logs
            player_battle_logs = client.get_battle_logs(tag).raw_data
            # Normalize the nested JSON into a flat DataFrame
            player_battle_logs_structured = pd.json_normalize(player_battle_logs)
            player_battle_logs_structured['player_id'] = tag
        except Exception as exc:
            log.warning(f"No battlelog extracted for player {tag}: {exc}")
            player_battle_logs_structured = pd.DataFrame()
        return player_battle_logs_structured

    async def api_request_async(tag: str) -> pd.DataFrame:
        '''
        Wrap the blocking api_request call in a coroutine. asyncio.to_thread (Python 3.9+) runs
        it in a separate thread, so awaiting the thread rather than the blocking call itself
        lets several requests run concurrently.
        '''
        return await asyncio.to_thread(api_request, tag)

    async def spawn_request(tags: list) -> pd.DataFrame:
        '''Request battlelogs concurrently as asyncio tasks and gather the results'''
        start = time.time()
        log.info("Battlelogs request process started")
        # create_task schedules each coroutine on the event loop as soon as it is created
        requests_tasks = [asyncio.create_task(api_request_async(tag)) for tag in tags]
        # gather waits for all tasks and returns their results (one DataFrame per tag) in order
        battlelogs_data_list = await asyncio.gather(*requests_tasks)
        # pd.concat raises on an empty list, so handle the no-tags case explicitly
        if not battlelogs_data_list:
            return pd.DataFrame()
        # When all tasks are done, concatenate all DataFrames into one
        battlelogs_data = pd.concat(battlelogs_data_list, ignore_index=True)
        log.info(f"Battlelogs request process finished in {time.time() - start:.2f} seconds")
        return battlelogs_data

    # Run the event loop; only the first 20 tags are requested
    battlelogs_data = asyncio.run(spawn_request(tags[:20]))

    # Validate that concurrency didn't affect the data request
    if battlelogs_data.empty:
        log.warning("No battlelogs were extracted. Please check your client connection")

    return battlelogs_data
```
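The issue asks for a metadata node that follows the same pattern. One way to avoid duplicating the async plumbing is to move it into a helper that accepts any blocking per-tag request function. This sketch is an assumption and is not code from the repository. `run_blocking_concurrently` and `max_concurrency` are hypothetical names. The `asyncio.Semaphore` is an addition that caps the number of in-flight requests, as a simple guard against the rate limit mentioned in the brawlstats client comment.

```python
import asyncio
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_blocking_concurrently(
    fetch: Callable[[str], T],
    tags: Iterable[str],
    max_concurrency: int = 10,
) -> list[T]:
    """Run a blocking per-tag request function concurrently in worker threads.

    Hypothetical helper: results come back in the same order as ``tags``,
    and at most ``max_concurrency`` calls to ``fetch`` run at the same time.
    """
    async def fetch_one(tag: str, semaphore: asyncio.Semaphore) -> T:
        # Only max_concurrency coroutines can hold the semaphore at once
        async with semaphore:
            # The blocking call runs in a thread so the event loop stays free
            return await asyncio.to_thread(fetch, tag)

    async def fetch_all() -> list[T]:
        # Create the semaphore inside the running event loop
        semaphore = asyncio.Semaphore(max_concurrency)
        tasks = [asyncio.create_task(fetch_one(tag, semaphore)) for tag in tags]
        # gather preserves the order of the tasks list
        return list(await asyncio.gather(*tasks))

    return asyncio.run(fetch_all())


if __name__ == "__main__":
    def fake_metadata_request(tag: str) -> dict:
        # Stand-in for a blocking API call (hypothetical)
        time.sleep(0.1)
        return {"player_id": tag, "name": f"player-{tag}"}

    start = time.time()
    rows = run_blocking_concurrently(fake_metadata_request, ["#A", "#B", "#C", "#D"], max_concurrency=4)
    print(rows, f"{time.time() - start:.2f} s")
```

A metadata node could then pass a function that wraps the corresponding brawlstats player-profile request and returns a DataFrame. It would concatenate the results with `pd.concat`, as `battlelogs_request` does. Because the requests overlap, the total time should stay close to a single request's latency rather than the sum of all of them, up to the concurrency cap.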
