
log-distributor

Goal

The goal is to design a high-throughput logs distributor that acts as the initial receiver of packets of log messages. The distributor receives log message packets from a number of agents that collect and transmit application/infrastructure logs. The distributor fronts several analyzers, each assigned a relative weight (e.g. 0.4, 0.3, 0.1, 0.2). The distributor should route log message packets to analyzers so that, over time, each analyzer analyzes a fraction of log messages roughly proportional to its relative weight.
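To make the routing policy concrete, here is a minimal sketch of weight-proportional selection (illustrative only; the analyzer IDs and weights below are made up, and the distributor's actual code may choose analyzers differently):

import random

# Hypothetical analyzer IDs and relative weights (illustrative, not from the repo's code).
ANALYZER_WEIGHTS = {"analyzer_1": 0.4, "analyzer_2": 0.3, "analyzer_3": 0.1, "analyzer_4": 0.2}

def pick_analyzer(weights: dict[str, float]) -> str:
    # Pick an analyzer at random with probability proportional to its weight, so over
    # many packets each analyzer receives roughly weight / sum(weights) of the traffic.
    ids = list(weights)
    return random.choices(ids, weights=[weights[a] for a in ids], k=1)[0]

# Route a batch of packets and check the observed split.
counts = {a: 0 for a in ANALYZER_WEIGHTS}
for _ in range(100_000):
    counts[pick_analyzer(ANALYZER_WEIGHTS)] += 1
print({a: round(c / 100_000, 3) for a, c in counts.items()})  # roughly 0.4 / 0.3 / 0.1 / 0.2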

Features

  • Uses uvicorn, an ASGI web server, with multiple workers for concurrent request handling, cooperative multi-tasking and high throughput
  • Uses asyncio to efficiently handle multiple non-blocking operations concurrently within a single request and respond asynchronously
  • Uses thread locks to guard modifications to shared data structures
  • Uses redis to maintain shared state among worker threads
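A rough sketch of how these pieces could fit together (illustrative only; the function names, data structures, and redis keys below are assumptions, not the repo's actual distributor.py):

import threading
import redis  # redis-py client; the run instructions below start redis on port 6379

# Per-process routing table; the real distributor's data structures may differ.
ROUTING_TABLE: dict[str, float] = {}   # analyzer_id -> weight
table_lock = threading.Lock()          # thread lock guarding mutations to ROUTING_TABLE

# Redis holds state that must be visible to every uvicorn worker.
rdb = redis.Redis(host="redis", port=6379, decode_responses=True)

def register_analyzer(analyzer_id: str, weight: float) -> None:
    # Lock so concurrent register/de-register requests cannot corrupt the table.
    with table_lock:
        ROUTING_TABLE[analyzer_id] = weight
    # Mirror the weight into redis so other workers see the same set of analyzers.
    rdb.hset("analyzer_weights", analyzer_id, weight)

def record_delivery(analyzer_id: str) -> None:
    # Atomic counter increment in redis: per-analyzer delivery stats shared by all workers.
    rdb.hincrby("analyzer_stats", analyzer_id, 1)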

Components:

  • Load Generator: Built using Locust, it supplies the Distributor with an unbounded stream of application/infra logs coming from multiple agents
  • Distributor: A multi-threaded web-server that can perform high-throughput, non-blocking, thread-safe distribution of log messages
  • Analyzer: A service that accepts log messages from the Distributor in proportion to its pre-defined relative weight
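As an illustration, a bare-bones Analyzer could look like the following (a sketch assuming a FastAPI app served by uvicorn; the actual framework, endpoint path, and payload model in analyzer.py may differ):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class LogMessage(BaseModel):
    # Field names taken from the sample payload in the Manual Testing section below.
    timestamp: int
    severity: str
    source: str
    message: str

received = 0  # simple in-process count of messages this analyzer has handled

@app.post("/analyze")  # hypothetical endpoint name
async def analyze(msg: LogMessage):
    global received
    received += 1
    # Real log analysis (parsing, indexing, alerting, ...) would happen here.
    return {"status": "ok", "received": received}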

Project Structure:

.
├── distributor
│   ├── distributor.py
│   ├── app.py
│   └── Dockerfile
├── analyzer
│   ├── analyzer.py
│   └── Dockerfile
├── loadgen
│   └── locust.py
└── requirements.txt

Pre-Requisites

Install the following:

  • python 3.9
  • docker

Run Instructions:

  • Create a virtual environment
python -m venv env
  • Activate the virtual environment
source env/bin/activate
  • Install all the dependencies
pip install -r requirements.txt
  • Create a shared docker network for communication
docker network create app-network
  • Build docker images for both services (Distributor and Analyzer)
docker build -t distributor ./distributor
docker build -t analyzer ./analyzer
  • Start the redis instance (to store shared state)
docker run --network app-network --name redis -p 6379:6379 redis
  • Start the Distributor (single instance)
docker run --network app-network -p 3000:3000 --name distributor distributor
  • Spin up as many instances of Analyzer as needed, each as a separate docker container with a unique ID. When an Analyzer is started, it will automatically call the Distributor to register itself (a sketch of this registration call follows the example below). Use the following convention:
docker run --network app-network -e ANALYZER_ID={ID} -e ANALYZER_WEIGHT={WEIGHT} -p {EXTERNAL_PORT}:{INTERNAL_PORT} --name analyzer_{ID} analyzer

Example usage: This will spin up 3 Analyzers with weights [0.5, 0.3, 0.2]

docker run --network app-network -e ANALYZER_ID=1 -e ANALYZER_WEIGHT=0.5  -p 3001:3001 --name analyzer_1 analyzer
docker run --network app-network -e ANALYZER_ID=2 -e ANALYZER_WEIGHT=0.3  -p 3002:3002 --name analyzer_2 analyzer
docker run --network app-network -e ANALYZER_ID=3 -e ANALYZER_WEIGHT=0.2  -p 3003:3003 --name analyzer_3 analyzer
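For reference, the self-registration an Analyzer performs on startup could look roughly like this (a sketch; the registration and de-registration endpoint paths and payload fields are assumptions, not taken from the repo):

import os
import requests

DISTRIBUTOR_URL = "http://distributor:3000"  # reachable by container name on app-network

def register() -> None:
    payload = {
        "id": int(os.environ["ANALYZER_ID"]),
        "weight": float(os.environ["ANALYZER_WEIGHT"]),
        "port": 3000 + int(os.environ["ANALYZER_ID"]),  # port convention from the Assumptions
    }
    # Hypothetical registration endpoint on the Distributor.
    requests.post(f"{DISTRIBUTOR_URL}/analyzer/register", json=payload, timeout=5)

def deregister() -> None:
    # Called when the analyzer shuts down so the Distributor stops routing logs to it.
    requests.post(f"{DISTRIBUTOR_URL}/analyzer/deregister",
                  json={"id": int(os.environ["ANALYZER_ID"])}, timeout=5)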

Load Test

  • Start the locust load generator to simulate requests coming from multiple source nodes (a minimal locustfile sketch is included at the end of this section)
locust -f ./loadgen/locust.py
  • Open a browser tab at http://localhost:8089/

  • To begin simulating sending of logs, enter the following parameters:

    • Number of Users (peak concurrency): {INTEGER}
    • Ramp Up (users started per second): {INTEGER}
    • Host: http://127.0.0.1:3000
    • Click Start Swarm

(Screenshot: locust settings)

  • When an analyzer goes offline, it de-registers itself from the Distributor so that it no longer receives any logs. To simulate an analyzer going offline, run the following:
docker container stop {ANALYZER_CONTAINER_NAME}
docker container rm {ANALYZER_CONTAINER_NAME}

Example usage:

docker container stop analyzer_1
docker container rm analyzer_1
  • To stop all containers, run:
docker rm -f $(docker ps -aq)
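For reference, a minimal locustfile in the spirit of loadgen/locust.py could look like this (a sketch only; the repo's actual user classes and payloads may differ). Each simulated user behaves like a log-collecting agent posting packets to the Distributor:

from locust import HttpUser, task, between

class LogAgent(HttpUser):
    wait_time = between(0.01, 0.1)  # short pause between packets per simulated agent

    @task
    def send_log(self):
        # POST a single log message to the Distributor's ingest endpoint.
        self.client.post("/message/send", json={
            "timestamp": 1020,
            "severity": "WARN",
            "source": "app",
            "message": "hello world!",
        })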

Manual Testing

  • Once the Distributor and Analyzer(s) are up, if you want to send a single log message to the Distributor:
curl -X POST -H "Content-Type: application/json" -d '{ "timestamp": 1020, "severity": "WARN", "source": "app", "message": "hello world!" }' http://127.0.0.1:3000/message/send
  • To send a fixed number of logs (see also the Python sketch after this list):
for i in {1..100}; do curl -X POST -H "Content-Type: application/json" -d '{ "timestamp": 1020, "severity": "WARN", "source": "app", "message": "hello world!" }' http://127.0.0.1:3000/message/send ; done
  • If you want to see the stats of how many messages are sent to an analyzer at any point in time:
curl -X GET http://127.0.0.1:3000/analyzer/stats
  • If you want to see the logs of any container, run:
docker logs {CONTAINER_NAME} --tail=0 --follow
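The same manual test can be scripted in Python (assuming the requests package is installed; the endpoints are the ones shown above):

import requests

BASE = "http://127.0.0.1:3000"
log = {"timestamp": 1020, "severity": "WARN", "source": "app", "message": "hello world!"}

# Send a fixed number of log messages to the Distributor.
for _ in range(100):
    requests.post(f"{BASE}/message/send", json=log, timeout=5)

# Fetch the per-analyzer delivery stats.
print(requests.get(f"{BASE}/analyzer/stats", timeout=5).text)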

Assumptions:

  • The distributor and analyzers are run on the same network and can reach each other
  • When an agent comes online or goes offline, it is responsible for informing the distributor
  • Analyzer IDs are unique
  • Even though the weights might not sum to exactly 1, they will be normalized proportionally so that all logs are distributed according to the relative weights (see the sketch after this list)
  • Logs will be distributed among the analyzers that are "online" at any given point in time. When a new analyzer comes online or an existing analyzer goes offline, the weights are re-normalized and logs will be distributed based on the new relative weights.
  • The port assigned to each analyzer is always available and not bound to another process. Ports are assigned as 3000 + {ANALYZER_ID}, and this port is available at analyzer creation time.
  • {ANALYZER_ID} is an integer and is always >= 1
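The normalization mentioned above can be pictured as follows (a sketch, not the repo's actual code):

def normalized_weights(online: dict[str, float]) -> dict[str, float]:
    # Scale the weights of the currently online analyzers so they sum to 1.
    total = sum(online.values())
    return {analyzer_id: w / total for analyzer_id, w in online.items()} if total else {}

# Example: analyzer 1 (weight 0.5) goes offline, so 0.3 and 0.2 are re-scaled to 0.6 and 0.4.
print(normalized_weights({"2": 0.3, "3": 0.2}))  # {'2': 0.6, '3': 0.4}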
