
airbnb_config_generation

Usage

This is meant to be a one-stop shop where configuration files can be updated and pushed to AWS to control the Airbnb scraper infrastructure. Ideally, we narrow the inputs down to the few knobs we may want to turn at any given time. The main config file is config.json, which controls:

  • pricing scraper template location
  • availability scraper template location
  • aws batch template location
  • eventbridge template location
  • pricing scraper configuration
    • how many days into the future to pull pricing for
    • where to write the pricing scraper configuration
    • where to write scraper outputs
    • where to write metadata generated by the scraping run
  • availability scraper configuration
    • start month and how many months into the future to scrape
    • where to write the availability scraper configuration
    • where to write scraper outputs
    • where to write metadata generated by the scraping run
  • configuration for creating the list of IDs to run
    • number of IDs per container
    • location of the master list of IDs
    • where to write the config
    • maximum duration of a scraping job
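The pricing and availability windows above come down to simple date arithmetic. The sketch below assumes a hypothetical config.json layout (the keys `pricing.days_ahead`, `availability.start_month` and `availability.months_ahead` are illustrative, not the repo's actual schema) and assumes the pricing window counts today as its first day.

```python
import json
from datetime import date, timedelta

# Hypothetical config.json excerpt; key names are illustrative assumptions,
# not the repository's actual schema.
EXAMPLE_CONFIG = json.loads("""
{
  "pricing": {"days_ahead": 30},
  "availability": {"start_month": "2024-01", "months_ahead": 3}
}
""")


def pricing_dates(start: date, days_ahead: int) -> list[date]:
    """Dates to price: `days_ahead` consecutive days beginning with `start` (assumed inclusive)."""
    return [start + timedelta(days=i) for i in range(days_ahead)]


def availability_months(start_month: str, months_ahead: int) -> list[str]:
    """`months_ahead` consecutive months as 'YYYY-MM', beginning with `start_month`."""
    year, month = (int(part) for part in start_month.split("-"))
    months = []
    for offset in range(months_ahead):
        # Zero-based month index, so December rolls over into January of the next year.
        idx = (month - 1) + offset
        months.append(f"{year + idx // 12:04d}-{idx % 12 + 1:02d}")
    return months


avail = EXAMPLE_CONFIG["availability"]
print(availability_months(avail["start_month"], avail["months_ahead"]))  # ['2024-01', '2024-02', '2024-03']
```

Working with a zero-based month index keeps the year rollover to a single integer division instead of special-casing December.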

The file generate_id_scraper_configs.py is the entry point (the "main" function): it uses the templates and config.json to create the needed configurations and writes them to their S3 locations. It runs on a recurring basis from a Lambda function so that the configs are up to date before the ID scrapers run. In principle, to make on-the-fly changes to the infrastructure, we should be able to update the configs here and then push them with:
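A minimal sketch of the ID-splitting step, assuming the master list is a plain list of listing IDs and that each container gets one JSON config. The function names, the `output_path` and `max_duration_minutes` fields, the `container_<n>.json` naming and the `s3://my-bucket/outputs` prefix are all hypothetical; the real script renders its configs from the templates referenced in config.json, and the S3 upload is left out here.

```python
import json


def chunk_ids(ids: list[str], ids_per_container: int) -> list[list[str]]:
    """Split the master ID list into consecutive chunks of at most `ids_per_container` IDs."""
    if ids_per_container <= 0:
        raise ValueError("ids_per_container must be positive")
    return [ids[i:i + ids_per_container] for i in range(0, len(ids), ids_per_container)]


def build_container_configs(ids, ids_per_container, output_prefix, max_duration_minutes):
    """One config dict per container; the field names are hypothetical."""
    return [
        {
            "ids": chunk,
            "output_path": f"{output_prefix}/container_{n}",
            "max_duration_minutes": max_duration_minutes,
        }
        for n, chunk in enumerate(chunk_ids(ids, ids_per_container))
    ]


def serialize_configs(configs):
    """Map a hypothetical per-container file name to its JSON body, ready to upload."""
    return {f"container_{n}.json": json.dumps(cfg) for n, cfg in enumerate(configs)}


configs = build_container_configs(["101", "102", "103", "104", "105"], 2, "s3://my-bucket/outputs", 60)
print(len(configs))  # 3 containers: 2 + 2 + 1 IDs
```

In the actual pipeline, each serialized body would then be uploaded to the config location named in config.json (for example with boto3's S3 `put_object`).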

./bin/push_master_configs.sh

Setup

  • Configure AWS CLI
  • Create ECR repo
  • Push container to ECR repo
  • Create ECS cluster if you don't already have one
  • Create necessary IAM roles
    • ecsTaskExecutionRole
      • AmazonS3FullAccess
      • AmazonECSTaskExecutionRolePolicy
    • AmazonECSTaskS3BucketRole
      • AmazonECSTaskS3BucketPolicy
        • S3 access for the bucket (I use full access)
  • Update the task definition with the correct URIs
  • Create the task definition
  • Create the S3 bucket if needed
  • Update master configs and templates if needed
  • Upload master configs
  • Run the task definition
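The "update the task definition with the correct URIs" step amounts to filling the ECR image URI and the IAM role ARNs into the task definition JSON. This sketch assumes a single-container task definition and maps `ecsTaskExecutionRole` to `executionRoleArn` and `AmazonECSTaskS3BucketRole` to `taskRoleArn`; `fill_task_definition` is a hypothetical helper, and the account ID, region and repo name are placeholders. The URI and ARN formats are the standard AWS ones.

```python
import copy
import json


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Standard ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def iam_role_arn(account_id: str, role_name: str) -> str:
    """ARN for an IAM role at the default path."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def fill_task_definition(task_def: dict, account_id: str, region: str, repo: str, tag: str = "latest") -> dict:
    """Return a copy of `task_def` with the image URI and role ARNs filled in (hypothetical helper)."""
    updated = copy.deepcopy(task_def)
    # Assumed mapping of the README's two roles onto the task definition fields.
    updated["executionRoleArn"] = iam_role_arn(account_id, "ecsTaskExecutionRole")
    updated["taskRoleArn"] = iam_role_arn(account_id, "AmazonECSTaskS3BucketRole")
    for container in updated["containerDefinitions"]:
        container["image"] = ecr_image_uri(account_id, region, repo, tag)
    return updated


template = {"family": "airbnb-scraper", "containerDefinitions": [{"name": "scraper", "image": "REPLACE_ME"}]}
filled = fill_task_definition(template, "123456789012", "us-east-1", "airbnb-scraper")
print(json.dumps(filled, indent=2))
```

Saved to a file, the result can be registered with `aws ecs register-task-definition --cli-input-json file://taskdef.json`. Copying the template rather than mutating it keeps the checked-in template reusable across accounts and regions.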

Switched to running the container in Lambda because I couldn't get ECS scheduling to work. This seems to work in testing; scheduling through Lambda is still in progress. Lambda can also start an ECS task if we want, but running directly in Lambda is probably fine for now.
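If we do go the route of having Lambda start an ECS task, a handler could look roughly like this. It assumes a Fargate task (which requires `awsvpc` networking) and reads hypothetical environment variables `ECS_CLUSTER`, `TASK_DEFINITION` and `SUBNET_IDS`. The extra `ecs_client` parameter exists only so a fake client can be injected in tests; Lambda itself calls the handler with just `event` and `context`.

```python
import os


def lambda_handler(event, context, ecs_client=None):
    """Start one ECS task per invocation (e.g. from a scheduled EventBridge rule)."""
    if ecs_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        ecs_client = boto3.client("ecs")
    # Environment variable names are hypothetical; set them on the Lambda function.
    response = ecs_client.run_task(
        cluster=os.environ["ECS_CLUSTER"],
        taskDefinition=os.environ["TASK_DEFINITION"],
        launchType="FARGATE",  # assumption: Fargate rather than EC2 capacity
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": os.environ["SUBNET_IDS"].split(","),
                "assignPublicIp": "ENABLED",
            }
        },
    )
    return {
        "started": [task["taskArn"] for task in response.get("tasks", [])],
        "failures": response.get("failures", []),
    }
```

The Lambda's execution role would also need `ecs:RunTask`, plus `iam:PassRole` for the roles referenced by the task definition.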

Updated to include the EMR config; not yet built, pushed, or tested.

Contributors

jcd13d
