Giter VIP home page Giter VIP logo

docker-secor's Introduction

Secor 0.22 (Kafka 0.10.0.1)

This project uses Docker (and Gradle under the hood) to produce a Docker image containing an uber-jar (single jar containing all dependencies) version of Pinterest Secor (https://github.com/pinterest/secor).

Currently, this project packages Secor 0.22, using Kafka client version 0.10.0.1.

docker-entrypoint.sh is the entrypoint into the image, it is responsible for taking passed in ENV vars at runtime and configuring Secor accordingly. The table below shows all ENV variables you can specify to configure Secor's operation.

Building the Image

You can build the image locally by doing:

docker build -t [yourorg/]secor .

How to Run

If you are running from outside an AWS instance, you must specify AWS credentials:

docker run \
  -e DEBUG=true \
  -e ZOOKEEPER_QUORUM=zookeeper.quorum.consul:2181 \
  -e AWS_ACCESS_KEY=YOUR_KEY \
  -e AWS_SECRET_KEY=YOUR_SECRET \
  -e SECOR_S3_BUCKET=my-kafka-backups \
  -e SECOR_GROUP=raw_logs \
  secor

There are many more configuration options that can be passed in via -e NAME=value, described in the table below.

## Releasing (to Ovotech)

The Docker Hub repo (ovotech/secor) is configured to automatically build when this repository is changed (i.e., when a change is pushed to master, or when a tag/release is created).

Configuration options

Can be configured using environment variables:

Environment Variable Name Required? Purpose
ZOOKEEPER_QUORUM Yes Zookeeper quorum (not required if KAFKA_SEED_BROKER_HOST is specified)
KAFKA_SEED_BROKER_HOST Yes Kafka broker hosts (not required if ZOOKEEPER_QUORUM is specified)
SECOR_S3_BUCKET Yes The S3 bucket into which backups will be persisted
SECOR_S3_PATH Yes Path within S3 bucket where sequence files are stored
SECOR_FILE_READER_WRITER_FACTORY No Which WriterFactory to use (defaults to: SequenceFileReaderWriterFactory)
DEBUG No Enable some debug logging (defaults to: false)
JVM_MEMORY No How much memory to give the JVM (via the -Xmx parameter) (defaults to: 512m)
KAFKA_SEED_BROKER_PORT No Kafka broker port (defaults to: 9092)
AWS_ACCESS_KEY No AWS Access key
AWS_SECRET_KEY No AWS Secret access key
AWS_REGION No The AWS region for S3 uploading
AWS_ENDPOINT No The AWS S3 endpoint to use for uploading
SECOR_KAFKA_TOPIC_FILTER No Regexp filter which topics it should replicate (defaults to: .*)
SECOR_KAFKA_TOPIC_BLACKLIST No Which topics to exclude from backing up
SECOR_MAX_FILE_BYTES No Max bytes per file stored in S3 (defaults to: 200000000)
SECOR_MAX_FILE_SECONDS No Max time per file before it is stored in S3 (defaults to: 3600)
SECOR_COMPRESSION_CODEC No Which Hadoop compression codec to use for partition files
SECOR_FILE_EXTENSION No Custom file extension to be appended to all partition names
SECOR_TIMESTAMP_NAME No When using DateMessageParser what field to use in the JSON (defaults to: timestamp)
SECOR_TIMESTAMP_PATTERN No When using DateMessageParser what format the timestamp is
PARTITIONER_GRANULARITY_HOUR No Should Secor partition the files up by hour as well as day? (defaults to: false)
PARTITIONER_GRANULARITY_MINUTE No Should Secor partition the files up by hour as well as hour, and day? (defaults to: false)
SECOR_KAFKA_GROUP No Kafka consumer group name (defaults to: secor_backup)
SECOR_MESSAGE_PARSER_CLASS No Which message parser factory to use (defaults to: OffsetMessageParser)
SECOR_GENERATION No Generational version ID to differentiate between incompatible upgrades (defaults to: 1)
SECOR_CONSUMER_THREADS No Number of consumer threads per Secor process (defaults to: 7)
SECOR_MESSAGES_PER_SECOND No Maximum number of messages consumed per second for the entire process (defaults to: 10000)
SECOR_OFFSETS_PER_PARTITION No How many offsets should be stored within a backed up partition? (defaults to: 10000000)
KAFKA_OFFSETS_STORAGE No Should offsets be stored in Kafka or ZooKeeper? (defaults to: kafka)
KAFKA_DUAL_COMMIT_ENABLED No Should Secor commit processed offsets to Kafak AND ZooKeeper? (defaults to: false)
SECOR_OSTRICH_PORT No What port should Ostrich data be sent on (defaults to: 9999)

Using without Docker

If you just wish to package up Secor as a standalone uber-JAR, you can run the gradle tasks directly, e.g.:

./gradlew shadowJar

docker-secor's People

Contributors

dwo avatar pyoio avatar sjoerdmulder avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.