flashback's Introduction

What is Flashback

How can you measure the performance of your MongoDB deployment (or of another database with a similar interface)? You benchmark it. A common approach is to use a benchmark tool that generates queries with random contents drawn from a chosen distribution.

But randomly generated queries are not always satisfying, because you cannot be confident in how closely they resemble your real workload.

The difficulty compounds when one MongoDB instance hosts completely different types of databases, each with its own unique and complicated access patterns.

That is why we came up with Flashback, a MongoDB benchmark framework that allows us to benchmark with "real" queries. It is composed of a set of scripts that fall into two categories:

  1. scripts that record the operations (ops) that occur during a stretch of time;
  2. scripts that replay the recorded ops.

The two parts are not tied to each other and can be used independently for different purposes.

How it works

Record

How do you know which ops MongoDB performs? There are many ways to find out, but Flashback records the ops by enabling MongoDB's profiling.

By setting the profile level to 2 (profile all ops), we can fetch op information that is detailed enough for later replay, except for insert ops.

MongoDB does not log insertion details in the profile DB. However, if a MongoDB instance is running as part of a replica set, we can capture insert information by reading the oplog.
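As a rough illustration of the two data sources described above, the sketch below builds the `profile` command document and the query filters one might use against `system.profile` and the oplog. It is not taken from Flashback's record script: the helper names, the connection string, and the `mydb.users` namespace are hypothetical, and the pymongo calls are shown only as comments because they need a live replica set.

```python
# Command document for MongoDB's `profile` database command.
# Level 2 asks the server to profile all operations.
PROFILE_ALL_OPS = {"profile": 2}


def profile_filter(namespace, start, end):
    """Filter for system.profile entries of one namespace within [start, end)."""
    return {"ns": namespace, "ts": {"$gte": start, "$lt": end}}


def oplog_insert_filter(namespace):
    """Filter for oplog entries that describe inserts ("op": "i") into one namespace."""
    return {"ns": namespace, "op": "i"}


# Possible usage with pymongo against a replica set (assumed setup, not run here):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   client["mydb"].command(PROFILE_ALL_OPS)
#   client["mydb"]["system.profile"].find(profile_filter("mydb.users", start, end))
#   client["local"]["oplog.rs"].find(oplog_insert_filter("mydb.users"))
```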

Thus, we record the ops with the following steps:

  1. The script starts multiple threads to pull the profiling results and oplog entries for the collections and databases we are interested in. Each thread works independently.
  2. After fetching the entries, it merges the results from all sources to get a full picture of all operations.
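The merge step can be pictured as a k-way merge by timestamp. The sketch below is a simplified illustration rather than Flashback's actual code: it assumes each source is already sorted and that timestamps have been normalized to comparable values under a `ts` key (profile entries and oplog entries use different timestamp types in MongoDB, so a real implementation would need to convert them first). The `merge_ops` name and the sample ops are hypothetical.

```python
import heapq
from operator import itemgetter


def merge_ops(*sources):
    """Merge per-source op lists, each already sorted by 'ts', into one timeline."""
    return list(heapq.merge(*sources, key=itemgetter("ts")))


# Hypothetical entries from the two sources, already sorted by timestamp.
profile_ops = [{"ts": 1, "op": "query"}, {"ts": 4, "op": "update"}]
oplog_ops = [{"ts": 2, "op": "insert"}, {"ts": 3, "op": "insert"}]

merged = merge_ops(profile_ops, oplog_ops)
print([op["ts"] for op in merged])  # prints [1, 2, 3, 4]
```

`heapq.merge` reads its inputs lazily, so the same idea extends to sources that are too large to sort in memory as a single list.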

Replay

Once the ops are recorded, a replayer can replay them in different ways:

  • Replay ops with "best effort". The replayer sends the ops to the database as fast as possible, which helps us measure the limits of the database. To reduce the overhead of loading ops, the replayer preloads them into memory before replaying. This potentially limits the number of ops played back per session to what fits in the available memory on the replay host.
  • Replay ops in accordance with their original timestamps, which allows us to imitate regular traffic.
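The difference between the two styles comes down to pacing. The sketch below illustrates the idea in Python for consistency with the record module; the real replayer is written in Go, and this is not its implementation. It assumes each op carries a `ts` field in seconds, and the `replay` function, its `send` callback, and the injectable `sleep` and `clock` parameters are all hypothetical.

```python
import time


def replay(ops, send, style="real", sleep=time.sleep, clock=time.monotonic):
    """Send ops in order.

    style="stress": send every op immediately, as fast as possible.
    style="real":   wait so each op is sent at its recorded offset from the first op.
    """
    if not ops:
        return
    first_ts = ops[0]["ts"]
    started = clock()
    for op in ops:
        if style == "real":
            due = op["ts"] - first_ts          # recorded offset of this op
            wait = due - (clock() - started)   # time left until that offset
            if wait > 0:
                sleep(wait)
        send(op)
```

Scheduling each op against the start time, rather than sleeping for the gap between consecutive ops, keeps small delays in `send` from accumulating into drift over a long replay.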

The replay module is written in Go because Python does not handle concurrent, CPU-intensive tasks well.

How to use it

Record

Prerequisites

  • The "record" module is written in Python. You'll need pymongo, MongoDB's Python driver, installed.
  • Set the MongoDB profiling level to 2, which captures all ops.
  • Run MongoDB in replica set mode (even if there is only one node), which allows us to access the oplog.

Configuration

  • If you are a first-time user, run cp config.py.example config.py.
  • Modify config.py based on your needs. Some notes:
    • We intentionally separate the servers for oplog retrieval and profiling results retrieval. As a good practice, pull the oplog from secondaries. Profiling results, however, must be pulled from the primary server.
    • duration_secs sets the length of the recording, in seconds.

Start Recording

After configuration, run python record.py.

Replay

Prerequisites

Go 1.4

Installation

$ go get github.com/ParsePlatform/flashback/cmd/flashback

Command

Required options:

# --ops_filename: operations file, such as one generated by the Record tool
flashback \
    --style=[real|stress] \
    --ops_filename=<file_name>

For a full list of options:

flashback --help

Misc

pcap_converter

pcap_converter is an experimental way to build a recorded ops file from a pcap of MongoDB traffic.

$ go get github.com/ParsePlatform/flashback/cmd/pcap_converter
$ tcpdump -i lo0 -w some_mongo_cap.pcap 'dst port 27017'
$ pcap_converter -f some_mongo_cap.pcap > ops_filename.json

flashback's People

Contributors

wojcikstefan, dbmurphy, liukai, tmc, jameswahlin, charity, agfeldman, igorcanadi
