Optimum-Benchmark Logo

All benchmarks are wrong, some will cost you less than others.

Optimum-Benchmark 🏋️

Optimum-Benchmark is a unified multi-backend & multi-device utility for benchmarking Transformers, Diffusers, PEFT, TIMM and Optimum flavors, along with all their supported optimizations & quantization schemes. It covers inference & training, in distributed & non-distributed settings, in the most correct, efficient and scalable way possible (you don't even need to download the model weights).

News 📰

  • PyPI release coming soon.
  • Added a simple Python API to run benchmarks with all isolation and tracking features supported by the CLI.

Motivations 🤔

  • HF hardware partners wanting to know how their hardware performs compared to other hardware on the same models.
  • HF ecosystem users wanting to know how their chosen model performs in terms of latency, throughput, memory usage, energy consumption, etc., compared to another model.
  • Experimenting with hardware & backend specific optimizations & quantization schemes that can be applied to models to improve their computational/memory/energy efficiency.

Notes 📝

  • If you were using optimum-benchmark before and want to keep using the old CLI-only version, you can still do so by installing from the 0.0.1 branch.

Current status 📈

API

[CI status badges: CPU, CUDA, ROCm, misc]

CLI

[CI status badges: CPU Pytorch, CPU OnnxRuntime, CPU Intel Neural Compressor, CPU OpenVINO, CUDA Pytorch, CUDA OnnxRuntime, CUDA Torch-ORT, TensorRT OnnxRuntime, TensorRT-LLM, ROCm Pytorch, ROCm OnnxRuntime, misc tests]

Quickstart 🚀

Installation 📥

You can install optimum-benchmark using pip:

pip install optimum-benchmark

or by cloning the repository and installing it in editable mode:

git clone https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
pip install -e .

Depending on the backends you want to use, you might need to install some extra dependencies:

  • PyTorch (default): pip install optimum-benchmark
  • OpenVINO: pip install optimum-benchmark[openvino]
  • Torch-ORT: pip install optimum-benchmark[torch-ort]
  • OnnxRuntime: pip install optimum-benchmark[onnxruntime]
  • TensorRT-LLM: pip install optimum-benchmark[tensorrt-llm]
  • OnnxRuntime-GPU: pip install optimum-benchmark[onnxruntime-gpu]
  • Intel Neural Compressor: pip install optimum-benchmark[neural-compressor]
  • Py-TGI: pip install optimum-benchmark[py-tgi]
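
Extras can also be combined in a single command, e.g. to install both the OpenVINO and OnnxRuntime backends:

pip install optimum-benchmark[openvino,onnxruntime]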

Running benchmarks from the Python API 🧪

You can run benchmarks from the Python API using the launch function. Here's an example that runs a benchmark with the pytorch backend, the torchrun launcher and the inference benchmark:

from optimum_benchmark.logging_utils import setup_logging
from optimum_benchmark.experiment import launch, ExperimentConfig
from optimum_benchmark.backends.pytorch.config import PyTorchConfig
from optimum_benchmark.launchers.torchrun.config import TorchrunConfig
from optimum_benchmark.benchmarks.inference.config import InferenceConfig

if __name__ == "__main__":
    setup_logging(level="INFO")
    # launch the benchmark in two processes, one per GPU
    launcher_config = TorchrunConfig(nproc_per_node=2)
    # track latency and memory during inference
    benchmark_config = InferenceConfig(latency=True, memory=True)
    # benchmark gpt2 on CUDA devices 0 and 1, without downloading its weights
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0,1", no_weights=True)
    experiment_config = ExperimentConfig(
        experiment_name="api-launch",
        benchmark=benchmark_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    benchmark_report = launch(experiment_config)
    experiment_config.push_to_hub("IlyasMoutawwakil/benchmarks")  # pushes experiment_config.json to the hub
    benchmark_report.push_to_hub("IlyasMoutawwakil/benchmarks")  # pushes benchmark_report.json to the hub

Yep, it's that simple! Check the supported backends, launchers and benchmarks matrix in the features section.
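
The same API also works without torchrun or GPUs. Here's a hedged sketch of a single-process CPU run, assuming the process launcher's config lives at optimum_benchmark.launchers.process.config.ProcessConfig (by analogy with the torchrun import above):

from optimum_benchmark.logging_utils import setup_logging
from optimum_benchmark.experiment import launch, ExperimentConfig
from optimum_benchmark.backends.pytorch.config import PyTorchConfig
from optimum_benchmark.launchers.process.config import ProcessConfig  # assumed path, mirroring the torchrun config
from optimum_benchmark.benchmarks.inference.config import InferenceConfig

if __name__ == "__main__":
    setup_logging(level="INFO")
    experiment_config = ExperimentConfig(
        experiment_name="api-launch-cpu",  # hypothetical experiment name
        launcher=ProcessConfig(),  # isolate the benchmark in its own process
        benchmark=InferenceConfig(latency=True, memory=True),
        backend=PyTorchConfig(model="bert-base-uncased", device="cpu", no_weights=True),
    )
    benchmark_report = launch(experiment_config)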

Running benchmarks from CLI 🏃‍♂️

You can also run a benchmark from the command line by specifying the configuration directory and the configuration name. Both arguments are required by hydra: --config-dir is the directory where the configuration files are stored and --config-name is the name of the configuration file, without its .yaml extension.

optimum-benchmark --config-dir examples/ --config-name pytorch_bert

This will run the benchmark using the configuration in examples/pytorch_bert.yaml and store the results in runs/pytorch_bert.

The results directory contains benchmark_report.json (the measurements), cli.log (the program's logs) and experiment_config.json (the configuration that was used, including backend, launcher, benchmark and environment configurations).

The directory for storing these results can be changed by setting hydra.run.dir (and/or hydra.sweep.dir in case of a multirun) in the command line or in the config file.
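
For example, to send a single run's results to a custom directory (the path here is purely illustrative):

optimum-benchmark --config-dir examples/ --config-name pytorch_bert hydra.run.dir=custom_runs/pytorch_bert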

Configuration overrides 🎛️

It's easy to override the default behavior of a benchmark from the command line.

optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda

Configuration multirun sweeps 🧹

You can easily run configuration sweeps using the -m or --multirun option. By default, configurations are executed serially, but other execution modes are supported through hydra's launcher plugins: hydra/launcher=submitit, hydra/launcher=ray, etc.

optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda
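
Sweeps can also span several dimensions at once, in which case hydra runs the cartesian product of the value lists (the shapes here are illustrative):

optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda benchmark.input_shapes.sequence_length=128,256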

Configurations structure 📁

You can create custom and more complex configuration files following these examples.
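
As a rough sketch (assuming hydra's defaults-list composition; the group names mirror the CLI overrides used throughout this README), a configuration like examples/pytorch_bert.yaml could look as follows:

defaults:
  - backend: pytorch # which backend config group to compose
  - launcher: process # which launcher config group to compose
  - benchmark: inference # which benchmark config group to compose
  - _self_ # values below override the composed defaults

experiment_name: pytorch_bert

backend:
  model: bert-base-uncased
  device: cpu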

Features 🎨

optimum-benchmark allows you to run benchmarks with minimal configuration. The only required parameters are:

  • The launcher to use (e.g. process).
  • The type of benchmark (e.g. training).
  • The backend to run on (e.g. onnxruntime).
  • The model name or path (e.g. bert-base-uncased).

Everything else is optional or inferred at runtime, but can be configured to your needs.
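
Putting these together on the command line (a sketch; the config directory and model are reused from the examples above):

optimum-benchmark --config-dir examples/ --config-name pytorch_bert launcher=process benchmark=inference backend=onnxruntime backend.model=bert-base-uncased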

Launchers 🚀

  • Distributed inference/training (launcher=torchrun)
  • Process isolation between consecutive runs (launcher=process)
  • Assertion of GPU device (NVIDIA & AMD) isolation (launcher.device_isolation=true); see the example after this list.
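
For instance, a sketch of distributed inference over two processes with device isolation enabled, reusing options shown elsewhere in this README:

optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.device=cuda launcher=torchrun launcher.nproc_per_node=2 launcher.device_isolation=true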

Backends & Devices 📱

  • Pytorch backend for CPU (backend=pytorch, backend.device=cpu)
  • Pytorch backend for CUDA (backend=pytorch, backend.device=cuda)
  • Pytorch backend for Habana Gaudi Processor (backend=pytorch, backend.device=habana)
  • OnnxRuntime backend for CPUExecutionProvider (backend=onnxruntime, backend.device=cpu)
  • OnnxRuntime backend for CUDAExecutionProvider (backend=onnxruntime, backend.device=cuda)
  • OnnxRuntime backend for ROCMExecutionProvider (backend=onnxruntime, backend.device=cuda, backend.provider=ROCMExecutionProvider)
  • OnnxRuntime backend for TensorrtExecutionProvider (backend=onnxruntime, backend.device=cuda, backend.provider=TensorrtExecutionProvider); see the example after this list.
  • Intel Neural Compressor backend for CPU (backend=neural-compressor, backend.device=cpu)
  • TensorRT-LLM backend for CUDA (backend=tensorrt-llm, backend.device=cuda)
  • OpenVINO backend for CPU (backend=openvino, backend.device=cpu)
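
For example, a sketch of targeting OnnxRuntime's TensorRT execution provider from the CLI, combining the flags listed above:

optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend=onnxruntime backend.device=cuda backend.provider=TensorrtExecutionProvider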

Benchmarking 🏋️

  • Memory tracking (benchmark.memory=true)
  • Energy and efficiency tracking (benchmark.energy=true)
  • Latency and throughput tracking (benchmark.latency=true)
  • Warm up runs before inference (benchmark.warmup_runs=20)
  • Warm up steps during training (benchmark.warmup_steps=20)
  • Inputs shapes control (e.g. benchmark.input_shapes.sequence_length=128)
  • Dataset shapes control (e.g. benchmark.dataset_shapes.dataset_size=1000)
  • Prefill latency and decoding throughput, deduced from the generate and forward passes (auto-enabled for text-generation models).
  • Forward, Call and Generate pass kwargs control (e.g. benchmark.generate_kwargs.max_new_tokens=100 for an LLM, or benchmark.call_kwargs.num_images_per_prompt=4 for a diffusion model); see the example after this list.
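
For example, a sketch enabling several trackers and controlling input shapes in one run (the values are illustrative):

optimum-benchmark --config-dir examples/ --config-name pytorch_bert benchmark.memory=true benchmark.energy=true benchmark.warmup_runs=20 benchmark.input_shapes.sequence_length=128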

Backend features 🧰

  • "No weights" to benchmark models without downloading their weights (backend.no_weights=true)
  • OnnxRuntime Quantization and AutoQuantization (backend.quantization=true or backend.auto_quantization=avx2, etc.)
  • OnnxRuntime Calibration for Static Quantization (backend.quantization_config.is_static=true, etc.)
  • OnnxRuntime Optimization and AutoOptimization (backend.optimization=true or backend.auto_optimization=O4, etc.)
  • BitsAndBytes quantization scheme (backend.quantization_scheme=bnb, backend.quantization_config.load_in_4bit=true, etc.)
  • GPTQ quantization scheme (backend.quantization_scheme=gptq, backend.quantization_config.bits=4, etc.); see the example after this list.
  • PEFT training (backend.peft_strategy=lora, backend.peft_config.task_type=CAUSAL_LM, etc)
  • Transformers' Flash Attention V2 (backend.use_flash_attention_v2=true)
  • Optimum's BetterTransformer (backend.to_bettertransformer=true)
  • DeepSpeed-Inference support (backend.deepspeed_inference=true)
  • Dynamo/Inductor compiling (backend.torch_compile=true)
  • Automatic Mixed Precision (backend.amp_autocast=true)
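
For example, a sketch of benchmarking a GPTQ-quantized gpt2 on CUDA, combining flags from this list:

optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda backend.quantization_scheme=gptq backend.quantization_config.bits=4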

Contributing 🤝

Contributions are welcome, and we're happy to help you get started! Feel free to open an issue or a pull request. Things we'd like to see:

  • More backends (TensorFlow, TFLite, JAX, etc.).
  • More tests (for optimizations and quantization schemes).
  • More hardware support (Habana Gaudi Processor (HPU), etc).
  • Task evaluators for the most common tasks (would be great for output regression).
