yardstiq / quantum-benchmarks
benchmarking quantum circuit emulators for your daily research usage
License: Other
@atilag the single-gate benchmark is almost flat for Qiskit, but for the other frameworks it seems normal (at least it scales with the system size). Could you help me review whether there is anything wrong in the benchmark, or whether this is correct? I'm not sure the script is right, since the scaling is rather strange. This benchmark was run against the master branch at the commit SHA given in the README.
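For reference, here is what "normal" scaling looks like for a dense statevector simulator: a single gate must touch all 2^n amplitudes, so the time per gate should grow exponentially with qubit count. A minimal NumPy sketch of this baseline (my own illustration, not any framework's implementation):

```python
import time
import numpy as np

def apply_x(state, target, n):
    """Apply an X gate on `target` to a dense 2^n statevector by
    flipping the target axis of the reshaped amplitude tensor."""
    psi = state.reshape([2] * n)
    # np.flip returns a view; reshape(-1) materializes the result,
    # touching all 2^n amplitudes.
    return np.flip(psi, axis=target).reshape(-1)

for n in (10, 16, 22):
    state = np.zeros(2**n, dtype=np.complex128)
    state[0] = 1.0
    t0 = time.perf_counter()
    apply_x(state, 0, n)
    dt = time.perf_counter() - t0
    print(n, dt)  # time should grow roughly with 2^n
```

A flat curve on such a task usually means the simulator is skipping the full-statevector update, not that the hardware is magically fast.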
We need to add a full implementation of the quantum circuit Born machine that includes the AD time for PennyLane, Yao, and TensorFlow Quantum.
Just double-checking: you didn't benchmark the Forest simulator, right?
Hi. I've run the benchmark (forked from commit 2492b3a) with a recent Qiskit and Cirq 0.13.1. The multi-gate tests look valid, but on the single-gate tasks I see the attached picture. AFAIK, the non-exponential performance in Cirq's case means that it uses some "forbidden" optimisation, such as skipping unchanged qubits. Does this explanation sound right, and how can I find and disable this optimization?
Hi there,
It's excellent to see an initiative benchmarking the wide suite of available QC emulators!
However, it appears PyQuEST-cffi is mislabeled as "QuEST" in the plot legends.
PyQuEST-cffi is an independent project by HQS providing Python bindings for the C project QuEST, on which I myself work. These bindings add overhead on top of the underlying QuEST C functions, so their performance (especially with heavy iteration in Python) can be significantly worse than that of QuEST itself, which is not benchmarked here. Is it possible to correct these legends?
Note that I believe PyQuEST-cffi, like Yao, supports GPU in addition to CPU (since QuEST supports multithreading, GPU, and distribution).
Thanks very much,
Tyson
see #30
Hi,
I'm a contributor to qulacs.
First of all, thanks for adding our library qulacs to this nice benchmark project.
I've checked the qulacs benchmark script and confirmed that it is implemented in an efficient way.
On the other hand, I have the following two requests/questions about the benchmarks.
Though a previous version of our library was incompatible with the latest gcc, we now believe "pip install qulacs" works with all recent gcc versions (the fix has been merged into our SIMD code). Can I ask you to try it, and replace the build script for the forked repository with a PyPI package install ("qulacs==0.1.8" in requirements.txt)?
If you plan to do GPU benchmarking in the same project, qulacs-gpu==0.1.8 might be better; it enables both CPU and GPU simulation, but fails to build without CUDA.
As far as I know, Cirq, for example, performs simulation with complex64 by default (https://cirq.readthedocs.io/en/stable/generated/cirq.Simulator.html), whereas qulacs computes with complex128. Is there any policy about precision? I think the benchmarks should be run at the same precision if possible.
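On the precision point, the gap between the two defaults is easy to see with plain NumPy; the helper below is my own sketch, independent of either simulator:

```python
import numpy as np

def apply_rx(state, theta, dtype):
    """Apply an RX(theta) gate to a 1-qubit state in the given dtype."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.array([[c, -1j * s], [-1j * s, c]], dtype=dtype)
    return gate @ state.astype(dtype)

state = np.array([1.0, 0.0])
theta = 0.1234567
lo = apply_rx(state, theta, np.complex64)   # single precision (Cirq's default)
hi = apply_rx(state, theta, np.complex128)  # double precision (qulacs' default)
# The single-precision result deviates from double at roughly float32
# precision per gate, and the error accumulates over deep circuits.
print(np.abs(lo - hi).max())
```

Besides accuracy, complex64 halves the memory traffic per gate, which can make cross-precision timing comparisons misleading.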
Thanks,
It seems that the link embedding the quantum circuit Born machine picture in CONTRIBUTING.md is unfortunately dead. It might be a good idea to host the file in this repo, the way the benchmark plots are saved in /images.
Hi @hillmich
currently our benchmark machine fails to build jkq-ddsim due to the following error:
CMake Warning at /usr/share/cmake-3.10/Modules/FindBoost.cmake:567 (message):
Imported targets and dependency information not available for Boost version
(all versions older than 1.33)
Call Stack (most recent call first):
/usr/share/cmake-3.10/Modules/FindBoost.cmake:907 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.10/Modules/FindBoost.cmake:1558 (_Boost_MISSING_DEPENDENCIES)
apps/CMakeLists.txt:3 (find_package)
CMake Error at /usr/share/cmake-3.10/Modules/FindBoost.cmake:1947 (message):
Unable to find the requested Boost libraries.
Unable to find the Boost header files. Please set BOOST_ROOT to the root
directory containing Boost or BOOST_INCLUDEDIR to the directory containing
Boost's headers.
Call Stack (most recent call first):
apps/CMakeLists.txt:3 (find_package)
I'm wondering whether you have a Boost installation configured in your CMake setup. I don't think we want Boost in the default global benchmark environment, since it is quite a large dependency. If not, would you mind updating setup.sh to install Boost for jkq-ddsim?
NOTE: as you may already have noticed, we recently refactored the benchmark to make it more modular.
In CONTRIBUTING.md, the link https://quantumbfs.github.io/Yao.jl/latest/assets/figures/differentiable.png seems to be broken.
The benchmark data path is not handled correctly.
Hi,
I've noticed you pushed the benchmark results to the data/ directory. However, the bin/plot script does not generate the plots. Did I miss some preprocessing steps?
Best regards and stay safe,
Stefan
$ bin/plot
Traceback (most recent call last):
File "bin/plot", line 9, in <module>
labels=['X', 'H', 'T', 'CNOT', 'Toffoli']
File "/home/stefan/repos/quantum-benchmarks/bin/utils/plot_utils.py", line 89, in parse_data
gate_data[each_package] = wash_benchmark_data(each_package, labels)
File "/home/stefan/repos/quantum-benchmarks/bin/utils/plot_utils.py", line 44, in wash_benchmark_data
with open(find_json(name)) as f:
File "/home/stefan/repos/quantum-benchmarks/bin/utils/plot_utils.py", line 34, in find_json
for each in os.listdir(benchmark_path):
NotADirectoryError: [Errno 20] Not a directory: '/home/stefan/repos/quantum-benchmarks/data/yao.csv'
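A likely fix, assuming find_json scans the data/ path with os.listdir as the traceback shows: filter out plain files such as data/yao.csv before descending. A sketch (find_json_dirs is a hypothetical helper name, not the repo's actual function):

```python
import os
import tempfile

def find_json_dirs(benchmark_path):
    """Yield only subdirectories of the data path, skipping stray files
    like data/yao.csv that trip an os.listdir-based scan."""
    for each in os.listdir(benchmark_path):
        full = os.path.join(benchmark_path, each)
        if os.path.isdir(full):
            yield full

# Demo with a throwaway layout: data/yao.csv (file) + data/qiskit/ (dir)
with tempfile.TemporaryDirectory() as data:
    open(os.path.join(data, 'yao.csv'), 'w').close()
    os.mkdir(os.path.join(data, 'qiskit'))
    dirs = [os.path.basename(d) for d in find_json_dirs(data)]

print(dirs)  # only the directory survives the filter
```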
Let's add benchmarks for TensorFlow Quantum.
Hello @Roger-luo, I am a developer of Qiskit Aer and was recently shown your rather nice benchmark repo. I have some suggestions for how the Qiskit benchmarks could be improved, since I feel they are under-representing the simulator.
Suggestions:
When you transpile the circuit in qiskit you need to include the backend so that it compiles to the native basis gates of the simulator, otherwise it will unroll all single-qubit gates to u3 gates.
You shouldn't be using the statevector simulator for benchmarks; rather, you should use the qasm_simulator. The statevector simulator has a lot of overhead in serializing the statevector via JSON, whereas the qasm simulator does not (you can still ask for a snapshot of the statevector in the qasm simulator). This overhead has been improved somewhat in our next release by replacing JSON with Pybind11, but it still under-represents the simulator if you are interested in timing how fast it applies gates.
The qasm simulator has numerous options for method and parallelization that you may want to configure explicitly, e.g.:
How you report the time taken depends on what you are trying to benchmark. Aer includes a lot of overhead in its result data output, so if you are trying to profile the time of a single gate, you can get a more accurate measure by excluding the result serialization if desired. The different ways of timing include:

- the Python wall-clock time around backend.run;
- the total time reported on the Python result object (Result.time_taken), which excludes the time spent initializing and validating the Python result object from the output Python dict of the simulator;
- the C++-measured total time (Result.metadata['time_taken']), which excludes the C++ -> Py result conversion overhead;
- the C++-measured experiment time (Result.results[0].time_taken), which excludes any overhead for validation and configuration settings in the C++ simulator, and any Py -> C++ conversion.

Depending on what you are trying to show in the benchmarks, different timings matter more. I would argue that for the gate-level benchmarks you should show the C++ times, but for circuit-level benchmarks that include results you would actually use, I would show the Python time.
If you like, I could put in a PR to this repo to make some of the suggested changes, but below I've included a code snippet applying these suggestions to a manual implementation of your X-gate benchmark:
import time

import matplotlib.pyplot as plt
import numpy as np
from qiskit import *


def native_execute(circuit, backend, backend_options):
    experiment = transpile(circuit, backend)  # Transpile to the simulator's basis gates
    qobj = assemble(experiment, shots=1)      # Set execution shots to 1
    start = time.time()
    result = backend.run(qobj, backend_options=backend_options).result()
    stop = time.time()
    time_py_full = stop - start                    # Total execution time measured in Python
    time_py_run = result.time_taken                # Excludes init/validation of the Python result object
    time_cpp_full = result.metadata['time_taken']  # C++-measured total time, excludes C++ -> Py conversion
    time_cpp_expr = result.results[0].time_taken   # C++-measured time of the single circuit (state init, gates)
    return time_py_full, time_py_run, time_cpp_full, time_cpp_expr


def benchmark_x(qubit_range, samples, backend_options=None):
    backend = Aer.get_backend('qasm_simulator')
    ts_py_full = np.zeros(len(qubit_range))
    ts_py_run = np.zeros(len(qubit_range))
    ts_cpp_full = np.zeros(len(qubit_range))
    ts_cpp_exp = np.zeros(len(qubit_range))
    for i, nq in enumerate(qubit_range):
        qc = QuantumCircuit(nq)
        qc.x(0)
        t_py_full = 0
        t_py_run = 0
        t_cpp_full = 0
        t_cpp_exp = 0
        for _ in range(samples):
            t0, t1, t2, t3 = native_execute(qc, backend, backend_options)
            t_py_full += t0
            t_py_run += t1
            t_cpp_full += t2
            t_cpp_exp += t3
        # Average time in ns
        ts_py_full[i] = 1e9 * t_py_full / samples
        ts_py_run[i] = 1e9 * t_py_run / samples
        ts_cpp_full[i] = 1e9 * t_cpp_full / samples
        ts_cpp_exp[i] = 1e9 * t_cpp_exp / samples
    return ts_py_full, ts_py_run, ts_cpp_full, ts_cpp_exp


# Benchmark: X gate on qubit-0
backend_options = {
    # Force the statevector method so the stabilizer (Clifford) simulator isn't used
    "method": "statevector",
    # Disable parallelization
    "max_parallel_threads": 1,
    # Stop the simulator truncating to 1-qubit circuit simulations
    "truncate_enable": False,
}

nqs = list(range(5, 26))
ts_py_full1, ts_py_run1, ts_cpp_full1, ts_cpp_expr1 = benchmark_x(nqs, 1000, backend_options)

plt.semilogy(nqs, ts_py_full1, 'o-', label='Python (full)')
plt.semilogy(nqs, ts_py_run1, 's-', label='Python (run-only)')
plt.semilogy(nqs, ts_cpp_full1, '^-', label='C++ (full)')
plt.semilogy(nqs, ts_cpp_expr1, 'd-', label='C++ (experiment-only)')
plt.legend()
plt.grid()
plt.savefig('aer_x_qasm_sv.pdf')
Here is an example of running the above on my laptop:
Currently we plot everything into a single picture, since this project was mainly used for our Yao.jl paper. But given that the benchmarks are growing dense, a clearer visualization will be necessary. Ideally, an interactive web page would be great.
I already have a fix, but the BM_sim_QCBM test currently uses Qrack's simulator random number generator for the random rotation angles. This happens to be enough RNG demand to empty the on-chip entropy pool. I'm converting this to use a pseudo-RNG, like the QuEST suite, and I'll add it in #53.
Thanks for your previous contribution getting the CPU benchmark correct. However, I'd like to check the following result. We recently went through the entire benchmark again, since the result is very strange: the timing barely scales with the number of qubits at all. I suspect that when running the qiskit-gpu benchmark, cudaDeviceSynchronize is not called; even for 30 qubits, the timing is only 5 ms.
On the other side, I didn't find any call to cudaDeviceSynchronize in the source code either: https://github.com/Qiskit/qiskit-aer/blob/master/src/simulators/statevector/qubitvector_thrust.hpp
I feel it is unlikely that Thrust does the sync implicitly, but I could be wrong.
Moreover, the timing differs by about 100x from what qulacs and Yao report. Since qulacs and Yao are implemented independently and their benchmark results match each other, I believe this problem could exist, but I'd like your help to confirm it.
FYI: even summing over a vector of size 2^30 in complex<float64> requires 24 ms.
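The pitfall described here is generic to asynchronous execution: if you stop the clock after launching work but before synchronizing, you measure launch latency, not the work itself. A stdlib-only sketch of the effect (no CUDA involved, purely illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def heavy_work():
    # Stand-in for a GPU kernel: a noticeable chunk of CPU work.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

pool = ThreadPoolExecutor(max_workers=1)

# Wrong: stop the clock right after the asynchronous launch.
start = time.perf_counter()
future = pool.submit(heavy_work)  # returns immediately, like a kernel launch
t_launch = time.perf_counter() - start
_ = future.result()               # drain the first task before the next measurement

# Right: "synchronize" (wait for completion) before stopping the clock.
start = time.perf_counter()
future2 = pool.submit(heavy_work)
future2.result()                  # analogous to cudaDeviceSynchronize
t_synced = time.perf_counter() - start

pool.shutdown(wait=True)
print(f"launch-only: {t_launch:.6f}s, synchronized: {t_synced:.6f}s")
```

The launch-only number is a tiny fraction of the synchronized one, which is how an asynchronous 30-qubit GPU benchmark can report an implausible 5 ms.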
Thanks a lot.
Roger
Hi @Codewithsk, I'm not sure whether you've been busy recently, but do you know if we can trigger the runners from GitHub now (like what you demonstrated before)? It would be nice to run that script instead of the old ones, since the old ones don't seem to work anymore.
cc: @antalszava to keep you updated.
Sorry, I can't find the original issue that I thought raised the question, but are Qrack or PyQrack ever going to be added to these benchmarks?
Speaking as one of the authors, I'm sorry to press the issue, but it's unequivocal that Qrack performance is constant on single X, H, T, and CNOT gates at any arbitrary qubit width, due primarily to optimization via proactive and reactive Schmidt decomposition, with very loose inspiration taken from "Pareto-Efficient Quantum Circuit Simulation Using Tensor Contraction Deferral". Redundantly, Qrack's extended stabilizer simulation capabilities cover these cases as well, but I think the comparison would be fair, because these and other optimizations in the default optimal Qrack "layer stack" are exact and generally universal, without limitation to the Clifford group, for example. (Also, this is a case of performance on these benchmarks specifically, not something as unlikely as general constant or linear performance, which is why Qrack has historically presented representatively harder general cases in our own benchmarks.)
I understand that a stabilizer simulator would break these trends without a universal gate set, but that is not the case for these benchmarks run in Qrack, which can perform this way while admitting an exactly simulated universal gate set in the same case. Is the motivation for leaving Qrack benchmarks out the assumption that its default optimal settings wouldn't be universal, as with Clifford? (To be clear, they are universal!)
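To illustrate why constant-time single-qubit gates are plausible rather than "forbidden": if a simulator keeps an unentangled register factored into per-qubit states (the degenerate case of Schmidt decomposition), a single-qubit gate is one 2x2 matrix-vector product regardless of register width. A toy sketch of the idea (my own illustration, not Qrack's actual data structure):

```python
import numpy as np

class ProductState:
    """Toy register stored as a product of 1-qubit states.
    Valid only while no entangling gate has been applied."""
    def __init__(self, n):
        self.factors = [np.array([1.0, 0.0], dtype=complex) for _ in range(n)]

    def apply_1q(self, gate, q):
        # O(1) in the number of qubits: touches a single 2-vector.
        self.factors[q] = gate @ self.factors[q]

    def amplitude(self, bits):
        # Amplitude of a basis state = product of per-qubit amplitudes.
        a = 1.0 + 0j
        for f, b in zip(self.factors, bits):
            a *= f[b]
        return a

X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

reg = ProductState(30)  # 30 "qubits" with no 2^30 vector anywhere
reg.apply_1q(X, 0)
reg.apply_1q(H, 1)
print(abs(reg.amplitude([1, 0] + [0] * 28)) ** 2)  # ~0.5
```

A real simulator additionally has to detect when entangling gates force factors to merge, and when measurement or structure lets them split again; the toy above skips all of that.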
I tried to build the Qrack experiment with sudo ./bin/benchmark setup qrack and it failed.
I opened this issue because I couldn't find any export.h in this project or in the Qrack project https://github.com/vm6502q/qrack.
[ 98%] Built target benchmarks
[100%] Linking CXX executable unittest
[100%] Built target unittest
Install the project...
-- Install configuration: ""
-- Installing: /usr/local/include/qrack/common/config.h
...
-- Installing: /usr/local/share/pkgconfig/qrack.pc
-- Installing: /usr/local/bin/qrack_cl_precompile
In file included from benchmarks.cc:1:
benchmark/include/benchmark/benchmark.h:190:10: fatal error: benchmark/export.h: No such file or directory
#include "benchmark/export.h"
^~~~~~~~~~~~~~~~~~~~
compilation terminated.
I guess this issue should be easy to fix. I'll tag @WrathfulSpatula since this issue is related to Qrack.
Is that because of the data structure of DDsim?