ghtm2 commented on September 14, 2024

For anyone who might stumble upon this:
Targeting an x86-64 microarchitecture level (e.g. -march=x86-64-v3) can squeeze out some more performance, depending on the compression level and the hardware's capabilities.
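
Whether a v3 or v4 build is usable at all depends on the CPU. As a sketch, assuming Linux, the script below reads /proc/cpuinfo and reports the highest level whose features are all present. The per-level feature lists follow the x86-64 psABI, translated to Linux flag names; "xsave" stands in for OSXSAVE, which Linux does not list, so treat the result as an approximation.

```python
"""Report the highest x86-64 microarchitecture level this Linux CPU supports.

Sketch: flag names are the Linux /proc/cpuinfo spellings of the psABI
feature lists. Linux does not show OSXSAVE, so "xsave" stands in for it.
"""

# Features added at each level ("pni" = SSE3, "abm" covers LZCNT).
LEVEL_FLAGS = {
    1: {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"},
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}
# -march spellings; the baseline level is plain "x86-64".
LEVEL_NAMES = {1: "x86-64", 2: "x86-64-v2", 3: "x86-64-v3", 4: "x86-64-v4"}


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def x86_64_level(flags: set[str]) -> int:
    """Highest N such that levels 1..N are all fully present (0 if none)."""
    level = 0
    for n in sorted(LEVEL_FLAGS):
        if not LEVEL_FLAGS[n] <= flags:
            break
        level = n
    return level


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            level = x86_64_level(parse_flags(f.read()))
        print(LEVEL_NAMES.get(level, "below the x86-64 baseline"))
    except OSError:
        print("/proc/cpuinfo not available (not Linux?)")
```

For example, Zen 3 CPUs lack AVX-512, so they should report x86-64-v3 at most, while a Zen 4 CPU should report x86-64-v4. On glibc 2.33 and later, running the dynamic loader with --help (e.g. /lib64/ld-linux-x86-64.so.2 --help) also lists which levels it considers supported.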

Benchmark 1 = plain build
Benchmark 2 = the binary linked above
Benchmark 3 = LTO + PGO build targeting x86-64-v3
Benchmark 4 = LTO + PGO build targeting x86-64-v4

Benchmark 1: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      3.139 s ±  0.014 s    [User: 3.102 s, System: 0.031 s]
  Range (min … max):    3.120 s …  3.159 s    5 runs
 
Benchmark 2: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      2.911 s ±  0.013 s    [User: 2.878 s, System: 0.026 s]
  Range (min … max):    2.895 s …  2.926 s    5 runs
 
Benchmark 3: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      2.865 s ±  0.005 s    [User: 2.828 s, System: 0.030 s]
  Range (min … max):    2.858 s …  2.871 s    5 runs
 
Benchmark 4: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      2.880 s ±  0.002 s    [User: 2.843 s, System: 0.030 s]
  Range (min … max):    2.878 s …  2.882 s    5 runs
 
Summary
  mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 /tmp/tst; rm -rf /tmp/tst ran
    1.01 ± 0.00 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 /tmp/tst; rm -rf /tmp/tst
    1.02 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert /tmp/tst; rm -rf /tmp/tst
    1.10 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect /tmp/tst; rm -rf /tmp/tst
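
The "times faster" figures in hyperfine's summary can be recomputed from the mean ± σ lines. The sketch below assumes hyperfine divides each mean by the fastest mean and propagates both relative standard deviations to first order, which reproduces the ratios above. Because the inputs are the already-rounded displayed values, the ± can differ in the last digit (here the ect_clevert line comes out as ± 0.00 rather than ± 0.01).

```python
"""Recompute hyperfine-style "N ± E times faster" figures from mean ± σ.

Assumes first-order error propagation for the ratio of two independent means.
"""
import math


def relative_speed(mean, stddev, fastest_mean, fastest_stddev):
    """Return (ratio, ratio_stddev) of a slower run relative to the fastest one."""
    ratio = mean / fastest_mean
    ratio_stddev = ratio * math.sqrt(
        (stddev / mean) ** 2 + (fastest_stddev / fastest_mean) ** 2
    )
    return ratio, ratio_stddev


# Mean and σ in seconds, copied from the default-settings run above.
RUNS = {
    "ect": (3.139, 0.014),
    "ect_clevert": (2.911, 0.013),
    "ect_v3": (2.865, 0.005),
    "ect_v4": (2.880, 0.002),
}

if __name__ == "__main__":
    fastest = min(RUNS, key=lambda name: RUNS[name][0])
    f_mean, f_std = RUNS[fastest]
    print(f"{fastest} ran")
    for name, (mean, std) in sorted(RUNS.items(), key=lambda kv: kv[1][0]):
        if name == fastest:
            continue
        r, e = relative_speed(mean, std, f_mean, f_std)
        print(f"  {r:.2f} ± {e:.2f} times faster than {name}")
```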

At default settings the difference is negligible; if that is all you use, don't bother.

Benchmark 1: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -5 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      6.439 s ±  0.096 s    [User: 6.389 s, System: 0.037 s]
  Range (min … max):    6.334 s …  6.548 s    5 runs
 
Benchmark 2: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -5 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      5.213 s ±  0.017 s    [User: 5.164 s, System: 0.037 s]
  Range (min … max):    5.193 s …  5.230 s    5 runs
 
Benchmark 3: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -5 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      4.358 s ±  0.016 s    [User: 4.307 s, System: 0.040 s]
  Range (min … max):    4.340 s …  4.379 s    5 runs
 
Benchmark 4: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -5 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):      4.258 s ±  0.010 s    [User: 4.208 s, System: 0.040 s]
  Range (min … max):    4.251 s …  4.276 s    5 runs
 
Summary
  mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -5 /tmp/tst; rm -rf /tmp/tst ran
    1.02 ± 0.00 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -5 /tmp/tst; rm -rf /tmp/tst
    1.22 ± 0.00 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -5 /tmp/tst; rm -rf /tmp/tst
    1.51 ± 0.02 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -5 /tmp/tst; rm -rf /tmp/tst

Here, targeting a microarchitecture level gains almost as much as adding PGO did.

Benchmark 1: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -9 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):     65.767 s ±  0.164 s    [User: 65.578 s, System: 0.052 s]
  Range (min … max):   65.602 s … 66.035 s    5 runs
 
Benchmark 2: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -9 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):     43.676 s ±  0.030 s    [User: 43.521 s, System: 0.052 s]
  Range (min … max):   43.637 s … 43.711 s    5 runs
 
Benchmark 3: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -9 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):     27.658 s ±  0.162 s    [User: 27.531 s, System: 0.056 s]
  Range (min … max):   27.488 s … 27.927 s    5 runs
 
Benchmark 4: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -9 /tmp/tst; rm -rf /tmp/tst
  Time (mean ± σ):     25.154 s ±  0.079 s    [User: 25.034 s, System: 0.054 s]
  Range (min … max):   25.058 s … 25.256 s    5 runs
 
Summary
  mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -9 /tmp/tst; rm -rf /tmp/tst ran
    1.10 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -9 /tmp/tst; rm -rf /tmp/tst
    1.74 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -9 /tmp/tst; rm -rf /tmp/tst
    2.61 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -9 /tmp/tst; rm -rf /tmp/tst

Quite a bump: compared with the linked binary it shaves off at least 16 seconds, and compared with the plain build it more than halves the time.

from efficient-compression-tool.

ghtm2 commented on September 14, 2024

> @ghtm2 Could you provide your binary? Back then I tested 256-bit AVX and AVX-512 builds, but they ran even slower on my machine (AMD R5 5600U, Zen 3). If enabling AVX makes it faster, that's quite a big bump! Also, which CPU did you use for your benchmark?

Sure, here are the v3 and v4 binaries: ect.tar.gz
You'll need at least glibc 2.38 installed though.
The CPU used is an AMD Ryzen 7 7840U (Zen 4).
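
Since the prebuilt binaries need glibc 2.38 or newer, it is worth checking the installed version first; `ldd --version` prints it. A minimal Python sketch, assuming Linux with glibc, where os.confstr("CS_GNU_LIBC_VERSION") returns a string such as "glibc 2.38":

```python
"""Check whether the running system's glibc is at least a given version.

Sketch for Linux/glibc only; on other platforms os.confstr or the
CS_GNU_LIBC_VERSION name may be unavailable.
"""
import os


def parse_glibc_version(text: str) -> tuple[int, ...]:
    """Turn 'glibc 2.38' (or just '2.38') into (2, 38)."""
    version = text.split()[-1]
    return tuple(int(part) for part in version.split("."))


def glibc_at_least(required: str, installed: str) -> bool:
    """True if the installed version is >= the required one."""
    return parse_glibc_version(installed) >= parse_glibc_version(required)


if __name__ == "__main__":
    try:
        installed = os.confstr("CS_GNU_LIBC_VERSION")
    except (ValueError, OSError, AttributeError):
        installed = None
    if not installed:
        print("glibc version not available on this system")
    else:
        ok = glibc_at_least("2.38", installed)
        print(f"{installed}: {'new enough' if ok else 'too old'} for glibc 2.38 binaries")
```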

> @ghtm2 Hi, did you have nasm installed while building the binary?

Yes.

ghtm2 commented on September 14, 2024

Sorry for the glacial response times; I'm quite busy at the moment.

Yes, I built it with GCC 14.2.1, as that is what Arch currently ships.
I can also confirm that Clang produces noticeably slower ect binaries, regardless of the flags.

I've put together a small how-to for reproducing the build on Arch and its derivatives: howto.tar.gz

I'm pretty sure there is still some performance to be had with the right flags and better training input for PGO.
One might also try optimizing further with BOLT, but I currently don't have the time to try.
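
The attached how-to is not reproduced here. As a rough idea of what a GCC build with LTO, PGO and a microarchitecture level involves, the sketch below only prints a command sequence (a dry run): configure and build with -fprofile-generate, run representative workloads, then reconfigure the same build directory with -fprofile-use and rebuild. The source and build directory names, the location of the built ect binary and the training runs are all hypothetical assumptions, not taken from the how-to.

```python
"""Print a two-phase GCC PGO + LTO build plan for ECT (dry run, nothing executed).

Hypothetical sketch: directory names, the ect binary path and the training
runs are assumptions, not the attached howto.
"""
import shlex

SRC_DIR = "src"      # hypothetical: directory containing ECT's CMakeLists.txt
BUILD_DIR = "build"  # reused by both phases so .gcda files line up with objects


def configure(src: str, build: str, flags: str) -> list[str]:
    """CMake configure step; CMAKE_<LANG>_FLAGS also reach the link line."""
    return [
        "cmake", "-S", src, "-B", build,
        "-DCMAKE_BUILD_TYPE=Release",
        "-DCMAKE_C_COMPILER=gcc",
        "-DCMAKE_CXX_COMPILER=g++",
        f"-DCMAKE_C_FLAGS={flags}",
        f"-DCMAKE_CXX_FLAGS={flags}",
    ]


def pgo_plan(src: str, build: str, march: str,
             training: list[list[str]]) -> list[list[str]]:
    """Instrumented build, training runs, then a profile-guided rebuild."""
    # Fat LTO objects also carry regular code, so archives made by plain `ar`
    # still get a usable symbol index even without the LTO plugin.
    base = f"-march={march} -flto=auto -ffat-lto-objects"
    build_cmd = ["cmake", "--build", build, "--parallel"]
    return [
        # Phase 1: the instrumented binary writes .gcda profiles when it runs.
        configure(src, build, f"{base} -fprofile-generate"),
        build_cmd,
        # Phase 2: representative workloads.
        *training,
        # Phase 3: changed flags make CMake recompile every object using the
        # profiles; -fprofile-correction smooths counters from threaded runs.
        configure(src, build, f"{base} -fprofile-use -fprofile-correction"),
        build_cmd,
    ]


if __name__ == "__main__":
    # ECT rewrites files in place, so train on a scratch copy of sample images.
    training = [[f"{BUILD_DIR}/ect", level, "/tmp/pgo-train"] for level in ("-5", "-9")]
    for argv in pgo_plan(SRC_DIR, BUILD_DIR, "x86-64-v3", training):
        print(shlex.join(argv))
```

Reusing one build directory is a deliberate choice: GCC looks for each object's profile under a name derived from that object's output path, so building the optimized phase elsewhere would likely leave the profiles unused.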

kkocdko commented on September 14, 2024

@ghtm2 Could you provide your binary? Back then I tested 256-bit AVX and AVX-512 builds, but they ran even slower on my machine (AMD R5 5600U, Zen 3). If enabling AVX makes it faster, that's quite a big bump! Also, which CPU did you use for your benchmark?

kkocdko commented on September 14, 2024

@ghtm2 Hi, did you have nasm installed while building the binary?

kkocdko commented on September 14, 2024

@ghtm2 Awesome! Your binary is much faster; how did you do that? I appended -march=x86-64-v3 -mavx2 here, but it's even slower: my benchmark went from 48 s to 1 min 27 s, while your ect_v3 binary takes 26 s.

if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU" OR CMAKE_CXX_COMPILER_ID STREQUAL "Clang"
    OR CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang" OR CMAKE_CXX_COMPILER_ID STREQUAL "ARMClang")
  if(CPU_TYPE STREQUAL "x86_64" OR CPU_TYPE STREQUAL "i386")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mpclmul -msse4.2")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mpclmul -msse4.2")
  endif()
endif()

Here is my whole build script. I built with LLVM 19; did you use GCC?

https://github.com/clevert-app/clevert/blob/main/.github/workflows/asset_zcodecs.yml#L171

I really, really want to replicate your success.

kkocdko commented on September 14, 2024

I ran objdump on your binary: was it built with GCC 14.2.1?
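
With binutils, `readelf -p .comment ect` (or `objdump -s -j .comment ect`) prints the ELF .comment section, where GCC and Clang typically leave version strings such as "GCC: (GNU) 14.2.1 ..." or "clang version ...". The sketch below is a cruder, dependency-free heuristic that scans the whole file for NUL-terminated strings containing those markers instead of parsing ELF sections. Caveat: the C runtime startup objects on a glibc system are usually GCC-built, so a Clang-built binary can show both kinds of string; a "clang version" entry is the more telling one.

```python
"""Heuristically list compiler identification strings found in a binary.

Sketch: scans the raw file for NUL-terminated strings containing the markers
GCC and Clang typically leave in the ELF .comment section. It does not parse
ELF headers; readelf -p .comment shows the exact section.
"""
import sys

MARKERS = (b"GCC: (", b"clang version ")


def compiler_strings(data: bytes) -> list[str]:
    """Distinct marker-bearing strings, in the order they first appear."""
    found: list[str] = []
    for chunk in data.split(b"\x00"):
        for marker in MARKERS:
            pos = chunk.find(marker)
            if pos != -1:
                # Start at the marker; anything before it (unrelated bytes or
                # a vendor prefix such as a distro name) is dropped.
                text = chunk[pos:].decode("ascii", errors="replace")
                if text not in found:
                    found.append(text)
                break
    return found


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} BINARY")
    else:
        with open(sys.argv[1], "rb") as f:
            for line in compiler_strings(f.read()):
                print(line)
```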

kkocdko commented on September 14, 2024

I reproduced your benchmark. It's faster when built with GCC instead of Clang. I will try to tweak it more. Thank you!
