Comments (8)
For anyone who might stumble upon this:
The addition of an x86-64 microarchitecture level can squeeze out some more performance, depending on the compression level and hardware capabilities.
Benchmark 1 = plain build
Benchmark 2 = the binary linked above
Benchmark 3 = build with LTO, PGO and x86-64-v3
Benchmark 4 = build with LTO, PGO and x86-64-v4
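Since the v3 and v4 builds only help (or even run) on hardware that supports those levels, it can be worth checking the CPU first. A minimal sketch, assuming Linux and the flag names printed in /proc/cpuinfo; the helper names `has_all` and `x86_64_level` are made up for illustration, and the baseline level 1 is assumed rather than checked:

```shell
#!/bin/sh
# Report the highest x86-64 microarchitecture level implied by a list of
# CPU flags, using the flag names Linux prints in /proc/cpuinfo.
# Level 1 (the x86-64 baseline) is assumed, not verified.

has_all() {
    # $1: space-separated flags the CPU reports; remaining args: required flags.
    cpu_flags=" $1 "
    shift
    for flag in "$@"; do
        case "$cpu_flags" in
            *" $flag "*) ;;      # exact token match, so "avx" does not match "avx2"
            *) return 1 ;;
        esac
    done
    return 0
}

x86_64_level() {
    # $1: the "flags" value from /proc/cpuinfo; prints 1, 2, 3 or 4.
    has_all "$1" cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3 || { echo 1; return; }
    has_all "$1" avx avx2 bmi1 bmi2 f16c fma abm movbe xsave || { echo 2; return; }
    has_all "$1" avx512f avx512bw avx512cd avx512dq avx512vl || { echo 3; return; }
    echo 4
}

# Usage on Linux (reads the first "flags" line):
#   x86_64_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

A result of 3 means the x86-64-v3 build should run but the x86-64-v4 build will most likely die with an illegal instruction.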
Benchmark 1: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 3.139 s ± 0.014 s [User: 3.102 s, System: 0.031 s]
Range (min … max): 3.120 s … 3.159 s 5 runs
Benchmark 2: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 2.911 s ± 0.013 s [User: 2.878 s, System: 0.026 s]
Range (min … max): 2.895 s … 2.926 s 5 runs
Benchmark 3: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 2.865 s ± 0.005 s [User: 2.828 s, System: 0.030 s]
Range (min … max): 2.858 s … 2.871 s 5 runs
Benchmark 4: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 2.880 s ± 0.002 s [User: 2.843 s, System: 0.030 s]
Range (min … max): 2.878 s … 2.882 s 5 runs
Summary
mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 /tmp/tst; rm -rf /tmp/tst ran
1.01 ± 0.00 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 /tmp/tst; rm -rf /tmp/tst
1.02 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert /tmp/tst; rm -rf /tmp/tst
1.10 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect /tmp/tst; rm -rf /tmp/tst
At default settings the difference is negligible; if that is all you use, don't bother.
Benchmark 1: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -5 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 6.439 s ± 0.096 s [User: 6.389 s, System: 0.037 s]
Range (min … max): 6.334 s … 6.548 s 5 runs
Benchmark 2: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -5 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 5.213 s ± 0.017 s [User: 5.164 s, System: 0.037 s]
Range (min … max): 5.193 s … 5.230 s 5 runs
Benchmark 3: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -5 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 4.358 s ± 0.016 s [User: 4.307 s, System: 0.040 s]
Range (min … max): 4.340 s … 4.379 s 5 runs
Benchmark 4: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -5 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 4.258 s ± 0.010 s [User: 4.208 s, System: 0.040 s]
Range (min … max): 4.251 s … 4.276 s 5 runs
Summary
mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -5 /tmp/tst; rm -rf /tmp/tst ran
1.02 ± 0.00 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -5 /tmp/tst; rm -rf /tmp/tst
1.22 ± 0.00 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -5 /tmp/tst; rm -rf /tmp/tst
1.51 ± 0.02 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -5 /tmp/tst; rm -rf /tmp/tst
This does almost as much as adding PGO did.
Benchmark 1: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -9 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 65.767 s ± 0.164 s [User: 65.578 s, System: 0.052 s]
Range (min … max): 65.602 s … 66.035 s 5 runs
Benchmark 2: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -9 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 43.676 s ± 0.030 s [User: 43.521 s, System: 0.052 s]
Range (min … max): 43.637 s … 43.711 s 5 runs
Benchmark 3: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -9 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 27.658 s ± 0.162 s [User: 27.531 s, System: 0.056 s]
Range (min … max): 27.488 s … 27.927 s 5 runs
Benchmark 4: mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -9 /tmp/tst; rm -rf /tmp/tst
Time (mean ± σ): 25.154 s ± 0.079 s [User: 25.034 s, System: 0.054 s]
Range (min … max): 25.058 s … 25.256 s 5 runs
Summary
mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v4 -9 /tmp/tst; rm -rf /tmp/tst ran
1.10 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_v3 -9 /tmp/tst; rm -rf /tmp/tst
1.74 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect_clevert -9 /tmp/tst; rm -rf /tmp/tst
2.61 ± 0.01 times faster than mkdir -p /tmp/tst; cp *.png /tmp/tst/; ./ect -9 /tmp/tst; rm -rf /tmp/tst
Quite a bump: it shaves off at least 16 seconds compared to the linked binary and more than halves the time compared to the plain build.
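The ratios and the time saved can be recomputed from the mean times reported above. A small sketch using awk for the arithmetic; the function names `speedup` and `saved` are just for illustration:

```shell
#!/bin/sh
# Recompute speedup and absolute savings from two mean run times.

speedup() {
    # $1: slower mean time in seconds, $2: faster mean time in seconds.
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

saved() {
    # Seconds saved by the faster build.
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.3f\n", slow - fast }'
}

speedup 65.767 25.154   # plain vs. x86-64-v4 at -9, prints 2.61
saved 43.676 27.658     # linked binary vs. x86-64-v3 at -9, prints 16.018
```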
from efficient-compression-tool.
@ghtm2 Could you provide your binary? The other day I tested the avx256 and avx512 builds, but they ran even slower on my machine (AMD R5 5600U, Zen 3). If enabling AVX makes it faster, that's quite a big bump! Also, which CPU did you use for your benchmark?
Sure, here are the v3 and v4 binaries: ect.tar.gz
You'll need at least glibc 2.38 installed though.
The CPU used is an AMD Ryzen 7 7840U, so Zen 4.
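Whether a system meets the glibc 2.38 requirement can be checked before trying the binaries. A rough sketch, assuming a glibc-based Linux system where `getconf GNU_LIBC_VERSION` works and a `sort` that supports `-V` (GNU coreutils); the function names are hypothetical:

```shell
#!/bin/sh
# Check whether the installed glibc is at least a required version.

version_ge() {
    # True when $1 >= $2, comparing dotted versions with sort -V.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

glibc_version() {
    # getconf prints e.g. "glibc 2.38"; keep only the version number.
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{ print $2 }'
}

check_glibc() {
    # $1: required version; $2 (optional): version to test instead of the system's.
    required=$1
    found=${2:-$(glibc_version)}
    if [ -n "$found" ] && version_ge "$found" "$required"; then
        echo "glibc $found is new enough (need >= $required)"
    else
        echo "glibc ${found:-unknown} is too old or not glibc (need >= $required)"
        return 1
    fi
}

# Usage:
#   check_glibc 2.38
```

The version comparison is deliberately not a plain string or float comparison, since 2.9 is older than 2.38.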
@ghtm2 Hi, did you have nasm installed while building the binary?
Yes.
Sorry for the glacial response times, I'm quite busy at the moment.
Yes, I've built it with GCC 14.2.1, as that is what's currently shipped on Arch.
I can also confirm that Clang produces noticeably slower ect binaries, no matter the flags.
I've made a small howto to reproduce the build for arch and derivatives: howto.tar.gz
I'm pretty sure that there is still some performance to be had with the appropriate flags and better input for PGO.
One might also want to try optimizing further with BOLT, but I currently don't have the time to try.
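For readers without the how-to archive at hand, the general shape of an LTO + PGO build with GCC looks roughly like the following. This is only a sketch of the generic two-stage technique, not the contents of the how-to: the helper `pgo_flags`, the build directory name, the binary location and the training set are all placeholders, and the workflow in the comments is untested against ECT.

```shell
#!/bin/sh
# Hypothetical helper: print GCC flags for one stage of an LTO + PGO build.
# $1: stage ("generate" or "use"), $2: -march value such as x86-64-v3.

pgo_flags() {
    common="-flto=auto -march=$2"
    case "$1" in
        generate) echo "$common -fprofile-generate" ;;
        use)      echo "$common -fprofile-use -fprofile-correction" ;;
        *)        echo "unknown stage: $1" >&2; return 1 ;;
    esac
}

# Possible workflow (placeholders throughout):
#   flags=$(pgo_flags generate x86-64-v3)
#   cmake -S src -B build-pgo -DCMAKE_BUILD_TYPE=Release \
#         -DCMAKE_C_FLAGS="$flags" -DCMAKE_CXX_FLAGS="$flags"
#   cmake --build build-pgo
#   ./build-pgo/ect -9 training-pngs/   # representative input, writes profile data
#   flags=$(pgo_flags use x86-64-v3)
#   cmake -S src -B build-pgo -DCMAKE_C_FLAGS="$flags" -DCMAKE_CXX_FLAGS="$flags"
#   cmake --build build-pgo
```

Reusing the same build directory for both stages keeps the object paths identical, which is what lets GCC find the recorded profiles in the second stage. The quality of the result depends heavily on how representative the training input is.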
@ghtm2 Awesome! Your binary is much faster. How did you do that? I appended -march=x86-64-v3 -mavx2 here, but the result is even slower: my benchmark went from 48s to 1m27s, while your ect_v3 binary takes 26s.
Efficient-Compression-Tool/src/CMakeLists.txt
Lines 110 to 114 in 9aabc23
And here is my whole build script (I built with llvm-19; did you use GCC?):
https://github.com/clevert-app/clevert/blob/main/.github/workflows/asset_zcodecs.yml#L171
I really, really want to replicate your success.
I ran objdump on your binary. Was it built with GCC 14.2.1?
I reproduced your benchmark. It's faster using GCC instead of Clang. I will try to tweak it more. Thank you!