Comments (15)
To provide some numbers, compressing a large image took:
- BC1_RGBA: 1.4 s 1 thread, 0.24 s 12 threads
- BC6H: 28.9 s 1 thread, 4.9 s 12 threads
- BC7: 31 m 51 s 1 thread, 7 m 57 s 12 threads
Making sure you use the -j option to use multithreading can at least mitigate the issue, but the NVTT BC7 compressor looks to be largely brute force: it evaluates all 8 modes and takes the best result, and each individual mode appears to be quite slow as well.
I added my own issue to the nvidia-texture-tools project. (castano/nvidia-texture-tools#327) I'll have to see what other options are available for compressors that would be possible to integrate.
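As a rough worked example of what those numbers imply, the sketch below turns the timings above into a parallel speedup and a per-thread efficiency. The timings are the ones quoted in this comment; the helper names `speedup` and `efficiency` are just illustrative.

```python
# Timings quoted above, as (1 thread, 12 threads), converted to seconds.
timings = {
    "BC1_RGBA": (1.4, 0.24),
    "BC6H": (28.9, 4.9),
    "BC7": (31 * 60 + 51, 7 * 60 + 57),
}
THREADS = 12


def speedup(single, multi):
    """How many times faster the multithreaded run was."""
    return single / multi


def efficiency(single, multi, threads=THREADS):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup(single, multi) / threads


for name, (single, multi) in timings.items():
    print(f"{name}: {speedup(single, multi):.2f}x speedup, "
          f"{efficiency(single, multi):.0%} of linear scaling")
```

That works out to roughly 5.8x for BC1_RGBA, 5.9x for BC6H, and 4.0x for BC7 on 12 threads, so multithreading helps but is nowhere near enough to make the BC7 path fast.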
from cuttlefish.
Yowsa, those are some terrible timings. No fault to you, I really think Cuttlefish is a great little cross-platform tool, and you've incorporated so many libs into one. Maybe there could be a way to select between the squish, NVTT, and DirectXTex encoders since you have them all, but maybe I just haven't found that option yet.
So I found that DirectXTex has CPU and GPU (compute) based encoders. NVTT's CUDA BC encoders all seem to be in total disrepair, like they were in the middle of a refactor 4 years ago and were left commented out and broken. Maybe the BC1 encoder is still functional. And finally, Rich Geldreich has a fast CPU BC7 encoder (2 modes, mostly for RGB) that looks great.
I have to run with jobs set to 1, since multiple textures are in flight when being built. So I'm looking for faster encoders always.
from cuttlefish.
Unfortunately, squish doesn't support BC7 compression. The DirectXTex library looks like it's only designed to work on Windows. I would also be reluctant to use anything CUDA based, since it would require an NVidia GPU to run.
By Rich Geldreich's implementation, are you referring to https://github.com/richgel999/bc7enc16? While incomplete (only supporting modes 1 and 6), based on the readme it looks like those are the most important modes for textures in practice. Based on this, I think I can add some logic to choose between the bc7enc16 and NVTT implementations based on quality level and alpha values inside the block.
Thanks for pointing that out!
from cuttlefish.
Oh, sorry, I thought you already had libsquish in there. That's a BC1-BC3 compressor from days of old. DXTex would have to be ported, yes, but at least it's compute instead of hacked up CUDA code like in NVTT. Of course, there's only one GPU vs. 4-8 CPU cores, but just seems like that might be one of the few ways to dramatically speed up BC7 encoding. But dealing with the cross-platform compute isn't so simple these days. Rich also has Basis which is using a lot of that technology for transcoding.
Also PVRTexTool is adding BC compressors in the September release to Mac/Linux, so that will mean that tool can handle the major compressed formats then.
from cuttlefish.
NVTT has squish embedded in it, and I use the same logic as their tools to choose between the squish compressors and the NVidia compressors based on quality level and block layout. My point was it won't help with accelerating BC7 compression since the block format is completely different.
from cuttlefish.
Nice, so many of these other encoders have bitrotted, that I didn't expect a change so fast. Thanks for doing that! Btw, I put up a patch for EtcLib (etc2comp) here. I didn't see what your Etc solution was (NVTT?), but this collects several fixes from various issues that never landed there. Yet one more attempt at speeding up Etc2 generation via EtcTool.
from cuttlefish.
I have integrated bc7enc16 in version 2.1.0. It will always use bc7enc16 for normal or lower qualities, choose between bc7enc16 and NVTT on high quality based on the range of alpha values in the block (so it can make a different decision between blocks), and always use NVTT for highest quality.
The image I gave numbers for earlier now takes ~1.7 s to convert to BC7 with normal quality.
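A minimal sketch of that per-block selection, under stated assumptions: the quality level names, the threshold constant, and the direction of the alpha test (a wider alpha range sends the block to NVTT) are all hypothetical. The comment above only says the high-quality choice depends on the range of alpha values in the block, so this is not Cuttlefish's actual code.

```python
from enum import IntEnum


class Quality(IntEnum):
    # Assumed ordering of quality levels, lowest to highest.
    LOWEST = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    HIGHEST = 4


# Hypothetical cutoff: any variation in alpha sends the block to NVTT.
ALPHA_RANGE_THRESHOLD = 0


def choose_bc7_encoder(quality, block_alphas):
    """Pick an encoder name for one 4x4 block given its 16 alpha values."""
    if quality <= Quality.NORMAL:
        return "bc7enc16"  # always the fast path at normal or lower quality
    if quality == Quality.HIGHEST:
        return "nvtt"      # always the slow, exhaustive path
    # HIGH quality: decide per block from the spread of alpha values.
    alpha_range = max(block_alphas) - min(block_alphas)
    return "nvtt" if alpha_range > ALPHA_RANGE_THRESHOLD else "bc7enc16"


opaque_block = [255] * 16
mixed_block = [0, 64, 128, 255] * 4
print(choose_bc7_encoder(Quality.HIGH, opaque_block))
print(choose_bc7_encoder(Quality.HIGH, mixed_block))
```

Because the decision is made per block, a mostly opaque texture would still take the fast path for most of its blocks at high quality.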
from cuttlefish.
For ETC, I'm using etc2comp. It looks like all of the patches you posted are for various utilities for the tool portion, such as mipmap generation and proper management of sRGB inputs and outputs. However, I handle all these tasks separately within Cuttlefish, and only use the library for the block compression, so those bugs shouldn't affect my usage of the library. I did make a fork to fix a memory leak, though. (google/etc2comp#47)
from cuttlefish.
BC7 perf is so fast now, that I had to verify that my scripts were not still generating BC3. Thanks for the rapid integration to the tool, and to Rich Geldreich for the open-source on the codec.
from cuttlefish.
Just wanted to let you know that bc7enc16 only supports 2 modes: 1 for opaque and 1 for alpha. There's a new release of that, but I adopted bc7enc, which is a little older but has more modes. These are both by Rich Geldreich. I noticed some alpha artifacts with Cuttlefish BC7 when it used bc7enc16, but unfortunately I don't have a file to supply you with. It may be as simple as pulling the latest release, but I don't recall if Rich had a different repo.
from cuttlefish.
Thanks for the info. I was aware of the limitations of bc7enc16, but when searching for Rich Geldreich's BC7 encoder that's the one I found and didn't realize he had a newer implementation with more modes. I have swapped out for bc7enc, and the nvidia-texture-tools implementation is reserved for the "highest" quality setting.
Another change I made was to enable perceptual weighting for sRGB textures with "normal" quality. It's a bit slower than linear weights, but is still 2x faster than the "high" quality, so I felt it was a better tradeoff for speed and quality. (When testing with a large image, it was ~1.5 s for normal + linear, ~2.5 s for normal + perceptual, and ~5 s for high + perceptual.) Since you've used it in more real-world situations than I have, let me know if you find the speed tradeoff isn't worth it.
from cuttlefish.
We released our best BC7 encoder here:
https://github.com/BinomialLLC/bc7e
It's 2-3x faster than ispc_texcomp BC7 at the same avg. PSNR.
Also check out:
https://github.com/richgel999/bc7enc_rdo
from cuttlefish.
The encoders in NVTT and DirectXTex are extremely slow and dated, BTW.
from cuttlefish.
Thanks for the info, I'll take a look into what I can integrate. I wouldn't complain if I can get rid of the NVTT dependency, especially since I see it's now been archived.
I'm a little wary of the ispc version due to the added dependency. I'd have to see how complicated it would be to integrate across both automated and local builds, though it may be limited in terms of what platforms it can be used on. M1 Macs come to mind, but I somewhat doubt that the GPU supports the format anyway, so falling back to the lower quality bc7enc.cpp implementation in that case may be moot.
When looking at the BC1-7 encoders in the bc7enc_rdo repository, I noticed that BC2 and BC6H are missing. There are various other options for BC2, but do you have any recommendations for a BC6H replacement? The only other implementation I could find was Compressonator, and fortunately it doesn't look too difficult to extract the part I need.
Speaking of Compressonator, do you have any opinions on its BC7 implementation? It appears to have support for all modes, though obviously that doesn't do it much good if it's very slow or inaccurate. If it's slower but still good quality, I might try a hybrid approach based on quality settings, similar to what I do right now with NVTT. That would have the benefit of not needing extra build dependencies or being tied to x86.
from cuttlefish.
@alecazam, @richgel999 I'm still putting on the finishing touches (making sure it builds on all platforms, installing ISPC for the automated builds), but here's the final setup I have for the BC formats:
- BC1_RGB: rgbcx with full 3-color support.
- BC1_RGBA: opaque blocks use rgbcx with 3-color black disabled. Transparent blocks use squish instead. (Compressonator has the interface for alpha support, but it's disabled internally...)
- BC2: rgbcx for the color block, manually sets the 4-bit alpha values for the alpha block.
- BC3: rgbcx in all situations.
- BC4/5: rgbcx for unsigned blocks, Compressonator for signed blocks.
- BC6H: ispc_texcomp for ufloat when ISPC is available. Uses Compressonator for signed floats, as well as ufloat when ISPC isn't available.
- BC7: bc7e when ISPC is available, bc7enc when it's not.
This will be available once I release version 2.5.0.
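The per-format setup listed above can be sketched as a simple dispatch function. This is only an illustration of the list, not Cuttlefish's actual code: the function name, the string labels, and the `signed`, `has_ispc`, and `opaque_block` parameters are all hypothetical.

```python
def choose_bc_encoder(fmt, *, signed=False, has_ispc=False, opaque_block=True):
    """Return the encoder named in the list above for one format/block."""
    if fmt == "BC1_RGB":
        return "rgbcx"  # full 3-color support
    if fmt == "BC1_RGBA":
        # Opaque blocks: rgbcx with 3-color black disabled; otherwise squish.
        return "rgbcx" if opaque_block else "squish"
    if fmt == "BC2":
        return "rgbcx"  # color block only; 4-bit alpha values are set manually
    if fmt == "BC3":
        return "rgbcx"
    if fmt in ("BC4", "BC5"):
        return "Compressonator" if signed else "rgbcx"
    if fmt == "BC6H":
        # ispc_texcomp only covers ufloat, and only when ISPC is available.
        if has_ispc and not signed:
            return "ispc_texcomp"
        return "Compressonator"
    if fmt == "BC7":
        return "bc7e" if has_ispc else "bc7enc"
    raise ValueError(f"unknown BC format: {fmt}")


print(choose_bc_encoder("BC7", has_ispc=True))
print(choose_bc_encoder("BC6H", signed=True, has_ispc=True))
```

Laid out this way, it's easy to see that ISPC only affects BC6H and BC7, so builds without it still get rgbcx for all of the older formats.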
from cuttlefish.