BC7 encoder is very slow (cuttlefish issue, closed)

akb825 commented on June 18, 2024

BC7 encoder is very slow

from cuttlefish.

Comments (15)

akb825 commented on June 18, 2024

To provide some numbers, compressing a large image took:

BC1_RGBA: 1.4 s 1 thread, 0.24 s 12 threads
BC6H: 28.9 s 1 thread, 4.9 s 12 threads
BC7: 31 m 51 s 1 thread, 7 m 57 s 12 threads

Making sure you use the -j option to use multithreading can at least mitigate the issue, but the NVTT BC7 compressor looks to be largely brute force: it evaluates all 8 modes and takes the best result, and each individual mode appears to be quite slow as well.

I added my own issue to the nvidia-texture-tools project. (castano/nvidia-texture-tools#327) I'll have to see what other options are available for compressors that would be possible to integrate.
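
To put the timings listed in this comment side by side, here is a minimal sketch that converts them to seconds and computes the multithreading speedup and the cost relative to BC1_RGBA. The helper names `speedup` and `relative_cost` are made up for illustration; only the numbers come from the comment.

```python
# Timings quoted in the comment above, converted to seconds:
# (single-threaded, 12 threads).
timings = {
    "BC1_RGBA": (1.4, 0.24),
    "BC6H": (28.9, 4.9),
    "BC7": (31 * 60 + 51, 7 * 60 + 57),
}


def speedup(single, multi):
    """How many times faster the multithreaded run was."""
    return single / multi


def relative_cost(fmt, baseline="BC1_RGBA"):
    """Single-threaded time of fmt as a multiple of the baseline format."""
    return timings[fmt][0] / timings[baseline][0]


for fmt, (single, multi) in timings.items():
    print(f"{fmt}: {speedup(single, multi):.1f}x faster on 12 threads, "
          f"{relative_cost(fmt):.0f}x the single-threaded BC1_RGBA time")
```

By these numbers, BC7 gains only about 4x from 12 threads and takes more than 1000x as long as BC1_RGBA on a single thread.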

alecazam commented on June 18, 2024

Yowsa, those are some terrible timings. No fault of yours, I really think Cuttlefish is a great little cross-platform tool, and you've incorporated so many libs into one. Maybe there could be a way to select the squish vs. NVTT vs. DirectXTex versions of the encoders, since you have them all, but maybe I just haven't found that option yet.

So I found that DirectXTex has CPU and GPU (compute) based encoders. NVTT's CUDA BC encoders all seem to be in total disrepair, like they were in the middle of a refactor 4 years ago and were left commented out and broken. Maybe the BC1 encoder is still functional. And finally, Rich Geldreich has a fast CPU BC7 encoder (2 modes, mostly for RGB) that looks great.

I have to run with jobs set to 1, since multiple textures are in flight when being built. So I'm looking for faster encoders always.

akb825 commented on June 18, 2024

Unfortunately, squish doesn't support BC7 compression. The DirectXTex library looks like it's only designed to work on Windows. I would also be reluctant to use anything CUDA based, since it would require a NVidia GPU to run.

By Rich Geldreich's implementation, are you referring to https://github.com/richgel999/bc7enc16? While incomplete (only supporting modes 1 and 6), based on the readme it looks like those are the most important modes for textures in practice. Based on this, I think I can add some logic to choose between the bc7enc16 and NVTT implementations based on quality level and alpha values inside the block.

Thanks for pointing that out!

alecazam commented on June 18, 2024

Oh, sorry, I thought you already had libsquish in there. That's a BC1-BC3 compressor from days of old. DXTex would have to be ported, yes, but at least it's compute instead of hacked up CUDA code like in NVTT. Of course, there's only one GPU vs. 4-8 CPU cores, but it just seems like that might be one of the few ways to dramatically speed up BC7 encoding. But dealing with cross-platform compute isn't so simple these days. Rich also has Basis, which uses a lot of that technology for transcoding.

Also PVRTexTool is adding BC compressors in the September release to Mac/Linux, so that will mean that tool can handle the major compressed formats then.

akb825 commented on June 18, 2024

NVTT has squish embedded in it, and I use the same logic as their tools to choose between the squish compressors and the NVidia compressors based on quality level and block layout. My point was it won't help with accelerating BC7 compression since the block format is completely different.

alecazam commented on June 18, 2024

Nice, so many of these other encoders have bitrotted that I didn't expect a change so fast. Thanks for doing that! Btw, I put up a patch for EtcLib (etc2comp) here. I didn't see what your Etc solution was (NVTT?), but this collects several fixes from various issues that never landed there. It's yet one more attempt at speeding up Etc2 generation via EtcTool.

google/etc2comp#49

akb825 commented on June 18, 2024

I have integrated bc7enc16 in version 2.1.0. It will always use bc7enc16 for normal or lower qualities, choose between bc7enc16 and NVTT on high quality based on the range of alpha values in the block (so it can make a different decision between blocks), and always use NVTT for highest quality.

The image I gave numbers for earlier now takes ~1.7 s to convert to BC7 with normal quality.
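
The per-block selection described above could be sketched roughly as follows. This is not Cuttlefish's actual code: the `Quality` names below "normal", the `ALPHA_RANGE_THRESHOLD` value, and the direction of the alpha test (a wide alpha range sends the block to NVTT) are all assumptions made for illustration, since the comment only says the decision depends on the range of alpha values in the block.

```python
from enum import IntEnum


class Quality(IntEnum):
    # Level names below NORMAL are assumed; the comment only mentions
    # "normal or lower", "high", and "highest".
    LOWEST = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    HIGHEST = 4


# Hypothetical cutoff; the real threshold is not given in the thread.
ALPHA_RANGE_THRESHOLD = 16


def choose_bc7_encoder(quality, block_alpha):
    """Pick an encoder name for one 4x4 block given its 16 alpha values."""
    if quality <= Quality.NORMAL:
        return "bc7enc16"
    if quality == Quality.HIGHEST:
        return "nvtt"
    # HIGH quality: decide per block from the spread of alpha values.
    alpha_range = max(block_alpha) - min(block_alpha)
    # Assumed direction: blocks with widely varying alpha go to NVTT.
    return "nvtt" if alpha_range > ALPHA_RANGE_THRESHOLD else "bc7enc16"


opaque_block = [255] * 16
mixed_block = [0] + [255] * 15
print(choose_bc7_encoder(Quality.HIGH, opaque_block))
print(choose_bc7_encoder(Quality.HIGH, mixed_block))
```

Because the decision is made per block, a single image compressed at high quality can mix output from both encoders.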

akb825 commented on June 18, 2024

For ETC, I'm using etc2comp. It looks like all of the patches you posted are for various utilities for the tool portion, such as mipmap generation and proper management of sRGB inputs and outputs. However, I handle all these tasks separately within Cuttlefish, and only use the library for the block compression, so those bugs shouldn't affect my usage of the library. I did make a fork to fix a memory leak, though. (google/etc2comp#47)

alecazam commented on June 18, 2024

BC7 perf is so fast now, that I had to verify that my scripts were not still generating BC3. Thanks for the rapid integration to the tool, and to Rich Geldreich for the open-source on the codec.

alecazam commented on June 18, 2024

Just wanted to let you know that BC7enc16 only supports 2 modes - 1 opaque and 1 for alpha. There's a new release of that, but I adopted Bcenc which is a little older but has more modes. These are both by Rich Geldreich. I noted some alpha artifacts with cuttlefish bc7 when it used BC7enc16, but don't have a file to supply you with unfortunately. It may be as simple as pulling the latest release, but I don't recall if Rich had a different repo.

akb825 commented on June 18, 2024

Thanks for the info. I was aware of the limitations of bc7enc16, but when searching for Rich Geldreich's BC7 encoder that's the one I found, and I didn't realize he had a newer implementation with more modes. I have swapped it out for bc7enc, and the nvidia-texture-tools implementation is reserved for the "highest" quality setting.

Another change I made was to enable perceptual weighting for sRGB textures with "normal" quality. It's a bit slower than linear weights, but is still 2x faster than the "high" quality, so I felt it was a better tradeoff for speed and quality. (When testing with a large image, it was ~1.5 s for normal + linear, ~2.5 s for normal + perceptual, and ~5 s for high + perceptual.) Since you've used it in more real-world situations than I have, let me know if you find the speed tradeoff isn't worth it.
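
As a rough illustration of the difference between linear and perceptual weighting, the sketch below compares a plain squared error with a per-channel weighted one. This is a generic example, not bc7enc's actual metric: the Rec. 709 luma coefficients used as weights and the function names are assumptions chosen for illustration.

```python
# Rec. 709 luma coefficients, used here as illustrative channel weights.
# This is an assumption; bc7enc's real perceptual metric may differ.
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)


def linear_error(a, b):
    """Unweighted squared error between two RGB pixels."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def perceptual_error(a, b, weights=LUMA_WEIGHTS):
    """Squared error with each channel scaled by its assumed visual weight."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


original = (100, 100, 100)
green_off = (100, 110, 100)  # error of 10 in green only
blue_off = (100, 100, 110)   # error of 10 in blue only

# Linear weighting treats both errors the same; the weighted metric
# penalizes the green error more than the blue one.
print(linear_error(original, green_off), linear_error(original, blue_off))
print(perceptual_error(original, green_off), perceptual_error(original, blue_off))
```

An encoder minimizing the weighted error will spend its limited precision where mistakes are most visible, at the cost of some extra work per candidate.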

richgel999 commented on June 18, 2024

We released our best BC7 encoder here:
https://github.com/BinomialLLC/bc7e

It's 2-3x faster than ispc_texcomp BC7 at the same avg. PSNR.
Also check out:
https://github.com/richgel999/bc7enc_rdo

richgel999 commented on June 18, 2024

The encoders in NVTT and DirectXTex are extremely slow and dated, BTW.

akb825 commented on June 18, 2024

Thanks for the info, I'll take a look into what I can integrate. I wouldn't complain if I can get rid of the NVTT dependency, especially since I see it's now been archived.

I'm a little wary of the ispc version due to the added dependency. I'd have to see how complicated it would be to integrate across both automated and local builds, though it may be limited in terms of what platforms it can be used on. M1 Macs come to mind, but I somewhat doubt that the GPU supports the format anyway, so falling back to the lower quality bc7enc.cpp implementation in that case may be moot.

When looking at the BC1-7 encoders in the bc7enc_rdo repository, I noticed that BC2 and BC6H are missing. There are various other options for BC2, but do you have any recommendations for a BC6H replacement? The only other implementation I could find was Compressonator, which fortunately doesn't look too difficult to extract the part I need from.

Speaking of Compressonator, do you have any opinions on its BC7 implementation? It appears to have support for all modes, though obviously that doesn't do it much good if it's very slow or inaccurate. I might try a hybrid approach similar to what I do right now with NVTT based on quality settings if it's slower but still good quality, which would have the benefit of not needing extra build dependencies or being tied to x86.

akb825 commented on June 18, 2024

@alecazam , @richgel999 I'm still putting on the finishing touches (making sure it builds on all platforms, installing ISPC for the automated builds) but here's the final setup I have for the BC formats:

BC1_RGB: rgbcx with full 3-color support.
BC1_RGBA: opaque blocks use rgbcx with 3-color black disabled. Transparent blocks use squish instead. (Compressonator has the interface for alpha support, but it's disabled internally...)
BC2: rgbcx for the color block, manually sets the 4-bit alpha values for the alpha block.
BC3: rgbcx in all situations.
BC4/5: rgbcx for unsigned blocks, Compressonator for signed blocks.
BC6H: ispc_texcomp for ufloat when ISPC is available. Uses Compressonator for signed floats, as well as ufloat when ISPC isn't available.
BC7: bc7e when ISPC is available, bc7enc when it's not.

This will be available once I release version 2.5.0.
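
The setup listed above can be read as a dispatch on format, signedness, block transparency, and ISPC availability. The sketch below encodes that list as a lookup function. It is an illustration only: the function name, parameter names, and string labels are hypothetical, not Cuttlefish's API.

```python
def choose_bc_encoder(fmt, *, signed=False, has_transparency=False,
                      ispc_available=True):
    """Return the encoder named in the list above for the given case.

    signed: signed BC4/5 blocks or signed-float BC6H.
    has_transparency: whether a BC1_RGBA block contains transparent pixels.
    """
    if fmt == "BC1_RGB":
        return "rgbcx"  # full 3-color support
    if fmt == "BC1_RGBA":
        # Opaque blocks: rgbcx with 3-color black disabled.
        return "squish" if has_transparency else "rgbcx"
    if fmt == "BC2":
        return "rgbcx"  # color block only; 4-bit alpha is set manually
    if fmt == "BC3":
        return "rgbcx"
    if fmt in ("BC4", "BC5"):
        return "Compressonator" if signed else "rgbcx"
    if fmt == "BC6H":
        if ispc_available and not signed:
            return "ispc_texcomp"
        return "Compressonator"
    if fmt == "BC7":
        return "bc7e" if ispc_available else "bc7enc"
    raise ValueError(f"unknown BC format: {fmt}")


print(choose_bc_encoder("BC7", ispc_available=False))
print(choose_bc_encoder("BC6H", signed=True))
```

Keeping the non-ISPC fallbacks (bc7enc and Compressonator) means every format still has an encoder on builds where ISPC isn't available.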
