
Comments (4)

nigeltao commented on July 28, 2024

> Is there an inherent reason for that?

The reason is that I don't use x86 (32-bit) or MSVC day-to-day, so I was conservative. We could possibly change it to

#if defined(_M_X64) || defined(_M_IX86)

although some care might be needed, if other code (PNG-related or otherwise) is depending on the 64-ness that WUFFS_BASE__CPU_ARCH__X86_64 currently guarantees.
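One way to address the "64-ness" concern is to keep two separate macros: one for the x86 family as a whole (enough to gate SSE/AVX code paths) and one that is only set for 64-bit builds. The sketch below illustrates that split. The `EXAMPLE_*` macro and function names are hypothetical and are not the Wuffs macros; this is not what the Wuffs source does, only an illustration of the idea.

```c
// Hypothetical macro names, for illustration only (not the Wuffs macros).
// _M_X64 and _M_IX86 are MSVC's predefined macros for 64-bit and 32-bit x86;
// __x86_64__ and __i386__ are the GCC/Clang counterparts.
#if defined(_M_X64) || defined(__x86_64__)
#define EXAMPLE_CPU_ARCH_X86_64 1
#define EXAMPLE_CPU_ARCH_X86_FAMILY 1
#elif defined(_M_IX86) || defined(__i386__)
#define EXAMPLE_CPU_ARCH_X86_FAMILY 1
#endif

// Returns 1 when compiled for 32-bit or 64-bit x86, 0 otherwise.
// SIMD code paths that do not need 64-bit registers could be gated on this.
static int example_is_x86_family(void) {
#if defined(EXAMPLE_CPU_ARCH_X86_FAMILY)
  return 1;
#else
  return 0;
#endif
}

// Returns 1 only when compiled for 64-bit x86, 0 otherwise.
// Code that relies on 64-bit behavior would keep using this narrower check.
static int example_is_x86_64(void) {
#if defined(EXAMPLE_CPU_ARCH_X86_64)
  return 1;
#else
  return 0;
#endif
}
```

With this split, widening the family check to include `_M_IX86` cannot accidentally enable code that assumes a 64-bit target.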

Out of curiosity, and if profiling is not too much work for you, do you know which Wuffs functions get greatly improved performance once you define WUFFS_BASE__CPU_ARCH__X86_64 for your 32-bit MSVC-compiled program?

from wuffs.

kugelrund commented on July 28, 2024

Sure, the main differences for me are basically

  • wuffs_png__decoder__filter_4_distance_4_x86_sse42 being called instead of wuffs_png__decoder__filter_4_distance_4_fallback
  • wuffs_png__decoder__filter_4_distance_3_x86_sse42 being called instead of wuffs_png__decoder__filter_4_distance_3_fallback
  • wuffs_adler32__hasher__up_x86_sse42 being called instead of wuffs_adler32__hasher__up__choosy_default
  • wuffs_crc32__ieee_hasher__up_x86_avx2 being called instead of wuffs_crc32__ieee_hasher__up__choosy_default

The filter_4_distance_4 function takes the majority of the time for me, so that being faster is the main win. Perhaps I should have mentioned that I need RGBA output; I suppose that determines which filter function is necessary.
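For readers unfamiliar with the function names: assuming `filter_4` refers to PNG filter type 4 (Paeth) and the "distance" is the number of bytes per pixel, the scalar work being done looks roughly like the sketch below. This is a plain illustration of the Paeth unfilter step from the PNG specification, not the Wuffs fallback code, and the `example_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Paeth predictor from the PNG specification: picks whichever of
// a (left), b (above), c (upper-left) is closest to a + b - c,
// breaking ties in the order a, b, c.
static uint8_t example_paeth(uint8_t a, uint8_t b, uint8_t c) {
  int p = (int)a + (int)b - (int)c;
  int pa = abs(p - (int)a);
  int pb = abs(p - (int)b);
  int pc = abs(p - (int)c);
  if ((pa <= pb) && (pa <= pc)) {
    return a;
  } else if (pb <= pc) {
    return b;
  }
  return c;
}

// Undoes the Paeth filter in place for one row of n bytes.
// prev is the already-unfiltered previous row (same length).
// distance is the bytes per pixel, e.g. 4 for 8-bit RGBA.
static void example_unfilter_paeth(uint8_t* curr, const uint8_t* prev,
                                   size_t n, size_t distance) {
  for (size_t i = 0; i < n; i++) {
    uint8_t a = (i >= distance) ? curr[i - distance] : 0;
    uint8_t b = prev[i];
    uint8_t c = (i >= distance) ? prev[i - distance] : 0;
    curr[i] = (uint8_t)(curr[i] + example_paeth(a, b, c));
  }
}
```

Each output byte depends on the byte `distance` positions earlier in the same row, so a distance of 4 lets one whole pixel be processed per step, which is a natural fit for SIMD registers.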

Here is an example profile (with only wuffs functions listed), before adding || defined(_M_IX86):

Function Name                                          Total CPU [unit, %]  Self CPU [unit, %]
wuffs_png__decoder__decode_frame                           31681 (75,78 %)         1 ( 0,00 %)
wuffs_png__decoder__filter_and_swizzle                     19626 (46,95 %)         0 ( 0,00 %)
wuffs_png__decoder__filter_and_swizzle__choosy_default     19619 (46,93 %)        30 ( 0,07 %)
wuffs_png__decoder__filter_4                               18324 (43,83 %)         9 ( 0,02 %)
wuffs_png__decoder__filter_4_distance_4_fallback           16389 (39,20 %)     16387 (39,20 %)
wuffs_png__decoder__decode_pass                            12053 (28,83 %)         1 ( 0,00 %)
wuffs_zlib__decoder__transform_io                          11598 (27,74 %)         3 ( 0,01 %)
wuffs_deflate__decoder__transform_io                       10147 (24,27 %)         0 ( 0,00 %)
wuffs_deflate__decoder__decode_blocks                      10109 (24,18 %)         2 ( 0,00 %)
wuffs_deflate__decoder__decode_huffman_fast32               9793 (23,43 %)      9515 (22,76 %)
wuffs_png__decoder__filter_4_distance_3_fallback            1928 ( 4,61 %)      1928 ( 4,61 %)
wuffs_adler32__hasher__update_u32                           1447 ( 3,46 %)         1 ( 0,00 %)
wuffs_adler32__hasher__up                                   1446 ( 3,46 %)         0 ( 0,00 %)
wuffs_adler32__hasher__up__choosy_default                   1446 ( 3,46 %)      1446 ( 3,46 %)
wuffs_base__pixel_swizzler__swizzle_interleaved_from_slice  1263 ( 3,02 %)         5 ( 0,01 %)
wuffs_crc32__ieee_hasher__update_u32                         566 ( 1,35 %)         4 ( 0,01 %)
wuffs_crc32__ieee_hasher__up                                 562 ( 1,34 %)         0 ( 0,00 %)
wuffs_crc32__ieee_hasher__up__choosy_default                 562 ( 1,34 %)       562 ( 1,34 %)
wuffs_base__pixel_swizzler__bgrw__bgr                        510 ( 1,22 %)       451 ( 1,08 %)
wuffs_deflate__decoder__init_dynamic_huffman                 306 ( 0,73 %)        86 ( 0,21 %)
wuffs_deflate__decoder__init_huff                            257 ( 0,61 %)       257 ( 0,61 %)
wuffs_deflate__decoder__init_fixed_huffman                    40 ( 0,10 %)         2 ( 0,00 %)

And after adding || defined(_M_IX86):

Function Name                                          Total CPU [unit, %]  Self CPU [unit, %]
wuffs_png__decoder__decode_frame                           16978 (63,48 %)         0 ( 0,00 %)
wuffs_png__decoder__decode_pass                            10508 (39,29 %)         1 ( 0,00 %)
wuffs_zlib__decoder__transform_io                          10437 (39,02 %)         1 ( 0,00 %)
wuffs_deflate__decoder__transform_io                       10087 (37,71 %)         1 ( 0,00 %)
wuffs_deflate__decoder__decode_blocks                      10036 (37,52 %)         2 ( 0,01 %)
wuffs_deflate__decoder__decode_huffman_fast32               9712 (36,31 %)      9389 (35,10 %)
wuffs_png__decoder__filter_and_swizzle                      6468 (24,18 %)         0 ( 0,00 %)
wuffs_png__decoder__filter_and_swizzle__choosy_default      6466 (24,18 %)        23 ( 0,09 %)
wuffs_png__decoder__filter_4                                5203 (19,45 %)         7 ( 0,03 %)
wuffs_png__decoder__filter_4_distance_4_x86_sse42           4232 (15,82 %)      4231 (15,82 %)
wuffs_base__pixel_swizzler__swizzle_interleaved_from_slice  1237 ( 4,62 %)         8 ( 0,03 %)
wuffs_png__decoder__filter_4_distance_3_x86_sse42            964 ( 3,60 %)       964 ( 3,60 %)
wuffs_base__pixel_swizzler__bgrw__bgr                        515 ( 1,93 %)       442 ( 1,65 %)
wuffs_adler32__hasher__up                                    349 ( 1,30 %)         0 ( 0,00 %)
wuffs_adler32__hasher__update_u32                            349 ( 1,30 %)         0 ( 0,00 %)
wuffs_adler32__hasher__up_x86_sse42                          349 ( 1,30 %)       349 ( 1,30 %)
wuffs_deflate__decoder__init_dynamic_huffman                 316 ( 1,18 %)        78 ( 0,29 %)
wuffs_deflate__decoder__init_huff                            283 ( 1,06 %)       283 ( 1,06 %)
wuffs_crc32__ieee_hasher__update_u32                          92 ( 0,34 %)         7 ( 0,03 %)
wuffs_crc32__ieee_hasher__up                                  85 ( 0,32 %)         0 ( 0,00 %)
wuffs_crc32__ieee_hasher__up_x86_avx2                         85 ( 0,32 %)        85 ( 0,32 %)
wuffs_deflate__decoder__init_fixed_huffman                    50 ( 0,19 %)         1 ( 0,00 %)
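To put the two profiles side by side, the Total CPU counts of the affected functions can be divided directly. The sketch below does that arithmetic; it assumes both runs decoded the same workload and that the "unit" column means the same thing in both, which the profiles themselves do not state. The struct and function names are hypothetical.

```c
#include <stdio.h>

// Total CPU counts copied from the two profiles above.
struct example_row {
  const char* what;
  double before;  // build without || defined(_M_IX86)
  double after;   // build with || defined(_M_IX86)
};

static const struct example_row example_rows[] = {
    {"png decode_frame", 31681.0, 16978.0},
    {"png filter_4_distance_4", 16389.0, 4232.0},
    {"png filter_4_distance_3", 1928.0, 964.0},
    {"adler32 up", 1446.0, 349.0},
    {"crc32 ieee up", 562.0, 85.0},
};

// Ratio of the old count to the new count; 2.0 means "half the samples".
static double example_speedup(double before, double after) {
  return before / after;
}

// Prints one line per row with the before/after ratio.
static void example_print_speedups(void) {
  size_t n = sizeof(example_rows) / sizeof(example_rows[0]);
  for (size_t i = 0; i < n; i++) {
    printf("%-24s %.2fx\n", example_rows[i].what,
           example_speedup(example_rows[i].before, example_rows[i].after));
  }
}
```

Calling `example_print_speedups()` from a `main` function prints the ratios. Under the assumptions above, the Paeth distance-4 filter drops to a bit over a quarter of its previous sample count, and the overall `decode_frame` count roughly halves.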


nigeltao commented on July 28, 2024

It's fixed for release/c/wuffs-unsupported-snapshot.c. It'll be fixed in release/c/wuffs-v0.3.c whenever the next beta (version 0.3.0-beta.10) gets minted.

In case anyone else from the future is trying something similar, I presume you are configuring MSVC with /arch:AVX.
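For anyone checking their own build flags: MSVC defines `__AVX__` when compiling with /arch:AVX (and additionally `__AVX2__` with /arch:AVX2), and GCC and Clang define the same macros for -mavx and -mavx2. The sketch below reports which level the compiler was told to target. It is a compile-time check only, it says nothing about how Wuffs itself selects code paths, and the function name is hypothetical.

```c
// Returns a label for the widest x86 SIMD level enabled at compile time,
// based only on predefined compiler macros (hypothetical helper).
static const char* example_compile_time_simd_level(void) {
#if defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#elif defined(__SSE4_2__)
  // Defined by GCC/Clang with -msse4.2.
  return "SSE4.2";
#elif defined(_M_X64) || defined(__SSE2__) || \
    (defined(_M_IX86_FP) && (_M_IX86_FP >= 2))
  // SSE2 is baseline on x64; on 32-bit MSVC, _M_IX86_FP is 2 for /arch:SSE2.
  return "SSE2";
#else
  return "none";
#endif
}
```

Printing the returned string from a small test program is a quick way to confirm that the /arch option actually reached the compiler.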


kugelrund commented on July 28, 2024

Works great, thank you very much.

> In case anyone else from the future is trying something similar, I presume you are configuring MSVC with /arch:AVX.

Yes indeed.

