Comments (11)

huonw commented on July 22, 2024 5

> There might be small amounts of data in the .rodata section if the dep uses a lot of &'static strs

I don't think "small" or a focus on string literals is quite right: there are quite a few crates with large static tables, e.g. unicode properties, text encoding, cached computations for performance (~100K), time-zone look-ups (~270K). The first two in particular turn up in central crates with a lot of dependent crates. I personally have a few crates with large tables like the above, and it would be nice if they were highlighted in cargo bloat output, so I can easily understand how much effort I should put into optimising them.

For a specific example, https://crates.io/crates/encoding_rs seems to have ~320K of static tables itself, and the ~120K which end up in the final binary of ripgrep are about 3% of the stripped binary size (for cargo build --release with the default features). This means the bloat-contribution of encoding_rs is dramatically underestimated: the 35K that cargo bloat --crates --release lists seems to be at least ~4× smaller than the actual value. In total, on Mac, the various __const sections make up nearly 20% of ripgrep's stripped size:

$ size -A -t -d rg-stripped | sort -n -k 2
rg-stripped  :
section                 size         addr
__mod_init_func            8   4298888344
__nl_symbol_ptr           16   4298887168
__got                     56   4298887184
__thread_bss             192   4299084184
__thread_vars            240   4299082408
__thread_data            368   4299083816
__bss                    724   4299087136
__stubs                  828   4297882320
__la_symbol_ptr         1104   4298887240
__data                  1160   4299082656
__stub_helper           1396   4297883148
__common                2744   4299084384
__cstring              10444   4298546272
__unwind_info          35932   4298556716
__gcc_except_tab      105724   4297884544
__const               194056   4298888352
__eh_frame            294496   4298592648
__const               556000   4297990272
__text               2911152   4294971168
Total                4116640

The code is still the largest, but it's not as completely one-sided as that comment suggests.
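
As a quick check of the "nearly 20%" figure, here is a minimal Python sketch that sums the sections by name from the size listing above. The sample text is an excerpt of that listing (the small sections are left out, the Total row is the one from the full output), and `section_sizes` is a hypothetical helper, not part of any tool mentioned here.

```python
from collections import defaultdict

# Excerpt of the `size -A -t -d rg-stripped` output above (largest sections only).
SIZE_OUTPUT = """\
rg-stripped  :
section                 size         addr
__cstring              10444   4298546272
__unwind_info          35932   4298556716
__gcc_except_tab      105724   4297884544
__const               194056   4298888352
__eh_frame            294496   4298592648
__const               556000   4297990272
__text               2911152   4294971168
Total                4116640
"""


def section_sizes(text):
    """Sum sizes per section name; return (sizes_by_name, reported_total)."""
    sizes = defaultdict(int)
    total = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "Total":
            total = int(fields[1])
        elif len(fields) == 3 and fields[1].isdigit():
            # Section names can repeat (two __const sections), so accumulate.
            sizes[fields[0]] += int(fields[1])
    return dict(sizes), total


sizes, total = section_sizes(SIZE_OUTPUT)
const_share = 100 * sizes["__const"] / total
text_share = 100 * sizes["__text"] / total
print(f"__const: {sizes['__const']} bytes, {const_share:.1f}% of {total}")
print(f"__text:  {sizes['__text']} bytes, {text_share:.1f}% of {total}")
```

Summing by name matters here because the two __const sections live in different segments and show up as separate rows.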

A more extreme (and slightly less "real-world") example of something using encoding_rs is https://github.com/hsivonen/recode_rs, where various tables end up being 30% of the (non-stripped!) binary size, which is something that bloaty (the encoding_rs::data symbols) and size (the __const sections) both highlight, but cargo bloat doesn't at the moment:

$ cargo bloat --release -n 10
Compiling ...
Analyzing target/release/recode_rs

 File  .text     Size       Crate Name
27.7%  61.3% 302.3KiB             [760 Others]
 3.4%   7.6%  37.7KiB encoding_rs encoding_rs::variant::VariantEncoder::encode_from_utf8_raw
 3.4%   7.4%  36.7KiB encoding_rs encoding_rs::variant::VariantEncoder::encode_from_utf16_raw
 2.9%   6.5%  32.2KiB encoding_rs encoding_rs::variant::VariantDecoder::decode_to_utf8_raw
 2.6%   5.9%  28.9KiB encoding_rs encoding_rs::variant::VariantDecoder::decode_to_utf16_raw
 1.0%   2.2%  10.8KiB     getopts getopts::Options::parse
 1.0%   2.1%  10.5KiB   [Unknown] _read_line_info
 0.9%   1.9%   9.4KiB   [Unknown] _stats_arena_print
 0.8%   1.9%   9.2KiB         std std::sys_common::backtrace::output
 0.7%   1.6%   7.9KiB         std _je_stats_print
 0.7%   1.6%   7.8KiB         std _je_mallocx
45.1% 100.0% 493.3KiB             .text section size, the file size is 1.1MiB
$ size -A -t -d target/release/recode_rs | sort -k 2 -n
section                size         addr
target/release/recode_rs  :
__mod_init_func           8   4295852808
__nl_symbol_ptr          16   4295852032
__got                    40   4295852048
__thread_data            48   4295880392
__thread_vars            96   4295880296
__thread_bss            104   4295880440
__bss                   468   4295880544
__stubs                 540   4295472448
__la_symbol_ptr         720   4295852088
__stub_helper           916   4295472988
__data                  968   4295879328
__common               2600   4295881024
__unwind_info          2784   4295832924
__gcc_except_tab       3460   4295473904
__cstring             10444   4295822480
__eh_frame            16288   4295835712
__const               26512   4295852816
__const              345104   4295477376
__text               501872   4294970576
Total                912988
$ bloaty -d symbols -n 10 target/release/recode_rs
     VM SIZE                                                                                      FILE SIZE
 --------------                                                                                --------------
  57.4%   631Ki [1226 Others]                                                                    627Ki  57.4%
  12.0%   131Ki [__LINKEDIT]                                                                     128Ki  11.8%
   3.7%  41.0Ki encoding_rs::data::BIG5_UNIFIED_IDEOGRAPH_BYTES::hb0140e06e71bb28e              41.0Ki   3.8%
   3.7%  40.9Ki encoding_rs::data::GBK_HANZI_BYTES::h59ac8ab5fd0c5593                           40.9Ki   3.7%
   3.7%  40.9Ki encoding_rs::data::JIS0208_KANJI_BYTES::hc7e4b09543cf47f7                       40.9Ki   3.7%
   3.7%  40.9Ki encoding_rs::data::KSX1001_UNIFIED_HANJA_BYTES::h11b55d33a7050459               40.9Ki   3.7%
   3.4%  37.8Ki encoding_rs::variant::VariantEncoder::encode_from_utf8_raw::h791216902f079374   37.8Ki   3.5%
   3.4%  36.9Ki encoding_rs::data::BIG5_LOW_BITS::h7673f2a02219b92a                             36.9Ki   3.4%
   3.3%  36.8Ki encoding_rs::variant::VariantEncoder::encode_from_utf16_raw::hee0803b3fb2af4f3  36.8Ki   3.4%
   2.9%  32.3Ki encoding_rs::variant::VariantDecoder::decode_to_utf8_raw::hf798d70e99362630     32.3Ki   3.0%
   2.6%  29.0Ki encoding_rs::variant::VariantDecoder::decode_to_utf16_raw::h3df9b02e7fbdf4b7    29.0Ki   2.7%
 100.0%  1.07Mi TOTAL                                                                           1.07Mi 100.0%
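
To illustrate the kind of per-module attribution I mean, here is a minimal Python sketch that groups the named entries of the bloaty listing above by their first two path segments. The sizes are copied by hand from that listing (hashes dropped), and `group_by_module` is a hypothetical helper, not something bloaty or cargo bloat provides. It only sees the top-10 symbols, so the totals are lower bounds.

```python
from collections import defaultdict

# (size in KiB, symbol name without hash), copied from the bloaty output above.
SYMBOLS = [
    (41.0, "encoding_rs::data::BIG5_UNIFIED_IDEOGRAPH_BYTES"),
    (40.9, "encoding_rs::data::GBK_HANZI_BYTES"),
    (40.9, "encoding_rs::data::JIS0208_KANJI_BYTES"),
    (40.9, "encoding_rs::data::KSX1001_UNIFIED_HANJA_BYTES"),
    (37.8, "encoding_rs::variant::VariantEncoder::encode_from_utf8_raw"),
    (36.9, "encoding_rs::data::BIG5_LOW_BITS"),
    (36.8, "encoding_rs::variant::VariantEncoder::encode_from_utf16_raw"),
    (32.3, "encoding_rs::variant::VariantDecoder::decode_to_utf8_raw"),
    (29.0, "encoding_rs::variant::VariantDecoder::decode_to_utf16_raw"),
]


def group_by_module(symbols, depth=2):
    """Sum symbol sizes keyed by the first `depth` segments of the path."""
    totals = defaultdict(float)
    for size_kib, name in symbols:
        key = "::".join(name.split("::")[:depth])
        totals[key] += size_kib
    return dict(totals)


totals = group_by_module(SYMBOLS)
for module, kib in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{kib:7.1f} KiB  {module}")
```

With these inputs the static tables under encoding_rs::data come out larger than the four encoder/decoder functions combined, which is the part the .text-only view misses.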

from cargo-bloat.

kbknapp commented on July 22, 2024 3

> what I actually want to do when I use it is optimize the size of my binary as a whole. So ideally, I'd like cargo bloat to tell me from which libraries everything in my binary originates, so that I can make a decision that is not exclusively based on the text section size.

Dependencies don't really make up much of any other section besides .text which is where all the code lives. There might be small amounts of data in the .rodata section if the dep uses a lot of &'static strs, or in both .eh_frame and .gcc_except_table for error cases. But you're talking bytes.

If you're concerned with super small binaries, turning off Rust debug symbols (they're off by default in release builds) and stripping the binary of any other debug symbols is far more effective than scrounging for bytes in anything other than .text.

Debug symbols (both Rust's and others) are tens of megabytes large. As @RazrFalcon said, they're just public type and function names though.

Here's this repo with debug symbols on, off, and fully stripped:

debug=true    debug=false    stripped + debug=false
87.0M         13.0M          7.2M
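
As a rough reading of those three numbers, here is a small Python sketch of the arithmetic. It assumes all three sizes are in the same unit and that each difference is attributable to one step (debug=false, then strip); the variable names are only illustrative.

```python
# File sizes in M, copied from the three builds listed above.
with_debug = 87.0
without_debug = 13.0
stripped = 7.2

# What each step removes, assuming nothing else changed between builds.
debug_info = with_debug - without_debug   # dropped by debug=false
symbol_table = without_debug - stripped   # dropped by stripping

print(f"debug info:   {debug_info:.1f}M "
      f"({100 * debug_info / with_debug:.0f}% of the debug build)")
print(f"symbol table: {symbol_table:.1f}M "
      f"({100 * symbol_table / without_debug:.0f}% of the debug=false build)")
```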

If we look at the stripped version we can see that .text takes up ~5.2M

kevin@beefcake: ~/Projects/cargo-bloat 
➜ size -A -t -d target/release/cargo-bloat
target/release/cargo-bloat  :
section                 size      addr
.interp                   28       624
.note.ABI-tag             32       652
.note.gnu.build-id        36       684
.gnu.hash                176       720
.dynsym                12096       896
.dynstr                 7369     12992
.gnu.version            1008     20362
.gnu.version_r           576     21376
.rela.dyn             151488     21952
.rela.plt              11112    173440
.init                     23    184552
.plt                    7424    184576
.plt.got                  48    192000
.text                5367936    192048    <---- .text
.fini                      9   5559984
.rodata               662089   5560000    <---- .rodata
.eh_frame_hdr         100756   6222092   
.eh_frame             462984   6322848    <---- .eh_frame
.gcc_except_table     539080   6785832    <---- .gcc_except_table
.tdata                   552   9425248
.init_array               16   9425800
.fini_array                8   9425816
.data.rel.ro          113232   9425824
.dynamic                 640   9539056
.got                    3984   9539696
.data                   4625   9543680
.bss                    6336   9548320
.comment                  96         0
Total                7453759

The next largest are in fact .rodata, .eh_frame, and .gcc_except_table. But I'm not sure how much of that is from this repo or its deps... even so, they're dwarfed by .text and debug symbols.

RazrFalcon commented on July 22, 2024 1

I'll look into whether this is supported by goblin.

RazrFalcon commented on July 22, 2024

Yes, I should note somewhere that it's only the .text section.

About what kind of debug symbols are you asking? The one from the debug build or the symbols table that also exists in the release build?

gnzlbg commented on July 22, 2024

> About what kind of debug symbols are you asking? The one from the debug build or the symbols table that also exists in the release build?

The ones in the symbols table that also exist in the release build. I like debug-symbols in release builds while debugging, but I don't like to ship release builds with debug symbols "in general".

kbknapp commented on July 22, 2024

This has been implemented now, no? 😄

kevin@chickenlegs: ~/Projects/cargo-bloat 
➜ cargo bloat --release --crates      
[.. snip compiling ..]

 File  .text     Size Name
13.0%  33.6%   1.7MiB std
 7.1%  18.2% 930.6KiB cargo
 5.3%  13.7% 702.9KiB [Unknown]
 2.3%   5.9% 304.2KiB libgit2_sys
 2.0%   5.2% 263.5KiB toml
 1.4%   3.5% 178.8KiB regex
 1.2%   3.0% 155.7KiB goblin
 1.1%   2.9% 150.4KiB serde_ignored
 1.0%   2.6% 134.7KiB curl_sys
 0.8%   2.1% 107.1KiB serde_json
 0.5%   1.4%  70.4KiB docopt
 0.4%   1.1%  56.8KiB regex_syntax
 0.3%   0.9%  44.3KiB url
 0.3%   0.7%  38.0KiB libssh2_sys
 0.3%   0.7%  35.4KiB git2
 0.3%   0.7%  34.6KiB serde
 0.2%   0.6%  30.7KiB globset
 0.2%   0.5%  25.9KiB cargo_bloat
 0.2%   0.4%  20.7KiB tar
 0.1%   0.3%  15.6KiB aho_corasick
38.8% 100.0%   5.0MiB .text section size, the file size is 12.9MiB

RazrFalcon commented on July 22, 2024

@kbknapp I thought he was asking about these sections.

RazrFalcon commented on July 22, 2024

@gnzlbg can you explain in detail what you need?

gnzlbg commented on July 22, 2024

Currently, when I use cargo bloat, the total size reported by cargo bloat is the size of the text section, which is much smaller than the size of my binary. While cargo bloat allows me to optimize the size of the text section, what I actually want to do when I use it is optimize the size of my binary as a whole. So ideally, I'd like cargo bloat to tell me from which libraries everything in my binary originates, so that I can make a decision that is not exclusively based on the text section size.

I think debug symbols is a good place to start next, because the largest part of at least my rust binaries are debug symbols. I'd just like to know "where do they come from", that is, from which library do they originate.

I don't know if this is possible though. If it isn't, please just close this issue.

RazrFalcon commented on July 22, 2024

> Currently, when I use cargo bloat, the total size reported by cargo bloat is the size of the text section, which is much smaller than the size of my binary.

It's already fixed.

> what I actually want to do when I use it is optimize the size of my binary as a whole

But there is nothing to optimize, imho. The debug section contains the names of the methods. That's it. Fewer methods, smaller size.

> I think debug symbols is a good place to start next, because the largest part of at least my rust binaries are debug symbols.

I strip my executables, so I don't have this problem.

RazrFalcon commented on July 22, 2024

We have #66 for .rodata, but everything else is way too complicated to implement and would not provide much benefit.
