Giter VIP home page Giter VIP logo

Comments (19)

lifthrasiir avatar lifthrasiir commented on July 30, 2024 1

Maybe I'm late in the discussion but I'm the initial author of flt2dec formatting code.

I think the stack space can be reduced as long as the formatting itself requests only that amount of digits. In the other words, the stack usage for {:.10e} or so can be bounded and fixed (it will only need 11 digits), but {:.10} cannot (it depends on the length of the integral part, 319 digits max). It is also possible to redesign the entire (internal) API to only require the fixed amount of buffer (say, 64 bytes) and emit a buffer of digits as needed, though it would be a big change.

I don't think that the stack usage can be reduced to some hundred bytes, however, precisely because in the worst case (at about 0.x% probability) fp formatting requires seven bignums, each weighing 160 bytes. Three of them can be discarded at the expense of performance, but remaining four are not, so 640 bytes + output buffer is the absolute minimum. If you have to optimize past this point your only bet is to use Grisu2 which is correct (the output will remap to the original value when parsed) and shortest (there is no other shorter output that is correct) but not closest (there may be other outputs of the same length closer to the original value). To the this end dtoa might be desirable---Serde uses it for fast fp formatting.

from wg.

briansmith avatar briansmith commented on July 30, 2024

Maybe it's good to (also) write guidelines recommending people avoid core::fmt if they care about code size. It seems like there are probably a lot of improvements that can be made, but in many cases it is used, it can be avoided completely.

from wg.

japaric avatar japaric commented on July 30, 2024

Maybe it's good to (also) write guidelines recommending people avoid core::fmt

What would the guideline suggest to use instead of core::fmt? Other than don't use formatting at all.

from wg.

briansmith avatar briansmith commented on July 30, 2024

What would the guideline suggest to use instead of core::fmt? Other than don't use formatting at all.

Exactly "don't use formatting at all." For example, when a function returns an error, don't avoid using core::fmt to format an error string stored in the error. Instead, return an error type that can be formatted by the code using the library, if it actually wants a formatted message.

from wg.

thejpster avatar thejpster commented on July 30, 2024

from wg.

japaric avatar japaric commented on July 30, 2024

@briansmith

don't avoid using core::fmt to format an error string stored in the error.

Sounds like a good, general (as in, not embedded specific) recommendation to me.

that can be formatted by the code using the library

I guess my worry is most types will implement core::fmt traits but the user wants to use the CodeSizeOptimizedFmt trait but they can't because (a) the trait is not in core, (b) third party crate didn't implement it because it's not in core and (c) the user can't implement it themselves because of coherence nor manually format each field of a struct because of privacy

@thejpster

I would be interested in seeing the equivalent results for C and C++ (printf vs std::cout) for the same target

I want run time to also be compared and I want to see these numbers for an 8-bit (e.g. AVR) micro. AFAICT, the current implementation doesn't do almost any inlining because the core::fmt implementation uses trait objects (a lot?) so there are a bunch of fmt-related functions in the final binary and all those function calls affect the run time.

from wg.

pftbest avatar pftbest commented on July 30, 2024

I would be interested in seeing the equivalent results for C and C++ (printf vs std::cout) for the same target

In my case:

  • C code goes from 1550 bytes to 30241 bytes if you use sprintf from newlib (default libc on cortex-m).
  • Rust code goes:
    • from 956 to 3810 if you format integer
    • from 956 to 19884 if you format double and use compiler-builtins
    • from 968 to 20020 if you format double and use libgcc

So I think 2k bytes of code is not a big deal really, compared to 20k needed for floating point. And people are ok with 30k+ binaries in C world (apparently).

The thing I'm worried about is the fact that rust does 1Kbyte memset each time you print a float:
https://github.com/rust-lang/rust/blob/master/src/libcore/fmt/mod.rs#L1498

from wg.

japaric avatar japaric commented on July 30, 2024

C code goes from 1550 bytes to 30241 bytes

Is this (a) optimized and (b) the increase for formatting an integer? or is the increase the same regardless of the kind of formatting one does?

The thing I'm worried about is the fact that rust does 1Kbyte memset each time you print a float

The float formatting code is pretty hardcode. It prints floats with "full" precision AIUI.

I wonder if we could have some sort of "pluggable" formatting where one can bring their own formatting code that does different trades off between code size and precision. And by pluggable, I mean that you can recycle the deriving infrastructure for your formatting traits, i.e. the existing #[derive(Debug, Display)] would implement your traits. (Maybe this is too ambitious)

from wg.

pftbest avatar pftbest commented on July 30, 2024

@japaric

r is the increase the same regardless of the kind of formatting one does?

Yes, it is always the same, because libc comes precompiled with gcc toolchain, so it does not matter if I use optimizations or integers instead of floats. I even tried -flto but the difference was less than 40 bytes.

from wg.

dhylands avatar dhylands commented on July 30, 2024

The thing I'm worried about is the fact that rust does 1Kbyte memset each time you print a float:
https://github.com/rust-lang/rust/blob/master/src/libcore/fmt/mod.rs#L1498

Not only is it a 1K memset, it's 1K of stack space that's required. On embedded targets wasting 1K of stack space like this is atrocious.

from wg.

jethrogb avatar jethrogb commented on July 30, 2024

In the embedded world, formatting is well known to require large amount of code. See for example section 3.15.2.1 of http://sdcc.sourceforge.net/doc/sdccman.pdf . I've seen tables like those for other compilers but I can't seem to find them right now.

The thing I'm worried about is the fact that rust does 1Kbyte memset each time you print a float:
https://github.com/rust-lang/rust/blob/master/src/libcore/fmt/mod.rs#L1498

Is that amount actually necessary? The maximum length needed to represent an f64 is approximately "-1.4503599627370496e-308".len()==24

from wg.

japaric avatar japaric commented on July 30, 2024

To the this end dtoa might be desirable

For formatting of integers, I recommend numtoa; it generates fast and small code IME. You'll have to put together different buffers into a human readable message though.

Another option is just to send everything using a binary format. The receiver end will have to parse the binary stuff and format it properly for human consumption. This uses less resources on the device side. These days I use byteorder for binary serialization but I wish we had better options in the ecosystem.

from wg.

awelkie avatar awelkie commented on July 30, 2024

Another option is just to send everything using a binary format. The receiver end will have to parse the binary stuff and format it properly for human consumption. This uses less resources on the device side.

Indeed, I think a lot of the use cases for formatting could be handled by a procedural macro that replaced uses of format! with an automatically generated binary format, and also spit out a machine-readable description of that binary format. For example, something like this:

let x: u32 = 0;
let y: u32 = 0;
write!(&mut out, "variable x is {}", x);
write!(&mut out, "variable y is {}", y);

could be replaced by

write!(&mut out, "1");
write!(&mut out, &x as &[u8; 4])

write!(&mut out, "2");
write!(&mut out, &y as &[u8; 4]);

And then a document describing that message "1" is "variable x is {}" followed by a little-endian u32 (assuming the code is compiled for a little endian CPU). A program could use that document to translate the output into the desired human-readable output. All the formatting and the storage of the constant strings would be done on some other computer.

I think something like this could make for a pretty slick low-overhead method of debug printing.

from wg.

japaric avatar japaric commented on July 30, 2024

Yeah, that sounds like an autogenerated enum that sits in a crate common to both the device program and the host side. Something like this:

// common code
enum Message {
    X(u32),
    Y(u32),
}

impl Message {
    fn deserialize(buffer: &[u8; 5]) -> Result<Self> { .. }
    fn serialize(&self, buffer: &mut [u8; 5]) { .. }
}

impl fmt::Display for Message { .. }

// device side
Message::X(42).serialize(&mut buffer);
sink.write_all(&buffer); // sends [0u8, 42u32] through the wire

Message::Y(24).serialize(&mut buffer);
sink.write_all(&buffer); // sends [1u8, 24u32] through the wire

// host side
let message = Message::deserialize(&buffer).unwrap();
println!("{}", message); // prints "variable x is 42"
// ..
let message = Message::deserialize(&buffer).unwrap();
println!("{}", message); // prints "variable y is 24"

Now we just need a macro wizard to implement the procedural macro (or macro 1.1) that takes care of all the boilerplate.

from wg.

japaric avatar japaric commented on July 30, 2024

An alternative to core::fmt popped up this week: fast_fmt. It's faster and leaner than core::fmt as it doesn't use trait objects so LLVM can optimize the code better. The downsides: it's formatting syntax (fwrite! macro) is not the same as write!'s and since it has it's own Write trait you can't format things marked with #[derive(Debug, Display)]. However, if you only need to format numbers then it's a much ergonomic solution to manually formatting things using numtoa.

from wg.

therealprof avatar therealprof commented on July 30, 2024

@japaric I think you're numbers are not quite up-to-date anymore. The overhead keeps growing and growing. A simple

writeln!(&mut output, "Hello World!");

into a dummy Buffer creates a 2744 bytes .text section on thumbv6m nowadays.

Just yesterdays changes caused the code generated for pad_integral to grow by another 60 bytes.

Yet the compiler/linker leaves plenty of dead code hanging around, like show_usize which is not even called. (rust-lang/rust#43369)

And then there's another issue. The code is so fat that using fmt (or fast_fmt or just numtoa) let's me constantly bump into this:

foo.cgu-0.rs:(.text+0x0): relocation truncated to fit: R_ARM_THM_JUMP11 against `DEFAULT_HANDLER'

from wg.

jonwingfield avatar jonwingfield commented on July 30, 2024

Is there any recognized way around this? Even when eliminating Debug from my code, I still get a lot of bloat from the various assertions/messages in slice and other places throughout the library. I wish there was a way to conditionally compile them out for release builds. It's hard to fit things into 16K :)

from wg.

spease avatar spease commented on July 30, 2024

@jonwingfield Not sure if this addresses your issue, but @dhylands wrote a related crate awhile back:
https://github.com/dhylands/format-rs

I'm going to cross-post this to the embedded-wg to get some better visibility from people who might be interested and have time to fix this. EDIT: Doh, it's already here. :)

from wg.

dtolnay avatar dtolnay commented on July 30, 2024

Relevant libcore PR: Cut down on number formatting code size.
One commenter noted 28% reduction in code size for a specific ARMv6m example.

from wg.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.