Giter VIP home page Giter VIP logo

Comments (19)

tomkooij avatar tomkooij commented on August 17, 2024 1

@arjenhiemstra I just found some code which casts ints to int32_t.
(git blame points to me)

EDIT: at least IthoDeviceStatus::int_val is int32_t https://github.com/arjenhiemstra/ithowifi/blob/master/software/NRG_itho_wifi/main/IthoSystem.h#L54

It should be an easy fix. Will look into it soon.

from ithowifi.

tomkooij avatar tomkooij commented on August 17, 2024 1

OMG this is so stupid:
https://github.com/arjenhiemstra/ithowifi/blob/master/software/NRG_itho_wifi/main/devices/wpu.h#L588
(keep in mind these things are autogenerated)

There is are two values utc-time. So the correct one is just overwritten by the other (boolean) one...

It's fixed now. I'll submit a PR. But need to cook diner now ;-)
image

(In the PR i'll also fix the int32_t / uint32_t thing.

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024 1

Let's take that text you quoted from:

The union is at least as big as necessary to hold its largest data member, but is usually not larger. The other data members are intended to be allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined, except that all non-static data members have the same address. It is undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

The most important caveats in this alinea are marked as bold.
I'm not sure if the "same address" part is always enforceable, but I guess it will probably be with this example struct.

I don't know what will happen if you make a union using bitwise members (struct { uint8 bla1:4; uint8 bla2:4; } bla; vs uint16_t bla:8; for example)

The "most recently written" part is probably not an issue in any implementation upto 32 bit (on a 32-bit CPU), but I'm not sure how it may work on larger members of the union.

And if you like to initialize the struct, I would set the pointer to null_ptr as that will also set the byteval to 0.

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024 1

Well, let's rephrase it to: "Pay very close attention when using union"

👍 As we should always do!

And now I will continue fixing the bugs in my own code due to use of union.... ;) (at least on ESP8266 it doesn't behave as expected and it seems to differ between Linux en Windows builds)

Hats off and good luck! Hope to see you soon again @ De Maakplek!

from ithowifi.

tomkooij avatar tomkooij commented on August 17, 2024

Doesn't seem an easy fix... :(

When I change IthoDeviceStatus::int_val to int64_t or uint32_t the value of UTC in both status html page and MQTT status JSON is still "1". When I D_LOG() the value to syslog the correct value is shown. The value is correctly read from the status i2c message and correctly stored in IthoStatus.

It seems to be problem with serialising the JSON. But I cannot find what is going wrong. (This is a 4 byte uint32, not even a 64bit long)

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024

Please be aware that there isn't a standard String function for 64-bit ints.
So maybe the compiler casts it it to a bool.

You could simply test by casting it to uint32_t to see if this will work out.

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024

String is not used in this case. The (u)int32_t value is a possible value of a union which is part of an own data struct:

union
{
byte byteval;
int32_t intval;
uint32_t uintval;
double floatval;
const char *stringval;
} value;

The is fed to ArduinoJSON directly. Which should support 32bit signed/unsigned out of the box.

I read that the consensus is for time_t to be signed 32bit. Is that also the way it is processed from the input i2c 2401 result?

Could you post the i2c results from the debug page?

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024

Hmm using a union for types of different size is not really defined.
So be very very careful when using these kinds of union types.

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024

according to:
https://en.cppreference.com/w/cpp/language/union

"The union is at least as big as necessary to hold its largest data member"

which in this case should be 8 bytes for the double on esp32. All other possible values should fit in these 8 bytes.
I do not see/understand where the undefined behaviour arrises from.

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024

You don't know where the members will be located.
And since neither of the members is initialized, the values can be literally anything, even if you initialized a single one of them.

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024

You don't know where the members will be located. And since neither of the members is initialized, the values can be literally anything, even if you initialized a single one of them.

I still do not get it (and I really would like to understand it and I know that you are very knowledgable so I want to take it seriously)

According to the same union documentation as mentioned earlier:
"all non-static data members have the same address"

Which would mean (imo) that the start address of the data we are interested in (being a byte, double, uint etc) would all start at the same union address. It is then up the the programmer to make sure to interpreted the bytes located at that starting address until the length of the type (1 to 8 bytes in this case) correctly and fitting for the data type expected.

In code first the data type is set in a separate value in the struct here:

enum : uint8_t
{
is_byte,
is_int,
is_float,
is_string
} type;

So each instance of the struct:

struct ithoDeviceStatus
{
const char *name;
enum : uint8_t
{
is_byte,
is_int,
is_float,
is_string
} type;
uint8_t length;
union
{
byte byteval;
int32_t intval;
uint32_t uintval;
double floatval;
const char *stringval;
} value;
uint32_t divider;
uint8_t updated;
bool is_signed;
ithoDeviceStatus() : updated(0){};
};

has at least a data type set (that happens first before the data at the union's mem location is being handled, I'm fully aware it does not happen on struct initialisation) for the union in that instance of the struct. These instances are then emplaced on a vector and kept in memory.

This set datatype determines how the rest of the code should interpreted the data at union mem &x. The bytes are read from the start address until the length of the data type (1 to 8 bytes) and then interpreted as instructed by the set data type (being a byte, double, uint etc).

The only undefined behaviour I can think of is when the data type is set but the actual data has not been written to the union. The data in mem (which could be anything atm) would be interpreted as instructed but especially for a const char * this could lead to a crash I can imagine. A byte, (u)int would probably only lead to a wrong value, I'm not sure how a double would be handled, have not looked into that to be honest.

I love to hear your thoughts about it!

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024

and of course the struct could be changed to something like this:

struct ithoDeviceStatus
{
  const char *name;
  enum : uint8_t
  {
    is_byte,
    is_int,
    is_float,
    is_string
  } type;
  uint8_t length;
  union
  {
    byte byteval;
    int32_t intval;
    uint32_t uintval;
    double floatval;
    const char *stringval;
  } value{.byteval = 0};
  uint32_t divider;
  uint8_t updated;
  bool is_signed;
  ithoDeviceStatus() : name(nullptr), type(is_byte), length(1), divider(1), updated(0), is_signed(false){};
};

to make 100% sure that at creating instance all occupied memory has a known value (expect for the 7 padding bytes in the union). But I think that this will still not circumvent the possible issue of setting the type to is_string and then not writing the correct bytes to the union's mem location and trying to let the code interpret it as a cons char * for example.

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024

this: struct { uint8 bla1:4; uint8 bla2:4; } bla;

is a struct with two distinct memory locations for 2 uint8 values.

whereas in my example this would translate to something like:
struct { uint8_t type, union { uint8 bla1:4; uint8 bla2:4; } uni } bla;

here we would still have two distinct memory locations but in that case one unit8 for type and one unit8 for the value stored in union uni (which is this example is both an uint8, a bit unusual application of a union of course)

it is the sentence after the part you made bold that sets me in the direction I used "except that all non-static data members have the same address". The union in my case consists of non-static data members, if I understand correctly.

And combined with what is in ISO/IEC 9899:
"
The size of a union is sufficient to contain the largest of its members. The value of at
most one
of the members can be stored in a union object at any time. A pointer to a
union object, suitably converted, points to each of its members
"

this should result in behaviour like I described earlier (at least I certainly hope so)

And isn't "undefined behavior" [...] "most recently written" meant to be the explanation of a situation where you for example last wrote data to union member uintval and try to access member ie. intval thereafer?

edit;
to elaborate a bit more on the 32/64 bit CPU, I think this should not matter.

for example:

union {
uint8_t a,
int32_t b,
int64_t c,
char string[24]
} value;

value should occupy a minimum of 24 bytes of memory space for each instantiation of the union, in-depended of CPU arch. and for any instance of the union &a == &b == &c == &string[0].

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024

this: struct { uint8 bla1:4; uint8 bla2:4; } bla;

is a struct with two distinct memory locations for 2 uint8 values.

Nope, the sizeof of that struct should be 1 byte.
The funny thing is you can even mix signed and unsigned here, thus putting the bit for the minus sign at an unusual position.

struct { uint8 bla1:4; int8 bla2:4; } bla;

This has two individually accessible 4-bit variables so you don't need to perform bitmasks to get data from them or set it.
And bla2 is signed so you can set values from -8 ... 7 to it.

whereas in my example this would translate to something like: struct { uint8_t type, union { uint8 bla1:4; uint8 bla2:4; } uni } bla;

Nope, the opposite:

union{ uint8_t type, struct { uint8 bla1:4; uint8 bla2:4; } uni } bla;

Otherwise the bla1 and bla2 would share the same memory bits.

here we would still have two distinct memory locations but in that case one unit8 for type and one unit8 for the value stored in union uni (which is this example is both an uint8, a bit unusual application of a union of course)

You can create all kinds of bitwise structs. Just be aware of using signed vs. unsigned as that has bitten me way too often :)

See for example this union I use in my WiFi code:

  union 
  {
    struct {
      uint16_t isHidden:1; // Hidden SSID
      uint16_t lowPriority:1; // Try as last attempt
      uint16_t isEmergencyFallback:1; 
      uint16_t phy_11b:1; 
      uint16_t phy_11g:1; 
      uint16_t phy_11n:1; 
      uint16_t phy_lr:1; 
      uint16_t phy_11ax:1; 
      uint16_t wps:1; 
      uint16_t ftm_responder:1; 
      uint16_t ftm_initiator:1; 

      uint16_t unused:5;      
    };
    uint16_t flags{};
  };

But it is perfectly fine to user more bits to store some enum value for example.

Just be aware that the order of the bits may feel a bit counter-intuitive, so always test/check when you try to implement some struct to match some register definition from a datasheet.

it is the sentence after the part you made bold that sets me in the direction I used "except that all non-static data members have the same address". The union in my case consists of non-static data members, if I understand correctly.

And combined with what is in ISO/IEC 9899: " The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members "

Yep the sizeof the union struct is just what you'd expect, 8 bytes.
It can sometimes be padded if the largest element of the union isn't a multiple of 4 bytes, but that's compiler dependent.

Also most compilers (all???) will complain if you ever use a union type in a packed struct, so it is unlikely you will end up with unaligned members like an uint32_t starting at an address which is not a multiple of 4.

this should result in behaviour like I described earlier (at least I certainly hope so)

And isn't "undefined behavior" [...] "most recently written" meant to be the explanation of a situation where you for example last wrote data to union member uintval and try to access member ie. intval thereafer?

Most 32-bit CPUs will load/store per 32 bit.
So any update to union members of at most 32 bit will very likely update all bits and even if it isn't updated yet at that memory location but still in the cache or in some register, those will work as expected when you write to one and read the other.
However if you exceed this 32-bit boundary, you may see odd behavior.

edit; to elaborate a bit more on the 32/64 bit CPU, I think this should not matter.

for example:

union {
uint8_t a;
int32_t b;
int64_t c;
char string[24];
} value;

value should occupy a minimum of 24 bytes of memory space for each instantiation of the union, in-depended of CPU arch. and for any instance of the union &a == &b == &c == &string[0].

Yep, all true, but now consider this:

int64_t test() {
 union {
  uint8_t a;
  int32_t b;
  int64_t c;
  char string[24];
 } value;

 memset(&value, 0, sizeof(value)); // May be optimized out by the compiler, can lead to very tricky situations and bugs!

 value.a = 123;
 value.b = value.a; // Just to make sure the compiler actually does something with the assignment
 for (size_t i = 1; i < sizeof(value.string); ++i) { value.string[i] = 0; } // Make sure the compiler doesn't optimize out all other assignments but still needs to reload some registers
 value.a = 123;
 value.c = 124;

 return value.a;
}

Strictly speaking it is compiler dependent what will be returned.
One would expect this function to return 0, but given it is left to the compiler implementation, it is possible the function will return 123.
Also the optimization steps done by the compiler may remove all assignments except for the last one.
Or the compiler may consider it to be the same address and actually returning value.c (thus returning 124) without performing a cast. But that would be a compiler bug IMHO.

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024

And just to illustrate why it might be tricky using unions.....
I think exactly the union I posted in my code is causing issues as I try to clear all bits in the constructor by setting the flags to 0.
But the separate bits still appear not to be set.
Only when I explicitly set the bits myself, they are set.

So in my own code, I still need to explicitly set the flags myself:

WiFi_AP_Candidate::WiFi_AP_Candidate(uint8_t networkItem) : index(0), flags(0) {
  // Need to make sure the phy isn't known as we can't get this information from the AP
  // See: https://github.com/letscontrolit/ESPEasy/issues/4996
  // Not sure why this makes any difference as the flags should already have been set to 0.
  phy_11b = false;
  phy_11g = false;
  phy_11n = false;
  wps     = false;
[...]

As I was already looking into this myself, I was focussed on issues with unions, so be warned :)

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024
  union 
  {
    struct {
      uint16_t isHidden:1; // Hidden SSID
      uint16_t lowPriority:1; // Try as last attempt
      uint16_t isEmergencyFallback:1; 
      uint16_t phy_11b:1; 
      uint16_t phy_11g:1; 
      uint16_t phy_11n:1; 
      uint16_t phy_lr:1; 
      uint16_t phy_11ax:1; 
      uint16_t wps:1; 
      uint16_t ftm_responder:1; 
      uint16_t ftm_initiator:1; 

      uint16_t unused:5;      
    };
    uint16_t flags{};
  };

my point is: your struct in a union from above example is something else than a union in a struct (my example / the piece of code from this project). You are accessing bits (missed that in my earlier reaction, sorry), I'm trying to get whole types from the union.

as per this part:

`int64_t test() {
union {
uint8_t a;
int32_t b;
int64_t c;
char string[24];
} value;

memset(&value, 0, sizeof(value)); // May be optimized out by the compiler, can lead to very tricky situations and bugs!

value.a = 123;
value.b = value.a; // Just to make sure the compiler actually does something with the assignment
for (size_t i = 1; i < sizeof(value.string); ++i) { value.string[i] = 0; } // Make sure the compiler doesn't optimize out all other assignments but still needs to reload some registers
value.a = 123;
value.c = 124;

return value.a;
}`

This example quite clearly explains what I'm trying to convey and as I understand documentation this should happen:

value.c = 124;
would set the active member of the union value to member c with a value of 124, as it is a int64_t the binary representation of value.a in mem should then be:
00000000000000000000000001111100

if you (illegally) access the non active union member value a (return value.a;) most (?) compilers would interpretet the data at mem address of a as uint8_t, which is 01111100 which is 124 in decimal. And thus return the value 124.

I've just done a quick test with GCC and Clang, both return the result as described above.
image

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024

It depends on the endianness of the platform where the bits will be set.

And as I mentioned in my previous reply it really makes a difference when you clear all bytes via one member and try to read from another.
As I literally now have this bug which occurs on ESP8266 with SDK2.7.4 where I set the bits all to 0 in the constructor and then not set those bits explicitly again. When accessing the individual bits, they are not cleared.

The funky part is that I also do something very similar in another union which totals as 32 bit (and is guaranteed to be 32-bit aligned) and this does work just fine for years.

from ithowifi.

arjenhiemstra avatar arjenhiemstra commented on August 17, 2024

Very interesting investigation, thanks!

It depends on the endianness of the platform where the bits will be set.

sure, just left that out to keep things simple and because I think we both understand how much that will be of influence within a particular system architecture.

And as I mentioned in my previous reply it really makes a difference when you clear all bytes via one member and try to read from another.

According to doc mentioned earlier:
"It is undefined behavior to read from the member of the union that wasn't most recently written."

So OK, I think the 'puzzlement' here drills down to the difference in bit-field based members of a union or struct (with defined bit width) in your examples and the, type based, non-static union members in my code. According to the above statement only reading data from the most recent written member is defined behaviour, reading from any other member is undefined. But that might not be the case for bit-field based structures, cannot find a clear definition of that in the C docs.
Also:
"Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type."

the code I initially shared:

union
{
byte byteval;
int32_t intval;
uint32_t uintval;
double floatval;
const char *stringval;
} value;

adheres to this and is also very much similar in setup and use as the example here:
https://en.cppreference.com/w/cpp/language/union

#include <iostream>
 
// S has one non-static data member (tag), three enumerator members (CHAR, INT, DOUBLE), 
// and three variant members (c, i, d)
struct S
{
    enum{CHAR, INT, DOUBLE} tag;
    union
    {
        char c;
        int i;
        double d;
    };
};
 
void print_s(const S& s)
{
    switch(s.tag)
    {
        case S::CHAR: [std::cout](http://en.cppreference.com/w/cpp/io/cout) << s.c << '\n'; break;
        case S::INT: [std::cout](http://en.cppreference.com/w/cpp/io/cout) << s.i << '\n'; break;
        case S::DOUBLE: [std::cout](http://en.cppreference.com/w/cpp/io/cout) << s.d << '\n'; break;
    }
}
 
int main()
{
    S s = {S::CHAR, 'a'};
    print_s(s);
    s.tag = S::INT;
    s.i = 123;
    print_s(s);
}

So I think we can (counting on good compiler implementation surely) conclude that this statement:

Hmm using a union for types of different size is not really defined.

Seems not applicable in the case of my example first given where the union members are non-static data types without bit-fields?

from ithowifi.

TD-er avatar TD-er commented on August 17, 2024

Well, let's rephrase it to: "Pay very close attention when using union"
And now I will continue fixing the bugs in my own code due to use of union.... ;) (at least on ESP8266 it doesn't behave as expected and it seems to differ between Linux en Windows builds)

from ithowifi.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.