Giter VIP home page Giter VIP logo

Comments (16)

alizain avatar alizain commented on July 17, 2024 3

Okay so clearly the change in spec was not as well thought out as it should have been. I admit, it was made in (hopefully) uncharacteristic haste. While the twitter/snowflake implementation gets around that by remembering the highest last seen time, I agree that it's not worth enforcing the complicating on implementations. I'd be happy to roll it back if everyone agrees

from javascript.

waaghals avatar waaghals commented on July 17, 2024

Firstly, Dรดh. I've just spent the last two hour or so implementing a proper port to .Net not knowing one already existed. I pushed what I have, but I'll just abandon mine because yours is much more complete.

I'm unsure about leaving out the 'increment within same ms' as this will guaranty that the id's are always monotonically increasing. However I was thinking of creating a generator for this instead of implementing it in the Ulid Type itself so it does not have that responsibility. This way it could be left out of the spec, while still be available for consumers which need this guaranty.

The race conditions are solvable within C#, but I'm not so confident for Javascript because it does not have a notion of locking. However Firebase has been successfully using a JavaScript approach for some time now.

You/we may want to decide on pronunciation and document it: "you-lid"? "you-el-i-dee"? "uhl-id"?

To late, I already started calling the "you-lid" ๐Ÿ˜œ

from javascript.

RobThree avatar RobThree commented on July 17, 2024

However I was thinking of creating a generator for this instead of implementing it in the Ulid Type itself so it does not have that responsibility. This way it could be left out of the spec, while still be available for consumers which need this guaranty.

Mine already has that possibility; you could implement an IUlidRng that generates increasing values within the same millisecond. However, as pointed out, this requires you to keep state, tip-toe around datetime (wall-clock or not) issues, think about locking etc. and, if not very carefully implemented, is a disaster waiting to happen whereas random (with 80 bits of entropy) is safe enough for most purposes; however ordering will only be reliable to the ms resolution, below that ordering would be random. Either way is fine with me, by the way, but having seen the other implementations (which also don't implement this increasing-within-the-same-timeslot) I'd favor the 'old' (or current by others implementations) behavior.

The race conditions are solvable within C#

They'll be solvable in most, if not any, language (JS is usually run in a single-threaded environment anyway AFAIK). ๐Ÿ˜‰ My point was that it introduces complexity (state, races, ....) for little gain (unless you really want your ordering to be guaranteed to even within the ms-range. If that's the case then I agree; use a custom "Rng" (thinking of providing one myself) but it shouldn't be part of the spec.

Also: If we'd choose the "increasing" route; Why increase by 1? Why not increase with a random amount every call? If we increase by 1 the ULID's would become 'enumerable' again (meaning: if a ULID appears in a URL for example then I could try +1 or -1 to explore the system). One of the nice properties of GUID/UUID's is that they aren't sequential and 'hard to guess' (although that doesn't mean it's secure or you should rely on that fact for security purposes!). Care would have to be taken if a lot of ULID's were generated in the same ms to handle overflow of the 80bit value (which could (more) easily happen if the increments aren't 1 but, say, rand(0..1000000) and the initial 80 bits are close to the overflow value) but other than that I'd opt for random increments instead of 1. Ofcourse, the increment itself (be it a 'fixed' value of 1, 42, ... or a 'random' (clamped) value) could be another property of the generator.

To late, I already started calling the "you-lid" ๐Ÿ˜œ

That makes two of us! ๐ŸŽ‰


Edit: I have written a (proposed) "sequential ULID RNG" that should work with NUlid. You can find the source here. It's not done yet (see line 68), though, since I need to implement addition (and handle overflow/wraparound) over a 10-byte integer. I haven't decided on how to implement that (easy route would be to use System.Numerics's BigInteger but a) it's slow as hell for this specific purpose and b) I believe it's not available in .Net core it is!). As soon as I've decided I will implement and include the SequentialUlidRng in the next release.

In essence it's a 'wrapper' for a given RNG. It will use any IUlidRng to create random values but will increment the last value when called within the same millisecond by either a) a fixed value like 1, 42, ... or b) a random value between min and max.

Feedback and ideas are appreciated ๐Ÿ‘

from javascript.

rafaelsales avatar rafaelsales commented on July 17, 2024

I also don't like the current approach for that issue, it's too flaky.

In Ruby I'm able to get time up to nano, and then increasing the time section of the ULID improves that.

But in JS, not sure

from javascript.

savonarola avatar savonarola commented on July 17, 2024

Hello!

It isn't quite clear how we can increase the most significant character. As far as I understand this, starting, for example, from "01ASXJXJJF 0T7FMQ0S7B9K1D6T", we should generate the following:

01ASXJXJJF 0T7FMQ0S7B9K1D6T
01ASXJXJJF 1T7FMQ0S7B9K1D6T
...
01ASXJXJJF ZT7FMQ0S7B9K1D6T
01ASXJXJJF ZU7FMQ0S7B9K1D6T
01ASXJXJJF ZV7FMQ0S7B9K1D6T
...
01ASXJXJJF ZZ7FMQ0S7B9K1D6T
01ASXJXJJF ZZ8FMQ0S7B9K1D6T
...
01ASXJXJJF ZZZZZZZZZZZZZZZZ

(space added to visually separate time part from random one)
If we act like this, we will be able to generate at most 32*16 = 512 ULIDs within a millisecond, and it's very few.

Have I misunderstood something? In my opinion we should better increase the 16-char random tail as if it were an 16-digit 32-based integer.

from javascript.

alizain avatar alizain commented on July 17, 2024

Come to think of it: You might also want to steer clear of the sign-bit; when systems would order on the (internal) 128 bit representation / byte-array (for 'speed') the order may 'break' as soon as the sign bit is set to 1. Twitter thought of it way before it happened but I think it's a good idea. Sacrificing 1 bit out of 128 should't be that much of a problem (though it does halve your "ulid-space" from 2^128 to 2^127).

@RobThree, so if I understand it correctly, this may have an affect on the binary representation, but not the string encoded one. Also, if everything is stored as unsigned_ints, would this problem still hold?

from javascript.

alizain avatar alizain commented on July 17, 2024

We may want to allow / think about / specify a(n optional?) separator for readability. No { and } mess like (G/U)UID's but something like ttttt-ttttt-rrrrrr-rrrrr-rrrrr (lengths 5-5-6-5-5) might make ulid's more readable. Other formats/separators are ofcourse a possibility (as well as none of this ofcourse; I'm just bringing my ideas to the table)

@RobThree, I don't have any particular preference for or against separators. It could be made optional for better readability. Would you like to open another issue to discuss in? This one is getting kind of cramped ๐Ÿ˜

from javascript.

alizain avatar alizain commented on July 17, 2024

@RobThree @waaghals I prefer "you-lid" for pronunciation as well ๐ŸŽ‰ , and have no preference between all-caps or no-caps when written. However, I'm not a big fan of "Ulid" ๐Ÿ˜Ÿ

from javascript.

alizain avatar alizain commented on July 17, 2024

@waaghals AFAIK node has no notion of locking because it is single-threaded. All code in the JS implementation of ULID is also completely synchronous

from javascript.

alizain avatar alizain commented on July 17, 2024

If we act like this, we will be able to generate at most 32*16 = 512 ULIDs within a millisecond, and it's very few.

@savonarola, you're quite right, it was a hasty change on my part. It has now been reverted

from javascript.

alizain avatar alizain commented on July 17, 2024

@RobThree NUlid has been added to the readme ๐ŸŽ‰

from javascript.

RobThree avatar RobThree commented on July 17, 2024

@savonarola

It isn't quite clear how we can increase the most significant character.

We should never 'increase' anythin in the base32 space. Base32 is just a representation of the actual (6 or 10 byte) value. It could've just as well been base 10 but base32 has other advantages. So increasing anything should be done as you would increase any other value. The only difference is that we're used to work with 1, 2, 4 or 8 bytes (byte, word, dword, long or whatever you call it in your preferred language) and this time we happen to work with 6 (time) or 10 ('random') byte sized values. The resulting base32 representation will follow from that automatically.

In my opinion we should better increase the 16-char random tail as if it were an 16-digit 32-based integer.

We should only increment the "random" part (if increments are decided upon at all; be it increments of 1 or fixed "X" values or random values as per my earlier suggestion) and take care of not 'overflowing' into the time part of the ulid; that would seriously affect the reliability of knowing when a ulid was generated; one of it's key 'selling points' to begin with. So only affect the 10 "random" bytes, leave the 6 "time" bytes in tact; throw an overflowexception or something similar if the increment causes the 'random part' to overflow; not using the bits from the "time" part to overflow into.

I was personally thinking along the following lines: when creating an "initial random value" (so first value in a 'fresh timeslot / ms') don't use the first (leftmost) byte; leave it at 0. Use the other 9 for random value. This will give you 256 'overflows' of the 9 other bytes before you'll come close to even thinking about touching the 'time part' of the ulid. This way you can still increment by 1, X or <random_value> a lot of times and still gives you enough entropy (the other 9 bytes) to avoid most collisions.

Long story short: I still like the 'simplicity' of a ulid and still think it's a nice idea (which I've had several times in my career as well) but I also think that we're moving towards a snowflake-alike thing in the end if we don't watch out for feature creep; we're (already) reinventing the wheel a bit. If ulid's are to become widely adopted then it should have simplicity (not too much stuff encoded, not too hard to implement), readability (yay base32) and a very tiny chance of collisions (so the 'best' feature of a UUID/GUID).

@alizain

Also, if everything is stored as unsigned_ints, would this problem still hold?

Yes; some systems don't have a notion of signed/unsigned which is what not-using the MSB is attempting to prevent in the first place. It messes up the sorting if you use the actual (16 byte) value because of the 'wraparound' because of the SIGNED(note: not UNsigned)-ness.

I don't have any particular preference for or against separators. It could be made optional for better readability.

Sure; no problem. Simplest is you just find 26 chars [0-9ABCDEFGHJKMNPQRSTVWXYZ] and filter out any allowed separators (like .,-/_: for example) so any of the following would be valid:

01ARYZ6S41-DEAD-BEEF-DEAD-BEEF
01ARYZ6S41-DEADBE-EFDEA-DBEEF
01AR:YZ6S:41DE:ADBE:EFDEAD:BEEF
01AR-YZ6S_41/DEAD/BEEF:DEAD.BEEF

However, that would severely impact the recognizability of a ulid (though I exaggerated the above example greatly with wildly varying notations and way too many allowed separators). Coming up with / "standardizing" one (maybe 2) ways to write ulid's (and preferably enforcing those, and only those, 'standardized upon' ways) besides them being one big 26-char string is probably a good idea.

NUlid has been added to the readme ๐ŸŽ‰

Yay! ๐ŸŽ‰

from javascript.

waaghals avatar waaghals commented on July 17, 2024

I was thinking of using the last two (or more) LSB as incrementation space. This way there would be less randomness sacrificed. However this would yield in very similar Ulids when a collision occurs. (Not a issue for me) This would allow for 65536 increments of one within the same millisecond.

I was personally thinking along the following lines: when creating an "initial random value" (so first value in a 'fresh timeslot / ms') don't use the first (leftmost) byte; leave it at 0. Use the other 9 for random value. This will give you 256 'overflows' of the 9 other bytes before you'll come close to even thinking about touching the 'time part' of the ulid. This way you can still increment by 1, X or a lot of times and still gives you enough entropy (the other 9 bytes) to avoid most collisions.

I really like this approach, first I though you where sacrificing the MSB of randomness as a counter to allow 256 collisions within each millisecond, but this is not the case. This approach allows for random increments which makes the chances of a collision lower. However I think a full byte might be to large in most cases.

I think most of the time a single overflow bit would be sufficient. It would be really nice if it was possible to indicate where the overflow bits are. This way the number of overflow bits can be decided at runtime for the best possible fit. (i.e sacrifice randomness to prevent collisions, or vice-versa) The increment can then also be variable because a larger overflow space allows for much larger random increments)

The Ulids could be formatted as follows (! is the overflow separator):
01AN4Z07BY-!79KA1307SR9X4MV3 (no overflow bit, high change of overflow, very random)
01AN4Z07BY-7!9KA1307SR9X4MV3 (5 bits of overflow space)
01AN4Z07BY-79!KA1307SR9X4MV3 (10 bits of overflow space)

It might also be possible to use overflow spaces other then a multiple of five by introducing a extra character.

I just thought of this so not a lot of thought work went into it.

Thoughts?

from javascript.

RobThree avatar RobThree commented on July 17, 2024

overflow separator

I think it's overcomplicating things. We should reserve at least one, but preferably a few more bits to 'overflow into' when incrementing random amounts within the same ms. If an overflow happens you'll be able to determine it did because one or more of the bits designated as 'overflow bit' will be non-zero. No need for separator(s). If the number of bits, let's take 4 as a compromise, are 'fixed' then there's no need to complicate a ulid with "configurable/customizable overflow configuration".

from javascript.

alizain avatar alizain commented on July 17, 2024

Sorry for the lack of activity on my part, is there still interest in properly implementing incrementation within the same ms?

from javascript.

alizain avatar alizain commented on July 17, 2024

Closing issue, feel free to re-open!

from javascript.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.