jetify-com / typeid

Type-safe, K-sortable, globally unique identifier inspired by Stripe IDs

License: Apache License 2.0

Go 80.19% Shell 19.81%

guid typeid uuid uuidv7

typeid's Issues

Add Typescript Implementation

I wrote a TypeScript implementation, which I needed for my work. I'll post it here once I've polished it up. Thanks for this project!

I've got a new Rust implementation of the TypeId spec that I'd love to share. It's called mti and it offers a different approach by adding extensions to strings, allowing them to behave and parse like TypeIds while maintaining type safety.

What's cool about this implementation:

It's split into three crates:
- mti (the main crate, which combines the other two)
- TypeIdSuffix
- TypeIdPrefix
The TypeIdSuffix and TypeIdPrefix crates are designed for independent use. They've been thoroughly fuzz tested, prop tested, and formally verified.
Both the mti and TypeIdPrefix crates include an optional prefix sanitization feature, which guarantees a valid prefix is created even from bad data.
This split design allows for some neat tricks in high-volume data pipeline processing. For example, you can generate prefixes and suffixes separately and combine them later to maintain high throughput.

I've been using this in a distributed actor framework I'm working on, and it's been pretty slick so far.

Future plans:

Exposing these crates to JavaScript and PHP, reusing the Rust core.

Thanks for all your hard work on the spec! Let me know if you need any more info or if you'd like to include this implementation.

[question] how to use in a Postgres database?

RFC: Consider asking the IETF to make a smarter move on UUID V7 before adopting

We should push for a better option (or revision to) UUID V7

From an email I sent the authors of that draft:

Given that the "Unix Epoch" value is going to Y2K us in 2038, thus meaning all the sortability of V7 UUIDs would be broken, any chance you would consider revising that format slightly?

I would propose an Epoch-Period of one or two bits at the front of the UUID field, then right-shifting the actual Unix timestamp by one or two bits before injecting those values into the rest of the timestamp field in V7 format. That would only lose us one or two bits of timestamp precision while buying us either 68 or 204 more YEARS before we get Y2Ked.

If we actually are building this to have some useful K-sortability, seems crazy to be asking for trouble and adopting a representation that will roll-over in 2038... that's not far away.

Mark C# (.NET) implementation validated against spec

Thanks for adding to the list!

I reimplemented the mentioned tests here ( https://github.com/TenCoKaciStromy/typeid-dotnet/blob/main/src/dotnet-typeid-tests/ValidationAgainstSpec_Tests.cs ).

On the possible ambiguity when decoding.

Hi,

First of all, cool project! After implementing it myself, I wanted to share some thoughts.

Since TypeIDs have a fixed length with known padding, they can be encoded and decoded in a straightforward manner. However, this does not resolve a certain ambiguity that arises when decoding the suffix, depending on the leftmost character. This is likely already known, but I believe its implications could be made more explicit.

Imagine the first three bits of a UUID to be 100. With padding, that would be 00100. Now, encoding is simple:

encode(00100) = '4'

And so is decoding:

decode('4') = 00100

Then we strip the padding and get back our initial three bits: 100. However, decode('c'), decode('m'), and decode('w') lead to this exact same result, as their binary representation is XX100. After discarding the first two bits, 100 remains in all cases. In short, this implies that if two TypeIDs are identical except for their leftmost suffix characters, and both characters map to the same binary representation after stripping the first two padding bits, the resulting UUID is the same. 32 TypeId suffixes that only differ in the first character map to only 8 unique UUIDs.

Yes, strictly speaking, no TypeID suffix encoded as described in the formal specification can ever start with a character other than '0'-'7', as these are the only characters whose binary representation begins with 00, which is exactly the padding. But the specification does not explicitly forbid a suffix from beginning with '8'-'z'; syntactically, those are still valid TypeIDs.

I'm not suggesting this is a problem. The specification is not incorrect. It just does not (in mathematical terms) describe a bijective function, and I'm concerned that end users of TypeID libraries may intuitively expect the encoding and decoding process to be bijective.

An illustration

This behavior can be observed with the current implementation of the command-line tool from this repository.

First, let's decode and re-encode a TypeID suffix starting with a character from between '0' and '7':

$ typeid decode prefix_01h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

As expected, the encoded result is equal to the original TypeID.

Now, let's take the same TypeID, but replace the leftmost character of the suffix with something between '8' and 'z', which still constitutes a syntactically correct TypeID:

$ typeid decode prefix_81h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881 # same as above

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

But now: prefix_81h2xcejqtf2nbrexx3vqjhp41 != prefix_01h2xcejqtf2nbrexx3vqjhp41

As mentioned above, if we try this for all 32 characters, the command-line tool decodes 32 different TypeIDs to only 8 unique UUIDs:

[0,8,g,r]1h2xcejqtf2nbrexx3vqjhp41 -> 0188bac7-4afa-78aa-bc3b-bd1eef28d881
[1,9,h,s]1h2xcejqtf2nbrexx3vqjhp41 -> 2188bac7-4afa-78aa-bc3b-bd1eef28d881
[2,a,j,t]1h2xcejqtf2nbrexx3vqjhp41 -> 4188bac7-4afa-78aa-bc3b-bd1eef28d881
[3,b,k,v]1h2xcejqtf2nbrexx3vqjhp41 -> 6188bac7-4afa-78aa-bc3b-bd1eef28d881
[4,c,m,w]1h2xcejqtf2nbrexx3vqjhp41 -> 8188bac7-4afa-78aa-bc3b-bd1eef28d881
[5,d,n,x]1h2xcejqtf2nbrexx3vqjhp41 -> a188bac7-4afa-78aa-bc3b-bd1eef28d881
[6,e,p,y]1h2xcejqtf2nbrexx3vqjhp41 -> c188bac7-4afa-78aa-bc3b-bd1eef28d881
[7,f,q,z]1h2xcejqtf2nbrexx3vqjhp41 -> e188bac7-4afa-78aa-bc3b-bd1eef28d881

My thoughts:

You could argue that for properly generated TypeIDs, the leftmost suffix character is always between '0'-'7'. That's true, but the problem arises not during encoding, but during decoding. Input strings from external sources (users, clients, etc.) are not inherently trustworthy. Even syntactically correct TypeIDs lead to this ambiguity (as demonstrated above).
Possible solutions:
- Keep everything as it is. Maybe it's not that much of a problem.
- Or: Do not allow '8'-'z' as the leftmost character, as no properly generated suffix should ever begin with one of them. This is what I did in the Java implementation I submitted yesterday, because I initially assumed it was not permitted. Only later did I find out that it isn't explicitly specified.
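To make the two options concrete, here is a minimal Go sketch of a suffix decoder with a switch between them. It is not the code of the command-line tool or of any existing library: `decodeSuffix` and its `strict` flag are hypothetical names, and the only things taken from the discussion above are the 26-character length, the base32 alphabet, and the two padding bits.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// Lowercase base32 alphabet used by TypeID suffixes (no i, l, o, u).
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// decodeSuffix turns a 26-character suffix (130 bits) into UUID text.
// With strict=false the two leading padding bits are silently dropped,
// which reproduces the ambiguity. With strict=true a leading character
// above '7' (padding bits not zero) is rejected instead.
func decodeSuffix(suffix string, strict bool) (string, error) {
	if len(suffix) != 26 {
		return "", fmt.Errorf("suffix must be 26 characters, got %d", len(suffix))
	}
	v := new(big.Int)
	for i := 0; i < len(suffix); i++ {
		idx := strings.IndexByte(alphabet, suffix[i])
		if idx < 0 {
			return "", fmt.Errorf("invalid character %q", suffix[i])
		}
		if strict && i == 0 && idx > 7 {
			return "", fmt.Errorf("leading character %q sets the padding bits", suffix[i])
		}
		v.Lsh(v, 5)
		v.Or(v, big.NewInt(int64(idx)))
	}
	// Keep only the low 128 bits, discarding the two padding bits.
	mask := new(big.Int).Lsh(big.NewInt(1), 128)
	mask.Sub(mask, big.NewInt(1))
	v.And(v, mask)

	var b [16]byte
	v.FillBytes(b[:])
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	for _, first := range "08gr" {
		suffix := string(first) + "1h2xcejqtf2nbrexx3vqjhp41"
		lenient, _ := decodeSuffix(suffix, false)
		_, err := decodeSuffix(suffix, true)
		fmt.Printf("%s lenient=%s strictErr=%v\n", suffix, lenient, err)
	}
}
```

In lenient mode all four inputs decode to the same UUID. In strict mode only the suffix starting with '0' is accepted, so decoding followed by encoding always returns the original string.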

I hope this feedback is in some way helpful.

More compact string encoding

Cool project! I've also recently been thinking about typeIDs (I follow a similar pattern in some of my toy projects) and might be interested in collaborating on a python implementation.

The approach I've been taking is to run a UUIDv7 through base58 (https://pypi.org/project/base58/) before prefixing in order to get an even shorter string encoding. I haven't done this at any particular scale, but I'd be curious if you considered an encoding like this, and if there are any pros/cons you see either way?

When I was looking for an encoding scheme, I had a similar set of requirements:

URL safe, case-insensitive, avoids ambiguous characters, can be selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs

FYI: link to specification is broken/has changed

Link to specification is broken. The spec appears to reside in the README.md file of the /spec folder.

Current: https://github.com/jetpack-io/typeid/blob/main/spec.md
Working: https://github.com/jetpack-io/typeid/blob/main/spec/README.md

Add C# .NET implementation to the list

Hi,

I've created a performance-oriented implementation of TypeId in C#: https://github.com/firenero/TypeId
What should be done to add it to the list and mark it as verified? I've seen the discussion about an automated validation flow (#23), but I'm not sure whether anything is in place yet.

Also, thanks for the reference implementation and the examples of valid and invalid TypeIDs. They were really helpful while developing my library.

Add Dart implementation to list

Heyo y'all!

Just finished implementing and publishing typeid in Dart!

Repo
pub.dev link (dart / flutter package manager)
Tests using vectors provided in spec
Successful build with tests passing

Wanted to share and get a ✅ before opening a PR to add it to y'alls list of implementations

Remove second rust implementation

Hi! Thank you again for adding my implementation :)
The library provided by @conradludgate is very good, and I don't think we should have two Rust implementations.
Could you please remove mine from the table?

Created a Postgres extension for TypeID

Hi there,
Just wanted to mention that I created a Postgres extension that lets you use TypeIDs like any other ID: https://github.com/blitss/typeid-postgres-extension

It's based on the code from @conradludgate and also passes the tests you had.

It stores them as a prefix plus a 128-bit UUID under the hood and returns them as human-readable IDs. You can use a TypeID as a primary key, sort and filter on it, and convert back and forth to a UUID.

I'm not very proficient with Rust, but according to my tests it should be fine to use, and better than the SQL implementation (https://github.com/jetify-com/typeid-sql) because it is an extension and implements a wider subset of Postgres features.

Let's consider the ways of automation of the spec validation

TypeID now has many implementations in different programming languages, so there should be a way to track their validation status against the spec. I am thinking of a badge in the README that shows the validation status (failed or succeeded) of each library. That said, there may be better solutions.

Add Swift Implementation

Hi!

I’ve implemented a Swift lib for typeid: https://github.com/Frizlab/swift-typeid.
Can you add it in the implementations list?

Thanks!

Origin of base 32 alphabet

Hello,

This is more a comment than a true issue.
I was surprised to see Crockford's alphabet for something that looked familiar to me and was only from 2019.
Don't misunderstand me: I think this alphabet is convenient, and maybe he was the first to propose exactly this choice of letters; hence, he may deserve to be credited.
But for those interested in the topic, I suggest looking at:
https://ux.stackexchange.com/questions/53341/are-there-any-letters-numbers-that-should-be-avoided-in-an-id (from 2014)
which links to:
https://github.com/tytso/pwgen/blame/master/pw_rand.c (from 2005)
If GitHub had existed since 1970, I would guess something earlier could be found more easily.
I definitely think https://www.crockford.com/base32.html should add bibliographic references.

Best regards,
Laurent Lyaudet

Need help with underscore support in TS/JS?

👋! Our team at Graphite would love to get underscore support in the official TypeScript/JS library.

Do you need help implementing this? If it's straightforward we might be able to land a patch and accelerate to the v0.3 spec, like you did for Go: 5ffabce

RFC: Consider allowing `_` as an additional separator within the typeid prefix

The spec, as defined today, only allows lowercase alphabetic characters in the type prefix. Some users, though, might need a way to have a "compound noun" in the prefix. Imagine you want the type to be "user accounts": today you would have to encode that as the single word useraccounts, but it might be preferable to allow a separator and encode it as user_accounts instead.
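One way such a rule could be checked is with a single regular expression. The sketch below is an assumption about how the rule might look, not text from the spec: it allows lowercase letters and underscores, forbids a leading or trailing underscore, caps the prefix at 63 characters, and accepts an empty prefix. The name `validPrefix` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed rule: empty, or 1 to 63 characters of [a-z_] that start and
// end with a letter. The exact limits here are illustrative assumptions.
var prefixRe = regexp.MustCompile(`^([a-z]([a-z_]{0,61}[a-z])?)?$`)

// validPrefix reports whether p satisfies the assumed prefix rule.
func validPrefix(p string) bool {
	return prefixRe.MatchString(p)
}

func main() {
	for _, p := range []string{"useraccounts", "user_accounts", "_user", "user_", "User"} {
		fmt.Printf("%-15q valid=%v\n", p, validPrefix(p))
	}
}
```

Rejecting a leading or trailing underscore keeps the separator between prefix and suffix easy to find; whether the spec should require that is part of what this RFC would need to decide.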

Future specification on binary format

Is there any plan to add into the specification how to convert a typeid to binary format?

In my other personal project utilising typeid, I will need to serialise the ids. So far I'm implementing my own serialisation only for that specific project, but if there will be a formal specification, I can include that in the Haskell implementation as well.

RFC: Consider adding one or two extra characters to the encoding for a checksum

One of the top comments in the HackerNews discussion was:

I've been doing this kind of thing for years with two notable differences:
...

I add two base-32 characters as a checksum (salted of course). This prevents having to go look at the datastore when the value is bogus either by accident or malice. I'm unsure why other implementations don't do this.

Should we do that as part of the official TypeID spec?
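For discussion, here is a minimal Go sketch of what a salted two-character checksum might look like. Nothing here comes from the TypeID spec or from the quoted commenter's implementation: the use of HMAC-SHA256, the choice of which bits to keep, and all function names are assumptions made for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// Same lowercase base32 alphabet that TypeID suffixes use.
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// checksum derives two base32 characters (10 bits) from an HMAC-SHA256
// of the ID, keyed with a secret salt. Assumed scheme, not from the spec.
func checksum(id string, salt []byte) string {
	mac := hmac.New(sha256.New, salt)
	mac.Write([]byte(id))
	sum := mac.Sum(nil)
	// Use the top 5 bits of the first two digest bytes.
	return string([]byte{alphabet[sum[0]>>3], alphabet[sum[1]>>3]})
}

// withChecksum appends the two checksum characters to the ID.
func withChecksum(id string, salt []byte) string {
	return id + checksum(id, salt)
}

// verify recomputes the checksum and compares it with the last two characters.
func verify(s string, salt []byte) bool {
	if len(s) < 3 {
		return false
	}
	id, tag := s[:len(s)-2], s[len(s)-2:]
	return hmac.Equal([]byte(checksum(id, salt)), []byte(tag))
}

func main() {
	salt := []byte("example-salt") // placeholder secret
	s := withChecksum("prefix_01h2xcejqtf2nbrexx3vqjhp41", salt)
	fmt.Println(s, verify(s, salt))
}
```

With only 10 bits, roughly one in 1024 random strings would still pass, so a check like this filters most bogus values before a datastore lookup rather than replacing the lookup.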

Add .NET (C#) implementation

Hi!

I implemented a .NET lib: https://github.com/TenCoKaciStromy/typeid-dotnet

Could you add it in the implementations list?

Thanks!
