typeid's People

Contributors

akhundmurad, aleris, ant8e, blitss, broothie, cbuctok, conradludgate, firenero, frizlab, fxlae, gcurtis, github-actions[bot], guizmaii, janwennrich, johnnynotsolucky, lagoja, loreto, lucilleh, mikeland73, mistermoe, mmzk1526, mohsenari, ongteckwu, rrrodzilla, sloanelybutsurely, softprops, tencokacistromy, tensorush, titouancreach, xinz

typeid's Issues

Add TypeScript Implementation

I wrote a TypeScript implementation, which I needed for my work. I will post it here soon, after I polish it up. Thanks for this project!

new Rust implementation

Hey everyone,

I've got a new Rust implementation of the TypeId spec that I'd love to share. It's called mti and it offers a different approach by adding extensions to strings, allowing them to behave and parse like TypeIds while maintaining type safety.

What's cool about this implementation:

  1. It's split into three crates:

  2. The TypeIdSuffix and TypeIdPrefix crates are designed for independent use. They've been thoroughly fuzz tested, prop tested, and formally verified.

  3. Both the mti and TypeIdPrefix crates include an optional prefix sanitization feature, which guarantees a valid prefix is created even from bad data.

  4. This split design allows for some neat tricks in high-volume data pipeline processing. For example, you can generate prefixes and suffixes separately and combine them later to maintain high throughput.

I've been using this in a distributed actor framework I'm working on, and it's been pretty slick so far.

Future plans:

  • Exposing these crates to JavaScript and PHP, reusing the Rust core.

Thanks for all your hard work on the spec! Let me know if you need any more info or if you'd like to include this implementation.

RFC: Consider asking the IETF to make a smarter move on UUID V7 before adopting

We should push for a better option than (or a revision to) UUID V7.

From an email I sent the authors of that draft:

Given that the "Unix Epoch" value is going to Y2K us in 2038, thus meaning all the sortability of V7 UUIDs would be broken, any chance you would consider revising that format slightly?

I would propose an Epoch-Period field of one or two bits at the front of the UUID. Then right-shift the actual Unix timestamp by one or two bits before injecting those values into the rest of the timestamp field in V7 format. That would only lose us one or two bits of timestamp precision while buying us either 68 or 204 more YEARS before we get Y2Ked.

If we actually are building this to have some useful K-sortability, seems crazy to be asking for trouble and adopting a representation that will roll-over in 2038... that's not far away.
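The 68 and 204 year figures can be reproduced with a short calculation. This is only a sketch of the arithmetic, and it takes the email's premise at face value: a signed 32-bit seconds counter that runs out in 2038. That premise is an assumption carried over from the email and has not been checked against the UUID V7 draft. The names `SIGNED_32_BIT_SPAN` and `extra_years` are made up for this illustration.

```python
# Assumption from the email: the timestamp behaves like a signed 32-bit
# count of seconds since 1970, which covers 2**31 seconds.
SIGNED_32_BIT_SPAN = 2 ** 31
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def extra_years(period_bits: int) -> float:
    """Years gained by prepending `period_bits` epoch-period bits.

    Each period covers one full span of the counter, so k bits give
    2**k periods, of which the first one is the span we already have.
    """
    periods = 2 ** period_bits
    return (periods - 1) * SIGNED_32_BIT_SPAN / SECONDS_PER_YEAR


if __name__ == "__main__":
    for bits in (1, 2):
        print(bits, "period bit(s):", round(extra_years(bits)), "extra years")
```

One period bit doubles the covered range and two bits quadruple it, which is where the two figures in the email come from.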

On the possible ambiguity when decoding.

Hi,

First of all, cool project! After implementing it myself, I wanted to share some thoughts.

Since TypeIDs have a fixed length with known padding, they can be encoded and decoded in a straightforward manner. However, this does not resolve a certain ambiguity that arises when decoding the suffix, depending on the leftmost character. This is likely already known, but I believe its implications could be made more explicit.

Imagine the first three bits of a UUID to be 100. With padding, that would be 00100. Now, encoding is simple:

encode(00100) = '4'

And so is decoding:

decode('4') = 00100

Then we strip the padding and get back our initial three bits: 100. However, decode('c'), decode('m'), and decode('w') lead to this exact same result, as their binary representation is XX100. After discarding the first two bits, 100 remains in all cases. In short, this implies that if two TypeIDs are identical except for their leftmost suffix characters, and both characters map to the same binary representation after stripping the first two padding bits, the resulting UUID is the same. 32 TypeId suffixes that only differ in the first character map to only 8 unique UUIDs.

Yes... strictly speaking, no TypeID suffix that was encoded as described in the formal specification can ever start with a character other than '0'-'7', as these are the only characters whose binary representation begins with 00, which is exactly the padding. But the specification does not explicitly forbid a TypeID suffix from beginning with '8'-'z'; syntactically, those are still valid TypeIDs.

I'm not suggesting this is a problem. The specification is not incorrect. It just does not (in mathematical terms) describe a bijective function, and I'm concerned that end users of TypeID libraries may intuitively expect the encoding and decoding process to be bijective.
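To make the non-bijectivity concrete, here is a minimal Python sketch of a lenient decoder that simply drops the two padding bits. It is not the code of the reference implementation; the function names `encode_suffix` and `decode_suffix_lenient` are invented for this example, and the only things taken from the thread are the base32 alphabet and the sample TypeID.

```python
import uuid

# Base32 alphabet used by TypeID suffixes (Crockford's, lowercase).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def encode_suffix(u: uuid.UUID) -> str:
    """Encode 128 bits as 26 characters (130 bits, two leading zero bits)."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[n & 0x1F])  # lowest 5 bits first
        n >>= 5
    return "".join(reversed(chars))


def decode_suffix_lenient(suffix: str) -> uuid.UUID:
    """Decode 26 characters and silently discard the two padding bits."""
    if len(suffix) != 26:
        raise ValueError("suffix must be 26 characters long")
    n = 0
    for ch in suffix:
        n = (n << 5) | ALPHABET.index(ch)  # ValueError on a bad character
    return uuid.UUID(int=n & ((1 << 128) - 1))  # drop bits 128 and 129


if __name__ == "__main__":
    tail = "1h2xcejqtf2nbrexx3vqjhp41"
    decoded = {decode_suffix_lenient(first + tail) for first in ALPHABET}
    print(len(decoded), "distinct UUIDs from", len(ALPHABET), "suffixes")
```

Because the mask throws away the top two bits of the first character, re-encoding the decoded UUID always yields a suffix starting with '0'-'7', whatever the input started with.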

An illustration


This behavior can be observed with the current implementation of the command-line tool from this repository.

First, let's decode and re-encode a TypeID suffix starting with a character from between '0' and '7':

$ typeid decode prefix_01h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

As expected, the encoded result is equal to the original TypeID.

Now, let's take the same TypeID, but replace the leftmost character of the suffix with something between '8' and 'z', which still constitutes a syntactically correct TypeID:

$ typeid decode prefix_81h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881 # same as above

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

But now: prefix_81h2xcejqtf2nbrexx3vqjhp41 != prefix_01h2xcejqtf2nbrexx3vqjhp41

As mentioned above, if we try this for all 32 characters, the command-line tool decodes 32 different TypeIDs to only 8 unique UUIDs:

[0,8,g,r]1h2xcejqtf2nbrexx3vqjhp41 -> 0188bac7-4afa-78aa-bc3b-bd1eef28d881
[1,9,h,s]1h2xcejqtf2nbrexx3vqjhp41 -> 2188bac7-4afa-78aa-bc3b-bd1eef28d881
[2,a,j,t]1h2xcejqtf2nbrexx3vqjhp41 -> 4188bac7-4afa-78aa-bc3b-bd1eef28d881
[3,b,k,v]1h2xcejqtf2nbrexx3vqjhp41 -> 6188bac7-4afa-78aa-bc3b-bd1eef28d881
[4,c,m,w]1h2xcejqtf2nbrexx3vqjhp41 -> 8188bac7-4afa-78aa-bc3b-bd1eef28d881
[5,d,n,x]1h2xcejqtf2nbrexx3vqjhp41 -> a188bac7-4afa-78aa-bc3b-bd1eef28d881
[6,e,p,y]1h2xcejqtf2nbrexx3vqjhp41 -> c188bac7-4afa-78aa-bc3b-bd1eef28d881
[7,f,q,z]1h2xcejqtf2nbrexx3vqjhp41 -> e188bac7-4afa-78aa-bc3b-bd1eef28d881

My thoughts:

  • You could argue that for properly generated TypeIDs, the leftmost suffix character is always between '0'-'7'. That's true, but the problem arises not during encoding, but during decoding. Input strings from external sources (users, clients, etc.) are not inherently trustworthy. Even syntactically correct TypeIDs lead to this ambiguity (as demonstrated above).
  • Possible solutions:
    • Keep everything as it is. Maybe it's not that much of a problem.
    • Or: Do not allow '8'-'z' as the leftmost character, as no properly generated suffix should ever begin with those characters. This is what I did in my Java implementation that I submitted yesterday, because I initially assumed it was not permitted. Only later did I find out that it isn't explicitly specified.
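The second option could look roughly like the following check, run before decoding. This is a sketch in Python rather than the Java code mentioned above, and `is_canonical_suffix` is a hypothetical name, not part of any TypeID library.

```python
# Base32 alphabet used by TypeID suffixes (Crockford's, lowercase).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def is_canonical_suffix(suffix: str) -> bool:
    """Accept only suffixes that a spec-conforming encoder could produce.

    The first character carries the two padding bits, so it must be one
    of '0'-'7'; anything from '8' to 'z' would set a padding bit.
    """
    return (
        len(suffix) == 26
        and all(ch in ALPHABET for ch in suffix)
        and suffix[0] <= "7"
    )


if __name__ == "__main__":
    print(is_canonical_suffix("01h2xcejqtf2nbrexx3vqjhp41"))
    print(is_canonical_suffix("81h2xcejqtf2nbrexx3vqjhp41"))
```

With this check in place, each accepted suffix corresponds to exactly one UUID, so decoding followed by encoding returns the original string.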

I hope this feedback is in some way helpful.

More compact string encoding

Cool project! I've also recently been thinking about TypeIDs (I follow a similar pattern in some of my toy projects) and might be interested in collaborating on a Python implementation.

The approach I've been taking is to run a UUIDv7 through base58 (https://pypi.org/project/base58/) before prefixing in order to get an even shorter string encoding. I haven't done this at any particular scale, but I'd be curious if you considered an encoding like this, and if there are any pros/cons you see either way?

When I was looking for an encoding scheme, I had a similar set of requirements:

URL safe, case-insensitive, avoids ambiguous characters, can be selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs
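For comparing the two encodings side by side, here is a small self-contained Python sketch. It does not use the `base58` package linked above; it implements a fixed-width positional encoding by hand, so the output may differ from what that package produces. The helper names `encode_int` and `decode_int` and the fixed widths (22 and 26) are choices made for this illustration.

```python
import uuid

# Bitcoin-style base58 alphabet (mixed case, no 0, O, I, l).
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Base32 alphabet used by TypeID suffixes (lowercase only).
BASE32 = "0123456789abcdefghjkmnpqrstvwxyz"


def encode_int(n: int, alphabet: str, width: int) -> str:
    """Write n in base len(alphabet), left-padded to a fixed width."""
    base = len(alphabet)
    digits = []
    for _ in range(width):
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])
    if n:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(digits))


def decode_int(text: str, alphabet: str) -> int:
    """Inverse of encode_int."""
    n = 0
    for ch in text:
        n = n * len(alphabet) + alphabet.index(ch)
    return n


if __name__ == "__main__":
    u = uuid.UUID("0188bac7-4afa-78aa-bc3b-bd1eef28d881")
    print("hex   ", len(u.hex), u.hex)
    print("base32", 26, encode_int(u.int, BASE32, 26))
    print("base58", 22, encode_int(u.int, BASE58, 22))
```

A 128-bit value fits in 22 base58 characters versus 26 base32 characters. The saving comes from mixing upper and lower case, so base58 cannot be case-insensitive, which is one of the requirements listed above.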

Add C# .NET implementation to the list

Hi,

I've created a performance-oriented implementation of TypeId in C#: https://github.com/firenero/TypeId
What should be done to add it to the list and mark it as verified? I've seen the discussion about an automated validation flow (#23), but I'm not sure whether anything is in place yet.

Also, thanks for the reference implementation and the examples of valid and invalid TypeIDs. They were really helpful while I was developing my library.

Remove second rust implementation

Hi! Thank you again for adding my implementation :)
The library provided by @conradludgate is very good, and I don't think we should have two Rust implementations.
Could you please remove mine from the table?

Created a Postgres extension for TypeID

Hi there,
Just wanted to put this out there: I created an extension for Postgres that lets you use TypeIDs like any other ID: https://github.com/blitss/typeid-postgres-extension

It's based on the code from @conradludgate and also passes the tests you had.

Under the hood it stores them as a prefix plus a 128-bit UUID and returns them as human-readable IDs. You can use a TypeID as a primary key, sort and filter by it, and convert it to and from a UUID.

I'm not very proficient with Rust, but according to my tests it should be fine to use, and better than the SQL implementation (https://github.com/jetify-com/typeid-sql), since it is an extension and supports a wider subset of Postgres features.

Let's consider the ways of automation of the spec validation

TypeID now has a lot of implementations in different programming languages, so there should be a way to track whether each one validates against the spec. I am thinking of a badge in the README that shows the validation status (failed or passed) of each library. There may well be better solutions.

Origin of base 32 alphabet

Hello,

This is more a comment than a true issue.
I was surprised to see Crockford's alphabet
for something that looked familiar to me and was only from 2019.
Don't misunderstand me, I think this alphabet is convenient and maybe he was the first to propose exactly this choice of letters;
hence, he may deserve to be credited;
but for those interested in the topic, I suggest looking at:
https://ux.stackexchange.com/questions/53341/are-there-any-letters-numbers-that-should-be-avoided-in-an-id
from 2014
with link to:
https://github.com/tytso/pwgen/blame/master/pw_rand.c
from 2005
and if there had been GitHub since 1970, I would guess something earlier could be more easily found.
I think definitely https://www.crockford.com/base32.html should add bibliographic references.

Best regards,
Laurent Lyaudet

Need help with underscore support in TS/JS?

👋! Our team at Graphite would love to get underscore support in the official TypeScript/JS library.

Do you need help implementing this? If it's straightforward we might be able to land a patch and accelerate to the v0.3 spec, like you did for Go: 5ffabce

RFC: Consider allowing `_` as an additional separator within the typeid prefix

The spec, as defined today, only allows lowercase alphabetic characters in the type prefix. Some users, though, might need a way to have a "compound noun" in the prefix. Imagine you want the type to be "user accounts"; today you would have to encode that as a single word, useraccounts, but it might be preferable to allow a separator and encode it as user_accounts instead.
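As a starting point for discussion, one possible validation rule could be sketched like this in Python. The exact rule is an assumption made for this example, not something the RFC text fixes: underscores are allowed only between letters, so a prefix cannot start or end with one, which keeps the final underscore unambiguous as the separator before the suffix. The empty prefix and any length limit are left out of scope, and `is_valid_prefix` is a hypothetical name.

```python
import re

# Assumed rule for illustration: lowercase letters, with underscores
# permitted anywhere except as the first or last character.
PREFIX_RE = re.compile(r"[a-z]([a-z_]*[a-z])?")


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the whole prefix matches the assumed rule."""
    return PREFIX_RE.fullmatch(prefix) is not None


def split_typeid(value: str) -> tuple[str, str]:
    """Split on the last underscore, so the prefix may contain others."""
    prefix, sep, suffix = value.rpartition("_")
    if not sep or not is_valid_prefix(prefix):
        raise ValueError("missing or invalid prefix")
    return prefix, suffix


if __name__ == "__main__":
    print(split_typeid("user_accounts_01h2xcejqtf2nbrexx3vqjhp41"))
```

Splitting on the last underscore rather than the first is what lets existing single-word prefixes and new compound prefixes be parsed by the same code.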

Future specification on binary format

Is there any plan to add into the specification how to convert a typeid to binary format?

In my other personal project utilising typeid, I will need to serialise the ids. So far I'm implementing my own serialisation only for that specific project, but if there will be a formal specification, I can include that in the Haskell implementation as well.
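To illustrate the kind of thing a specification would have to pin down, here is one possible layout sketched in Python: a single length byte, the ASCII prefix, then the 16 raw UUID bytes. This layout is purely hypothetical and is not part of the TypeID spec or of any implementation mentioned here; `to_bytes` and `from_bytes` are names invented for the example.

```python
import uuid


def to_bytes(prefix: str, u: uuid.UUID) -> bytes:
    """Hypothetical layout: [prefix length][prefix as ASCII][16 UUID bytes]."""
    raw = prefix.encode("ascii")
    if len(raw) > 255:
        raise ValueError("prefix too long for a one-byte length field")
    return bytes([len(raw)]) + raw + u.bytes


def from_bytes(data: bytes) -> tuple[str, uuid.UUID]:
    """Inverse of to_bytes, rejecting input of the wrong total length."""
    if not data:
        raise ValueError("empty input")
    n = data[0]
    if len(data) != 1 + n + 16:
        raise ValueError("length does not match the prefix length byte")
    prefix = data[1 : 1 + n].decode("ascii")
    return prefix, uuid.UUID(bytes=data[1 + n :])


if __name__ == "__main__":
    u = uuid.UUID("0188bac7-4afa-78aa-bc3b-bd1eef28d881")
    blob = to_bytes("prefix", u)
    print(len(blob), from_bytes(blob))
```

Even this small sketch raises questions a formal spec would need to answer, such as how the prefix length is encoded and whether the UUID bytes are big-endian.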

RFC: Consider adding one or two extra characters to the encoding for a checksum

One of the top comments in the HackerNews discussion was:

I've been doing this kind of thing for years with two notable differences:
...

I add two base-32 characters as a checksum (salted of course). This prevents having to go look at the datastore when the value is bogus either by accident or malice. I'm unsure why other implementations don't do this.

Should we do that as part of the official TypeID spec?
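For anyone who wants to experiment with the idea, a rough Python sketch follows. The commenter did not say how the checksum is computed, so everything here is an assumption: "salted" is interpreted as a keyed BLAKE2b hash, truncated to 10 bits and written as two base32 characters. The function names are invented for the example.

```python
import hashlib
import hmac

# Base32 alphabet used by TypeID suffixes (Crockford's, lowercase).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def checksum(body: str, salt: bytes) -> str:
    """Two base32 characters (10 bits) derived from a keyed hash of body."""
    digest = hashlib.blake2b(body.encode("ascii"), key=salt, digest_size=2).digest()
    n = int.from_bytes(digest, "big") & 0x3FF  # keep the low 10 bits
    return ALPHABET[n >> 5] + ALPHABET[n & 0x1F]


def add_checksum(body: str, salt: bytes) -> str:
    return body + checksum(body, salt)


def verify(value: str, salt: bytes) -> bool:
    """Check the trailing two characters without touching any datastore."""
    if len(value) <= 2:
        return False
    body, tag = value[:-2], value[-2:]
    return hmac.compare_digest(checksum(body, salt), tag)


if __name__ == "__main__":
    tagged = add_checksum("prefix_01h2xcejqtf2nbrexx3vqjhp41", b"example-salt")
    print(tagged, verify(tagged, b"example-salt"))
```

With only 10 bits, roughly one in 1024 random strings would still pass, so a check like this filters out most bogus values cheaply but is not a substitute for authentication.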
