Giter VIP home page Giter VIP logo

Comments (6)

namazso avatar namazso commented on June 16, 2024

I don't get it, does it hinder your usage in any way?

It goes with so lax rules because it's intended to work with lines straight from sumfiles and from tables on websites.

from openhashtab.

RadarNyan avatar RadarNyan commented on June 16, 2024

It doesn't affect the functionality of the application, it just feels wrong to have a long string appear in the text box, and at first glance I had no idea why it was pasted there (since the part it incorrectly assumes as a "hash value" is cut off) until I realized it must have recognized the "KBxxxxxxx" number as a hash value, which in itself is stupid:

  1. I haven't found any hash values in the wild that involve the letter "K". Most of them are represented in hexadecimal form.
  2. I think it shouldn't take that much time to check if the string in the clipboard contains a hexadecimal value of proper length, i.e., a possible hash value.

Here's a quick example regex for matching hex numbers between 8 and 32 digits:

\b[a-f0-9]{8,32}\b|\b[A-F0-9]{8,32}\b

8 and 32 should be replaced by the shortest and longest possible lengths of the algorithms enabled by the user.

from openhashtab.

namazso avatar namazso commented on June 16, 2024

I haven't found any hash values in the wild that involve the letter "K". Most of them are represented in hexadecimal form.

But they may come with garbage attached due to being copy-pasted from various sites and whatnot. Colon, for example, is a fairly common pollution in front of hashes (like Hash:1234ABCD).

I think it shouldn't take that much time to check if the string in the clipboard contains a hexadecimal value of proper length, i.e., a possible hash value.

That is checked. See the regex below, it's enforcing at least 4 byte long hashes.

Here's a quick example regex for matching hex numbers between 8 and 32 digits:

That is a very poor regex, as greedy matches like that need a lot of stack space and result in stack overflow. It also doesn't ensure that there's an even count of nibbles.

For reference, this is the current one:

((?:[0-9A-F]{2} ?)(?:[0-9A-F]{2} ?)(?:[0-9A-F]{2} ?)(?:[0-9A-F]{2} ?)++|(?:[0-9a-f]{2} ?)(?:[0-9a-f]{2} ?)(?:[0-9a-f]{2} ?)(?:[0-9a-f]{2} ?)++)

I might try using \b though, it might improve the matches, because K is indeed not a very common separator.

8 and 32 should be replaced by the shortest and longest possible lengths of the algorithms enabled by the user.

This is a bit complicated to implement considering there's an option that enables algorithms based on length of the captured hash. There would need to be two check points based on that feature.

from openhashtab.

RadarNyan avatar RadarNyan commented on June 16, 2024

That indeed might not be the most optimized regex, I don't consider myself an expert, so when it comes to regex I tend to write the most intuitive one with the least length possible that I can come up with, so it's easier for me to read and leave optimization to the regex engine.

greedy matches like that need a lot of stack space and result in stack overflow

Would you elaborate on this? Isn't using ? in the current regex also greedy?


For this example (checksum for Ubuntu ISO files)

b98dac940a82b110e6265ca78d1320f1f7103861e922aa1a54e4202686e9bbd3 *ubuntu-22.04-latest-desktop-amd64.iso
5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931 *ubuntu-22.04-latest-live-server-amd64.iso
b98dac940a82b110e6265ca78d1320f1f7103861e922aa1a54e4202686e9bbd3 *ubuntu-22.04.2-desktop-amd64.iso
5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931 *ubuntu-22.04.2-live-server-amd64.iso

Match with the current regex ((?:[0-9A-F]{2} ?)(?:[0-9A-F]{2} ?)(?:[0-9A-F]{2} ?)(?:[0-9A-F]{2} ?)++|(?:[0-9a-f]{2} ?)(?:[0-9a-f]{2} ?)(?:[0-9a-f]{2} ?)(?:[0-9a-f]{2} ?)++) results in (tested with Regex Debugger provided by regex101.com):

4 Matches, 779 steps
Match 1 found in 106 step(s)
Match 2 found in 333 step(s)
Match 3 found in 349 step(s)
Match 4 found in 308 step(s)
Match 5 failed in 218 step(s)

Using the short regex \b[0-9a-f]{8,64}\b|\b[0-9A-F]{8,64}\b (had to increase 32 to 64 to work with this example) results in:

4 Matches, 146 steps
Match 1 found in 4 step(s)
Match 2 found in 114 step(s)
Match 3 found in 126 step(s)
Match 4 found in 104 step(s)
Match 5 failed in 112 step(s)

But this one has the drawback of including strings with odd numbers of digits, this bug/feature might help if the user missed the first digit when copying the checksum string?

If it's absolutely necessary to limit match results to strings with even numbers of digits only, then it can be modified as \b(?:[0-9a-f]{2}){4,32}\b|\b(?:[0-9A-F]{2}){4,32}\b, which results in:

4 Matches, 470 steps
Match 1 found in 68 step(s)
Match 2 found in 218 step(s)
Match 3 found in 232 step(s)
Match 4 found in 208 step(s)
Match 5 failed in 154 step(s)

This one is better than the current one at enforcing even numbers of digits, since the current one will match the first 62 digits of a string with 63 digits, while this one will not match that string altogether.

from openhashtab.

namazso avatar namazso commented on June 16, 2024

Would you elaborate on this? Isn't using ? in the current regex also greedy?

Yes, but it only happens 5 times max, as the ++ later is possessive (and thus does not have this issue). Everything with potential to "give back" needs one stack frame. Debug builds (on my machine) run out of stack in about 20 greedy matches. So 5 is fine, 64 is not. Although one with possessive ? would be a further win.

Step counts don't really matter, because CTRE processes regexes compile-time, so the matcher is native, and stupid fast. With the drawback being, of course, that it uses the stack exclusively for state, so if the state is too big, you're out of luck.

\b(?:[0-9a-f][0-9a-f]){4,32}\b|\b(?:[0-9A-F][0-9A-F]){4,32}\b

The space acceptance in the original one was very much intentional, some sites like to break hashes into arbitrary chunks for some unknown reason (however, never between nibbles). And I think this still has the greedy issue, with up to 32 deep greedy matches if we add space support.

This one is better than the current one at enforcing even numbers of digits, since the current one will match the first 30 digits of a string with 31 digits, while this one will not match that string altogether.

That would be an improvement, yeah.

How about this:

\b(?:[0-9a-f][0-9a-f])(?: ?+[0-9a-f][0-9a-f]){3,31}+\b|\b(?:[0-9A-F][0-9A-F])(?: ?+[0-9A-F][0-9A-F]){3,31}+\b

Seems to work about well for this test set:

b98dac940a82b110e6265ca78d1320f1f7103861e922aa1a54e4202686e9bbd3 *ubuntu-22.04-latest-desktop-amd64.iso
5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931 *ubuntu-22.04-latest-live-server-amd64.iso
b98dac940a82b110e6265ca78d1320f1f7103861e922aa1a54e4202686e9bbd3 *ubuntu-22.04.2-desktop-amd64.iso
5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931 *ubuntu-22.04.2-live-server-amd64.iso
5e38b55d57d94ff05e38b55d57d94ff0 *ubuntu-22.04.2-live-server-amd64.iso
5e 38 b5 5d 57 d9 4f f0 29 71 93 42 35 73 25 ed 3b da 38 fa 80 05 4f 93 30 dc 78 9c d2 d4 39 31 *ubuntu-22.04.2-live-server-amd64.iso
5e38 b55d 57d9 4ff0 2971 9342 3573 25ed 3bda da38 fa80 054f 9330 dc78 9cd2 d439 *ubuntu-22.04.2-live-server-amd64.iso
5e38 b55d 57d 9 4ff0 2971 9342 3573 25ed 3bda da38 fa80 054f 9330 dc78 9cd2 d439 *ubuntu-22.04.2-live-server-amd64.iso
5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931
Hash:5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931
KB892378731922

from openhashtab.

RadarNyan avatar RadarNyan commented on June 16, 2024

Now I see, for use case like this, when it is desired to match as much as possible but no need to give back, just add + behind the Repetition {} to make it Possessive instead of Greedy. I didn't know about sites breaking hash strings with separators, maybe the intention was to make it easier for those who need to manually type it out?

It seems you've copied my old comment before I updated it, so it retains the (?:[0-9a-f][0-9a-f]) form instead of my later edited (?:[0-9a-f]{2}), I'm not sure if this difference would affect performance or not (it does reduce the number of steps though).

from openhashtab.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.