
Comments (15)

jkeiser commented on June 13, 2024

Sure! It's not unreasonable to have field.key().escaped() or maybe field.escaped_key() if the former isn't feasible. There's no reason not to have such a method accessible as long as the user has to type the word raw or escaped somewhere.

Just for context, the rationales behind not making key() convert automatically or easily to string_view were:

  1. Converting to string_view requires scanning the string to get the length, which is a waste of time for many use cases (string comparison, raw copy, and unescaping), so we don't want the easiest methods to start off doing it. This is why key() returns a raw_json_string in the first place: these operations can be done without paying that overhead up front.

  2. Processing strings without unescaping should be explicit in the code, requiring you to write "escaped" or "raw" to do it. Due to the rarity of escapes, accidentally forgetting to unescape is the kind of "silent but deadly" bug that tends to make its way to production and fail mysteriously and intermittently there. In some ways, it's like asking the user to sign a waiver saying they understand the risks :) Counterpoint: escapes in keys (rather than values) are so rare that we already have one API--object["key"]--that processes escaped keys without anything explicitly indicating it in the code. We still try to minimize the number of places we do this, and point 1 still applies, though.

from simdjson.

lemire commented on June 13, 2024

I agree with @jkeiser that it is reasonable to extend the API further.


lemire commented on June 13, 2024

Note that it is not difficult to implement. We effectively have the code already.


renzibei commented on June 13, 2024

I think the length of escaped_key() should be the length of key_raw_json_token() minus two bytes?
The following is the code of key_raw_json_token() that I found. Should escaped_key() be very similar, but cutting one byte at the head and one at the tail of key_raw_json_token()?

simdjson_inline std::string_view field::key_raw_json_token() const noexcept {
  SIMDJSON_ASSUME(first.buf != nullptr); // We would like to call .alive() but Visual Studio won't let us.
  return std::string_view(reinterpret_cast<const char*>(first.buf-1), second.iter._json_iter->token.peek(-1) - first.buf + 1);
}


jkeiser commented on June 13, 2024

Unfortunately, key_raw_json_token() will include everything from the open quote to the character just before the :. This means that if there are any spaces between the key and the colon, they are included in key_raw_json_token(): for { "abc"    : "def" }, the raw JSON token will have four spaces at the end, for example.


lemire commented on June 13, 2024

@renzibei @jkeiser We must do some non-trivial work even if we do not unescape.

That may answer this question: "Is there a technical or design rationale for this absence, or could this be considered for future implementation?"

The answer is that it is not free.


renzibei commented on June 13, 2024

It might indeed require scanning for the ending double quote to find the length of the key, but for ease of use, I think this may be worthwhile. If we compare the key multiple times with some other strings, then we've almost certainly scanned the key multiple times already. Additionally, in situations where the key is used as the key in a hash map, having the length information and a std::string_view becomes indispensable. Given these considerations, I believe the cost of determining the length could be justified.

We can warn users that obtaining this string_view has some cost, though less than that of unescaped_key().
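To make the cost being discussed concrete, here is a minimal stdlib-only sketch of the forward scan and of the hash-map use case. It is not simdjson code: `scan_escaped_key` and `lookup_or` are hypothetical names, and the sketch assumes `key_start` points just past the opening quote of a key inside a valid JSON buffer.

```cpp
#include <cstddef>
#include <string_view>
#include <unordered_map>

// Hypothetical helper (not part of simdjson): returns the still-escaped key
// as a string_view by scanning forward to the closing quote.
// Assumes valid JSON, so a closing quote always exists.
inline std::string_view scan_escaped_key(const char* key_start) {
  std::size_t len = 0;
  while (key_start[len] != '"') {
    // A backslash escapes the next byte (for example \" or \\), so skip both.
    len += (key_start[len] == '\\') ? 2 : 1;
  }
  return std::string_view(key_start, len);
}

// Hypothetical usage: the view can serve directly as a hash-map key, which
// needs the length that the scan above has to compute.
inline int lookup_or(const std::unordered_map<std::string_view, int>& table,
                     const char* key_start, int fallback) {
  auto it = table.find(scan_escaped_key(key_start));
  return it == table.end() ? fallback : it->second;
}
```

The loop touches every byte of the key once, which is the overhead that key() avoids by returning a raw_json_string.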


lemire commented on June 13, 2024

@renzibei When checking for equality, we do not actually need to find the end quote... so it is not work that we do in any case, or that we could necessarily amortize in practice.

There is no argument against the fact that the feature request is valid and we will provide it.

Let me be clear: we will provide this functionality in a future release. In fact, I am openly inviting folks to provide a pull request. If nobody does it, I will.


jkeiser commented on June 13, 2024

Yeah, to be specific, the code to compare the key with a string is basically strncmp(field.key().buf, str.data(), str.size()) == 0 && *(field.key().buf + str.size()) == '"'. So we don't scan for it; we just check if the quote is where it should be given the length of the string we're comparing to.
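As a rough, stdlib-only illustration of that check (a sketch, not the library's actual implementation), the comparison can be written as below. `raw_key_equals` is a hypothetical name; the sketch assumes `key_start` points just past the key's opening quote in a valid JSON buffer, and that `target` contains no quote or backslash characters.

```cpp
#include <cstring>
#include <string_view>

// Hypothetical helper (not part of simdjson): compares a raw key against
// `target` without first searching for the key's closing quote.
// Assumption: `target` contains no '"' or '\\', so a match can only end
// exactly where the closing quote sits.
inline bool raw_key_equals(const char* key_start, std::string_view target) {
  // strncmp returns 0 when the first target.size() bytes are equal.
  return std::strncmp(key_start, target.data(), target.size()) == 0
      // The key must end right after the compared bytes.
      && key_start[target.size()] == '"';
}
```

If the key is shorter than `target`, strncmp stops at the closing quote because it differs from the corresponding byte of `target`, so the length of the key is never computed.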


renzibei commented on June 13, 2024

@lemire Thanks for the reply.

You mentioned that the function has been implemented somewhere already?

> We effectively have the code already.


renzibei commented on June 13, 2024

> Yeah, to be specific, the code to compare the key with a string is basically strncmp(field.key().buf, str.data(), str.size()) == 0 && *(field.key().buf + str.size()) == '"'. So we don't scan for it; we just check if the quote is where it should be given the length of the string we're comparing to.

I understand. What I'm saying is that, when we compare the key() to other strings multiple times, we may have accessed the whole key memory already. The strncmp here or memcmp can be viewed as a scan of the memory.


lemire commented on June 13, 2024

> The strncmp here or memcmp can be viewed as a scan of the memory.

But that's not what we do in the code.


lemire commented on June 13, 2024

> You mentioned that the function has been implemented somewhere already?

We can locate the start and the end. It is a simple matter of backtracking and finding the quote.

Except for the copy-pasting, the whole thing can be implemented with one or two extra lines of code.

The expensive part is to write the documentation and the new tests.
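For illustration only, the backtracking idea could look like the stdlib-only sketch below. It is not the code that shipped in simdjson: `escaped_key_from_token` is a hypothetical name, and `token` merely models what key_raw_json_token() is described as returning in this thread (the opening quote through the byte just before the colon).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of simdjson): recovers the still-escaped key
// from a raw key token that may carry trailing whitespace.
inline std::string_view escaped_key_from_token(std::string_view token) {
  std::size_t end = token.size();
  // Backtrack over JSON whitespace sitting between the key and the colon.
  while (end > 0 && (token[end - 1] == ' ' || token[end - 1] == '\t' ||
                     token[end - 1] == '\n' || token[end - 1] == '\r')) {
    --end;
  }
  // What remains should be "key" including both quotes; otherwise give up
  // and return an empty view.
  if (end < 2 || token.front() != '"' || token[end - 1] != '"') {
    return std::string_view();
  }
  // Drop one byte at the head and one at the tail.
  return token.substr(1, end - 2);
}
```

Scanning backward from the end of the token does not need to interpret backslash escapes, since the last non-whitespace byte before the colon is the closing quote in valid JSON.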


renzibei commented on June 13, 2024

This week I'm preoccupied with some commitments. If you have the bandwidth to tackle this soon, that would be fantastic. Otherwise, I'd be happy to contribute a pull request, potentially after one or two weeks. Of course, if anyone else has the capacity to jump in sooner, that would be great as well.


lemire commented on June 13, 2024

This will be part of the next release.
