Comments (15)
Sure! It's not unreasonable to have `field.key().escaped()` or maybe `field.escaped_key()` if the former isn't feasible. There's no reason not to have such a method accessible as long as the user has to type the word `raw` or `escaped` somewhere.
Just for context, the rationales behind not making `key()` convert automatically or easily to string_view were:
- Converting to string_view requires scanning the string to get the length, which is a waste of time for many use cases (string comparison, raw copy, and unescaping), so we don't want the easiest methods to start off doing it. This is why `key()` returns a raw_json_string in the first place, so these operations can be done without preemptively paying that overhead.
- Processing strings without unescaping should be explicit in the code, requiring you to write "escaped" or "raw" to do it. Due to the rarity of escapes, accidentally forgetting to unescape is the kind of "silent but deadly" bug that tends to make its way to production and fail mysteriously and intermittently there. In some ways, it's like asking the user to sign a waiver saying they understand the risks :) Counterpoint: escapes in keys (rather than values) are so rare that we already have one API, `object["key"]`, that processes escaped keys without anything explicitly indicating it in the code. We still try to minimize the number of places we do this, and point 1 still applies, though.
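To make the second point concrete, here is a minimal sketch of how a forgotten unescape can go unnoticed. The helper `unescape_ascii_only` is hypothetical and written only for this illustration; it is not simdjson's unescaper and handles only `\"`, `\\`, `\/`, and `\u00XX` below 0x80.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only (not simdjson's unescaper):
// handles \" \\ \/ and \u00XX for code points below 0x80; any other
// escape sequence is copied through unchanged.
inline std::string unescape_ascii_only(std::string_view raw) {
  auto hex = [](char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };
  std::string out;
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] != '\\' || i + 1 >= raw.size()) {
      out.push_back(raw[i]);  // ordinary byte
      continue;
    }
    const char e = raw[i + 1];
    if (e == '"' || e == '\\' || e == '/') {
      out.push_back(e);
      i += 1;  // skip the escaped character
    } else if (e == 'u' && i + 5 < raw.size() && raw[i + 2] == '0' &&
               raw[i + 3] == '0' && hex(raw[i + 4]) >= 0 &&
               hex(raw[i + 4]) < 8 && hex(raw[i + 5]) >= 0) {
      out.push_back(static_cast<char>(hex(raw[i + 4]) * 16 + hex(raw[i + 5])));
      i += 5;  // skip 'u' and the four hex digits
    } else {
      out.push_back(raw[i]);  // unsupported escape: keep the backslash
    }
  }
  return out;
}
```

With a key written in the document as `na\u006de`, a raw comparison against `"name"` fails while the unescaped comparison succeeds. Since such keys are rare, code that compares raw bytes tends to pass every test and only misbehave on unusual production input.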
from simdjson.
I agree with @jkeiser that it is reasonable to extend the API further.
from simdjson.
Note that it is not difficult to implement. We effectively have the code already.
from simdjson.
I think the length of the `escaped_key()` should be the length of `key_raw_json_token()` minus two bytes?
Following is the code of `key_raw_json_token()` that I found. Should `escaped_key()` be very similar, but cutting one byte at the head and one at the tail of `key_raw_json_token()`?
simdjson_inline std::string_view field::key_raw_json_token() const noexcept {
  SIMDJSON_ASSUME(first.buf != nullptr); // We would like to call .alive() but Visual Studio won't let us.
  return std::string_view(reinterpret_cast<const char*>(first.buf-1), second.iter._json_iter->token.peek(-1) - first.buf + 1);
}
from simdjson.
Unfortunately, `key_raw_json_token()` will include everything from the open quote to the character just before the `:`. This means if there are any spaces between the key and the colon, they are included in `key_raw_json_token()`: for `{ "abc"    : "def" }`, the raw_json_token will have four spaces at the end, for example.
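One way to recover the escaped key from such a token is to backtrack from the end to the closing quote rather than subtracting a fixed two bytes. The sketch below assumes the token has exactly the shape described here (it starts at the opening quote and stops just before the colon); `escaped_key_from_raw_token` is a hypothetical name, not an existing simdjson function.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of simdjson. `token` is assumed to start at
// the opening quote of the key and to end just before the colon, so only
// whitespace can follow the closing quote.
inline std::string_view escaped_key_from_raw_token(std::string_view token) {
  // The last quote in the token is the closing quote: an escaped quote
  // inside the key (\") always sits before it.
  const std::size_t close = token.find_last_of('"');
  if (token.empty() || token.front() != '"' ||
      close == std::string_view::npos || close == 0) {
    return {};  // not a well-formed key token
  }
  // Drop the opening quote, the closing quote, and any trailing whitespace.
  return token.substr(1, close - 1);
}
```

The returned view still contains any escape sequences verbatim; it only removes the quotes and the whitespace before the colon.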
from simdjson.
@renzibei @jkeiser We must do some non-trivial work even if we do not unescape.
That may answer the question "Is there a technical or design rationale for this absence, or could this be considered for future implementation?"
The answer is that it is not free.
from simdjson.
It might indeed require scanning for the ending double quote to find the length of the key, but for ease of use, I think this may be worthwhile. If we compare the key multiple times with some other strings, then we've almost certainly scanned the key multiple times already. Additionally, in situations where the key is used as the key in a hash map, having the length information and a `std::string_view` becomes indispensable. Given these considerations, I believe the cost of determining the length could be justified.
We can remind the user that this operation to get a string_view has some cost, but a smaller one than `unescaped_key()`.
from simdjson.
@renzibei When checking for equality, we do not actually need to find the end quote... so it is not work that we do in any case, or that we could necessarily amortize in practice.
There is no argument against the fact that the feature request is valid and we will provide it.
Let me be clear: we will provide this functionality in a future release. In fact, I am openly inviting folks to provide a pull request. If nobody does it, I will.
from simdjson.
Yeah, to be specific, the code to compare the key with a string is basically `strncmp(field.key().buf, str.data(), str.size()) == 0 && *(field.key().buf+str.size()) == '"'`. So we don't scan for it; we just check if the quote is where it should be given the length of the string we're comparing to.
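A self-contained sketch of that comparison, under the assumption that `key` points just past the opening quote inside a NUL-terminated JSON buffer and that `target` contains no NUL bytes. The name `raw_key_equals` is hypothetical, and this is a simplified stand-in rather than simdjson's actual code.

```cpp
#include <cstring>
#include <string_view>

// Hypothetical stand-in, not simdjson's implementation. `key` points just
// past the opening quote of a key inside a NUL-terminated JSON buffer.
inline bool raw_key_equals(const char* key, std::string_view target) {
  // Compare only target.size() bytes, then check that the closing quote sits
  // exactly where the key would end if it matched. No scan for the end quote.
  return std::strncmp(key, target.data(), target.size()) == 0 &&
         key[target.size()] == '"';
}
```

If the key is shorter than `target`, `strncmp` hits the key's closing quote and reports a mismatch there; if it is longer, the byte at `key[target.size()]` is not a quote. Either way the work is bounded by the length of the string being compared against, not by the length of the key.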
from simdjson.
@lemire Thanks for the reply.
You mentioned that the function has been implemented somewhere already?
> We effectively have the code already.
from simdjson.
> Yeah, to be specific, the code to compare the key with a string is basically `strncmp(field.key().buf, str.data(), str.size()) == 0 && *(field.key().buf+str.size()) == '"'`. So we don't scan for it; we just check if the quote is where it should be given the length of the string we're comparing to.
I understand. What I'm saying is that, when we compare the `key()` to other strings multiple times, we may have accessed the whole key memory already. The `strncmp` here, or `memcmp`, can be viewed as a scan of the memory.
from simdjson.
> The strncmp here or memcmp can be viewed as a scan of the memory.
But that's not what we do in the code.
from simdjson.
> You mentioned that the function has been implemented somewhere already?
We can locate the start and the end. It is a simple matter of backtracking and finding the quote.
Except for the copy-pasting, the whole thing can be implemented with one or two extra lines of code.
The expensive part is to write the documentation and the new tests.
from simdjson.
This week I'm preoccupied with some commitments. If you have the bandwidth to tackle this soon, that would be fantastic. Otherwise, I'd be happy to contribute a pull request, potentially after one or two weeks. Of course, if anyone else has the capacity to jump in sooner, that would be great as well.
from simdjson.
This will be part of the next release.
from simdjson.