Comments (8)
The problem I have with single strings is that they make querying via Elasticsearch much more difficult and less intuitive. Elasticsearch will already tokenize the document, splitting on whitespace and other common delimiters like ":=,".
E.g., with the following example documents:
{
"title": "Doc1",
"tags": [
"key1=foo1",
"key2=foo2"
]
},
{
"title": "Doc2",
"tags": [
"key5=foo1",
"key2=foo3"
]
},
{
"title": "Doc3",
"tags": [
"foobar"
]
},
{
"title": "Doc4",
"tags": [
"key1=foo1",
"key1=bar2"
]
},
If I wanted to match every document where key1==foo1, then I would need to quote my search query:
{
"query": {
"query_string": {
"query": "tags:\"key1=foo1\""
}
}
}
which would match documents 1 and 4.
The gotcha here is that if you don't provide the quotes, the search will match all documents that have a tag containing "key1" or "foo1", i.e. documents 1, 2 and 4. This is certainly not the intended result.
Additionally, because of the quoting, you lose the ability to do partial matches, e.g. where the value of key "key2" starts with "foo".
However, as a benefit, a search query of just "tags:foo1" would return all documents that have a tag whose key or value is "foo1".
With a key:value schema:
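The quoted versus unquoted behaviour described above can be sketched with a toy simulation. This is not Elasticsearch's real analyzer or query parser: the `tokenize` function, the two matcher functions and the in-memory `docs` dict are all hypothetical stand-ins, and the split characters are simply the ones assumed in this comment.

```python
import re

# Toy stand-in for an analyzer that splits on whitespace and on ":", "=", ","
# (an assumption taken from the comment above, not Elasticsearch's real analyzer).
def tokenize(text):
    return [t for t in re.split(r"[\s:=,]+", text.lower()) if t]

# The four example documents, reduced to title -> tags.
docs = {
    "Doc1": ["key1=foo1", "key2=foo2"],
    "Doc2": ["key5=foo1", "key2=foo3"],
    "Doc3": ["foobar"],
    "Doc4": ["key1=foo1", "key1=bar2"],
}

def match_any_term(query):
    """Unquoted query: a document matches if any tag shares any token with the query."""
    wanted = set(tokenize(query))
    return [title for title, tags in docs.items()
            if any(wanted & set(tokenize(tag)) for tag in tags)]

def match_phrase(query):
    """Quoted query: the query tokens must appear consecutively, in order, within one tag."""
    wanted = tokenize(query)
    n = len(wanted)
    hits = []
    for title, tags in docs.items():
        for tag in tags:
            tokens = tokenize(tag)
            if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
                hits.append(title)
                break
    return hits

print(match_any_term("key1=foo1"))  # ['Doc1', 'Doc2', 'Doc4']
print(match_phrase("key1=foo1"))    # ['Doc1', 'Doc4']
print(match_any_term("foo1"))       # ['Doc1', 'Doc2', 'Doc4']
```

The first call shows the gotcha (Doc2 sneaks in because it shares the token "foo1"), the second shows why quoting is needed, and the third shows the key-or-value match that comes for free.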
{
"title": "Doc1",
"tags": {
"key1": "foo1",
"key2": "foo2"
}
},
{
"title": "Doc2",
"tags": {
"key5": "foo1",
"key2": "foo3"
}
},
{
"title": "Doc3",
"tags": {
"foobar": "true"
}
},
{
"title": "Doc4",
"tags": {
"key1": "foo1 bar2"
}
},
The same search query to match where key1==foo1 becomes
{
"query": {
"query_string": {
"query": "tags.key1:foo1"
}
}
}
No quoting necessary.
To match where key2 starts with foo:
"query": "tags.key2:foo*"
It would not be possible (to my knowledge) to send a query that would match either the key or the value. But you can match the value across all keys by including a "fields" field to limit the scope of the query.
{
"query": {
"query_string": {
"fields": ["tags.*"],
"query": "foo1"
}
}
}
which would match documents 1, 2 and 4.
Including the "fields" field on all queries would probably be good practice anyway and wouldn't affect tags.key:value format queries.
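As a rough illustration of the three object-schema queries above, here is a small in-memory sketch. It does not talk to Elasticsearch; `match_field` and `match_any_field` are hypothetical helpers that mimic "tags.<key>:<pattern>" and a query scoped to "tags.*", with values split on whitespace only.

```python
from fnmatch import fnmatchcase

# The four example documents in the key:value (object) schema, title -> tags.
docs = {
    "Doc1": {"key1": "foo1", "key2": "foo2"},
    "Doc2": {"key5": "foo1", "key2": "foo3"},
    "Doc3": {"foobar": "true"},
    "Doc4": {"key1": "foo1 bar2"},
}

def match_field(key, pattern):
    """Mimics tags.<key>:<pattern>; '*' in the pattern acts as a wildcard."""
    return [title for title, tags in docs.items()
            if key in tags
            and any(fnmatchcase(tok, pattern) for tok in tags[key].split())]

def match_any_field(pattern):
    """Mimics a query scoped to tags.*; only values are searched, never keys."""
    return [title for title, tags in docs.items()
            if any(fnmatchcase(tok, pattern)
                   for value in tags.values() for tok in value.split())]

print(match_field("key1", "foo1"))  # ['Doc1', 'Doc4']
print(match_field("key2", "foo*"))  # ['Doc1', 'Doc2']
print(match_any_field("foo1"))      # ['Doc1', 'Doc2', 'Doc4']
print(match_any_field("key1"))      # []
```

The last call reflects the limitation noted above: with keys stored as field names, a plain query term cannot match a key.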
from metrictank.
OK. So I have been reading up on this and experimenting with Elasticsearch, and it is looking more and more like single strings with ":"-separated key/value pairs are the easiest to deal with.
As noted in the previous comments, when using an Object approach for key:value pairs, it is not possible to search where the key matches the query string. This is a pretty big issue given that any UI would want to provide suggestions as users enter the query.
It turns out the default tokenizer in Elasticsearch won't split on ":" characters, so "key:value" will be treated as one term, whereas "key=value" would be treated as two: ["key", "value"]. Users would still be able to match a query across all keys by searching "*:value" (the ":" needs to be escaped).
{
"query": {
"query_string": {
"fields": [ "tags" ],
"query": "*\\:foo1"
}
}
}
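To make the single-term idea concrete, here is a minimal sketch that treats each "key:value" tag as one indivisible term and applies wildcard patterns to it. It is an in-memory approximation only: `match_wildcard` is a hypothetical helper, and the example documents are rewritten with ":" in place of "=" as an assumption.

```python
from fnmatch import fnmatchcase

# Example documents with ":" as the delimiter, so each tag is a single term.
docs = {
    "Doc1": ["key1:foo1", "key2:foo2"],
    "Doc2": ["key5:foo1", "key2:foo3"],
    "Doc3": ["foobar"],
    "Doc4": ["key1:foo1", "key1:bar2"],
}

def match_wildcard(pattern):
    """Return titles of documents with at least one tag matching the wildcard pattern."""
    return [title for title, tags in docs.items()
            if any(fnmatchcase(tag, pattern) for tag in tags)]

print(match_wildcard("*:foo1"))  # value across all keys: ['Doc1', 'Doc2', 'Doc4']
print(match_wildcard("key1:*"))  # any value of key1:     ['Doc1', 'Doc4']
print(match_wildcard("key*"))    # key prefix:            ['Doc1', 'Doc2', 'Doc4']
```

The "key*" case is the one the object schema could not serve: a prefix typed by the user matches on the key part of the term, which is what a suggestion UI would need.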
from metrictank.
OK, so you're saying we would store tags as an array of strings like "key:val" instead of using "="?
Does this hamper or reinforce the feasibility of key-less tags, and why?
from metrictank.
So yes, I am saying stick with an array of strings. Splitting the string into key/value pairs would be left up to the client querying the index (graphite/grafana). For simplicity, though, using a colon ":" as the delimiter is the preferred approach.
This approach will allow users to use key-less tags. Using key-less tags will, however, limit capabilities, as the user won't be able to perform groupBy, AliasBy, etc. style transformations.
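A client-side split along these lines might look like the sketch below. The function names `split_tag` and `group_by` and the sample `series` data are hypothetical and do not come from graphite or grafana; the sketch only shows why a key-less tag cannot take part in a groupBy-style transformation.

```python
from collections import defaultdict

def split_tag(tag, delimiter=":"):
    """Split 'key:value' into (key, value); a key-less tag yields (None, tag)."""
    key, sep, value = tag.partition(delimiter)
    if not sep:
        return None, tag
    return key, value

def group_by(series, key):
    """Group series names by the value of `key`; series without that key are skipped."""
    groups = defaultdict(list)
    for name, tags in series.items():
        for tag in tags:
            k, v = split_tag(tag)
            if k == key:
                groups[v].append(name)
    return dict(groups)

# Hypothetical sample data: name -> tags stored as an array of strings.
series = {
    "Doc1": ["key1:foo1", "key2:foo2"],
    "Doc3": ["foobar"],
    "Doc4": ["key1:foo1", "key1:bar2"],
}

print(split_tag("key1:foo1"))    # ('key1', 'foo1')
print(split_tag("foobar"))       # (None, 'foobar')
print(group_by(series, "key1"))  # {'foo1': ['Doc1', 'Doc4'], 'bar2': ['Doc4']}
```

Because `partition` splits on the first delimiter only, a value that itself contains ":" stays intact; "Doc3", which carries only a key-less tag, never appears in any group.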
from metrictank.
okay. so actionable items?
- move to []string instead of a map and use ":" as delimiter?
probably best if we can also put this change as part of the nsq migration.
from metrictank.
Yes. I'll update the collectors so they send strings.
from metrictank.
comment to mark this as answered in codetree
from metrictank.
this has already been deployed to production. Metric definitions and events are using []string for tags.
from metrictank.