Comments (8)

woodsaj commented on May 18, 2024

The problem I have with single strings is that they make querying via Elasticsearch much more difficult and less intuitive. Elasticsearch will already tokenize the document, splitting on whitespace and other common delimiters like ":=,"

E.g., with the following example documents:

{
  "title": "Doc1",
  "tags": [
    "key1=foo1",
    "key2=foo2"
  ]
},
{
  "title": "Doc2",
  "tags": [
    "key5=foo1",
    "key2=foo3"
  ]
},
{
  "title": "Doc3",
  "tags": [
    "foobar"
  ]
},
{
  "title": "Doc4",
  "tags": [
    "key1=foo1",
    "key1=bar2"
  ]
},

If I wanted to match every document where key1==foo1, I would need to quote my search query.

{
  "query": {
    "query_string": {
        "query": "tags:\"key1=foo1\""
    }
  }
}

which would match documents 1 and 4

The gotcha here is that if you don't provide the quotes, the search will match all documents that have a tag containing "key1" or "foo1", matching documents 1, 2 and 4. This is certainly not the intended result.

Additionally, because of the quoting, you lose the ability to do partial matches, i.e. where the value of key "key2" starts with "foo".

However, as a benefit, if you just provided a search query of "tags:foo1", it would return all documents that have a tag whose key or value is "foo1".
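
The matching behaviour described above can be mimicked with a toy model. This is only a sketch: `tokenize` is a hypothetical stand-in that splits on whitespace, "=" and ",", not Elasticsearch's real analyzer. The two match functions only approximate an unquoted query (any term matches) and a quoted phrase query (terms adjacent and in order within one tag).

```python
import re

# Example documents from above: tags stored as "key=value" strings.
DOCS = {
    "Doc1": ["key1=foo1", "key2=foo2"],
    "Doc2": ["key5=foo1", "key2=foo3"],
    "Doc3": ["foobar"],
    "Doc4": ["key1=foo1", "key1=bar2"],
}

def tokenize(text):
    """Hypothetical analyzer: split on whitespace, '=' and ','."""
    return [t for t in re.split(r"[\s=,]+", text) if t]

def match_any_term(tags, query):
    """Unquoted query: match if any query term appears in any tag."""
    wanted = set(tokenize(query))
    return any(wanted & set(tokenize(tag)) for tag in tags)

def match_phrase(tags, query):
    """Quoted query: the terms must be adjacent and in order inside one tag."""
    q = tokenize(query)
    for tag in tags:
        terms = tokenize(tag)
        if any(terms[i:i + len(q)] == q for i in range(len(terms) - len(q) + 1)):
            return True
    return False

unquoted = sorted(n for n, tags in DOCS.items() if match_any_term(tags, "key1=foo1"))
quoted = sorted(n for n, tags in DOCS.items() if match_phrase(tags, "key1=foo1"))
print(unquoted)  # ['Doc1', 'Doc2', 'Doc4']
print(quoted)    # ['Doc1', 'Doc4']
```

In this model the unquoted query picks up Doc2 only because one of its tags happens to have the value "foo1" under a different key.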

With a key:value schema:

{
  "title": "Doc1",
  "tags": {
    "key1": "foo1",
    "key2": "foo2"
  }
},
{
  "title": "Doc2",
  "tags": {
    "key5": "foo1",
    "key2": "foo3"
  }
},
{
  "title": "Doc3",
  "tags": {
    "foobar": "true"
  }
},
{
  "title": "Doc4",
  "tags": {
    "key1": "foo1 bar2"
  }
},

The same search query to match where key1==foo1 becomes

{
  "query": {
    "query_string": {
        "query": "tags.key1:foo1"
    }
  }
}

No quoting necessary.

to match where key2 starts with foo

 "query": "tags.key2:foo*"

It would not be possible (to my knowledge) to send a query that would match either the key or the value. But you can match the value across all keys by including a "fields" field to limit the scope of the query.

{
  "query": {
    "query_string": {
       "fields": ["tags.*"],
       "query": "foo1"
    }
  }
}

which would match documents 1, 2 and 4

Including the "fields" field on all queries would probably be good practice anyway, and wouldn't affect tags.key:value format queries.
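
For reference, a client could assemble the two query shapes for the object schema roughly like this. It is a minimal sketch: the helper names `key_value_query` and `any_key_query` are made up for illustration, and it only builds the request bodies shown above without sending them anywhere.

```python
import json

def key_value_query(key, value):
    """Match documents whose tag `key` has `value`; `value` may be a wildcard like 'foo*'."""
    return {"query": {"query_string": {"query": f"tags.{key}:{value}"}}}

def any_key_query(value):
    """Match `value` under any tag key by scoping the search to tags.*."""
    return {"query": {"query_string": {"fields": ["tags.*"], "query": value}}}

print(json.dumps(key_value_query("key2", "foo*"), indent=2))
print(json.dumps(any_key_query("foo1"), indent=2))
```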

from metrictank.

woodsaj commented on May 18, 2024

OK. So I have been reading up on this and experimenting with Elasticsearch, and it is looking more and more like single strings with ":"-separated key/value pairs are easiest to deal with.

As noted in the previous comments, when using an Object approach for key:value pairs, it is not possible to search where the key matches the query string. This is a pretty big issue given that any UI would want to provide suggestions as users enter the query.

Turns out the default tokenizer in Elasticsearch won't split on ":" characters, so "key:value" will be treated as 1 term, whereas "key=value" would be treated as 2: ["key", "value"]. Users would still be able to match a query across all keys by searching "*:value" (the ':' needs to be escaped).

{  
  "query": {  
    "query_string": {  
      "fields": [ "tags" ],
      "query": "*\\:foo1"
    }
  }
}
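
The escaping is easy to get wrong once JSON encoding is involved, so here is a small sketch of building that body from a client. `escape_colons` and `tag_query` are hypothetical helpers, and only ":" is escaped here. Real tag values containing other reserved query_string characters would need more care.

```python
import json

def escape_colons(term):
    """Backslash-escape ':' so query_string does not read it as a field separator."""
    return term.replace(":", "\\:")

def tag_query(key, value):
    """Build a query for "key:value" tags; key and value may contain the wildcard '*'."""
    return {
        "query": {
            "query_string": {
                "fields": ["tags"],
                "query": escape_colons(f"{key}:{value}"),
            }
        }
    }

body = tag_query("*", "foo1")
# The query string holds a single backslash; JSON serialization doubles it on the wire.
print(json.dumps(body))
```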

Dieterbe commented on May 18, 2024

ok so you're saying we would store tags as an array of strings like "key:val" instead of using "=" ?

does this hamper or reinforce the feasibility of key-less tags, and why?

woodsaj commented on May 18, 2024

So yes, I am saying stick with an array of strings. Splitting the string into key/value pairs would be left up to the client querying the index (graphite/grafana). Though for simplicity, using a colon ":" as the delimiter is the preferred approach.

This approach will allow users to use key-less tags. Using key-less tags will, however, limit capabilities, as the user won't be able to perform groupBy, AliasBy etc. style transformations.
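
A rough sketch of what the client-side splitting could look like, and why key-less tags drop out of grouping. The function names and the sample metrics are made up for illustration; this is not how graphite or grafana actually implement it.

```python
from collections import defaultdict

def split_tag(tag):
    """Split "key:value" on the first ':'; a key-less tag yields (None, tag)."""
    key, sep, value = tag.partition(":")
    if not sep:
        return None, tag
    return key, value

def group_by(metrics, key):
    """Group metric names by the value of `key`; metrics without that key are skipped."""
    groups = defaultdict(list)
    for name, tags in metrics.items():
        for k, v in map(split_tag, tags):
            if k == key:
                groups[v].append(name)
    return dict(groups)

# Hypothetical metrics, for illustration only.
metrics = {
    "cpu.a": ["host:web1", "dc:us"],
    "cpu.b": ["host:web2", "dc:us"],
    "cpu.c": ["production"],  # key-less tag: searchable, but nothing to group on
}
print(group_by(metrics, "dc"))  # {'us': ['cpu.a', 'cpu.b']}
```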

Dieterbe commented on May 18, 2024

okay. so actionable items?

  • move to []string instead of a map and use : as delimiter?

probably best if we can also put this change as part of the nsq migration.
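
For the map-to-strings move, the conversion itself is small. A sketch, assuming existing tags are a flat key/value mapping. The helper name and the sorted output order are my own choices, not something decided in this thread.

```python
def tags_to_strings(tags):
    """Flatten a {key: value} mapping into sorted "key:value" strings."""
    return sorted(f"{key}:{value}" for key, value in tags.items())

print(tags_to_strings({"key2": "foo2", "key1": "foo1"}))  # ['key1:foo1', 'key2:foo2']
```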

woodsaj commented on May 18, 2024

Yes. I'll update the collectors so they send strings.

torkelo commented on May 18, 2024

comment to mark this as answered in codetree

woodsaj commented on May 18, 2024

this has already been deployed to production. Metric definitions and events are using []string for tags.
