
sqlds's Introduction


sqlds

sqlds stands for SQL Datasource.

Most SQL-driven datasources, like Postgres, MySQL, and MSSQL share extremely similar codebases.

The sqlds package is intended to remove this repetition and centralize the shared datasource logic. The only things the datasources themselves should have to define are how to connect to the database, which driver to use, and the plugin frontend.

Usage

// Hand instance management over to the SDK; datasourceFactory is invoked per
// datasource instance. (Imports assumed from grafana-plugin-sdk-go: backend,
// backend/datasource, backend/instancemgmt, backend/log.)
if err := datasource.Manage("my-datasource", datasourceFactory, datasource.ManageOpts{}); err != nil {
	log.DefaultLogger.Error(err.Error())
	os.Exit(1)
}

// datasourceFactory wraps a driver implementation (myDatasource, sketched
// below) in an sqlds datasource instance.
func datasourceFactory(ctx context.Context, s backend.DataSourceInstanceSettings) (instancemgmt.Instance, error) {
	ds := sqlds.NewDatasource(&myDatasource{})
	return ds.NewDatasource(ctx, s)
}
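
For context, the driver passed to sqlds.NewDatasource is a type satisfying the package's Driver interface. A minimal sketch, assuming the sqlds v2 method set (Connect, Settings, Converters, Macros); verify the exact signatures against the package source:

// (imports assumed: database/sql, encoding/json,
//  github.com/grafana/grafana-plugin-sdk-go/backend,
//  github.com/grafana/grafana-plugin-sdk-go/data/sqlutil,
//  github.com/grafana/sqlds/v2)

type myDatasource struct{}

// Connect opens the database handle; the DSN construction is driver-specific.
func (d *myDatasource) Connect(settings backend.DataSourceInstanceSettings, msg json.RawMessage) (*sql.DB, error) {
	return sql.Open("postgres", "postgres://localhost:5432/mydb")
}

// Settings controls sqlds behavior such as timeouts and the fill mode.
func (d *myDatasource) Settings(settings backend.DataSourceInstanceSettings) sqlds.DriverSettings {
	return sqlds.DriverSettings{}
}

// Converters map database column types onto data frame field types.
func (d *myDatasource) Converters() []sqlutil.Converter { return nil }

// Macros adds driver-specific macros on top of the defaults.
func (d *myDatasource) Macros() sqlds.Macros { return sqlds.Macros{} }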

Standardization

Macros

The sqlds package formerly defined a set of default macros, but those have been migrated to grafana-plugin-sdk-go: see the code for details.

sqlds's People

Contributors

adamyeats, andresmgot, bossinc, carl-db, codeincarnate, conor-mcavoy, dprokop, fionera, fridgepoet, gingerwizard, iwysiu, jdbaldry, jkaflik, julienduchesne, kevinwcyu, kminehart, nineinchnick, njvrzm, sasklacz, scottlepp, shelldandy, sunker, tolzhabayev, wbrowne, yesoreyeram


sqlds's Issues

Macro matching is too lax

While debugging the unit tests of https://github.com/grafana/clickhouse-datasource I found out that sqlds's macro matching is too lax. This is basically an extension of #53. The biggest issue is that it does not always fail, because it depends on the iteration order of the macro map, which is not stable.

Input:

select * from foo where ( date >= $__fromTime and date <= $__toTime ) limit 100

Output:

select * from foo where ( date >= toDateTime(intDiv(-6795364578871,1000)) ) limit 100

Expected Output:

select * from foo where ( date >= toDateTime(intDiv(-6795364578871,1000)) and date <= toDateTime(intDiv(-6795364578871,1000)) ) limit 100

Reason:

It fails when $__toTime (the second macro inside the parentheses) is evaluated before $__fromTime. The match then contains the already-evaluated macro, and calling strings.Replace on it replaces that text as well, producing the wrong query.

When $__fromTime is evaluated first, the match is correct.
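
The intermittent nature follows from Go's map semantics. A minimal standalone sketch (not sqlds code) showing why the evaluation order varies between runs:

package main

import "fmt"

func main() {
	macros := map[string]string{"$__fromTime": "...", "$__toTime": "..."}
	// Go deliberately randomizes map iteration order, so the macros are
	// evaluated in a different order on different runs, which is why the
	// broken replacement only happens sometimes.
	for name := range macros {
		fmt.Println(name)
	}
}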

Additional Info:

(Screenshots comparing the working and broken matches were attached to the original issue.)

Support FillMode per query

Some of the existing data sources (e.g. PostgreSQL) support specifying a fill mode for just the current query. The $__timeGroup macro of PostgreSQL accepts $__timeGroup(dateColumn,'5m', 0), $__timeGroup(dateColumn,'5m', previous), and $__timeGroup(dateColumn,'5m', NULL), where the third argument is the fill mode.

As far as I can see, with sqlds it is only possible to set the fill mode globally for the datasource.

My suggestion would be to add an optional FillMissing struct to the Query type and conditionally use it (taking precedence over the datasource-level fill mode), as sketched below. WDYT @kminehart?
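
A rough sketch of that suggestion (the FillMissing field, its JSON tag, and the fillMode helper are hypothetical, not existing sqlds API; data.FillMissing is from grafana-plugin-sdk-go):

// Query gains an optional per-query fill mode.
type Query struct {
	RawSQL      string            `json:"rawSql"`
	Format      FormatQueryOption `json:"format"`
	FillMissing *data.FillMissing `json:"fillMode,omitempty"` // hypothetical field
}

// effective fill mode: the per-query value wins over the datasource default
func fillMode(q *Query, dsDefault *data.FillMissing) *data.FillMissing {
	if q.FillMissing != nil {
		return q.FillMissing
	}
	return dsDefault
}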

Let driver implement CallResourceHandler

Many SQL data sources need to load additional data from some API to populate the query editor with metadata. In some cases, the metadata can be retrieved using a normal SQL query, i.e. by using the QueryData handler, which in turn uses the Go sql package's query method. But retrieving metadata using SQL queries is not always feasible. Both Redshift and Athena, for example, need to load metadata that cannot be fetched with a SQL statement. Since sqlds acts as the main entry point for the gRPC server, there's currently no way for Redshift or Athena to load that metadata, making it hard for them to use sqlds.

To solve this problem, it should be possible for the driver to implement its own CallResourceHandler. The resource handler should be optional, and registered upon creation of the sqlds datasource.

The signature in sqlds could be something like this:

GetResourceHandler() backend.CallResourceHandler

And the consumer would create the resource handler doing something like this:

func (s *RedshiftDatasource) GetResourceHandler() backend.CallResourceHandler {
	mux := http.NewServeMux()
	mux.HandleFunc("/some-meta-data-endpoint", func(rw http.ResponseWriter, req *http.Request) {
		rw.Write([]byte("hello"))
	})
	return httpadapter.New(mux)
}

To make this optional and avoid introducing breaking changes, it should probably not be part of the driver interface. Maybe we should use a type assertion instead, as sketched below.
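
A minimal sketch of that type-assertion approach (the interface name and the resourceHandler field are hypothetical):

// ResourceHandlerProvider is the optional interface a driver may implement.
type ResourceHandlerProvider interface {
	GetResourceHandler() backend.CallResourceHandler
}

// On datasource creation, sqlds would check whether the driver opts in.
func newDatasource(driver Driver) *SQLDatasource {
	ds := &SQLDatasource{driver: driver}
	if p, ok := driver.(ResourceHandlerProvider); ok {
		ds.resourceHandler = p.GetResourceHandler() // hypothetical field
	}
	return ds
}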

Thoughts @kminehart @andresmgot @yesoreyeram ?

should we make time-series behavior closer to core-sql datasources? ("metric" handling)

hi,

we have an issue reported for the bigquery-datasource (which uses sqlds) about a difference in behavior between the bigquery plugin and postgres. i'd like to discuss what we think about this: is it something we should change, something we should add as an option, or something we should not have?

the postgres/mysql/mssql datasources do special handling for the following case:

  • format=time_series
  • exactly 3 db-columns
  • the string-column is named metric

for example, this query:

SELECT CURRENT_TIMESTAMP as time, 'sku' AS metric, 1 AS cost

the output from sqlds is a dataframe with 2 fields: the time-field and the value-field. this is about the value field.
sqlds: value-field has: name=cost, labels={metric:sku}
postgres: value-field has: name=sku, no labels

this will cause the legend-field in the visualisation to change:
sqlds: legend has cost sku
postgres: legend has sku

this can be a nicer visual representation in some cases (after all, if you only have 1 value-db-column, you don't care about its name).

the postgres code: https://github.com/grafana/grafana/blob/86c618a6d62dcd2f33983fb0cecbad71781da8c3/pkg/tsdb/sqleng/sql_engine.go#L340-L354
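
a simplified paraphrase of that linked code, to make the special case concrete (this is not the actual implementation; the helper and variable names are illustrative):

// with format=time_series, exactly 3 columns, and the string column named
// "metric", core-sql promotes the metric value to the value field's name
// and drops the label:
func applyMetricColumn(valueField *data.Field, metricValue string) {
	valueField.Name = metricValue // e.g. "sku" instead of "cost"
	valueField.Labels = nil       // instead of labels={metric: "sku"}
}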

i wonder what you think about this. some arguments:

  • pro:
    • this may be a very common time-series query (don't know), so making it more ergonomic may be a good thing
    • we already have some assumptions about the data in the format=time_series case
  • con:
    • not doing it keeps the behavior simpler and easier to understand; this special behavior is not really discoverable
    • changing the behavior breaks backward compatibility
    • there is a workaround: use a transform (labels-to-fields or organize-by-name)

what do you think?

(one alternative is to do the change in the visualisation (timeseries in this case))

for reference, here's a list of various queries, and the legend-comparison between postgres and bigquery:


SELECT CURRENT_TIMESTAMP as time, 'sku' AS something, 1 AS cost

bigquery: cost sku
postgres: cost sku


SELECT CURRENT_TIMESTAMP as time, 'sku' AS metric, 1 AS cost

bigquery: cost sku
postgres: sku


SELECT CURRENT_TIMESTAMP as time, 'sku' AS metric, 'xyz' as thing, 1 AS cost

bigquery: cost {metric="sku", thing="xyz"}
postgres: cost {metric="sku", thing="xyz"}


calling same macro with different parameters doesn't work

When the same macro is used multiple times in a SQL query with different parameters, the macros are not parsed correctly.

WHERE $__timeFilter("c_timestamp") AND $__fromTimeFilter("c_string",YYYY-MM-DD) -- works
---
WHERE $__timeFilter("c_timestamp") AND $__fromTimeFilter("c_timestamp") -- works
---
WHERE $__timeFilter("c_timestamp")  AND $__fromTimeFilter("c_timestamp")  AND $__fromTimeFilter("c_timestamp") -- works
---
WHERE $__timeFilter("c_timestamp") AND $__fromTimeFilter("c_string",YYYY-MM-DD)  AND $__fromTimeFilter("c_timestamp") - doesn't work

In the above example, the last query is translated as follows:

WHERE "c_timestamp" > '2019-04-29T12:47:46Z' AND "c_timestamp" < '2022-04-29T12:47:46Z' AND TO_TIMESTAMP("c_string",'YYYY-MM-DD') > '2019-04-29T12:47:46Z'  AND $__fromTimeFilter("c_timestamp") -- doesn't work

Healthcheck fails when Connect throws error

After migrating to the managed data source, CheckHealth doesn't return the actual health check error when Connect fails. If the Connect method returns an error, the health check fails with only the generic message Plugin health check failed.

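A sketch of the desired behavior (this is not the current sqlds implementation; the driver field and Connect's exact signature are assumptions based on the v2 driver interface):

func (ds *SQLDatasource) CheckHealth(ctx context.Context, req *backend.CheckHealthRequest) (*backend.CheckHealthResult, error) {
	settings := *req.PluginContext.DataSourceInstanceSettings
	if _, err := ds.driver.Connect(settings, nil); err != nil {
		// surface the real cause instead of the SDK's generic message
		return &backend.CheckHealthResult{
			Status:  backend.HealthStatusError,
			Message: err.Error(),
		}, nil
	}
	return &backend.CheckHealthResult{Status: backend.HealthStatusOk, Message: "Data source is working"}, nil
}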

Extend the Completable interface methods

The current method for getting a table's columns looks like this:

func Columns(ctx context.Context, table string) ([]string, error)

This is not complete, since the columns may depend on a schema (e.g. there can be a table foo in two schemas, public and bar). But some SQL data sources don't have the concept of a schema, and forcing it may not be the best option.

Also, we have an example of a data source where the table/column resolution depends on other parameters (not SQL related). This SQL data source is a database-as-a-service, so resolution also depends on the database name and the region. To satisfy this for the moment, we have added a new endpoint /columns-with-details, but that's just a workaround IMO.

To satisfy all the different data sources, we will need flexible argument(s) for the different parameters. If we agree on that, we have several options for the function signature:

  1. func Columns(ctx context.Context, options map[string]string) ([]string, error)
  2. func Columns(ctx context.Context, options ...string) ([]string, error)
  3. func Columns(ctx context.Context, options interface{}) ([]string, error)

I would go with the second, since it better reflects the variadic nature of the function, and all the parameters we foresee are strings (see the sketch after the note below). Opinions?

Note that this would be a breaking change.
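
A sketch of how option 2 might read (the method body and the ListColumns driver call are hypothetical):

// Columns accepts the table name plus any qualifiers a driver needs, e.g.
//   c.Columns(ctx, "foo")                          // table only
//   c.Columns(ctx, "public", "foo")                // schema-qualified
//   c.Columns(ctx, "db", "region", "schema", "foo") // DBaaS-style
func (c *completable) Columns(ctx context.Context, options ...string) ([]string, error) {
	return c.driver.ListColumns(ctx, options...) // hypothetical driver call
}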

Add release instructions

Nothing in this repo explains how to release this package. The process should be documented so future devs know how to release it.

Discussion: Macros standardization

Hi, I am working on adding macros to a new datasource and it may make sense to do some refactoring. As far as I can see, all SQL datasources can/should implement the following macros:

  • $__timeFilter. To filter based on the query interval
  • $__timeGroup. To group metrics
  • $__timeFrom. To filter based on the query interval starting point
  • $__timeTo. To filter based on the query interval ending point

The implementation (even the number of args) may differ between data sources, though. Does it make sense to define those macros here and let the different datasources implement them?

Also, a library is needed to convert duration strings (e.g. 10ms) to a time.Duration. Main Grafana has a package for this: https://github.com/grafana/grafana/tree/main/pkg/components/gtime. Should we copy that package here so we can have a standard way of parsing durations (for the $__timeGroup macro)? (Or into grafana-plugin-sdk-go?)
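
For illustration, the kind of helper being discussed (ParseDuration is the function in the linked gtime package; treat the exact signature as an assumption):

// parse a Grafana-style duration string such as "10ms", "5m" or "1d"
d, err := gtime.ParseDuration("5m")
if err != nil {
	return err
}
// d is a time.Duration usable for $__timeGroup bucketing, e.g.
// epochSeconds / int64(d.Seconds())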

WDYT @sunker @kminehart @yesoreyeram?

Add a CHANGELOG.md

Currently, the project is missing a CHANGELOG.md, which makes it hard to find the changes and breaking changes between releases.

Panels with multiple queries submit them sequentially

While investigating a customer problem regarding the performance of the snowflake datasource, it became apparent that the plugin runs a panel's multiple queries sequentially rather than concurrently.
@kminehart indicated that this would need to be fixed here.
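
A sketch of what a concurrent implementation could look like here (using golang.org/x/sync/errgroup; handleQuery is a hypothetical per-query helper, not existing sqlds API):

func (ds *SQLDatasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
	res := backend.NewQueryDataResponse()
	var mu sync.Mutex
	g, gctx := errgroup.WithContext(ctx)
	for _, q := range req.Queries {
		q := q // capture the loop variable for the goroutine
		g.Go(func() error {
			frames, err := ds.handleQuery(gctx, q) // hypothetical helper
			mu.Lock()
			res.Responses[q.RefID] = backend.DataResponse{Frames: frames, Error: err}
			mu.Unlock()
			return nil // per-query errors are reported in the response
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return res, nil
}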

Macro regex should be more strict

I attempted to add a new macro, timeFilterEpoc, while there is already a macro timeFilter. The problem is that the regex here always matches timeFilter in all cases, including inside the longer macro name.
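
A minimal standalone illustration of a stricter pattern (not the sqlds regex itself): requiring a word boundary after the macro name stops $__timeFilter from matching inside $__timeFilterEpoc.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// \b forces the macro name to end at a word boundary
	re := regexp.MustCompile(`\$__timeFilter\b`)
	fmt.Println(re.MatchString(`$__timeFilterEpoc("col")`)) // false
	fmt.Println(re.MatchString(`$__timeFilter("col")`))     // true
}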

Recurring macros are not honoured.

If a query has the same macro multiple times, only the first occurrence gets replaced.

For example, in the query below, $__timeGroup(timestamp,3h) was expected to be replaced three times, but only the first instance was replaced.

SELECT $__timeGroup(timestamp,3h), "user", sum("value") as "value"
FROM "salesdata"
WHERE $__timeFilter("timestamp")
GROUP BY  $__timeGroup(timestamp,3h), "user"
ORDER BY $__timeGroup(timestamp,3h)  ASC
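
A sketch of a fix (not the sqlds implementation; expandTimeGroup is a hypothetical expansion function): regexp.ReplaceAllStringFunc visits every match rather than only the first.

re := regexp.MustCompile(`\$__timeGroup\(([^)]*)\)`)
sql = re.ReplaceAllStringFunc(sql, func(match string) string {
	args := re.FindStringSubmatch(match)[1] // e.g. "timestamp,3h"
	return expandTimeGroup(args)            // hypothetical macro expansion
})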

When formatting as a timeseries, convert to many instead of wide

Currently, when we need to convert frames into the timeseries format, we convert the long format (what we get from the queries) into a wide format with the data.LongToWide function. The problem with the wide format, which we currently use, is that if labels (string fields) have unique values, it creates a new column for every value. This means that for data with n points, we end up with an n x n sized frame after converting it. Instead, we should convert to the many format, which would create n separate size-1 data frames, taking up significantly less memory.

The main obstacle is that we currently use the data.LongToWide function from grafana-plugin-sdk-go, and there isn't an equivalent LongToMany function. From what I understand, the two formats should behave the same downstream, but we should test that.
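
A minimal illustration of the blow-up (data.LongToWide exists in grafana-plugin-sdk-go; the "many" shape described in the trailing comment is the proposed alternative, which the SDK does not provide yet):

t0 := time.Now()
long := data.NewFrame("",
	data.NewField("time", nil, []time.Time{t0, t0.Add(time.Second), t0.Add(2 * time.Second)}),
	data.NewField("host", nil, []string{"a", "b", "c"}), // all-unique label values
	data.NewField("value", nil, []float64{1, 2, 3}),
)
wide, err := data.LongToWide(long, nil)
// wide: 1 time field + 3 value fields (one per unique label set), each 3 rows
// long, i.e. roughly n x n cells, mostly null. The "many" format would
// instead be 3 one-row frames.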

Info about formats: https://grafana.com/developers/dataplane/timeseries

issue reported in https://github.com/grafana/support-escalations/issues/6904

Make it possible to pass args to driver query method

Some data source plugins, such as the one for Athena, need to be able to switch databases inside the query editor without having to change the connection in the config editor. The query method in the Go sql package takes variadic args, so we should be able to use that to pass additional arguments to the driver.

Unfortunately, the args might differ from plugin to plugin, making it hard to provide proper typings for this. For that reason, I think the Query type needs to take the args as JSON. Something like this:

type Query struct {
	RawSQL string            `json:"rawSql"`
	Format FormatQueryOption `json:"format"`
	Args   json.RawMessage   `json:"args"`
	// ...
}
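
The raw args would then be decoded and threaded through to database/sql. A sketch (how a driver interprets the args is plugin-specific):

var args []interface{}
if len(q.Args) > 0 {
	if err := json.Unmarshal(q.Args, &args); err != nil {
		return nil, err
	}
}
// database/sql forwards these to the driver's query implementation
rows, err := db.QueryContext(ctx, q.RawSQL, args...)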
