
sqlds's Introduction


sqlds

sqlds stands for SQL Datasource.

Most SQL-driven datasources, like Postgres, MySQL, and MSSQL share extremely similar codebases.

The sqlds package is intended to remove this repetition and centralize the shared datasource logic. The only things the datasources themselves should have to define are how to connect to the database, which driver to use, and the plugin frontend.

Usage

// Hand instance management over to the SDK; datasourceFactory is invoked per
// datasource instance. (Imports assumed from grafana-plugin-sdk-go: backend,
// backend/datasource, backend/instancemgmt, backend/log.)
if err := datasource.Manage("my-datasource", datasourceFactory, datasource.ManageOpts{}); err != nil {
	log.DefaultLogger.Error(err.Error())
	os.Exit(1)
}

// datasourceFactory wraps a driver implementation (myDatasource, sketched
// below) in an sqlds datasource instance.
func datasourceFactory(ctx context.Context, s backend.DataSourceInstanceSettings) (instancemgmt.Instance, error) {
	ds := sqlds.NewDatasource(&myDatasource{})
	return ds.NewDatasource(ctx, s)
}
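
For context, the driver passed to sqlds.NewDatasource is a type satisfying the package's Driver interface. A minimal sketch, assuming the sqlds v2 method set (Connect, Settings, Converters, Macros); verify the exact signatures against the package source:

// (imports assumed: database/sql, encoding/json,
//  github.com/grafana/grafana-plugin-sdk-go/backend,
//  github.com/grafana/grafana-plugin-sdk-go/data/sqlutil,
//  github.com/grafana/sqlds/v2)

type myDatasource struct{}

// Connect opens the database handle; the DSN construction is driver-specific.
func (d *myDatasource) Connect(settings backend.DataSourceInstanceSettings, msg json.RawMessage) (*sql.DB, error) {
	return sql.Open("postgres", "postgres://localhost:5432/mydb")
}

// Settings controls sqlds behavior such as timeouts and the fill mode.
func (d *myDatasource) Settings(settings backend.DataSourceInstanceSettings) sqlds.DriverSettings {
	return sqlds.DriverSettings{}
}

// Converters map database column types onto data frame field types.
func (d *myDatasource) Converters() []sqlutil.Converter { return nil }

// Macros adds driver-specific macros on top of the defaults.
func (d *myDatasource) Macros() sqlds.Macros { return sqlds.Macros{} }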

Standardization

Macros

The sqlds package formerly defined a set of default macros, but those have been migrated to grafana-plugin-sdk-go: see the code for details.

sqlds's People

Contributors

adamyeats, andresmgot, bossinc, carl-db, codeincarnate, conor-mcavoy, dprokop, fionera, fridgepoet, gingerwizard, iwysiu, jdbaldry, jkaflik, julienduchesne, kevinwcyu, kminehart, nineinchnick, njvrzm, sasklacz, scottlepp, shelldandy, sunker, tolzhabayev, wbrowne, yesoreyeram


sqlds's Issues

Macro matching is too lax

While debugging the unit tests of https://github.com/grafana/clickhouse-datasource I found out that sqlds's macro matching is too lax. This is basically an extension of #53. The biggest issue is that it does not always fail, because it depends on the iteration order of the macro map, which is not stable.

Input:

select * from foo where ( date >= $__fromTime and date <= $__toTime ) limit 100

Output:

select * from foo where ( date >= toDateTime(intDiv(-6795364578871,1000)) ) limit 100

Expected Output:

select * from foo where ( date >= toDateTime(intDiv(-6795364578871,1000)) and date <= toDateTime(intDiv(-6795364578871,1000)) ) limit 100

Reason:

It fails when $__toTime (the second macro inside the parentheses) is evaluated before $__fromTime. The match then contains the already-evaluated macro, and calling strings.Replace on it replaces that text as well, producing the wrong query.

When $__fromTime is evaluated first, the match is correct.
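
The intermittent nature follows from Go's map semantics. A minimal standalone sketch (not sqlds code) showing why the evaluation order varies between runs:

package main

import "fmt"

func main() {
	macros := map[string]string{"$__fromTime": "...", "$__toTime": "..."}
	// Go deliberately randomizes map iteration order, so the macros are
	// evaluated in a different order on different runs, which is why the
	// broken replacement only happens sometimes.
	for name := range macros {
		fmt.Println(name)
	}
}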

Additional Info:

(Screenshots comparing the working and broken matches were attached to the original issue.)

Support FillMode per query

Some of the existing data sources (e.g. PostgreSQL) support specifying a fill mode for just the current query. The $__timeGroup macro of PostgreSQL accepts $__timeGroup(dateColumn,'5m', 0), $__timeGroup(dateColumn,'5m', previous), and $__timeGroup(dateColumn,'5m', NULL), where the third argument is the fill mode.

As far as I can see, with sqlds it is only possible to set the fill mode globally for the datasource.

My suggestion would be to add an optional FillMissing struct to the Query type and conditionally use it (taking precedence over the datasource-level fill mode), as sketched below. WDYT @kminehart?
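
A rough sketch of that suggestion (the FillMissing field, its JSON tag, and the fillMode helper are hypothetical, not existing sqlds API; data.FillMissing is from grafana-plugin-sdk-go):

// Query gains an optional per-query fill mode.
type Query struct {
	RawSQL      string            `json:"rawSql"`
	Format      FormatQueryOption `json:"format"`
	FillMissing *data.FillMissing `json:"fillMode,omitempty"` // hypothetical field
}

// effective fill mode: the per-query value wins over the datasource default
func fillMode(q *Query, dsDefault *data.FillMissing) *data.FillMissing {
	if q.FillMissing != nil {
		return q.FillMissing
	}
	return dsDefault
}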

Let driver implement CallResourceHandler

Many SQL data sources need to load additional data from some API to populate the query editor with metadata. In some cases, the metadata can be retrieved using a normal SQL query, i.e. by using the QueryData handler, which in turn uses the Go sql package's query method. But retrieving metadata using SQL queries is not always feasible. Both Redshift and Athena, for example, need to load metadata that cannot be fetched with a SQL statement. Since sqlds acts as the main entry point for the gRPC server, there's currently no way for Redshift or Athena to load that metadata, making it hard for them to use sqlds.

To solve this problem, it should be possible for the driver to implement its own CallResourceHandler. The resource handler should be optional, and registered upon creation of the sqlds datasource.

The signature in sqlds could be something like this:

GetResourceHandler() backend.CallResourceHandler

And the consumer would create the resource handler doing something like this:

func (s *RedshiftDatasource) GetResourceHandler() backend.CallResourceHandler {
	mux := http.NewServeMux()
	mux.HandleFunc("/some-meta-data-endpoint", func(rw http.ResponseWriter, req *http.Request) {
		rw.Write([]byte("hello"))
	})
	return httpadapter.New(mux)
}

To make this optional and avoid introducing breaking changes, it should probably not be part of the driver interface. Maybe we should use a type assertion instead, as sketched below.
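
A minimal sketch of that type-assertion approach (the interface name and the resourceHandler field are hypothetical):

// ResourceHandlerProvider is the optional interface a driver may implement.
type ResourceHandlerProvider interface {
	GetResourceHandler() backend.CallResourceHandler
}

// On datasource creation, sqlds would check whether the driver opts in.
func newDatasource(driver Driver) *SQLDatasource {
	ds := &SQLDatasource{driver: driver}
	if p, ok := driver.(ResourceHandlerProvider); ok {
		ds.resourceHandler = p.GetResourceHandler() // hypothetical field
	}
	return ds
}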

Thoughts @kminehart @andresmgot @yesoreyeram ?

should we make time-series behavior closer to core-sql datasources? ("metric" handling)

hi,

we have an issue reported for the bigquery-datasource (which uses sqlds) about a difference in behavior between the bigquery plugin and postgres. i'd like to discuss what we think about this: is it something we should change, something we should add as an option, or something we should not have?

the postgres/mysql/mssql datasources do special handling for the following case:

  • format=time_series
  • exactly 3 db-columns
  • the string-column is named metric

for example, this query:

SELECT CURRENT_TIMESTAMP as time, 'sku' AS metric, 1 AS cost

the output from sqlds is a dataframe with 2 fields: the time-field and the value-field. this is about the value field.
sqlds: value-field has: name=cost, labels={metric:sku}
postgres: value-field has: name=sku, no labels

this will cause the legend-field in the visualisation to change:
sqlds: legend has cost sku
postgres: legend has sku

this can be a nicer visual representation in some cases (after all, if you only have 1 value-db-column, you don't care about its name).

the postgres code: https://github.com/grafana/grafana/blob/86c618a6d62dcd2f33983fb0cecbad71781da8c3/pkg/tsdb/sqleng/sql_engine.go#L340-L354
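
a simplified paraphrase of that linked code, to make the special case concrete (this is not the actual implementation; the helper and variable names are illustrative):

// with format=time_series, exactly 3 columns, and the string column named
// "metric", core-sql promotes the metric value to the value field's name
// and drops the label:
func applyMetricColumn(valueField *data.Field, metricValue string) {
	valueField.Name = metricValue // e.g. "sku" instead of "cost"
	valueField.Labels = nil       // instead of labels={metric: "sku"}
}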

i wonder what you think about this. some arguments:

  • pro:
    • this may be a very common time-series query (don't know), so making it more ergonomic may be a good thing
    • we already have some assumptions about the data in the format=time_series case
  • con:
    • not doing it keeps the behavior simpler and easier to understand; this special behavior is not really discoverable
    • changing the behavior breaks backward compatibility
    • there is a workaround: use a transform (labels-to-fields or organize-by-name)

what do you think?

(one alternative is to do the change in the visualisation (timeseries in this case))

for reference, here's a list of various queries, and the legend-comparison between postgres and bigquery:


SELECT CURRENT_TIMESTAMP as time, 'sku' AS something, 1 AS cost

bigquery: cost sku
postgres: cost sku


SELECT CURRENT_TIMESTAMP as time, 'sku' AS metric, 1 AS cost

bigquery: cost sku
postgres: sku


SELECT CURRENT_TIMESTAMP as time, 'sku' AS metric, 'xyz' as thing, 1 AS cost

bigquery: cost {metric="sku", thing="xyz"}
postgres: cost {metric="sku", thing="xyz"}


calling same macro with different parameters doesn't work

When the same macro is used multiple times in a SQL query with different parameters, the macros are not parsed correctly.

WHERE $__timeFilter("c_timestamp") AND $__fromTimeFilter("c_string",YYYY-MM-DD) -- works
---
WHERE $__timeFilter("c_timestamp") AND $__fromTimeFilter("c_timestamp") -- works
---
WHERE $__timeFilter("c_timestamp")  AND $__fromTimeFilter("c_timestamp")  AND $__fromTimeFilter("c_timestamp") -- works
---
WHERE $__timeFilter("c_timestamp") AND $__fromTimeFilter("c_string",YYYY-MM-DD)  AND $__fromTimeFilter("c_timestamp") - doesn't work

In the above example, the last query is translated as follows:

WHERE "c_timestamp" > '2019-04-29T12:47:46Z' AND "c_timestamp" < '2022-04-29T12:47:46Z' AND TO_TIMESTAMP("c_string",'YYYY-MM-DD') > '2019-04-29T12:47:46Z'  AND $__fromTimeFilter("c_timestamp") -- doesn't work

Healthcheck fails when Connect throws error

After migrating to the managed data source, CheckHealth doesn't return the actual health check error when Connect fails. If the Connect method returns an error, the health check fails with only the generic message Plugin health check failed.

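A sketch of the desired behavior (this is not the current sqlds implementation; the driver field and Connect's exact signature are assumptions based on the v2 driver interface):

func (ds *SQLDatasource) CheckHealth(ctx context.Context, req *backend.CheckHealthRequest) (*backend.CheckHealthResult, error) {
	settings := *req.PluginContext.DataSourceInstanceSettings
	if _, err := ds.driver.Connect(settings, nil); err != nil {
		// surface the real cause instead of the SDK's generic message
		return &backend.CheckHealthResult{
			Status:  backend.HealthStatusError,
			Message: err.Error(),
		}, nil
	}
	return &backend.CheckHealthResult{Status: backend.HealthStatusOk, Message: "Data source is working"}, nil
}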

Extend the Completable interface methods

The current method for getting a table's columns looks like this:

func Columns(ctx context.Context, table string) ([]string, error)

This is not complete, since the columns may depend on a schema (e.g. there can be a table foo in two schemas, public and bar). But some SQL data sources don't have the concept of a schema, and forcing it may not be the best option.

Also, we have an example of a data source where the table/column resolution depends on other parameters (not SQL related). This SQL data source is a database-as-a-service, so resolution also depends on the database name and the region. To satisfy this for the moment, we have added a new endpoint /columns-with-details, but that's just a workaround IMO.

To satisfy all the different data sources, we will need flexible argument(s) for the different parameters. If we agree on that, we have several options for the function signature:

  1. func Columns(ctx context.Context, options map[string]string) ([]string, error)
  2. func Columns(ctx context.Context, options ...string) ([]string, error)
  3. func Columns(ctx context.Context, options interface{}) ([]string, error)

I would go with the second, since it better reflects the variadic nature of the function, and all the parameters we foresee are strings (see the sketch after the note below). Opinions?

Note that this would be a breaking change.
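
A sketch of how option 2 might read (the method body and the ListColumns driver call are hypothetical):

// Columns accepts the table name plus any qualifiers a driver needs, e.g.
//   c.Columns(ctx, "foo")                          // table only
//   c.Columns(ctx, "public", "foo")                // schema-qualified
//   c.Columns(ctx, "db", "region", "schema", "foo") // DBaaS-style
func (c *completable) Columns(ctx context.Context, options ...string) ([]string, error) {
	return c.driver.ListColumns(ctx, options...) // hypothetical driver call
}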

Add release instructions

Nothing in this repo explains how to release this package. The process should be documented so future devs know how to release it.

Discussion: Macros standardization

Hi, I am working on adding macros to a new datasource and it may make sense to do some refactoring. As far as I can see, all SQL datasources can/should implement the following macros:

  • $__timeFilter. To filter based on the query interval
  • $__timeGroup. To group metrics
  • $__timeFrom. To filter based on the query interval starting point
  • $__timeTo. To filter based on the query interval ending point

The implementation (even the number of args) may differ between data sources, though. Does it make sense to define those macros here and let the different datasources implement them?

Also, a library is needed to convert duration strings (e.g. 10ms) to a time.Duration. Main Grafana has a package for this: https://github.com/grafana/grafana/tree/main/pkg/components/gtime. Should we copy that package here so we can have a standard way of parsing durations (for the $__timeGroup macro)? (Or into grafana-plugin-sdk-go?)
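
For illustration, the kind of helper being discussed (ParseDuration is the function in the linked gtime package; treat the exact signature as an assumption):

// parse a Grafana-style duration string such as "10ms", "5m" or "1d"
d, err := gtime.ParseDuration("5m")
if err != nil {
	return err
}
// d is a time.Duration usable for $__timeGroup bucketing, e.g.
// epochSeconds / int64(d.Seconds())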

WDYT @sunker @kminehart @yesoreyeram?

Add a CHANGELOG.md

Currently, the project is missing a CHANGELOG.md, which makes it hard to find the changes and breaking changes between releases.

Panels with multiple queries submit them sequentially

While investigating a customer problem regarding the performance of the snowflake datasource, it became apparent that the plugin runs a panel's multiple queries sequentially rather than concurrently.
@kminehart indicated that this would need to be fixed here.
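
A sketch of what a concurrent implementation could look like here (using golang.org/x/sync/errgroup; handleQuery is a hypothetical per-query helper, not existing sqlds API):

func (ds *SQLDatasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
	res := backend.NewQueryDataResponse()
	var mu sync.Mutex
	g, gctx := errgroup.WithContext(ctx)
	for _, q := range req.Queries {
		q := q // capture the loop variable for the goroutine
		g.Go(func() error {
			frames, err := ds.handleQuery(gctx, q) // hypothetical helper
			mu.Lock()
			res.Responses[q.RefID] = backend.DataResponse{Frames: frames, Error: err}
			mu.Unlock()
			return nil // per-query errors are reported in the response
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return res, nil
}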

Macro regex should be more strict

I attempted to add a new macro, timeFilterEpoc, while there is already a macro timeFilter. The problem is that the regex here always matches timeFilter in all cases, including inside the longer macro name.
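
A minimal standalone illustration of a stricter pattern (not the sqlds regex itself): requiring a word boundary after the macro name stops $__timeFilter from matching inside $__timeFilterEpoc.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// \b forces the macro name to end at a word boundary
	re := regexp.MustCompile(`\$__timeFilter\b`)
	fmt.Println(re.MatchString(`$__timeFilterEpoc("col")`)) // false
	fmt.Println(re.MatchString(`$__timeFilter("col")`))     // true
}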

Recurring macros are not honoured.

If a query has the same macro multiple times, only the first occurrence gets replaced.

For example, in the query below, $__timeGroup(timestamp,3h) was expected to be replaced three times, but only the first instance was replaced.

SELECT $__timeGroup(timestamp,3h), "user", sum("value") as "value"
FROM "salesdata"
WHERE $__timeFilter("timestamp")
GROUP BY  $__timeGroup(timestamp,3h), "user"
ORDER BY $__timeGroup(timestamp,3h)  ASC
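
A sketch of a fix (not the sqlds implementation; expandTimeGroup is a hypothetical expansion function): regexp.ReplaceAllStringFunc visits every match rather than only the first.

re := regexp.MustCompile(`\$__timeGroup\(([^)]*)\)`)
sql = re.ReplaceAllStringFunc(sql, func(match string) string {
	args := re.FindStringSubmatch(match)[1] // e.g. "timestamp,3h"
	return expandTimeGroup(args)            // hypothetical macro expansion
})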

When formatting as a timeseries, convert to many instead of wide

Currently, when we need to convert frames into the timeseries format, we convert the long format (what we get from the queries) into a wide format with the data.LongToWide function. The problem with the wide format, which we currently use, is that if labels (string fields) have unique values, it creates a new column for every value. This means that for data with n points, we end up with an n x n sized frame after converting it. Instead, we should convert to the many format, which would create n separate size-1 data frames, taking up significantly less memory.

The main obstacle is that we currently use the data.LongToWide function from grafana-plugin-sdk-go, and there isn't an equivalent LongToMany function. From what I understand, the two formats should behave the same downstream, but we should test that.
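
A minimal illustration of the blow-up (data.LongToWide exists in grafana-plugin-sdk-go; the "many" shape described in the trailing comment is the proposed alternative, which the SDK does not provide yet):

t0 := time.Now()
long := data.NewFrame("",
	data.NewField("time", nil, []time.Time{t0, t0.Add(time.Second), t0.Add(2 * time.Second)}),
	data.NewField("host", nil, []string{"a", "b", "c"}), // all-unique label values
	data.NewField("value", nil, []float64{1, 2, 3}),
)
wide, err := data.LongToWide(long, nil)
// wide: 1 time field + 3 value fields (one per unique label set), each 3 rows
// long, i.e. roughly n x n cells, mostly null. The "many" format would
// instead be 3 one-row frames.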

Info about formats: https://grafana.com/developers/dataplane/timeseries

issue reported in https://github.com/grafana/support-escalations/issues/6904

Make it possible to pass args to driver query method

Some data source plugins, such as the one for Athena, need to be able to switch databases inside the query editor without having to change the connection in the config editor. The query method in the Go sql package takes variadic args, so we should be able to use that to pass additional arguments to the driver.

Unfortunately, the args might differ from plugin to plugin, making it hard to provide proper typings for this. For that reason, I think the Query type needs to take the args as JSON. Something like this:

type Query struct {
	RawSQL string            `json:"rawSql"`
	Format FormatQueryOption `json:"format"`
	Args   json.RawMessage   `json:"args"`
	// ...
}
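
The raw args would then be decoded and threaded through to database/sql. A sketch (how a driver interprets the args is plugin-specific):

var args []interface{}
if len(q.Args) > 0 {
	if err := json.Unmarshal(q.Args, &args); err != nil {
		return nil, err
	}
}
// database/sql forwards these to the driver's query implementation
rows, err := db.QueryContext(ctx, q.RawSQL, args...)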
