Parameterized Queries
I'd like to propose a spec for how parameterized queries should be processed by the query endpoints - there's two parts to that.
- How Parameter Values and Types are specified .
- How Parameter Names and Values are provided on the URL.
Parameter Values and Types
for both SQL and SPARQL, I think we should support "simple", "safe", and "RDF" parameter values.
- RDF parameters allow for complete precision in the same language the underlying query engine understands, at the cost of a pretty verbose and esoteric syntax. If you want total precision, use this syntax.
- "safe" parameters allow you to be specific about String/URI parameters by wrapping values in "" or <> - definitely a necessity for user-entered content. If you're building an SDK or integration, this helps make sure that you can be precise about types without the loss of readability in RDF types.
- "simple" parameters mean that we'll do the right thing with most values, defaulting to String where we can't make a better guess. This maximizes the chance that ad-hoc queries will return results when the user's meaning is clear.
to parse values, here is the algorithm:
try "RDF" parameters:
if value matches /^"(.*)"\^\^<([^<>]*)>$/
:
"abcdef"^^<http://www.w3.org/2001/XMLSchema#string>
"3"^^<http://www.w3.org/2001/XMLSchema#integer>
"4.2"^^<http://www.w3.org/2001/XMLSchema#decimal>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
(matches two "groups" - the string type value and the URI of the type)
"Safe" parameters:
if value matches /^"(.*)"$/
:
"abcdef" <- String
"3" <- String
"4.2" <- String
"true" <- String
"https://data.world/" <- String
(matches one group - the string value)
if value matches /^<(.*)>$/
:
<https://data.world/> <- URI
<abcdef> <- URI
<3> <- URI
(matches one group - the URI)
"Simple" parameters:
if value matches /^([0-9]+)$/
:
if value matches /^([0-9]*[.][0-9]+)$/
:
if value matches /^(true|false)$/
:
if value matches /^([a-z]+:\/\/.*)$/
:
https://data.world/ <- URI
(all of the above match one group - the value to interpret as Integer/Decimal/Boolean/URI)
otherwise :
(just treat the whole value as a String if nothing else matches)
Parameter Names and Values
For SPARQL:
SPARQL supports named parameters, and parameters in queries can be specified either as ?var
or $var
- it's a very common convention to use ?var
for variables that are meant to be matched and $var
for variables that are bound to the query execution. Because of that, using the $
syntax as query string parameters is a common way to pass bound variables on a HTTP URL. No reason we shouldn't use that syntax here:
.../sparql/user/dataset?query=<QUERY>&$var1=<VALUE1>&$var2=<VALUE2>
where and are values according to the spec above
For SQL:
SQL only supports positional parameters. Luckily, HTTP query parameters have a straightforward way to specify an arbitrary length sequence of values for a query parameter - simply repeat the same query parameter name, and multiple instances of that will be treated as a sequence of those values. I'm proposing that we use p
for the name of our parameter variable (to keep the URLs nice and short), but could do param
or parameter
too:
.../sql/user/dataset?query=<QUERY>&p=<VALUE1>&p=<VALUE2>
where, again, and are values according to the spec above
In both cases (SPARQL and SQL) the way we interpret values is identical. Clearly the values will need to be URL-encoded when actually sent on a URL (as with any value)...