kensho-technologies / graphql-compiler
Turn complex GraphQL queries into optimized database queries.
License: Apache License 2.0
We currently disallow @fold directives either directly within, or in a traversal from, an @optional vertex field. We might be able to add support for this with some work, if there is interest in it.
Ensure that the schema satisfies the following two properties:
- If a field's name starts with out_ or in_, it must be a vertex field.
- If a field is a vertex field, its name must start with out_ or in_.
Right now the user supplies a backend string, like 'postgresql', to tell the compiler which backend to compile to. Instead, the user should set things up with a SQLAlchemy dialect, which is less prone to errors from passing in an incorrect string, and is naturally accessible in most SQLAlchemy setups.
Right now, compiler setup looks like
backend = 'postgresql'
sql_metadata = SqlMetadata(backend, sqlalchemy_metadata)
compile_graphql_to_sql(..., sql_metadata)
Instead, the dialect corresponding to the backend can be passed in as
from sqlalchemy.dialects import postgresql
sql_metadata = SqlMetadata(postgresql, sqlalchemy_metadata)
compile_graphql_to_sql(..., sql_metadata)
We will need to create a new scalar type to support this.
I was wondering whether relationships could be managed through a directive, such as gestalt's @relationship.
To me, a directive seems more user-friendly, as it does not force any naming convention on field labels, but, honestly, I may be missing something.
This would allow for a meta-visitor that applies the required visitors in the provided order. This separates the initialization from the steps of the visitors, which is a little more readable.
This could look something like:
visitor_fn_1 = ...
visitor_fn_2 = ...
visitor_fn_3 = ...
visitor_fn = pipeline(
visitor_fn_1,
visitor_fn_2,
visitor_fn_3,
)
block.visit_and_update_expressions(visitor_fn)
as a replacement for
visitor_fn_1 = ...
block.visit_and_update_expressions(visitor_fn_1)
visitor_fn_2 = ...
block.visit_and_update_expressions(visitor_fn_2)
visitor_fn_3 = ...
block.visit_and_update_expressions(visitor_fn_3)
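One way the proposed pipeline helper could be written is sketched below. The name pipeline comes from the proposal above; the implementation is only an assumption about how it might behave, not code from the compiler. It composes the visitors so that each expression is passed through them in the order given.

```python
from functools import reduce


def pipeline(*visitor_fns):
    """Return a single visitor that applies the given visitors in order.

    Hypothetical helper: each visitor takes an expression and returns
    a (possibly rewritten) expression.
    """
    def combined_visitor_fn(expression):
        # Feed the output of each visitor into the next one.
        return reduce(lambda expr, fn: fn(expr), visitor_fns, expression)

    return combined_visitor_fn
```

Note that a single pass with the combined visitor is not guaranteed to match three separate passes in every case: in one pass, a later visitor sees child expressions that have already gone through all of the visitors. For visitors that only rewrite the node they are given, the two approaches should agree.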
Your current test schema looks like this:
type RootSchemaQuery {
Animal: Animal
BirthEvent: BirthEvent
Entity: Entity
Event: Event
FeedingEvent: FeedingEvent
Food: Food
FoodOrSpecies: FoodOrSpecies
Location: Location
Species: Species
UniquelyIdentifiable: UniquelyIdentifiable
}
If I understood GraphQL and the queries correctly, you should return an [Animal] instead of a single Animal. But this causes some problems within your compiler.
If a class has a field that is a list, I would like to do something like
{
Animal {
nicknames @filter(op_name: "length", value: ["$something"])
}
}
Allow queries like:
{
Species {
in_Animal_OfSpecies @fold {
out_Entity_Related {
name @output(out_name: "related_list")
}
}
}
}
We currently don't support traversals from inside a @fold scope.
The Gremlin backend does not currently support the _x_count meta-field, per #158.
The rule of thumb for mapping an edge to a JOIN statement is that if the edge is required, an INNER JOIN should be used, and if the edge is optional a LEFT JOIN should be used. This applies to all tables involved in both direct and many-to-many JOINs, with one notable exception.
When an edge is required within an optional scope, the compiler semantics state that if the
outer optional edge is present, but the inner required edge is not, this result should be
excluded. For example with the GraphQL query:
{
Animal {
name @output(out_name: "name")
out_Animal_ParentOf @optional {
name @output(out_name: "child_name")
out_Animal_ParentOf {
name @output(out_name: "grandchild_name")
}
}
}
}
An animal that has a child (satisfying the first optional ParentOf edge), but where that child has no children (failing to satisfy the second required ParentOf edge) should produce no result. Using nested INNER JOINs here from the outer LEFT JOIN, like
SELECT
animal.name as name,
child.name as child_name,
grandchild.name as grandchild_name
FROM animal
LEFT JOIN (
animal AS child
INNER JOIN (
animal as grandchild
) ON child.parentof_id = grandchild.animal_id
) ON animal.parentof_id = child.animal_id
will have a NULL value returned for the grandchild.name property. The LEFT JOIN condition is fulfilled but the INNER JOIN condition is not, which doesn't exclude the result but rather includes it with a NULL value.
To get the correct semantics, the result when the INNER JOIN condition is not fulfilled needs to be filtered out. This is done explicitly by replacing the INNER JOIN with a LEFT JOIN, and then applying the JOIN condition in the WHERE clause to the rows that are non-null from the LEFT JOIN. For this example this looks like:
SELECT
animal.name as name,
child.name as child_name,
grandchild.name as grandchild_name
FROM animal
LEFT JOIN (
animal AS child
LEFT JOIN (
animal as grandchild
) ON child.parentof_id = grandchild.animal_id
) ON animal.parentof_id = child.animal_id
WHERE
child.animal_id IS NULL
OR
child.parentof_id = grandchild.animal_id -- reapply JOIN condition in WHERE clause
The null check ensures that the filter is applied only if the LEFT JOIN condition is actually satisfied.
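The effect of the WHERE-clause filter can be checked on a tiny in-memory database. The sketch below is an illustration, not the SQL the compiler emits: it assumes a simplified single-table schema in which each animal has at most one child via parentof_id, uses flat LEFT JOINs instead of nested ones, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, name TEXT, parentof_id INTEGER);
-- A is the parent of B, B is the parent of C, C has no children.
INSERT INTO animal VALUES (1, 'A', 2), (2, 'B', 3), (3, 'C', NULL);
""")

BASE_QUERY = """
SELECT animal.name, child.name, grandchild.name
FROM animal
LEFT JOIN animal AS child ON animal.parentof_id = child.animal_id
LEFT JOIN animal AS grandchild ON child.parentof_id = grandchild.animal_id
"""
# Keep rows where the optional edge is absent, or where the required
# inner edge is actually satisfied.
FILTER = """
WHERE child.animal_id IS NULL
   OR child.parentof_id = grandchild.animal_id
"""
ORDER = "ORDER BY animal.animal_id"

unfiltered_rows = conn.execute(BASE_QUERY + ORDER).fetchall()
filtered_rows = conn.execute(BASE_QUERY + FILTER + ORDER).fetchall()
```

Without the filter, animal B appears with a child (C) and a NULL grandchild. With the filter, that row is dropped, while C, which has no child at all, is kept with NULLs for both optional outputs.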
Once we have query cardinality estimation set up, it should be possible to extend that system to offer a paging / streaming mechanism that avoids overwhelming clients with large result sets all at once.
This has a cause related to #156: the generated Gremlin code assumes that the edge data is stored in a field named after the edge type in question. However, if the edge class is abstract, this is not the case.
Resolving this issue might be challenging and may require a lot of work, since Gremlin (for the most part) is not aware of the database schema and inheritance structure.
Having a @tag directive whose value is never used is semantically wrong, since the directive could and should simply be removed. This should raise an error, but currently does not.
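A rough sketch of such a check is shown below. It works on the raw query text with regular expressions, which is an assumption made for brevity; a real check would more likely walk the parsed query. It also assumes that tagged values are referenced as "%tag_name" inside filter arguments, and the function name find_unused_tags is hypothetical.

```python
import re

# Matches declarations such as: @tag(tag_name: "parent_name")
TAG_PATTERN = re.compile(r'@tag\(\s*tag_name:\s*"([^"]+)"\s*\)')
# Matches uses such as: "%parent_name" (assumed reference syntax)
TAG_USE_PATTERN = re.compile(r'"%([A-Za-z_][A-Za-z0-9_]*)"')


def find_unused_tags(graphql_query):
    """Return the set of tag names that are declared but never referenced."""
    declared = set(TAG_PATTERN.findall(graphql_query))
    used = set(TAG_USE_PATTERN.findall(graphql_query))
    return declared - used
```

A caller could raise an error whenever the returned set is non-empty.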
It could be interesting to add support for AgensGraph: http://www.agensgraph.com/
I am looking for support for Huawei Graph Engine Service (cloud), which supports "pure" Gremlin. I first need to ask whether they are customizing their own dialect. I think this shouldn't be that hard; I can start with a driver change.
Q: What is the difference between the OrientDB Gremlin "dialect" and pure Gremlin? What should I do to support pure Gremlin?
Q: How do you construct a typical GraphQL schema for a graph? I was even thinking of creating one very large root GraphQL schema covering all labels, properties, etc.
Any help from real-world experience is welcome.
Thank you.
Assume the following schema:
type Foo {
name: String
out_ParentEdgeType: [Foo]
out_ChildEdgeType: [Foo]
}
where the ParentEdgeType is an abstract superclass of the ChildEdgeType.
The following query then gets incorrectly compiled in MATCH:
{
Foo {
name @output(out_name: "name")
out_ParentEdgeType @optional @filter(op_name: "has_edge_degree", value: ["$degree"]) {
name
}
}
}
The issue is that has_edge_degree assumes that the edge is stored as a field named out_ParentEdgeType on the Foo vertex. However, this is not always the case: if the edge is actually of type ChildEdgeType (a subclass of ParentEdgeType), it will instead be stored in the out_ChildEdgeType field on the Foo vertex.
A possible resolution would be to switch to using the outE() operator instead, which correctly accounts for inheritance.
Hey guys!
First of all, thanks a lot for this awesome work. I was testing the compiler in combination with Gremlin. The following GraphQL is mentioned in your Readme, but causes a NotImplementedError when trying to generate a Gremlin statement out of it:
Animal {
name @output(out_name: "name")
out_Animal_ParentOf @fold {
_x_count @filter(op_name: ">=", value: ["$min_children"])
@output(out_name: "number_of_children")
name @filter(op_name: "has_substring", value: ["$substr"])
@output(out_name: "child_names")
}
}
Is it a bug, or is it just not implemented?
Many thanks!
We currently support only querying data stored on vertices. Edges are used only as a means to get from one vertex to another.
However, data could also in principle be stored on edges. This is something we should consider supporting.
For compiling to MATCH, the following OrientDB issue is a blocker: orientechnologies/orientdb#7802
Currently, it is not possible to do the following:
{
Animal {
name @output(out_name: "animal_name")
out_Entity_Related {
... on Species {
name @output(out_name: "animal_species_name")
}
... on Animal {
name @output(out_name: "related_animal_name")
}
}
}
}
Maybe there's a way to implement it.
To match the semantics of the GraphQL compiler, recursive common table expressions (CTEs) are required. SQL backends are good at pushing predicates down into subqueries and CTEs; however, this does not generally extend to recursive CTEs. This means that it is very easy to write a recursive CTE that will scan an entire table, even if all but a few starting points of that recursion are eventually discarded later.
Using the query
{
Animal {
name @output(out_name: "animal_name")
@filter(op_name: "in_collection", value: ["$names"])
out_Animal_LivesIn @optional {
name @output(out_name: "location_name")
}
out_Animal_ParentOf @recurse(depth: 2) {
name @output(out_name: "animal_or_descendant_name")
}
}
}
as an example, this is addressed with the following algorithm:
SELECT
animal.name AS animal_name,
location.name AS location_name
FROM
animal
LEFT JOIN animal_livesin ON animal_livesin.animal_id = animal.animal_id
LEFT JOIN location ON location.location_id = animal_livesin.livesin_id
WHERE
animal.name IN :names
WITH base_cte AS ( -- the actual name of the CTE is an anonymous table name
SELECT
animal.name as animal_name,
location.name as location_name,
animal.animal_id as link_column -- the actual name of the column is an anonymous column name
FROM
animal
LEFT JOIN animal_livesin ON animal_livesin.animal_id = animal.animal_id
LEFT JOIN location ON location.location_id = animal_livesin.livesin_id
WHERE
animal.name IN :names
)
The recursive CTE carries a depth column, __depth_internal_name, which keeps track of recursion depth per the compiler's semantics.
WITH RECURSIVE recursive_cte AS (
-- anchor query, starts with trivial semantics with each animal as its own parent
SELECT
base_cte.link_column AS animal_id,
base_cte.link_column AS parentof_id,
0 AS __depth_internal_name
FROM
base_cte
UNION ALL
-- recursive query
SELECT
recursive_cte.animal_id,
animal_parentof.parentof_id,
-- increment the depth
recursive_cte.__depth_internal_name + 1 AS __depth_internal_name
FROM
animal_parentof
JOIN recursive_cte ON recursive_cte.parentof_id = animal_parentof.animal_id
WHERE
recursive_cte.__depth_internal_name < :depth -- depth from recurse directive
)
WITH recursive_cte_outputs AS (
SELECT
animal.name AS animal_or_descendant_name,
recursive_cte.animal_id AS recursive_link_column -- anonymously aliased column
FROM
recursive_cte
JOIN animal on animal.animal_id = recursive_cte.parentof_id
JOIN base_cte ON recursive_cte.animal_id = base_cte.link_column
)
SELECT
base_cte.animal_name,
base_cte.location_name,
recursive_cte_outputs.animal_or_descendant_name
FROM
base_cte
JOIN
recursive_cte_outputs ON base_cte.link_column = recursive_cte_outputs.recursive_link_column
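The overall shape of this approach can be tried out end to end with SQLite. The sketch below is a simplified illustration rather than the compiler's actual output: it drops the location join, uses a plain depth column name, filters on a single name instead of a collection, and the table contents are made up. The point is that the recursion is anchored on the already-filtered base_cte rather than on the whole animal table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE animal_parentof (animal_id INTEGER, parentof_id INTEGER);
INSERT INTO animal VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
-- Parent chain: E -> A -> B -> C -> D
INSERT INTO animal_parentof VALUES (5, 1), (1, 2), (2, 3), (3, 4);
""")

QUERY = """
WITH RECURSIVE
base_cte AS (
    -- non-recursive part: filters are applied here, before any recursion
    SELECT animal.name AS animal_name, animal.animal_id AS link_column
    FROM animal
    WHERE animal.name = :name
),
recursive_cte(animal_id, parentof_id, depth) AS (
    -- anchor: each selected animal is its own descendant at depth 0
    SELECT base_cte.link_column, base_cte.link_column, 0
    FROM base_cte
    UNION ALL
    -- recursive step: follow one more ParentOf edge
    SELECT recursive_cte.animal_id, animal_parentof.parentof_id, recursive_cte.depth + 1
    FROM animal_parentof
    JOIN recursive_cte ON recursive_cte.parentof_id = animal_parentof.animal_id
    WHERE recursive_cte.depth < :depth
)
SELECT base_cte.animal_name, animal.name
FROM base_cte
JOIN recursive_cte ON recursive_cte.animal_id = base_cte.link_column
JOIN animal ON animal.animal_id = recursive_cte.parentof_id
ORDER BY recursive_cte.depth, animal.name
"""

rows = conn.execute(QUERY, {"name": "A", "depth": 2}).fetchall()
```

With a depth of 2 and a starting animal of A, the result contains A itself, its child B, and its grandchild C. D is three edges away and is not reached, and E is never used as a starting point.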
Compiling the following legal GraphQL query fails with the TypeError below. It seems that a BinaryComposition object is somehow constructed with a None as one of the sub-expressions.
{
Animal {
name @output(out_name: "animal_name")
uuid @filter(op_name: "between", value: ["$uuid_lower_bound","$uuid_upper_bound"])
in_Animal_ParentOf @optional
@filter(op_name: "has_edge_degree", value: ["$number_of_edges"]) {
out_Entity_Related {
... on Event {
name @output(out_name: "related_event")
}
}
}
}
}
Error:
graphql_compiler/compiler/common.py:50: in compile_graphql_to_match
schema, graphql_string, type_equivalence_hints)
graphql_compiler/compiler/common.py:94: in _compile_graphql_generic
type_equivalence_hints=type_equivalence_hints)
graphql_compiler/compiler/ir_lowering_match/__init__.py:121: in lower_ir
compound_match_query)
graphql_compiler/compiler/ir_lowering_match/optional_traversal.py:563: in lower_context_field_expressions
match_traversals, current_visitor_fn)
graphql_compiler/compiler/ir_lowering_match/optional_traversal.py:532: in _lower_non_existent_context_field_filters
new_filter = step.where_block.visit_and_update_expressions(visitor_fn)
graphql_compiler/compiler/blocks.py:174: in visit_and_update_expressions
new_predicate = self.predicate.visit_and_update(visitor_fn)
graphql_compiler/compiler/expressions.py:676: in visit_and_update
new_left = self.left.visit_and_update(visitor_fn)
graphql_compiler/compiler/expressions.py:680: in visit_and_update
return visitor_fn(BinaryComposition(self.operator, new_left, new_right))
graphql_compiler/compiler/expressions.py:660: in __init__
self.validate()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = BinaryComposition(('&&', None, BinaryComposition(('||', BinaryComposition(('&&', BinaryComposition(('=', Variable(('$n...rentOf',)))), Variable(('$number_of_edges', <graphql.type.definition.GraphQLScalarType object at 0x1056361a8>))))))))))
def validate(self):
"""Validate that the BinaryComposition is correctly representable."""
_validate_operator_name(self.operator, BinaryComposition.SUPPORTED_OPERATORS)
if not isinstance(self.left, Expression):
raise TypeError(u'Expected Expression left, got: {} {}'.format(
> type(self.left).__name__, self.left))
E TypeError: Expected Expression left, got: NoneType None
graphql_compiler/compiler/expressions.py:668: TypeError
Thanks to @kaleagore for the report.
Not sure if this is already the case -- verify and fix if needed.
The code might not enforce this at the moment. Fix it and add a test.
Are you planning to support graphql mutations?
The current OrientDB schema used for snapshot tests is missing fields and classes that exist in the test GraphQL schema.
We should bring them back into sync, and add a test to make sure that they don't diverge again.
When compiling, we accept a GraphQL schema parameter. We should validate that the schema passed this way includes all the required scalar types and has all the required directives.
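A minimal sketch of such a validation step follows. It deliberately works on plain collections of names rather than on a schema object, so that it does not assume any particular GraphQL library API. The directive set below only lists directives that appear in queries in this document, and the scalar set is a hypothetical placeholder; both would need to be replaced with whatever the compiler actually requires.

```python
# Directives that appear in the example queries; treat this set as illustrative.
REQUIRED_DIRECTIVES = frozenset({"filter", "fold", "optional", "output", "recurse", "tag"})
# Hypothetical placeholder scalars; substitute the compiler's real requirements.
REQUIRED_SCALARS = frozenset({"Date", "DateTime", "Decimal"})


def find_missing_schema_elements(directive_names, scalar_names):
    """Return the required directives and scalars absent from the schema, sorted by name."""
    return {
        "directives": sorted(REQUIRED_DIRECTIVES - set(directive_names)),
        "scalars": sorted(REQUIRED_SCALARS - set(scalar_names)),
    }
```

The compile entry points could call a check like this first and raise a descriptive error if either list is non-empty.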
This can use a constructed SQLAlchemy MetaData object to construct the GraphQL schema from the table objects in the metadata. These tables themselves can be automatically reflected from the database. See https://docs.sqlalchemy.org/en/latest/core/metadata.html for a little background.
In the current examples, all variables are wrapped in double quotes (""). According to the definition of GraphQL variables, this is wrong. It is not critical, as it works anyway. But if you try to integrate the compiler with third-party libraries, this might become an issue.
Instead of using:
{
Animal {
name @output(out_name: "animal_name")
color @filter(op_name: "=", value: ["$animal_color"])
}
}
You should write:
query($animal_color: String!) {
Animal {
name @output(out_name: "animal_name")
color @filter(op_name: "=", value: [$animal_color])
}
}
It appears that the Python port of GraphQL.js is less strict about enforcing the "no double-underscored fields in the schema" policy than the GraphQL.js library itself. As a result, the schemas generated by the newest version of the compiler cannot be parsed by the original JavaScript GraphQL library.
This is a very unfortunate problem. Sadly, I think the least painful solution would be to rename our __count field to something like _x_count, signifying that it's an extension field via the _x_ prefix. Single-underscored fields are allowed to appear in the schema, so this should address the problem for users relying on non-Python GraphQL libraries.
Unfortunately, this will be a breaking change for the GraphQL compiler, and any queries that rely on __count will have to be changed to use _x_count instead.
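For users with many stored queries, the rename could be applied mechanically. The helper below is a hypothetical sketch, not part of the compiler: it rewrites standalone __count tokens in query text and leaves longer identifiers that merely contain that substring alone.

```python
import re

# Match __count only when it is not part of a longer identifier.
_OLD_COUNT_FIELD = re.compile(r'(?<![A-Za-z0-9_])__count(?![A-Za-z0-9_])')


def migrate_count_field(graphql_query):
    """Return the query text with the __count meta-field renamed to _x_count."""
    return _OLD_COUNT_FIELD.sub('_x_count', graphql_query)
```

Because this operates on raw text, it would also rewrite a string literal that is exactly "__count" (for example an out_name value), so the results are worth reviewing by hand.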
$ flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
Currently, doing queries like the following is not allowed:
{
Animal {
out_Animal_ParentOf @optional {
out_Animal_FedAt {
name @output(out_name: "name")
}
}
}
}
This is because we have no way to traverse out of an optional vertex in MATCH, so this issue is a blocker: orientechnologies/orientdb#7803
This is blocked by #212. This feature allows for any number of required edges to be compiled by the SQL backend.
Hi guys,
I am thinking about how to use this compiler from Java. One idea is to create a generator that produces all possible query combinations, outputs each one in some Java-consumable form, and then loads all of these queries into the Java app.
What do you think, or is there a better way?
There is an example in your Readme using an out_Animal_RelatedTo field, which does not exist in your schema.
Off a clean master branch, running pipenv lock throws an error:
Locking [dev-packages] dependencies...
Warning: Your dependencies could not be resolved. You likely have a mismatch in your sub-dependencies.
You can use $ pipenv install --skip-lock to bypass this mechanism, then run $ pipenv graph to inspect the situation.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
Could not find a version that matches pluggy<0.7,>=0.5,>=0.7
Tried: 0.3.0, 0.3.0, 0.3.1, 0.3.1, 0.4.0, 0.4.0, 0.5.0, 0.5.1, 0.5.1, 0.5.2, 0.5.2, 0.6.0, 0.6.0, 0.6.0, 0.7.1, 0.7.1, 0.8.0, 0.8.0
There are incompatible versions in the resolved dependencies.
Maybe detect that all outputs are optional, and filter out empty-only rows.
Example affected query:
{
Animal {
out_Animal_ParentOf @optional {
name @output(out_name: "child_name")
}
}
}
Animals with no offspring will still return rows, but their data will be empty.
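As a stopgap, such rows could be dropped after the query runs. The sketch below assumes results arrive as dictionaries keyed by output name, with None standing in for missing optional data; both the representation and the function name drop_empty_rows are assumptions made for illustration.

```python
def drop_empty_rows(rows):
    """Keep only the rows in which at least one output value is not None."""
    return [row for row in rows if any(value is not None for value in row.values())]
```

This is only safe when every output in the query sits inside an optional scope, which is the condition the compiler would have to detect first.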
While normalized data representations are great for data quality and cleanliness, they often get in the way of ease of use, data discoverability, and navigation through the database.
Using the Animals schema in the compiler's tests as an example, it would be much easier to find a given animal's grandparents if an out_Animal_Grandparent edge existed. However, this edge is simply a two-fold traversal of the existing in_Animal_ParentOf edge; adding an out_Animal_Grandparent edge would denormalize the schema and would cause difficulties in maintaining the data.
Instead of adding such an edge to the database, we could define a macro that would be expanded by the GraphQL compiler before query compilation. That way, users can submit a query that relies on the out_Animal_Grandparent edge, and the compiler can use the macro system to rewrite that query into an equivalent query that relies only on existing schema elements.
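The core idea of macro expansion can be sketched on a heavily simplified representation. Here a traversal is just a flat list of edge names and a macro maps one edge name to the sequence of real edges it stands for. This representation, the MACRO_EDGES table, and expand_macro_edges are all hypothetical; an actual macro system would have to rewrite the query's syntax tree and handle directives, filters, and outputs.

```python
# Hypothetical macro table: macro edge name -> sequence of existing edges.
MACRO_EDGES = {
    "out_Animal_Grandparent": ("in_Animal_ParentOf", "in_Animal_ParentOf"),
}


def expand_macro_edges(traversal_path, macro_edges=MACRO_EDGES):
    """Replace each macro edge in the path with the real edges it expands to."""
    expanded = []
    for edge_name in traversal_path:
        # Edges that are not macros are kept unchanged.
        expanded.extend(macro_edges.get(edge_name, (edge_name,)))
    return expanded
```

For example, a path of out_Animal_Grandparent followed by out_Animal_LivesIn would expand to two in_Animal_ParentOf steps followed by out_Animal_LivesIn.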
Once required edges are introduced to the SQL backend, the name_or_alias filter will be run against the SQL backend. For this to succeed, there needs to be a SQL backend that supports the list-valued alias field. Postgres is ideal with its native list type.
This issue requires:
name_or_alias
filter to the rootalias
filter only on test backends that