
graphql-compiler's People

Contributors

0xflotus, amartyashankha, benlongo, bojanserafimov, chewselene, colcarroll, cw6515, dependabot[bot], evantey14, gurer-kensho, jcd2020, jmeulemans, juliangoetz, lonerz, lwprogramming, manesioz, michaelashtilmanminkin, obi1kenobi, pmantica1, qqi0o0, yangsong97


graphql-compiler's Issues

Replace SqlBackend with SQLAlchemy Dialect references

Right now the user supplies a backend string, such as 'postgresql', to tell the compiler which backend to compile to. Instead, the user should set things up with a SQLAlchemy dialect, which is less prone to errors from passing an incorrect string, and which is naturally accessible in most SQLAlchemy setups.

Right now, compiler setup looks like

backend = 'postgresql'
sql_metadata = SqlMetadata(backend, sqlalchemy_metadata)
compile_graphql_to_sql(..., sql_metadata)

Instead, the dialect corresponding to the backend can be passed in as

from sqlalchemy.dialects import postgresql
sql_metadata = SqlMetadata(postgresql, sqlalchemy_metadata)
compile_graphql_to_sql(..., sql_metadata)

Manage relationship through a directive

I was wondering if relationships could be managed through a directive, such as gestalt's @relationship.
To me, the use of a directive seems more user-friendly, as it does not force any naming convention on field labels, but, honestly, I may be missing some points.

Add support for pipelined visitor functions

This would allow for a meta-visitor that applies the required visitors in the provided order. This separates the initialization from the steps of the visitors, which is a little more readable.

This could look something like:

visitor_fn_1 = ...
visitor_fn_2 = ...
visitor_fn_3 = ...

visitor_fn = pipeline(
     visitor_fn_1,
     visitor_fn_2,
     visitor_fn_3,
)
block.visit_and_update_expressions(visitor_fn)

as a replacement to

visitor_fn_1 = ...
block.visit_and_update_expressions(visitor_fn_1)
visitor_fn_2 = ...
block.visit_and_update_expressions(visitor_fn_2)
visitor_fn_3 = ...
block.visit_and_update_expressions(visitor_fn_3)
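A minimal sketch of what such a pipeline helper could look like (the name pipeline and the visitor-function signature are assumptions based on the snippet above, not the compiler's actual API):

```python
def pipeline(*visitor_fns):
    """Compose visitor functions into a single visitor.

    Each visitor function takes an expression and returns a (possibly new)
    expression; the composed visitor applies them in the given order.
    """
    def composed_visitor(expression):
        for visitor_fn in visitor_fns:
            expression = visitor_fn(expression)
        return expression
    return composed_visitor
```

With such a helper, block.visit_and_update_expressions(pipeline(fn_1, fn_2, fn_3)) would traverse the block once instead of three times, though each visitor must then tolerate seeing the previous visitor's output.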

Problem with Query Definition in Schema

Your current test schema looks like this:

type RootSchemaQuery {
        Animal: Animal
        BirthEvent: BirthEvent
        Entity: Entity
        Event: Event
        FeedingEvent: FeedingEvent
        Food: Food
        FoodOrSpecies: FoodOrSpecies
        Location: Location
        Species: Species
        UniquelyIdentifiable: UniquelyIdentifiable
    }

If I understood GraphQL and the queries correctly, this should return an [Animal] (a list) instead of a single Animal. But that causes some problems within your compiler.

Allow traversing from within a @fold scope

Allow queries like:

{
    Species {
        in_Animal_OfSpecies @fold {
            out_Entity_Related {
                name @output(out_name: "related_list")
            }
        }
    }
}

We currently don't support traversals from inside a @fold scope.

Add support for optional edges to the SQL backend.

The rule of thumb for mapping an edge to a JOIN statement is that if the edge is required, an INNER JOIN should be used, and if the edge is optional a LEFT JOIN should be used. This applies to all tables involved in both direct and many-to-many JOINs, with one notable exception.

When an edge is required within an optional scope, the compiler semantics state that if the
outer optional edge is present, but the inner required edge is not, this result should be
excluded. For example with the GraphQL query:

{
    Animal {
        name @output(out_name: "name")
        out_Animal_ParentOf @optional {
            name @output(out_name: "child_name")
            out_Animal_ParentOf {
                name @output(out_name: "grandchild_name")
            }
        }
    }
}

An animal that has a child (satisfying the first, optional ParentOf edge) but whose child has no children (failing to satisfy the second, required ParentOf edge) should produce no result. Using a nested INNER JOIN inside the outer LEFT JOIN, like

SELECT
    animal.name as name,
    child.name as child_name,
    grandchild.name as grandchild_name
FROM animal
LEFT JOIN (
    animal AS child
    INNER JOIN animal AS grandchild
        ON child.parentof_id = grandchild.animal_id
) ON animal.parentof_id = child.animal_id

will have a NULL value returned for the grandchild.name property. The LEFT JOIN condition is fulfilled but the INNER JOIN condition is not, which doesn't exclude the result but rather includes it with a NULL value.

To get the correct semantics, the result when the INNER JOIN condition is not fulfilled needs to be filtered out. This is done explicitly by replacing the INNER JOIN with a LEFT JOIN, and then applying the JOIN condition in the WHERE clause to the rows that are non-null from the LEFT JOIN. For this example this looks like:

SELECT
    animal.name as name,
    child.name as child_name,
    grandchild.name as grandchild_name
FROM animal
LEFT JOIN (
    animal AS child
    LEFT JOIN animal AS grandchild -- INNER JOIN replaced with a LEFT JOIN
        ON child.parentof_id = grandchild.animal_id
) ON animal.parentof_id = child.animal_id
WHERE
    child.animal_id IS NULL
    OR
    child.parentof_id = grandchild.animal_id -- reapply JOIN condition in WHERE clause

The null check ensures that the re-applied JOIN condition only filters out rows where the outer LEFT JOIN condition was actually satisfied; rows where the optional edge matched nothing at all pass through untouched.
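The proposed rewrite can be sanity-checked with an in-memory SQLite database. The schema below is a deliberate simplification (each animal row stores at most one parentof_id, and the nested join is written as a subquery), so this is a sketch of the intended semantics rather than the compiler's actual output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, name TEXT, parentof_id INTEGER);
    -- Alice -> Bob -> Carol is a full chain; Dan -> Eve has no grandchild; Frank is childless.
    INSERT INTO animal VALUES
        (1, 'Alice', 2), (2, 'Bob', 3), (3, 'Carol', NULL),
        (4, 'Dan', 5), (5, 'Eve', NULL), (6, 'Frank', NULL);
""")
rows = conn.execute("""
    SELECT animal.name, cg.child_name, cg.grandchild_name
    FROM animal
    LEFT JOIN (
        SELECT child.animal_id AS child_id,
               child.name AS child_name,
               child.parentof_id AS child_parentof_id,
               grandchild.animal_id AS grandchild_id,
               grandchild.name AS grandchild_name
        FROM animal AS child
        LEFT JOIN animal AS grandchild ON child.parentof_id = grandchild.animal_id
    ) AS cg ON animal.parentof_id = cg.child_id
    WHERE cg.child_id IS NULL                      -- no child at all: optional edge, keep the row
       OR cg.child_parentof_id = cg.grandchild_id  -- re-applied JOIN condition
""").fetchall()
print(sorted(rows))
```

Dan is correctly excluded: his child Eve exists (outer optional edge satisfied) but has no children, so the re-applied condition filters the row. Frank, with no children at all, survives via the IS NULL branch.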

Paging / streaming mechanism for large queries

Once we have query cardinality estimation set up, it should be possible to extend that system to offer a paging / streaming mechanism that avoids overwhelming clients with large result sets all at once.

Folding on an abstract edge class does not work correctly in Gremlin

Somewhat of a related cause to #156: the generated Gremlin code assumes that the edge data is stored at a field named to correspond to the edge type in question. However, if the edge class is abstract, this is not the case.

Resolving this issue might be challenging and may require a lot of work, since Gremlin (for the most part) is not aware of the database schema and inheritance structure.

Support Huawei Cloud Service (Gremlin)

I am looking for support for the Huawei Graph Engine Service (cloud), which supports "pure" Gremlin; I first need to ask whether or not they are customizing their own dialect. I think this shouldn't be that hard; I can start with the driver change.

Q: What is the difference between the OrientDB Gremlin "dialect" and pure Gremlin? What should I do to support pure Gremlin?
Q: How do you construct a typical GraphQL schema for a graph? I was even thinking of creating one big root GraphQL schema covering all labels, properties, etc.

Any help from real-world experience is welcome.

Thank you.

Filtering with "has_edge_degree" on an abstract edge class does not work correctly in MATCH

Assume the following schema:

type Foo {
    name: String
    out_ParentEdgeType: [Foo]
    out_ChildEdgeType: [Foo]
}

where the ParentEdgeType is an abstract superclass of the ChildEdgeType.

The following query then gets incorrectly compiled in MATCH:

{
    Foo {
        name @output(out_name: "name")
        out_ParentEdgeType @optional @filter(op_name: "has_edge_degree", value: ["$degree"]) {
            name
        }
    }
}

The issue is that the has_edge_degree implementation assumes the edge is stored as a field named out_ParentEdgeType on the Foo vertex. However, this is not always the case: if the edge is actually of type ChildEdgeType (a subclass of ParentEdgeType), it is instead stored in the out_ChildEdgeType field on the Foo vertex.

A possible resolution would be to switch to using the outE() operator instead, which correctly accounts for inheritance.

NotImplementedError when calling toGremlin in FoldedContextField

Hey guys!

First of all, thanks a lot for this awesome work. I was testing the compiler in combination with Gremlin. The following GraphQL is mentioned in your Readme, but causes a NotImplementedError when trying to generate a Gremlin statement out of it:

{
    Animal {
        name @output(out_name: "name")
        out_Animal_ParentOf @fold {
            _x_count @filter(op_name: ">=", value: ["$min_children"])
                     @output(out_name: "number_of_children")
            name @filter(op_name: "has_substring", value: ["$substr"])
                 @output(out_name: "child_names")
        }
    }
}

Is it a bug, or is it just not implemented?

Many thanks!

Support querying data stored on edges

We currently support only querying data stored on vertices. Edges are used only as a means to get from one vertex to another.

However, data could also in principle be stored on edges. This is something we should consider supporting.

For compiling to MATCH, the following OrientDB issue is a blocker: orientechnologies/orientdb#7802

Support for multiple type coercion in a single scope

Currently, it is not possible to do the following:

{
    Animal {
        name @output(out_name: "animal_name")
        out_Entity_Related {
            ... on Species {
                name @output(out_name: "animal_species_name")
            }
            ... on Animal {
                name @output(out_name: "related_animal_name")
            }
        }
    }
}

Maybe there's a way to implement it.

Implement @recurse directive for SQL backend

To match the semantics of the GraphQL compiler, recursive common table expressions (CTEs) are required. SQL backends are good at pushing predicates down into subqueries and CTEs; however, this does not generally extend to recursive CTEs. This means it is very easy to write a recursive CTE that scans an entire table, even if all but a few starting points of the recursion are eventually discarded.

Using the query

{
    Animal {
        name @output(out_name: "animal_name")
             @filter(op_name: "in_collection", value: ["$names"])
        out_Animal_LivesIn @optional {
            name @output(out_name: "location_name")
        }
        out_Animal_ParentOf @recurse(depth: 2) {
            name @output(out_name: "animal_or_descendant_name")
        }
    }
}

as an example, this is addressed with the following algorithm:

  1. Recursively create the query, treating recursive edges as a black box. For this example, this results in the rough SQL:
SELECT
    animal.name AS animal_name,
    location.name AS location_name
FROM
    animal
LEFT JOIN animal_livesin ON animal_livesin.animal_id = animal.animal_id
LEFT JOIN location ON location.location_id = animal_livesin.livesin_id
WHERE
    animal.name IN :names
  2. Wrap this query as a CTE, and include any link columns in the output. A link column is the column that the recursive clause will later be attached to.
WITH base_cte AS ( -- the actual name of the CTE is an anonymous table name
    SELECT
        animal.name AS animal_name,
        location.name AS location_name,
        animal.animal_id AS link_column -- the actual column name is an anonymous one
    FROM
        animal
    LEFT JOIN animal_livesin ON animal_livesin.animal_id = animal.animal_id
    LEFT JOIN location ON location.location_id = animal_livesin.livesin_id
    WHERE
        animal.name IN :names
)
  3. Construct the recursive clause. Here we only recurse on the columns necessary to JOIN before and after the recursion; output columns are not carried along. The recursion is joined to the CTE of the base query, ensuring that the recursion only starts at the required starting points, no more.
    Also worth noting in the recursive clause is the __depth_internal_name column, which keeps track of the recursion depth per the compiler's semantics.
WITH RECURSIVE recursive_cte AS (
    -- anchor query, starts with trivial semantics with each animal as its own parent
    SELECT
        base_cte.link_column AS animal_id,
        base_cte.link_column AS parentof_id,
        0 AS __depth_internal_name
    FROM
        base_cte
    UNION ALL
    -- recursive query
    SELECT
        recursive_cte.animal_id,
        animal_parentof.parentof_id,
        -- increment the depth
        recursive_cte.__depth_internal_name + 1 AS __depth_internal_name
    FROM
        animal_parentof
        JOIN recursive_cte ON recursive_cte.parentof_id = animal_parentof.animal_id
    WHERE
        recursive_cte.__depth_internal_name < :depth -- depth from recurse directive
)
  4. JOIN the recursive clause to the recursive table (here animal_parentof) to create the output columns, and JOIN back to the base CTE to carry along tag columns.
WITH recursive_cte_outputs AS (
    SELECT
        animal.name AS animal_or_descendant_name,
        recursive_cte.animal_id AS recursive_link_column -- anonymously aliased column
    FROM
        recursive_cte
        JOIN animal ON animal.animal_id = recursive_cte.parentof_id
        JOIN base_cte ON recursive_cte.animal_id = base_cte.link_column
)
  5. Create the final query
SELECT
    base_cte.animal_name,
    base_cte.location_name,
    recursive_cte_outputs.animal_or_descendant_name
FROM
    base_cte
JOIN
    recursive_cte_outputs ON base_cte.link_column = recursive_cte_outputs.recursive_link_column

TypeError caused by BinaryComposition with None sub-expression

Compiling the following legal GraphQL query fails with the TypeError below. It seems that a BinaryComposition object is somehow constructed with None as one of its sub-expressions.

{
    Animal {
        name @output(out_name: "animal_name")
        uuid @filter(op_name: "between", value: ["$uuid_lower_bound","$uuid_upper_bound"])

        in_Animal_ParentOf @optional
                           @filter(op_name: "has_edge_degree", value: ["$number_of_edges"]) {
            out_Entity_Related {
                ... on Event {
                    name @output(out_name: "related_event")
                }
            }
        }
    }
}

Error:

graphql_compiler/compiler/common.py:50: in compile_graphql_to_match
    schema, graphql_string, type_equivalence_hints)
graphql_compiler/compiler/common.py:94: in _compile_graphql_generic
    type_equivalence_hints=type_equivalence_hints)
graphql_compiler/compiler/ir_lowering_match/__init__.py:121: in lower_ir
    compound_match_query)
graphql_compiler/compiler/ir_lowering_match/optional_traversal.py:563: in lower_context_field_expressions
    match_traversals, current_visitor_fn)
graphql_compiler/compiler/ir_lowering_match/optional_traversal.py:532: in _lower_non_existent_context_field_filters
    new_filter = step.where_block.visit_and_update_expressions(visitor_fn)
graphql_compiler/compiler/blocks.py:174: in visit_and_update_expressions
    new_predicate = self.predicate.visit_and_update(visitor_fn)
graphql_compiler/compiler/expressions.py:676: in visit_and_update
    new_left = self.left.visit_and_update(visitor_fn)
graphql_compiler/compiler/expressions.py:680: in visit_and_update
    return visitor_fn(BinaryComposition(self.operator, new_left, new_right))
graphql_compiler/compiler/expressions.py:660: in __init__
    self.validate()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = BinaryComposition(('&&', None, BinaryComposition(('||', BinaryComposition(('&&', BinaryComposition(('=', Variable(('$n...rentOf',)))), Variable(('$number_of_edges', <graphql.type.definition.GraphQLScalarType object at 0x1056361a8>))))))))))

    def validate(self):
        """Validate that the BinaryComposition is correctly representable."""
        _validate_operator_name(self.operator, BinaryComposition.SUPPORTED_OPERATORS)
    
        if not isinstance(self.left, Expression):
            raise TypeError(u'Expected Expression left, got: {} {}'.format(
>               type(self.left).__name__, self.left))
E           TypeError: Expected Expression left, got: NoneType None

graphql_compiler/compiler/expressions.py:668: TypeError

Thanks to @kaleagore for the report.

Snapshot tests' OrientDB schema is out of sync relative to the test GraphQL schema

The current OrientDB schema used for snapshot tests is missing fields and classes that exist in the test GraphQL schema.

Compare:
https://github.com/kensho-technologies/graphql-compiler/blob/c85429d2a56fc5856522429643ce79cce25efda2/graphql_compiler/tests/test_data_tools/schema.sql
vs

We should bring them back into sync, and add a test to make sure that they don't diverge again.

Usage of Variables

In the current examples, all variables are enclosed in double quotes. According to the GraphQL definition, this is wrong. It is not critical, as it works anyway, but if you try to integrate the compiler with 3rd-party libraries, it might become an issue.

Instead of using:

{
    Animal {
        name @output(out_name: "animal_name")
        color @filter(op_name: "=", value: ["$animal_color"])
    }
}

You should write:

query($animal_color: String!) {
    Animal {
        name @output(out_name: "animal_name")
        color @filter(op_name: "=", value: [$animal_color])
    }
}

Custom meta field __count is breaking schema parsing in GraphQL.js

It appears that the Python port of GraphQL.js is less strict about enforcing the "no double-underscored fields in the schema" policy than the GraphQL.js library itself. As a result, the schemas generated by the newest version of the compiler cannot be parsed by the original Javascript GraphQL library.

This is a very unfortunate problem. Sadly, I think the least painful solution would be to rename our __count field to something like _x_count, signifying that it's an extension field via the _x_ prefix. Single-underscored fields are allowed to appear in the schema, so this should address the problem for users relying on non-Python GraphQL libraries.

Unfortunately, this will be a breaking change for the GraphQL compiler, and any queries that rely on __count will have to be changed to use _x_count instead.

cc @jmeulemans @lodrion

8 functions have McCabe complexity > 10

$ flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

  • ./graphql_compiler/compiler/compiler_frontend.py:383:1: C901 '_compile_vertex_ast' is too complex (17)
  • ./graphql_compiler/compiler/ir_lowering_common.py:17:1: C901 'sanity_check_ir_blocks_from_frontend' is too complex (34)
  • ./graphql_compiler/compiler/ir_lowering_common.py:213:1: C901 'optimize_boolean_expression_comparisons' is too complex (13)
  • ./graphql_compiler/compiler/ir_lowering_match.py:284:1: C901 '_translate_equivalent_locations' is too complex (11)
  • ./graphql_compiler/compiler/match_query.py:33:1: C901 '_per_location_tuple_to_step' is too complex (12)
  • ./graphql_compiler/compiler/workarounds/orientdb_eval_scheduling.py:32:1: C901 '_process_filter_block' is too complex (11)
  • ./graphql_compiler/query_formatting/gremlin_formatting.py:83:1: C901 '_safe_gremlin_argument' is too complex (11)
  • ./graphql_compiler/query_formatting/match_formatting.py:69:1: C901 '_safe_match_argument' is too complex (11)

Using from Java

Hi guys,

I am thinking about how to use this compiler from Java. One idea is to create a generator that produces all possible query combinations, outputs each combination as a query in some Java-consumable form, and then bundles all of these queries into the Java app.

What do you think? Is there a better way?

Unable to resolve dependencies for pipenv lock

Off a clean master branch, running pipenv lock throws an error:

Locking [dev-packages] dependencies...

Warning: Your dependencies could not be resolved. You likely have a mismatch in your sub-dependencies.                                                                                                   
  You can use $ pipenv install --skip-lock to bypass this mechanism, then run $ pipenv graph to inspect the situation.                                                                                   
  Hint: try $ pipenv lock --pre if it is a pre-release dependency.
Could not find a version that matches pluggy<0.7,>=0.5,>=0.7
Tried: 0.3.0, 0.3.0, 0.3.1, 0.3.1, 0.4.0, 0.4.0, 0.5.0, 0.5.1, 0.5.1, 0.5.2, 0.5.2, 0.6.0, 0.6.0, 0.6.0, 0.7.1, 0.7.1, 0.8.0, 0.8.0                                                                      
There are incompatible versions in the resolved dependencies.

Allow "virtual" edges to be defined, and expanded using an AST-based macro system

While normalized data representations are great for data quality and cleanliness, they often get in the way of ease of use, data discoverability, and navigation through the database.

Using the Animals schema from the compiler's tests as an example, it would be much easier to find a given animal's grandparents if an out_Animal_Grandparent edge existed. However, this edge is simply a two-fold traversal of the existing in_Animal_ParentOf edge; adding an out_Animal_Grandparent edge would denormalize the schema and cause difficulties in maintaining the data.

Instead of adding such an edge to the database, we could define a macro that would be expanded by the GraphQL compiler before query compilation. That way, users can submit a query that relies on the out_Animal_Grandparent edge, and the compiler can use the macro system to rewrite that query into an equivalent query that relies only on existing schema elements.
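As a toy illustration of the idea, here is a macro-expansion pass over a nested-dict representation of a query. The dict representation, the MACROS table, and the grandparent-as-two-ParentOf-hops expansion are all assumptions made for the sketch; the real compiler would operate on a parsed GraphQL AST, not dicts:

```python
# Toy query "AST": each selection set is a dict mapping field names to
# sub-selections, with None marking a leaf field.
MACROS = {
    # out_Animal_Grandparent is sugar for a two-fold in_Animal_ParentOf traversal
    "out_Animal_Grandparent": lambda sub: {"in_Animal_ParentOf": {"in_Animal_ParentOf": sub}},
}

def expand_macros(selection):
    """Recursively replace macro edges with their expansions."""
    expanded = {}
    for field, sub in selection.items():
        if isinstance(sub, dict):
            sub = expand_macros(sub)  # expand macros in nested scopes first
        if field in MACROS:
            expanded.update(MACROS[field](sub))
        else:
            expanded[field] = sub
    return expanded

query = {"Animal": {"name": None, "out_Animal_Grandparent": {"name": None}}}
result = expand_macros(query)
print(result)
```

The expanded query relies only on existing schema elements, so it can be handed to the compiler unchanged; the user never sees the rewrite.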

Implement List-valued columns for the PostgreSQL backend

Once required edges are introduced to the SQL backend, the name_or_alias filter will run against it. For this to succeed, there needs to be a SQL backend that supports list-valued alias fields; Postgres is ideal with its native array type.

This issue requires:

  1. A test to be introduced that applies the name_or_alias filter to the root
  2. Modification to the SQL test harness that introduces the alias filter only on test backends that
    support it (postgres)
  3. Changing the default test dialect to Postgres from SQLite, so that compiled query tests have a
    full featured backend.
