
dx's Introduction

Dx


Dx enables you to write Elixir code as if all your Ecto data is already (pre)loaded.

Under the hood, Dx translates your code (defined using defd) into a version that loads data automatically when needed. It can even translate parts of your code into database queries, which is even more efficient, without you having to implement any data loading at all.

Example

defmodule MyApp.DataLogic do
  import Dx.Defd

  defd published_lists_with_no_tasks(user) do
    Enum.filter(MyApp.Schema.List, fn list ->
      list.published? and
        Enum.count(list.tasks) == 0 and
        list.created_by_id == user.id
    end)
  end
end

This can be called using

Dx.Defd.load!(MyApp.DataLogic.published_lists_with_no_tasks(user))

and will be fully translated to a database call.
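For illustration, the generated query could look roughly like this (illustrative SQL only; the table and column names are assumptions, and the exact query Dx produces depends on the schema and its query generation):

```sql
-- Illustrative sketch; assumes lists/tasks tables with these columns.
SELECT l.*
FROM lists AS l
WHERE l.published = TRUE
  AND l.created_by_id = $1
  AND NOT EXISTS (
    SELECT 1 FROM tasks AS t WHERE t.list_id = l.id
  );
```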

It still works the same when you call other defd functions, so you can organize your code cleanly.
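For example, the filter condition above could be split into its own defd function (a hypothetical refactoring of the example, not code from the Dx docs):

```elixir
defmodule MyApp.DataLogic do
  import Dx.Defd

  # defd functions can call each other like regular functions
  defd created_by?(list, user) do
    list.created_by_id == user.id
  end

  defd published_lists_with_no_tasks(user) do
    Enum.filter(MyApp.Schema.List, fn list ->
      list.published? and Enum.count(list.tasks) == 0 and created_by?(list, user)
    end)
  end
end
```

Dx can still follow the call into created_by?/2 when translating the filter.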

Installation

Add dx to your list of dependencies in mix.exs:

def deps do
  [
    {:dx, "~> 0.3.0"}
  ]
end

Add this line to the top of your Ecto schema modules (replace MyApp.Repo with your Ecto repo module)

use Dx.Ecto.Schema, repo: MyApp.Repo

Configure your repo in config.exs (replace MyApp.Repo with your Ecto repo module)

config :dx, repo: MyApp.Repo

Import the formatter rules in .formatter.exs:

[
  import_deps: [:dx]
]

Background

Most server backends for web and mobile applications are split between the actual application and at least one database. In their day-to-day programming, most Elixir developers have to keep that in mind and think about how to store data in the database, and when and how to load it. It's so deeply ingrained that we often take this problem for granted, having integrated it into how we think about code and code architecture.

For example, Phoenix (the most popular web framework for Elixir) suggests structuring apps into contexts: modules that act as a boundary (or interface) to the rest of the code, within which data is loaded and returned. Since a context is a generic interface, the simplest approach is to load all data that's possibly needed and return it. However, as the app grows in functionality and thus complexity, this may become a lot of data, and it's still necessary to think about what to return, where it's needed, and how to slice it.

Imagine this problem did not exist. Enter Dx.

With Dx, Elixir developers don't have to think about loading data from the database at all. You just write Elixir code, as if all data is already loaded and readily available.

How it works

When working with data in the database, you define Elixir functions using defd instead of def (the regular Elixir function definition). The defd function must be imported from the Dx.Defd module. Within defd functions, you can write regular Elixir code, accessing all fields and associations as if they're already loaded. You can also call other defd functions and structure your code in modules as usual.

When the app is compiled, Dx translates your defd code into multiple versions with different ways to load data:

  • Data loading: Any data that might need to be loaded is wrapped in a check that either returns the already loaded data, or returns a "data requirement". Dx runs the code at the entry point (the first function that's a defd function) and either receives the result, or receives a number of data requirements. These are loaded, using the dataloader library under the hood. Then the code is run again, this time either returning the result, or more data requirements, and so on.
  • Data querying: Parts of the code may be translated to "data scopes", which are used to generate database queries out of your code. For example, using the standard library function Enum.filter in a defd function will try to translate the condition (the anonymous function passed as second argument) into a database query. When successful, the data will not be loaded and then filtered in Elixir, but will already be filtered in the database.
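As an illustration of the querying path (a sketch; the Task schema and its priority field are assumptions):

```elixir
defd high_priority_tasks do
  # The condition uses only translatable constructs (field access and ==),
  # so Dx can filter in the database instead of loading every task.
  Enum.filter(MyApp.Schema.Task, fn task -> task.priority == "high" end)
end
```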

All this happens automatically in the background. Parts of the work are done when compiling your code. Other parts are done when running it.

Caveats

Dx is designed with great care for developer experience. You can just start using it, and will get warnings with explanations if something should or must be done differently. It still helps to understand the main limitations:

Pure functions

Dx translates your code into different other versions of it. The translated versions may then be run any number of times, more or less often than the original would have been run. Thus, any code defined using defd should be functionally pure: it should not have any side effects.

  • When the same code is run with the same arguments, it must always return the same result. Examples for non-pure code are using date and time, or random numbers.
  • defd functions should also not modify any external state, such as writing data to the database or printing text to the console, unless it's acceptable for the modification to be applied multiple times.
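For example, depending on the current time inside a defd function makes it non-pure. A sketch of the problem and one way around it (the overdue?/due_at names are hypothetical):

```elixir
# Not pure: DateTime.utc_now/0 returns a different result on each call,
# so re-running the translated code may yield different results.
defd overdue?(task) do
  DateTime.compare(task.due_at, DateTime.utc_now()) == :lt
end

# Better: pass the current time in as an argument from the entry point,
# so the function stays pure and repeatable.
defd overdue?(task, now) do
  DateTime.compare(task.due_at, now) == :lt
end
```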

Calling non-defd functions

You can call non-defd functions from within defd functions. However, Dx can't "look into" them. No data inside them will be loaded, and they can never be translated to database queries. They will also be run any number of times, so they should be pure functions as well.

Dx will ask you to wrap the call in a non_dx/1 function call. This is just to make clear that the called function is not defined using defd when reading the code.
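A sketch of what that looks like (the MyApp.Formatting.format_title/1 helper is hypothetical):

```elixir
defd list_summary(list) do
  # format_title/1 is a regular (non-defd) helper; wrapping the call in
  # non_dx/1 makes clear that Dx treats it as an opaque function.
  non_dx(MyApp.Formatting.format_title(list.title))
end
```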

Finding good entry points

Any time a defd function is called from a regular Elixir function, that's an entry point. That's where any needed data will be loaded.

Dx will ask you to wrap the call in a Dx.Defd.load!/1 function call. This is just to make clear that the called function is an entry point to defd land and data may be loaded here.

It may help to create dedicated modules for all defd functions. They are usually the core of the application, containing much of the (business) logic. Any code calling into them (the entry points), in contrast, lives outside these modules, for example in an API function, a Phoenix controller, or an Oban worker. This is where the data is loaded, whereas the defd modules consist only of pure functions with (business) logic.
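For instance, a Phoenix controller could serve as the entry point (a hypothetical sketch, reusing the example function from above):

```elixir
defmodule MyAppWeb.ListController do
  use MyAppWeb, :controller

  def index(conn, _params) do
    # Entry point: this is where Dx actually loads the needed data.
    lists =
      Dx.Defd.load!(
        MyApp.DataLogic.published_lists_with_no_tasks(conn.assigns.current_user)
      )

    render(conn, :index, lists: lists)
  end
end
```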

Filter conditions in Elixir vs. SQL

Conditions can behave quite differently in SQL vs. Elixir. In the future, Dx will fully translate all nuances correctly, but for now, you have to keep that in mind yourself.

  • NULL never matches anything in SQL, but nil does in Elixir. For example, title != "TODO" with title = nil will match in Elixir, but not in SQL. Thus, nil cases must be handled individually: is_nil(title) or title != "TODO"
  • Dx joins has_one and belongs_to associations using LEFT JOIN in SQL. This means you can happily access association chains, even if intermediate association parts do not exist. This would crash in Elixir, but in SQL, all fields just appear as NULL. Thus, the presence of associations should be checked individually: not is_nil(list.creator) and is_nil(list.creator.deleted_at)

Currently supported

Syntax

  • Defining functions using defd
    • with multiple clauses
    • with patterns in arguments
    • without guards
  • Calling all Enum functions
  • Calling all Kernel functions without a function argument
  • fn without patterns in arguments or guards
  • case with patterns
  • cond
  • ==

Translatable to database queries

Functions

  • Enum.count/1
  • Enum.filter/2

will be translated to database queries, if both

  • the first argument is either
    • a schema module, f.ex. Enum.filter(Todo.Task, fn task -> task.priority == "high" end)
    • the result of another function listed above
  • the second argument (if any) consists only of functions listed above or:
    • ==
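For instance, combining the two listed functions (using the Todo.Task schema and priority field from the example above) should yield a single database query:

```elixir
defd high_priority_count do
  # Enum.filter/2 with an ==-only condition is translatable, and so is
  # Enum.count/1 over its result, so this can become one COUNT query.
  Todo.Task
  |> Enum.filter(fn task -> task.priority == "high" end)
  |> Enum.count()
end
```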

Roadmap

Check the Dx roadmap board for updates.

Inferred schema fields (deprecated)

Dx is an Elixir library that allows adding inferred properties to Ecto schemas, defined by rules based on fields, associations or other inferred properties. You can then load/query them as if they were Ecto fields or associations using Dx's API.

It allows you to write declarative and easy-to-read domain logic by defining WHAT the rules are without having to care about HOW to execute them.

Under the hood, Dx's evaluation engine loads associations as needed concurrently in batches and can even translate your logic to Ecto queries directly.

If you're new to Dx, the best place to start are the Guides.

Special thanks

This project is sponsored and kindly supported by Team Engine.

If you'd like to join us working on Dx and Refactory as a contractor, please reach out to @arnodirlam.

dx's People

Contributors

arnodirlam · idabmat · jay-meister · ftes · nicolasdabreo · kianmeng


dx's Issues

Use predicates in rule modules directly

Currently, predicates in rule modules can only be used by adding the whole module via the :extra_rules option. When passing multiple modules via :extra_rules, it can quickly get confusing and error-prone, because all predicates from all these modules go into the same (imagined) namespace and can override each other, if defined on the same type. This can be a powerful means to achieve layering, and is similar to inheritance in object-oriented programming, but it should not be the default or only option.

A more explicit and confined way to split rules into (rule) modules and using them is to use predicates in rule modules directly.

Syntax

Option 1

For example, the syntax could look like:

defmodule Todo.Rules.Authorisation do
  use Infer.Rules, for: Todo.User

  infer is_admin?: true, when: %{roles: %{name: "admin"}}
end

defmodule Todo.List do
  # in condition
  infer can_delete?: true, when: %{args: %{current_user: %{{Todo.Rules.Authorisation, :is_admin?} => true}}}
  infer can_delete?: false

  # in ref path
  infer can_delete?: {:ref, [:args, :current_user, {Todo.Rules.Authorisation, :is_admin?}]}
end

While this seems the most logical syntax at first glance, it's also rather hard to read, because the main piece of information, the predicate name is_admin?, is rather nested.

Option 2

A better syntax could thus be:

# in condition
infer can_delete?: true, when: %{args: %{current_user: %{is_admin?: {Todo.Rules.Authorisation, true}}}}

# in ref path (unchanged)
infer can_delete?: {:ref, [:args, :current_user, {Todo.Rules.Authorisation, :is_admin?}]}

This preserves the sequence of args, current_user and is_admin?, and puts the rule module next to the value, conveying "the source of the value".

Allowing both

Alternatively, both syntaxes could be allowed.

Rule scope

When using a predicate in a rule module, that predicate should only "see" other predicates in the same module, as well as the usual predicates defined directly in the schema type's module.

Interaction with extra_rules

This way of using rule modules should become the default go-to way. extra_rules could be kept as a way to override rules, including those referenced in other modules (which is also easier to see in the second syntax above), or it could be deprecated.

Add telemetry_options passed to dataloader and ecto

Currently, queries run by dataloader or Infer's query functions happen under the hood without the ability to pass additional options to them.

It should be possible to do this

  1. when calling the Infer API
    Infer.load!(..., telemetry_options: [...])
  2. globally and dynamically
    config :infer, telemetry_options: &MyApp.Infer.Config.telemetry_options/1
    
    defmodule MyApp.Infer.Config do
      def telemetry_options(atom) do
        [
          logger_metadata: Logger.metadata()
        ]
      end
    end

Handle dataloader errors

When loading a dataloader batch fails (e.g. times out), dataloader returns {:error, e} instead of {source, data}. This is currently not handled.

Also, Result.unwrap! always tries to raise an error, even if the error is an atom, e.g. :timeout. It should only raise valid exceptions.
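A minimal sketch of the intended behavior (plain Elixir, not the actual Result module):

```elixir
defmodule SafeResult do
  # Sketch: only `raise` actual exceptions; wrap plain reasons
  # (such as :timeout) in a RuntimeError instead.
  def unwrap!({:ok, value}), do: value
  def unwrap!({:error, e}) when is_exception(e), do: raise(e)
  def unwrap!({:error, reason}), do: raise("data loading failed: #{inspect(reason)}")
end
```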

Error when no condition for a predicate matches

Currently, when a predicate is defined and all rules ("cases") have a condition, the fallback value is nil when none of the conditions matches.

In order to make the definition of predicates more explicit, and to force users to think about all the cases, this should be changed to raise if no condition matches.

This change is backward-incompatible. To migrate, users must look at each predicate and explicitly define a fallback rule returning nil with no condition, if needed.
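To keep the old behavior for a given predicate, an explicit fallback rule with no condition would be added (a sketch, using the archived? predicate from the example below):

```elixir
# before: nil was returned implicitly when no condition matched
infer archived?: true, when: %{archived_at: {:not, nil}}

# after: the nil fallback is spelled out explicitly
infer archived?: true, when: %{archived_at: {:not, nil}}
infer archived?: nil
```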

Example

defmodule Todo.List do
  infer archived?: true, when: %{archived_at: {:not, nil}}
end

iex> %Todo.List{archived_at: nil}
...> |> Infer.get!(:archived?)

# current result
nil

# new result
** (Infer.MatchError) no rule condition matching for predicate :archived?

Boolean shorthand predicate should return `false` when not matching

Currently, when using the boolean shorthand predicate and the condition doesn't match, nil is returned.

Instead, false should be returned.

Example

defmodule Todo.List do
  infer :archived?, when: %{archived_at: {:not, nil}}
end

iex> %Todo.List{archived_at: nil}
...> |> Infer.get!(:archived?)

# current result
nil

# new result
false

Migration

This change is backward-incompatible.

To migrate, users must go through all conditions that match boolean shorthand predicates against nil, and replace nil with false.
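For example, a rule condition matching the shorthand predicate's old nil result would change like this (the on_track? rule is hypothetical):

```elixir
# before: archived? returned nil when its condition didn't match
infer on_track?: true, when: %{list: %{archived?: nil}}

# after: archived? now returns false instead
infer on_track?: true, when: %{list: %{archived?: false}}
```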

Querying: Explicit (or auto-detected) batching field

Currently, the syntax for querying records as part of a rule result (:query_one, :query_first and :query_all) uses a keyword list for the condition, with implicit and semantics:

defmodule Todo.List do
  infer complete_tasks: {:query_all, Todo.Task, [list_id: {:ref, :id}, completed?: true]}
end

This is inconsistent with and conditions, which are written as maps in all other parts of Infer. The consistent syntax would be:

defmodule Todo.List do
  infer complete_tasks: {:query_all, Todo.Task, %{list_id: {:ref, :id}, completed?: true}}
end

defmodule Todo.Task do
  infer completed?: true, when: %{completed_at: {:not, nil}}
  infer completed?: false
end

The reason for the current implementation is that we need to pick one part of the and condition for batching the internal Ecto query on. This is currently the first part. In the example above, it would be list_id: {:ref, :id}, so evaluating the rule for multiple Todo.List records would generate a query like

SELECT * FROM tasks WHERE completed_at IS NOT NULL AND list_id IN (1, 2, 4)

This allows users to hand-pick which condition to use for batching, but it's very implicit and uses an inconsistent syntax.

Migration

This change is backward-incompatible.

To migrate, adjust the syntax for all :query_one, :query_first and :query_all results.

Implementation steps

The following steps could be implemented in separate PRs, if preferred.

  • Explicit batch_by option

    The first and easiest step is to make the syntax consistent with the rest of Infer. The example above would become:

    infer complete_tasks: {:query_all, Todo.Task, %{list_id: {:ref, :id}, completed?: true}, batch_by: :list_id}
  • Automatic detection in simple cases

    Not all condition parts are applicable or make sense for batching. In the example above, completed? is a predicate, which is defined using a :not condition, which cannot be used for batching, since it's static.

    We could also batch by completed?: true, but since the matching value is always true there wouldn't be any gain.

    The next step could thus be to look at all condition parts and only keep the ones that have a :ref in their value, because that makes them dynamic and thus candidates for batching.

    Then, if only one candidate remains, batch_by can be omitted.

  • Automatic detection in complex cases

    If multiple condition parts remain as candidates for batching, the ideal solution is to implement a heuristic for picking the best-suited one. The best-suited condition part is the one that leads to the fewest batches. In other words, it's the one with the highest cardinality in values matched.

    For example, a condition matching list_id with the values 1, 2, 4 and author_id with the values 6, 6, 7 would pick the list_id, because it has 3 unique values ("cardinality") whereas author_id has only 2. The number of generated batches would thus be 2 (the number of different author_id values). When 3 or more condition parts are involved, it is necessary to count the number of unique combinations of values of the other parts.

    This implementation requires looking at the relevant values (with a cap, f.ex. 25) to determine the field with the highest cardinality.

    It should also support cases where the right side (matched value) is a list of multiple values, and flatten them for the cardinality detection.

    When this is implemented, batch_by becomes fully optional.
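The cardinality detection could be sketched roughly like this (plain Elixir, not Infer's actual internals; field names are taken from the example above):

```elixir
defmodule BatchFieldPicker do
  # Picks the candidate field with the most unique matched values among
  # the given records, inspecting at most `cap` records.
  def pick(records, candidate_fields, cap \\ 25) do
    sample = Enum.take(records, cap)

    Enum.max_by(candidate_fields, fn field ->
      sample
      |> Enum.flat_map(fn record ->
        # a matched value may itself be a list; flatten before counting
        record |> Map.fetch!(field) |> List.wrap()
      end)
      |> Enum.uniq()
      |> length()
    end)
  end
end
```

With list_id values 1, 2, 4 and author_id values 6, 6, 7, this picks :list_id, since it has 3 unique values versus 2.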

Rename `{:all, [...]}` to `{:all_of, [...]}`

To increase readability and avoid confusion with the existing {:all?, condition}, which behaves like Enum.all?/2, the existing {:all, [condition_1, ..., condition_n]} is to be renamed to {:all_of, [condition_1, ..., condition_n]}.
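A hypothetical rule would change as follows (the can_edit? rule and its conditions are illustrative):

```elixir
# before
infer can_edit?: true, when: {:all, [%{archived?: false}, %{deleted_at: nil}]}

# after
infer can_edit?: true, when: {:all_of, [%{archived?: false}, %{deleted_at: nil}]}
```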

Load data on args when passed to query functions

When calling query functions or using query primitives (such as query_all, query_one), Infer can only resolve simple args, no data structures or records.

Query functions should work correctly with rules where

  • Conditions on args are evaluated
  • Refs on args are evaluated where data is already loaded
  • Refs on args are evaluated where data is not loaded

For now, these should be loaded using Infer.Engine and injected into the SQL that Infer.Ecto.Query generates. In a later step, data that's not loaded on args should be translated to SQL itself.

Re-add tests

Since Infer was originally developed in-house and within a private domain, we could not make the tests open-source together with the code. We're working on rewriting the tests using a generic data schema.

Expand, validate & simplify logic before execution

Currently, when executing logic, the mappings and conditions must look up the rules while traversing the requirements. This is true for Dx.Engine and the ecto query builder.

Downsides

  • the lookup is relatively slow (calling functions on modules dynamically) and is executed repeatedly, for each node in the logic tree, for each subject, and for each evaluation after loading data
  • the execution can not be simplified/optimized easily, because there's no easy way to "look ahead"
  • changes in the DSL must often be reflected in the execution, blurring the line of separation of concerns

Solution

We introduce an intermediate data structure, a "plan", based on what's needed in a particular evaluation run (the "logic tree"). This is a subset of the DSL.

The goal is that the Dx.Engine and the ecto query builder receive only the plan and don't need to look up any rules. They only need to understand the plan, and nothing that goes into the plan.

Advantages

  • the plan is created before execution, only once, and can easily be simplified/optimized and then re-used within one evaluation run
  • clear separation of concerns

Documentation Bug?

In basics/04_references.md:44-47, there appears to be a bug. The field on Task is completed_at, but the completed_later? inference uses archived_at:

  infer completed_later?: false, when: %{completed?: false}
  infer completed_later?: false, when: %{list: %{archived?: false}}
  infer completed_later?: true, when: %{archived_at: {:gt, {:ref, [:list, :archived_at]}}}
  infer completed_later?: false

This seems like it should be:

  infer completed_later?: false, when: %{completed?: false}
  infer completed_later?: false, when: %{list: %{archived?: false}}
  infer completed_later?: true, when: %{completed_at: {:gt, {:ref, [:list, :archived_at]}}}
  infer completed_later?: false
