
Logica is a logic programming language that compiles to SQL. It runs on Google BigQuery, PostgreSQL and SQLite.

Home Page: https://logica.dev

License: Apache License 2.0

Python 11.24% Lex 2.69% Jupyter Notebook 85.75% PicoLisp 0.06% Common Lisp 0.19% Vim Script 0.06% Shell 0.01%
datalog sql bigquery language prolog-implementation prolog logica postgresql sqlite trino presto logic-programming

logica's Introduction

Logica: language of Big Data

Logica is an open source declarative logic programming language for data manipulation. Logica is a successor to Yedalog, a language created at Google earlier.

Why?

Logica is for engineers, data scientists, and other specialists who want to use logic programming syntax when writing queries and pipelines for databases and data warehouses. Logica programs run on BigQuery, Postgres and SQLite.

Logica compiles to SQL and gives you access to the power of SQL ecosystem with the convenience of logic programming syntax.

This is useful because SQL engines are orders of magnitude more powerful than state-of-the-art native logic programming engines. For example, BigQuery is a distributed data warehouse, so logic programs written in Logica can easily be parallelized onto thousands of servers. Postgres and SQLite are among the most popular databases, and they are capable of processing substantial volumes of data right on your machine.

We encourage you to try Logica, especially if

  • you already use logic programming and need more computational power, or
  • you already have data in BigQuery, PostgreSQL or SQLite, or
  • you want to learn logic programming and apply it to processing of Big Data.

Support for more SQL dialects and engines is coming in the future.

I have not heard of logic programming. What is it?

Logic programming is a declarative programming paradigm where the program is written as a set of logical statements.
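For instance, a tiny program might state some facts and a rule. This is a sketch in Logica syntax with hypothetical predicates:

```
# Facts: who is whose parent (hypothetical data).
Parent(parent: "Alice", child: "Bob");
Parent(parent: "Bob", child: "Carol");

# Rule: x is a grandparent of z if x is a parent of some y
# who is in turn a parent of z.
Grandparent(grandparent: x, grandchild: z) :-
    Parent(parent: x, child: y),
    Parent(parent: y, child: z);
```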

Logic programming has been developed in academia since the late 1960s. Prolog and Datalog are the most prominent examples of logic programming languages. Logica is a language of the Datalog family.

Datalog and relational databases start from the same idea: think of data as relations, and think of data manipulation as a sequence of operations over these relations. But Datalog and SQL differ in how these operations are described. Datalog is inspired by the mathematical syntax of first-order logic, while SQL follows the syntax of natural language.

SQL was modeled on natural language to give people without formal training in computer programming or mathematics access to databases. This convenience can become costly when the logic that you want to express is non-trivial. There are many examples of hard-to-read SQL queries that correspond to simple logic programs.

How does Logica work?

Logica compiles a logic program into a SQL expression, so it can be executed on BigQuery, a state-of-the-art SQL engine.

Among database theoreticians, Datalog and SQL are known to be equivalent, and indeed the conversion from Datalog to SQL and back is often straightforward. However, there are a few nuances, for example in how to treat disjunction and negation. In Logica we tried to make choices that keep the structure of the resulting SQL as easy to understand as possible, empowering users to write programs that execute efficiently.
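As a sketch of what those constructs look like in Logica: disjunction is written as multiple rules for the same predicate, and negation is written with `~` (as in the primes example below). The predicates here are hypothetical:

```
Number(x) :- x in Range(10);

# Disjunction: Special holds if either rule's body holds.
Special(x) :- Number(x), x == 2;
Special(x) :- Number(x), x == 7;

# Negation: Ordinary holds for numbers that are not Special.
Ordinary(x) distinct :- Number(x), ~Special(x);
```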

Why is it called Logica?

Logica stands for Logic with aggregation.
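A minimal illustration of aggregation, using the `+=` aggregating syntax that the examples below also use (the data here is hypothetical):

```
Purchase(item: "apple", amount: 2);
Purchase(item: "apple", amount: 3);
Purchase(item: "pear", amount: 5);

# Sum amounts per item: += aggregates over all matching facts.
TotalPurchased(item) += amount :- Purchase(item:, amount:);
```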

How to learn?

Learn the basics of Logica with the CoLab tutorial located in the tutorial folder. See examples of using Logica in the examples folder.

The tutorial and examples show how to access Logica from CoLab. You can also install the Logica command-line tool.

Prerequisites

To run Logica programs on BigQuery you will need a Google Cloud project. Once you have a project, you can run Logica programs in CoLab by providing your project id.

To run Logica locally you need Python 3.

To initiate Logica predicate execution from the command line you will need bq, the BigQuery command-line tool. For that, install the Google Cloud SDK.

Installation

A Google Cloud project is the only thing you need to run Logica in Colab; see the Hello World example.

You can install the Logica command with pip as follows.

# Install.
python3 -m pip install logica
# Run:
# To see usage message.
python3 -m logica
# To print SQL for HelloWorld program.
python3 -m logica - print Greet <<<'Greet(greeting: "Hello world!")'

If your PATH includes Python's bin folder, you will also be able to run it simply as

logica - print Greet <<<'Greet(greeting: "Hello world!")'

Alternatively, you can clone the GitHub repository:

git clone https://github.com/evgskv/logica
cd logica
./logica - print Greet <<<'Greet(greeting: "Hello world!")'

Code samples

Here are a couple of examples of what Logica code looks like.

Prime numbers

Find prime numbers less than 30.

Program primes.l:

# Define numbers 1 to 30.
Number(x + 1) :- x in Range(30);

# Defining composite numbers.
Composite(a * b) distinct :- Number(a), Number(b), a > 1, b > 1;

# Defining primes as "not composite".
Prime(n) distinct :- Number(n), n > 1, ~Composite(n);

Running primes.l

$ logica primes.l run Prime
+-------+
| prime |
+-------+
|     2 |
|     3 |
|     5 |
|     7 |
|    11 |
|    13 |
|    17 |
|    19 |
|    23 |
|    29 |
+-------+

Cities with largest beer variety

Let's use the beer variety dataset from plotly.

Let us find the top 5 states with the largest variety of beers. In each state we will pick the city with the largest variety in the state.

Program beer.l:

@Engine("duckdb");

@Ground(Beer);
Beer(..r) :- 
    `('https://github.com/plotly/datasets/blob/master/beers.csv?raw=true')`(..r);

BeersInState(state) += 1 :- Beer(state:);
BeersInCity(state, city) += 1 :- Beer(state:, city:);

ArgMax5(x) = ArgMaxK(x, 5);
BestCityForBeer(state:, city:,
                city_beers: BeersInCity(state, city),
                state_beers: BeersInState(state)) :-
    state in ArgMax5{s -> BeersInState(s)},
    city = ArgMax{c -> BeersInCity(state, c)};

Running beer.l:

$ logica beer.l run BestCityForBeer
+-------+--------------+------------+-------------+
| state | city         | city_beers | state_beers |
+-------+--------------+------------+-------------+
| IN    | Indianapolis | 43         | 139         |
| CO    | Boulder      | 41         | 265         |
| CA    | San Diego    | 42         | 183         |
| TX    | Austin       | 25         | 130         |
| MI    | Grand Rapids | 66         | 162         |
+-------+--------------+------------+-------------+

Feedback

Feel free to create GitHub issues for bugs and feature requests.

Your questions and comments are welcome in our GitHub discussions!


Unless otherwise noted, the Logica source files are distributed under the Apache 2.0 license found in the LICENSE file.

This is not an officially supported Google product.


logica's Issues

Add checks of predicate arguments

It will take time to implement type inference.
In the meantime we could have a simpler check: when a predicate is called (including database tables) and arguments are used, verify that the predicate has the corresponding columns.

Sqlite: Tutorial. OperationalError: no such function: APPROX_COUNT_DISTINCT

Hi,

%%logica OfficeColumn,Offices,EmployeesPerOffice,EmployeesOfOffice
@Engine("sqlite");
Employee(name: "Alice", role: "Product Manager", office: "SEA");
Employee(name: "Bob", role: "Engineer", office: "SEA");
Employee(name: "Caroline", role: "Engineer", office: "LAX");
Employee(name: "David", role: "Data Scientist", office: "LAX");
Employee(name: "Eve", role: "Data Scientist", office: "SEA");

OfficeColumn(office:) :- Employee(office:);

Offices(office:) distinct :- Employee(office:);

EmployeesPerOffice(office:,
                  num_employees? += 1,
                  role_count? Count=role) distinct :-
  Employee(name: unused_name, office:, role:);
  
EmployeesOfOffice(office:, employees? List= name) distinct :-
  Employee(name:, office:);
  

The logs are:

Running predicate: Offices
Running predicate: OfficeColumn
Running predicate: EmployeesOfOffice
Running predicate: EmployeesPerOffice

---------------------------------------------------------------------------

OperationalError                          Traceback (most recent call last)

<ipython-input-20-9c03d875318d> in <module>()
----> 1 get_ipython().run_cell_magic('logica', 'OfficeColumn,Offices,EmployeesPerOffice,EmployeesOfOffice', '@Engine("sqlite");\nEmployee(name: "Alice", role: "Product Manager", office: "SEA");\nEmployee(name: "Bob", role: "Engineer", office: "SEA");\nEmployee(name: "Caroline", role: "Engineer", office: "LAX");\nEmployee(name: "David", role: "Data Scientist", office: "LAX");\nEmployee(name: "Eve", role: "Data Scientist", office: "SEA");\n\nOfficeColumn(office:) :- Employee(office:);\n\nOffices(office:) distinct :- Employee(office:);\n\nEmployeesPerOffice(office:,\n                  num_employees? += 1,\n                  role_count? Count=role) distinct :-\n  Employee(name: unused_name, office:, role:);\n  \nEmployeesOfOffice(office:, employees? List= name) distinct :-\n  Employee(name:, office:);\n  ')

8 frames

/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2115             magic_arg_s = self.var_expand(line, stack_depth)
   2116             with self.builtin_trap:
-> 2117                 result = fn(magic_arg_s, cell)
   2118             return result
   2119 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in logica(line, cell)
    118 @register_cell_magic
    119 def logica(line, cell):
--> 120   Logica(line, cell, run_query=True)
    121 
    122 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in Logica(line, cell, run_query)
    239       sql_runner = RunSQL
    240     result_map = concertina_lib.ExecuteLogicaProgram(
--> 241       executions, sql_runner=sql_runner, sql_engine=engine)
    242 
    243   for idx, predicate in enumerate(predicates):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in ExecuteLogicaProgram(logica_executions, sql_runner, sql_engine)
    215 
    216   concertina = Concertina(config, engine)
--> 217   concertina.Run()
    218   return engine.final_result

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self)
     76   def Run(self):
     77     while self.actions_to_run:
---> 78       self.RunOneAction()
     79 
     80   def ActionColor(self, a):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in RunOneAction(self)
     69     self.running_actions |= {one_action}
     70     self.UpdateDisplay()
---> 71     self.engine.Run(self.action[one_action].get('action', {}))
     72     self.running_actions -= {one_action}
     73     self.complete_actions |= {one_action}

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self, action)
     18       print('Running predicate:', predicate)
     19       result = self.sql_runner(action['sql'], action['engine'],
---> 20                                is_final=(predicate in self.final_predicates))
     21       if predicate in self.final_predicates:
     22         self.final_result[predicate] = result

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in __call__(self, sql, engine, is_final)
    159   # TODO: Sqlite runner should not be accepting an engine.
    160   def __call__(self, sql, engine, is_final):
--> 161     return RunSQL(sql, engine, self.connection, is_final)
    162 
    163 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in RunSQL(sql, engine, connection, is_final)
    142   elif engine == 'sqlite':
    143     statements = parse.SplitRaw(sql, ';')
--> 144     connection.executescript(sql)
    145     if is_final:
    146       return pandas.read_sql(statements[-1], connection)

OperationalError: no such function: APPROX_COUNT_DISTINCT

Benchmark performance

It would be great to do a somewhat rigorous analysis of whether there are performance issues when a query is written in Logica vs SQL.

Improve tutorial

I am very curious about the experience of newcomers with the tutorial. I am sure it can be improved to make onboarding easier.

Occasional race condition during compilation

Hi @EvgSkv!

Problem statement:

I am noticing a rather infrequent crash during compilation:

  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 457, in __init__
    rules = self.UnfoldRecursion(rules)
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 506, in UnfoldRecursion
    annotations = Annotations(rules, {})
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 138, in __init__
    self.annotations = self.ExtractAnnotations(
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 396, in ExtractAnnotations
    ql = expr_translate.QL(Thrower(), ThrowException, ThrowException,
  File "/.venv/lib/python3.9/site-packages/logica/compiler/expr_translate.py", line 146, in __init__
    self.built_in_functions = copy.deepcopy(self.bulk_functions)
  File "/usr/local/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.9/copy.py", line 229, in _deepcopy_dict
    for key, value in x.items():
RuntimeError: dictionary changed size during iteration

(the line numbers might be a bit shifted because I caught this on a slightly outdated compiler version; this shouldn't be relevant here).

This is coming from inside the builtin copy.deepcopy, and indicates that the number of keys of the dict in self.bulk_functions was modified while it was being copied.
I am getting this in a Python process that may concurrently run several compilation threads (LogicaProgram(parsed_logica, user_flags=flags).FormattedPredicateSql(...)).

So, the guess is that one thread crashes during copy.deepcopy while another somehow modified the same dict.


Sentiment:

I know that compiler thread-safety is nowhere guaranteed, so I would totally understand if you chose to disregard this issue.

That said, I chose to post here because I am a little dumbfounded: I see this error periodically happening on this threaded service, but I can't reproduce it directly. I am also out of ideas for now as to what causes it, because going through the code I reach the following conclusions:

  1. If compilation in thread 2 is modifying the dict in self.bulk_functions that is simultaneously being copied by another compilation in thread 1, then thread 2 is at a stage of compilation that is before or after making its own copy.deepcopy.
  2. It can't be AFTER, because once the thread 2 compilation has made its own deepcopy, it will operate on the copy and will not touch the original.
  3. All operations that the compiler does to the bulk_functions dict BEFORE making a deep copy must have already been done inside the thread 1 compilation (because it has reached copy.deepcopy and crashed), and looking at the code I see that once bulk_functions is populated (prior to copying), it is not changed.

So, I can't even put a finger on a specific place where thread 2 might be mutating bulk_functions during its own compilation to cause a crash in thread 1's compilation at copy.deepcopy.

What am I missing? Do you have an idea where I should be looking? Does the reasoning make sense? Is the whole direction wrong?

P.S.

While analysing the above, I was trying to understand the relevance of CleanDictionary. I see it is called sometimes, but I never actually reach deletion of any keys when I step through with the debugger. Either way, it shouldn't be related to the above issue, because CleanDictionary operates on the deepcopy after it has been made, so it can't touch the original dict, right?

Any ideas on this subject would be appreciated!
Sorry for a tl;dr...

Sqlite: whole row syntax gives error. "OperationalError: no such column"

Hi,

%%logica Engineers

@Engine("sqlite");

Employee(name: "Alice", role: "Product Manager");
Employee(name: "Bob", role: "Engineer");
Employee(name: "Caroline", role: "Engineer");
Employee(name: "David", role: "Data Scientist");
Employee(name: "Eve", role: "Data Scientist");

Engineers(..r) :- Employee(..r), r.role == "Engineer";  # <====== This fails
# Engineers(name:, role:) :- Employee(name:, role:) , role == "Engineer"  <===== This works
The following query is stored at Engineers_sql variable.
WITH t_0_Employee AS (SELECT * FROM (
  
    SELECT
      'Alice' AS name,
      'Product Manager' AS role
   UNION ALL
  
    SELECT
      'Bob' AS name,
      'Engineer' AS role
   UNION ALL
  
    SELECT
      'Caroline' AS name,
      'Engineer' AS role
   UNION ALL
  
    SELECT
      'David' AS name,
      'Data Scientist' AS role
   UNION ALL
  
    SELECT
      'Eve' AS name,
      'Data Scientist' AS role
  
) AS UNUSED_TABLE_NAME  )
SELECT
  Employee.*
FROM
  t_0_Employee AS Employee
WHERE
  (JSON_EXTRACT(Employee, "$.role") = 'Engineer');

Engineers

Running predicate: Engineers

---------------------------------------------------------------------------

OperationalError                          Traceback (most recent call last)

<ipython-input-19-db61baf7c59b> in <module>()
----> 1 get_ipython().run_cell_magic('logica', 'Engineers', '\n@Engine("sqlite");\n\nEmployee(name: "Alice", role: "Product Manager");\nEmployee(name: "Bob", role: "Engineer");\nEmployee(name: "Caroline", role: "Engineer");\nEmployee(name: "David", role: "Data Scientist");\nEmployee(name: "Eve", role: "Data Scientist");\n\nEngineers(..r) :- Employee(..r), r.role == "Engineer";\n# Engineers(name:, role:) :- Employee(name:, role:) , role == "Engineer"')

8 frames

/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2115             magic_arg_s = self.var_expand(line, stack_depth)
   2116             with self.builtin_trap:
-> 2117                 result = fn(magic_arg_s, cell)
   2118             return result
   2119 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in logica(line, cell)
    118 @register_cell_magic
    119 def logica(line, cell):
--> 120   Logica(line, cell, run_query=True)
    121 
    122 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in Logica(line, cell, run_query)
    239       sql_runner = RunSQL
    240     result_map = concertina_lib.ExecuteLogicaProgram(
--> 241       executions, sql_runner=sql_runner, sql_engine=engine)
    242 
    243   for idx, predicate in enumerate(predicates):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in ExecuteLogicaProgram(logica_executions, sql_runner, sql_engine)
    215 
    216   concertina = Concertina(config, engine)
--> 217   concertina.Run()
    218   return engine.final_result

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self)
     76   def Run(self):
     77     while self.actions_to_run:
---> 78       self.RunOneAction()
     79 
     80   def ActionColor(self, a):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in RunOneAction(self)
     69     self.running_actions |= {one_action}
     70     self.UpdateDisplay()
---> 71     self.engine.Run(self.action[one_action].get('action', {}))
     72     self.running_actions -= {one_action}
     73     self.complete_actions |= {one_action}

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self, action)
     18       print('Running predicate:', predicate)
     19       result = self.sql_runner(action['sql'], action['engine'],
---> 20                                is_final=(predicate in self.final_predicates))
     21       if predicate in self.final_predicates:
     22         self.final_result[predicate] = result

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in __call__(self, sql, engine, is_final)
    159   # TODO: Sqlite runner should not be accepting an engine.
    160   def __call__(self, sql, engine, is_final):
--> 161     return RunSQL(sql, engine, self.connection, is_final)
    162 
    163 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in RunSQL(sql, engine, connection, is_final)
    142   elif engine == 'sqlite':
    143     statements = parse.SplitRaw(sql, ';')
--> 144     connection.executescript(sql)
    145     if is_final:
    146       return pandas.read_sql(statements[-1], connection)

OperationalError: no such column: Employee

Engineers example broken under sqlite

Code:

%%logica Engineers
@Engine("sqlite");
Employee(name: "Alice", role: "Product Manager");
Employee(name: "Bob", role: "Engineer");
Employee(name: "Caroline", role: "Engineer");
Employee(name: "David", role: "Data Scientist");
Employee(name: "Eve", role: "Data Scientist");
Engineers(..r) :- Employee(..r), r.role == "Engineer";

Error:

OperationalError: no such column: Employee

I know that at this stage meaningful error messages may be a tall order, but if it could perhaps print out the SQL statement, it should be fairly easy to see what went wrong.

Also, sincerely thank you for this project: it is inspired.

how to express array_agg(x order by t)?

In such query

GroupEvents1(user_pseudo_id, events? List= custom_event_name) distinct :-
    Events(user_pseudo_id:, custom_event_name:, event_timestamp:);

I want to get the events ordered by event_timestamp, but I still found no answer in the documentation.

It is the same situation with functional aggregation:

GroupEvents2(user_pseudo_id) List=  custom_event_name  :-
    Events(user_pseudo_id:, custom_event_name:, event_timestamp:);
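One possibility (not confirmed in the documentation, but the `Array=` aggregation with a `key -> value` mapping appears in other Logica programs in this tracker) would be to aggregate the values keyed by the timestamp:

```
# Sketch: aggregate custom_event_name into an array ordered by event_timestamp.
GroupEvents3(user_pseudo_id:,
             events? Array= event_timestamp -> custom_event_name) distinct :-
    Events(user_pseudo_id:, custom_event_name:, event_timestamp:);
```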

Integrating DuckDB

Hi,
I'm interested in using DuckDB (https://duckdb.org/) as a data source for Logica and I was wondering what the steps to integrate it would be. It's like SQLite for analytics, and its SQL is close to a subset of the PostgreSQL syntax.

We could use sqlalchemy, but 1) I don't know how to proceed after that, and 2) I think the ORM design of sqlalchemy may make it slow if we are returning a lot of data (just a hypothesis based on duckdb/duckdb#305).

So the question is twofold.

  1. If we use the sqlalchemy interface, the setup code would be something like this.
# Install Logica.
!pip install logica
!pip install duckdb
!pip install duckdb-engine



# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine

engine = create_engine('duckdb:///:memory:');
connection = engine.connect();
colab_logica.SetDbConnection(connection)

but how do I use it after that?

  2. If I don't want to use the sqlalchemy interface, how would I go about doing that? The general usage is:
import duckdb
con = duckdb.connect(database=':memory:', read_only=False) #can also use a file like sqlite
# create a table
con.execute("CREATE TABLE items(item VARCHAR, value DECIMAL(10,2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', 20.0, 1), ('hammer', 42.2, 2)]
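For question 1 above (the sqlalchemy route): presumably, once colab_logica.SetDbConnection(connection) has been called, a notebook cell can select the engine and run predicates with the %%logica magic. A sketch, assuming the @Engine("duckdb") annotation shown in other Logica examples applies here:

```
%%logica Greet
@Engine("duckdb");
Greet(greeting: "Hello world!");
```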

Add type inference

We would like to infer the types of predicate arguments, including reading the types of database columns.
Then we can display a native error to the user, rather than compiling to SQL and having the user debug a SQL error.

Hi! Does it work on BigQuery?

Hi, I have a question: does it work on BigQuery, and will it keep working in the future? I have not yet figured out how to test it, but if it really works in the Google Cloud console, I will definitely start testing it in the near future.

Expected 'undefined' error, but Logica compiles fine

A bit of context: I was translating TPC-H's 10th query and forgot to bring into scope the value of l_orderkey from the lineitem table. Logica thus took l_orderkey == o_orderkey to mean an assignment, while I meant it as an equality test.

That Logica makes no syntactic distinction between the equality and assignment operators is explained in the Appendix of the Logica tutorial, but maybe this example shows why this can be confusing; if we had different operators, the error would be clearer and easier to report to the user. Another option is to detect that l_orderkey goes unused: either it is a spurious assignment that we might warn the user about, or it was meant as an equality test that, for some reason (a user mistake in this example), cannot be performed as intended.

@Engine("sqlite");

Query(c_custkey:, c_name:,
      revenue? += l_extendedprice * (1 - l_discount),
      c_acctbal:, n_name:, c_address:, c_phone:, c_comment:) distinct :-
  customer(c_custkey:, c_name:, c_acctbal:, c_address:,
           c_phone:, c_comment:, c_nationkey:),
  orders(o_orderkey:, o_custkey:, o_orderdate:),
  lineitem(l_extendedprice:, l_discount:, l_returnflag:),
  nation(n_nationkey:, n_name:),
  c_custkey == o_custkey,
  l_orderkey == o_orderkey,
  o_orderdate >= Date(cutoff_date),
  o_orderdate < Date(cutoff_date, "+3 month"),
  l_returnflag == "R",
  c_nationkey == n_nationkey,
  cutoff_date == "1993-10-01";
@OrderBy(Query, "revenue desc");
@Limit(Query, 20);
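For reference, a sketch of the intended version would also read l_orderkey from lineitem, so that l_orderkey == o_orderkey compares two bound values instead of assigning one (column names per TPC-H):

```
  # Hypothetical fix: bring l_orderkey into scope from lineitem, so that
  # l_orderkey == o_orderkey is an equality test, not an assignment.
  lineitem(l_orderkey:, l_extendedprice:, l_discount:, l_returnflag:),
  l_orderkey == o_orderkey,
```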

Make IN work in SQLite

Example of broken program:

logica - run Q <<<'Q(x) :- x in [1,3,5], Constraint(x in [1,2,3]); @Engine("sqlite");'

sqlite3.OperationalError: no such table: UNNEST

Even numbers (in tutorial) does not work for sqlite

Hi,
If I run this, I don't get even numbers.

%%logica Even
@Engine("sqlite");

Z(x) :- x in Range(20);

Even(x) :- Z(x), Z(x / 2);
print('Even numbers:', ', '.join(map(str, Even['col0'])))

Even numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19

This is the generated SQL:

The following query is stored at Even_sql variable.
SELECT
  x_4.value AS col0
FROM
  JSON_EACH((select json_group_array(n) from (with recursive t as(select 0 as n union all select n + 1 as n from t where n + 1 < 20) select n from t))) as x_4, JSON_EACH((select json_group_array(n) from (with recursive t as(select 0 as n union all select n + 1 as n from t where n + 1 < 20) select n from t))) as x_6
WHERE
  (x_6.value = ((x_4.value) / (2)));
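The generated SQL hints at the cause: SQLite's `/` performs integer division on integers, so `x / 2` is an integer for every `x` in Range(20) and the `Z(x / 2)` test filters nothing. A sketch of a formulation that avoids division (assuming the same `Z` predicate):

```
Z(x) :- x in Range(20);

# x is even if it is twice some number in Z.
Even(x) :- Z(x), Z(y), x == y * 2;
```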

Graph did not work

I am using the following packages within a Python 3.11.3 conda env

logica == 1.3.141592
graphviz == 0.20.1
jupyterlab == 3.5.3

and I want to execute

from logica.common import graph
graph.DirectedGraph(dataset)

In JupyterLab, I didn't receive any feedback. However, I did encounter an error in the browser console.

Uncaught ReferenceError: require is not defined
    at <anonymous>:2:5
    at t.attachWidget (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1594563)
    at t.insertWidget (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1593914)
    at j._insertOutput (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:925250)
    at j.onModelChanged (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:922427)
    at m (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1560565)
    at Object.l [as emit] (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1560225)
    at e.emit (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1558470)
    at c._onListChanged (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:918850)
    at m (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1560565)

@EvgSkv Could you provide suggestions on how to address this error? Thanks : )

PostgreSQL support

Most of Logica works with PostgreSQL, except for the following:

  • Output of structures is not supported. E.g. with the "psql" engine you currently can't do
    Q(contact: {name: "John", email: "[email protected]"});
    This is because PostgreSQL requires pre-defining types before structs can be used, and this is not implemented.
  • ArgMaxK ignores its limit argument, so picking a few maximal values from a very large set is impossible. We don't know how to fix this, as PostgreSQL does not support LIMIT in ARRAY_AGG.

Originally posted by @EvgSkv in #20 (comment)

Tutorial: possibly outdated documentation on an intentional error (it doesn't error out)

Hi,
In the tutorial it mentions,
Code:

%%logica Engineers
@Engine("psql");

Employee(name: "Alice", role: "Product Manager", office: "SEA");
Employee(name: "Bob", role: "Engineer", office: "SEA");
Employee(name: "Caroline", role: "Engineer", office: "LAX");
Employee(name: "David", role: "Data Scientist", office: "LAX");
Employee(name: "Eve", role: "Data Scientist", office: "SEA");

IsEngineer(record) :- record.role == "Engineer";
  
Engineers(..r) :- Employee(..r), IsEngineer(r);

[screenshot of the successful output]

I don't get any error :) as mentioned above. Also, if you look at the screenshot above (and the one in the tutorial, which I didn't run), there is a successful output. I feel like I'm missing something.

Support SparkSQL

Spark support would be really nice, since it would allow processing big datasets on on-premise deployments.

Aggregating an Ordered Set in BigQuery: ARRAY_AGG(DISTINCT ... ORDER BY ...) is not supported!

I'm trying to generate a set ordered by another value, but I'm really struggling to pull it off. After quite a few attempts, I'm not sure how to work around BigQuery's limitations. Without some clever workaround for BigQuery not supporting ARRAY_AGG(DISTINCT ... ORDER BY ...), the only other way I see is JOINing another select after the GROUP BY is performed, but I'm also struggling to figure out how to pull that off in Logica. Anyway, love the project! Overall it's been really cool!

Code

Here's the most straightforward attempt.

%%logica Test

@Engine("bigquery");

OrderedSet(a) = SqlExpr("ARRAY_AGG(DISTINCT {value} ORDER BY {arg})", {value: a.value, arg: a.arg});

TestOrderedSet(g:, ids_by_t? Array= t -> id) distinct :-
  Data(g:, id:, t:);

Data(g: 1, id: 1, t: 5);
Data(g: 1, id: 2, t: 4);
Data(g: 1, id: 2, t: 4);
Data(g: 1, id: 3, t: 3);
Data(g: 1, id: 3, t: 3);
Data(g: 1, id: 4, t: 2);
Data(g: 1, id: 5, t: 1);

Data(g: 2, id: 10, t: 1);
Data(g: 2, id: 10, t: 1);
Data(g: 2, id: 20, t: 2);
Data(g: 2, id: 20, t: 0);
Data(g: 2, id: 30, t: 3);

Expected(g: 1, ids_by_t: [5, 4, 3, 2, 1]);
Expected(g: 2, ids_by_t: [20, 10, 30]);
Expected(g: 3, ids_by_t: [20, 10, 30]);

Test(g:, actual:, expected:) :-
  TestOrderedSet(g:, ids_by_t: actual),
  Expected(g:, ids_by_t: expected);

Output

And the error:

BadRequest: 400 An aggregate function that has both DISTINCT and ORDER BY arguments can only ORDER BY expressions that are arguments to the function at [77:39]

                   -----Query Job SQL Follows-----                   

    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:WITH t_2_Data AS (SELECT * FROM (
   2:  
   3:    SELECT
   4:      1 AS g,
   5:      1 AS id,
   6:      5 AS t
   7:   UNION ALL
   8:  
   9:    SELECT
  10:      1 AS g,
  11:      2 AS id,
  12:      4 AS t
  13:   UNION ALL
  14:  
  15:    SELECT
  16:      1 AS g,
  17:      2 AS id,
  18:      4 AS t
  19:   UNION ALL
  20:  
  21:    SELECT
  22:      1 AS g,
  23:      3 AS id,
  24:      3 AS t
  25:   UNION ALL
  26:  
  27:    SELECT
  28:      1 AS g,
  29:      3 AS id,
  30:      3 AS t
  31:   UNION ALL
  32:  
  33:    SELECT
  34:      1 AS g,
  35:      4 AS id,
  36:      2 AS t
  37:   UNION ALL
  38:  
  39:    SELECT
  40:      1 AS g,
  41:      5 AS id,
  42:      1 AS t
  43:   UNION ALL
  44:  
  45:    SELECT
  46:      2 AS g,
  47:      10 AS id,
  48:      1 AS t
  49:   UNION ALL
  50:  
  51:    SELECT
  52:      2 AS g,
  53:      10 AS id,
  54:      1 AS t
  55:   UNION ALL
  56:  
  57:    SELECT
  58:      2 AS g,
  59:      20 AS id,
  60:      2 AS t
  61:   UNION ALL
  62:  
  63:    SELECT
  64:      2 AS g,
  65:      20 AS id,
  66:      0 AS t
  67:   UNION ALL
  68:  
  69:    SELECT
  70:      2 AS g,
  71:      30 AS id,
  72:      3 AS t
  73:  
  74:) AS UNUSED_TABLE_NAME  ),
  75:t_0_TestOrderedSet AS (SELECT
  76:  Data.g AS g,
  77:  ARRAY_AGG(DISTINCT Data.id ORDER BY Data.t) AS ids_by_t
  78:FROM
  79:  t_2_Data AS Data
  80:GROUP BY g),
  81:t_3_Expected AS (SELECT * FROM (
  82:  
  83:    SELECT
  84:      1 AS g,
  85:      ARRAY[5, 4, 3, 2, 1] AS ids_by_t
  86:   UNION ALL
  87:  
  88:    SELECT
  89:      2 AS g,
  90:      ARRAY[20, 10, 30] AS ids_by_t
  91:   UNION ALL
  92:  
  93:    SELECT
  94:      3 AS g,
  95:      ARRAY[20, 10, 30] AS ids_by_t
  96:  
  97:) AS UNUSED_TABLE_NAME  )
  98:SELECT
  99:  TestOrderedSet.g AS g,
 100:  TestOrderedSet.ids_by_t AS actual,
 101:  Expected.ids_by_t AS expected
 102:FROM
 103:  t_0_TestOrderedSet AS TestOrderedSet, t_3_Expected AS Expected
 104:WHERE
 105:  (Expected.g = TestOrderedSet.g)
    |    .    |    .    |    .    |    .    |    .    |    .    |

Generate SQL

And the generated SQL for the lazy

WITH t_2_Data AS (SELECT * FROM (
  
    SELECT
      1 AS g,
      1 AS id,
      5 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      2 AS id,
      4 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      2 AS id,
      4 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      3 AS id,
      3 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      3 AS id,
      3 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      4 AS id,
      2 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      5 AS id,
      1 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      10 AS id,
      1 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      10 AS id,
      1 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      20 AS id,
      2 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      20 AS id,
      0 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      30 AS id,
      3 AS t
  
) AS UNUSED_TABLE_NAME  ),
t_0_TestOrderedSet AS (SELECT
  Data.g AS g,
  ARRAY_AGG(DISTINCT Data.id ORDER BY Data.t) AS ids_by_t
FROM
  t_2_Data AS Data
GROUP BY g),
t_3_Expected AS (SELECT * FROM (
  
    SELECT
      1 AS g,
      ARRAY[5, 4, 3, 2, 1] AS ids_by_t
   UNION ALL
  
    SELECT
      2 AS g,
      ARRAY[20, 10, 30] AS ids_by_t
   UNION ALL
  
    SELECT
      3 AS g,
      ARRAY[20, 10, 30] AS ids_by_t
  
) AS UNUSED_TABLE_NAME  )
SELECT
  TestOrderedSet.g AS g,
  TestOrderedSet.ids_by_t AS actual,
  Expected.ids_by_t AS expected
FROM
  t_0_TestOrderedSet AS TestOrderedSet, t_3_Expected AS Expected
WHERE
  (Expected.g = TestOrderedSet.g);
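One workaround shape hinted at above (deduplicate first, then order) can be sanity-checked outside BigQuery. Below is a minimal Python/sqlite3 sketch of the intended semantics, under the assumption that each id should be ordered by its smallest t, which is what the Expected rows imply:

```python
import sqlite3
from collections import defaultdict

# Emulate ARRAY_AGG(DISTINCT id ORDER BY t) per group: first reduce to
# one row per (g, id) using MIN(t) as the ordering key, then aggregate
# in t order. This is the "dedup in a subquery, order outside" shape
# that BigQuery's restriction forces.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data(g INT, id INT, t INT);
INSERT INTO Data VALUES
  (1,1,5),(1,2,4),(1,2,4),(1,3,3),(1,3,3),(1,4,2),(1,5,1),
  (2,10,1),(2,10,1),(2,20,2),(2,20,0),(2,30,3);
""")
rows = conn.execute(
    "SELECT g, id, MIN(t) AS mt FROM Data GROUP BY g, id ORDER BY g, mt"
).fetchall()
ids_by_t = defaultdict(list)
for g, id_, _mt in rows:
    ids_by_t[g].append(id_)
print(dict(ids_by_t))  # -> {1: [5, 4, 3, 2, 1], 2: [20, 10, 30]}
```

This reproduces both Expected rows, so in BigQuery the analogous fix would be a nested GROUP BY g, id subquery feeding a plain ARRAY_AGG(... ORDER BY ...) without DISTINCT.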

Addressing issues of SQL

Hi Evgeny!
I'm a little bit involved in PostgreSQL development, so I decided to take a look at Logica.
Seems like an interesting piece of research, thanks!

There is a good post summarizing developers' complaints about SQL [0]. Do you think Logica addresses these issues?
Here's the list of SQL problems:
lack of proper orthogonality: SQL is hard to compose;
lack of compactness: SQL is a large language;
lack of consistency: SQL is inconsistent in syntax and semantics;
poor system cohesion: SQL does not integrate well enough with application languages and protocols.

Thanks!

[0] https://www.edgedb.com/blog/we-can-do-better-than-sql/

STRUCT columns

I have a table with a STRUCT column, e.g. STRUCT<x INT64, y INT64>. It looks like Logica variables can only be alphanumeric plus underscore. Is there a way to refer to the fields of a STRUCT? I tried using backticks for the variable name.

Re-digest predicates from DB

I tried to pass one predicate to another with the following script on a local PC:

%%logica Child

@Engine("sqlite");

@AttachDatabase("mydata", "my_database.db");
@Dataset("ParentTable");

Parent(..r) :- mydata.ParentTable(..r);

Child(x):- Parent(child:x);

The Child table was supposed to be created successfully, but I encountered the error below:

Running predicate: Child
--- SQL ---
SELECT
  JSON_EXTRACT(mydata_ParentTable, "$.child") AS col0
FROM
  mydata.ParentTable AS mydata_ParentTable
[ Error ] Error while executing SQL:
Execution failed on sql 'SELECT
  JSON_EXTRACT(mydata_ParentTable, "$.child") AS col0
FROM
  mydata.ParentTable AS mydata_ParentTable': no such column: mydata_ParentTable

Documentation on -> operator

At one point the tutorial uses the -> operator.

# Finding most popular name for each year.
@OrderBy(TopNameByYear, "year");
TopNameByYear(year) ArgMax= name -> NameCountByYear(name:, year:);

While this seems to be a variant on the functional aggregation method, it's not documented anywhere. Is there a more complete language reference that can be linked?

Thanks!
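For reference, my reading (not official documentation) is that `a -> b` builds an (argument, value) pair, and `ArgMax=` keeps the argument whose value is largest within each group. A plain-Python sketch of that semantics, with invented sample data:

```python
# Hypothetical illustration of what `ArgMax= name -> count` computes:
# among (arg, value) pairs, return the arg whose value is maximal.
def arg_max(pairs):
    """pairs: iterable of (arg, value); returns the arg of the max value."""
    return max(pairs, key=lambda p: p[1])[0]

# Invented stand-in for NameCountByYear rows within one year.
name_count = [("Mary", 7065), ("Anna", 2604), ("Emma", 2003)]
print(arg_max(name_count))  # -> Mary
```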

Improve parser whitespace robustness

This predicate parses:

Human(name: "Socrates", iq: 250, year_of_birth: -470, location: "Athens");

but this one doesn't, because it contains a tab:

Human(name: "Socrates", iq: 250, year_of_birth: -470, location: "Athens"	);

(example inspired by the Logica Postgres quick start)

tested with

logica test.l print Human

where test.l has the contents of the source blocks above.

EDIT: I hadn't initially realized that the problem was solely caused by tabs.

PostgreSQL: syntax error at or near "AS" for tutorial StructureTest

Hi,
The following code errors out for psql.

# Setup
# Install Logica.
!pip install logica

# Install postgresql server.
!sudo apt-get -y -qq update
!sudo apt-get -y -qq install postgresql
!sudo service postgresql start

# Prepare database for Logica.
!sudo -u postgres psql -c "CREATE USER logica WITH SUPERUSER"
!sudo -u postgres psql -c "ALTER USER logica PASSWORD 'logica';"
!sudo -u postgres psql -U postgres -c 'CREATE DATABASE logica;'

# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine
import pandas
engine = create_engine('postgresql+psycopg2://logica:[email protected]', pool_recycle=3600);
connection = engine.connect();
colab_logica.SetDbConnection(connection)
%%logica StructureTest

@Engine("psql");


StructureTest(a: {x: 1, y: 2, z: { w: "hello", v: "world"}});
StructureTest(a: {x: 3, y: 4, z: { w: "bonjour", v: "monde"}});

The Error:

The following query is stored at StructureTest_sql variable.
-- Initializing PostgreSQL environment.
set client_min_messages to warning;
create schema if not exists logica_test;

SELECT * FROM (
  
    SELECT
      STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS v) AS z) AS a
   UNION ALL
  
    SELECT
      STRUCT(3 AS x, 4 AS y, STRUCT('bonjour' AS w, 'monde' AS v) AS z) AS a
  
) AS UNUSED_TABLE_NAME  ;

StructureTest

Running predicate: StructureTest

---------------------------------------------------------------------------

ProgrammingError                          Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1760                     self.dialect.do_execute_no_params(
-> 1761                         cursor, statement, context
   1762                     )

20 frames

ProgrammingError: syntax error at or near "AS"
LINE 4:       STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS...
                       ^


The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/default.py in do_execute_no_params(self, cursor, statement, context)
    718 
    719     def do_execute_no_params(self, cursor, statement, context=None):
--> 720         cursor.execute(statement)
    721 
    722     def is_disconnect(self, e, connection, cursor):

ProgrammingError: (psycopg2.ProgrammingError) syntax error at or near "AS"
LINE 4:       STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS...
                       ^

[SQL: SELECT * FROM (
  
    SELECT
      STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS v) AS z) AS a
   UNION ALL
  
    SELECT
      STRUCT(3 AS x, 4 AS y, STRUCT('bonjour' AS w, 'monde' AS v) AS z) AS a
  
) AS UNUSED_TABLE_NAME  ]
(Background on this error at: http://sqlalche.me/e/14/f405)

Number of arguments of `TimestampSub()`

Since BigQuery's TIMESTAMP_SUB() has two arguments, I assume that TimestampSub() also has two arguments. However, the following predicate will result in an error.

A(t: TimestampSub(CurrentDate("Asia/Tokyo"), SqlExpr("INTERVAL 9 HOUR",{})));
Compiling:
A(t: TimestampSub(CurrentDate("Asia/Tokyo"), SqlExpr("INTERVAL 9 HOUR",{})))

[ Error ] Built-in function TimestampSub takes (3, 3) arguments, but 2 arguments were given.
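The error message suggests TimestampSub expects three arguments, presumably mirroring BigQuery's TIMESTAMP_SUB(timestamp, INTERVAL amount unit) with the interval split into amount and unit. As a plain-Python illustration of the computation itself (the function name and unit table below are illustrative, not Logica's API):

```python
from datetime import datetime, timedelta

# Illustrative sketch of TIMESTAMP_SUB-style interval subtraction:
# subtract `amount` units from a timestamp.
def timestamp_sub(ts: datetime, amount: int, unit: str) -> datetime:
    units = {"HOUR": "hours", "MINUTE": "minutes", "DAY": "days"}
    return ts - timedelta(**{units[unit]: amount})

print(timestamp_sub(datetime(2021, 6, 1, 12, 0), 9, "HOUR"))
# -> 2021-06-01 03:00:00
```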

FileNotFoundError: [Errno 2] No such file or directory: 'bq'

Whenever example command logica primes.l run Prime is executed, error is produced:
FileNotFoundError: [Errno 2] No such file or directory: 'bq'

It happens in a Docker container (ubuntu:latest) with Python 3.8.5 installed.

Please see the log output:

root@d0094b59f905:/home# logica primes.l run Prime
Traceback (most recent call last):
  File "/usr/local/bin/logica", line 8, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.8/dist-packages/logica/logica.py", line 184, in run_main
    main(sys.argv)
  File "/usr/local/lib/python3.8/dist-packages/logica/logica.py", line 157, in main
    p = subprocess.Popen(['bq', 'query',
  File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'bq'

CamelCase weirdness

👋 Hello! I'm having great fun playing around with Logica. Thanks for the project.

I ran into something a bit strange and was wondering if it is a bug.

I have a fact and rule like so:

Tuple(namespace: "document", id: "1", rel: "writer", user: "id", userId: "badger", user_namespace: "", user_object_id: "");

List(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:) :-
    Tuple(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:);

and if I try to run this as is, it returns a parsing error:

Parsing:
List(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:) :-
    Tuple(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:)

[ Error ] Could not parse expression of a value.

If I parametrize userId (with an x) as in the following:

Tuple(namespace: "document", id: "1", rel: "writer", user: "id", userId: "badger", user_namespace: "", user_object_id: "");

List(namespace:, id:, rel:, user:, userId: x, user_namespace:, user_object_id:) :-
    Tuple(namespace:, id:, rel:, user:, userId: x, user_namespace:, user_object_id:);

then it works fine:

(env) $ logica zanzibar.l run List
+-----------+----+--------+--------+----------+----------------+----------------+
| namespace | id | rel    | user   | userId   | user_namespace | user_object_id |
+-----------+----+--------+--------+----------+----------------+----------------+
| document  | 1  | writer | id     | badger   |                |                |
+-----------+----+--------+--------+----------+----------------+----------------+

Similarly, if I don't parametrize but use snake_case user_id, then it also works fine.

So, should I always just stick with snake_case?

Thanks for any insights 🙌

External Functors

Functors in Logica are functions that map named tuples of predicates to a predicate.
Currently the only available functors are substitutions. Any predicate can work as a functor.

External Functors would be arbitrary executables that run in the middle of Logica program execution. They would create new database tables from the ones that Logica has already produced. Here is a toy, high-level example of what an application of such a functor would look like:

# LargestConnectedComponent takes in a graph and produces a graph
# that consists of a largest connected component.
# Note that this functor is in fact easy to implement in Logica, we use this example
# just because it's an intuitive problem.
@ExternalFunctor(
    LargestConnectedComponent,
    binary: "/path/to/extract_component");

# Building a travel graph, assuming a Trips table that has the source and destination of trips.
# There is an edge in the travel graph if there are more than 1000 trips between the cities.
TravelVolume(source, target) += 1 :- Trips(source:, target:);
@Ground(TravelEdge);
TravelEdge(source, target) :- TravelVolume(source, target) > 1000;

# This functor call results in execution of command like:
# /path/to/extract_component --args='{"Graph": "TravelEdge"}' --dataset="logica_test"
# logica_test is a default dataset where Logica writes tables.
# The binary would read the logica_test.Graph dataset and write logica_test.LargestAreaGraph dataset.
# Logica would then associate logica_test.LargestAreaGraph table with LargestAreaGraph
# predicate. 
LargestAreaGraph := LargestConnectedComponent(Graph: TravelEdge);

NumCitiesInLargestComponent() Count= city :-
  city in [source, target], LargestAreaGraph(source, target);

To implement this we would need to implement new execution engine that would be doing these calls.

Estimated work time to implement: 3 - 5 weeks.

How to translate CASE WHEN?

In BigQuery, with Firebase data:

SELECT
  user_pseudo_id,
  event_timestamp,
  CASE event_name
    WHEN 'screen_view' THEN
      event_name || (SELECT param.value.string_value
                     FROM UNNEST(event_params) AS param
                     WHERE param.key = "firebase_screen_class")
    ELSE event_name
  END AS name
FROM `bq.x.x`
ORDER BY event_timestamp

I can't find an example for this in the Logica tutorial.ipynb.

Can we support raw SQL in Logica, so that complicated SQL can be written in Logica?
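Logica already exposes raw SQL through SqlExpr (used elsewhere in these issues), so one route is wrapping the CASE expression in SqlExpr. Independently of Logica, the CASE logic itself can be sanity-checked with sqlite3; the table and sample values below are invented:

```python
import sqlite3

# Sanity-check the CASE WHEN logic on invented sample data:
# screen_view events get the screen class appended, others pass through.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events(event_name TEXT, screen_class TEXT);
INSERT INTO events VALUES
  ('screen_view', 'MainActivity'),
  ('purchase', NULL);
""")
rows = conn.execute("""
SELECT CASE event_name
         WHEN 'screen_view' THEN event_name || screen_class
         ELSE event_name
       END AS name
FROM events
""").fetchall()
print(rows)  # -> [('screen_viewMainActivity',), ('purchase',)]
```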

Too many references to "advised": max 65535

Hi,

When running the following script

%%logica Lowest_Common_Ancestors_with_Name
@Engine("sqlite");

@AttachDatabase("mgdb","mgdb.db");
@Dataset("advised");
@Dataset("person");
@Dataset("dissertation");

Adv_Stu(advisor:, student:author) :- Advised(did:x, advisor:),Dissertation(did:y, author:), x=y;

@Recursive(Anc,33);
Anc(ancestor:advisor, student:m) distinct :- Adv_Stu(advisor:, student:m), m=63244;
Anc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);

Anc(ancestor:advisor, student:l) distinct :- Adv_Stu(advisor:, student:l), l=119280;
Anc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);

Common_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Anc(ancestor:x, student:m), 
Anc(ancestor:x, student:l), m=63244,l=119280 ;

Not_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), 
Common_Ancestors(pid_1:m, pid_2:l, anc_id: y), Adv_Stu(advisor:x, student:y),m=63244,l=119280;

Lowest_Common_Ancestors_with_Name(pid:x, name:name) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x),
~Not_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), Person(pid:x,name:name),m=63244,l=119280;

I encounter the error below:

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:2200, in SQLiteDatabase.execute(self, sql, params)
   2199 try:
-> 2200     cur.execute(sql, *args)
   2201     return cur

OperationalError: too many references to "advised": max 65535

The above exception was the direct cause of the following exception:

DatabaseError                             Traceback (most recent call last)
Cell In[13], line 1
----> 1 get_ipython().run_cell_magic('logica', 'Lowest_Common_Ancestors_with_Name', '\n@Engine("sqlite");\n\n@AttachDatabase("mgdb","mgdb.db");\n@Dataset("advised");\n@Dataset("person");\n@Dataset("dissertation");\n\nAdv_Stu(advisor:, student:author) :- Advised(did:x, advisor:),Dissertation(did:y, author:), x=y;\n\n@Recursive(Anc,33);\nAnc(ancestor:advisor, student:m) distinct :- Adv_Stu(advisor:, student:m), m=63244;\nAnc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);\n\nAnc(ancestor:advisor, student:l) distinct :- Adv_Stu(advisor:, student:l), l=119280;\nAnc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);\n\nCommon_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Anc(ancestor:x, student:m), \nAnc(ancestor:x, student:l), m=63244,l=119280 ;\n\nNot_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), \nCommon_Ancestors(pid_1:m, pid_2:l, anc_id: y), Adv_Stu(advisor:x, student:y),m=63244,l=119280;\n\nLowest_Common_Ancestors_with_Name(pid:x, name:name) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x),\n~Not_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), Person(pid:x,name:name),m=63244,l=119280;\n')

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2478, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2476 with self.builtin_trap:
   2477     args = (magic_arg_s, cell)
-> 2478     result = fn(*args, **kwargs)
   2480 # The code below prevents the output from being displayed
   2481 # when using magics with decodator @output_can_be_silenced
   2482 # when the last Python token in the expression is a ';'.
   2483 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:128, in logica(line, cell)
    126 @register_cell_magic
    127 def logica(line, cell):
--> 128   Logica(line, cell, run_query=True)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:273, in Logica(line, cell, run_query)
    269   else:
    270     raise Exception('Logica only supports BigQuery, PostgreSQL and SQLite '
    271                     'for now.')   
--> 273   result_map = concertina_lib.ExecuteLogicaProgram(
    274     executions, sql_runner=sql_runner, sql_engine=engine)
    276 for idx, predicate in enumerate(predicates):
    277   t = result_map[predicate]

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:270, in ExecuteLogicaProgram(logica_executions, sql_runner, sql_engine, display_mode)
    267   sql_runner(preamble, sql_engine, is_final=False)
    269 concertina = Concertina(config, engine, display_mode=display_mode)
--> 270 concertina.Run()
    271 return engine.final_result

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:97, in Concertina.Run(self)
     95 def Run(self):
     96   while self.actions_to_run:
---> 97     self.RunOneAction()

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:90, in Concertina.RunOneAction(self)
     88 self.running_actions |= {one_action}
     89 self.UpdateDisplay()
---> 90 self.engine.Run(self.action[one_action].get('action', {}))
     91 self.running_actions -= {one_action}
     92 self.complete_actions |= {one_action}

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:32, in ConcertinaQueryEngine.Run(self, action)
     30   print('Running predicate:', predicate, end='')
     31 start = datetime.datetime.now()
---> 32 result = self.sql_runner(action['sql'], action['engine'],
     33                          is_final=(predicate in self.final_predicates))
     34 end = datetime.datetime.now()
     35 if self.print_running_predicate:

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:173, in SqliteRunner.__call__(self, sql, engine, is_final)
    172 def __call__(self, sql, engine, is_final):
--> 173   return RunSQL(sql, engine, self.connection, is_final)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:160, in RunSQL(sql, engine, connection, is_final)
    158     print(sql)
    159     ShowError("Error while executing SQL:\n%s" % e)
--> 160     raise e
    161   return None
    162 else:

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:153, in RunSQL(sql, engine, connection, is_final)
    150 try:
    151   if is_final:
    152     # For final predicates this SQL is always a single statement.
--> 153     return pandas.read_sql(sql, connection)
    154   else:
    155     connection.executescript(sql)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:633, in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize, dtype_backend, dtype)
    631 with pandasSQL_builder(con) as pandas_sql:
    632     if isinstance(pandas_sql, SQLiteDatabase):
--> 633         return pandas_sql.read_query(
    634             sql,
    635             index_col=index_col,
    636             params=params,
    637             coerce_float=coerce_float,
    638             parse_dates=parse_dates,
    639             chunksize=chunksize,
    640             dtype_backend=dtype_backend,  # type: ignore[arg-type]
    641             dtype=dtype,
    642         )
    644     try:
    645         _is_table_name = pandas_sql.has_table(sql)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:2264, in SQLiteDatabase.read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype, dtype_backend)
   2253 def read_query(
   2254     self,
   2255     sql,
   (...)
   2262     dtype_backend: DtypeBackend | Literal["numpy"] = "numpy",
   2263 ) -> DataFrame | Iterator[DataFrame]:
-> 2264     cursor = self.execute(sql, params)
   2265     columns = [col_desc[0] for col_desc in cursor.description]
   2267     if chunksize is not None:

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:2212, in SQLiteDatabase.execute(self, sql, params)
   2209     raise ex from inner_exc
   2211 ex = DatabaseError(f"Execution failed on sql '{sql}': {exc}")
-> 2212 raise ex from exc

DatabaseError: Execution failed on sql ....: too many references to "advised": max 65535

Sqlite: Custom aggregation function returning None.

Hi,
Re: the following code in the tutorial.

@Engine("sqlite");
AnonymizedCodeContribution(cl_lengths: [110, 220, 405], org: "ads");
AnonymizedCodeContribution(cl_lengths: [30, 51, 95], org: "ads");
AnonymizedCodeContribution(cl_lengths: [10, 20, 1000], org: "games");

HarmonicMean(x) = Sum(1) / Sum(1 / x);

OrgStats(
    org:,
    mean_cl_size? Avg= cl_size,
    harmonic_mean_cl_size? HarmonicMean= cl_size) distinct :-
  AnonymizedCodeContribution(cl_lengths:, org:),
  cl_size in cl_lengths;

The output from BigQuery has values for harmonic_mean_cl_size



| org | mean_cl_size | harmonic_mean_cl_size |
| -- | -- | -- |
| ads | 151.833333 | 75.402468 |
| games | 343.333333 | 19.867550 |


but when I run the above code for sqlite, I get None for the same column



| org | mean_cl_size | harmonic_mean_cl_size |
| -- | -- | -- |
| ads | 151.833333 | None |
| games | 343.333333 | None |


Generated SQL Query:

The following query is stored at OrgStats_sql variable.
WITH t_0_AnonymizedCodeContribution AS (SELECT * FROM (
  
    SELECT
      JSON_ARRAY(110, 220, 405) AS cl_lengths,
      'ads' AS org
   UNION ALL
  
    SELECT
      JSON_ARRAY(30, 51, 95) AS cl_lengths,
      'ads' AS org
   UNION ALL
  
    SELECT
      JSON_ARRAY(10, 20, 1000) AS cl_lengths,
      'games' AS org
  
) AS UNUSED_TABLE_NAME  )
SELECT
  AnonymizedCodeContribution.org AS org,
  AVG(x_5.value) AS mean_cl_size,
  ((SUM(1)) / (SUM(((1) / (x_5.value))))) AS harmonic_mean_cl_size
FROM
  t_0_AnonymizedCodeContribution AS AnonymizedCodeContribution, JSON_EACH(AnonymizedCodeContribution.cl_lengths) as x_5
GROUP BY org;
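One plausible cause worth checking: in SQLite, `/` on two integers performs integer division, so `1 / x` is 0 for every cl size above 1; `SUM(1/x)` then collapses to 0, and SQLite returns NULL on division by zero, which pandas displays as None. A quick check with the stdlib sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite integer division: 1 / 110 is 0, not a fraction.
print(conn.execute("SELECT 1 / 110").fetchone()[0])  # -> 0

# Division by zero yields NULL in SQLite (None in Python) rather than an error.
print(conn.execute("SELECT 6 / 0").fetchone()[0])    # -> None

# Forcing float arithmetic restores the harmonic-mean computation.
print(conn.execute("SELECT 3 / (1.0/110 + 1.0/220 + 1.0/405)").fetchone()[0])
```

If this is indeed the cause, a fix would be generating `1.0 / x` (or casting x to REAL) in the SQL emitted for the sqlite engine, as BigQuery's `/` already does float division.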

Connecting to a SQLite/PostgreSQL database in a script

Logica is a wonderful alternative to SQL, and I would like to connect to a SQLite/PostgreSQL database from a Logica script for data queries. However, support for this currently seems limited.

Connecting to a SQLite database file seems to already be implemented, but is undocumented. We can use the @AttachDatabase annotation:

@Engine("sqlite");
@AttachDatabase("db", "databaseFile.db");
Table(..r) :- `(db.tableName)`(..r);

and then run this script from the local command line. But this usage does not appear anywhere in the tutorial or examples.

We have an example of connecting to a PostgreSQL database in ipynb files with the help of the sqlalchemy library:

# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine
import pandas
engine = create_engine('postgresql+psycopg2://logica:[email protected]', pool_recycle=3600);
connection = engine.connect();
colab_logica.SetDbConnection(connection)

which is great, but I'm not sure if this is possible in a script program.

PyCharm syntax highlighting

This is how syntax highlighting could be added into PyCharm:
https://www.jetbrains.com/help/pycharm/tutorial-using-textmate-bundles.html

In order to add syntax highlighting for a new file, it should be a TextMate Bundle.

According to https://github.com/EvgSkv/logica/blob/main/syntax/vscode/DEV.md,
the src/logica.tmLanguage.yaml file is already written as a TextMate grammar.

We could probably convert the existing file into a TextMate Bundle, which would make syntax highlighting possible in PyCharm.
@EvgSkv, what do you think?

Python eDSL

It would be really nice if there was a Python eDSL so that I could write queries in Python, which I have much better tooling for. 😄

Import search path

In version 1.3.14, imports resolve relative to the current directory. We are trying to deploy Logica files embedded in our Python package, so they can be anywhere on sys.path. Would it be possible to change the parser so that a list of directories (like sys.path) could be provided for resolving imports?

Databricks SQL support

Based on recent work for clients, I for one think support for Databricks SQL would be a huge deal. It seems to be used just about everywhere.

I recently learned Databricks has teamed up with dbt to enable using their SQL templating tech for simplifying writing transformation logic, but Logica would be on a completely different level in my opinion.

ResourceWarning: unclosed file

Version 1.3.14

We are using the Logica library in a Python application and are seeing the following warnings when running unittests for our Python application:

/usr/local/lib/python3.6/dist-packages/logica/parser_py/parse.py:1111: ResourceWarning: unclosed file <_io.TextIOWrapper name='***' mode='r' encoding='UTF-8'>
  file_content = open(file_path).read()

This is not blocking us from using Logica.

I would be willing to put up a PR with a fix (if desired).

Sqlite: StructureTest returns string instead of JSON object?

%%logica StructureTest
@Engine("sqlite");
StructureTest(a: {x: 1, y: 2, z: { w: "hello", v: "world"}});
StructureTest(a: {x: 3, y: 4, z: { w: "bonjour", v: "monde"}});
print(StructureTest['a'].values[0])
print(type(StructureTest['a'].values[0]))
print(StructureTest['a'].values[0]['z']['w']) # <===== This is the line in the tutorial that fails
{"x":1,"y":2,"z":{"w":"hello","v":"world"}}
<class 'str'>

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-14-dbaa4016c4ed> in <module>()
      2 print(StructureTest['a'].values[0])
      3 print(type(StructureTest['a'].values[0]))
----> 4 print(StructureTest['a'].values[0]['z']['w'])

TypeError: string indices must be integers
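A likely explanation: SQLite has no struct column type, so the record comes back as JSON text; decoding it on the Python side restores the tutorial's nested access. A minimal sketch using the string from the output above:

```python
import json

# The sqlite engine returns the record as a JSON string rather than a dict;
# decoding it first makes the tutorial's nested indexing work.
value = '{"x":1,"y":2,"z":{"w":"hello","v":"world"}}'
record = json.loads(value)
print(record["z"]["w"])  # -> hello
```

In the notebook this would mean `json.loads(StructureTest['a'].values[0])['z']['w']` instead of indexing the raw string.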

How to use Logica on macOS?

Thanks to everyone who created this project.
I want to learn Logica, but I don't know how to install it.
I visited the README.md, installed the Google Cloud SDK, and cloned https://github.com/EvgSkv/logica.
But I don't know how to use Logica to run a file.
Thank you very much.

Imports in an imported file cannot be found

Version 1.3.14.15.92

Imports in an imported file cannot be found if they are relative to a path in import_root. They can only be found if they are relative to the directory Logica is initially called from.

To reproduce using integration_tests/import_root_test.l, change the imports in integration_tests/import_tests/canada.l to be relative to the import_root used for the test:

import usa.ProductUSA;
import usa.SoftwareEngineer;

The test fails to find usa.l even though usa.l is on the import_root given to the test.

how to launch a subset query?

Hi, does Logica have a subset function? I want to compare one set to another and return a boolean value showing whether one set is equal to, or a subset of, the other.
eg:
SUBSET({ }, { }) = TRUE
SUBSET({ 1, 2, 3 }, { }) = TRUE
SUBSET({ 1, 2 }, { 1, 2 }) = TRUE
SUBSET({ 1, 2, 3 }, { 1, 2 }) = TRUE
SUBSET({ 1, 3, 5 }, { 1, 2 }) = FALSE
many thanks!
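I don't know of a built-in, but the semantics in the examples (SUBSET(a, b) is true when b is a subset of a) are easy to pin down in Python; in SQL this is typically expressed as "no element of b is missing from a":

```python
# SUBSET(a, b): true when every element of b is also in a,
# matching the truth table in the question.
def subset(a, b):
    return set(b) <= set(a)

assert subset(set(), set())
assert subset({1, 2, 3}, set())
assert subset({1, 2}, {1, 2})
assert subset({1, 2, 3}, {1, 2})
assert not subset({1, 3, 5}, {1, 2})
print("all cases match")
```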

Support MySQL

We should add support for running programs on MySQL database.
