Giter VIP home page Giter VIP logo

cosette's Introduction

Gitter chat Build Status

Cosette

Note: we are busy developing Cosette 2.0, any feature request on Cosette 1.0 will not be considered.

Cosette is a language, and an automated solver for reasoning SQL equivalences.

This project is in active development. Shoot us a message ([email protected]) or create an issue if you find something doesn't work!

cosette's People

Contributors

akcheung avatar bfitzsimmons avatar jroesch avatar konne88 avatar mestway avatar primeapple avatar shamrock-frost avatar snoozy avatar stechu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cosette's Issues

support AVG

Can be supported as an uninterpreted function in Coq and interpreted in Rosette.

Can't verify queries with const expr

/* define schema s1,
here s1 can contain any number of attributes,
but it has to at least contain integer attributes
x and y */
schema s1(x:int, y:int, ??);

table a(s1); -- table a of schema s1

query q1 -- query 1
select x.x as ax from a x where 1=1;

query q2 -- query 2
select x.x as ax from a x where 0=0;

verify q1 q2; -- verify the equivalence

Generate invalid Coq code when same attribute names from different schemas

/* define schema s1, 
   here s1 can contain any number of attributes, 
   but it has to at least contain integer attributes 
   x and y */
schema s1(x:int, ya:int, yb:int, ??);

schema s2(yb:int, ??);        -- define schema s2

table a(s1);            -- define table a using schema s1
table b(s2);            -- define table b using schema s1

query q1                -- define query q1 on tables a and b
`select distinct x.x as ax from a x, b y
 where x.ya = y.yb`;

query q2                -- define query q2 likewise
`select distinct x.x as ax from a x, a y, b z 
 where x.x = y.x`;

verify q1 q2;           -- does q1 equal to q2?

The above cosette program leads to invalid coq code generation. The reason is that s1 and s2 both have a column named yb. A better handling of generated column names should be done.

Rosette Backend Timeout Mechanism

Add a timeout mechanism for rosette backend.

If the solver returns a counterexample, return the counter example immediately.
If the timeout, return the last size of symbolic relations tried.

Plan to use racket threading to implement, the threading model:

image

potential bug with union?

I thought this should work? Modified from https://www.reddit.com/r/programming/comments/6oektc/introducing_cosette_a_sql_solver_for/dkhc92b/

/* define schema s1, 
   here s1 can contain any number of attributes, 
   but it has to at least contain integer attributes 
   x and y */
schema s1(x:int, ya:int, ??);

schema s2(yb:int, ??);        -- define schema s2

table a(s1);            -- define table a using schema s1
table b(s2);            -- define table b using schema s1

query q1                -- define query q1 on tables a and b
`select distinct x.x as ax from a x, b y
 where x.ya = y.yb UNION ALL
 select distinct x.x as ax from a x where 1=2`;

query q2                -- define query q2 likewise
`select distinct x.x as ax from a x, a y, b z 
 where x.x = y.x and x.ya = z.yb`;

verify q1 q2;           -- does q1 equal to q2?

racket error returned

This currently returns a racket error:

schema s1(x:int, ya:int);

schema s2(yb:int, yc: int);        -- define schema s2

table a(s1);            -- define table a using schema s1
table b(s2);            -- define table b using schema s1

query q1                -- define query q1 on tables a and b
`select distinct A.x as ax from a A
 where NOT EXISTS (select * from b B
 where (B.yb < A.ya) or (B.yb>A.ya) )`;

query q2                -- define query q2 likewise
`select distinct A.x as ax from a A
where 
 ((select count(B.yb)  from b B where B.yb=A.ya) = 
(select count(B.yb) from b B))`;

verify q1 q2;           -- does q1 equal to q2?
{"size":[1],"status":"UNSAT"}
$=: expected real? arguments
arguments: '(#<void>)
context...:
/root/.racket/6.8/pkgs/rosette/rosette/base/core/real.rkt:160:10
/root/.racket/6.8/pkgs/rosette/rosette/base/form/control.rkt:31:25
/root/.racket/6.8/pkgs/rosette/rosette/base/adt/list.rkt:119:9: @for-each
/Cosette-Web/backend/Cosette/rosette/cosette.rkt:185:0: solve-queries-symbreak

Self JOINs produce Coq syntax error

With this query,

schema s1(user_id:ty0, parent_id:ty0, ??);

table users(s1);

query q1
`SELECT user.user_id as id
FROM users u
JOIN users AS p
  ON p.user_id = u.parent_id
WHERE u.user_id = u.user_id`;

query q2
`SELECT user.user_id as id
FROM users u
JOIN users AS p
  ON p.user_id = u.parent_id
WHERE u.user_id = u.user_id`;

verify q1 q2;

Cosette returns a syntax error pointing to users on the line of the the JOIN.

Front End Bug

3 front end bugs found, all related to handling * and aggregate:

  1. Bug 1
schema s1(a:int, b:int, ??);

schema s2(a:int, b:int, ??);        -- define schema s2

table r1(s1);            -- define table a using schema s1
table r2(s2);            -- define table b using schema s1

query q1                -- define query q1 on tables a and b
`select x.b as xb, y.b as yb 
 from r1 x, r2 y`;

query q2                -- define query q2 likewise
`select x1.xb as xb, y1.yb as yb 
 from (select x.b as xb from r1 x) x1, 
      (select y.b as yb from r2 y) y1`;

verify q1 q2;           -- does q1 equal to q2?

The solver returns

Coq Error 
ERROR: Error in elimSatr (ToHoTTSQL.lhs).

EXCEPT statements produce Coq syntax error

schema s1(id:ty0, ??);

table Products(s1);
table Inventory(s1);

query q1
`
SELECT id as p_id
FROM Products
EXCEPT
SELECT id as i_id
FROM Inventory
`;

verify q1 q1;

The above query will cause a syntax error on the id in the second SELECT line.

support string literal

We should be able to support strings as literals. This wouldn't require supporting functions on strings.

Supporting general group by

We want to enable Cosette to handle the following new forms of aggregation query forms:

  • Allowing select fields to have different orders (now we only support aggregation functions in the last)
  • Supporting operations over aggregation columns.
  • Support having functions.

An demonic example is shown below.

Select a, max(c), b + 1, count(*) + 2, min(a + b)
From  t
Group by a, b 
Having min(d - a) > 2

Invalid Coq generated

This generated an "Invalid Coq generated" message and told me to create an issue.

schema sch_flights(fid:int,
    year:int,
	month_id:string,
	day_of_month:int,
	day_of_week_id:int,
	carrier_id:int,
	flight_num:int,
	origin_city:string,
	origin_state:int,
	dest_city:string,
	dest_state:int,
	departure_delay:int,
	taxi_out:int,
	arrival_delay:int,
	canceled:int,
	actual_time:int,
	distance:int
);

schema sch_carriers
(
	cid:int,
	name:int
);

SCHEMA sch_months
(
	mid:int,
	month:int
);

SCHEMA sch_days
(
	did:int,
	day_of_week:int
);

table Flights(sch_flights);
table Carriers(sch_carriers);
table Weekdays(sch_days);
table Months(sch_months);

query q1 `SELECT C.name AS name,
       F1.flight_num AS f1_flight_num,
       F1.origin_city AS f1_origin_city,
       F1.dest_city AS f1_dest_city,
       F1.actual_time AS f1_actual_time,
       F2.flight_num AS f2_flight_num,
       F2.origin_city AS f2_origin_city,
       F2.dest_city AS f2_dest_city,
       F2.actual_time AS f2_actual_time,
       (F1.actual_time + F2.actual_time) AS actual_time
  FROM FLIGHTS F1, FLIGHTS F2
  JOIN MONTHS M
    ON F1.month_id = M.mid AND F2.month_id = M.mid
  JOIN CARRIERS C
    ON F1.carrier_id = C.cid AND F2.carrier_id = C.cid
 WHERE M.month = 'July'
   AND F1.day_of_month = 15
   AND F2.day_of_month = 15
   AND F1.year = 2015
   AND F2.year = 2015
   AND F1.origin_city = 'Seattle WA'
   AND F2.dest_city = 'Boston MA'
   AND F1.dest_city = F2.origin_city
   AND (F1.actual_time + F2.actual_time) < 7 * 60
   AND F1.carrier_id = F2.carrier_id`;

query q2                -- define query q2 likewise
`SELECT C.name AS name,
       F1.flight_num AS f1_flight_num,
       F1.origin_city AS f1_origin_city,
       F1.dest_city AS f1_dest_city,
       F1.actual_time AS f1_actual_time,
       F2.flight_num AS f2_flight_num,
       F2.origin_city AS f2_origin_city,
       F2.dest_city AS f2_dest_city,
       F2.actual_time AS f2_actual_time,
       (F1.actual_time + F2.actual_time) AS actual_time
  FROM FLIGHTS F1, FLIGHTS F2
  JOIN MONTHS M
    ON F1.month_id = M.mid AND F2.month_id = M.mid
  JOIN CARRIERS C
    ON F1.carrier_id = C.cid AND F2.carrier_id = C.cid
 WHERE M.month = 'July'
   AND F1.day_of_month = 15
   AND F2.day_of_month = 15
   AND F1.year = 2015
   AND F2.year = 2015
   AND F1.origin_city = 'Seattle WA'
   AND F2.dest_city = 'Boston MA'
   AND F1.dest_city = F2.origin_city
   AND (F1.actual_time + F2.actual_time) < 7 * 60
   AND F1.carrier_id = F2.carrier_id`;

verify q1 q2;           -- does q1 equal to q2?

Add symbolic predicate in Rosette

To support some rewrites, we want to support symbolic predicate as an uninterpreted function in Rosette explicitly.

  • Add new syntax in Rosette code

  • parser support

Query cause coq code generation error

From https://news.ycombinator.com/item?id=14810085#14811708

/* define schema s1, here s1 can contain any number of attributes, but it has to at least contain integer attributes x and y */ schema s1(x:int, ya:int, ??);
schema s2(yb:int, ??); -- define schema s2

table a(s1);            -- define table a using schema s1
table b(s2);            -- define table b using schema s1

query q1                -- define query q1 on tables a and b
`select distinct x.x as ax 
from a x, b y
 where x.ya = y.yb`;

query q2                -- define query q2 likewise
`(select x.x as ax 
 from a x, a y, b z 
 where x.x = y.x and x.ya = z.yb)
 union all (select 1 as ax from a x where 1 = 0)`;

verify q1 q2;           -- does q1 equal to q2?

Compile Constant to Coq

We need compile constants in user submitted queries to Coq. Currently we can assume all the constants are integers.

Coq code generation bug when aggregate without group by

Queries:

Cosette Program:

schema s1(x:int);

table a(s1);            -- define table a using schema s1

query q1                
`select count(*) as ax from a x`;

query q2                
`select * from a x`;

verify q1 q2;           -- does q1 equal to q2?

the solver returns

Coq Error
Internal Error (to rosette).
CoqCodeGen: ToHoTTSQL.lhs:(450,5)-(456,66): Non-exhaustive patterns in function toCoq

Cosette Program:

schema s1(x:int);

table a(s1);            -- define table a using schema s1

query q1                
`select count(*) as ax from 
 (select x.x as ax from a x) x1`;

query q2                
`select * from a x`;

verify q1 q2;           -- does q1 equal to q2?

The solver returns

Coq Error
Syntax Error.
ERROR: Error in elimSatr (ToHoTTSQL.lhs).

Feature request: EXISTS, IN

Other Features and Limitations. The COSETTE constraints generator currently supports most standard SQL features. Besides the features mentioned above, EXISTS, IN, and LEFT OUTER JOIN are also supported. Currently COSETTE does not support string operations (e.g., LIKE) since COSETTE currently does not model strings as character arrays, along with ORDER BY.

The paper says Cosette supports EXISTS and IN, but it doesn't.
For example, Cosette complains an example code in the github repositiory has a syntax error.

example code: https://raw.githubusercontent.com/uwdb/Cosette/master/examples/calcite/testDecorrelateTwoIn.cos
tested under Cosette online demo and Cosette source code.

EXISTS Freezes

When I ran cosette to check query that uses EXISTS in its WHERE caluse, cosette freezes. A similar query using IN in its WHERE clause works perfectly. Query 1 below works fine but query 2 freezes

schema sch_flights(fid:int,
    year:int,
	month_id:int,
	day_of_month:int,
	day_of_week_id:int,
	carrier_id:string,
	flight_num:int,
	origin_city:string,
	origin_state:string,
	dest_city:string,
	dest_state:string,
	departure_delay:int,
	taxi_out:int,
	arrival_delay:int,
	canceled:int,
	actual_time:int,
	distance:int,
	capacity:int,
	price:int
);

schema sch_carriers
(
	cid:int,
	name:string
);

SCHEMA sch_months
(
	mid:int,
	month:string
);

SCHEMA sch_weekdays
(
	did:int,
	day_of_week:string
);

table Flights(sch_flights);
table Carriers(sch_carriers);
table Weekdays(sch_weekdays);
table Months(sch_months);
query q1                -- define query q1 on tables a and b
 select distinct C.name as carrier
 from Carriers C
 where C.cid in (select F.carrier_id as carrier_id
                          from Flights F
                          where F.origin_city = 'Seattle WA' and 
                            F.dest_city = 'San Francisco CA' and
                            F.carrier_id = C.cid);

query q2                -- define query q2 likewise
 SELECT DISTINCT c.name AS carrier
 FROM Carriers AS c
 WHERE EXISTS ( SELECT *
	                      FROM flights AS f
	                      WHERE f.carrier_id = c.cid
		              AND f.origin_city = 'Seattle WA'
		              AND f.dest_city = 'San Francisco CA');

Testing counter examples on a sqlite engine

Whenever we generate a counter example for two queries q1 and q2, we should generate a sqlite3 scripts and execute it on a sqlite engine -- to test whether the generated counter example is indeed a counter example.

The purpose is to test our sql interpreter (as it may contain bugs).

Feature request: In/Not In

Hey! Could we get these features added? This is a query found when grading CSEP 544, HW3.

select distinct F2.dest_city as city
from Flights F1, Flights F2
where F1.origin_city = 'Seattle WA' and 
      F1.dest_city = F2.origin_city and F2.dest_city != F1.origin_city and 
      F2.dest_city not in (select F.dest_city as dest_city 
                           from Flights F 
                           where F.origin_city = 'Seattle WA');

Increasing size of symbolic relations in Rosette

Increasing size of symbolic relations in Rosette.

The rosette program that the parser produced only need to provide query and input schemas.

  • add interface in rosette
  • actual incremental solving in rosette

Better proof search for query with union

We should be able to find the Coq proof for the following query. Essentially A UNION ALL EMPTY = A. There are various ways of implementing that, though.

schema s1(x:int);

table a(s1);            -- define table a using schema s1

query q1                -- define query q1 on tables a and b
`select * from a x UNION ALL
 select * from a x where false`;

query q2                -- define query q2 likewise
`select * from a x`;

verify q1 q2;           -- does q1 equal to q2? 

Rosette counterexample generation?

I notice that all test cases in rosette are only checking equivalence of two queries. But the purpose of using rosette is to find counterexamples that can proof the inequality of two queries, am I right?

Could you specify that how to use the given Rosette codes to generate counterexample?

README.md links are broken

  • Read Introducing Cosette --> 404
  • See the Cosette Guide to see how to use Cosette --> 404 (trailing dot should be removed)

Feature request: Not Exists

I would expect the following queries to be comparable, but alas I think NOT EXISTS does not parse.

schema sch_flights(fid:int,
    year:int,
	month_id:int,
	day_of_month:int,
	day_of_week_id:int,
	carrier_id:int,
	flight_num:int,
	origin_city:int,
	origin_state:int,
	dest_city:int,
	dest_state:int,
	departure_delay:int,
	taxi_out:int,
	arrival_delay:int,
	canceled:int,
	actual_time:int,
	distance:int
);

schema sch_carriers
(
	cid:int,
	name:int
);

SCHEMA sch_months
(
	mid:int,
	month:int
);

SCHEMA sch_days
(
	did:int,
	day_of_week:int
);

table Flights(sch_flights);
table Carriers(sch_carriers);
table Weekdays(sch_days);
table Months(sch_months);

query q1 `select distinct F1.origin_city as orig from Flights F1
where not exists(	select * from Flights F2
					where F2.actual_time >= 180
					and F2.origin_city = F1.origin_city )`;

query q2                -- define query q2 likewise
`select distinct F1.origin_city as orig
from Flights F1
where not exists(	select * from Flights F2
					where F2.actual_time >= 180
					and F2.origin_city = F1.origin_city )`;

verify q1 q2;           -- does q1 equal to q2?

CTE support

Hey guys, are there any plans to work on CTE (common table expressions) support?
They are a part of SQL standard, supported by most popular RDBM and their usage is widely considered to be best practice when there's a need for multiple (potentially nested) subquieries.

Exactly Equal Queries Time out

The following queries are not known as equivalent, even though they are exactly the same. Maybe you could try sorting the x = y and y = x, such that they are all x = y or something. Either way, I feel like it should know that these are equivalent statements.

schema sch_flights(fid:int,
    year:int,
	month_id:int,
	day_of_month:int,
	day_of_week_id:int,
	carrier_id:string,
	flight_num:int,
	origin_city:string,
	origin_state:string,
	dest_city:string,
	dest_state:string,
	departure_delay:int,
	taxi_out:int,
	arrival_delay:int,
	canceled:int,
	actual_time:int,
	distance:int,
	capacity:int,
	price:int
);

schema sch_carriers
(
	cid:int,
	name:string
);

SCHEMA sch_months
(
	mid:int,
	month:string
);

SCHEMA sch_days
(
	did:int,
	day_of_week:string
);

table Flights(sch_flights);
table Carriers(sch_carriers);
table Weekdays(sch_days);
table Months(sch_months);
query q1                -- define query q1 on tables a and b
` select C.name as name, sum(F.departure_delay) as delay
 from FLIGHTS as F, CARRIERS as C
 where C.cid=F.carrier_id 
 group by F.carrier_id`;

query q2                -- define query q2 likewise
`select c.name as name, sum(f.departure_delay) as delay
from FLIGHTS as f, CARRIERS as c
where f.carrier_id = c.cid
group by f.carrier_id`;

verify q1 q2;           -- does q1 equal to q2?

Building hott for Coq results in "Unable to satisfy the following constraints" error

Scenario:

  1. set up the necessary prerequisites
  2. go to Cosette/hott/
  3. type make

Expected:

hott module for Coq should complete building successfully

What happens:

make[1]: Entering directory `/pstore/home/adaszews/Cosette/hott/.build'
COQC library/UnivalentSemantics.v
COQC .build_solve/library/UnivalentSemantics.v
File "./library/UnivalentSemantics.v", line 10, characters 2-57:
Error:
Unable to satisfy the following constraints:
In environment:
T : type

?Denotation : "Denotation type Type"

Extract common translation to a single place

Some AST transformation is common to both rosette and coq path, e.g. converting group by to subquery.
They should put into a single place.

Currently, two transformation could be done:

  • Convert the list of TableRef in From to a binary tree of XProd
  • Convert group by to correlated subquery

incorrect coq code gen when constant in aggregate expr

-- define schema r with attributes a and b
schema s(a:int, b:int); 

table r(s);            -- define table r using schema s

query q1                -- define query q1 
`select sum(1+x.b) 
 from r x`;

query q2                -- define query q2 likewise
`select sum(1+y.b) 
 from r y`;

verify q1 q2;           -- does q1 equal to q2?

generate incorrect coq code.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.