ubodin / mimir
Data-ish exploration through SQL+Uncertainty
Home Page: http://mimirdb.info
License: Apache License 2.0
Follow the steps in the Demo and then run the following query:
SELECT * FROM FINALDATA where (rating > 4)
Uncertainty should be captured using VG Terms
E.g.: "Replaced NULL value for 'attribute' with 'guessed value'"
Perhaps also include some reference to the other attributes of the row.
Need to import this into the user-facing Scala codebase from Java land.
A side column listing all tables and lenses would be helpful. For example, see the W3Schools Try SQL feature.
select * from (SELECT * FROM Matched UNION SELECT * FROM typedratings1) ratings, product where product.pid = ratings.pid
[info] x handle full-nondeterministic join conflicts
[error] 'PROJECT[A1 <= R_A, B1 <= R_B, N <= {{ test_0[] }}, A2 <= R_A, B2 <= R_B, M <= {{ test_1[] }}](
[error] SELECT[ (R_A=R_A) ](
[error] JOIN(
[error] PROJECT[__LHS_ROWID <= ROWID](
[error] R(ROWID:int)
[error] ),
[error] PROJECT[__RHS_ROWID <= ROWID](
[error] R(ROWID:int)
[error] )
[error] )
[error] )
[error] )'
[error] is not equal to
[error] 'PROJECT[A1 <= __LHS_R_A, B1 <= __LHS_R_B, N <= {{ test_0[] }}, A2 <= __RHS_R_A, B2 <= __RHS_R_B, M <= {{ test_1[] }}](
[error] SELECT[ (__LHS_R_A=__RHS_R_A) ](
[error] JOIN(
[error] PROJECT[__LHS_R_A <= R_A, __LHS_R_B <= R_B, __LHS_R_C <= R_C, __LHS_ROWID <= ROWID](
[error] R(ROWID:int)
[error] ),
[error] PROJECT[__RHS_R_A <= R_A, __RHS_R_B <= R_B, __RHS_R_C <= R_C, __RHS_ROWID <= ROWID](
[error] R(ROWID:int)
[error] )
[error] )
[error] )
[error] )' (CompilerSpec.scala:211)
[error] Expected: ...OJECTA1...= [__LHS_]R_...= [__LHS_]R_..._0[] }...= [__]R[HS]_[R_]A,...= [__]R[HS]_[R_]B,..._1[] }}
[error] ...ELECT ([__LHS_]R_A=[__]R[HS]_[R_]A)
[error] ...JOIN(
[error] ...OJECT__..._R[_A <= R_A, __LHS_R_B <= R_B, __LHS_R_C <= R_C, __LHS_R]OWID ...
[error] ...:int)
[error] ... ),
[error] ...OJECT__..._R[_A <= R_A, __RHS_R_B <= R_B, __RHS_R_C <= R_C, __RHS_R]OWID ...
[error] ...:int)
[error] ... )
[error] )
[error] )
[error] )
[error] Actual: ...OJECTA1...= []R_...= []R_..._0[] }...= []R[]_[]A,...= []R[]_[]B,..._1[] }}
[error] ...ELECT ([]R_A=[]R[]_[]A)
[error] ...JOIN(
[error] ...OJECT__..._R[]OWID ...
[error] ...:int)
[error] ... ),
[error] ...OJECT__..._R[]OWID ...
[error] ...:int)
[error] ... )
[error] )
[error] )
[error] )
[info]
Replace placeholders with an actual call to the implementation. Switch the UI to on-mouseup from on-mouseover if necessary.
Ideally: BOUNDS, VARIANCE, CONFIDENCE BOUNDS (95%)
Support for arbitrary type-casting constraints (not just NOT NULL) on the domain constraint lens.
A GITFlow-style diagram of the query currently being displayed in the web view.
Consider the following expression:
CASE WHEN X IS NULL THEN {{foo}} ELSE X END
ResultIterator.isDeterministic(...) returns false for this expression only when X is in fact null. getVGTerms should follow suit. In fact, this may be better implemented as a method on ResultIterator rather than on Database.
The simple way to implement this would be to use Eval.inline() to assign all of the Column() values and then emit the VGTerms remaining in the reduced expression.
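The idea can be sketched on a toy expression tree. Everything below (the case classes, `inline`, `vgTerms`) is a hypothetical stand-in written for illustration and is not Mimir's actual Eval or Expression API; it only shows that once column values are substituted, the CASE collapses and only the VG terms that can still influence the result remain.

```scala
object VGTermSketch {
  // Hypothetical, simplified expression tree (not Mimir's real classes).
  sealed trait Expr
  case class Column(name: String) extends Expr
  case class Const(value: Option[Long]) extends Expr // None models NULL
  case class BoolConst(value: Boolean) extends Expr
  case class VGTerm(name: String) extends Expr
  case class IsNull(child: Expr) extends Expr
  case class CaseWhen(cond: Expr, thenExpr: Expr, elseExpr: Expr) extends Expr

  // Substitute the column values of one row and fold away branches
  // that can no longer be taken.
  def inline(e: Expr, row: Map[String, Option[Long]]): Expr = e match {
    case Column(n) => Const(row(n))
    case IsNull(c) =>
      inline(c, row) match {
        case Const(v) => BoolConst(v.isEmpty)
        case other    => IsNull(other)
      }
    case CaseWhen(c, t, f) =>
      inline(c, row) match {
        case BoolConst(true)  => inline(t, row)
        case BoolConst(false) => inline(f, row)
        case other            => CaseWhen(other, inline(t, row), inline(f, row))
      }
    case other => other
  }

  // Collect the names of the VG terms left in a (reduced) expression.
  def vgTerms(e: Expr): Set[String] = e match {
    case VGTerm(n)         => Set(n)
    case IsNull(c)         => vgTerms(c)
    case CaseWhen(c, t, f) => vgTerms(c) ++ vgTerms(t) ++ vgTerms(f)
    case _                 => Set.empty
  }

  // CASE WHEN X IS NULL THEN {{foo}} ELSE X END
  val expr: Expr = CaseWhen(IsNull(Column("X")), VGTerm("foo"), Column("X"))
}
```

With X null, the reduced expression is the VG term itself, so `foo` is reported; with X non-null, the reduced expression is a constant and no VG terms are reported.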
Lens type definitions should not be case sensitive. Right now, these two statements behave differently:
create lens x as select * from ratings2 with SCHEMA_MATCHING (PID string, RATING float, REVIEW_COUNT float);
create lens x as select * from ratings2 with schema_matching (PID string, RATING float, REVIEW_COUNT float);
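One possible fix is to normalize the lens type name before looking it up. The sketch below assumes a registry keyed by upper-cased names; `knownLensTypes`, `normalize`, and `resolve` are hypothetical names for illustration, not the existing lens-manager code.

```scala
import java.util.Locale

object LensTypeLookup {
  // Hypothetical registry, keyed by upper-cased lens type name.
  val knownLensTypes: Set[String] =
    Set("SCHEMA_MATCHING", "MISSING_VALUE", "TYPE_INFERENCE", "DOMAIN_CONSTRAINT")

  // Locale.ROOT avoids locale-specific case mappings (e.g. the Turkish dotless i).
  def normalize(lensType: String): String =
    lensType.trim.toUpperCase(Locale.ROOT)

  // Returns the canonical lens type name, or None if the type is unknown.
  def resolve(lensType: String): Option[String] = {
    val key = normalize(lensType)
    if (knownLensTypes.contains(key)) Some(key) else None
  }
}
```

With this in place, `resolve("schema_matching")` and `resolve("SCHEMA_MATCHING")` yield the same canonical name.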
For example:
DOMAIN_CONSTRAINT ( B NOT NULL, B < 7 )
Add a menu to simplify building lenses
[error] Could not create an instance of mimir.ctables.SqlLoaderSpec
[error] caused by java.lang.Exception: Can't find a constructor for class mimir.ctables.SqlLoaderSpec
[error] org.specs2.reflect.Classes$class.tryToCreateObjectEither(Classes.scala:96)
[error] org.specs2.reflect.Classes$.tryToCreateObjectEither(Classes.scala:207)
[error] org.specs2.specification.SpecificationStructure$$anonfun$createSpecificationEither$2.apply(BaseSpecification.scala:119)
[error] org.specs2.specification.SpecificationStructure$$anonfun$createSpecificationEither$2.apply(BaseSpecification.scala:119)
[error] scala.Option.getOrElse(Option.scala:120)
[error] org.specs2.specification.SpecificationStructure$.createSpecificationEither(BaseSpecification.scala:119)
[error] org.specs2.runner.SbtRunner.org$specs2$runner$SbtRunner$$specificationRun(SbtRunner.scala:73)
[error] org.specs2.runner.SbtRunner$$anonfun$newTask$1$$anon$5.execute(SbtRunner.scala:59)
[error] sbt.ForkMain$Run$2.call(ForkMain.java:294)
[error] sbt.ForkMain$Run$2.call(ForkMain.java:284)
[error] java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[error] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[error] java.lang.Thread.run(Thread.java:744)
The root dir has a bunch of .db files piling up. These should get organized into a Databases directory.
A simplified form of the archival lens that simply adds a user-specified gaussian to any or all of its input columns.
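A rough sketch of what such a lens would compute per row, assuming numeric columns and a user-supplied standard deviation per column. The `perturb` helper and its parameters are hypothetical; a real lens would express this through VG terms rather than by rewriting values eagerly.

```scala
import scala.util.Random

object GaussianNoiseSketch {
  // Adds zero-mean Gaussian noise with the given standard deviation to each
  // column listed in `stdDevs`; all other columns pass through unchanged.
  def perturb(
      row: Map[String, Double],
      stdDevs: Map[String, Double],
      rng: Random
  ): Map[String, Double] =
    row.map { case (col, value) =>
      stdDevs.get(col) match {
        case Some(sd) => col -> (value + rng.nextGaussian() * sd)
        case None     => col -> value
      }
    }
}
```

Passing a seeded `Random` keeps the output reproducible, which is convenient for tests.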
e.g., Type Inference on Ratings would have a default name Ratings_Typed
The TYPE_INFERENCE lens takes the form:
PROJECT[PID <= {{ TR1CAST_0[ROWID, {{ TR1INFER_0[] }}] }}, RATING <= {{ TR1CAST_1[ROWID, {{ TR1INFER_1[] }}] }}, REVIEW_CT <= {{ TR1CAST_2[ROWID, {{ TR1INFER_2[] }}] }}]( RATINGS1(...) )
It seems that passing a VGTerm as an argument to another VGTerm confuses the operator parser. The lens works on its own, but when it is composed with another lens, the lens.load() step fails. To reproduce the error, create a type_inference lens, create a missing_value lens on top of it, and then try to view any tooltip, create another lens on it, or even just run SELECT * FROM MIMIR_LENSES.
A CSV file import feature would be helpful, and could form the basis for some later features for log parsing.
The semantics I'd be looking to see are something along the lines of:
SELECT * INTO new_table FROM uploaded_csv_file
Probably because there's some glitch in how it's being used. Also, the parser needs to be fixed to properly grab multiple expressions.
e.g., "Guessed type 'type' for attribute 'attribute'"
and...
"Could not coerce 'string value' to 'type'"
Asterisks are good for text-only views. In a web interface, we should have highlighted cells instead.
CONFIDENCE(Expr, P) -> produces epsilon,delta bounds: Upper/lower bounds for the Pth percentile.
Running any SELECT query and following it with a CREATE LENS:
SELECT * FROM sane_r;
CREATE LENS insane_r AS SELECT * FROM r WITH missing_value('C')
results in the following exception
java.sql.SQLException: [SQLITE_BUSY] The database file is locked (database is locked)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.DB.newSQLException(DB.java:901)
at org.sqlite.core.DB.execute(DB.java:807)
at org.sqlite.jdbc3.JDBC3PreparedStatement.execute(JDBC3PreparedStatement.java:50)
at mimir.sql.JDBCBackend.update(JDBCBackend.scala:56)
at mimir.Database.update(Database.scala:96)
at mimir.lenses.LensManager.save(LensManager.scala:71)
at mimir.lenses.LensManager.create(LensManager.scala:67)
Click on a row/cell to run an EXPLAIN on its uncertainty.
From the paper
Provide a way to compute the probability of a row being in the output set. That is, the chance that __MIMIR_CONDITION evaluates to BooleanPrimitive(true)
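One way to approximate this probability is Monte Carlo sampling over possible worlds. This is an assumption about the approach, not something the issue specifies. In the sketch below, `condition` is a hypothetical stand-in for evaluating __MIMIR_CONDITION in one sampled world.

```scala
import scala.util.Random

object RowConfidenceSketch {
  // Estimates P(condition is true) as the fraction of sampled worlds in
  // which the condition holds. `condition` draws whatever random choices
  // it needs from the supplied generator.
  def estimate(condition: Random => Boolean, samples: Int, seed: Long): Double = {
    require(samples > 0, "need at least one sample")
    val rng = new Random(seed)
    val hits = Iterator.fill(samples)(condition(rng)).count(identity)
    hits.toDouble / samples
  }
}
```

For example, `RowConfidenceSketch.estimate(r => r.nextDouble() < 0.5, 1000, 7L)` returns an estimate of the chance that a fair coin-flip condition holds; more samples tighten the estimate at the cost of more evaluations.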
I was playing around with some tables for CSV import + Type Inference when I noticed that, with more than a few columns, the order of a table's columns gets scrambled. For example, Name is displayed in Married, Married in Joining, and Joining in Name.
This is because, in line 219 of SqlToRA, toMap returns a HashMap, which does not preserve the order of the columns. Consequently, in RAToSql, the mappings of the SelectItems are wrong: ret has incorrectly ordered mappings.
Should we correct this?
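The behaviour is consistent with how Scala's immutable maps work: `toMap` keeps insertion order only for very small maps (up to four entries) and switches to a hash-based map beyond that, which gives no ordering guarantee. A minimal sketch of an order-preserving alternative follows; the column list is made up for illustration (only Name, Married, and Joining come from the report), and `ListMap` is one option among several, such as keeping the `Seq` of pairs.

```scala
import scala.collection.immutable.ListMap

object ColumnOrderSketch {
  // Hypothetical schema with more than four columns.
  val columns: Seq[(String, String)] = Seq(
    "NAME"    -> "string",
    "AGE"     -> "int",
    "MARRIED" -> "string",
    "JOINING" -> "string",
    "SALARY"  -> "float",
    "CITY"    -> "string"
  )

  // Plain toMap: iteration order is unspecified once the map is hash-based.
  val unordered: Map[String, String] = columns.toMap

  // ListMap iterates in insertion order, so the column order survives.
  val ordered: ListMap[String, String] = ListMap(columns: _*)
}
```

Iterating `ordered` yields the columns in their declared order, while iterating `unordered` may not.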
Can't have 0 arguments.
2 arguments seems to break things too.
[info] x handle row-ids correctly
[error] 'PROJECT[A <= R_A, C <= R_C, N <= {{ test_0[__LHS_ROWID, R_A] }}, S_C <= S_C, S_D <= S_D](
[error] SELECT[ (R_C=S_C) ](
[error] JOIN(
[error] PROJECT[__LHS_ROWID <= ROWID, __LHS_ROWID <= ROWID, __LHS_ROWID <= ROWID](
[error] R(ROWID:int // ROWID:rowid, ROWID:rowid)
[error] ),
[error] PROJECT[S_C <= S_C, S_D <= S_D](
[error] S(S_C:int, S_D:decimal)
[error] )
[error] )
[error] )
[error] )'
[error] is not equal to
[error] 'PROJECT[A <= R_A, C <= R_C, N <= {{ test_0[__LHS_ROWID, R_A] }}, S_C <= S_C, S_D <= S_D](
[error] SELECT[ (R_C=S_C) ](
[error] JOIN(
[error] PROJECT[R_A <= R_A, R_B <= R_B, R_C <= R_C, __LHS_ROWID <= ROWID](
[error] R(ROWID:int)
[error] ),
[error] PROJECT[S_C <= S_C, S_D <= S_D](
[error] S(S_C:int, S_D:decimal)
[error] )
[error] )
[error] )
[error] )' (CompilerSpec.scala:240)
[error] Expected: ...OJECTA ..._0[__LHS_ROWID, R_A] }}, ...
[error] ...ELECT[ (R_C=S_C) ](
[error] ...JOIN(
[error] ...OJECT[]R[_A] <= R[_A], [R]_[B <= R]_[B, R_C] <= R[_C], __L...
[error] ...D:int[])
[error] ... ),
[error] ...OJECT[S_C <= S_C, S_D <= S_D](
[error] ...imal)
[error] ... )
[error] )
[error] )
[error] )
[error] Actual: ...OJECTA ..._0[__LHS_ROWID, R_A] }}, ...
[error] ...ELECT[ (R_C=S_C) ](
[error] ...JOIN(
[error] ...OJECT[__LHS_]R[OWID] <= R[OWID], []_[_LHS]_[ROWID] <= R[OWID], __L...
[error] ...D:int[ // ROWID:rowid, ROWID:rowid])
[error] ... ),
[error] ...OJECT[S_C <= S_C, S_D <= S_D](
[error] ...imal)
[error] ... )
[error] )
[error] )
[error] )
The explain box should have a Confidence (probability of the row's presence) and a list of var terms in the __MIMIR_CONDITION column.
No effects yet.
"Reasons" is more understandable to users who haven't used mimir in general.
[error] - play.core.server.netty.PlayDefaultUpstreamHandler - Cannot invoke the action
java.sql.SQLException: near ".": syntax error
at org.sqlite.core.NativeDB.throwex(NativeDB.java:397) ~[sqlite-jdbc-3.8.7.jar:na]
at org.sqlite.core.NativeDB._exec(Native Method) ~[sqlite-jdbc-3.8.7.jar:na]
at org.sqlite.jdbc3.JDBC3Statement.executeUpdate(JDBC3Statement.java:116) ~[sqlite-jdbc-3.8.7.jar:na]
at mimir.sql.JDBCBackend.update(JDBCBackend.scala:48) ~[classes/:na]
at mimir.Database.update(Database.scala:94) ~[classes/:na]
at mimir.Database.handleLoadTable(Database.scala:291) ~[classes/:na]
at mimir.WebAPI.configure(WebAPI.scala:50) ~[classes/:na]
at controllers.Application$$anonfun$loadTable$1$$anonfun$apply$1.apply(Application.scala:123) ~[classes/:na]
at controllers.Application$$anonfun$loadTable$1$$anonfun$apply$1.apply(Application.scala:117) ~[classes/:na]
This issue occurs when uploading the file https://github.com/UBOdin/mimir/blob/master/test/data/CPUSpeed.csv
Sample(Expr) that produces a sample from one possible world of evaluating the expression.
Selects the type of each attribute based on the majority of values in the record. Allows for the possibility of errors in the type selection.
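A minimal sketch of majority-vote type selection, assuming string-valued input and only three candidate types. The `guessType` patterns and the returned vote fraction are illustrative assumptions, not the lens's actual rules.

```scala
object TypeInferenceSketch {
  // Guess the type of a single value from its textual form.
  def guessType(value: String): String =
    if (value.matches("[+-]?\\d+")) "int"
    else if (value.matches("[+-]?(\\d+\\.\\d*|\\.\\d+)")) "float"
    else "string"

  // Pick the type most values agree on, and report the fraction of values
  // that voted for it. Values that disagree are the potential errors the
  // lens would have to flag. Ties are broken arbitrarily.
  def inferColumnType(values: Seq[String]): (String, Double) = {
    require(values.nonEmpty, "cannot infer a type from an empty column")
    val votes = values.map(guessType).groupBy(identity).map { case (t, vs) => t -> vs.size }
    val (winner, count) = votes.maxBy(_._2)
    (winner, count.toDouble / values.size)
  }
}
```

For the values "1", "2", "x", "4", three of four vote for int, so the column is typed int with a vote fraction of 0.75, and "x" is the value that cannot be coerced.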
e.g., "Using 'source attribute' for 'target attribute'"