heavyai / metis
Tools for massively parallel and multi-variate data exploration
License: Other
The Project transform will essentially be the Formula transform.
It will support string expressions as well as object-type expressions.
Expressions can also be specified as an array.
{
type: "project",
expr: Array<string | Expression> | string | Expression,
as?: Array<string> | string
}
{
type: "project",
expr: {
type: "date_trunc",
unit: "month",
field: "tweet_time"
},
as: "key0"
}
// SELECT date_trunc(month, tweet_time) as key0
{
type: "project",
expr: ["conv(lon)", "conv(lat)", "lang", "followers"],
as: ["x", "y", "size", "color"]
}
// SELECT conv(lon) as x, conv(lat) as y, lang as size, followers as color
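This sketch shows how a writer might turn a project transform into a SELECT clause. It normalizes the single-value and array forms of `expr` and `as` into one code path. The names `writeExpression` and `writeProject` are hypothetical, and only the `date_trunc` and `extract` object expressions are handled. The output matches the SQL comments in the examples above.

```javascript
// Hypothetical sketch: write a single expression (string or object) as SQL.
function writeExpression(expr) {
  if (typeof expr === "string") return expr
  switch (expr.type) {
    case "date_trunc":
      return `date_trunc(${expr.unit}, ${expr.field})`
    case "extract":
      return `extract(${expr.unit} from ${expr.field})`
    default:
      throw new Error(`Unsupported expression type: ${expr.type}`)
  }
}

// Hypothetical sketch: write a "project" transform as a SELECT clause.
function writeProject(transform) {
  // Normalize single values to arrays so both shapes share one code path.
  const exprs = Array.isArray(transform.expr) ? transform.expr : [transform.expr]
  const aliases = transform.as === undefined ? [] : [].concat(transform.as)
  const columns = exprs.map((expr, i) => {
    const sql = writeExpression(expr)
    return aliases[i] ? `${sql} as ${aliases[i]}` : sql
  })
  return "SELECT " + columns.join(", ")
}

console.log(writeProject({
  type: "project",
  expr: ["conv(lon)", "conv(lat)", "lang", "followers"],
  as: ["x", "y", "size", "color"]
}))
// SELECT conv(lon) as x, conv(lat) as y, lang as size, followers as color
```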
String values containing single quotes that are passed to mapd-data-layer, such as those in a SQL CASE statement, currently do not have their single quotes escaped. As a result, a string like 'Chicago O'Hare International' produces a malformed SQL query.
For example:
CASE
WHEN origin_name IN
('Chicago O'Hare International','William B Hartsfield-Atlanta Intl','Dallas-Fort Worth International','Los Angeles International','Phoenix Sky Harbor International')
THEN origin_name
ELSE 'undefined'
END AS key1
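A minimal fix, assuming literals are built by string interpolation, is to escape every string value before writing it into the query. The sketch uses the standard SQL rule of doubling embedded single quotes. `escapeSqlString` and `writeInList` are hypothetical helper names, not existing mapd-data-layer functions.

```javascript
// Hypothetical helper: quote a value as a SQL string literal by doubling
// any embedded single quotes (the standard SQL escaping rule).
function escapeSqlString(value) {
  return "'" + String(value).replace(/'/g, "''") + "'"
}

// Hypothetical helper: build the IN (...) list used by the CASE expression.
function writeInList(values) {
  return "(" + values.map(escapeSqlString).join(",") + ")"
}

const airports = [
  "Chicago O'Hare International",
  "William B Hartsfield-Atlanta Intl"
]

console.log(
  `CASE WHEN origin_name IN ${writeInList(airports)} THEN origin_name ELSE 'undefined' END AS key1`
)
// CASE WHEN origin_name IN ('Chicago O''Hare International','William B Hartsfield-Atlanta Intl') THEN origin_name ELSE 'undefined' END AS key1
```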
These transforms are currently documented only as Flow types. It would be helpful to document concrete examples of the transforms and how they translate into SQL.
Vega-Lite provides a higher-level visualization grammar that ties together encodings and data transformations.
It would be useful to explore how that grammar maps directly to Vega encodings and the mapd-data-layer transformations.
One possible outcome of this exploration is a higher-level parser for Vega-Lite specifications. This parser would translate a Vega-Lite spec into a Vega spec and a data transform spec (to be used by the data layer).
The goal of this feature is to expose the SQL writer as a module that can be extended by the user.
The user would be able to declare a new type of transform or expression by registering a "definition" of it, along with a function that parses it.
const writer = createSQLWriter()
writer.registerParser(typeDef, parser)
writer.writeSQL(DataState)
This would be the same writer module used internally by the graph instance.
const graph = createGraph()
graph.getWriter().registerParser(transformDef, parser)
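A minimal sketch of such a registry follows. The shape of `DataState` is not specified above, so the sketch assumes `{ source, transform: [...] }`. It also assumes each registered parser receives the transform plus the partially built SQL parts and returns updated parts, and that `typeDef` carries a `type` string. None of this reflects an existing API.

```javascript
// Hypothetical sketch: a SQL writer whose transform parsers live in a
// registry keyed by transform type, so new types can be added at runtime.
function createSQLWriter() {
  const parsers = new Map()

  return {
    // typeDef is assumed to be an object with at least a `type` string.
    registerParser(typeDef, parser) {
      parsers.set(typeDef.type, parser)
    },
    // state is assumed to be { source, transform: [...] }.
    writeSQL(state) {
      let parts = { select: ["*"], from: state.source, where: [] }
      for (const transform of state.transform || []) {
        const parser = parsers.get(transform.type)
        if (!parser) throw new Error(`No parser registered for "${transform.type}"`)
        parts = parser(transform, parts) // each parser returns new SQL parts
      }
      const where = parts.where.length ? ` WHERE ${parts.where.join(" AND ")}` : ""
      return `SELECT ${parts.select.join(", ")} FROM ${parts.from}${where}`
    }
  }
}

// Usage: register a user-defined "filter" transform.
const writer = createSQLWriter()
writer.registerParser({ type: "filter" }, (transform, parts) => ({
  ...parts,
  where: parts.where.concat(transform.expr)
}))

console.log(writer.writeSQL({
  source: "flights",
  transform: [{ type: "filter", expr: "recipient_party = 'D'" }]
}))
// SELECT * FROM flights WHERE recipient_party = 'D'
```

Keeping parsers in a map means the built-in transforms could be registered the same way as user-defined ones. The graph would then only need to expose the writer instance.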
A good test case:
select count(*) from (
select distinct user_b from twitter_edges where user_a in (
select distinct user_b from twitter_edges where user_a in (
select distinct user_b from twitter_edges where user_a in (
select distinct user_b from twitter_edges where user_a = '40981798'
)
)
)
)
The general idea is to add helper methods to dataNode instances for constructing and setting transform objects.
For instance, this:
// extract and between would be expression creators
node.project("key1", extract("day", "contrib_date"))
node.filter(between("amount", [0, 100]))
would be equivalent to:
node.transform({
type: "project",
expr: {
type: "extract",
unit: "day",
field: "contrib_date"
},
as: "key1"
})
node.transform({
type: "filter",
expr: {
type: "between",
field: "amount",
left: 0,
right: 100
}
})
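The helpers could be thin sugar over `transform()`. The sketch uses a hypothetical stand-in for a data node (`createDataNode`, `getTransforms`) rather than the real dataNode class. It produces exactly the two transform objects shown above.

```javascript
// Hypothetical expression creators matching the object shapes above.
const extract = (unit, field) => ({ type: "extract", unit, field })
const between = (field, [left, right]) => ({ type: "between", field, left, right })

// Hypothetical stand-in for a data node: transform() appends a transform
// object; project() and filter() build those objects for the caller.
function createDataNode() {
  const transforms = []
  const self = {
    transform(t) {
      transforms.push(t)
      return self // allow chaining
    },
    project(as, expr) {
      return self.transform({ type: "project", expr, as })
    },
    filter(expr) {
      return self.transform({ type: "filter", expr })
    },
    getTransforms() {
      return transforms.slice()
    }
  }
  return self
}

const node = createDataNode()
node.project("key1", extract("day", "contrib_date"))
node.filter(between("amount", [0, 100]))
console.log(JSON.stringify(node.getTransforms()))
```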
{
type: "sample",
method: "multiplicative",
size: number,
limit: number
}
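// Hypothetical wrapper: builds the WHERE predicate that keeps ~limit of size rows.
function sampleFilter(table, size, limit) {
  const ratio = Math.min(limit / size, 1.0)
  const threshold = Math.floor(4294967296 * ratio) // 4294967296 = 2^32
  return `MOD(${table}.rowid * 265445761, 4294967296) < ${threshold}`
}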
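To check the intuition behind the multiplicative method, the sketch evaluates the same predicate in JavaScript over row ids `0..size-1`. That layout of rowid values is an assumption. The products `rowid * 265445761` taken modulo 2^32 spread roughly evenly across `[0, 2^32)`, so about `ratio` of the rows fall under the threshold. `sampleRowids` is a hypothetical name.

```javascript
// Mirrors the sample predicate above. Products stay far below 2^53, so
// Number arithmetic is exact for these sizes.
const MODULUS = 4294967296 // 2^32
const MULTIPLIER = 265445761 // constant taken from the predicate above

function sampleRowids(size, limit) {
  const ratio = Math.min(limit / size, 1.0)
  const threshold = Math.floor(MODULUS * ratio)
  const kept = []
  for (let rowid = 0; rowid < size; rowid++) {
    if ((rowid * MULTIPLIER) % MODULUS < threshold) kept.push(rowid)
  }
  return kept
}

const kept = sampleRowids(100000, 10000)
console.log(kept.length) // close to 10000
```

The predicate depends only on rowid, so the sample is deterministic and repeated queries return the same rows.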
const graph = createGraph()
const root = graph.data({
source: "flights",
name: "root"
})
const child = graph.data({
source: "root",
name: "child",
transform: [
{
type: "filter",
expr: "recipient_party = 'D'"
}
]
})
const grandchild = graph.data({
source: "child",
name: "grandchild",
transform: [
{
type: "aggregate",
fields: ["*", "amount"],
ops: ["average", "average"],
groupby: "recipient_party"
}
]
})
const graph = createGraph()
const root = graph.createRoot({
source: "flights",
name: "root"
})
const child = root.createChild({
name: "child",
transform: [
{
type: "filter",
expr: "recipient_party = 'D'"
}
]
})
const grandchild = child.createChild({
name: "grandchild",
transform: [
{
type: "aggregate",
fields: ["*", "amount"],
ops: ["average", "average"],
groupby: "recipient_party"
}
]
})
const graph = createGraph()
const root = graph.createRoot({
source: "flights",
name: "root"
})
const childNode = createNode({
name: "child",
transform: [
{
type: "filter",
expr: "recipient_party = 'D'"
}
]
})
const grandchildNode = createNode({
name: "grandchild",
transform: [
{
type: "aggregate",
fields: ["*", "amount"],
ops: ["average", "average"],
groupby: "recipient_party"
}
]
})
const child = root.pushChild(childNode)
const grandchild = child.pushChild(grandchildNode)
The two API proposals seem compatible as well.
Graph State is currently represented as:
const state = {
root: {
source: "flights",
name: "root"
},
child: {
source: "root",
name: "child",
transform: [
{
type: "filter",
expr: "recipient_party = 'D'"
}
]
},
grandchild: {
source: "child",
name: "grandchild",
transform: [
{
type: "aggregate",
fields: ["*", "amount"],
ops: ["average", "average"],
groupby: "recipient_party"
}
]
}
}
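With the current flat, name-keyed representation, a node's full pipeline can be recovered by following `source` names upward. The walk stops at a name that is not a node, which is assumed to be the underlying table. `resolvePipeline` is a hypothetical helper. It assumes there are no cycles and that table names never collide with node names.

```javascript
// Hypothetical helper: walk `source` names from a node up to the table and
// return the table plus every transform along the way, ancestors first.
function resolvePipeline(state, name) {
  if (!(name in state)) throw new Error(`Unknown node: ${name}`)
  const chain = []
  for (let node = state[name]; node; node = state[node.source]) {
    chain.unshift(node) // ancestors end up first
  }
  return {
    table: chain[0].source,
    transforms: chain.reduce((acc, node) => acc.concat(node.transform || []), [])
  }
}

const state = {
  root: { source: "flights", name: "root" },
  child: {
    source: "root",
    name: "child",
    transform: [{ type: "filter", expr: "recipient_party = 'D'" }]
  },
  grandchild: {
    source: "child",
    name: "grandchild",
    transform: [{
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }]
  }
}

const pipeline = resolvePipeline(state, "grandchild")
console.log(pipeline.table) // flights
console.log(pipeline.transforms.map(t => t.type)) // [ 'filter', 'aggregate' ]
```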
In the latter two proposals it would be represented as:
const state = {
root: {
source: "flights",
name: "root",
children: [
child
]
}
}
const childState = {
source: root,
name: "child",
transform: [
{
type: "filter",
expr: "recipient_party = 'D'"
}
],
children: [
grandchild
]
}
const grandchildState = {
source: child,
name: "grandchild",
transform: [
{
type: "aggregate",
fields: ["*", "amount"],
ops: ["average", "average"],
groupby: "recipient_party"
}
],
children: []
}
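The tree form carries the same information as the flat form, so the current state could be derived from it. The sketch uses a hypothetical `createNode`/`pushChild` stand-in in the spirit of the third proposal, not an existing API. `pushChild` sets the child's `source` to the parent object, and `toFlatState` rebuilds the flat, name-keyed state by walking the tree.

```javascript
// Hypothetical tree node: `source` is the parent node object (the root's
// `source` stays a table name) and `children` lists the child nodes.
function createNode({ source = null, name, transform = [] }) {
  const node = {
    source,
    name,
    transform,
    children: [],
    pushChild(child) {
      child.source = node // back-reference to the parent node
      node.children.push(child)
      return child
    }
  }
  return node
}

// Walk the tree depth-first and rebuild the flat, name-keyed state,
// replacing parent references with parent names.
function toFlatState(root) {
  const state = {}
  const visit = node => {
    const source = typeof node.source === "string" ? node.source : node.source.name
    const entry = { source, name: node.name }
    if (node.transform.length > 0) entry.transform = node.transform
    state[node.name] = entry
    node.children.forEach(visit)
  }
  visit(root)
  return state
}

const root = createNode({ source: "flights", name: "root" })
const child = root.pushChild(createNode({
  name: "child",
  transform: [{ type: "filter", expr: "recipient_party = 'D'" }]
}))
child.pushChild(createNode({
  name: "grandchild",
  transform: [{
    type: "aggregate",
    fields: ["*", "amount"],
    ops: ["average", "average"],
    groupby: "recipient_party"
  }]
}))

console.log(Object.keys(toFlatState(root))) // [ 'root', 'child', 'grandchild' ]
```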
The data layer should provide a lightweight layer of abstraction to manage the "crossfiltering" behavior among nodes.
The aim is to help with adding / modifying / removing crossfilters.
An example is something like https://github.com/mapd/mapd-data-layer/blob/master/example/vega/src/crossfilter.js
Currently, "crossfiltering" behavior is implemented through the Crossfilter and ResolveFilter transforms.
The Crossfilter transform represents a set of filter transformations that should be applied to child nodes. These filters are applied only when a child node explicitly allows them through the ResolveFilter transform.
For instance, a parent can have this Crossfilter transform:
const xfilterDataNode = graph.data({
source: "flights_donotmodify",
name: "xfilter",
transform: [
{
type: "crossfilter",
signal: "xfilter",
filter: [
{
type: "filter",
id: "amount-filter",
expr: {
type: "between",
field: "amount",
left: 50,
right: 100
}
},
{
type: "filter",
id: "party-filter",
expr: {
type: "=",
left: "party",
right: "D"
}
}
]
}
]
});
And a child can resolve it like so (ignoring the party-filter):
const childDataNode = graph.data({
source: "xfilter",
name: "child",
transform: [
{
type: "resolveFilter",
filter: { signal: "xfilter" },
ignore: ["party-filter"]
}
]
});
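This sketch of the resolution step assumes filters are matched by `signal` and excluded by `id`. The child's predicates are then the parent's crossfilter filters minus the ignored ones. `resolveFilters` and `writeFilterExpr` are hypothetical names. Only the `between` and `=` expressions from the example are handled.

```javascript
// Hypothetical: write one filter expression as a SQL predicate.
function writeFilterExpr(expr) {
  switch (expr.type) {
    case "between":
      return `${expr.field} BETWEEN ${expr.left} AND ${expr.right}`
    case "=":
      // Assumes `right` is a string value; single quotes are doubled.
      return `${expr.left} = '${String(expr.right).replace(/'/g, "''")}'`
    default:
      throw new Error(`Unsupported filter expression: ${expr.type}`)
  }
}

// Hypothetical: keep the crossfilter's filters for the matching signal,
// minus those whose ids appear in the resolveFilter's `ignore` list.
function resolveFilters(crossfilter, resolveFilter) {
  if (crossfilter.signal !== resolveFilter.filter.signal) return []
  const ignore = new Set(resolveFilter.ignore || [])
  return crossfilter.filter
    .filter(f => !ignore.has(f.id))
    .map(f => writeFilterExpr(f.expr))
}

const crossfilter = {
  type: "crossfilter",
  signal: "xfilter",
  filter: [
    { type: "filter", id: "amount-filter", expr: { type: "between", field: "amount", left: 50, right: 100 } },
    { type: "filter", id: "party-filter", expr: { type: "=", left: "party", right: "D" } }
  ]
}

const resolveFilter = { type: "resolveFilter", filter: { signal: "xfilter" }, ignore: ["party-filter"] }

console.log(resolveFilters(crossfilter, resolveFilter).join(" AND "))
// amount BETWEEN 50 AND 100
```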
Open to any other possible ideas.
We know that the module will be easier to read and write in ReasonML. I wonder whether there are performance benefits as well.
declare type NowExpression = {|
type: "now"
|}
declare type RelativeTimeExpression = {|
type: "relative",
interval: "minute" | "hour" | "day" | "week" | "month" | "quarter" | "year",
step: number
|}
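One way to use these types is to resolve them to concrete timestamps on the client before writing SQL. The semantics are an assumption, since the types above do not define them. Here `relative` means now plus `step` units of `interval`, where negative steps go back in time, computed in UTC. `resolveTime` is a hypothetical name, and `now` is a parameter so results are reproducible.

```javascript
// Fixed-length units in milliseconds; calendar units in months.
const MS = { minute: 60e3, hour: 3600e3, day: 86400e3, week: 7 * 86400e3 }
const MONTHS = { month: 1, quarter: 3, year: 12 }

// Hypothetical resolver for NowExpression / RelativeTimeExpression.
function resolveTime(expr, now = new Date()) {
  if (expr.type === "now") return new Date(now.getTime())
  if (expr.type !== "relative") throw new Error(`Unknown type: ${expr.type}`)
  if (expr.interval in MS) {
    return new Date(now.getTime() + expr.step * MS[expr.interval])
  }
  if (!(expr.interval in MONTHS)) throw new Error(`Unknown interval: ${expr.interval}`)
  const d = new Date(now.getTime())
  d.setUTCMonth(d.getUTCMonth() + expr.step * MONTHS[expr.interval])
  return d
}

const now = new Date(Date.UTC(2020, 0, 15, 12))
console.log(resolveTime({ type: "relative", interval: "day", step: -7 }, now).toISOString())
// 2020-01-08T12:00:00.000Z
```

Calendar units use `setUTCMonth` rather than a fixed duration. If the day of the month does not exist in the target month, the date rolls over into the next one.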
SELECT ticker_subticker_map.ticker as ticker,end_month_date,AVG(avg_amount) as aov,COUNT(DISTINCT(final_transactions.resolved_mem_id)) as num_buyers,COUNT(final_transactions.resolved_mem_id) as num_purchases
FROM final_transactions
JOIN cohort_members_true as coh
ON coh.resolved_mem_id = final_transactions.resolved_mem_id
JOIN ticker_subticker_map
ON ticker_subticker_map.subticker = final_transactions.ticker AND date_date >= COALESCE(ticker_subticker_map.acquisition_date, date_date)
JOIN (SELECT
MIN(start_week_date) AS start_week_date,
MAX(end_week_date) AS end_week_date,
end_month_date
FROM calendar_months
WHERE start_week_date >= '2014-01-01'
GROUP BY end_month_date) as tw
ON date_date BETWEEN tw.start_week_date AND tw.end_week_date
WHERE 1=1
AND final_transactions.date_date >= '2014-01-01' AND final_transactions.transaction_base_type = 'debit'
AND date_date > coh.birth_month
GROUP BY ticker_subticker_map.ticker, end_month_date
LIMIT 10;