heavyai / metis


39.0 27.0 9.0 29.62 MB

Tools for massively parallel and multi-variate data exploration

License: Other

JavaScript 98.03% Shell 1.97%

visualization charting sql crossfilter mapd

metis's Issues

Support Project Transform in Favor of "Formula" Transform

The Project Transform will basically be the Formula Transform.

It will support string expressions as well as object type expressions.

Expressions can also be specified as an array.

{
  type: "project",
  expr: Array<string | Expression> | string | Expression,
  as?: Array<string> | string
}

{
  type: "project",
  expr: {
     type: "date_trunc",
     unit: "month",
     field: "tweet_time",
     as: "key0"
  }
}
// SELECT date_trunc(month, tweet_time) as key0

{
  type: "project",
  expr: ["conv(lon)", "conv(lat)", "lang", "followers"],
  as: ["x", "y", "size", "color"]
}
// SELECT conv(lon) as x, conv(lat) as y, lang as size, followers as color
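
As a rough illustration of how the examples above could map to SQL, here is a minimal sketch. The helper names `writeExpression` and `writeProject` are hypothetical and not part of the library, and only the `date_trunc` object expression from this issue is handled.

```javascript
// Hypothetical helpers for illustration only; not the library's API.

// Turn one expression (a raw string or an object expression) into SQL.
function writeExpression(expr) {
  if (typeof expr === "string") {
    return expr
  }
  if (expr.type === "date_trunc") {
    return `date_trunc(${expr.unit}, ${expr.field})`
  }
  throw new Error(`Unsupported expression type: ${expr.type}`)
}

// Turn a project transform into a SELECT clause.
// `expr` and `as` may each be a single value or an array.
function writeProject(transform) {
  const exprs = [].concat(transform.expr)
  const aliases = transform.as === undefined ? [] : [].concat(transform.as)
  const columns = exprs.map((expr, i) => {
    // Prefer the transform-level alias, then fall back to the expression's own `as`.
    const alias = aliases[i] || (typeof expr === "object" ? expr.as : undefined)
    const sql = writeExpression(expr)
    return alias ? `${sql} as ${alias}` : sql
  })
  return `SELECT ${columns.join(", ")}`
}

console.log(
  writeProject({
    type: "project",
    expr: ["conv(lon)", "conv(lat)", "lang", "followers"],
    as: ["x", "y", "size", "color"]
  })
)
```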

data-layer: SQL parser should escape single quotes in string values

String values containing single quotes that are passed to mapd-data-layer, for example in a SQL CASE statement, do not currently have their single quotes escaped, so a string like 'Chicago O'Hare International' produces a malformed SQL query.

For example:

CASE 
WHEN origin_name IN 
('Chicago O'Hare International','William B Hartsfield-Atlanta Intl','Dallas-Fort Worth International','Los Angeles International','Phoenix Sky Harbor International') 
THEN origin_name 
ELSE 'undefined' 
END AS key1
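
One possible fix, sketched below, is to double each single quote before wrapping the value in quotes. This assumes the target database accepts the standard SQL doubled-quote escape; `escapeSingleQuotes` and `quoteString` are hypothetical names, not existing functions in mapd-data-layer.

```javascript
// Hypothetical helpers for illustration only.

// Double every single quote, the standard SQL way to escape it in a string literal.
function escapeSingleQuotes(value) {
  return value.replace(/'/g, "''")
}

// Wrap an escaped value in single quotes so it can be used as a SQL string literal.
function quoteString(value) {
  return `'${escapeSingleQuotes(value)}'`
}

const originNames = ["Chicago O'Hare International", "Los Angeles International"]
const inList = originNames.map(quoteString).join(",")

console.log(`CASE WHEN origin_name IN (${inList}) THEN origin_name ELSE 'undefined' END AS key1`)
```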

Document Transform and Expression JSON API

These are currently documented as Flow types. It would be helpful to document concrete examples of the transforms and how they translate into SQL.

Explore Use of Vega-Lite as Higher-Level API

Vega-Lite provides a higher level visualization grammar that ties together encodings and data transformations.

It would be useful to explore how that grammar maps directly to Vega encodings and the mapd-data-layer transformations.

One possible outcome of this exploration is a higher-level parser for Vega-Lite specifications. This parser would translate a Vega-Lite spec into a Vega spec and a data transform spec (to be used by the data layer).

Extensible and Modular Parser and Writer

The goal of this feature is to expose the SQL writer as a module that can be extended by the user.

The user would be able to declare a new type of transform or expression by registering a "definition" of it, along with a function that parses it.

const writer = createSQLWriter()
writer.registerParser(typeDef, parser)
writer.writeSQL(DataState)

This would be the same writer module used internally by the graph instance.

const graph = createGraph()
graph.getWriter().registerParser(transformDef, parser)
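
To make the proposal more concrete, here is a minimal sketch of what such a writer could look like. The shapes are assumptions: `typeDef` is taken to be an object with a `type` string, a parser is a function that receives an accumulated SQL description plus the transform and returns the updated description, and `DataState` is taken to be `{ source, transform }`. None of this is the library's actual implementation.

```javascript
// Sketch under the assumptions stated above; not the real writer.
function createSQLWriter() {
  const parsers = new Map()

  return {
    // Associate a parser function with the transform type named in typeDef.
    registerParser(typeDef, parser) {
      parsers.set(typeDef.type, parser)
    },

    // Fold every transform of the data state into one SQL description, then print it.
    writeSQL(state) {
      const initial = { select: [], from: state.source, where: [] }
      const sql = (state.transform || []).reduce((acc, transform) => {
        const parser = parsers.get(transform.type)
        if (!parser) {
          throw new Error(`No parser registered for type "${transform.type}"`)
        }
        return parser(acc, transform)
      }, initial)

      const select = sql.select.length ? sql.select.join(", ") : "*"
      const where = sql.where.length ? ` WHERE ${sql.where.join(" AND ")}` : ""
      return `SELECT ${select} FROM ${sql.from}${where}`
    }
  }
}

const writer = createSQLWriter()

// A user-registered parser for a string-expression filter transform.
writer.registerParser({ type: "filter" }, (sql, transform) =>
  Object.assign({}, sql, { where: sql.where.concat(`(${transform.expr})`) })
)

console.log(
  writer.writeSQL({
    source: "contributions",
    transform: [{ type: "filter", expr: "recipient_party = 'D'" }]
  })
)
```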

Support Subquery Wherever There Can be An Expression

good test case:

select count(*) from (
    select distinct user_b from twitter_edges where user_a in (
        select distinct user_b from twitter_edges where user_a in (
            select distinct user_b from twitter_edges where user_a in (
                select distinct user_b from twitter_edges where user_a = '40981798'
            )
        )
    )
)

Support All Numerical / String / Null Filter Transforms

Implement Relation Builder API as Node Method

The general idea is to add to the dataNode instances helper methods for constructing and setting transform objects.

For instance, this:

// extract and between would be expression creators
node.project("key1", extract("day", "contrib_date"))
node.filter(between("amount", [0, 100]))

would be equivalent to:

node.transform({
  type: "project",
  expr: {
    type: "extract",
    unit: "day",
    field: "contrib_date"
  },
  as: "key1"
})

node.transform({
  type: "filter",
  expr: {
    type: "between",
    field: "amount",
    left: 0,
    right: 100
  }
})
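
The expression creators and node helpers above could be sketched roughly as follows. `createNode` here is a stand-in that only records transforms; the real dataNode is assumed to do more, and the exact signatures are an assumption based on the example calls.

```javascript
// Hypothetical expression creators matching the calls in the example above.
function extract(unit, field) {
  return { type: "extract", unit, field }
}

function between(field, [left, right]) {
  return { type: "between", field, left, right }
}

// Stand-in for a data node: it only collects transform objects.
function createNode() {
  const transforms = []
  const node = {
    transform(t) {
      transforms.push(t)
      return node
    },
    // Helper methods build the transform object and delegate to transform().
    project(as, expr) {
      return node.transform({ type: "project", expr, as })
    },
    filter(expr) {
      return node.transform({ type: "filter", expr })
    },
    getTransforms() {
      return transforms.slice()
    }
  }
  return node
}

const node = createNode()
node.project("key1", extract("day", "contrib_date"))
node.filter(between("amount", [0, 100]))

console.log(JSON.stringify(node.getTransforms(), null, 2))
```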

Demo links lead to 404

Support Multiplicative Sampling Transform

{
  type: "sample",
  method: "multiplicative",
  size: number,
  limit: number
}

const ratio = Math.min(limit/size, 1.0)
const threshold = Math.floor(4294967296 * ratio)

`MOD(${table}.rowid * 265445761, 4294967296) < ${threshold}`
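
Put together, the snippets above could be wrapped in a single function. This is a sketch: `writeSampleFilter` is a hypothetical name, `size` is assumed to be the number of rows in the table, and `limit` the number of rows wanted in the sample. The multiplier is copied unchanged from the issue.

```javascript
const TWO_POW_32 = 4294967296

// Build the WHERE-clause fragment for a multiplicative sample transform.
function writeSampleFilter(table, { size, limit }) {
  // Fraction of rows to keep, capped at 1 when limit exceeds size.
  const ratio = Math.min(limit / size, 1.0)
  // Hashed row ids below this threshold are kept.
  const threshold = Math.floor(TWO_POW_32 * ratio)
  return `MOD(${table}.rowid * 265445761, ${TWO_POW_32}) < ${threshold}`
}

console.log(writeSampleFilter("flights", { size: 1000, limit: 250 }))
```

When `limit` is at least `size`, the threshold equals the modulus, so every row passes the filter.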

Settle on Data Node Constructor API

Current API

const graph = createGraph()

const root = graph.data({
  source: "flights",
  name: "root"
})

const child = graph.data({
  source: "root",
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchild = graph.data({
  source: "child",
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

Proposed API 1

const graph = createGraph()

const root = graph.createRoot({
  source: "flights",
  name: "root"
})

const child = root.createChild({
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchild = child.createChild({
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

Proposed API 2

const graph = createGraph()

const root = graph.createRoot({
  source: "flights",
  name: "root"
})

const childNode = createNode({
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchildNode = createNode({
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

const child = root.pushChild(childNode)
const grandchild = child.pushChild(grandchildNode)

The two API proposals seem compatible as well.

Graph State is currently represented as:

const state = {
  root: {
    source: "flights",
    name: "root"
  },
  child: {
    source: "root",
    name: "child",
    transform: [
      {
        type: "filter",
        expr: "recipient_party = 'D'"
      }
    ]
  },
  grandchild: {
    source: "child",
    name: "grandchild",
    transform: [
      {
        type: "aggregate",
        fields: ["*", "amount"],
        ops: ["average", "average"],
        groupby: "recipient_party"
      }
    ]
  }
}

In the latter two proposals it would be represented as:

const state = {
  root: {
    source: "flights",
    name: "root",
    children: [
      child
    ]
  }
}

const childState = {
  source: root,
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ],
  children: [
    grandchild
  ]
}

const grandchildState = {
  source: child,
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ],
  children: []
}
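
For comparison, the nested representation can be derived from the current flat one. The sketch below is only an illustration: `nestState` is a hypothetical helper, and it keeps `source` as a name string rather than replacing it with a node reference as the proposal does.

```javascript
// Hypothetical helper: add a `children` array to every node of a flat state.
function nestState(flat) {
  const nodes = {}
  Object.keys(flat).forEach(name => {
    nodes[name] = Object.assign({}, flat[name], { children: [] })
  })
  Object.keys(nodes).forEach(name => {
    // A node whose source names another node becomes that node's child.
    // A source that names a table (such as "flights") has no parent node.
    const parent = nodes[nodes[name].source]
    if (parent) {
      parent.children.push(nodes[name])
    }
  })
  return nodes
}

const nested = nestState({
  root: { source: "flights", name: "root" },
  child: { source: "root", name: "child" },
  grandchild: { source: "child", name: "grandchild" }
})

console.log(nested.root.children[0].children[0].name)
```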

Implement Relation Builder API as Functional Helpers

Release Version 1.0

Support Parsing of Expression Types

High-Level Helper: Lightweight Crossfilter Manager

This helper should provide a lightweight layer of abstraction to manage the "crossfiltering" behavior among nodes.

The aim is to help with adding / modifying / removing crossfilters.

An example is something like https://github.com/mapd/mapd-data-layer/blob/master/example/vega/src/crossfilter.js

Support `Case` and `Coalesce` Expressions

Improve Crossfilter/ResolveFilter API

Currently "crossfiltering" behavior is implemented through the transforms Crossfilter and ResolveFilter.

The Crossfilter transform represents a set of filter transformations that should be applied to child nodes. These filters are applied only when the child nodes explicitly allow them through the ResolveFilter transform.

For instance, a parent can have this Crossfilter transform:

const xfilterDataNode = graph.data({
  source: "flights_donotmodify",
  name: "xfilter",
  transform: [
    {
      type: "crossfilter",
      signal: "xfilter",
      filter: [
        {
           type: "filter",
           id: "amount-filter",
           expr: {
              type: "between",
              field: "amount",
              left: 50,
              right: 100
           }
        },
        {
           type: "filter",
           id: "party-filter",
           expr: {
              type: "=",
              left: "party",
              right: "D"
           }
        }
      ]
    }
  ]
});

And a child can resolve it like so (ignoring the party-filter):

const childDataNode = graph.data({
  source: "xfilter",
  name: "child",
  transform: [
    {
      type: "resolveFilter",
      filter: { signal: "xfilter" },
      ignore: ["party-filter"]
    }
  ]
});
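
The resolution step described above can be sketched as a small function that picks which of the parent's filters a child accepts. `resolveFilters` is a hypothetical name, and the matching rule (same signal, then drop ignored ids) is an assumption drawn from the two examples rather than the library's actual logic.

```javascript
// Hypothetical helper: return the crossfilter's filters that a resolveFilter transform accepts.
function resolveFilters(crossfilter, resolve) {
  // A child only receives filters from the crossfilter whose signal it names.
  if (resolve.filter.signal !== crossfilter.signal) {
    return []
  }
  const ignored = new Set([].concat(resolve.ignore || []))
  return crossfilter.filter.filter(f => !ignored.has(f.id))
}

const crossfilterTransform = {
  type: "crossfilter",
  signal: "xfilter",
  filter: [
    { type: "filter", id: "amount-filter", expr: { type: "between", field: "amount", left: 50, right: 100 } },
    { type: "filter", id: "party-filter", expr: { type: "=", left: "party", right: "D" } }
  ]
}

const resolveTransform = {
  type: "resolveFilter",
  filter: { signal: "xfilter" },
  ignore: ["party-filter"]
}

console.log(resolveFilters(crossfilterTransform, resolveTransform).map(f => f.id))
```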

Open to any other possible ideas.

declare type NowExpression = {|
  type: "now"
|}

declare type RelativeTimeExpression = {|
  type: "relative",
  interval: "minute" | "hour" |  "day" | "week" | "month" | "quarter" | "year",
  step: number
|}

Support Joins

SELECT ticker_subticker_map.ticker as ticker,end_month_date,AVG(avg_amount) as aov,COUNT(DISTINCT(final_transactions.resolved_mem_id)) as num_buyers,COUNT(final_transactions.resolved_mem_id) as num_purchases
FROM final_transactions
JOIN cohort_members_true as coh
ON coh.resolved_mem_id = final_transactions.resolved_mem_id
JOIN ticker_subticker_map
ON ticker_subticker_map.subticker = final_transactions.ticker AND date_date >= COALESCE(ticker_subticker_map.acquisition_date, date_date)
JOIN (SELECT
         MIN(start_week_date) AS start_week_date,
                                    MAX(end_week_date)   AS end_week_date,
                                    end_month_date
         FROM calendar_months
         WHERE start_week_date >= '2014-01-01'
         GROUP BY end_month_date) as tw
ON date_date BETWEEN tw.start_week_date AND tw.end_week_date
WHERE 1=1
  AND final_transactions.date_date >= '2014-01-01' AND final_transactions.transaction_base_type = 'debit'
  AND date_date > coh.birth_month
GROUP BY ticker_subticker_map.ticker, end_month_date
LIMIT 10;
