
compassql's Introduction

Vega: A Visualization Grammar

Vega Examples

Vega is a visualization grammar, a declarative format for creating, saving, and sharing interactive visualization designs. With Vega you can describe data visualizations in a JSON format, and generate interactive views using either HTML5 Canvas or SVG.

For documentation, tutorials, and examples, see the Vega website. For a description of changes between Vega 2 and later versions, please refer to the Vega Porting Guide.

Build Instructions

For a basic setup allowing you to build Vega and run examples:

  • Clone https://github.com/vega/vega.
  • Run yarn to install dependencies for all packages. If you don't have yarn installed, see https://yarnpkg.com/en/docs/install. We use Yarn workspaces to manage multiple packages within this monorepo.
  • Once installation is complete, run yarn test to run test cases, or run yarn build to build output files for all packages.
  • After running either yarn test or yarn build, run yarn serve to launch a local web server — your default browser will open and you can browse to the "test" folder to view test specifications.

This repository includes the Vega website and documentation in the docs folder. To launch the website locally, first run bundle install in the docs folder to install the necessary Jekyll libraries. Afterwards, use yarn docs to build the documentation and launch a local webserver. After launching, you can open http://127.0.0.1:4000/vega/ to see the website.

Internet Explorer Support

For backwards compatibility, Vega includes a babel-ified, IE-compatible version of the code in the packages/vega/build-es5 directory. Older browsers also require several polyfill libraries:

<script src="https://cdnjs.cloudflare.com/ajax/libs/babel-polyfill/7.4.4/polyfill.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/runtime.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/fetch.umd.min.js"></script>

Contributions, Development, and Support

Interested in contributing to Vega? Please see our contribution and development guidelines, subject to our code of conduct.

Looking for support, or interested in sharing examples and tips? Post to the Vega discussion forum or join the Vega slack organization! We also have examples available as Observable notebooks.

If you're curious about system performance, see some in-browser benchmarks. Read about future plans in our roadmap.

compassql's People

Contributors

akshatsh, dependabot-preview[bot], dependabot[bot], domoritz, donghaoren, espressoroaster, felixcodes, haldenl, jstcki, kanitw, leibatt, light-and-salt, mattwchun, oigewan, p42-ai[bot], peter-gy, rileychang, ssharif6, tafsiri, vlandham, yhoonkim


compassql's Issues

Refactor / Additional Test

  • Extract and test hasRequiredPropertyAsEnumSpec in satisfy of EncodingConstraintModel and SpecConstraintModel
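The extracted helper could look roughly like this — a minimal sketch, assuming an "enum spec" (wildcard) is either the `'?'` shorthand or an object with an `enum` array; the signature is illustrative, not CompassQL's actual API:

```typescript
// Sketch only: an "enum spec" (wildcard) is either the shorthand '?' or an
// object carrying an `enum` array of candidate values.
type EnumSpec<T> = { enum: T[] };

function isEnumSpec(x: unknown): boolean {
  return x === '?' || (typeof x === 'object' && x !== null && 'enum' in x);
}

// True if any required property on the query is still an (un-enumerated) enum spec.
function hasRequiredPropertyAsEnumSpec(
  query: Record<string, unknown>,
  requiredProps: string[]
): boolean {
  return requiredProps.some((p) => isEnumSpec(query[p]));
}
```

Once extracted, the same helper can be exercised directly in unit tests instead of only indirectly through `satisfy`.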

Replicating Compass

Gen

  • aggregate.test.ts
  • encodings.test.ts

Run npm run cover and see coverage report -- add more tests for uncovered constraints

Enumerate Scale Properties

Scale

Background

  • Look at the description and changes of #27 to see the infrastructure for adding a nested property (bin.maxbins) -- note that I might have missed something in the description; if so, you'll notice problems as you debug.

1st step: Scale.type

  • add scale.type (one PR)
    • understand what scale.type means from Vega-Lite docs
    • Add stuff like in #27
    • spec constraints (add to spec.ts)
      • omitBarAreaForLogScale -- don't use bar and area mark for log scale.
    • encoding constraints
      • dataTypeMatchesScaleType -- look at
      • omitBinForLogScale (originally vega/compass#151)
    • Add Example Query to examples/
    • add tests for enumerate
    • add tests for generate
    • add tests for all new constraints
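For reference, the omitBarAreaForLogScale check described above could be sketched as follows (simplified shapes and assumed names, not the actual constraint interface):

```typescript
// Sketch: a spec constraint is satisfied (returns true) unless a bar or area
// mark is combined with a log scale on some encoding.
interface EncQ { channel: string; scale?: { type?: string } }
interface SpecQ { mark: string; encodings: EncQ[] }

function omitBarAreaForLogScale(spec: SpecQ): boolean {
  if (spec.mark !== 'bar' && spec.mark !== 'area') return true;
  return !spec.encodings.some((e) => e.scale !== undefined && e.scale.type === 'log');
}
```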

Scale.*

Repeat the process for other scale properties (one PR for each)

  • add ones that are required by other tasks
    • type
      • clamp: Q, T
      • exponent: pow
      • round: Q, T
        • accept types of values depending on scale type
    • zero --> zero doesn't play well with [ ScaleType.ORDINAL, LOG, TIME, UTC]. I don't think I'm missing anything else...
    • bandSize
      • #93
      • bandSize must be at least 0
    • range
      • #101
      • range must contain two or more values.
    • domain
    • round
    • clamp
      • must have a continuous domain (quantitative and time types only)
    • nice
      • similar to clamp: quantitative and time types only.
    • exponent
    • useRawDomain

--- LATER ---

  • padding
    • works with channel.x, channel.y --> uses pixels
    • ??? padding (0, 1) for rangeBands ??? -- LATER
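The property/scale-type compatibility rules listed above could be encoded roughly like this (a sketch following the list; names, the scale-type list, and any omissions are my assumptions):

```typescript
// Continuous scale types (quantitative and time); discrete scales such as
// ordinal are excluded.
const CONTINUOUS_SCALE_TYPES = ['linear', 'log', 'pow', 'sqrt', 'time', 'utc'];

function scaleSupportsProperty(scaleType: string, prop: string): boolean {
  switch (prop) {
    case 'clamp':
    case 'nice':
    case 'round':
      // continuous domain only (quantitative and time types)
      return CONTINUOUS_SCALE_TYPES.indexOf(scaleType) >= 0;
    case 'exponent':
      return scaleType === 'pow';
    case 'zero':
      // zero doesn't play well with ordinal, log, time, and utc scales
      return ['ordinal', 'log', 'time', 'utc'].indexOf(scaleType) < 0;
    default:
      return true;
  }
}
```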

Improve Ranking

  • Channel, Cardinality
  • Penalize over encoding

Test

  • TxT
  • TxQ
  • QxT > Q

Data-driven occlusion test

Right now we just say aggregate has no occlusion, while raw has occlusion -- that's not always correct.

Distinguish high-cardinality strings from nominal fields

Fields with too high cardinality take up a lot of space and can be slow to render.

  • add a flag isKeyLike (or some better name) to schema

We might want to consider a few options:

  • distinguish between categories (low cardinality) and text (high cardinality), as they serve different purposes in data analysis anyway.
    • Check if the cardinality is above X% (50%?) of the overall data count and above minimum threshold (e.g., 40)

Maybe check whether "the cardinality is above ~80% of the overall data count" or use some similar criterion.

  • Add a constraint that excludes fields with too high cardinality from being added automatically.
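The proposed flag could be sketched like this — the 50% ratio and the minimum threshold of 40 come from the discussion above and are floated values, not settled ones:

```typescript
// Sketch of the proposed isKeyLike flag: a field is key-like when its
// cardinality exceeds both a minimum threshold and a fraction of the data count.
function isKeyLike(
  cardinality: number,
  dataCount: number,
  ratio: number = 0.5,
  minCardinality: number = 40
): boolean {
  return cardinality >= minCardinality && cardinality > ratio * dataCount;
}
```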

This spec generates duplicated output

{
  "mark": {
    "mode": "pick/enum",
    "values": [""]
  },
  "encodings": [
    {
      "channel": "x",
      "field": "Cylinders",
      "type": "quantitative"
    },{
      "autoCount": true
    }
  ]
}
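Independent of the root cause, one way to guard the answer set against duplicates is to key each generated spec by a canonical string and keep only the first occurrence (`dedupeByKey` is a hypothetical helper, not CompassQL's actual implementation):

```typescript
// Hypothetical helper: collapse duplicate answers by a canonical key,
// keeping the first occurrence of each.
function dedupeByKey<T>(items: T[], keyFn: (item: T) => string): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const item of items) {
    const k = keyFn(item);
    if (!seen.has(k)) {
      seen.add(k);
      out.push(item);
    }
  }
  return out;
}
```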

Cardinality Based Constraints

  • determine input format for cardinality in the schema
  • maxCardinalityForFacets
  • maxCardinalityForColor
  • maxCardinalityForShape
  • minCardinalityForBin
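A sketch of how the channel-specific caps might fit together — the channel names follow the constraints listed above, but the numeric limits are placeholders, not decided values:

```typescript
// Placeholder per-channel cardinality caps (illustrative values only).
const MAX_CARDINALITY: { [channel: string]: number } = {
  row: 20,    // maxCardinalityForFacets
  column: 20, // maxCardinalityForFacets
  color: 20,  // maxCardinalityForColor
  shape: 6    // maxCardinalityForShape
};

function satisfiesCardinalityConstraint(channel: string, cardinality: number): boolean {
  const max = MAX_CARDINALITY[channel];
  return max === undefined || cardinality <= max;
}
```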

Revise old compass constraints

Not sure if we should add the following

  • maxCardinalityForAutoAddOrdinal #70
  • alwaysAddHistogram
  • consistentAutoQ -- if aggregate for all Q are "*" -- give all of them same level of aggregation. (already have omitRawContinuousFieldForAggregatePlot)

Refactor Bin to Support Bin Parameter

Currently in EncodingQuery, it's

bin?: boolean | EnumSpec<boolean> | ShortEnumSpec;

However, bin can have parameters too, and I don't want to mix up boolean and object here.

So I'm thinking

bin?: BinQuery

with the following interface

interface BinQuery {
  enable: boolean | EnumSpec<boolean> | ShortEnumSpec;
  maxbins: number | EnumSpec<number> | ShortEnumSpec;
  ... // other params
}

Any thoughts? @domoritz
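One way this migration could stay backward compatible is to normalize the old boolean shorthand into the proposed object form (`normalizeBin` is a hypothetical helper I'm adding for illustration, not part of the proposal):

```typescript
type EnumSpec<T> = { enum: T[] };
type ShortEnumSpec = '?';

interface BinQuery {
  enable: boolean | EnumSpec<boolean> | ShortEnumSpec;
  maxbins?: number | EnumSpec<number> | ShortEnumSpec;
}

// Hypothetical helper: accept the old `bin` shorthand and lift it into a BinQuery.
function normalizeBin(
  bin: boolean | EnumSpec<boolean> | ShortEnumSpec | BinQuery
): BinQuery {
  if (typeof bin === 'boolean' || bin === '?') {
    return { enable: bin };
  }
  if ('enum' in bin) {
    return { enable: bin as EnumSpec<boolean> };
  }
  return bin;
}
```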

Add JSON schema

  • Generate JSON Schema for CompassQL schema

Look at this line in Vega-Lite
https://github.com/vega/vega-lite/blob/master/package.json#L35

Do the same for Query.

  • Add Tests to validate all examples

In Vega-Lite, we have a test that validates all example specs so that both their input and output validate against the JSON schema.

  • Validate the input CompassQL query (each example JSON file)
  • For each example query, run the query method in query.ts and check the output. For each SpecQueryModel in the output, convert it into a Vega-Lite spec (call .toSpec()) and validate the Vega-Lite output.

Make sure that the example test is excluded from test coverage.
(See Vega-Lite's package.json)

MVP for Enumerate

  • enumerate answers based on input CompassQL query
    • check if the constraint is enabled (in the option)
    • generate fields -- read from schema
  • support two types of constraints
    • encoding constraint (constraint for one encoding mappings)
    • spec constraint (constraint that involves multiple encoding mappings or involves relationship between mark and encoding)
  • determine order in a way that automatically adding count still works
    • noRepeatedField --> '*'
  • Remember which field we assign for later reference
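The noRepeatedField idea above could be sketched like this (simplified shapes — a concrete field may be assigned to at most one encoding, while the '*' field used by count is exempt):

```typescript
// Sketch: reject assignments that reuse a concrete field; '*' (count) may repeat.
function noRepeatedField(encodings: { field?: string }[]): boolean {
  const seen = new Set<string>();
  for (const enc of encodings) {
    const f = enc.field;
    if (f === undefined || f === '*') continue;
    if (seen.has(f)) return false;
    seen.add(f);
  }
  return true;
}
```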

Missing Constraints

  • channelsSupportRoles
  • omitShapeWithBin (channel supports role?)
  • omitShapeWithTimeDimension (channel supports role?)
  • omitBarWithSize
  • omitRawBar/Area

Constraint propertyPrecedence

  1. Prevent duplicate output if autoCount comes after channel in propertyPrecedence

Basically, whenever autoCount is false, we shouldn't even assign it to a channel.

We have to either add logic to prevent autoCount from coming after channel in the propertyPrecedence, or make answerSet in generate a real set to prevent duplication.

  2. Prevent nested property output from coming before its parent

Add missing core tests

enumerator.test.ts

For each of these properties:

  • aggregate
  • timeUnit
  • field
  • type

  1. Write a test that enumerates all valid values

hint: set config.verbose = true

  2. Write a test that enumerates both valid and invalid values (and test that the output contains only valid values)
  • aggregate
    • To see relevant constraints, look at constraints/{spec|encoding}.ts
      • look at properties of each constraint
      • look at a few ones that contain Property.AGGREGATE

(LATER)

  1. Write a test that enumerates all valid values
  • bin -- bin is the most complicated -- ping me and I'll explain it
  2. Write a test that enumerates both valid and invalid values (and test that the output contains only valid values)
  • To see relevant constraints, look at constraints/{spec|encoding}.ts
  • timeUnit
  • field
  • type

Other Files

Run npm run cover and see coverage report -- add more tests for uncovered constraints

Enumerate Stack

  • Stack
  • Stack constraint (don't enumerate non-summing aggregate for stack)
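The "non-summing aggregate" rule could be sketched as follows — the list of summative operations below is my assumption of what "summing" means here, not a settled choice:

```typescript
// Only aggregates whose values can be meaningfully added should be stacked.
const SUMMATIVE_AGGREGATES = ['count', 'sum'];

function aggregateIsStackable(aggregate: string): boolean {
  return SUMMATIVE_AGGREGATES.indexOf(aggregate) >= 0;
}
```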

Refactor

  • Consistent Variable Name
    • encodingQ => encQ
    • property => prop
  • EnumSpecIndex.timeunit => timeUnit

cc: @ZeningQu

Aggregate plot where facet is the only group-by should be rated worse

e.g.,

{
  "data": {
    "url": "data/cars.json"
  },
  "mark": "point",
  "encoding": {
    "row": {
      "field": "Cylinders",
      "type": "nominal"
    },
    "x": {
      "aggregate": "mean",
      "bin": false,
      "field": "Horsepower",
      "type": "quantitative"
    },
    "y": {
      "aggregate": "mean",
      "bin": false,
      "field": "Acceleration",
      "type": "quantitative"
    }
  }
}

Split generate.ts into two files

Right now the enumerator code is in generate.ts.
However, this makes generate.test.ts unduly long.

Therefore, we should extract enumerator.ts from generate.ts.

Syntax for nested grouping

Nested grouping is very important for understanding structure / debugging output results.
(I'm currently flooded by transposes of the visualizations.)

Therefore we need a good syntax for nested grouping.

Suppose I want hierarchical grouping that first groups by dataQueryKey, then by encodingKey.

  • For each subgroup (by encodingKey), I want to order the subgroup's items by rankingFn1.
  • For each group (by dataQueryKey), I want to order the group's items (which are subgroups based on encodingKey) by rankingFn2.
  • Finally, I want to order groups by rankingFn3.

For example, rankingFn1 = rankingFn2 = "effectiveness", and rankingFn3 can be some data enumeration order. The ranking function will rank groups by calculating the score for the top item in each list.
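That top-item ranking rule could be sketched like this (shapes are illustrative; how groups and scores are actually represented is still open):

```typescript
// Sketch: score each group by its first (top) item, then order groups by that
// score in descending order, leaving the input untouched.
function orderGroupsByTopItem<T>(
  groups: T[][],
  score: (item: T) => number
): T[][] {
  return groups.slice().sort((a, b) => score(b[0]) - score(a[0]));
}
```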

Suppose

spec = {
    "data": {"url": "data/cars.json"},
    "mark": "?",
    "encodings": [
      {
        "channel": "?",
        "field": "Cylinders",
        "type": "ordinal"
      },{
        "channel": "?",
        "bin": "?",
        "aggregate": "?",
        "field": "Horsepower",
        "type": "quantitative"
      }
    ]
  }

Here are a few alternative queries:

a) Nested version

{
  spec: spec,
  group/groupings: {
    // In this case, definitely start with the top-level grouping key.
    by: 'dataKey',
    // if we want one output for each group, we can replace this orderItemBy with chooseBy
    orderItemBy: 'rankingFn2',
    subgroup/subgroupings: {
      by: 'encodingKey',
      orderItemBy: 'rankingFn1'
    }
  },
  orderBy: 'rankingFn3'
}

b) Array-based

{
  spec: spec,
  // should the first one be the top-level one or the subgroup one? -- currently it's the subgroup one
  group/groupings: [{
     groupBy: 'encodingKey',
     // if we want one output for each group, we can replace this orderItemBy with chooseBy
     orderItemBy: 'rankingFn1'
  },{
     groupBy: 'dataKey',
     orderItemBy: 'rankingFn2'
  }],
  orderBy: 'rankingFn3'   // or orderGroupBy?
}

@jheer @domoritz any preference for a. or b. (or other options) / minor wordings?

I am not married to any of these yet. Other ideas are welcome.
I'm leaning toward the nested version because it seems clearer which one is the top-level grouping.

Add statistical profiling

  • 1D
  • 2D
  • Need to think what to add

Refactor constraints

Specs

  • hasAppropriateGraphicTypeForMark
  • omitRawBarLineArea
  • omitRawTable

Don't bin a Q-field or add autoCount if there is already a dimension in the spec

For example,

{
  "spec": {
    "data": {"url": "data/cars.json"},
    "mark": "?",
    "encodings": [
      {
        "channel": "?",
        "field": "Cylinders",
        "type": "nominal"
      },{
        "channel": "?",
        "field": "Origin",
        "type": "ordinal"
      },{
        "channel": "?",
        "bin": "?",
        "aggregate": "?",
        "field": "Acceleration",
        "type": "quantitative"
      }
    ]
  },
  "groupBy": "data",
  "config": {
    "autoAddCount": true
  }
}

produces this group: Cylinders,n|Origin,o|bin(Acceleration,q)|count(*,q), which contains a visualization like this one:


{
  "data": {
    "url": "data/cars.json"
  },
  "mark": "point",
  "encoding": {
    "y": {
      "field": "Cylinders",
      "type": "nominal"
    },
    "x": {
      "field": "Origin",
      "type": "ordinal"
    },
    "row": {
      "bin": true,
      "field": "Acceleration",
      "type": "quantitative"
    },
    "size": {
      "aggregate": "count",
      "field": "*",
      "type": "quantitative"
    }
  }
}
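The proposed rule hinges on detecting whether a spec already has a dimension. A minimal sketch (simplified shapes; here I assume a dimension is a nominal/ordinal field, or a binned field, or a field with a timeUnit):

```typescript
// Sketch: a spec "has a dimension" if some encoding is nominal/ordinal,
// binned, or carries a timeUnit. If so, skip enumerating bin on Q fields
// and skip autoCount.
interface EncodingLike {
  type?: string;
  bin?: unknown;
  timeUnit?: string;
}

function hasDimension(encodings: EncodingLike[]): boolean {
  return encodings.some(
    (e) => e.type === 'nominal' || e.type === 'ordinal' || !!e.bin || !!e.timeUnit
  );
}
```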

Deal with text table.

In the older Compass, we added a few hacks for recommending text tables.

With the new label and tile, we need to revise how we deal with this.
