altair-viz / altair

Declarative statistical visualization library for Python

Home Page: https://altair-viz.github.io/

License: BSD 3-Clause "New" or "Revised" License

Python 99.75% TeX 0.06% JavaScript 0.19%

altair's Introduction

Vega-Altair

Vega-Altair is a declarative statistical visualization library for Python. With Vega-Altair, you can spend more time understanding your data and its meaning. Vega-Altair's API is simple, friendly and consistent and built on top of the powerful Vega-Lite JSON specification. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.

Vega-Altair was originally developed by Jake Vanderplas and Brian Granger in close collaboration with the UW Interactive Data Lab. The Vega-Altair open source project is not affiliated with Altair Engineering, Inc.

Documentation

See Vega-Altair's Documentation Site as well as the Tutorial Notebooks. The notebooks can also be run directly in your browser.

Example

Here is an example using Vega-Altair to quickly visualize and display a dataset with the native Vega-Lite renderer in JupyterLab:

import altair as alt

# load a simple dataset as a pandas DataFrame
from vega_datasets import data
cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
)

One of the unique features of Vega-Altair, inherited from Vega-Lite, is a declarative grammar of not just visualization, but interaction. With a few modifications to the example above we can create a linked histogram that is filtered based on a selection of the scatter plot.

import altair as alt
from vega_datasets import data

source = data.cars()

brush = alt.selection_interval()

points = alt.Chart(source).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color=alt.condition(brush, 'Origin', alt.value('lightgray'))
).add_params(
    brush
)

bars = alt.Chart(source).mark_bar().encode(
    y='Origin',
    color='Origin',
    x='count(Origin)'
).transform_filter(
    brush
)

points & bars

Features

Carefully-designed, declarative Python API.
Auto-generated internal Python API that guarantees visualizations are type-checked and in full conformance with the Vega-Lite specification.
Display visualizations in JupyterLab, Jupyter Notebook, Visual Studio Code, on GitHub and nbviewer, and many more.
Export visualizations to various formats such as PNG/SVG images, stand-alone HTML pages and the Online Vega-Lite Editor.
Serialize visualizations as JSON files.

Installation

Vega-Altair can be installed with:

pip install altair

If you are using the conda package manager, the equivalent is:

conda install altair -c conda-forge

For full installation instructions, please see the documentation.

Getting Help

If you have a question that is not addressed in the documentation, you can post it on Stack Overflow using the altair tag. For bugs and feature requests, please open a GitHub issue.

Development

You can find the instructions on how to install the package for development in the documentation.

To run the tests and linters, use:

hatch test

For information on how to contribute your developments back to the Vega-Altair repository, see CONTRIBUTING.md.

Citing Vega-Altair

If you use Vega-Altair in academic work, please consider citing https://joss.theoj.org/papers/10.21105/joss.01057 as

@article{VanderPlas2018,
    doi = {10.21105/joss.01057},
    url = {https://doi.org/10.21105/joss.01057},
    year = {2018},
    publisher = {The Open Journal},
    volume = {3},
    number = {32},
    pages = {1057},
    author = {Jacob VanderPlas and Brian Granger and Jeffrey Heer and Dominik Moritz and Kanit Wongsuphasawat and Arvind Satyanarayan and Eitan Lees and Ilia Timofeev and Ben Welsh and Scott Sievert},
    title = {Altair: Interactive Statistical Visualizations for Python},
    journal = {Journal of Open Source Software}
}

Please additionally consider citing the Vega-Lite project, which Vega-Altair is based on: https://dl.acm.org/doi/10.1109/TVCG.2016.2599030

@article{Satyanarayan2017,
    author={Satyanarayan, Arvind and Moritz, Dominik and Wongsuphasawat, Kanit and Heer, Jeffrey},
    title={Vega-Lite: A Grammar of Interactive Graphics},
    journal={IEEE transactions on visualization and computer graphics},
    year={2017},
    volume={23},
    number={1},
    pages={341-350},
    publisher={IEEE}
}

altair's Issues

[API] Should encode() overwrite or update?

Consider this:

layer = Layer(data)
layer.encode(x='Horsepower')
layer.encode(y='Miles_per_Gallon')

Currently, this result is identical to

Layer(data).encode(y='Miles_per_Gallon')

This is because every call to encode overwrites the current encoding.

I think it might be less confusing to change encode so that it instead updates the encoding, such that the above would be equivalent to

layer = Layer(data).encode(
   x='Horsepower',
   y='Miles_per_Gallon'
)

Thoughts?
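
A minimal sketch of the proposed "update" semantics, assuming the encoding can be modeled as a plain dict. `SketchLayer` is a hypothetical stand-in, not Altair's actual `Layer` class; it only illustrates that two successive `encode()` calls would accumulate channels.

```python
class SketchLayer:
    """Hypothetical stand-in for Layer; it only models the encoding dict."""

    def __init__(self, data=None):
        self.data = data
        self.encoding = {}

    def encode(self, **channels):
        # Update rather than overwrite: channels set by earlier calls are
        # kept, and a repeated channel name replaces only its own entry.
        self.encoding.update(channels)
        return self


layer = SketchLayer()
layer.encode(x='Horsepower')
layer.encode(y='Miles_per_Gallon')
print(layer.encoding)
```

With this behavior a user who wants a clean slate would need an explicit way to reset the encoding, which is a trade-off to weigh against the current overwrite behavior.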

Finish implementing rest of JSON spec

Make the defaults of our custom Enum types None

All of our default values should be None when we don't define anything. Right now, the default values for custom Enum types are empty strings rather than None. This is forcing us to add extra logic.

Auto-generate top-level objects?

We might think about auto-generating the top-level objects. The main benefit would be the ability to explicitly define keywords in the methods for better tab-completion. Also, if we do it well it will be less work to keep things up-to-date as the Vega-Lite schema evolves.

On the other hand, maintaining templates is probably harder than maintaining code in the long-run.

config_*() methods overwrite previous attributes

Here is the current behavior:

from altair import Chart, load_dataset
cars = load_dataset('cars')

chart = Chart(cars)
chart.configure_axis(axisWidth=500)
chart.configure_axis(axisColor='red')
chart.to_dict()['config']
# {'axis': {'axisColor': 'red'}}

Note that axisWidth is overwritten by the second method call. I'm not sure whether it makes more sense to have it this way, or to have it so that additional method calls add properties to those which were previously defined.
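
For comparison, here is a sketch of the alternative in which later calls add properties to the same config section. `merge_config` is a hypothetical helper working on plain dicts, not part of Altair; whether this merging is actually preferable is the open question above.

```python
def merge_config(config, section, **properties):
    """Return a new config dict with `properties` merged into `section`.

    Hypothetical helper for illustration; the input config is not mutated.
    """
    merged = {name: dict(values) for name, values in config.items()}
    merged.setdefault(section, {}).update(properties)
    return merged


config = {}
config = merge_config(config, 'axis', axisWidth=500)
config = merge_config(config, 'axis', axisColor='red')
print(config)
```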

rc1 comment.

As per @ellisonbg, I'm testing RC1 and commenting. Apologies for the shortness; I'll assume you want efficiency and a quick review for SciPy.

It looks really great! Here is what goes through my mind as I explore. Sorry if it's all over the place.

Altair requires the following dependencies:

numpy
pandas
py.test

If this is true, it should be in setup.py install_requires; otherwise, don't say it.
You also require traitlets in setup.py, which is an indirect dependency; I suggest removing it.

I suggest making a

extras_require = dict(
    test=[...],
    notebook=[...],
    jupyter=[...],
)

if you want to.

Does it need jupyter nbextension enable vega --py --sys-prefix, as said during the installation step? (A word on that might be good.)

Installation went smoothly; it works.

I was confused by the docs being in notebooks; I was searching for docs/notebook.
I'm confused by the dotted line around the graph, and by the fact that save freezes the graph as a PNG (but I might have bad ipywidgets).
The colors and style look great.
The PNGs look blurry to me.
I registered altair.readtehdocs.io and made Brian and Jake maintainers/owners.
.encode(...) has a strong bytes/unicode meaning to me.
Camel-case kwargs feel weird: timeUnit, labelAngle.
Examples implicitly display things because they are the last statements in their cells; I guess people might get confused.
I still like the color.
Rename the notebooks with leading numbers; Introduction / Index are not listed first.

Config naming confusion

Now that all of our classes are inheriting from traitlets Configurable, they have a config property. What we didn't notice is that Viz objects in the VL spec also have a config property. These two notions of config are completely separate but have the same property name, which causes a conflict. To get around this I have called the VL config property Viz/vlconfig, but made sure that it still gets serialized to config in to_dict. I am not too fond of this, but don't have a better idea at this point. Any ideas?

@jakevdp @wrobstory @tacaswell

Regression recipe

Here's a gist with a quick example of doing group-aware regression plots. We might think about whether to create an API entry-point for this type of compound plot: https://gist.github.com/jakevdp/31fe746a747dab9e424b4dc1a682bfe4

Version 1.0 Release Roadmap

We're getting close to release I think. Here's what I have in mind:

Syntax Questions / Comments

I skimmed through the notebooks in the notebooks folder.

I wonder why Scatterplot.ipynb and SimpleBarChart.ipynb have x=, y= in the encode() function. (x=X('Horsepower')) seems somewhat redundant compared to BasicExample.ipynb, which does not require the x= prefix. That said, x= might be useful in the sense that people shouldn't assign multiple fieldDefs to the same channel, except for detail.

Layer(data).encode(
    x=X('Horsepower'),
    y=Y('Miles_per_Gallon')
).point()

Note that using Layer as the base object might introduce future conflict with our layer composition operator.

Serialization of lists is broken

Right now, some attributes of the different objects can be lists, and those don't serialize. We probably want to create a serialize function that can be called recursively to better deal with this situation.
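
A rough sketch of what such a recursive function might look like, assuming that serializable objects expose a `to_dict()` method. The `SketchFormula` class is a hypothetical stand-in used only to exercise the function.

```python
def serialize(value):
    """Recursively convert nested objects, dicts, and lists to plain types."""
    if hasattr(value, 'to_dict'):
        # Objects describe themselves; their output may still contain
        # nested objects or lists, so recurse on the result.
        return serialize(value.to_dict())
    if isinstance(value, dict):
        return {key: serialize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize(item) for item in value]
    return value


class SketchFormula:
    """Hypothetical object with a to_dict() method."""

    def __init__(self, field, expr):
        self.field = field
        self.expr = expr

    def to_dict(self):
        return {'field': self.field, 'expr': self.expr}


spec = {
    'filter': 'datum.year==2000',
    'calculate': [SketchFormula('gender', 'datum.sex')],
}
result = serialize(spec)
print(result)
```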

Redo Unit Tests

With the rewrite for the 1.0 spec, the unit tests are failing.

@ellisonbg – can I work on this while you're scraping the example plots this morning?

population dataset not encodable as JSON

Something is going on here... I'm having trouble figuring out what the problem is:

>>> from altair import *
>>> population = load_dataset('population')
>>> Layer(population)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    907             method = _safe_get_formatter_method(obj, self.print_method)
    908             if method is not None:
--> 909                 method()
    910                 return True
    911 

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/vega-0.3.0-py3.5.egg/vega/base.py in _ipython_display_(self)
     47         )
     48         publish_display_data(
---> 49             {'application/javascript': self._generate_js(id)},
     50             metadata={'jupyter-vega': '#{0}'.format(id)}
     51         )

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/vega-0.3.0-py3.5.egg/vega/base.py in _generate_js(self, id)
     34         payload = template.format(
     35             selector=selector,
---> 36             spec=json.dumps(self.spec),
     37             type=self.render_type
     38         )

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    228         cls is None and indent is None and separators is None and
    229         default is None and not sort_keys and not kw):
--> 230         return _default_encoder.encode(obj)
    231     if cls is None:
    232         cls = JSONEncoder

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/json/encoder.py in default(self, o)
    178 
    179         """
--> 180         raise TypeError(repr(o) + " is not JSON serializable")
    181 
    182     def encode(self, o):

TypeError: 1850 is not JSON serializable
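
The traceback shows `json.dumps` rejecting a value that prints as `1850`. One plausible cause, which this issue does not confirm, is that the value is a NumPy integer scalar rather than a built-in `int`. The sketch below shows a `default` hook for `json.dumps` that unwraps such scalars through their `item()` method. `FakeInt64` is a hypothetical stand-in so that the example runs without NumPy installed.

```python
import json


class FakeInt64:
    """Hypothetical stand-in for a NumPy integer scalar."""

    def __init__(self, value):
        self._value = value

    def item(self):
        # NumPy scalars expose item() to return the equivalent Python value.
        return self._value


def to_builtin(obj):
    """json.dumps `default` hook: unwrap scalar-like objects via item()."""
    if hasattr(obj, 'item'):
        return obj.item()
    raise TypeError(repr(obj) + " is not JSON serializable")


encoded = json.dumps({'year': FakeInt64(1850)}, default=to_builtin)
print(encoded)
```

The hook is only called for objects the encoder cannot handle on its own, so built-in types are unaffected.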

Replace lightning renderer

@domoritz of the vega team is looking at implementing a simple Python API for rendering Python dicts as vega-lite JSON in the Jupyter notebook in this repo: https://github.com/vega/ipython-vega

Once that is working with the vega-lite 1.0 spec, we should move to using that for vega-lite rendering instead of lightning.

Integration with topological data analysis library?

Here is a python TDA library with a JS viz GUI: https://github.com/rosinality/knotter

Do you think this would be in scope for exploratory data analysis in Altair?

Start to use data in `spec.py` for various enums in `api.py`

We currently ship the vega-lite spec as a Python dict in spec.py. There are some data fields, such as aggregation enums that we should just pull from spec.py to ease following the spec over time.

The width and height of config are not being respected by vega-lite

API question: should we implement encode_x(), etc.?

Currently we have two ways of specifying configurations, which are equivalent:
Chart().configure(cell=CellConfig(**kwargs)) and
Chart().configure_cell(**kwargs).

On the other hand, for encodings we have
Chart().encode(x=X('name', **kwds)) or Chart().encode(x='name'), where in the second shorthand there is no way to specify additional keywords.

I wonder if it wouldn't be convenient to have, by analogy with the configure methods, Chart().encode_x('name', **kwds)?

On the one hand, there are situations where it would be very useful, especially for tab-completion of arguments within the function. It also creates some symmetry with the configure_* methods, and users might expect to be able to do this. On the other hand, it would add yet another way of solving the same problem, and is not particularly convenient for the most common case of defining several encodings at once.

Thoughts?

name/type parsing breaks column names with whitespace

I think the best fix is to not strip everything.
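
The issue does not show the parsing code, so the following is only a sketch of one way to split a `name:type` shorthand while leaving whitespace inside the column name untouched. `split_shorthand` is a hypothetical helper, and the set of single-letter type codes (Q, O, N, T) is an assumption; aggregate shorthands such as `sum(people)` are not handled here.

```python
import re

# A trailing ":X" suffix, where X is one of the assumed one-letter type codes.
_TYPE_SUFFIX = re.compile(r':([QONT])$')


def split_shorthand(shorthand):
    """Split 'name:T' into (name, type code), preserving spaces in the name.

    Hypothetical helper; the type is None when no suffix is present.
    """
    match = _TYPE_SUFFIX.search(shorthand)
    if match is None:
        return shorthand, None
    return shorthand[:match.start()], match.group(1)


print(split_shorthand('Miles per Gallon:Q'))
print(split_shorthand('Miles per Gallon'))
```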

How to structure narrative docs for altair

Vega-Lite itself now has very nice narrative docs. I am wondering how we want to handle the narrative docs for altair. Some options:

Just add the equivalent altair/python code to the main Vega-Lite documentation and have that be the main single source of narrative documentation.
Create notebooks for the examples that are in the Vega-Lite documentation to help with that.
Keep it all completely separate.
Offer tutorials on specific dataset in altair, similar to how seaborn has the nice one on the titanic dataset.

@domoritz

Implement data transformation module

For Python-based renderers, the main challenge is starting from the initial data frame and then transforming the data into something that can be plotted more directly onto subplots in the various plotting libraries. The data transformation logic will be the same for any Python-based renderer (Matplotlib, Bokeh, Plotly, bqplot), so it should probably live in Altair. Plus, the mpl.py module has a start on this.

From our talking to Jeff Heer, the data transformation logic is as follows:

Binning. Any column that should be binned creates a new column of binned data and the data is then grouped by that binned column.
Grouping. Next the binned columns and all columns associated with shelves (row, col, color, size, etc.) are grouped.
Aggregation. Any aggregations are then applied.

Here is an issue on the vega-lite repo where we have asked about how this logic happens:

vega/vega-lite#584
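
The three steps above (binning, grouping, aggregation) can be sketched in plain Python. The rows, the bin width, and the choice of the mean as the aggregate are made up for illustration; a real implementation would more likely operate on a pandas DataFrame.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative rows, not taken from any real dataset.
rows = [
    {'Horsepower': 95, 'Origin': 'USA', 'Miles_per_Gallon': 24.0},
    {'Horsepower': 90, 'Origin': 'USA', 'Miles_per_Gallon': 26.0},
    {'Horsepower': 98, 'Origin': 'Japan', 'Miles_per_Gallon': 30.0},
    {'Horsepower': 150, 'Origin': 'USA', 'Miles_per_Gallon': 15.0},
]
BIN_WIDTH = 50  # assumed bin width

# 1. Binning: derive a new column holding the lower edge of each row's bin.
for row in rows:
    row['Horsepower_bin'] = (row['Horsepower'] // BIN_WIDTH) * BIN_WIDTH

# 2. Grouping: group by the binned column plus the column on the color shelf.
groups = defaultdict(list)
for row in rows:
    groups[(row['Horsepower_bin'], row['Origin'])].append(row)

# 3. Aggregation: reduce each group to a single value.
aggregated = {
    key: mean(member['Miles_per_Gallon'] for member in members)
    for key, members in groups.items()
}
print(aggregated)
```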

Fix Datetime Serialization

In ellisonbg@976bd5f I added a quick fix for the most common case: datetime64[ns]. We should add suitable string formatting for other units as well.

Update to the Vega-Lite 1.0 spec

@jakevdp

Vega-Lite 1.0 is out now. We should update the Python APIs to that spec.

Allow columns to be specified by passing a pandas Series

From @Carreau in #140

Chart(cars).mark_circle().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    size='Acceleration'
)

Would it be possible at some future point to use:

Chart(cars).mark_circle().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    size=cars.Acceleration
)

and have it be smart enough to figure out the name of the column?

Avoid usage of `_` and `Out[]` variables in docs

This is confusing for users who don't know these tricks in IPython.

Altair throws error for calculated attributes

e.g.

from altair import *
population = load_dataset('population')
for col in population:
    population[col] = population[col].astype(float)

transform = Transform(filter='datum.year==2000',
                      calculate=[Formula(field='gender',
                                         expr='datum.sex == 2 ? "Female" : "Male"')])

Layer(population, transform=Transform(filter="datum.year==2000")).encode(
    x='age:O',
    y='sum(people)',
    color=Color('gender')
).bar()


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    907             method = _safe_get_formatter_method(obj, self.print_method)
    908             if method is not None:
--> 909                 method()
    910                 return True
    911 

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/altair-0.0.1-py3.5.egg/altair/api.py in _ipython_display_(self)
    424         from IPython.display import display
    425         from vega import VegaLite
--> 426         display(VegaLite(self.to_dict()))
    427 
    428     def display(self):

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/altair-0.0.1-py3.5.egg/altair/api.py in to_dict(self, data)
    362 
    363     def to_dict(self, data=True):
--> 364         D = super(Layer, self).to_dict()
    365         if data:
    366             if isinstance(self.data, Data):

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/altair-0.0.1-py3.5.egg/altair/schema/baseobject.py in to_dict(self)
     25                 if v is not None:
     26                     if isinstance(v, BaseObject):
---> 27                         result[k] = v.to_dict()
     28                     else:
     29                         result[k] = v

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/altair-0.0.1-py3.5.egg/altair/schema/baseobject.py in to_dict(self)
     25                 if v is not None:
     26                     if isinstance(v, BaseObject):
---> 27                         result[k] = v.to_dict()
     28                     else:
     29                         result[k] = v

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/altair-0.0.1-py3.5.egg/altair/api.py in to_dict(self)
     65             return None
     66         if not self.type:
---> 67             raise ValueError("No vegalite data type defined for {0}".format(self.field))
     68         return super(_ChannelMixin, self).to_dict()
     69 

ValueError: No vegalite data type defined for gender

Not possible to tab complete `mark_*()` methods in chains

When we use the syntax:

Chart(data).mark_foo()

It isn't possible to tab complete the mark_ unless you are using the non-default Greedy tab completion in IPython. Maybe we should just create the chart as c = Chart(data) in our tutorial examples?

From @Carreau in #140

[WIP] Create notebooks for all of the vega-lite examples

We should create notebooks for all of the vega-lite examples that show how to use Altair to create the examples.

I am working on this issue this morning.

Cannot import the lightning renderer

I got the latest altair and installed it following the instructions. I'm running the server like so:

The version of the notebook server is 3.2.0-8b0eef4 and is running on:
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42) 
[GCC 4.2.1 (Apple Inc. build 5577)]

When I try to run the BasicExample notebook with alt.use_renderer('lightning'), I get the following exception:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-6d2e0a8f478e> in <module>()
----> 1 alt.use_renderer('lightning')

/Users/alexgg/anaconda/lib/python2.7/site-packages/altair-0.0.1-py2.7.egg/altair/api.pyc in use_renderer(r)
    437     else:
    438         if r in _renderers:
--> 439             _renderer = _renderers[r]()
    440         else:
    441             raise ValueError('renderer could not be found: {0}').format(r)

/Users/alexgg/anaconda/lib/python2.7/site-packages/altair-0.0.1-py2.7.egg/altair/api.pyc in _get_lightning_renderer()
    413 
    414 def _get_lightning_renderer():
--> 415     from .lgn import LightningRenderer
    416     return LightningRenderer()
    417 

build/bdist.macosx-10.5-x86_64/egg/altair/lgn.py in <module>()

build/bdist.macosx-10.5-x86_64/egg/altair/lightning.py in <module>()

NameError: name 'Lightning' is not defined

Any ideas?
Thanks!

Treat `Q` columns as `O` when groupby'd

Right now in vega-lite, all columns that are not aggregated are groupby'd. This includes Q columns, which leads to rather unexpected visualizations where a quantity column is grouped by value. I brought this up on the vega-lite repo and the consensus is to treat such Q columns as O in this case and warn. Here is the discussion:

vega/vega-lite#688

Lightning renderer broken inside `ipywidgets.Output`

When trying to get the lightning renderer to work with the new IPython widget, it breaks in many ways. In particular, it won't render at all inside ipywidgets.Output. This may be related to:

How lightning uses IFrames
The lack of a unique id on the div (should have a uuid).
Other?

Wrapping JS libraries (Like R HTML widgets)

JS has lots of great specialized libraries. HTML Widgets wraps them in R.

Maybe Altair can have similar facilities. Is this in scope?

Auto-generate datasets

I noticed that there are some new datasets available for vega-lite. We should try to auto-generate them from the vega-datasets repo. I would probably generate a datasets.json file, put it in the package, and read it into the _datasets variable in the current datasets.py.

A couple other dataset-related thoughts:

maybe use lru_cache to cache datasets in memory so they're not downloaded twice within a session
maybe auto-generate dataset functions like load_cars() so that they can be tab-completed, with description in the doc-string

Simplify Data URLs

To create a visualization currently from a URL we use:

Chart(Data(url='http://some.url/some/path.json', format='json')).mark_point()

I think we could pretty easily shorten this to

Chart('http://some.url/some/path.json').mark_point()

without introducing any ambiguity.
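
One way this shorthand could be implemented is to coerce a string argument into a URL-based data description and pass everything else through. `coerce_data` is a hypothetical helper that returns a plain dict mirroring `Data(url=..., format=...)`; inferring the format from the file extension and defaulting to `'json'` are assumptions, not behavior described in this issue.

```python
import os
from urllib.parse import urlparse


def coerce_data(data):
    """Turn a URL string into a Data-like dict; pass anything else through.

    Hypothetical helper; the returned dict mirrors Data(url=..., format=...).
    """
    if isinstance(data, str):
        # Use only the path component so query strings don't confuse the guess.
        path = urlparse(data).path
        extension = os.path.splitext(path)[1].lstrip('.')
        return {'url': data, 'format': extension or 'json'}
    return data


print(coerce_data('http://some.url/some/path.json'))
```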

[Discussion] Object API

As I'm coding up the tutorial & examples, it strikes me that there's a confusing inconsistency in the API.

Consider this:

from altair import *
data = load_dataset('cars')
Layer(data).encode(
    X('Horsepower', bin=Bin(maxbins=10)),
    y='count(*):Q'
).bar()

The encode() interface has some nice properties: that is, you can pass attributes either as an unnamed argument (e.g. X(...)) or as a named argument (e.g. y=...). The benefit here is that it makes the API intuitive and reduces duplication of information (vs. e.g. x=X(...)).

When building these plots, I found it confusing that nested arguments don't allow the same flexibility. That is, I'd like to be able to write X('Horsepower', Bin(maxbins=10)) or perhaps also X('Horsepower', bin={'maxbins':10}). Could we override the __init__() method of traitlets to somehow do this sort of inference on input arguments?

cc/@ellisonbg

(Unrelated: the above snippet is how we spell "histogram" in Altair; we might think about a convenience method to wrap this).

Add more tests

Traitlets silently ignore typos

>>> from altair.schema import Config
>>> Config(backgruond='blue').to_dict()
{}
>>> Config(background='blue').to_dict()
{'background': 'blue'}

Is there an easy way to tell traitlets to validate init arguments?
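
Independent of what traitlets itself offers, the desired behavior can be sketched with a plain class that checks keyword arguments against its known field names. `StrictConfig` and its `_fields` tuple are hypothetical and stand in for a traitlets-based class; the close-match hint uses `difflib` from the standard library.

```python
import difflib


class StrictConfig:
    """Hypothetical stand-in for a traitlets-based Config class."""

    _fields = ('background', 'cell')

    def __init__(self, **kwargs):
        for name in kwargs:
            if name not in self._fields:
                # Suggest the closest known field name, if there is one.
                hint = difflib.get_close_matches(name, self._fields, n=1)
                suggestion = " Did you mean '{0}'?".format(hint[0]) if hint else ""
                raise TypeError(
                    "Unknown argument '{0}'.{1}".format(name, suggestion))
        self._values = dict(kwargs)

    def to_dict(self):
        return dict(self._values)


try:
    StrictConfig(backgruond='blue')
except TypeError as err:
    message = str(err)
    print(message)
```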

NaN fields produce errors with the lightning renderer - may affect other renderers too?

Using Altair to render the following pandas df:

records_text = '{"clientid":"8","querytime":"18:54:20","market":"en-US","deviceplatform":"Android","devicemake":"Samsung","devicemodel":"SCH-i500","state":"California","country":"United States","querydwelltime":13.9204007,"sessionid":0,"sessionpagevieworder":0}\n{"clientid":"23","querytime":"19:19:44","market":"en-US","deviceplatform":"Android","devicemake":"HTC","devicemodel":"Incredible","state":"Pennsylvania","country":"United States","sessionid":0,"sessionpagevieworder":0}'
json_array = "[{}]".format(",".join(records_text.split("\n")))
import json
d = json.loads(json_array)
result = pd.DataFrame(d)
result

the NaN for querydwelltime produces the following error:

Javascript error adding output!
TypeError: Cannot read property 'prop' of undefined
See your browser Javascript console for more details.

The vegalite spec produced by Altair is:

{'config': {'width': 600, 'gridOpacity': 0.08, 'gridColor': u'black', 'height': 400}, 'marktype': 'point', 'data': {'formatType': 'json', 'values': [{u'deviceplatform': u'Android', u'devicemodel': u'SCH-i500', u'country': u'United States', u'sessionpagevieworder': 0, u'state': u'California', u'clientid': u'8', u'sessionid': 0, u'querytime': u'18:54:20', u'devicemake': u'Samsung', u'market': u'en-US', u'querydwelltime': 13.9204007}, {u'deviceplatform': u'Android', u'devicemodel': u'Incredible', u'country': u'United States', u'sessionpagevieworder': 0, u'state': u'Pennsylvania', u'clientid': u'23', u'sessionid': 0, u'querytime': u'19:19:44', u'devicemake': u'HTC', u'market': u'en-US', u'querydwelltime': nan}]}}

For commentary and possible fixes, this issue is tracked by lightning renderer in:
lightning-viz/lightning-python#34

This issue is also documented in sparkmagic at: jupyter-incubator/sparkmagic#39

cc @mathisonian
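
Whatever the renderer does with it, a NaN in the data is a problem at the JSON level: Python's `json.dumps` writes it as the bare token `NaN` by default, which is not valid JSON. The sketch below replaces NaN with `None` before serializing so that it becomes `null`. `nan_to_none` is a hypothetical helper, and whether `null` is the right representation for a given renderer is an assumption.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None so it serializes as null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nan_to_none(item) for item in value]
    return value


record = {'clientid': '23', 'querydwelltime': float('nan')}
raw = json.dumps(record)                  # contains the bare token NaN
clean = json.dumps(nan_to_none(record))   # contains null instead
print(raw)
print(clean)
```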

API: should ``Layer()`` not derive from ``BaseObject``?

Since Layer is the main interface, it would be nice if tab completion on the object only listed relevant pieces of the API so that you can quickly find what plot types are available (e.g. point(), bar(), text(), etc.)

Currently, since it derives from BaseObject the namespace is polluted with all sorts of traitlet stuff that the user probably doesn't care about.

I'd propose something like this:

class LayerObject(BaseObject):
    # traitlet-related stuff goes here
    def __init__(self, *args, **kwargs):
        super(LayerObject, self).__init__(**kwargs)

    # etc.

class Layer(object):
    # non-traitlet-related Layer methods here
    def __init__(self, *args, **kwargs):
        if len(args)==1:
            self.data = args[0]
        self._layerobject = LayerObject(**kwargs)

    def point(self):
        self.mark = 'point'
        return self

    # etc.

The only problem would be if we ever want to pass Layer to some other class this would complicate things. What do you think?

playing BasicExample raises pandas 0.17 warning

Not sure if it's altair or seaborn, so I'm posting here.

Are filter and config in spec methods for Viz just like encode is?

I'm having trouble understanding exactly how those should be implemented. I'll take another look tomorrow when I have a clearer head =)

Create conda package on conda forge

Here is an example...

https://github.com/conda-forge/ipyleaflet-feedstock

Add html render?

I think I've hacked together a combo of vega 1.x + vega-lite that will embed rendered html based on the specs we generate 🙏 . Worth adding here alongside the render, maybe as a render_html? Probably just for intermediate testing purposes, while we wait for vega-lite 2.x support and/or incorporating it into other interactive libraries (e.g. Lightning).

Warning on `df==None`

When the data attribute of a Viz class changes, traitlets does a df==None test to see if the value has changed. This raises a warning.

Test type inference

When the Layer.data attribute is set, we trigger logic to infer the vegalite type of the channels based on what field (column) they are pointing to. Likewise, if data is set to None, the type inference should be reset.

We need to write tests to make sure all of this is working.

Implement full validation

The vega-lite spec has constraints about column types, shelves and aggregation. We should implement those in traitlets.

Update README

We haven't updated the README for the new APIs and renderer approach. It would also be great to have a simple example on the main README (the PNG and python code).

zero-valued traitlet attributes get ignored

>>> CellConfig(strokeWidth=1).to_dict()
{'strokeWidth': 1.0}
>>> CellConfig(strokeWidth=0).to_dict()
{}

This should be {'strokeWidth': 0.0}. From a quick glance at the code, I'm not sure where this is being lost...
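
The issue does not identify where the value is lost. One common cause of this symptom, offered here only as a guess, is filtering attributes with a truthiness test instead of an explicit `is not None` check. Both function names below are hypothetical, and the attribute dict is a simplified stand-in for an object's traits.

```python
def to_dict_truthy(attrs):
    # Problematic pattern: a truthiness test also drops 0, 0.0, '' and False.
    return {key: value for key, value in attrs.items() if value}


def to_dict_not_none(attrs):
    # Only skip attributes that were never set.
    return {key: value for key, value in attrs.items() if value is not None}


attrs = {'strokeWidth': 0.0, 'width': None}
print(to_dict_truthy(attrs))
print(to_dict_not_none(attrs))
```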

Spec for boxplots

Starting to think about how to express a box plot with this type of spec. For this discussion, let's say we have two columns: amount:Q and state:N.

Option 1 (boxplot implies computing the summary stats):

Vis(df).encode(x='state:N', y='amount:Q').boxplot()

Option 2 (explicit summary stats call):

Vis(df).encode(x='state:N', y='summarize(amount):Q').boxplot()

The big change from the existing vega-lite spec is that a box plot requires more than a scalar aggregation of the y data (mean, var, etc.). Questions:

Should aggregations be able to emit non-scalars?
Should marks be able to consume non-scalars (either entire data sets or non-scalar aggregations)?
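
To make the "non-scalar aggregation" idea concrete, here is a sketch of what a `summarize` aggregation might emit for one group: a five-number summary rather than a single value. The function body and the choice of the inclusive quartile method are assumptions for illustration, and the function needs at least two values.

```python
from statistics import quantiles


def summarize(values):
    """Five-number summary: a non-scalar aggregate a box plot could consume."""
    ordered = sorted(values)
    # n=4 yields the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = quantiles(ordered, n=4, method='inclusive')
    return {
        'min': ordered[0],
        'q1': q1,
        'median': median,
        'q3': q3,
        'max': ordered[-1],
    }


summary = summarize([1, 2, 3, 4, 5])
print(summary)
```

A boxplot mark would then draw from this dict per group, which is the case the second question above asks about.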

Additional aliases

There are a few other class names that we alias in api.py:

AxisProperties->Axis
BinProperties->Bin
LegendProperties->Legend
VgFormula->Formula

Two action items related to these:

The Python code generation currently uses the unaliased names.
We might want to autogenerate the aliased classes rather than handcoding them in api.py