sa-lee / easel Goto Github PK

👩‍🎨👨‍🎨🎨🖌

R 89.28% JavaScript 10.72%

easel's Issues

revisit data frame columns

The next release of tibble has support for data.frame columns, which would mean we could avoid name mangling.
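To make the name-mangling point concrete, here is a minimal sketch in base R. It uses a plain data.frame rather than a tibble, and the `aes` column name and the values are made up for illustration; the tibble behaviour mentioned above is assumed to be analogous.

```r
# Hypothetical aesthetic columns, used only for illustration.
aes <- data.frame(x = c(110, 93, 175), y = c(2.62, 2.32, 3.44))

# Flattening the nested columns forces prefixed ("mangled") names.
flat <- data.frame(id = 1:3, aes = aes)
names(flat)

# A data.frame column keeps the whole mapping as one nested column.
nested <- data.frame(id = 1:3)
nested$aes <- aes
names(nested)

# Individual aesthetics are still reachable without any prefix parsing.
nested$aes$x
```

With the nested form, the aesthetic columns can be pulled out as a unit (`nested$aes`) instead of being matched by a name prefix.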

Defining handler functions for control elements

I've been playing around with shiny to try to get some lower-level functions to work, which is proving surprisingly difficult. The official way of adding these extensions is the onevent function from the shinyjs package, but it gives you surprisingly little control over the event and you can't assign the output to an object on the server side. The alternative is to use Shiny.onInputChange from JavaScript. Here's a working demo with a proper drag (rather than a brush):

library(shiny)
library(ggplot2)

click_handler_js <- function() {
  "$(function(){
    // select the plot by its output id rather than relying on a global named pl
    $('#pl').click(function(e) {
      var output = {};
      output.coord_x = e.clientX;
      output.coord_y = e.clientY;
      output.width = $('#pl').width();
      output.height = $('#pl').height();
      Shiny.onInputChange('click', output);
    });
  });
 "
}

drag_handler_js <- function() {
  "$(function(){
    var is_drawing = false;
    var output = {};
    output.width = $('#pl').width();
    output.height = $('#pl').height();
    // stop the browser's native image drag from swallowing the mouse events
    $('#pl').on('dragstart', function(e) { e.preventDefault(); });
    $('#pl').mousedown(function(e) {
      is_drawing = true;
      output.start_x = e.clientX;
      output.start_y = e.clientY;
    }).mousemove(function(e) {
      if (is_drawing) {
        console.log('moving');
      }
    }).mouseup(function(e) {
      is_drawing = false;
      output.end_x = e.clientX;
      output.end_y = e.clientY;
      console.log('not moving');
      console.log(output);
      Shiny.onInputChange('drag', output);
    });
  });"
}

ui <- basicPage(
  tags$script(HTML(click_handler_js())),
  tags$script(HTML(drag_handler_js())),
  plotOutput("pl"),
  tableOutput("clicked"),
  verbatimTextOutput("dragged")
)

# an example of projecting onto data coordinates; only works for continuous variables
click_handler_r <- function(input) {
  coord_x <- input$click$coord_x
  coord_y <- input$click$coord_y
  width <- input$click$width
  height <- input$click$height
  # this doesn't take plot margins into account but will do for now;
  # the pixel offset is added to the minimum of each variable's range
  data.frame(hp = min(mtcars$hp) + coord_x * (diff(range(mtcars$hp)) / width),
             wt = min(mtcars$wt) + (height - coord_y) * (diff(range(mtcars$wt)) / height))
}

server <- function(input, output, session) {
  plot <- ggplot(mtcars, aes(x = hp, y = wt)) + geom_point()
  output$pl <- renderPlot(plot)
  output$clicked <- renderTable(click_handler_r(input))
  output$dragged <- renderPrint(input$drag)
}

shinyApp(ui, server)

Again, these handlers are specific to how shiny does things. It also seems kinda clunky to write out JavaScript as inline text, and having the user write JavaScript is something we should avoid.
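The pixel-to-data projection used in the click handler can be isolated as a small pure function, which makes it easier to check without running the app. This is only a sketch: `pixel_to_data` and its arguments are hypothetical names, it ignores plot margins just like the demo, and it assumes the pixel coordinates are measured from the top-left corner of the plot element (the demo's `clientX`/`clientY` are relative to the viewport instead).

```r
# Linear map from pixel space to data space.
# px, py: pixel coordinates with the origin at the top-left of the plot
# width, height: plot size in pixels
# xlim, ylim: data ranges shown on each axis
pixel_to_data <- function(px, py, width, height, xlim, ylim) {
  data.frame(
    x = xlim[1] + (px / width) * diff(xlim),
    # pixel y grows downwards, data y grows upwards, so flip it
    y = ylim[1] + ((height - py) / height) * diff(ylim)
  )
}

# A click in the middle of a 400 x 300 plot lands on the midpoint of each range.
centre <- pixel_to_data(
  px = 200, py = 150, width = 400, height = 300,
  xlim = range(mtcars$hp), ylim = range(mtcars$wt)
)
centre
```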

integrating vega

I think vega (not vegalite) is the right level of abstraction for what we are trying to do here. Like vegalite it uses a json spec for defining graphics but has a much lower level API for defining event streams and responding to them.

vega spec

{
  "$schema": "https://vega.github.io/schema/vega/v4.json",
  "width": null,
  "height": null,
  "signals": [],
  "data": [],
  "scales": [],
  "axes": [],
  "marks": []
}

The basic vega spec maps somewhat organically to our ideas surrounding the plot_tibble.
Evaluated plot data that results from a graphics pipeline can be inserted as row-oriented JSON into the spec at runtime. Likewise, marks, axes, and scales can be parsed into the spec via our aesthetic mappings (multiple layers can be generated from the details determined by the plot data).
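As a rough sketch of that idea, the spec skeleton can be held as a nested R list and the evaluated plot data converted to row-oriented records before serialising. The dataset name `plot_data` and the width/height values are assumptions for illustration; jsonlite is used for serialisation only if it happens to be installed.

```r
# Evaluated plot data (here just three rows of mtcars).
plot_data <- head(mtcars[c("hp", "wt")], 3)

# Row-oriented form: one named record per observation.
rows <- lapply(
  seq_len(nrow(plot_data)),
  function(i) as.list(plot_data[i, , drop = FALSE])
)

# The spec skeleton as a nested list, with the data inserted under a name.
spec <- list(
  `$schema` = "https://vega.github.io/schema/vega/v4.json",
  width = 400,
  height = 300,
  signals = list(),
  data = list(list(name = "plot_data", values = rows)),
  scales = list(),
  axes = list(),
  marks = list()
)

# Serialise only when jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json <- jsonlite::toJSON(spec, auto_unbox = TRUE, pretty = TRUE)
}
```

Scales, axes, and marks would be filled in the same way from the aesthetic mappings; they are left empty here.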

I currently can mostly render a few basic plots statically using this approach via the vegalite::from_spec() function.

Signals are an interesting feature of the vega API: they are reactive variables that respond to input event streams. As an example, here's a signal that defines a basic brush on the x axis:

{
  "name": "brushX",
  "on": [
    {
      "events": "mousedown",
      "update": "[x(), x()]"
    },
    {
      "events": "[mousedown, window:mouseup] > window:mousemove!",
      "update": "[brushX[0], clamp(x(), 0, width)]"
    }
  ]
}

In this case the signal is composed of two event handlers. The first initializes the brush on mousedown as an array of size 2. The second updates the second value of the array on every mousemove between a mousedown and the following mouseup, while clamping it so that it stays within the boundaries of the plot area.
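The update logic of this signal can be mimicked in plain R to see what values it would hold over a sequence of events. This is only an illustration of the semantics, not how vega evaluates signals; `step_brush_x` and the event records are hypothetical.

```r
clamp <- function(v, lo, hi) min(max(v, lo), hi)

# One update step: state holds the brush array and whether a drag is active.
step_brush_x <- function(state, event, width) {
  if (event$type == "mousedown") {
    state$dragging <- TRUE
    state$brush <- c(event$x, event$x)            # "[x(), x()]"
  } else if (event$type == "mousemove" && state$dragging) {
    state$brush <- c(state$brush[1],              # "[brushX[0], clamp(x(), 0, width)]"
                     clamp(event$x, 0, width))
  } else if (event$type == "mouseup") {
    state$dragging <- FALSE
  }
  state
}

events <- list(
  list(type = "mousedown", x = 40),
  list(type = "mousemove", x = 120),
  list(type = "mousemove", x = 650),  # outside a 500px wide plot, gets clamped
  list(type = "mouseup",   x = 650),
  list(type = "mousemove", x = 10)    # ignored, the drag has ended
)

final <- Reduce(
  function(state, event) step_brush_x(state, event, width = 500),
  events,
  list(brush = c(0, 0), dragging = FALSE)
)
final$brush
```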

Since the brush is reactive, we can then draw it as a rectangle mark whose xmin and xmax aesthetics update in response to the streams.

"marks": [
  {
    "type": "rect",
    "encode": {
      "update": {
        "x": {"signal": "brushX[0]"},
        "x2": {"signal": "brushX[1]"},
        "y": {"value": 0},
        "y2": {"signal": "height"}
      }
    }
  }
]

One question is how best to represent the creation of signals via our control_ verbs. Since they are not realized until the graph is rendered, how can we best represent something like control_drag() %>% draw_rect() in our plot tibble? Maybe as an empty reactive data frame with columns corresponding to the signal array? Should a user then have to specify a new visualise call to be explicit about how they intend to draw the brush?

mtcars %>%
    visualise(x = mpg, y = hp) %>%
    draw_points() %>%
    control_drag_x() %>%
    visualise(xmin = brushX0, xmax = brushX1) %>%
    draw_rect(...)

In a way this is more explicit than what we have discussed before. It does bring up one question though: is the data output from a brush a valid aesthetic in the way we ordinarily think of them, or should something 'special' be associated with a control?

Essentially we could have a very compressed set of signals that are initialized with control_* so we can limit the possible signals provided by vega.
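One way to picture such a compressed set is a small lookup table from control type to a predefined signal definition. Everything below is hypothetical (the `control_signals` registry, the `control_signal()` helper, and the column naming); it only reuses the brushX signal shown earlier in this thread.

```r
# Hypothetical registry: each supported control maps to one predefined signal.
control_signals <- list(
  drag_x = list(
    name = "brushX",
    on = list(
      list(events = "mousedown", update = "[x(), x()]"),
      list(events = "[mousedown, window:mouseup] > window:mousemove!",
           update = "[brushX[0], clamp(x(), 0, width)]")
    )
  )
)

# Look up a control; anything outside the registry is rejected.
control_signal <- function(type) {
  if (!type %in% names(control_signals)) {
    stop("unsupported control: ", type, call. = FALSE)
  }
  control_signals[[type]]
}

sig <- control_signal("drag_x")

# Column names a later visualise() call could refer to, one per array element.
emitted_columns <- paste0(sig$name, 0:1)
emitted_columns
```

Restricting lookups to the registry is what limits which vega signals a user can reach through control_*.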

View api

Instead of embedding the data in the JSON spec, we can insert it programmatically at run time using the View API. All we need is a reference to the name of the data object from R as a field in the JSON spec; this requires a bit of hackage on the htmlwidget side.

Pushing everything down to vega's runtime

Another possibility is to have our graphics pipeline transpiled into a vega spec.
There's a pretty extensive array of ops available to aggregate/filter etc - could we map these to dplyr verbs?
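As a sketch of what such a mapping might look like, the helpers below build vega `filter` and `aggregate` transform definitions as R lists. The helper names `vg_filter()` and `vg_aggregate()` are hypothetical; the transform fields (`expr`, `groupby`, `fields`, `ops`, `as`) follow the vega transform definitions.

```r
# dplyr::filter(hp > 100) would become a vega filter transform
vg_filter <- function(expr) {
  list(type = "filter", expr = expr)
}

# dplyr::group_by() + dplyr::summarise() would become a vega aggregate transform
vg_aggregate <- function(groupby, fields, ops, as) {
  list(
    type = "aggregate",
    groupby = as.list(groupby),  # as.list() so each serialises as a JSON array
    fields = as.list(fields),
    ops = as.list(ops),
    as = as.list(as)
  )
}

# Rough counterpart of:
#   mtcars %>% filter(hp > 100) %>% group_by(cyl) %>% summarise(mean_wt = mean(wt))
transforms <- list(
  vg_filter("datum.hp > 100"),
  vg_aggregate(groupby = "cyl", fields = "wt", ops = "mean", as = "mean_wt")
)
str(transforms)
```

The resulting list would sit in the `transform` array of a dataset in the spec. Translating arbitrary R expressions into vega expression strings is the hard part and is not attempted here.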

integrating with shiny

See here for examples of sending data back and forth between shiny and js:

https://shiny.rstudio.com/articles/js-send-message.html

Since the View API allows signals to be modified/updated/listened to at runtime we can use that to send events back to Shiny. See here for details:
https://vega.github.io/vega/docs/api/view/#view_addSignalListener

Some notes on control

These are just some thoughts based on our new api

To go back to our API - we have a new verb called control_ that defines events with respect to some aesthetics:

We create the 'root' node of our graphics pipeline:

p <- mtcars %>% pin() %>% visualise(x = hp, y = wt)

Then a static plot can be initialized with

scatter <- p %>% draw_points()

Now suppose we want to trigger an event based on doing something with the plot. The control verbs tell the user that an event will be observed by the plot and that data will be returned whenever the event is observed. The important thing is that the emitted data is in the same space as the aesthetic mappings at the root node of the pipeline. (The data model of the emitted event is determined by the type of control and the aesthetic mappings.)

# this should emit a data.frame with x,y columns
click <- p %>% control_click()

On its own a control does not do anything exciting aside from emitting data based on user input (how it does this can be flexible - we could use the js examples as shown above). It represents a branch in the pipeline where new data is emitted and forms a path back to any transformations made before rendering to the device.

data →  visualise  →  transformations  →  device
            ↓             ↗                 ↓
         control   ←   ←   ←   ←   ←   ←   ←

Now the control function also needs to include a function that provides a mapping (i.e. a handler, the diagonal arrow in the pipeline above), which stages the emitted data for inclusion in the pipeline (to the right of visualise). The handler will be a function of both the emitted data and the data at the root node of the pipeline. For example, for a click event that stages the nearest neighbours of the click:

handler <- function(.data, .emitted) {
   if (!is.null(.emitted)) {
      near(.data[['aes_x']], .emitted[['aes_x']]) & near(.data[['aes_y']], .emitted[['aes_y']])
   } else {
      rep(TRUE, nrow(.data))
   }
}

In most cases the handler amounts to a selection (either rows or columns) - but I guess there could be other cases?
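Here is a runnable variant of such a handler in base R, as a sketch only. Instead of exact matching it selects points within a tolerance of the click, expressed as a fraction of each aesthetic's range; the `tol` argument and the `nearest_handler` name are assumptions, as is the use of mtcars as the root data.

```r
# Returns a logical vector marking the rows of .data near the emitted click.
nearest_handler <- function(.data, .emitted, tol = 0.05) {
  # no event observed yet: keep everything selected
  if (is.null(.emitted)) {
    return(rep(TRUE, nrow(.data)))
  }
  # distances relative to the range of each aesthetic, so tol is unit free
  dx <- abs(.data[["aes_x"]] - .emitted[["aes_x"]]) / diff(range(.data[["aes_x"]]))
  dy <- abs(.data[["aes_y"]] - .emitted[["aes_y"]]) / diff(range(.data[["aes_y"]]))
  dx <= tol & dy <= tol
}

# Root data with aesthetic columns, and a click emitted in the same space.
plot_data <- data.frame(aes_x = mtcars$hp, aes_y = mtcars$wt)
click <- data.frame(aes_x = 110, aes_y = 2.62)

selected <- nearest_handler(plot_data, click)
plot_data[selected, ]
```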

Now that we have something staged, we would like to do something with our selection, i.e. we would like to 'put' up a new stage in the pipeline that depends on the previous one. For example, we could highlight or annotate a point based on a click event. Adding a highlight could mean modifying the selected points to be red:

highlight <- click %>% put(colour = "red")

These would have to be checked to be valid options for the downstream draw_. Essentially, the result of put triggers a new stage that calls the handler and returns a new vector, placing the colour red where the handler is true and leaving the current value where it is false.

function(opts) if_else(handler(), opts, current_value(opts))
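A concrete, non-reactive version of that replacement step might look like the following. `put_value` is a hypothetical name, and the logical vector `selected` stands in for the result of calling the handler.

```r
# Replace the current value wherever the handler selected a row,
# and keep the current value everywhere else.
put_value <- function(current, selected, value) {
  stopifnot(length(current) == length(selected))
  replace(current, selected, value)
}

colours <- rep("black", 5)
selected <- c(FALSE, TRUE, FALSE, FALSE, TRUE)  # stand-in for handler output

put_value(colours, selected, "red")
```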

If we put a new stage based on modifying an aesthetic, we would need to have access to the underlying guides...

Now for a basic rectangular brush - we want to trigger a new layer (draw a rectangle), upon new data being emitted. This could be done directly via a call to a draw, since we know that control emits data from an event, in this case a new layer is added to our plot data (independently of the handler).

brush <- p %>% control_drag() %>% draw_rect()

The handler being a function of just the .data and the .emitted data provides a pretty general framework for performing selection since all variables are available - and allowing layers to be built on top of the emitted data also provides a lot of flexibility.

Open questions:

handling multiple layers?
handling multiple datasets?
handling modifying aesthetics upon putting up a new stage

A grammar of aesthetics

At the moment we define aesthetic mappings to variables with the visualise function: we define explicitly which aesthetic elements map to a variable. This eventually results in a call to dplyr::mutate to augment the data with "aes_" columns. Currently, the way the grammar of graphics is set up, a user is generally required to create a long-form data frame via some data manipulations (if the data start out in wide form). In ggplot2 there is a one-to-one mapping between aesthetics and variables, but it doesn't necessarily have to be that way.
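To show what "augmenting the data with aes_ columns" amounts to, here is a base R stand-in for that mutate step. `visualise_sketch` is a hypothetical name and is not easel's actual implementation; it just evaluates each mapping expression in the context of the data and stores the result under an `aes_` prefix.

```r
visualise_sketch <- function(.data, ...) {
  # capture the mapping expressions without evaluating them
  mappings <- eval(substitute(alist(...)))
  for (aes in names(mappings)) {
    # evaluate each expression with the data's columns in scope
    .data[[paste0("aes_", aes)]] <- eval(mappings[[aes]], .data, parent.frame())
  }
  .data
}

plot_data <- visualise_sketch(mtcars, x = hp, y = wt / 2)
head(plot_data[c("hp", "wt", "aes_x", "aes_y")])
```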

Could we use scoped variants of these functions to imply these operations are being done on certain collections of variables? Can we map multiple variables to an aesthetic?

Let's consider two examples: a side-by-side box plot and a parallel coordinates plot.

Here's a fairly common matrix structure in genomics along with the ggplot specs for a boxplot:

library(tidyverse)
set.seed(100)
tbl <- tibble::tibble(gene_id = 1:30L, 
                      A1 = rnorm(30), 
                      A2 = rnorm(30), 
                      B1 = rnorm(30, mean =  0.5), 
                      B2 = rnorm(30, mean = 0.5, sd = 3))

tbl_by_expr <- tbl %>%
  gather("sample", "expression", -gene_id) 
# boxplot
tbl_by_expr %>%
  ggplot(aes(x = sample, y = expression)) +
  geom_boxplot()

The boxplot requires performing a gather call to go from wide to long and then computing summary statistics on each slice of the long form. This computation becomes inefficient as the number of variables grows. I would also argue that people do intuit wide form. An alternative would be to keep the wide form around and perform operations column-wise. Here we introduce the notion of visualise_at, which allows the use of the slice operator to place multiple variables on one aesthetic. In our API this could look something like:

tbl %>%
    visualise_at(x = A1:B2)

i.e. we are specifying that on the x-axis we are placing all variables from A1 up to B2 (a one-to-many map). We could either implicitly change the table here via a gather call or defer the gathering until the end of the pipeline. If we could have data.frame columns in a tibble, nesting the aes_x column could provide some computational gains. A boxplot is an interesting geom too, since it's a compound of points, rectangles and lines (perhaps best to just leave it as draw_boxplot):

tbl %>%
    visualise_at(x = A1:B2) %>%
    # computed without gathering first using `dplyr::summarise_at`, then use tidyr::gather
    summarise_box() %>%
    draw_boxplot()
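The column-wise summary that summarise_box() stands for can be sketched in base R: compute the five boxplot statistics per column of the wide table, and only then reshape the (much smaller) summary to long form. This is an assumption about what such a step could do, not the package's implementation; the data mimic the tbl defined earlier.

```r
set.seed(100)
wide <- data.frame(
  gene_id = 1:30,
  A1 = rnorm(30),
  A2 = rnorm(30),
  B1 = rnorm(30, mean = 0.5),
  B2 = rnorm(30, mean = 0.5, sd = 3)
)
vars <- c("A1", "A2", "B1", "B2")

# One column of five statistics per variable, computed without gathering.
box_stats <- vapply(
  wide[vars],
  function(col) boxplot.stats(col)$stats,
  numeric(5)
)
rownames(box_stats) <- c("lower_whisker", "lower_hinge", "median",
                         "upper_hinge", "upper_whisker")

# Gather only the summary: 5 statistics x 4 variables = 20 rows.
long_stats <- data.frame(
  sample = rep(colnames(box_stats), each = nrow(box_stats)),
  stat = rep(rownames(box_stats), times = ncol(box_stats)),
  value = as.vector(box_stats)
)
head(long_stats)
```

The reshape here touches 20 values regardless of how many genes there are, whereas gathering first would produce one row per gene per sample.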

Another example is parallel coordinates plot - again this is a plot where multiple variables are mapped to a single aesthetic. There is also a scaling operation required for this plot so variables can be compared to each other.

Here's one possible way of making a PCP with ggplot2

tbl_by_expr %>%
  group_by(sample) %>%
  mutate(scaled_expression = (expression - mean(expression)) / sd(expression)) %>%
  ggplot(aes(x = sample, y = scaled_expression, group = gene_id)) +
  geom_line()

Again, with our API one possibility is to use compound aesthetics. The question is how to represent the scaling options: essentially this is a mutate on each variable (should it be done before or after a visualise call?). Another option is to just call visualise_at with the option of including a function to modify those aesthetics (i.e. pass it down to dplyr::mutate_at, then gather):

tbl %>%
     visualise_at(x = A1:B2, .f = scale) %>%
     draw_lines()
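What visualise_at(x = A1:B2, .f = scale) would need to do under the hood can be approximated in base R: apply the function to each selected column of the wide table, then reshape to long form for drawing. This is a sketch under that assumption; the data mimic the tbl defined earlier and the column names of the long table are made up.

```r
set.seed(100)
wide <- data.frame(
  gene_id = 1:30,
  A1 = rnorm(30),
  A2 = rnorm(30),
  B1 = rnorm(30, mean = 0.5),
  B2 = rnorm(30, mean = 0.5, sd = 3)
)
vars <- c("A1", "A2", "B1", "B2")

# Apply the scaling function column-wise on the wide form.
scaled <- wide
scaled[vars] <- lapply(wide[vars], function(col) as.numeric(scale(col)))

# Reshape afterwards: one row per gene per sample, ready for a line layer.
long <- data.frame(
  gene_id = rep(scaled$gene_id, times = length(vars)),
  sample = rep(vars, each = nrow(scaled)),
  scaled_expression = unlist(scaled[vars], use.names = FALSE)
)
head(long)
```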

dataflow post signal propagation

I've got this mostly working (at least in the case of a highlighting brush) per the last couple of commits; now I just need to wrap it into our API.

The idea here is that any changes that reference a control dataset need to in turn become reactive. To be more concrete, here's our canonical brush example

mtcars %>%
  visualise(x = hp, y = mpg) %>%
  draw_points() %>%
  control_drag() %>%
  draw_rect()

Now control_drag emits a reactive (currently stored as a list column in the tibble) based on the signals that are listened to from the Vega spec. By design, we emit these in data space rather than pixel space. Now to generate where we've selected, we really want to do something like

mutate(.,  selected = c(aes_x, aes_y) %in% control_data, aes_colour = ifelse(selected, "red", "blue"))

Here control_data is just the scalars from our reactive values. In this mutate call all the generated expressions need to be reactive as well, and any reactive that was previously referenced needs to be instantiated. Things get tricky fast.
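Setting the reactivity aside, the selection itself can be sketched as a plain function of the plot data and the brush extent. The shape of `brush` (a list with `x` and `y` ranges in data space) is an assumption about what control_drag emits, and `select_in_brush` is a hypothetical name.

```r
# TRUE for rows whose aesthetics fall inside the brushed rectangle.
select_in_brush <- function(.data, brush) {
  in_x <- .data$aes_x >= min(brush$x) & .data$aes_x <= max(brush$x)
  in_y <- .data$aes_y >= min(brush$y) & .data$aes_y <= max(brush$y)
  in_x & in_y
}

plot_data <- data.frame(aes_x = mtcars$hp, aes_y = mtcars$mpg)

# Hypothetical brush extent, already in data space rather than pixel space.
brush <- list(x = c(100, 120), y = c(20, 25))

plot_data$selected <- select_in_brush(plot_data, brush)
plot_data$aes_colour <- ifelse(plot_data$selected, "red", "blue")
head(plot_data)
```

In the reactive version, `brush` would be a reactive value and both derived columns would need to be recomputed whenever it changes.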

On the render side, any new mark from a signal needs to be added to the spec, and the data sent to the viz upon observation.
