
greta's People

Contributors

assignuser, benmarwick, edwinth, flyaflya, goldingn, jdyen, jeffreypullin, lionel68, lorenzwalthert, maelle, michaelquinn32, mmulvahill, njtierney, pteetor, tiphainecmartin, twolodzko


greta's Issues

propagate cholesky factors through transformations

Correlation matrices and covariance matrices created via lkj_correlation() and wishart() have a Cholesky factor representation, so Cholesky factorisation is skipped in e.g. multivariate_normal(). This improves numerical stability and speed.

However, this benefit is lost if any transformations are applied to the matrix, e.g. multiplying by a scalar or inverting the matrix. The speed-ups of avoiding the Cholesky factorisation could be maintained in these cases by handling certain transformations automatically, e.g. multiplying the Cholesky factor by the square root of the scalar, or inverting and transposing the Cholesky factor rather than the resulting matrix.
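
For the scalar case, the underlying identity can be checked with plain R matrices (a minimal sketch, not greta code):

# if Sigma = L %*% t(L), then a * Sigma = (sqrt(a) * L) %*% t(sqrt(a) * L),
# so the Cholesky factor can be rescaled without refactorising
set.seed(1)
Sigma <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)  # a positive-definite matrix
L <- t(chol(Sigma))                                   # lower-triangular Cholesky factor
a <- 2.5
all.equal(t(chol(a * Sigma)), sqrt(a) * L)            # TRUE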

These could be defined directly in e.g. solve.greta_array() and *.greta_array(), or via some nice (unexported) dispatch method.

feedback on install process

Just wanted to leave a quick note with some feedback on the installation process that could be used to make the documentation clearer.

It took us a while to realize that greta would only run on 64-bit R because tensorflow requires 64-bit Python. It seems obvious in hindsight. Because we had both 32-bit and 64-bit R installed on our machine, install_github was trying to install greta for both. It would fail at the 32-bit install, and then not attempt the 64-bit one. We fixed this with:

library(devtools)
options(devtools.install.args = "--no-multiarch") 

Merge flat() into free()

flat() is confusingly named, since it isn't uniform, it's density-free. Instead, it would be better to add min and max arguments to free(), with default values of -Inf and Inf.
This could be superficial, and preserve the two underlying node types, but it might be better to consolidate them instead.

Need to define transforms for the four possible sets of ranges:

  • a < x < b
  • a < x < Inf
  • -Inf < x < b
  • -Inf < x < Inf
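
A minimal sketch (plain R, hypothetical helper name) of the forward transforms for those four cases:

to_free <- function(x, min, max) {
  if (is.finite(min) && is.finite(max)) {
    qlogis((x - min) / (max - min))   # a < x < b
  } else if (is.finite(min)) {
    log(x - min)                      # a < x < Inf
  } else if (is.finite(max)) {
    log(max - x)                      # -Inf < x < b
  } else {
    x                                 # -Inf < x < Inf
  }
}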

improve transform for flat distribution

The flat() distribution enables user-specification of constraints on variables without implying a density. Like all parameters, these parameters are transformed to an unconstrained scale, using a form of logit transform. It seems to be particularly unstable, however, so it would be worth trying the transform used in Stan:

a < X < b
Y = logit((X - a) / (b - a))
logit(u) = log(u / (1 - u))
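
A minimal sketch (plain R, hypothetical function names) of that transform, its inverse, and the log-Jacobian adjustment that would go with it:

forward <- function(x, a, b) qlogis((x - a) / (b - a))   # constrained -> unconstrained
inverse <- function(y, a, b) a + (b - a) * plogis(y)     # unconstrained -> constrained

# log of the derivative of inverse() with respect to y:
# (b - a) * plogis(y) * (1 - plogis(y))
log_jacobian <- function(y, a, b)
  log(b - a) + plogis(y, log.p = TRUE) + plogis(-y, log.p = TRUE)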

add extraction/subsetting syntax

This should be possible:

Betas = normal(0, 1, dim = c(2, 4))
betas_1 = Betas[1, ]

(the top row of Betas being assigned to betas_1)

as should this:

Betas = normal(0, 1, dim = c(2, 4))
Betas[2, 3] = lognormal(3, 3)

(one element of Betas being lognormally distributed, the rest staying normally distributed)

I have a plan for how to implement this. Any comments on the proposed syntax and behaviour would be welcome, though.

automatically decentre symmetric distributions

In some hierarchical models, the usual 'centred' formulation of a distribution can cause convergence problems due to a strongly funnel-shaped posterior. A solution to this is to explicitly use an equivalent non-centred version of the hierarchical model.

centred:

mu = normal(0, 1)
sigma = lognormal(0, 1)
x = normal(mu, sigma)

non-centred:

mu = normal(0, 1)
sigma = lognormal(0, 1)
x_raw = normal(0, 1)
x = mu + sigma * x_raw

Because greta defines an unconstrained node for each parameter anyway, it should be possible to automatically implement the non-centred version under the hood. I.e. to define the following Tensors in the TF graph:

# define the unconstrained node
xnode_free <- tf$Variable(...)

# create an uncentred version (like x_raw above)
xnode_uncentred <- tf_from_free(xnode_free)

# centre it; applying the mean and sd tensorflow values it depends on
xnode <- centre(xnode_uncentred)

The density could then still be defined on the centred version, or a few FLOPs might be saved by defining a reduced version of the density and applying it to xnode_uncentred. E.g. for the normal distribution, just: exp(-0.5 * x ^ 2) * 0.3989 / sigma.
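
Written out, the reduced density mentioned above is just the standard normal density of x_raw divided by sigma (the 0.3989 being 1 / sqrt(2 * pi)):

# density of x = mu + sigma * x_raw, evaluated via x_raw ~ N(0, 1)
reduced_density <- function(x_raw, sigma)
  exp(-0.5 * x_raw ^ 2) / (sqrt(2 * pi) * sigma)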

export install_tensorflow() method

The tensorflow R package provides this function, which should be super useful. Just need to re-export it, so that users have access without having to load the tensorflow package.

change print method for array?

I've been playing around with greta and I really like some of the features, but I was wondering if perhaps you would consider changing the print method for a greta_array? For example, in your examples section of the README, it states:

To make a matrix that has random variables in the first column, but zeros everywhere else, we could do:

 # a 4x3 greta array of zeros
 x <- zeros(4, 3)

 # now with random variables in the first column
 x[, 1] = normal(0, 1, dim = 4)

Thus, my expectation is that x should now look like

 ## greta array (operation)
 ## 
 ##      [,1] [,2] [,3]
 ## [1,]   ?    0    0
 ## [2,]   ?    0    0
 ## [3,]   ?    0    0
 ## [4,]   ?    0    0

but when I inspect it, I instead get

 ## greta array (operation)
 ## 
 ##      [,1] [,2] [,3]
 ## [1,]   ?    ?    ?
 ## [2,]   ?    ?    ?
 ## [3,]   ?    ?    ?
 ## [4,]   ?    ?    ?

Perhaps the print output could be modified to aid in visual inspection of the model form?

add mixture distribution

This should have syntax like:

theta = variable(lower = 0, upper = 1, dim = 3)
dist = mixture(normal(0, 1), normal(1, 1), normal(2, 2), weights = theta)
distribution(y) = dist

This should strip the distributions from the variables passed to it and define the children of those distributions. Use log_sum_exp to sum the weighted densities on the probability scale. Possibly just let them define themselves in TF, combine the densities into the mixture density, then delete the component TF densities (so they don't count toward the total).
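
A minimal sketch (plain R, hypothetical function names) of combining component log densities with log_sum_exp:

log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# mixture log density of one observation, given component log densities and weights
mixture_log_density <- function(component_log_densities, weights)
  log_sum_exp(log(weights) + component_log_densities)

mixture_log_density(dnorm(0.5, mean = c(0, 1, 2), sd = c(1, 1, 2), log = TRUE),
                    weights = c(0.2, 0.5, 0.3))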

Also add variables representing ordered vectors and simplices (with appropriate jacobians) for use in these models.

add copy method

The pass-by-reference nature of greta arrays may trip up users, particularly those modifying greta arrays in functions. This should be documented in the technical details section.

A method to explicitly copy greta arrays would be useful too. In order to retain the connectedness of the DAG (for non-data greta arrays) this should probably be an operation. The following code snippet does that:

copy <- function (x) {
  if (greta:::node_type(x$node) == 'data') {
    as_data(x$node$value())
  } else {
    greta:::op("copy", x, tf_operation = "tf$identity")
  }
}

E.g. the following code creates copies of data (termini, so not recorded on the graph) and variables (as an op):

y1 <- as_data(rbinom(3, 1, 0.5))
y2 <- copy(y1)

p <- variable()
p1 <- copy(p)
distribution(y1) = bernoulli(p1)
p2 <- copy(p)
distribution(y2) = bernoulli(p2)
plot(model(p))

(screenshot of the plot(model(p)) output)

double precision with LKJ correlation distribution

running model(..., precision = "double") fails with:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: Input 'y' of 'Sub' Op has type float64 that does not match type float32 of argument 'x'.

if the model contains random variables created with lkj_correlation()

This is caused by the Jacobian adjustment term for correlation matrices using a 32-bit float.
The fix is just swapping: tf$log(1 - ...) to tf$log(fl(1) - ...)

There's also a loose single-precision float on this line which may or may not need changing.

Each of these Jacobian adjustments should be tested under double-precision floats.

tidy up tensorflow's compiler suggestions

ideally, detect on package load the compiler options TF thinks we should have used. Issue a message like:

your Tensorflow installation could be made faster, see speedup() for details

then

speedup()
The following CPU instructions are available on your machine, but Tensorflow wasn't compiled to use them: SSE4.1,  SSE4.2, AVX, AVX2, FMA
See http://some_url for instructions on how to compile Tensorflow for your system. 

change the dimensions of greta arrays

It's a bit tricky to reshape greta arrays. E.g. how to convert the following to a 3x4 greta array?

a <- greta_array(1:12)

This works (filling by column), but is non-intuitive:

b <- zeros(3, 4)
b[] <- a
b

methods like these would be preferable, but aren't currently implemented:

dim(a) <- c(3, 4)
b <- greta_array(a, dim = c(3, 4))
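
A hypothetical sketch of the first of those, built on the column-filling workaround above (not greta's actual implementation):

`dim<-.greta_array` <- function(x, value) {
  out <- do.call(zeros, as.list(value))  # a new greta array of the target shape
  out[] <- x                             # fill by column, as in the workaround
  out
}

a <- greta_array(1:12)
dim(a) <- c(3, 4)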

add operations

The basics are implemented, and new operations are easy to register.

I intend to implement all of the operations listed in the JAGS and Stan manuals.

Please comment here if there are any operations you would like to be added as a priority.

create website

with pkgdown and automated GitHub integration, using existing CSS and with a custom index page etc.

Possible issue with beta() distribution

Hi,

First, I would like to congratulate you on a great package. It is very convenient and user friendly. Thank you for your work!

I am running into an error when I try to execute the following code:

library(greta)
my_rate <- beta(2, 5)
my_data <- 6
likelihood(my_data) <- binomial(16, my_rate)
m <- define_model(my_rate)

Error in -x : invalid argument to unary operator

If I substitute beta() with another distribution for my prior (e.g. lognormal()), the code executes without any issues.

I am not sure whether I am doing something wrong and this is intended, or whether it is a possible bug.
Thank you for your help.

Best regards,
LG

Gaussian process module

This should make it easy to combine kernels, and have a nice syntax for defining GPs and evaluating them at new locations.

The syntax might look something like:

x <- as_data(x_data)
y <- as_data(y_data) 
x_pred <- as_data(x_star_data)

l = lognormal(0, 2)
sigma_rbf = lognormal(0, 1)
sigma_lin = lognormal(0, 1)
sigma = lognormal(0, 2)

# k1, k2 and K are functions
k1 <- gp$rbf_kernel(l, sigma_rbf)
k2 <- gp$lin_kernel(sigma_lin) 
K <- k1 + k2 # overload arithmetic operators for kernel functions

z = gp$gp(0, K(x))
likelihood(y) = normal(z, sigma) 

z_pred <- gp$project(z, K(x, x_pred)) 

Check and advise on installation at startup

When loading the package, greta (via reticulate) should check for availability of Python and tensorflow, then check the versions are sufficient. If they aren't, it should message/error and provide installation advice.
This will be particularly important once it's on CRAN and users don't have the context.
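
A minimal sketch of what that startup check might look like, assuming reticulate's availability helpers:

.onAttach <- function(libname, pkgname) {
  have_python <- reticulate::py_available(initialize = FALSE)
  have_tf <- have_python && reticulate::py_module_available("tensorflow")
  if (!have_tf) {
    packageStartupMessage(
      "greta requires Python and TensorFlow; ",
      "see the installation instructions for advice"
    )
  }
}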

TypeError

I tried to replicate the example in the README, but failed, even though TensorFlow is correctly installed (I can successfully replicate the examples from the TensorFlow documentation).

library(greta)
#> 
#> Attaching package: 'greta'
#> The following objects are masked from 'package:stats':
#> 
#>     binomial, poisson
#> The following objects are masked from 'package:base':
#> 
#>     %*%, beta, diag, gamma, sweep

# create parameters, stating their prior distributions
intercept = normal(0, 5)
coefficient = normal(0, 3)
sd = lognormal(0, 3)

# write the equation for the expected mean sepal length
mean <- intercept + coefficient * iris$Petal.Length

# define the likelihood of the observed data
likelihood(iris$Sepal.Length) = normal(mean, sd)

model <- define_model(intercept, coefficient, sd)
#> Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
#> 
#> Detailed traceback: 
#>   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1000, in concat
#>     dtype=dtypes.int32).get_shape(
#>   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
#>     ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
#>   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
#>     return constant(v, dtype=dtype, name=name)
#>   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
#>     tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
#>   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
#>     _AssertCompatible(values, dtype)
#>   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
#>     (dtype.name, repr(mismatch), type(mismatch).__name__))

Here is my session info:

devtools::session_info()
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.0 (2017-04-21)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language fr_FR                       
#>  collate  fr_FR.UTF-8                 
#>  tz       Europe/Paris                
#>  date     2017-04-30
#> Packages ------------------------------------------------------------------
#>  package    * version    date       source                            
#>  backports    1.0.5      2017-01-18 CRAN (R 3.3.2)                    
#>  coda         0.19-1     2016-12-08 cran (@0.19-1)                    
#>  devtools     1.12.0     2016-12-05 CRAN (R 3.4.0)                    
#>  digest       0.6.12     2017-01-27 CRAN (R 3.4.0)                    
#>  evaluate     0.10       2016-10-11 CRAN (R 3.3.1)                    
#>  greta      * 0.1.1      2017-04-30 Github (goldingn/greta@bd64983)   
#>  htmltools    0.3.6      2017-04-28 CRAN (R 3.4.0)                    
#>  knitr        1.15.1     2016-11-22 CRAN (R 3.3.2)                    
#>  lattice      0.20-35    2017-03-25 CRAN (R 3.3.3)                    
#>  magrittr     1.5        2014-11-22 CRAN (R 3.4.0)                    
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.4.0)                    
#>  R6           2.2.0      2016-10-05 CRAN (R 3.4.0)                    
#>  Rcpp         0.12.10    2017-03-19 CRAN (R 3.4.0)                    
#>  reticulate   0.7        2017-03-14 CRAN (R 3.3.3)                    
#>  rmarkdown    1.4.0.9001 2017-04-10 Github (rstudio/rmarkdown@b7434dc)
#>  rprojroot    1.2        2017-01-16 CRAN (R 3.3.2)                    
#>  stringi      1.1.5      2017-04-07 cran (@1.1.5)                     
#>  stringr      1.2.0      2017-02-18 cran (@1.2.0)                     
#>  tensorflow   0.7        2017-03-22 CRAN (R 3.3.3)                    
#>  withr        1.0.2      2016-06-20 CRAN (R 3.4.0)                    
#>  yaml         2.1.14     2016-11-12 CRAN (R 3.3.2)

send to CRAN

tensorflow is now on CRAN, so this should be straightforward

implement NUTS

This should make inference easier for people (like me) who don't like fiddling with samplers.

add distributions

Only a couple are implemented so far. I should get round to adding more before too long.

If you're reading this and there's a particular distribution you're after, please comment below and I'll try to get to that as a priority

write tests

the travis build file for gpflowr and tensorflow successfully links TF, so we should be able to test evaluation of densities etc.

add method for lower-level TensorFlow functions

Standard greta operations are manually registered, and include checking of dimensions in interactive mode.

Doing this for all TF functions is onerous and will break as TF changes. Instead, it would be best to provide an access point to the functions which will error only at compile time, for advanced users.

syntax

Whilst it would be nice to use the tf$whatever() functions directly, it doesn't appear possible to make these delayable on nodes without rewriting tensorflow (the R API); that syntax is achieved by overloading the dollar operator for objects of class "tensorflow.builtin.object".

An alternative would be a wrapper function, e.g. do.call:

x = observed(rnorm(3))
y = do.call(tf$reduce_sum, x)

do.call() could accept either a call like this, or a string representing the call, and evaluate it later.
This would need do.call to be registered as a generic and overloaded for nodes.

add lkj correlation distribution

à la Stan

Works on the log determinant of the Cholesky factor, so it should be quick and easy. There's a reference implementation in the rethinking package.

define data greta arrays as TF placeholders

currently, data greta arrays define themselves in the TF graph as TF constant objects. That's fine in the current implementation of greta, since the model is defined after the data is set.

In a future version, these should be defined as TF placeholder objects, with values passed in the parameter_dict python object.

That way, the data could be switched after definition of the TF graph. This would enable batch-based methods such as stochastic gradient descent, as well as speeding up applications like Geweke testing of samplers.
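
A minimal sketch of the placeholder-plus-feed-dict pattern (TF 1.x graph API via the tensorflow R package, not greta's internals):

library(tensorflow)

x <- tf$placeholder(tf$float64)   # data enters the graph as a placeholder
doubled <- tf$add(x, x)           # any operation defined on the placeholder

sess <- tf$Session()
sess$run(doubled, feed_dict = dict(x = matrix(c(1, 2, 3), ncol = 1)))  # one dataset
sess$run(doubled, feed_dict = dict(x = matrix(c(4, 5, 6), ncol = 1)))  # swap in another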

This shouldn't be hard to implement, just some book-keeping. Worth leaving until after a refactor of the TF graph creation code though.

switch to defining a model object

Using the following syntax:

m <- define_model(alpha, beta, ...)

alternatively, to get dag for all nodes defined in the workspace:

m <- define_model()

Provide print() and plot() methods to show the user which variables were part of the DAG. Do sampling with:

draws <- mcmc(m, 100)

This will enable tweaking of initial values etc., and selection of parameters we care about.

concatenate greta array and NULL

The behaviour for concatenating NULLs with greta arrays is unexpected, and does not match arrays:

library(greta)
x = normal(0, 1)
c(x, NULL)
Error: objects of class NULL cannot be coerced to greta arrays 
c(NULL, x)
$node
<variable_node>
  Inherits from: <node>
  Public:
    ...
    value: function (new_value = NULL, ...) 

In both cases, the expected result is:

greta array (operation)

     [,1]
[1,]  ? 

The error (Error: objects of class NULL cannot be coerced to greta arrays ) is due to c.greta_array() coercing all inputs to greta arrays. This could be avoided by changing c.greta_array() to check for NULLs and skip them in the concatenation.
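
A hypothetical sketch of that fix (the rest of the method left as it is now):

c.greta_array <- function(...) {
  args <- list(...)
  args <- Filter(Negate(is.null), args)   # skip NULLs, as c() does for arrays
  # ... then coerce the remaining arguments to greta arrays and concatenate them,
  # as the current method does
}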

The second result is more pernicious. c() is an internal generic, so it dispatches on the mode of the object before dispatching on its class. At present, the mode of greta_array objects is a list, so it calls c.list(), which unpacks the node from the greta array object.

There are a few options for dealing with this:

  • switch to S4 classes 😨 😓 💩 (hard to debug, and would probably require overloading many more generics)
  • switch the mode of greta_array objects to environment (need to check the pass-by-reference thing isn't an issue, but it shouldn't be)
  • switch the mode of greta_array objects to array, with the node as an attribute, and change all the x$node code to node(x), where node() is an unexported helper to get the node attribute

%~% doesn't error on dims

This should error in interactive mode, but doesn't:

library(greta)
y <- observed(1:5)
lambda <- lognormal(0, 1, dim = 10)
y %~% lambda

More model examples

Especially hierarchical models, as in the BUGS examples and Stan's manual.

extreme value distributions module

As per @TonyLadson's request in #1, distributions for extreme value modelling would be useful. These would be best placed in a module (though admittedly the boundary between general distributions and extreme value distributions is blurry).

Specific requests include:

  • GEV
  • Gumbel
  • Gumbel type B
  • Pearson III
  • Log Pearson III
  • Generalised Pareto

Like the already implemented distributions, these should all have a reference R implementation, where possible from a widely used package and preferably all the same package. evd looks like it has the best combination of consistency and scope, and appears to be widely used (judging by download stats from cranlogs).

This is not currently at the top of my to do list, but I would be really keen to support someone else in developing this module!

implement diag() assignment

We can assign on diagonal elements like this:

x <- zeros(3, 3)
idx <- which(row(x) == col(x))
x[idx] = normal(0, 1, dim = 3)

So it should be easy enough to define `diag<-.greta_array`, and enable syntax like this:

x <- zeros(3, 3)
diag(x) = normal(0, 1, dim = 3)

which would be handy
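
A hypothetical sketch of that method, reusing the row/col trick above (it assumes greta also registers diag<- as a generic, since base R's diag<- isn't one):

`diag<-.greta_array` <- function(x, value) {
  idx <- which(row(x) == col(x))   # linear indices of the diagonal elements
  x[idx] <- value
  x
}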

Unable to sample from anything but small datasets

I'm having trouble sampling from models with any moderate amount of data. I was working with a real dataset today and couldn't get anything to work; every sample was rejected. I started to have a hunch that it was a problem with the data, so I began to work with random, normally-distributed data.

When I generated a small dataset (N=100), everything works fine:

# define data
y = as_data(rnorm(100,10,1))
x = as_data(rnorm(100,10,1))

# define parameters
intercept = normal(10,10)
coefficient = normal(0,10)
sd = lognormal(0,1)

# define model
mean <- intercept + coefficient * x

# define the likelihood of the observed data
distribution(y) = normal(mean, sd)

# sample from model
greta_model <- define_model(intercept, coefficient)
draws <- mcmc(greta_model, n_samples = 1000, warmup=100)

# evaluate model
summary(draws)

library(MCMCvis)
MCMCtrace(draws)
MCMCplot(draws)

But if I change the distributions to have 10,000 samples each, every sample is rejected during mcmc() and the posterior is invariant:

# define data
y = as_data(rnorm(10000,10,1))
x = as_data(rnorm(10000,10,1))
...

Any ideas about what might be causing this problem?

add node_list control functions

Currently the node list is stored in options, though I will move it back to the namespace soon. Then, I will add the following exported functions:

node_list() - looks first in upstream environments for a node_list object named .nodes, which it returns. If that isn't visible, it grabs the one in the namespace.

greta_here() - creates a node_list object called .nodes in the current environment

greta_flush(recursive = FALSE) - gets the node list returned by node_list(), and removes all of the nodes listed there. If run with recursive = TRUE, it keeps doing this until it has received and flushed the namespace .nodes

This will enable people to write functions that isolate greta models and dags, e.g. for use in packages. These will be documented in a single page, emphasising that it is for more advanced use and not necessary for most users.

sweep should handle numeric x, but greta_array STATS

Currently it dispatches to sweep.default, which errors because STATS is a greta array.

This could be handled in greta::sweep by checking the type of STATS and coercing x to a greta array if need be, with something like:

if (inherits(STATS, "greta_array"))
  x <- as_data(x)

make distributions default to tf$contrib$distributions

currently, the log density function and (where applicable) CDF and log CDF functions are all manually defined. It would be less code (and potentially more computationally efficient) if these piggy-backed on the distributions in the contributed TF module.

That would mean adding a tf_distrib field to distributions, which would look like this for normal:

tf_distrib = function (parameters) {
  tf$contrib$distributions$Normal(loc = parameters$mean,
                                  scale = parameters$sd)
}

The tf_log_density_function, tf_cdf_function and tf_log_cdf_function members could then be defined generically in distribution_node as:

tf_log_density_function = function(x, parameters)
  self$tf_distrib(parameters)$log_prob(x)
tf_cdf_function = function(x, parameters)
  self$tf_distrib(parameters)$cdf(x)
tf_log_cdf_function = function(x, parameters)
  self$tf_distrib(parameters)$log_cdf(x)

cdf and log_cdf aren't defined for all distributions, so tf_cdf_function and tf_log_cdf_function should be set to NULL in the relevant distributions, since this is checked by distribution<-.


For distributions not already contributed to TF, tf_distrib would instead return a list:

tf_distrib = function (parameters) {
  log_prob = function (x) {
       <do something with parameters and x>
  }
  cdf = function (x) {
       <do something with parameters and x>
  }
  log_cdf = function (x) {
       <do something with parameters and x>
  }
  list(log_prob = log_prob, cdf = cdf, log_cdf = log_cdf)
}

ideal point model

Hi, very interesting package, congratulations. This is the ideal point model I tried to implement with greta:

library(greta)
library(congressbr)
library(dplyr)  # for the %>% pipe used below

data("senate_nominal_votes")

votes <- senate_nominal_votes %>%
  dplyr::filter(vote_date >= "2007-03-01", vote_date <= "2010-12-01") %>% 
  dplyr::select(bill_id, senator_name, senator_vote)

rollcalls <- vote_to_rollcall(votes = votes$senator_vote, bills = votes$bill_id,
                              legislators = votes$senator_name, 
                              Stan = TRUE, ideal = FALSE)


# create parameters, stating their prior distributions
beta = normal(0, 3)
alpha = normal(0, 5)
xi = normal(0, 1)
sd = flat(range = c(0, 50))

y <- alpha + beta*xi

# define the likelihood of the observed data
likelihood(rollcalls$data) = normal(y, sd)

model <- define_model(alpha, beta, xi, sd)

draws <- mcmc(model, n_samples = 1000)

library(MCMCvis)

MCMCtrace(draws)
MCMCplot(draws, xlim = c(-1, 5))

Which gives me this:

How could I tweak this to get estimates for every x_i? (Individual senators, in this case. sd seems odd too but I'm primarily interested in the ideal points for now, since I'm just playing around with the package). Thanks for any help!

provide project method

Provide a project() function so that, after sampling from (or optimising) a model, users can add extra nodes to the DAG and project the draws onto them:

draws <- mcmc(m, ...)
mu_new <- alpha + X_new %*% beta
abs_beta <- abs(beta)
draws_mu_new <- project(draws, mu_new, abs_beta)

This depends on #20

nicer node printing & summary methods

Printing nodes should say that they are greta nodes and preferably state the node type.

For data nodes, which have fixed values, they should summarise the data as if they were arrays (but with drop = FALSE). For stochastic or op nodes, they should summarise the data shape and print that with ? entries.

This gist defines an array-like class for unknown values, with subsetting methods etc. This can be used to overload head() for nodes. Printing would look like this:

> head(x)
greta node with unknown values

     [,1] [,2] [,3]
[1,]   ?    ?    ? 
[2,]   ?    ?    ? 
[3,]   ?    ?    ? 
[4,]   ?    ?    ? 
[5,]   ?    ?    ? 
[6,]   ?    ?    ? 

A dim() method (just returning x$dim), would also be helpful.

summary() and str() methods for distribution nodes should also return some information about the distributions and possibly parameters.
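
A minimal sketch (plain R, hypothetical function name) of printing a shape with unknown entries, in the spirit of the gist mentioned above:

print_unknowns <- function(dim) {
  x <- array("  ?  ", dim = dim)   # character array of placeholder entries
  print(x, quote = FALSE)
}

print_unknowns(c(4, 3))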

evaluate downstream variables

It might be useful, when defining or debugging a model, to be able to evaluate the values of operation nodes conditional on the nodes on which they depend.
E.g. with the following (silly) model, it might be helpful to work out what values c will take under reasonable values of a and b, or what d is like for different values of c:

x <- seq(0, 1, length.out = 100)
a = normal(0, 1)
b = exponential(3)
c <- log(a * x ^ b)
d <- ilogit(2 * c)

A way of plugging in values for variables and getting results back would be helpful. This would happen before, and separate from, a call to define_model() and mcmc(). The syntax might look something like this:

with(fixed_value(a = 2, b = 1), c)

or

with(fixed_value(c = -2), d)

This could be done by adding another way for nodes to evaluate on a Tensorflow graph, and should be quite simple as the conditioning nodes will always be fixed, and have no need to define densities or gradients etc.
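
For fixed numeric values, the result the first proposed call would return can be sketched in plain R (ilogit written as plogis):

x <- seq(0, 1, length.out = 100)
a <- 2
b <- 1
c_value <- log(a * x ^ b)        # what `c` would evaluate to with a = 2, b = 1
d_value <- plogis(2 * c_value)   # what `d` would evaluate to for those values of c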

This would also provide a (slightly clunky) way of running general numeric calculations on a GPU from R, even if not related to statistical modelling.
