I have been considering a uniform interface for computing multiple statistics at once, while allowing them to share part of the computation.
Consider the following example. We want to compute sum, mean, var, and std from x:
s, m, v, sd = sum(x), mean(x), var(x), std(x)
This clearly wastes a lot of computation (e.g., it actually computes the sum four times, the mean three times, and the variance twice).
A more efficient way would be
s = sum(x)
m = s / length(x)
v = varm(x, m)
sd = sqrt(v)
This is more efficient, but not as concise and convenient.
I am considering the following way:
s, m, v, sd = stats(x, (sum_, mean_, var_, std_))
Internally, it should find an efficient routine that computes them all together. Here, sum_ and mean_ are typed indicators defined as
struct Sum_ end
struct Mean_ end
struct Var_ end
struct Std_ end
const sum_ = Sum_()
const mean_ = Mean_()
const var_ = Var_()
const std_ = Std_()
Different combinations of statistics are different tuple types, and therefore we can leverage Julia's multiple dispatch mechanism to choose the optimal computation paths.
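To make the dispatch idea concrete, here is a minimal sketch of how it could look in current Julia. It is only an illustration under stated assumptions, not a proposed implementation: the helper compute and the generic fallback method of stats are hypothetical names introduced for this example, struct is used in place of the older type keyword, and varm comes from the Statistics standard library.

```julia
using Statistics  # standard library: mean, var, varm, std

# Singleton indicator types, as described above.
struct Sum_ end
struct Mean_ end
struct Var_ end
struct Std_ end
const sum_ = Sum_()
const mean_ = Mean_()
const var_ = Var_()
const std_ = Std_()

# Hypothetical helper: compute a single statistic from its indicator.
compute(::Sum_, x) = sum(x)
compute(::Mean_, x) = mean(x)
compute(::Var_, x) = var(x)
compute(::Std_, x) = std(x)

# Generic fallback (an assumption of this sketch): no sharing,
# each requested statistic is computed independently.
stats(x, inds::Tuple) = map(ind -> compute(ind, x), inds)

# Specialized method for one particular combination. The tuple type
# Tuple{Sum_, Mean_, Var_, Std_} is more specific than Tuple, so
# dispatch selects this method, which reuses the sum and the mean.
function stats(x, ::Tuple{Sum_, Mean_, Var_, Std_})
    s = sum(x)
    m = s / length(x)
    v = varm(x, m)
    return (s, m, v, sqrt(v))
end

x = [1.0, 2.0, 3.0, 4.0]
s, m, v, sd = stats(x, (sum_, mean_, var_, std_))
```

A combination without a specialized method, such as stats(x, (mean_, sum_)), falls back to the generic method, so the interface stays uniform while shared-computation paths can be added one combination at a time.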
This is not urgent, but it would be really nice to have. I am not going to implement this in the near future; I am just opening this thread to collect ideas, suggestions, and opinions.