
ad's Introduction

ad


A package that provides an intuitive API for Automatic Differentiation (AD) in Haskell. Automatic differentiation provides a means to calculate the derivatives of a function while evaluating it. Unlike numerical methods based on running the program with multiple inputs or symbolic approaches, automatic differentiation typically only decreases performance by a small multiplier.

AD employs the fact that any program y = F(x) that computes one or more values does so by composing multiple primitive operations. If the (partial) derivatives of each of those operations are known, then they can be composed to derive the answer for the derivative of the entire program at a point.

This library contains at its core a single implementation that describes how to compute the partial derivatives of a wide array of primitive operations. It then exposes an API that enables a user to safely combine them using standard higher-order functions, just as you would with any other Haskell numerical type.

There are several ways to compose these individual Jacobian matrices. We hide the choice used by the API behind an explicit "Mode" type-class and universal quantification. This prevents users from confusing infinitesimals. If you want to risk infinitesimal confusion in order to get greater flexibility in how you curry, flip and generally combine the differential operators, then the Rank1.* modules are probably your cup of tea.
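
The rank-2 types this produces are visible in the signatures of the top-level combinators. As a rough sketch (the exact form varies slightly between versions), grad looks like this; the universally quantified s is what keeps the perturbations of one differentiation from leaking into another:

grad :: (Traversable f, Num a)
     => (forall s. Reifies s Tape => f (Reverse s a) -> Reverse s a)
     -> f a -> f a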

Features

  • Provides forward- and reverse-mode AD combinators with a common API.
  • Optional type-level "branding" is available to prevent the end user from confusing infinitesimals.
  • Each mode has a separate module full of combinators, with a consistent look and feel.

Examples

You can compute derivatives of functions

Prelude Numeric.AD> diff sin 0 {- cos 0 -}
1.0

Or both the answer and the derivative of a function:

Prelude Numeric.AD> diff' (exp . log) 2
(2.0,1.0)

You can compute the derivative of a function with a constant parameter using auto:

Prelude Numeric.AD> let t = 2.0 :: Double
Prelude Numeric.AD> diff (\ x -> auto t * sin x) 0
2.0

You can use a symbolic numeric type, like the one from simple-reflect, to obtain symbolic derivatives:

Prelude Debug.SimpleReflect Numeric.AD> diff atanh x
recip (1 - x * x) * 1

You can compute gradients for functions that take non-scalar values in the form of a Traversable functor full of AD variables.

Prelude Numeric.AD Debug.SimpleReflect> grad (\[x,y,z] -> x * sin (x + log y)) [x,y,z]
[ 0 + (0 + sin (x + log y) * 1 + 1 * (0 + cos (x + log y) * (0 + x * 1)))
, 0 + (0 + recip y * (0 + 1 * (0 + cos (x + log y) * (0 + x * 1))))
, 0
]

which one can simplify to:

[ sin (x + log y) + cos (x + log y) * x, recip y * cos (x + log y) * x, 0 ]

If you need multiple derivatives, you can calculate them with diffs:

Prelude Numeric.AD> take 10 $ diffs sin 1
[0.8414709848078965,0.5403023058681398,-0.8414709848078965,-0.5403023058681398,0.8414709848078965,0.5403023058681398,-0.8414709848078965,-0.5403023058681398,0.8414709848078965,0.5403023058681398]

or if your function takes multiple inputs, you can use grads, which returns an 'f-branching stream' of derivatives that you can inspect lazily. Somewhat more intuitive answers can be obtained by converting the stream into the polymorphically recursive Jet data type. With that we can look at a single "layer" of the answer at a time:

The answer:

Prelude Numeric.AD> headJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
7.38905609893065

The gradient:

Prelude Numeric.AD> headJet $ tailJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
[14.7781121978613,7.38905609893065]

The Hessian (an n * n matrix of second derivatives):

Prelude Numeric.AD> headJet $ tailJet $ tailJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
[[29.5562243957226,22.16716829679195],[22.16716829679195,7.38905609893065]]

Or even higher-order tensors of derivatives, as a jet:

Prelude Numeric.AD> headJet $ tailJet $ tailJet $ tailJet $ jet $  grads (\[x,y] -> exp (x * y)) [1,2]
[[[59.1124487914452,44.3343365935839],[44.3343365935839,14.7781121978613]],[[44.3343365935839,14.7781121978613],[14.7781121978613,7.38905609893065]]]

Note the redundant values caused by the various symmetries in the tensors. The ad library is careful to compute each distinct derivative only once, lazily, and to share the resulting computation.

Overview

Modules

  • Numeric.AD computes using whichever mode or combination thereof is suitable to each individual combinator. This is the default; it is what you get by simply importing Numeric.AD (see the short example after this list).
  • Numeric.AD.Mode.Forward provides basic forward-mode AD. It is good for computing simple derivatives.
  • Numeric.AD.Mode.Sparse computes a sparse forward-mode AD tower. It is good for higher derivatives or large numbers of outputs.
  • Numeric.AD.Mode.Kahn computes with reverse-mode AD. It is good for computing a few outputs given many inputs.
  • Numeric.AD.Mode.Reverse computes with reverse-mode AD. It is good for computing a few outputs given many inputs, when not using sparks heavily.
  • Numeric.AD.Mode.Tower computes a dense forward-mode AD tower useful for higher derivatives of single input functions.
  • Numeric.AD.Newton provides a number of combinators for root finding using Newton's method with quadratic convergence.
  • Numeric.AD.Halley provides a number of combinators for root finding using Halley's method with cubic convergence.
  • Numeric.AD.Rank1.* provides combinators for AD that are strictly rank-1. This makes it easier to flip and contort them with higher order functions, at the expense of type safety when it comes to infinitesimal confusion.
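
As a quick illustration of picking a mode explicitly (a minimal sketch; check the haddocks of your installed version for the exact signatures):

import qualified Numeric.AD.Mode.Forward as Forward
import qualified Numeric.AD.Mode.Reverse as Reverse

-- Forward mode: one input, so a single sweep gives the derivative.
dExample :: Double
dExample = Forward.diff (\x -> x * x + 3 * x) 2   -- 7.0

-- Reverse mode: many inputs, one output.
gExample :: [Double]
gExample = Reverse.grad (\[x, y] -> x * y + sin x) [1, 2]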

Combinators

While not every mode can provide all operations, the following basic operations are supported, modified as appropriate by the suffixes below (a couple of examples follow this list):

  • grad computes the gradient (vector of partial derivatives at a given point) of a function.
  • jacobian computes the Jacobian matrix of a function at a point.
  • diff computes the derivative of a function at a point.
  • du computes a directional derivative of a function at a point.
  • hessian computes the Hessian matrix (matrix of second partial derivatives) of a function at a point.
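
For example (a small illustration of the combinators above; the values follow from the usual calculus rules):

Prelude Numeric.AD> jacobian (\[x,y] -> [y, x, x*y]) [2.0,1.0]
[[0.0,1.0],[1.0,0.0],[1.0,2.0]]
Prelude Numeric.AD> hessian (\[x,y] -> x*y) [1.0,2.0]
[[0.0,1.0],[1.0,0.0]]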

Combinator Suffixes

The following suffixes alter the meanings of the functions above as follows (a short example follows the list):

  • ' also returns the answer
  • With lets the user supply a function to blend the input with the output
  • F is a version of the base function lifted to return a Traversable (or Functor) result
  • s means the function returns all higher derivatives in a list or f-branching Stream
  • T means the result is transposed with respect to the traditional formulation (usually to avoid paying for transposing back)
  • 0 means that the resulting derivative list is padded with 0s at the end
  • NoEq means that an infinite list of converging values is returned rather than truncating the list when they become constant
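
A couple of the suffixes in action (a brief sketch; both combinators are exported from Numeric.AD):

Prelude Numeric.AD> diff' sin 0 {- ' also returns sin 0 -}
(0.0,1.0)
Prelude Numeric.AD> gradWith (,) (\[x,y] -> x * y) [3.0,2.0] {- With pairs each input with its partial derivative -}
[(3.0,2.0),(2.0,3.0)]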

Contact Information

Contributions and bug reports are welcome!

Please feel free to contact me through github or on the #haskell IRC channel on irc.freenode.net.

-Edward Kmett

ad's People

Contributors

adamse, alang9, augustss, bgamari, borgvall, cartazio, christopherdavidwhite, cscherrer, ekmett, expipiplus1, glguy, julmb, konn, marisakirisame, monoidal, msakai, niuren, nushio3, phadej, phischu, ryanglscott, sofusmortensen, sydow, takano-akio, tommd, yairchu


ad's Issues

Jacobian fails when function is a parameter

Hi!

I'm trying to automate the creation of a Jacobian for any function f with the following type:

f :: Floating a => [a] -> [a] -> [a]

If I define f beforehand, then everything works ok:

jac = jacobian (\params -> expModel params [1,1])

and I can then apply jac to any params.

But if I have:

jac f = jacobian (\params -> f params [1,1])

then it breaks:

<interactive>:42:33: error:
    • Couldn't match expected type ‘f (Numeric.AD.Internal.Reverse.Reverse
                                         s a)
                                    -> [Integer] -> g (Numeric.AD.Internal.Reverse.Reverse s a)’
                  with actual type ‘p’
        because type variable ‘s’ would escape its scope
      This (rigid, skolem) type variable is bound by
        a type expected by the context:
          forall s.
          reflection-2.1.4:Data.Reflection.Reifies
            s Numeric.AD.Internal.Reverse.Tape =>
          f (Numeric.AD.Internal.Reverse.Reverse s a)
          -> g (Numeric.AD.Internal.Reverse.Reverse s a)
        at <interactive>:42:12-47
    • In the expression: f params [1, 1]
      In the first argument of ‘jacobian’, namely
        ‘(\ params -> f params [1, 1])’
      In the expression: jacobian (\ params -> f params [1, 1])
    • Relevant bindings include
        params :: f (Numeric.AD.Internal.Reverse.Reverse s a)
          (bound at <interactive>:42:23)
        f :: p (bound at <interactive>:42:8)
        autjac :: p -> f a -> g (f a) (bound at <interactive>:42:1)

Same thing if I try to parametrize x:

> x = jacobian (\params -> expModel params x) 


<interactive>:55:45: error:
    • Couldn't match expected type ‘[Numeric.AD.Internal.Reverse.Reverse
                                       s a]’
                  with actual type ‘p’
        because type variable ‘s’ would escape its scope
      This (rigid, skolem) type variable is bound by
        a type expected by the context:
          forall s.
          reflection-2.1.4:Data.Reflection.Reifies
            s Numeric.AD.Internal.Reverse.Tape =>
          [Numeric.AD.Internal.Reverse.Reverse s a]
          -> [Numeric.AD.Internal.Reverse.Reverse s a]
        at <interactive>:55:8-46
    • In the second argument of ‘expModel’, namely ‘x’
      In the expression: expModel params x
      In the first argument of ‘jacobian’, namely
        ‘(\ params -> expModel params x)’
    • Relevant bindings include
        params :: [Numeric.AD.Internal.Reverse.Reverse s a]
          (bound at <interactive>:55:19)
        x :: p (bound at <interactive>:55:4)
        g2 :: p -> [a] -> [[a]] (bound at <interactive>:55:1)

What am I doing wrong?
Thanks for the help!
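
For reference, the usual way out of this kind of "type variable s would escape its scope" error (a hedged sketch, not part of the original report) is to keep the function argument polymorphic by giving the wrapper an explicit rank-2 type, so that jacobian can instantiate it at its internal Reverse type:

{-# LANGUAGE RankNTypes #-}

import Numeric.AD

-- 'jac' keeps its argument polymorphic in the numeric type; a concrete
-- model such as expModel can then be passed in unchanged.
jac :: (forall b. Floating b => [b] -> [b] -> [b]) -> [Double] -> [[Double]]
jac f = jacobian (\params -> f params [1, 1])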

How can I add custom gradients?

If I want to give a custom or known gradient for a function, how can I do that in this library? (I don't want to autodifferentiate through this function.) I am using the grad function.

If the library doesn't provide this feature, is there some way I can easily implement this functionality myself, perhaps by changing the definitions of leaf nodes or by editing the dual numbers that presumably carry the numerical gradients?

Here's a concrete example of what I mean:

Say I have some function I want to take the gradient of, say f(x, y) = x^2 + 3 * g(x, y)^2. Then say that g(x, y) is a function whose definition is complicated and involves lots of Haskell code, but whose gradient I've already calculated analytically and is quite simple. Thus, when I take grad f and evaluate it at a point (x, y), I'd like to just plug in my custom gradient for g, instead of autodiffing through it: something like my_nice_grad_of_g (x, y).

I see other autodiff libraries do provide this feature, for example Stan and Tensorflow both allow users to define gradients of a function.

Thanks!
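
One direction that is sometimes suggested (a sketch only, built on the Jacobian class from Numeric.AD.Jacobian; treat the details as an assumption rather than a documented recipe) is to define the function via lift1, supplying the primal computation and the hand-derived derivative, much as the erf issue below does:

{-# LANGUAGE FlexibleContexts #-}

import Numeric.AD.Mode (Mode(..))
import Numeric.AD.Jacobian (Jacobian(..))

-- Hypothetical names: gPrimal stands in for the "complicated" function, and
-- (\x -> 2 * x) for its analytically known derivative.
gPrimal :: Num a => a -> a
gPrimal x = x * x + 1

gCustom :: (Jacobian t, Num (Scalar t), Num (D t)) => t -> t
gCustom = lift1 gPrimal (\x -> 2 * x)   -- the second argument is dg/dx

A function built this way can then be used inside anything you pass to grad; the supplied derivative is used instead of differentiating through gPrimal's body.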

:t diff . diff

Consider

diff2 = diff . diff

Fill in the type:

diff2 :: ???

This would be right, but we'd need to use a fiddled-type variant of diff in the definition

diff2 :: Num a =>
      (forall (s1 :: * -> *).
       (forall (s2 :: * -> *).
        (Mode s1, Mode s2) => AD s1 (AD s2 a) -> AD s1 (AD s2 a)))
      -> a -> a

The "issue" is that these types drive you crazy making it impractical to use this library when defining even seemingly-simple differential operators.

Add `erf` support and perhaps an example of how to bundle a function with its derivatives.

From an email:

An example of how to "hot wire" a simple scalar numeric function (in
this case defined externally) with its derivative for use in AD would be
a nice example in the AD package.  So making this simple/clear seems
like it would be useful documentation.

Anyway, I'm leaving out the magic below.
Might need to use template magic, since that's how things like `atan` work.
That makes it double magical of course.
module Numeric.AD.Erf () where

import Data.Number.Erf
import Numeric.AD
import Numeric.AD.Internal

-- derf :: Erf a => a -> a -- or just Floating
derf x = (2 / sqrt pi)*exp(- x*x)

-- dinverf :: (InvErf a, Erf a) => a -> a  -- or just Floating
dinverf z x = recip $ derf x

MAGIC GOES HERE
 (Erf a, Mode t) => Erf (t a)
   MORE MAGIC
 erf1 = lift1 erf $ derf

MORE OF THE SAME MAGIC
 (InvErf a, Mode t) => InvErf (t a) -- maybe also add Erf a on the lhs
 inverf1 = lift1_ inverf $ dinverf

Bug in conjugateGradientDescent

Using vector, you can write the Rosenbrock function as

f xs = (1 - x) ** 2 + 100 * (y - x**2)**2
  where
  x = xs ! 0
  y = xs ! 1

(The global min is at (1,1))

Then

gradientDescent f $ fromList [0,0]
  = [fromList [0.2,0.0]
  , fromList [0.19,5.000000000000002e-2]
  , fromList [0.2067275,3.2624999999999994e-2]
  , fromList [0.21141771788025895,4.526407407031251e-2]
  , ...
  ]

but

conjugateGradientDescent f $ fromList [0,0]
  = [fromList [0.0,0.0],fromList [0.16126202313958898,0.0]]

Missing instances.h

Compiling ad-4.0:

Preprocessing library ad-4.0...

src/Numeric/AD/Internal/Dense.hs:187:0:
fatal error: instances.h: No such file or directory

Multi-variate root finder?

Are there any plans for a multi-variate Newton-Raphson or Halley root finder? If not, I might contribute one.

Nesting Benchmark

My Haskell type fu is punching some air; anyone care to have a look?

This is for a cross-language benchmark. Each language/AD system has a "common" file shared by all the benchmarks. Each benchmark is rewritten in each language/system with an attempt to preserve logic and names across implementations. (Hence, e.g., the gradient = grad line.)

There is an outer level of AD (gradient descent) and an inner level (Euler's method).

Ideally, both inner and outer could be set to use Forward / Reverse mode independently, to yield four benchmarks (FF, FR, RF, RR). Yes, in this case things are scalar or 2D meaning reverse mode won't be faster. It's a benchmark.

Would be okay to sprinkle in minor magic for speed, like strictness annotations.

The style used throughout (across language) is to avoid type declarations except where necessary, i.e., to allow automatic type inference to do its thing.

File Common_Haskell_AD.hs

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE FlexibleContexts #-}

module Common_Haskell_AD where

import Numeric.AD
import Numeric.AD.Internal.Forward (Forward)
import Data.Reflection (Reifies)
import Numeric.AD.Internal.Reverse (Tape, Reverse, primal)

lift x = auto x

derivative
  :: Num a =>
     (forall s.
      AD s (Numeric.AD.Internal.Forward.Forward a)
      -> AD s (Numeric.AD.Internal.Forward.Forward a))
     -> a -> a
derivative = diff

sqr x = x * x

vplus :: Num a => [a] -> [a] -> [a]
vplus = zipWith (+)

vminus :: Num a => [a] -> [a] -> [a]
vminus = zipWith (-)

ktimesv k = map (k *)

magnitude_squared x = foldl (+) 0 (map sqr x)

magnitude :: Floating a => [a] -> a
magnitude = sqrt . magnitude_squared

distance_squared u v = magnitude_squared (vminus u v)

distance u v = sqrt (distance_squared u v)

-- replace_ith (x : xs) 0 xi = (xi : xs)
-- replace_ith (x : xs) i xi = (x : (replace_ith xs (i - 1) xi))

replace_ith xs i xi = take i xs ++ [xi] ++ drop (i+1) xs

gradient
  :: Num a =>
     (forall s.
      Data.Reflection.Reifies s Numeric.AD.Internal.Reverse.Tape =>
      [Numeric.AD.Internal.Reverse.Reverse s a]
      -> Numeric.AD.Internal.Reverse.Reverse s a)
     -> [a] -> [a]

gradient f x = grad f x

-- lower_fs :: (Mode b, Num a) => ([b] -> Reverse s a) -> [Scalar b] -> a
-- lower_fs :: Num a => ([Reverse s a] -> Reverse s a) -> [a] -> a

-- lower_fs f xs = primal (f (map auto xs))

lower_fs
  :: Num a => (forall s. Data.Reflection.Reifies s Tape =>
               [Reverse s a] -> Reverse s a) -> [a] -> a

lower_fs f xs = fst (grad' f xs)

multivariate_argmin, multivariate_argmax ::
  (Ord a, Floating a, Fractional a) =>
  (forall s. Data.Reflection.Reifies s Tape =>
    [Reverse s a] -> Reverse s a) -> [a] -> [a]

multivariate_max ::
  (Ord a, Floating a, Fractional a) =>
  (forall s. Data.Reflection.Reifies s Tape =>
    [Reverse s a] -> Reverse s a) -> [a] -> a

multivariate_argmin f x =
    let 
        loop x fx gx eta i =
            if (magnitude gx) <= 1e-5
            then x
            else if i == 10
                 then loop x fx gx (2 * eta) 0
                 else let x_prime = vminus x (ktimesv eta gx)
                      in if (distance x x_prime) <= 1e-5
                         then x
                         else let fx_prime = lower_fs f x_prime
                              in if fx_prime < fx
                                 then loop x_prime fx_prime (gradient f x_prime) eta       (i + 1)
                                 else loop x       fx       gx                   (eta / 2) 0
    in loop x (lower_fs f x) (gradient f x) 1e-5 0

multivariate_argmax f x = multivariate_argmin (\ x -> - (f x)) x

multivariate_max f x = (lower_fs f) (multivariate_argmax f x)

File particle-XX-haskell-ad.hs

{-# LANGUAGE AllowAmbiguousTypes #-}

import Common_Haskell_AD

naive_euler w =
    let charges = [[10, 10 - lift w], [10, 0]]
        x_initial = [0, 8]
        xdot_initial = [0.75, 0]
        delta_t = 1e-1
        p x = sum (map (recip . distance x) charges)
        loop x @ [_, x_1] xdot @ [_, xdot_1] =
            let xddot = ktimesv (-1) (gradient p x)
                x_new @ [_, x_new_1] = vplus x (ktimesv delta_t xdot)
            in if x_new_1 > 0
               then loop x_new (vplus xdot (ktimesv delta_t xddot))
               else let delta_t_f = - x_1 / xdot_1
                        [x_t_f_0, _] = vplus x (ktimesv delta_t_f xdot)
                    in sqr x_t_f_0
    in loop x_initial xdot_initial

run = let w0 = 0
          [w_star] = multivariate_argmin (\ [w] -> naive_euler w) [w0]
      in w_star

main = print run

derivative-taking or derivative-using functions are not first class

Part of the goal of Automatic Differentiation is to make taking derivatives easy. In a functional language this means that functions for taking derivatives, or functions that use them, should be first class.

In another issue, I wrote something like this:

*Numeric.AD Data.Function> take 20 $ zipWith (-) (diffs (**(1/2)) 4) (diffs (**0.5) 4)
[0.0,-0.4571067811865476,NaN,5.7079671162651446e-2,NaN,6.408282648826724e-3,NaN,9.91344451904307e-3,NaN,3.0205026268959045e-2,NaN,0.15244099195115268,NaN,1.1504531111313554,NaN,12.133685156463514,NaN,170.44035868219845,NaN,3075.9158480928004]
*Numeric.AD Data.Function> take 20 $ zipWith (-) (diffs sqrt 4) (diffs (**(1/2)) 4)
[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-9.094947017729282e-13]

As a functional programmer, the redundancy bothers me. I would really want to write this instead:

*Numeric.AD Data.Function> take 20 $ on (zipWith (-)) (flip diffs 4) sqrt (**(1/2))

<interactive>:46:43:
Couldn't match expected type `forall s.
                  Numeric.AD.Internal.Tower.Tower a0 s
                  -> Numeric.AD.Internal.Tower.Tower a0 s'
        with actual type `a1 -> a1'
In the third argument of `on', namely `sqrt'
In the second argument of `($)', namely
  `on (zipWith (-)) (flip diffs 4) sqrt (** (1 / 2))'
In the expression:
  take 20 $ on (zipWith (-)) (flip diffs 4) sqrt (** (1 / 2))

<interactive>:46:49:
Couldn't match expected type `forall s.
                  Numeric.AD.Internal.Tower.Tower a0 s
                  -> Numeric.AD.Internal.Tower.Tower a0 s'
        with actual type `a2 -> a2'
In the fourth argument of `on', namely `(** (1 / 2))'
In the second argument of `($)', namely
  `on (zipWith (-)) (flip diffs 4) sqrt (** (1 / 2))'
In the expression:
  take 20 $ on (zipWith (-)) (flip diffs 4) sqrt (** (1 / 2))

Inlining the on does not solve the problem:

*Numeric.AD Data.Function> take 20 $ zipWith (-) (flip diffs 4 sqrt) (flip diffs 4 (**(1/2)))

<interactive>:47:37:
Couldn't match expected type `forall s.
                  Numeric.AD.Internal.Tower.Tower a0 s
                  -> Numeric.AD.Internal.Tower.Tower a0 s'
        with actual type `a1 -> a1'
In the third argument of `flip', namely `sqrt'
In the second argument of `zipWith', namely `(flip diffs 4 sqrt)'
In the second argument of `($)', namely
  `zipWith (-) (flip diffs 4 sqrt) (flip diffs 4 (** (1 / 2)))'

<interactive>:47:58:
Couldn't match expected type `forall s.
                  Numeric.AD.Internal.Tower.Tower a0 s
                  -> Numeric.AD.Internal.Tower.Tower a0 s'
        with actual type `a2 -> a2'
In the third argument of `flip', namely `(** (1 / 2))'
In the third argument of `zipWith', namely
  `(flip diffs 4 (** (1 / 2)))'
In the second argument of `($)', namely
  `zipWith (-) (flip diffs 4 sqrt) (flip diffs 4 (** (1 / 2)))'

Further inlining the flip brings us back to the original expression. The algebraic rules which we would like to rely on (here, the ability to abstract) do not hold. In this sense, diffs and any other function which takes derivatives (even internally) are not "first-class functions" in that they do not obey the algebra we expect functions to.

I understand that this is due to the types involved. It seems unlikely that this can be fully addressed in Haskell, at least without augmentation of the type system. But I wanted to record the issue where users can find it, and an issue here seems like the best place.

Double specific implementation of Newton?

When using Newton.findZero I'm getting results in about 100ns. If I take the source code and change out diff' for Forward.Double.diff' I get results in 6ns - a substantial difference. Is there any thought about either having a .Double module of this, or maybe a parameterized version that takes diff' as an argument?
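
For what it's worth, here is a minimal sketch of what such a Double-specialised variant might look like (assuming Numeric.AD.Mode.Forward.Double and its ForwardDouble type; the step is the ordinary Newton update x - f x / f' x):

import Numeric.AD.Mode.Forward.Double (ForwardDouble, diff')

-- Hypothetical monomorphic analogue of Newton.findZero.
findZeroDouble :: (ForwardDouble -> ForwardDouble) -> Double -> [Double]
findZeroDouble f = iterate step
  where
    step x = let (y, y') = diff' f x in x - y / y'

For example, findZeroDouble (\x -> x * x - 2) 1 converges towards sqrt 2.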

conjugateGradient used to give the wrong answer, now it gives no answer...

The below gave the wrong numeric result in 3.4. Now in 4.0 it gives a type error, as shown.

$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :m + Numeric.AD
Prelude Numeric.AD> conjugateGradientDescent (\[x,y]->x^2+y^2/2) [10,10]

<interactive>:3:42:
    Could not deduce (Fractional t) arising from a use of `/'
    from the context (Numeric.AD.Internal.Classes.Mode t,
                      Double ~ Numeric.AD.Internal.Classes.Scalar t,
                      Num t)
      bound by a type expected by the context:
                 (Numeric.AD.Internal.Classes.Mode t,
                  Double ~ Numeric.AD.Internal.Classes.Scalar t, Num t) =>
                 [t] -> t
      at <interactive>:3:1-52
    Possible fix:
      add (Fractional t) to the context of
        a type expected by the context:
          (Numeric.AD.Internal.Classes.Mode t,
           Double ~ Numeric.AD.Internal.Classes.Scalar t, Num t) =>
          [t] -> t
    In the second argument of `(+)', namely `y ^ 2 / 2'
    In the expression: x ^ 2 + y ^ 2 / 2
    In the first argument of `conjugateGradientDescent', namely
      `(\ [x, y] -> x ^ 2 + y ^ 2 / 2)'

Inefficiency caused by current Mode scheme

The current entangling of the concepts of Mode and infinitesimal means that every user of the AD library has to dispatch virtually everything they want to do with an AD variable via a dictionary. While this provides unusually strong guarantees (only the properties that AD s a inherits from Mode s and the underlying numeric type a can be used), it means that the resulting code can be considerably slower than would otherwise be possible.

We could adjust AD to have separate mode and infinitesimal attributes, and adjust UU, UF, FU, FF to have MUU MUF MFU MFF variants that accept a mode, so that most of the combinators do the right thing. The cost of this is that we need to be more careful about all the extraneous instances that our Modes have picked up, and which properties we allow AD to inherit.

To obtain something approximating the current level of safety, we'll have to remove a lot of current instances and hide the AD data constructor.

Matrix algebra support?

Does this package support matrix algebra operations? For instance, does this support HMatrix, Repa or Accelerate or some other way of getting gradients for functions involving matrices? I'm looking for a Haskell alternative for Python autograd package.

zero, NaN, zero?

What's that big stripe of NaNs, and why does it go back to zeros?

*Main Numeric.AD> take 1000 $ diffs (**3) (5::Double)
[125.0,75.0,30.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,
NaN,NaN,NaN,NaN,NaN,NaN,NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]

Feasibility finder often loops on Newton constrainedDescent

{-# LANGUAGE BlockArguments #-}

import Data.Functor.Identity
import Numeric.AD (auto)
import Numeric.AD.Newton

ex :: [ (Double , Identity Double) ]
ex = constrainedDescent (const (auto 1)) cs (Identity 18.5)
  where
    cs = [ CC \(Identity x) -> auto 0.0 - x
         , CC \(Identity x) -> x - auto 10.0
         ]

The above is simply trying to find a feasible value between 0 and 10, with a starting guess of 18.5.

It works for starting values below 18.4, but as soon as I try anything 18.5 or above, it just loops indefinitely.

I tried digging through the code some, and if I change that nrSteps = [take 20 | _ <- [1..length tValues]] ++ [id] line in constrainedConvex' to take 60 steps instead of 20, I have some more success, but I can still get failures just by increasing my starting guess a bit further.

I'm not entirely familiar with what's going on, but it looks like the search attempts often converge on infeasible solutions. I think it's because the earlier attempts try smaller steps, but bail out if a solution is not found quickly, while later attempts are taking such large steps that they converge on something outside the bounds entirely. Perhaps it would make sense to re-try smaller steps if the big steps converge, starting from that point?

Test suite floating point accuracy failure when building with non-glibc-libc

Hey Ed,

for the advancement of static linking in Haskell (https://github.com/nh2/static-haskell-nix and NixOS/nixpkgs#43795) I'm building all Stackage executables statically, which means building against musl libc instead of glibc.

I noticed the following test failures in ad:

/tmp/nix-build-ad-4.3.5.drv-0/ad-4.3.5/src/Numeric/AD.hs:201: failure in expression `hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]'
expected: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
 but got: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568138,-2.471726672004819],[-2.471726672004819,1.1312043837568138]]]

/tmp/nix-build-ad-4.3.5.drv-0/ad-4.3.5/src/Numeric/AD/Mode/Kahn.hs:194: failure in expression `hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]'
expected: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
 but got: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568138,-2.471726672004819],[-2.471726672004819,1.1312043837568138]]]

/tmp/nix-build-ad-4.3.5.drv-0/ad-4.3.5/src/Numeric/AD/Rank1/Kahn.hs:208: failure in expression `hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]'
expected: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
 but got: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568138,-2.471726672004819],[-2.471726672004819,1.1312043837568138]]]

/tmp/nix-build-ad-4.3.5.drv-0/ad-4.3.5/src/Numeric/AD/Mode/Reverse.hs:205: failure in expression `hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]'
expected: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
 but got: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568138,-2.471726672004819],[-2.471726672004819,1.1312043837568138]]]

Examples: 105  Tried: 105  Errors: 0  Failures: 4

I suspect that these numbers rely on implementation details of libm as provided by the libc, for example how sin and cos are implemented (there was, for instance, a recent change in glibc).

Possibly also relevant is fused-multiply-add (FMA), which means the results can differ depending on the processor.

Would it be possible to implement these checks against some epsilon instead of exact values so that they will pass no matter in what environment they are run?

Unexpected difference in behavior of `grad` for different modes?

Testing out the gradient of the simple function x**y along the x axis at y /= 0, I get different answers depending on the mode:

> import qualified Numeric.AD.Mode.Sparse as S
> S.grad (\[x,y] -> x**y) [0,2]
[NaN, NaN]
> import qualified Numeric.AD.Mode.Reverse as R
> R.grad (\[x,y] -> x**y) [0,2]
[NaN, NaN]
> import qualified Numeric.AD.Mode.Kahn as K
> K.grad (\[x,y] -> x**y) [0,2]
[NaN, 0.0]

Only the forward mode yields the mathematically "correct" answer:

> import qualified Numeric.AD.Mode.Forward as F
> F.grad (\[x,y] -> x**y) [0,2]
[0.0, NaN]

Which of the three answers is the expected behavior? And, are all three supposed to give different answers for something like this?
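
For reference, the calculus answer at (x, y) = (0, 2): the partial derivative with respect to x is y * x**(y-1), which is 0, and the partial with respect to y is x**y * log x, the indeterminate form 0 * log 0, which shows up as NaN in floating point. The differences between modes most likely come down to how the chain rule for ** is factored; an algebraically equivalent form such as x**y * (y / x) evaluates to 0 * Infinity = NaN at x = 0 even though the limit is 0.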

efficiency of tower mode

$ ghci
> :m + Numeric.FAD Numeric.AD.Mode.Tower
> Numeric.FAD.diffs exp 1

[2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045...scrolls endlessly at lightning speed until interrupted...]

> Numeric.AD.Mode.Tower.diffs exp 1

[2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045, 2.718281828459045...slower and slower for three lines or so, then freezes, machine starts to thrash, takes a few min to respond to C-c...]

4.2 compilation failure

Preprocessing library ad-4.2...

src/Numeric/AD/Halley.hs:32:18:
    Could not find module `Numeric.AD.Rank1.Halley'
    Use -v to see a list of the files searched for.
Failed to install ad-4.2
cabal: Error: some packages failed to install:
ad-4.2 failed during the building phase. The exception was:
ExitFailure 1

multiple applications of grad giving incorrect result

I see this problem in multidimensional functions but here is a simple one dimensional case.

This one-liner should give [8.0]:

grad (sum . map (\x -> x*x) . grad (\[x] -> x**2)) [1]

Instead I get [48.0]. Using diff I get the right answer.

diff ((sum . map (\x -> x*x) . grad (\[x] -> x**2)) . (:[])) 1

Any idea what is going on here?

Determining when Jacobians are sparse

To what extent is it possible for automatic differentiation to tell us when the partial derivative of a function with respect to one of its inputs is everywhere zero?

The general approach works by evaluating derivatives only at finitely many explicitly specified points.

But it seems that this analysis, or at least a very good conservative approximation of it, is syntax-directed on the grammar of math expressions that Num and Floating and friends let us construct, so maybe there is hope?

Forward : Wengert :: Speelpenning : Reverse

Even though "Wengert list" is common terminology for a topological sort of the flow graph, I think Wengert Mode has the wrong association here. Robert Edwin Wengert invented (or discovered, or rediscovered if you read Russian) forward mode. Bert Speelpenning invented (or discovered or rediscovered if you read Werbos or Bryson & Ho or Pontryagin) reverse mode. And I suppose Babbage or Lovelace probably invented serialization of the dynamic trace of an instruction stream... Point is, naming a Reverse Mode subtype "Wengert" mode is really confusing. I would suggest PontryaginDiscretized since the list at issue is precisely the discrete version of his continuous differential equation.

Two associated pattern synonyms (KnownConstant / KnownZero) for Mode

Worth noting, it seems like we can replace isKnownConstant / auto and isKnownZero / zero with pattern synonyms once we get associated pattern synonyms (and we can also default type Scalar t = t)

class (Num t, Num (Scalar t)) => Mode t where
  type Scalar t
  type Scalar t = t

  pattern KnownConstant :: Scalar s -> s
  pattern KnownZero     :: s
  pattern KnownZero     = KnownConstant 0

  (*^) :: Scalar t -> t -> t
  a *^ b = KnownConstant a * b

  (^*) :: t -> Scalar t -> t
  a ^* b = a * KnownConstant b

  (^/) :: Fractional (Scalar t) => t -> Scalar t -> t
  a ^/ b = a ^* recip b

And the old

auto :: Mode s => Scalar s -> s
auto = KnownConstant

zero :: Mode s => s
zero = auto 0

isKnownConstant :: Mode s => s -> Bool
isKnownConstant KnownConstant{} = True
isKnownConstant _               = False

isKnownZero :: Mode s => s -> Bool
isKnownZero KnownZero = True
isKnownZero _         = False

while also giving us

getKnownConstant :: Mode s => s -> Maybe (Scalar s)
getKnownConstant (KnownConstant n) = Just n
getKnownConstant _                 = Nothing

These could be defined in terms of Prisms but that rules out deriving.

weird stuff with partial function application to lists?

Hi, I'm new to Haskell, but I've asked around on some of the freenode communities and nobody there can help me. This may not be a bug, but simply my inexperience.

f :: Num a => [a] -> a
f = \[u, v] -> v - (u * u * u)

grad f [1, 1] works just fine in GHCi

map (\x -> grad x) [f] gives the following error:
Couldn't match type ‘Integer’
with ‘Numeric.AD.Internal.Reverse.Reverse s a’
Expected type: [Numeric.AD.Internal.Reverse.Reverse s a]
-> Numeric.AD.Internal.Reverse.Reverse s a
Actual type: [Integer] -> Integer
• In the first argument of ‘grad’, namely ‘f’
In the expression: grad f
In the first argument of ‘map’, namely ‘(\ f -> grad f)’
• Relevant bindings include
it :: [[a] -> [a]] (bound at :6:1)

I've tried many different ways of partially applying grad to a list of functions but cannot get it to typecheck.
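
For the record, a common workaround (a sketch, under the assumption that the goal is a list of functions to take gradients of): once a function is stored in a plain list it becomes monomorphic, so grad can no longer instantiate it at its internal Reverse type. Wrapping the rank-2 type in a newtype keeps the polymorphism around:

{-# LANGUAGE RankNTypes #-}

import Numeric.AD

-- Hypothetical wrapper for a function we still intend to differentiate.
newtype Fn = Fn (forall a. Num a => [a] -> a)

gradients :: Num a => [Fn] -> [[a] -> [a]]
gradients = map (\(Fn g) -> grad g)

-- e.g. map ($ [1, 1]) (gradients [Fn (\[u, v] -> v - u * u * u)])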

Vectored AD

It would be nice to invert our current control mechanism so that instead of having

grad :: (Traversable f, Num a) => (forall s. Mode s => f (AD s a) -> AD s a) -> f a -> f a

we could have (also factoring in issue 1):

type RAD = AD Reverse

grad :: (Traversable f, Num a) => (forall s. RAD f s a -> RAD Scalar s a) -> f a -> f a

This can then be used to allow us to unify instance heads for the various UU, UF, FU, FF arguments, allow vgrad to take multiple 'containers' worth of AD variables, and it can act as a path to support Taylor Series-of-Matrices rather than Matrices-of-Taylor Series for better ability to leverage traditional BLAS packages, etc.

(**) higher-order derivatives

This is against the current master branch tip.

> take 10 $ diffs (\x->x^1) 1
[1,1]
> take 10 $ diffs (\x->x**1) 1
[1.0,1.0,Infinity,NaN,NaN,NaN,NaN,NaN,NaN,NaN]

Glork.

Gradient descent fails to converge (to the right result).

Consider the function,

let f [m,b] = (6020.272727*m+b-23680.30303)^(2::Int) + (7254.196429*m+b-28807.61607)^(2::Int) + (6738.575342*m+b-26582.76712)^(2::Int) + (5464.658537*m+b-23894.34756)^(2::Int)

It's a smooth convex function and gradient descent should converge to [m,b] = [2.911525576,7196.512447]. See Wolfram Alpha.

Instead,

(!! 10) $ gradientDescent f [0,0]

[6.503543385615585,1.0522541316691529e-3]

(!! 100) $ gradientDescent f [0,0]

[4.074543700217947,1.0251230667415617e-3]

(!! 1000) $ gradientDescent f [0,0]

[4.028579379271611,4.621971888645496e-3]

(!! 10000) $ gradientDescent f [0,0]

[4.02857388945333,4.100383031651195e-2]

'cabal install AD' fails when installing semigroupoids

When running 'cabal install AD' on ubuntu 14.04.2 with cabal 1.16.0.2:

src/Data/Semigroup/Foldable/Class.hs:103:10:
    No instance for (Bifoldable (Tagged *))
    arising from the superclasses of an instance declaration
    Possible fix:
    add an instance declaration for (Bifoldable (Tagged *))
    In the instance declaration for `Bifoldable1 (Tagged *)'
Failed to install semigroupoids-5.0.0.2
cabal: Error: some packages failed to install:
ad-4.2.2 depends on semigroupoids-5.0.0.2 which failed to install.
free-4.12.1 depends on semigroupoids-5.0.0.2 which failed to install.
semigroupoids-5.0.0.2 failed during the building phase. The exception was:
ExitFailure 1

miscalculating grads

grads appears to miscalculate derivatives of degree higher than 2. For example,

> let f = \[x] -> x^3
> let cs = [1]
> headJet $ jet (grads f cs)
1
> headJet $ tailJet $ jet (grads f cs)
[3]
> headJet $ tailJet $ tailJet $ jet (grads f cs)
[[6]]
> headJet $ tailJet $ tailJet $ tailJet $ jet (grads f cs)
[[[2]]]
> headJet $ tailJet $ tailJet $ tailJet $ tailJet $ jet (grads f cs)
[[[[0]]]]

Compare diffs.

> diffs (\x -> x^3) 1
[1,3,6,6]

Stack overflows on some large problems

From Lennart Augustsson,

Hi,

I'm trying to use your AD package for some fairly serious computations.
I'm using grad' to compute the partial derivatives for a function of about 800 variables.
The computation itself take 10-20s.
A problem I frequently experience is that I get a stack overflow.
I think it happens while the partial derivatives are evaluated rather than the regular computation, because using Double rather than AD numbers behaves well.

It's not a serious problem, but maybe you want to take a look.

And no, I can't send you the code. It's computing PV and delta risk for some actual products.

Thanks for a great package!

du failing on simple function

The directional derivative operator du doesn't seem to handle the simplest case.

$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.

Prelude> :m + Numeric.AD

Prelude Numeric.AD> let f [x0,x1,x2] = [x0*x1^2*x2^3, x0^3*x1^2*x2]

Prelude Numeric.AD> :t f
f :: Num t => [t] -> [t]

Prelude Numeric.AD> :t du f

<interactive>:1:4:
    Couldn't match type `[AD
                            s (Numeric.AD.Internal.Forward.Forward a0)]'
                  with `AD s (Numeric.AD.Internal.Forward.Forward a0)'
    Expected type: [AD s (Numeric.AD.Internal.Forward.Forward a0)]
                   -> AD s (Numeric.AD.Internal.Forward.Forward a0)
      Actual type: [AD s (Numeric.AD.Internal.Forward.Forward a0)]
                   -> [AD s (Numeric.AD.Internal.Forward.Forward a0)]
    In the first argument of `du', namely `f'
    In the expression: du f
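
As the combinator-suffix list above notes, the F suffix is the variant for functions that return a Functor full of results, so duF is the combinator that fits a [t] -> [t] function here (a hedged illustration; each pair in the second argument is a (point, direction) component):

Prelude Numeric.AD> duF (\[x0,x1,x2] -> [x0*x1^2*x2^3, x0^3*x1^2*x2]) [(1,1),(1,0),(1,0)] :: [Double]
[1.0,3.0]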

Using AD with exceptions

I tried using AD with a function that has a Typeable constraint and ran into Could not deduce (Typeable s) [..] from: Reifies s Tape [..].

This seems very similar to the problem described in ekmett/reflection#14.
Am I correct in assuming that this comes from the usage of reifyTape in the AD internals and could be fixed by using something closer to the reifyTypeable like in the other issue?

Generate Specialized Code for known objective functions (recording prior dev discussion on this topic)

One long standing idea that's been discussed is generating more specialized / optimized code when the expression to be evaluated is known.

An example could be generating specialized gradient code for a known objective function, etc.

There are three main strategies for doing so:

a) Template Haskell, which would likely be a bit tricky to write and make easy to use.
b) A GHC compiler plugin, which, if written well, would work for normal AD code as is, or at least would do so some fraction of the time.
c) Alternatively, some sort of run-time code generation scheme via LLVM might be possible.

AD with Accelerate types

It would be nice to be able to use AD with Accelerate types (like Exp Double or Exp (Complex Double)).

I think the big issues are:

  1. A lot of typeclass methods on Accelerate types result in bottoms, e.g.
instance P.Eq (Exp a) where
  (==) = preludeError "Eq.==" "(==)"
  (/=) = preludeError "Eq./=" "(/=)"
instance Ord a => P.Ord (Exp a) where
  compare = error "Prelude.Ord.compare applied to EDSL types"
  (<)     = preludeError "Ord.<"  "(<)"
  (<=)    = preludeError "Ord.<=" "(<=)"
  (>)     = preludeError "Ord.>"  "(>)"
  (>=)    = preludeError "Ord.>=" "(>=)"
  min     = min
  max     = max
instance P.Enum (Exp a) where
  toEnum   = preludeError "toEnum"
  fromEnum = preludeError "fromEnum"
instance (Num a, Ord a) => P.Real (Exp a) where
  toRational = P.error "Prelude.toRational not supported for Accelerate types"
  2. Accelerate appears to be very slow at compiling code with recursive functions. It seems better to iterate with Data.Array.Accelerate.while or Data.Array.Accelerate.iterate.

nesting: diff (\x -> diff (*x) 1) 1

Was showing off nesting, simplest example I could think of, oops.

tldr: ad 4.3.2.1, ghc 8.0.1, diff (\x -> diff (*x) 1) 1 gets a type error.

$ cabal update
Downloading the latest package list from hackage.haskell.org
$ cabal install --user ad
Resolving dependencies...
Downloading base-orphans-0.5.4...
...
Installed free-4.12.4
Downloading ad-4.3.2.1...
Configuring ad-4.3.2.1...
Building ad-4.3.2.1...
Installed ad-4.3.2.1
$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
Prelude> :m + Numeric.AD
Prelude Numeric.AD> diff (*2) 1
2
Prelude Numeric.AD> diff sin 0
1.0
Prelude Numeric.AD> diff (\x -> diff (*x) 1) 1

<interactive>:6:20: error:
    • Occurs check: cannot construct the infinite type:
        a ~ AD s (Numeric.AD.Internal.Forward.Forward a)
      Expected type: AD
                       s1
                       (Numeric.AD.Internal.Forward.Forward
                          (AD s (Numeric.AD.Internal.Forward.Forward a)))
        Actual type: AD s (Numeric.AD.Internal.Forward.Forward a)
    • In the second argument of ‘(*)’, namely ‘x’
      In the first argument of ‘diff’, namely ‘(* x)’
      In the expression: diff (* x) 1
    • Relevant bindings include
        x :: AD s (Numeric.AD.Internal.Forward.Forward a)
          (bound at <interactive>:6:8)
        it :: a (bound at <interactive>:6:1)

This happens even without nesting.

> diff (pi*) 1
3.141592653589793
> (\x -> diff (x*) 1) pi
error: ...
> let x=pi in diff (x*) 1
3.141592653589793

and yes, I know auto makes it okay

Prelude Numeric.AD> diff (\x -> diff (auto x*) 2) 3
1
Prelude Numeric.AD> (\x -> diff (auto x*) 2) 3
3

But it's still a pretty big wart on an otherwise lovely countenance.

doctest errors with GHC-8.1.20170123

### Failure in /home/ogre/Documents/kmettverse/ad/src/Numeric/AD/Rank1/Kahn.hs:208: expression `hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]'
expected: [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
 but got: [[[*** Exception: Data.Array.Base.bOOL_WORD_SCALE: Overflow
          CallStack (from HasCallStack):
            error, called at libraries/array/Data/Array/Base.hs:1363:17 in array-0.5.1.2:Data.Array.Base

In total: Examples: 101 Tried: 97 Errors: 0 Failures: 20
