I'm trying to replicate the TARF payoff from this paper: https://arxiv.org/pdf/2012.

XLA compilation error of a TARF payoff with MC, about google/tf-quant-finance

Comments (16)

cyrilchim commented on May 14, 2024

Thanks for reaching out! The issue you are referring to is a TensorFlow (TF) issue, as you implicitly have a while loop inside a vectorized map. I just checked internally, and it seems that someone is working on it, but feel free to file the bug with the TF team directly.
On the other hand, even if the issue is resolved, I don't think the code is well-optimized for a GPU device because of all the branching in the if statements.

Could you please give me edit rights for the colab? In the meantime, I have fiddled a bit with your my_function, adding a batched version of the calculation. This makes the code a bit less efficient on a CPU device but clearly improves GPU performance. It is also XLA-compatible now (runs in < 70 ms for me on a T4 device).

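        def my_function_new(paths):
            # `paths` has shape [num_samples, num_timesteps]. num_samples, num_timesteps,
            # strike, K_lower, K_upper, K_knockout, step_up_ratio and tarf_target are
            # taken from the enclosing notebook scope.
            # Shape [num_timesteps, num_samples]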
            paths = tf.transpose(paths)
            cur_spot = paths[0]
            total = tf.zeros_like(cur_spot)
            discounted_payoff = tf.zeros_like(cur_spot)
            df = tf.constant(1.0, dtype=tf.float64)
            is_active = tf.ones([num_samples], dtype=tf.bool)
            cashflow = tf.zeros_like(cur_spot)
            i = tf.constant(0, dtype=tf.int32)
            # Explicitly define the while_loop
            def cond(i, is_active, cashflow, total, discounted_payoff):
              return i < num_timesteps

            def body(i, is_active, cashflow, total, discounted_payoff):
              # Here Tensors are of shape `[num_samples]`
              cur_spot = paths[i]
              new_is_active = K_knockout > cur_spot
              # A cashflow is fixed when the spot is at/above K_upper or below K_lower.
              add_cashflow = tf.logical_or(K_upper <= cur_spot, cur_spot < K_lower)

              new_cashflow = tf.where(
                 K_upper <= cur_spot,
                 cur_spot - strike,
                 cashflow
              )
              new_cashflow = tf.where(cur_spot < K_lower,
                                  step_up_ratio*(cur_spot - strike),
                                  new_cashflow)
              new_is_active = tf.where(
                  add_cashflow,
                  tf.where(total + new_cashflow >= tarf_target,
                           False, new_is_active),
                  new_is_active)

              new_cashflow = tf.where(
                  add_cashflow,
                  tf.where(total + new_cashflow >= tarf_target,
                           tarf_target - total, new_cashflow),
                  new_cashflow
                  )

              new_total = tf.where(
                  add_cashflow,
                  total + new_cashflow,
                  total
                  )
              new_discounted_payoff = tf.where(
                  add_cashflow,
                  discounted_payoff + df * new_cashflow,
                  discounted_payoff)
              # Update values only if active
              new_cashflow = tf.where(is_active, 
                                      new_cashflow,
                                      cashflow)
              new_total = tf.where(is_active,  new_total, total)
              new_discounted_payoff = tf.where(is_active, 
                                      new_discounted_payoff,
                                      discounted_payoff)
              new_is_active = tf.where(is_active, 
                                      new_is_active,
                                      is_active)
              return (i + 1, new_is_active, new_cashflow,
                      new_total, new_discounted_payoff)
            _, is_active, cashflow, total, discounted_payoff = tf.while_loop(
                cond, body, (i, is_active, cashflow, total, discounted_payoff)
            )
            return discounted_payoff

Then there is no need to use vectorized_map. Simply call:

payoffs = my_function_new(reshaped_paths)
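The batching idea above (loop over fixing dates, vectorize over samples, and turn every if-statement into a masked select) can be sketched outside TF in NumPy, which makes it easy to check on small hand-worked paths. This is a minimal sketch, not TQF code: `tarf_payoff_batched` and its argument names are hypothetical, mirroring the globals used by my_function_new.

```python
import numpy as np

def tarf_payoff_batched(paths, strike, k_lower, k_upper, k_knockout,
                        step_up_ratio, tarf_target, df=1.0):
    """Branch-free TARF payoff: loop over fixings, vectorize over samples.

    `paths` has shape [num_samples, num_timesteps], like `reshaped_paths`.
    Each if-statement of a per-path loop becomes an np.where / boolean mask.
    """
    spots = np.asarray(paths, dtype=np.float64).T  # [num_timesteps, num_samples]
    num_samples = spots.shape[1]
    total = np.zeros(num_samples)
    discounted_payoff = np.zeros(num_samples)
    is_active = np.ones(num_samples, dtype=bool)
    for cur_spot in spots:  # one fixing date at a time, all samples at once
        add_cashflow = (k_upper <= cur_spot) | (cur_spot < k_lower)
        cashflow = np.where(cur_spot < k_lower,
                            step_up_ratio * (cur_spot - strike),
                            cur_spot - strike)
        # Cap the cashflow so the running total never exceeds the target.
        hits_target = add_cashflow & (total + cashflow >= tarf_target)
        cashflow = np.where(hits_target, tarf_target - total, cashflow)
        # Only samples that are still alive and in a paying region accrue.
        pay = is_active & add_cashflow
        total = np.where(pay, total + cashflow, total)
        discounted_payoff = np.where(pay, discounted_payoff + df * cashflow,
                                     discounted_payoff)
        # Knock-out and target termination take effect after this fixing.
        is_active = is_active & (k_knockout > cur_spot) & ~hits_target
    return discounted_payoff
```

Running it and my_function_new on the same small set of paths is one way to cross-check the TF version, since both follow the same masked update order.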

Please let me know if that makes sense.

Would you be interested in cleaning up the code and contributing it to the library? I have not checked the maths, so testing is needed.
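Since the maths still needs testing, one low-tech check is a per-path reference written with ordinary if-statements, which can be compared against the batched payoff on the same simulated paths. This is a sketch under the assumption that the tf.where chain in my_function_new is the intended payoff; `tarf_payoff_reference` is a hypothetical name and `df` is kept as a constant discount factor, as in the original code.

```python
def tarf_payoff_reference(path, strike, k_lower, k_upper, k_knockout,
                          step_up_ratio, tarf_target, df=1.0):
    """Payoff of one TARF path, written with plain if-statements.

    Mirrors the tf.where chain in my_function_new: a fixing at or above
    k_upper pays (spot - strike), a fixing below k_lower pays
    step_up_ratio * (spot - strike), the cumulative total is capped at
    tarf_target, and the trade stops after the fixing that reaches the
    target or sits at or above k_knockout.
    """
    total = 0.0
    discounted_payoff = 0.0
    for spot in path:
        still_active = k_knockout > spot  # knock-out applies after this fixing
        if k_upper <= spot or spot < k_lower:
            if spot < k_lower:
                cashflow = step_up_ratio * (spot - strike)
            else:
                cashflow = spot - strike
            if total + cashflow >= tarf_target:
                cashflow = tarf_target - total  # pay only up to the target
                still_active = False
            total += cashflow
            discounted_payoff += df * cashflow
        if not still_active:
            break  # an inactive trade pays nothing on later fixings
    return discounted_payoff
```

Averaging this over a few thousand paths and comparing with the mean of the batched payoffs should give the same number up to floating-point error.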

from tf-quant-finance.

arthurpham commented on May 14, 2024

Thank you for the help.

I reused your payoff and that worked for the price.
When I try to compute the delta with XLA compilation, I get an error:
https://colab.research.google.com/github/arthurpham/google_colab/blob/494d8a07b8eb39b2c93dbe4dbe8f519730be0030/TARF_MC_Performance_TQF.ipynb

@tf.function(jit_compile=True,
             input_signature=[tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64)
                               ])
def delta_fn_xla(strikes, spot, sigma):
    fn = lambda spot: price_eu_options_xla(strikes, spot, sigma)
    return tff.math.fwd_gradient(fn, spot, use_gradient_tape=True)

InternalError: Propagate: Cannot find body function while_body_5075_grad_16381_grad_16947_const_0 for While node gradients/PartitionedCall_2/gradients/gradients/PartitionedCall_grad/PartitionedCall_15_grad/PartitionedCall/gradients/gradients/while_grad/while_grad_grad/gradients/while_grad/while_grad_grad [Op:__inference_delta_fn_xla_17607]

Also, for the non-XLA version, is there anything I can do to improve the performance?

t = time.time()
tarf_price = price_eu_options_xla(strikes, spot, sigma)
tarf_delta = delta_fn(strikes, spot, sigma)
tarf_vega = vega_fn(strikes, spot, sigma)
time_tqf = time.time() - t

With fwd_gradients:

TQF gpu TARF price+delta+vega
wall time + tracing:  31.278723001480103
options per second + tracing:  0.03197061465561366
wall time:  7.026975631713867
options per second:  0.142308733146425
------------------------
price tf.Tensor(-246.927357070608, shape=(), dtype=float64)
delta tf.Tensor(21.37942877385511, shape=(), dtype=float64)
vega tf.Tensor(-331.98229464223164, shape=(), dtype=float64)

With gradients:

TQF gpu TARF price+delta+vega
wall time + tracing:  15.258108139038086
options per second + tracing:  0.0655389246745136
wall time:  5.396306037902832
options per second:  0.1853119509857581
------------------------
price tf.Tensor(-246.927357070608, shape=(), dtype=float64)
delta tf.Tensor(21.37942877385511, shape=(), dtype=float64)
vega tf.Tensor(-331.98229464223175, shape=(), dtype=float64)

For reference, here is the same TARF payoff implemented in C++, using Antoine Savine's library from the book "Modern Computational Finance: AAD and Parallel Simulations".
https://github.com/arthurpham/CompFinance/blob/e3f2c4e804901fe39d9a33d6fb01c30c8880baee/xlSpreadheets/CompFinance_TARF.xlsx?raw=true
This runs on a regular laptop: 243 ms in parallel (6 cores), 1020 ms in serial.

cyrilchim commented on May 14, 2024

Thank you, Arthur!

The issue you are having is nested JIT compilation. You'd need to make a few changes:

In my_function_new, please add maximum_iterations to the while_loop:

            _, is_active, cashflow, total, discounted_payoff = tf.while_loop(
                cond, body, (i, is_active, cashflow, total, discounted_payoff),
                maximum_iterations=num_timesteps,
            )

Greeks can be computed using a backward (reverse-mode) gradient:

@tf.function(jit_compile=True,
             input_signature=[tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64)
                               ])
def greeks_fn_xla(strikes, spot, sigma):
    with tf.GradientTape() as tape:
      tape.watch([spot, sigma])
      prices = price_eu_options(strikes, spot, sigma)
    return prices, tape.gradient(prices, [spot, sigma])
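A cheap, independent check on AD Greeks (forward or backward) is a central finite difference. The sketch below uses a Black-Scholes call as a stand-in pricer only because its delta is known in closed form; `central_difference`, `bs_call` and `bs_call_delta` are hypothetical helpers, not TQF functions. For a Monte Carlo pricer, the bumped prices should reuse the same random draws (common random numbers), and the knock-out and target cap make a TARF payoff discontinuous, so bumped estimates can be noisy.

```python
import math
from statistics import NormalDist

def central_difference(price_fn, x, rel_bump=1e-4):
    """Central finite-difference estimate of d price_fn / dx at x."""
    h = rel_bump * max(abs(x), 1.0)
    return (price_fn(x + h) - price_fn(x - h)) / (2.0 * h)

def bs_call(spot, strike, sigma, rate, dividend, expiry):
    """Black-Scholes call price; a stand-in pricer with a known delta."""
    n = NormalDist()
    vol_sqrt_t = sigma * math.sqrt(expiry)
    d1 = (math.log(spot / strike)
          + (rate - dividend + 0.5 * sigma**2) * expiry) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return (spot * math.exp(-dividend * expiry) * n.cdf(d1)
            - strike * math.exp(-rate * expiry) * n.cdf(d2))

def bs_call_delta(spot, strike, sigma, rate, dividend, expiry):
    """Analytic delta exp(-q T) N(d1), used to validate the bump."""
    vol_sqrt_t = sigma * math.sqrt(expiry)
    d1 = (math.log(spot / strike)
          + (rate - dividend + 0.5 * sigma**2) * expiry) / vol_sqrt_t
    return math.exp(-dividend * expiry) * NormalDist().cdf(d1)
```

The same `central_difference` can wrap any scalar pricer, for example one that closes over fixed normal draws, and its result compared with the corresponding `tape.gradient` entry.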

As for the performance, just for reference, the GPU performance I get is ~130 ms for the greeks_fn_xla function. (Please check how long it takes on your CPU; also, when running with XLA, check CPU utilization, as it might be running on a single thread.)

We can try to improve performance separately.

Does the 1020 ms include the Greek computation? If so, what is the pricing speed on its own? Could you please point me to the source code?

cyrilchim commented on May 14, 2024

Also, do you need Euler sampling for the geometric Brownian motion? It can be sampled more efficiently with the designated sampler.

arthurpham commented on May 14, 2024

The 1020 ms includes the price and Greeks (first derivatives with respect to spot, vol, risk-free rate and dividend yield), computed serially (no parallel computation).
For the price alone, it's 94 ms in parallel (using the 6 cores of my laptop's CPU) and 452 ms in serial.
As for the code, it's in the same git repo as the spreadsheet: arthurpham/CompFinance@e118228
https://github.com/arthurpham/CompFinance
I just cloned the git repo and added the TARF payoff, plus an XLL function to create the TARF product object.

I will include your suggestions and measure the timing again.
As for GeometricBrownianMotion vs GenericItoProcess: I used GenericItoProcess because that's what the Monte Carlo notebook samples use, and I assumed that with the same number of time steps we would get the same performance (I might be wrong) but different accuracy.
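On the accuracy point: the pricer code in this thread simulates log-spot with a constant drift `rate - dividend - 0.5 * sigma**2` and a constant volatility `sigma`. For a process like that, one Gaussian step per observation date is already exact in distribution, so extra time steps should not change accuracy, only cost. The plain-Python sketch below illustrates that exact log-space stepping; it is not TQF's GeometricBrownianMotion implementation, and `gbm_paths_log_space` is a hypothetical name.

```python
import math

def gbm_paths_log_space(spot, rate, dividend, sigma, times, normal_draws):
    """Sample GBM spots on `times` by stepping log-spot between dates.

    With constant rate, dividend and sigma, each step
        log S(t) = log S(s) + (r - q - sigma^2 / 2)(t - s) + sigma sqrt(t - s) Z
    is exact in distribution, so no extra grid points are needed.
    `normal_draws` holds one list of len(times) standard normals per path.
    """
    drift = rate - dividend - 0.5 * sigma**2
    paths = []
    for draws in normal_draws:
        log_s, t_prev, path = math.log(spot), 0.0, []
        for t, z in zip(times, draws):
            dt = t - t_prev
            log_s += drift * dt + sigma * math.sqrt(dt) * z
            path.append(math.exp(log_s))
            t_prev = t
        paths.append(path)
    return paths
```

Under this assumption, differences between the two TQF classes would come from how they are implemented (and differentiated), not from discretization error.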

from tf-quant-finance.

arthurpham commented on May 14, 2024

GPU performance I get is ~130 ms for greeks_fn_xla function

Running on Google Colab Pro with a Tesla T4, I get 1.27 s on the GPU.
https://colab.research.google.com/github/arthurpham/google_colab/blob/0586ac3a8fb9d477cdd752fcda69fadd4ec3bb0e/TARF_MC_Performance_TQF.ipynb#scrollTo=98958e3a

Running on Google Colab Pro with a Tesla P100, I get 780 ms on the GPU.
https://colab.research.google.com/github/arthurpham/google_colab/blob/44cb4956ba87ec5cc3a04323afd548cee91d9772/TARF_MC_Performance_TQF.ipynb#scrollTo=5CMM52A4Wvqy

What kind of GPU do you use to get 130 ms?

cyrilchim commented on May 14, 2024

I am using a Tesla T4.

Now that it is working, I looked a bit into the performance details. I would still recommend using GeometricBrownianMotion directly, as it has a different implementation from GenericItoProcess. The latter relies on TensorArrays (basically a list) to store location values, so differentiating through it is a bit slow. Nevertheless, with GenericItoProcess:

Since you have 53 time points, it is more efficient not to use watch_params (set it to False).
Also, you do not need time_step in GenericItoProcess, as you are stepping through times (with time_step, the algorithm steps through all your times plus the grid formed by time_step). Instead, use times_grid directly:

        paths = process.sample_paths(
            times=times, num_samples=num_samples,
            initial_state=log_spot,
            watch_params=watch_params_list,
            times_grid=times,  # the algorithm steps only through `times`
            # Select a random number generator
            random_type=tff.math.random.RandomType.SOBOL,
            )

This improves things quite a bit. On a public Colab with a Tesla T4, I get ~100 ms on the GPU for all Greeks and ~3 s on the CPU (which has 2 processors; see !cat /proc/cpuinfo).

If needed, I can try to optimize CPU performance further.

cyrilchim commented on May 14, 2024

Also, could you please report the random number generation speed? I assume you'd need [200_000, 53] samples from Sobol that you then convert to normals. Could you please let me know how long that takes?

cyrilchim commented on May 14, 2024

I have also changed the Sobol sequence to use tff.math.qmc instead (the default RandomType.SOBOL is tied to an old implementation):

        sobol_seq = tff.math.qmc.sobol_sample(num_results=num_samples,
                                                 dim=num_timesteps, 
                                                 dtype=tf.float64)
        # Shape [num_samples, num_timesteps, 1]
        normal_draws = tf.math.erfinv((sobol_seq[..., tf.newaxis] - 0.5) * 2)* np.sqrt(2)
        paths = process.sample_paths(
           times=times, num_samples=num_samples,
            initial_state=log_spot,  
            watch_params=watch_params_list,
            times_grid=times,
            normal_draws=normal_draws)

On a Tesla T4, computing the price plus Greeks seems to take 90 ms.

The Colab CPUs are a bit weak. On my 2020 Intel MacBook Pro (with enforced single-threading), I get 1.69 s for the price with Greeks and 0.61 s for pricing alone. I think this is more or less in the ballpark of your results.

arthurpham commented on May 14, 2024

OK, it's surprising that watch_params=False is faster, as the documentation says the opposite.
Just changing that gives a better result, but how would you know in advance whether it should be True or False?

Another question: https://colab.research.google.com/github/arthurpham/google_colab/blob/f02d64e65e9ba59ca9ce33048f3459560cef7fa5/TARF_MC_Performance_TQF.ipynb

Let's assume I'm trying to determine a realistic latency and cost (assuming a $/sec price for the GPU) of a pricing service relying on TQF. Say I receive a new trade with different inputs (requests arrive at different times, so no batching is possible). I should be able to reuse the optimized function (maybe the input_signature of the tf.function is not configured properly?), but I don't observe that.
I would expect faster timings where I put the <======== (spot 18.2, 20.0 and 25.0).

So, is it possible to avoid retracing with XLA when changing the inputs (assuming no change in shapes)?

for spot in [15.0, 18.2, 20.0, 25.0]:
  spot = tf.constant(spot, dtype=dtype)
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf0 = time.time() - t

  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf = time.time() - t

------------------------
TQF gpu TARF XLA price+delta+vega spot: 15.0
wall time + tracing:  2.576292037963867
options per second + tracing:  0.38815475313518205
wall time:  0.06866049766540527
options per second:  14.564415260623022
------------------------

------------------------
TQF gpu TARF XLA price+delta+vega spot: 18.2
wall time + tracing:  2.5903542041778564 <========
options per second + tracing:  0.3860475908611836
wall time:  0.06723594665527344
options per second:  14.872996510737284
------------------------

------------------------
TQF gpu TARF XLA price+delta+vega spot: 20.0
wall time + tracing:  2.6097450256347656 <========
options per second + tracing:  0.3831791957364766
wall time:  0.06735658645629883
options per second:  14.846358056591885
------------------------

------------------------
TQF gpu TARF XLA price+delta+vega spot: 25.0
wall time + tracing:  3.6820600032806396 <========
options per second + tracing:  0.27158710045708667
wall time:  0.0670621395111084
options per second:  14.911543343086402
------------------------

When I call print(greeks_fn_xla.pretty_printed_concrete_signatures()), I only see:

greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  Args:
    strikes: float64 Tensor, shape=()
    spot: float64 Tensor, shape=()
    sigma: float64 Tensor, shape=()
    rate: float64 Tensor, shape=()
    dividend: float64 Tensor, shape=()
  Returns:
    (<1>, [<2>, <3>, <4>, <5>])
      <1>: float64 Tensor, shape=()
      <2>: float64 Tensor, shape=()
      <3>: float64 Tensor, shape=()
      <4>: float64 Tensor, shape=()
      <5>: float64 Tensor, shape=()
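To separate one-off tracing/compilation cost from steady-state latency in measurements like these, a small helper can time the first call apart from later ones. This is a sketch: `time_first_and_steady` is a hypothetical name, not part of TF or TQF, and the `materialize` hook is an assumption meant for forcing possibly asynchronous GPU work to finish (for example by calling `.numpy()` on an output) before the clock stops.

```python
import time

def time_first_and_steady(fn, *args, repeats=5, materialize=None):
    """Return (first_call_seconds, best_later_call_seconds) for fn(*args).

    For a tf.function(jit_compile=True), the first call with a new signature
    includes tracing and XLA compilation; later calls should not.
    """
    def timed_call():
        start = time.perf_counter()
        out = fn(*args)
        if materialize is not None:
            materialize(out)  # wait for the result before stopping the clock
        return time.perf_counter() - start

    first = timed_call()
    steady = min(timed_call() for _ in range(repeats))
    return first, steady
```

Called once per spot value, e.g. `time_first_and_steady(greeks_fn_xla, strikes, spot, sigma, rate, dividend, materialize=lambda out: out[0].numpy())`, a first-call time that stays near the tracing cost for every new spot would suggest the function is being retraced or recompiled.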

arthurpham commented on May 14, 2024

Also, could you please report the random number generation speed? I assume you'd need [200_000, 53] samples from Sobol that you then convert to normals. Could you please let me know how long that takes?

The library that I forked to add the payoff is not mine; it comes from a book, and the code is shared on GitHub.
https://github.com/asavine/CompFinance/blob/6538e90c95993eebac6b728f1bf28b877bec9b5d/mcBase.h#L300

I don't have an easy way to measure what you are asking, but if I comment out the path generation and payoff, the random number generation seems to take roughly 40% of the total time in serial mode:

auto cRng = rng.clone();
cRng->init(cMdl->simDim());        
//	Iterate through paths	
for (size_t i = 0; i<nPath; i++)
{
    //  Next Gaussian vector, dimension D
    cRng->nextG(gaussVec);                        
    //  Generate path, consume Gaussian vector
    ////cMdl->generatePath(gaussVec, path);     
    //	Compute result
    ////prd.payoffs(path, results[i]);
}

cyrilchim commented on May 14, 2024

Hi Arthur,

For TF, I checked that random number generation also takes roughly 40% of the compute time.
As for your questions:

Compiled functions expect tensor inputs, not NumPy objects, so it should be something like:

for spot in [tf.convert_to_tensor(x, tf.float64) for x in [15.0, 18.2, 20.0, 25.0]]:
...

watch_params works well when there is a single expiry in the options (so you do not record intermediate values). I could try fixing that, but I would need to adjust how the gradient flows through tf.TensorArray. The idea was to mock XLA-friendly support for forward gradients.

cyrilchim commented on May 14, 2024

I've pushed a change so that watch_params should work fine now. Could you please double-check? On my Mac CPU, I get ~1.13 s for the Greeks and price calculation (on a single thread).

arthurpham commented on May 14, 2024

Compiled functions expect tensor inputs, not numpy objects, so should be something like

In my example above, the spot was already converted with spot = tf.constant(spot, dtype=dtype).
I tried your suggestion, and I still don't get the expected behavior: each time the spot value changes, the first call to greeks_fn_xla is very slow.

https://colab.research.google.com/github/arthurpham/google_colab/blob/dff028c368a615c585e7a0e92c87c287c08b6bb4/TARF_MC_Performance_TQF.ipynb#scrollTo=cb03ad7b

for spot in [tf.convert_to_tensor(x, tf.float64) for x in [15.0, 18.2, 20.0, 25.0]]:
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf0 = time.time() - t
  
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf = time.time() - t

------------------------
TQF gpu TARF XLA price+delta+vega spot: 15.0
wall time + tracing:  2.9289374351501465 <=============
options per second + tracing:  0.34142074460144173
wall time:  0.12999844551086426
options per second:  7.692399675013251
------------------------

------------------------
TQF gpu TARF XLA price+delta+vega spot: 18.2
wall time + tracing:  2.884617805480957 <=============
options per second + tracing:  0.346666375732666
wall time:  0.13253283500671387
options per second:  7.5453000001798936
------------------------

------------------------
TQF gpu TARF XLA price+delta+vega spot: 20.0
wall time + tracing:  2.8701701164245605 <=============
options per second + tracing:  0.34841140400615833
wall time:  0.13067197799682617
options per second:  7.652750155999693
------------------------

------------------------
TQF gpu TARF XLA price+delta+vega spot: 25.0
wall time + tracing:  3.2776947021484375 <=============
options per second + tracing:  0.3050924783642991
wall time:  0.13060975074768066
options per second:  7.656396205302135
------------------------

I've pushed a change so that watch_params should work fine now. Could you please double check? On my mac CPU I get ~1.13 secs for greeks and price calculation (on a single thread).

Thank you for the quick fix; it seems to make a difference in a quick test.

cyrilchim commented on May 14, 2024

You need to remove the @tf.function decorator from tarf_payoff. I'm not sure why it causes a problem here, but generally one should avoid nested tf.functions with XLA compilation.

Just in case, here is a version of the pricer incorporating all suggestions above:

def set_up_pricer_xla(times, watch_params=False):
    def price_eu_options(strikes, spot, sigma, rate, dividend):
        # Define drift and volatility functions. 
        def drift_fn(t, x):
          del t, x
          return rate - dividend - 0.5 * sigma**2
        def vol_fn(t, x):
          del t, x
          return tf.reshape(sigma, [1, 1])
        # Use GenericItoProcess class to set up the Ito process
        process = tff.models.GenericItoProcess(
            dim=1,
            drift_fn=drift_fn,
            volatility_fn=vol_fn,
            dtype=dtype)
        log_spot = tf.math.log(tf.reduce_mean(spot))
        if watch_params:
            watch_params_list = [sigma, rate, dividend]
        else:
            watch_params_list = None
        # Feed a new version of Sobol numbers skipping the 1st element which is `0.0`
        sobol_seq = tff.math.qmc.sobol_sample(num_results=num_samples + 1,
                                                 dim=num_timesteps, 
                                                 dtype=tf.float64)[1:]
        # Convert uniform draws to normals
        normal_draws = tf.math.erfinv((sobol_seq[..., tf.newaxis] - 0.5) * 2)* np.sqrt(2)
        paths = process.sample_paths(
           times=times, num_samples=num_samples,
            initial_state=log_spot,  
            watch_params=watch_params_list,
            times_grid=times,
            normal_draws=normal_draws)
        def tarf_payoff(paths):
            # Shape [num_timesteps, num_samples]
            paths = tf.transpose(paths)
            cur_spot = paths[0]
            total = tf.zeros_like(cur_spot)
            discounted_payoff = tf.zeros_like(cur_spot)
            df = tf.constant(1.0, dtype=tf.float64)
            is_active = tf.ones([num_samples], dtype=tf.bool)
            i = tf.constant(0, dtype=tf.int32)
            # Explicitly define the while_loop 
            def cond(i, is_active, total, discounted_payoff, df):
                return i < num_timesteps
            def body(i, is_active, total, discounted_payoff, df):
                # Here Tensors are of shape `[num_samples]`
                cur_spot = paths[i]
                cashflow = tf.zeros_like(cur_spot)
                new_is_active = K_knockout > cur_spot
                add_cashflow = tf.logical_or(K_upper <= cur_spot, cur_spot < K_lower)
                new_cashflow = tf.where(K_upper <= cur_spot,
                    cur_spot - strike,
                    cashflow
                )
                new_cashflow = tf.where(cur_spot < K_lower,
                                    step_up_ratio*(cur_spot - strike),
                                    new_cashflow)
                new_is_active = tf.where(add_cashflow,
                    tf.where(total + new_cashflow >= tarf_target,
                            False, new_is_active),
                    new_is_active)
                new_cashflow = tf.where(add_cashflow,
                    tf.where(total + new_cashflow >= tarf_target,
                            tarf_target - total, new_cashflow),
                    new_cashflow
                    )
                new_total = tf.where(add_cashflow,
                    total + new_cashflow,
                    total
                    )
                new_discounted_payoff = tf.where(add_cashflow,
                    discounted_payoff + df * new_cashflow,
                    discounted_payoff)
                # Update values only if active
                new_cashflow = tf.where(is_active, 
                                        new_cashflow,
                                        cashflow)
                new_total = tf.where(is_active, new_total, total)
                new_discounted_payoff = tf.where(is_active, 
                                        new_discounted_payoff,
                                        discounted_payoff)
                new_is_active = tf.where(is_active, new_is_active, is_active)
                return (i + 1, new_is_active, new_total, new_discounted_payoff, df)
            _, is_active, total, discounted_payoff, _ = tf.while_loop(
                cond, body, (i, is_active, total, discounted_payoff, df),
                maximum_iterations=num_timesteps,
            )
            return discounted_payoff
        reshaped_paths = tf.reshape(tf.math.exp(paths), [num_samples, num_timesteps])
        payoffs = tarf_payoff(reshaped_paths)
        prices = tf.reduce_mean(payoffs)
        return prices
    return price_eu_options

Now I can define

price_eu_options = set_up_pricer_xla(times, watch_params=True)

@tf.function(jit_compile=True,
             input_signature=[tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64),
                            tf.TensorSpec([], dtype=tf.float64)
                               ])
def greeks_fn_xla(strikes, spot, sigma, rate, dividend):
    with tf.GradientTape() as tape:
      tape.watch([spot, sigma, rate, dividend])
      prices = price_eu_options(strikes, spot, sigma, rate, dividend)
    return prices, tape.gradient(prices, [spot, sigma, rate, dividend])

and try running for different spot values:

for spot in [tf.convert_to_tensor(x, tf.float64) for x in [25.0, 28.2, 30.0, 45.0]]:
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  print(time.time() - t)

This works as expected for me.

arthurpham commented on May 14, 2024

You need to remove the @tf.function decorator from tarf_payoff. I'm not sure why it causes a problem here, but generally one should avoid nested tf.functions with XLA compilation.

Yes, that was the problem. Thanks a lot for being patient.
