Comments (16)
Hi @arthurpham,
Thanks for reaching out! The issue you are referring to is a TensorFlow (TF) issue, as you implicitly have a while loop inside a vectorized map. I just checked internally; it seems that someone is on it, but feel free to file the bug with the TF team directly.
On the other hand, even if the issue is resolved, I don't think the code is well optimized for a GPU device because of all the branching in the if statements.
Could you please give me edit rights for the colab? Otherwise, I have fiddled a bit with your my_function, adding a batched version of the calculation. This makes the code a bit less efficient on a CPU device but clearly improves GPU performance. Also, it is XLA-compatible now (runs in < 70 ms for me on a T4 device).
def my_function_new(paths):
  # Shape [num_timesteps, num_samples]
  paths = tf.transpose(paths)
  cur_spot = paths[0]
  total = tf.zeros_like(cur_spot)
  discounted_payoff = tf.zeros_like(cur_spot)
  df = tf.constant(1.0, dtype=tf.float64)
  is_active = tf.ones([num_samples], dtype=tf.bool)
  cashflow = tf.zeros_like(cur_spot)
  i = tf.constant(0, dtype=tf.int32)
  # Explicitly define the while_loop
  def cond(i, is_active, cashflow, total, discounted_payoff):
    return i < num_timesteps
  def body(i, is_active, cashflow, total, discounted_payoff):
    # Here Tensors are of shape `[num_samples]`
    cur_spot = paths[i]
    add_cashflow = False
    new_is_active = K_knockout > cur_spot
    add_cashflow = tf.where(
        tf.logical_or(K_upper <= cur_spot, cur_spot < K_lower),
        True, False)
    new_cashflow = tf.where(
        K_upper <= cur_spot,
        cur_spot - strike,
        cashflow
    )
    new_cashflow = tf.where(cur_spot < K_lower,
                            step_up_ratio*(cur_spot - strike),
                            new_cashflow)
    new_is_active = tf.where(
        add_cashflow,
        tf.where(total + new_cashflow >= tarf_target,
                 False, new_is_active),
        new_is_active)
    new_cashflow = tf.where(
        add_cashflow,
        tf.where(total + new_cashflow >= tarf_target,
                 tarf_target - total, new_cashflow),
        new_cashflow
    )
    new_total = tf.where(
        add_cashflow,
        total + new_cashflow,
        total
    )
    new_discounted_payoff = tf.where(
        add_cashflow,
        discounted_payoff + df * new_cashflow,
        discounted_payoff)
    # Update values only if active
    new_cashflow = tf.where(is_active,
                            new_cashflow,
                            cashflow)
    new_total = tf.where(is_active, new_total, total)
    new_discounted_payoff = tf.where(is_active,
                                     new_discounted_payoff,
                                     discounted_payoff)
    new_is_active = tf.where(is_active,
                             new_is_active,
                             is_active)
    return (i + 1, new_is_active, new_cashflow,
            new_total, new_discounted_payoff)
  _, is_active, cashflow, total, discounted_payoff = tf.while_loop(
      cond, body, (i, is_active, cashflow, total, discounted_payoff)
  )
  return discounted_payoff
Then there is no need to use vectorized_map. Simply call
payoffs = my_function_new(reshaped_paths)
Please let me know if that makes sense.
Would you be interested in cleaning up the code and contributing it to the library? I have not checked the maths, so testing is needed.
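Since the batched payoff has not been checked against a reference, one way to test it is to compare it, path by path, with a plain-Python version that keeps the original if/else branching. The sketch below is only an illustration of that idea: the function name and keyword arguments are hypothetical, it assumes k_lower <= k_upper, and it uses a constant discount factor as the batched code does.

```python
def tarf_payoff_single_path(path, strike, k_lower, k_upper, k_knockout,
                            step_up_ratio, tarf_target, df=1.0):
    """Scalar reference for one path, mirroring the tf.where logic step by step."""
    total = 0.0
    discounted_payoff = 0.0
    for spot in path:
        # The path stays alive after this fixing only if it is below the knock-out level.
        still_active = spot < k_knockout
        if spot >= k_upper:
            cashflow = spot - strike
        elif spot < k_lower:
            cashflow = step_up_ratio * (spot - strike)
        else:
            cashflow = None  # no cashflow between the two barriers
        if cashflow is not None:
            if total + cashflow >= tarf_target:
                # Cap the last cashflow at the remaining target and terminate.
                cashflow = tarf_target - total
                still_active = False
            total += cashflow
            discounted_payoff += df * cashflow
        if not still_active:
            break
    return discounted_payoff


# Hypothetical contract parameters, chosen only to exercise every branch.
example = tarf_payoff_single_path(
    [110.0, 90.0, 108.0, 130.0], strike=100.0, k_lower=95.0, k_upper=105.0,
    k_knockout=120.0, step_up_ratio=2.0, tarf_target=20.0)
print(example)  # 20.0
```

Feeding a handful of hand-built paths through both versions should give the same per-path payoffs if the batched masks are right.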
from tf-quant-finance.
Thank you for the help.
I reused your payoff and that worked for the price.
When I try to compute the delta with XLA compilation, I get an error:
https://colab.research.google.com/github/arthurpham/google_colab/blob/494d8a07b8eb39b2c93dbe4dbe8f519730be0030/TARF_MC_Performance_TQF.ipynb
@tf.function(jit_compile=True,
             input_signature=[tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64)
                             ])
def delta_fn_xla(strikes, spot, sigma):
  fn = lambda spot: price_eu_options_xla(strikes, spot, sigma)
  return tff.math.fwd_gradient(fn, spot, use_gradient_tape=True)
InternalError: Propagate: Cannot find body function while_body_5075_grad_16381_grad_16947_const_0 for While node gradients/PartitionedCall_2/gradients/gradients/PartitionedCall_grad/PartitionedCall_15_grad/PartitionedCall/gradients/gradients/while_grad/while_grad_grad/gradients/while_grad/while_grad_grad [Op:__inference_delta_fn_xla_17607]
Also, for the non-XLA version, is there anything I can do to improve the performance?
t = time.time()
tarf_price = price_eu_options_xla(strikes, spot, sigma)
tarf_delta = delta_fn(strikes, spot, sigma)
tarf_vega = vega_fn(strikes, spot, sigma)
time_tqf = time.time() - t
With fwd_gradients:
TQF gpu TARF price+delta+vega
wall time + tracing: 31.278723001480103
options per second + tracing: 0.03197061465561366
wall time: 7.026975631713867
options per second: 0.142308733146425
------------------------
price tf.Tensor(-246.927357070608, shape=(), dtype=float64)
delta tf.Tensor(21.37942877385511, shape=(), dtype=float64)
vega tf.Tensor(-331.98229464223164, shape=(), dtype=float64)
With gradients:
TQF gpu TARF price+delta+vega
wall time + tracing: 15.258108139038086
options per second + tracing: 0.0655389246745136
wall time: 5.396306037902832
options per second: 0.1853119509857581
------------------------
price tf.Tensor(-246.927357070608, shape=(), dtype=float64)
delta tf.Tensor(21.37942877385511, shape=(), dtype=float64)
vega tf.Tensor(-331.98229464223175, shape=(), dtype=float64)
For reference, here is the same TARF payoff implemented in C++, using Antoine Savine's library from the book "Modern Computational Finance: AAD and Parallel Simulations".
https://github.com/arthurpham/CompFinance/blob/e3f2c4e804901fe39d9a33d6fb01c30c8880baee/xlSpreadheets/CompFinance_TARF.xlsx?raw=true
This was run on a regular laptop: 243 ms in parallel (6 cores), 1020 ms in serial.
from tf-quant-finance.
Thank you, Arthur!
The issue you are having is nested JIT compilation. You'd need to make a few changes:
- In my_function_new, please add maximum_iterations to the while_loop:
_, is_active, cashflow, total, discounted_payoff = tf.while_loop(
    cond, body, (i, is_active, cashflow, total, discounted_payoff),
    maximum_iterations=num_timesteps,
)
- Greeks can be computed using the backward gradient:
@tf.function(jit_compile=True,
             input_signature=[tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64)
                             ])
def greeks_fn_xla(strikes, spot, sigma):
  with tf.GradientTape() as tape:
    tape.watch([spot, sigma])
    prices = price_eu_options(strikes, spot, sigma)
  return prices, tape.gradient(prices, [spot, sigma])
As for the performance, just for reference, the GPU performance I get is ~130 ms for the greeks_fn_xla function. (Please check how long it takes on your CPU. Also, when running with XLA, check CPU utilization, as this might well be running on a single thread.)
We can try to improve performance separately.
Does 1020 ms include the Greek computation? If so, what is the pricing speed separately? Could you please point to the source code?
from tf-quant-finance.
Also, do you need Euler sampling for the Geometric Brownian Motion? This can be sampled more efficiently with the designated sampler.
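To illustrate why a dedicated sampler can be cheaper: with constant rate, dividend and volatility, the log-spot increment over any interval is exactly normal, so the path can be sampled at the observation times alone with no finer Euler grid. The following is a plain-Python sketch of that exact step, not the library's sampler; the function name and arguments are hypothetical.

```python
import math
import random


def sample_gbm_path(spot, rate, dividend, sigma, times, rng):
    """Samples one GBM path at increasing `times` using the exact log-normal step."""
    log_s = math.log(spot)
    prev_t = 0.0
    path = []
    for t in times:
        dt = t - prev_t
        z = rng.gauss(0.0, 1.0)  # one standard normal draw per observation time
        # Exact transition: the drift carries the -0.5 * sigma^2 Ito correction.
        log_s += (rate - dividend - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
        path.append(math.exp(log_s))
        prev_t = t
    return path


# Hypothetical inputs: three observation dates and a seeded generator.
print(sample_gbm_path(100.0, 0.03, 0.01, 0.2, [0.5, 1.0, 2.0], random.Random(0)))
```

With sigma set to zero the path collapses to the deterministic forward, which makes for a quick sanity check.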
from tf-quant-finance.
The 1020 ms includes the price + Greeks (first derivatives with respect to the spot, vol, risk-free rate and dividend yield) in serial (no parallel computation).
For the price alone, it's 94 ms in parallel (using the 6 cores of the CPU on my laptop) and 452 ms in serial.
As for the code, it's in the same git repo as the spreadsheet: arthurpham/CompFinance@e118228
https://github.com/arthurpham/CompFinance
I just cloned the git repo and added the TARF payoff and an xll function to create the Tarf product object.
I will include your suggestions and measure the timing again.
As for GeometricBrownianMotion vs GenericItoProcess, the reason I used GenericItoProcess is that it was what the notebook samples for Monte Carlo used, and I assumed that with the same number of timesteps we should get the same performance (I might be wrong) but different accuracy.
from tf-quant-finance.
GPU performance I get is ~130 ms for greeks_fn_xla function
Running on Google Colab Pro with a Tesla T4, I get 1.27 s with GPU.
https://colab.research.google.com/github/arthurpham/google_colab/blob/0586ac3a8fb9d477cdd752fcda69fadd4ec3bb0e/TARF_MC_Performance_TQF.ipynb#scrollTo=98958e3a
Running on Google Colab Pro with a Tesla P100, I get 780 ms with GPU.
https://colab.research.google.com/github/arthurpham/google_colab/blob/44cb4956ba87ec5cc3a04323afd548cee91d9772/TARF_MC_Performance_TQF.ipynb#scrollTo=5CMM52A4Wvqy
What kind of GPU do you use to get 130 ms?
from tf-quant-finance.
I am using a Tesla T4.
So, given that it is working, I looked a bit into the performance details. I would still recommend using Geometric Brownian motion directly, as it has a different implementation from the GenericItoProcess. The latter relies on TensorArrays (basically a list) to store location values, so differentiating through it is a bit slow. Nevertheless:
- Since you have 53 time points, it is more efficient not to use watch_params (set it to False).
- Also, you do not need time_step in GenericItoProcess, as you are stepping through times (basically, the algorithm steps through all your times plus the grid formed with time_step). Instead, use times_grid directly:
paths = process.sample_paths(
    times=times, num_samples=num_samples,
    initial_state=log_spot,
    watch_params=watch_params_list,
    times_grid=times,  # the algorithm steps only through times.
    # Select a random number generator
    random_type=tff.math.random.RandomType.SOBOL,
)
This improves things quite a bit. On public colab with a Tesla T4, I get ~100 ms on a GPU to get all Greeks and ~3 secs on a CPU (that has 2 processors, see !cat /proc/cpuinfo).
If needed, I can try to further optimize CPU performance.
from tf-quant-finance.
Also, could you please report the random number generation speed? I assume you'd need [200_000, 53] samples from Sobol that you then convert to normals. Could you please let me know how long that takes?
from tf-quant-finance.
I have also changed the Sobol sequence to use tff.math.qmc instead (the default RandomType.SOBOL is tied to an old implementation):
sobol_seq = tff.math.qmc.sobol_sample(num_results=num_samples,
                                      dim=num_timesteps,
                                      dtype=tf.float64)
# Shape [num_samples, num_timesteps, 1]
normal_draws = tf.math.erfinv((sobol_seq[..., tf.newaxis] - 0.5) * 2) * np.sqrt(2)
paths = process.sample_paths(
    times=times, num_samples=num_samples,
    initial_state=log_spot,
    watch_params=watch_params_list,
    times_grid=times,
    normal_draws=normal_draws)
On a Tesla T4, computing price + greeks seems to take 90 ms.
The colab CPUs are a bit weak. On my 2020 Intel MacBook Pro (with enforced single-threading) I get 1.69 secs for price with greeks and 0.61 secs for pricing. I think this is more or less in the ballpark of your results.
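The conversion used for normal_draws, sqrt(2) * erfinv(2u - 1), is the inverse of the standard normal CDF applied to a uniform draw u. As a small standard-library sketch of that identity (the helper names are hypothetical, and statistics.NormalDist stands in for erfinv, which the math module does not provide):

```python
import math
from statistics import NormalDist


def uniform_to_normal(u):
    """Inverse-CDF transform; mathematically equal to sqrt(2) * erfinv(2 * u - 1)."""
    if not 0.0 < u < 1.0:
        # u == 0 or u == 1 would map to -inf / +inf, so such points must be excluded.
        raise ValueError("u must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(u)


def normal_cdf(z):
    """Standard normal CDF written with erf, the inverse of the transform above."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Round trip: mapping u to a normal and back through the CDF recovers u.
for u in (0.1, 0.5, 0.9):
    print(u, normal_cdf(uniform_to_normal(u)))
```

The guard on u shows why a quasi-random point that is exactly 0.0 cannot be fed through the transform.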
from tf-quant-finance.
OK, it's surprising that watch_params=False would be faster, as the documentation says the opposite.
Just changing that gives a better result, but how would you know in advance whether it should be True or False?
Another question: https://colab.research.google.com/github/arthurpham/google_colab/blob/f02d64e65e9ba59ca9ce33048f3459560cef7fa5/TARF_MC_Performance_TQF.ipynb
Let's assume I'm trying to determine a realistic latency and cost (assuming a $/sec price for GPU) for a pricing service relying on TQF. Say I receive a new trade with different inputs (the requests arrive at different times, so no batching is possible). I should be able to reuse the optimized function (maybe the input_signature of the tf.function is not configured properly?), but I don't observe that.
I would expect faster timings where I put the <======== markers (spot 18.2, 20.0 and 25.0).
So, is it possible to avoid retracing with XLA when changing the inputs (assuming no change in shapes)?
for spot in [15.0, 18.2, 20.0, 25.0]:
  spot = tf.constant(spot, dtype=dtype)
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf0 = time.time() - t
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf = time.time() - t
------------------------
TQF gpu TARF XLA price+delta+vega spot: 15.0
wall time + tracing: 2.576292037963867
options per second + tracing: 0.38815475313518205
wall time: 0.06866049766540527
options per second: 14.564415260623022
------------------------
------------------------
TQF gpu TARF XLA price+delta+vega spot: 18.2
wall time + tracing: 2.5903542041778564 <========
options per second + tracing: 0.3860475908611836
wall time: 0.06723594665527344
options per second: 14.872996510737284
------------------------
------------------------
TQF gpu TARF XLA price+delta+vega spot: 20.0
wall time + tracing: 2.6097450256347656 <========
options per second + tracing: 0.3831791957364766
wall time: 0.06735658645629883
options per second: 14.846358056591885
------------------------
------------------------
TQF gpu TARF XLA price+delta+vega spot: 25.0
wall time + tracing: 3.6820600032806396 <========
options per second + tracing: 0.27158710045708667
wall time: 0.0670621395111084
options per second: 14.911543343086402
------------------------
When I call print(greeks_fn_xla.pretty_printed_concrete_signatures()), I only see:
greeks_fn_xla(strikes, spot, sigma, rate, dividend)
Args:
strikes: float64 Tensor, shape=()
spot: float64 Tensor, shape=()
sigma: float64 Tensor, shape=()
rate: float64 Tensor, shape=()
dividend: float64 Tensor, shape=()
Returns:
(<1>, [<2>, <3>, <4>, <5>])
<1>: float64 Tensor, shape=()
<2>: float64 Tensor, shape=()
<3>: float64 Tensor, shape=()
<4>: float64 Tensor, shape=()
<5>: float64 Tensor, shape=()
from tf-quant-finance.
Also, could you please report the random number generation speed? I assume you'd need [200_000, 53] samples from Sobol that you then convert to normals. Could you please let me know how long that takes?
The library that I forked to add the payoff is not mine; it comes from a book, and the code is shared on GitHub.
https://github.com/asavine/CompFinance/blob/6538e90c95993eebac6b728f1bf28b877bec9b5d/mcBase.h#L300
I don't have an easy way to measure what you are asking, but if I comment out the path generation and payoff, I think the random number generation takes roughly 40% of the total time in serial mode:
auto cRng = rng.clone();
cRng->init(cMdl->simDim());
// Iterate through paths
for (size_t i = 0; i<nPath; i++)
{
    // Next Gaussian vector, dimension D
    cRng->nextG(gaussVec);
    // Generate path, consume Gaussian vector
    ////cMdl->generatePath(gaussVec, path);
    // Compute result
    ////prd.payoffs(path, results[i]);
}
from tf-quant-finance.
Hi Arthur,
For TF, I checked that the random numbers also take roughly 40% of the compute time.
As for your questions:
- Compiled functions expect tensor inputs, not numpy objects, so it should be something like
for spot in [tf.convert_to_tensor(x, tf.float64) for x in [15.0, 18.2, 20.0, 25.0]]:
  ...
- watch_params works well when there is a single expiry in the options (so you do not record intermediate values). I could try fixing that, as I would need to adjust how the gradient flows through tf.TensorArray. The idea was to mock XLA-friendly support for forward gradients.
from tf-quant-finance.
I've pushed a change so that watch_params should work fine now. Could you please double check? On my Mac CPU I get ~1.13 secs for greeks and price calculation (on a single thread).
from tf-quant-finance.
Compiled functions expect tensor inputs, not numpy objects, so should be something like
In my example above, the spot was converted with spot = tf.constant(spot, dtype=dtype).
I tried your suggestion, and I still don't get the correct behavior: each time the spot value is changed, the first call to greeks_fn_xla is very slow.
for spot in [tf.convert_to_tensor(x, tf.float64) for x in [15.0, 18.2, 20.0, 25.0]]:
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf0 = time.time() - t
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  time_tqf = time.time() - t
------------------------
TQF gpu TARF XLA price+delta+vega spot: 15.0
wall time + tracing: 2.9289374351501465 <=============
options per second + tracing: 0.34142074460144173
wall time: 0.12999844551086426
options per second: 7.692399675013251
------------------------
------------------------
TQF gpu TARF XLA price+delta+vega spot: 18.2
wall time + tracing: 2.884617805480957 <=============
options per second + tracing: 0.346666375732666
wall time: 0.13253283500671387
options per second: 7.5453000001798936
------------------------
------------------------
TQF gpu TARF XLA price+delta+vega spot: 20.0
wall time + tracing: 2.8701701164245605 <=============
options per second + tracing: 0.34841140400615833
wall time: 0.13067197799682617
options per second: 7.652750155999693
------------------------
------------------------
TQF gpu TARF XLA price+delta+vega spot: 25.0
wall time + tracing: 3.2776947021484375 <=============
options per second + tracing: 0.3050924783642991
wall time: 0.13060975074768066
options per second: 7.656396205302135
------------------------
I've pushed a change so that watch_params should work fine now. Could you please double check? On my mac CPU I get ~1.13 secs for greeks and price calculation (on a single thread).
Thank you for the quick fix; it seems to make a difference in a quick test.
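The measurements in this thread time the first call (which includes tracing and compilation) separately from a repeat call. A small helper can make that split explicit and average the repeat calls. This is only a sketch: time_first_and_steady is a hypothetical name, and it works with any Python callable rather than anything specific to TF.

```python
import time


def time_first_and_steady(fn, *args, repeats=5):
    """Returns (first-call seconds, mean seconds over `repeats` later calls)."""
    start = time.perf_counter()
    fn(*args)  # first call: for a tf.function this would include tracing/compilation
    first_call = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)  # later calls with the same input signature
    steady = (time.perf_counter() - start) / repeats
    return first_call, steady


# Stand-in workload; in the notebook this would be the compiled pricing function.
first, steady = time_first_and_steady(lambda n: sum(range(n)), 100_000)
print(f"first call: {first:.6f}s, steady state: {steady:.6f}s")
```

If the first-call time stays high for every new input value while the steady-state time is low, the function is being rebuilt per input rather than reused.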
from tf-quant-finance.
You need to remove the @tf.function decorator from tarf_payoff. I'm not sure why this is causing a problem here, but generally one should avoid nested tf.functions for XLA compilation.
Just in case, here is a version of the pricer incorporating all the suggestions above:
def set_up_pricer_xla(times, watch_params=False):
  def price_eu_options(strikes, spot, sigma, rate, dividend):
    # Define drift and volatility functions.
    def drift_fn(t, x):
      del t, x
      return rate - dividend - 0.5 * sigma**2
    def vol_fn(t, x):
      del t, x
      return tf.reshape(sigma, [1, 1])
    # Use GenericItoProcess class to set up the Ito process
    process = tff.models.GenericItoProcess(
        dim=1,
        drift_fn=drift_fn,
        volatility_fn=vol_fn,
        dtype=dtype)
    log_spot = tf.math.log(tf.reduce_mean(spot))
    if watch_params:
      watch_params_list = [sigma, rate, dividend]
    else:
      watch_params_list = None
    # Feed a new version of Sobol numbers skipping the 1st element which is `0.0`
    sobol_seq = tff.math.qmc.sobol_sample(num_results=num_samples + 1,
                                          dim=num_timesteps,
                                          dtype=tf.float64)[1:]
    # Convert uniform draws to normals
    normal_draws = tf.math.erfinv((sobol_seq[..., tf.newaxis] - 0.5) * 2) * np.sqrt(2)
    paths = process.sample_paths(
        times=times, num_samples=num_samples,
        initial_state=log_spot,
        watch_params=watch_params_list,
        times_grid=times,
        normal_draws=normal_draws)
    def tarf_payoff(paths):
      # Shape [num_timesteps, num_samples]
      paths = tf.transpose(paths)
      cur_spot = paths[0]
      total = tf.zeros_like(cur_spot)
      discounted_payoff = tf.zeros_like(cur_spot)
      df = tf.constant(1.0, dtype=tf.float64)
      is_active = tf.ones([num_samples], dtype=tf.bool)
      i = tf.constant(0, dtype=tf.int32)
      # Explicitly define the while_loop
      def cond(i, is_active, total, discounted_payoff, df):
        return i < num_timesteps
      def body(i, is_active, total, discounted_payoff, df):
        # Here Tensors are of shape `[num_samples]`
        cur_spot = paths[i]
        cashflow = tf.zeros_like(cur_spot)
        new_is_active = K_knockout > cur_spot
        add_cashflow = tf.where(tf.logical_or(K_upper <= cur_spot, cur_spot < K_lower),
                                True,
                                False)
        new_cashflow = tf.where(K_upper <= cur_spot,
                                cur_spot - strike,
                                cashflow
                                )
        new_cashflow = tf.where(cur_spot < K_lower,
                                step_up_ratio*(cur_spot - strike),
                                new_cashflow)
        new_is_active = tf.where(add_cashflow,
                                 tf.where(total + new_cashflow >= tarf_target,
                                          False, new_is_active),
                                 new_is_active)
        new_cashflow = tf.where(add_cashflow,
                                tf.where(total + new_cashflow >= tarf_target,
                                         tarf_target - total, new_cashflow),
                                new_cashflow
                                )
        new_total = tf.where(add_cashflow,
                             total + new_cashflow,
                             total
                             )
        new_discounted_payoff = tf.where(add_cashflow,
                                         discounted_payoff + df * new_cashflow,
                                         discounted_payoff)
        # Update values only if active
        new_cashflow = tf.where(is_active,
                                new_cashflow,
                                cashflow)
        new_total = tf.where(is_active, new_total, total)
        new_discounted_payoff = tf.where(is_active,
                                         new_discounted_payoff,
                                         discounted_payoff)
        new_is_active = tf.where(is_active, new_is_active, is_active)
        return (i + 1, new_is_active, new_total, new_discounted_payoff, df)
      _, is_active, total, discounted_payoff, _ = tf.while_loop(
          cond, body, (i, is_active, total, discounted_payoff, df),
          maximum_iterations=num_timesteps,
      )
      return discounted_payoff
    reshaped_paths = tf.reshape(tf.math.exp(paths), [num_samples, num_timesteps])
    payoffs = tarf_payoff(reshaped_paths)
    prices = tf.reduce_mean(payoffs)
    return prices
  return price_eu_options
Now I can define
price_eu_options = set_up_pricer_xla(times, watch_params=True)
@tf.function(jit_compile=True,
             input_signature=[tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64),
                              tf.TensorSpec([], dtype=tf.float64)
                             ])
def greeks_fn_xla(strikes, spot, sigma, rate, dividend):
  with tf.GradientTape() as tape:
    tape.watch([spot, sigma, rate, dividend])
    prices = price_eu_options(strikes, spot, sigma, rate, dividend)
  return prices, tape.gradient(prices, [spot, sigma, rate, dividend])
and try running for different spot values:
for spot in [tf.convert_to_tensor(x, tf.float64) for x in [25.0, 28.2, 30.0, 45.0]]:
  t = time.time()
  tarf_price, tarf_greeks = greeks_fn_xla(strikes, spot, sigma, rate, dividend)
  print(time.time() - t)
This works as expected for me.
from tf-quant-finance.
You need to remove @tf.function decorator from tarf_payoff. Not sure why this causing a problem here but generally one should avoid nested tf.functions for XLA compilation.
Yes, that was the problem. Thanks a lot for being patient.
from tf-quant-finance.