About 30% of the entire Pandora runtime goes into calculating an arctan and a tan.

Faster tan and arctan possible? (pandora issue, closed)

hippke commented on August 26, 2024

Faster tan and arctan possible?

from pandora.

Comments (7)

hippke commented on August 26, 2024

CORDIC-like method is ~5x slower (cached table, 60 entries). Accuracy would be sufficient.

Pre-calculate table (takes negligible time):

import numpy as np

cordic_table_size = 60
l1 = np.empty(cordic_table_size)  # powers of two: 2**-idx
l2 = np.empty(cordic_table_size)  # rotation angles: arctan(2**-idx)
for idx in range(cordic_table_size):
    exp = np.exp2(-idx)
    l1[idx] = exp
    l2[idx] = np.arctan(exp)

Call function (too slow):

from numba import jit

@jit(cache=False, nopython=True, fastmath=True)
def arctan_cordic(y, l1, l2, x=1.0):
    # Vectoring mode: rotate (x, y) towards the x-axis and sum the applied angles in r
    r = 0.0
    for idx in range(len(l1)):
        if y < 0:
            r, x, y = r - l2[idx], x - l1[idx]*y, y + l1[idx]*x
        else:
            r, x, y = r + l2[idx], x + l1[idx]*y, y - l1[idx]*x
    return r
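
To back up the "accuracy would be sufficient" remark, here is a minimal sketch of the same iteration in plain Python (no Numba, standard library only), compared against math.atan. The sample inputs and the names POW2, ATAN and max_err are illustrative assumptions, not part of the pandora code.

```python
import math

TABLE_SIZE = 60  # same table length as in the comment above

# Precompute 2**-i and arctan(2**-i) for each iteration.
POW2 = [2.0 ** -i for i in range(TABLE_SIZE)]
ATAN = [math.atan(p) for p in POW2]


def arctan_cordic(y, x=1.0):
    """Vectoring-mode CORDIC: rotate (x, y) towards the x-axis, summing the angles."""
    r = 0.0
    for p, a in zip(POW2, ATAN):
        if y < 0:
            r, x, y = r - a, x - p * y, y + p * x
        else:
            r, x, y = r + a, x + p * y, y - p * x
    return r


# Largest absolute deviation from math.atan over a few arbitrary sample inputs.
samples = [-50.0, -1.0, -0.3, 0.0, 0.3, 1.0, 50.0]
max_err = max(abs(arctan_cordic(v) - math.atan(v)) for v in samples)
print(f"max abs error: {max_err:.2e}")
```

This only checks accuracy; it says nothing about speed, which is where the Numba version above falls short.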

hippke commented on August 26, 2024

20x faster with new numpy
numpy/numpy#19478

hippke commented on August 26, 2024

New numpy with AVX512 is not released yet
Will only make Intel CPUs faster (and future AMD CPUs, if they get AVX512?), and currently only on Linux
Installed numpy dev from source:
have to modify "setup.py" with version numbers (lines 60ff)
then pip install .
Own test of arctan on 1 million linspace values: 1s vs 0.2s (~5x)
In a new env, numba installs the (old) numpy --> install everything else first, then the new numpy version
Problem: numba requires numpy version >1.17 <1.20
This can be changed, but it is linked to the github version (hard to change)
Not sure if it would break numba
Easier to extract the arctan source?
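
A rough way to repeat the linspace timing on any installed numpy version is sketched below. The input range (-10 to 10), the repeat count and the helper name time_arctan are assumptions for illustration; the original test setup is not shown in this thread.

```python
import timeit

import numpy as np


def time_arctan(n=1_000_000, repeats=5):
    """Return the best wall-clock time (seconds) of np.arctan over n linspace values."""
    values = np.linspace(-10.0, 10.0, n)  # input range is an arbitrary choice
    timings = timeit.repeat(lambda: np.arctan(values), number=1, repeat=repeats)
    return min(timings)


print(f"numpy {np.__version__}: best of 5 = {time_arctan():.4f} s")
```

Running this once in the old environment and once in the one with the dev build gives the two numbers to compare.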

hippke commented on August 26, 2024

Will be released with Numpy 1.22 https://numpy.org/devdocs/release/1.22.0-notes.html

hippke commented on August 26, 2024

Can switch off the numba warning in __init__.py
ellipse_pos_iter is hardly faster (9.18 vs 8.96)
for ellipse_pos: (10.92 8.01) vs (7.82 7.91)

hippke commented on August 26, 2024

Try CUDA https://numba.discourse.group/t/cant-use-basic-numpy-functions-with-cuda-like-zeros-or-empty/628/3

hippke commented on August 26, 2024

It takes ~0.1ms to transfer data from CPU to GPU and back, i.e. <10k models/sec even at zero model calc time
==> not worth using GPU (or only in vectorized case which is hard to implement)
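
The throughput ceiling follows directly from the transfer time. A small sketch of the arithmetic, where max_models_per_sec is a hypothetical helper and the batch size of 100 is an assumed value used only to show why the vectorized case behaves differently:

```python
def max_models_per_sec(transfer_s, compute_s=0.0, batch=1):
    """Upper bound on throughput when each batch pays one CPU<->GPU round trip."""
    return batch / (transfer_s + batch * compute_s)


transfer_s = 0.1e-3  # ~0.1 ms round trip, the figure quoted above

# One model per transfer, zero compute time: the ceiling from the comment.
print(max_models_per_sec(transfer_s))

# Hypothetical vectorized case: 100 models share a single round trip.
print(max_models_per_sec(transfer_s, batch=100))
```

With one model per call the transfer dominates regardless of how fast the kernel is; only batching many models per transfer raises the ceiling.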
