Comments (4)
I wrote a rough approximation to calculate the ops of sigmoid and swish, but I don't know whether it is correct, so I am posting the code below for discussion.
```python
import torch


def count_sigmoid(m, x, y):
    """
    Using this approximation for the exponential operation:
        exp(x) = 1 + x + x^2/2! + ... + x^9/9!
    For sigmoid f(x) = 1/(1+exp(-x)) there are in total 10 add ops and
    9 division ops (the factorials are treated as precomputed constants).
    Since sigmoid is an element-wise operation, the final count is about
    (10+9) * num_elements ops.
    """
    x = x[0]
    nelements = x.numel()
    total_ops = 19 * nelements
    m.total_ops += torch.Tensor([int(total_ops)])


def count_swish(m, x, y):
    """
    swish(x) = x * sigmoid(x), which adds one multiplication per element,
    so the total is 20 * num_elements ops. See count_sigmoid above.
    """
    x = x[0]
    nelements = x.numel()
    total_ops = 20 * nelements
    m.total_ops += torch.Tensor([int(total_ops)])
```
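For completeness, here is how these counters could be hooked into thop, a minimal sketch assuming the `custom_ops` argument of `thop.profile`; the `Swish` module below is my own placeholder, since there is no built-in swish module assumed here:

```python
import torch
import torch.nn as nn
from thop import profile


class Swish(nn.Module):  # placeholder module, only for this sketch
    def forward(self, x):
        return x * torch.sigmoid(x)


model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.Sigmoid(), Swish())
dummy = torch.randn(1, 3, 224, 224)

# Map module types to the custom counting hooks defined above;
# custom_ops entries override thop's default handlers.
macs, params = profile(
    model,
    inputs=(dummy,),
    custom_ops={nn.Sigmoid: count_sigmoid, Swish: count_swish},
)
print(macs, params)
```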
from pytorch-opcounter.
Hi Yue,
I agree that Sigmoid is an important function. Could you provide a reference for an existing implementation in a modern DL framework? I am not sure whether they use the Maclaurin series, since it is usually slow.
from pytorch-opcounter.
You are right that the Maclaurin approach is too slow. I tried to track down the low-level implementations of the sigmoid and exp operations, but failed, since it seems to involve a lot of details... Maybe some day I'll find them.
I investigated the cost of both relu and sigmoid in a GPU environment by simply running them iteratively (a sketch of the timing loop follows the table below). The results show that relu runs faster than sigmoid as the input size gets larger, but they do not show a fixed ratio between the cost of relu and sigmoid; I guess it is affected by other factors. In any case, sigmoid is not as much slower than relu as I expected.
- Tested on GPU (1x V100)
approach | rounds | input size | run time (s) | input size | run time (s) |
---|---|---|---|---|---|
relu | 1000 | 1,3,224,224 | 0.015681 | 128,256,32,32 | 0.015775 |
sigmoid | 1000 | 1,3,224,224 | 0.014912 | 128,256,32,32 | 0.322097 |
relu | 2000 | 1,3,224,224 | 0.029916 | 128,256,32,32 | 0.339013 |
sigmoid | 2000 | 1,3,224,224 | 0.029156 | 128,256,32,32 | 0.691235 |
relu | 10000 | 1,3,224,224 | 0.144602 | 128,256,32,32 | 3.093490 |
sigmoid | 10000 | 1,3,224,224 | 0.143373 | 128,256,32,32 | 3.465683 |
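For reference, a minimal sketch of the kind of timing loop I used (my assumption of the setup; `torch.cuda.synchronize()` is needed around the timer because CUDA kernels launch asynchronously):

```python
import time
import torch


def time_op(op, shape, rounds, device="cuda"):
    x = torch.randn(*shape, device=device)
    op(x)                        # warm-up launch
    torch.cuda.synchronize()     # wait for pending kernels
    start = time.time()
    for _ in range(rounds):
        op(x)
    torch.cuda.synchronize()     # make sure all rounds finished
    return time.time() - start


for shape in [(1, 3, 224, 224), (128, 256, 32, 32)]:
    for rounds in [1000, 2000, 10000]:
        t_relu = time_op(torch.relu, shape, rounds)
        t_sig = time_op(torch.sigmoid, shape, rounds)
        print(shape, rounds, f"relu={t_relu:.6f}s sigmoid={t_sig:.6f}s")
```

Without the synchronization, the measured time can reflect kernel launch overhead rather than actual compute, which might explain why the relu number for 1000 rounds on the large input looks out of line with the 2000-round one.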
from pytorch-opcounter.
I checked the documentation for Eigen and found that they use a series to approximate.
> The cost of the computation is approximately $20 n^3$ for matrices of size $n$. The number 20 depends weakly on the norm of the matrix.
https://eigen.tuxfamily.org/dox/unsupported/group__MatrixFunctions__Module.html#matrixbase_exp
For a single number, I agree 20 is a reasonable count. Can you draft a PR to support Sigmoid?
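As a quick sanity check on the per-element estimate (my own sketch, not from the library), here is the 10-term series from the first comment compared against `torch.sigmoid`:

```python
import math
import torch


def series_sigmoid(x, terms=10):
    # sigmoid(x) = 1 / (1 + exp(-x)); exp is replaced by the 10-term
    # Maclaurin series exp(t) = 1 + t + t^2/2! + ... + t^9/9!
    t = -x
    e = sum(t**k / math.factorial(k) for k in range(terms))
    return 1.0 / (1.0 + e)


x = torch.linspace(-3, 3, 7)
print(series_sigmoid(x))
print(torch.sigmoid(x))
print((series_sigmoid(x) - torch.sigmoid(x)).abs().max())
```

Near zero the truncated series tracks `torch.sigmoid` closely, with the error growing toward the ends of the range, so treating sigmoid as roughly 20 ops per element looks like a reasonable ballpark for the counter.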
from pytorch-opcounter.