Comments (4)
I wrote a rough approximation to calculate the ops of sigmoid and swish, but I don't know whether it is correct, so I am posting the code below for discussion.
```python
import torch


def count_sigmoid(m, x, y):
    """
    Using this approximation for the exponential operation:
        exp(x) = 1 + x + x^2/2! + ... + x^9/9!
    For sigmoid f(x) = 1/(1+exp(-x)) there are in total 10 add ops and
    9 division ops (the factorials are treated as precomputed constants).
    Since sigmoid is an element-wise operation, the final count is about
    (10+9) * num_elements ops.
    """
    x = x[0]
    nelements = x.numel()
    total_ops = 19 * nelements
    m.total_ops += torch.Tensor([int(total_ops)])


def count_swish(m, x, y):
    """
    swish(x) = x * sigmoid(x), which adds one multiplication per element,
    so the total is 20 * num_elements ops. See count_sigmoid above.
    """
    x = x[0]
    nelements = x.numel()
    total_ops = 20 * nelements
    m.total_ops += torch.Tensor([int(total_ops)])
```
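For completeness, here is how these counters could be hooked into thop, a minimal sketch assuming the `custom_ops` argument of `thop.profile`; the `Swish` module below is my own placeholder, since there is no built-in swish module assumed here:

```python
import torch
import torch.nn as nn
from thop import profile


class Swish(nn.Module):  # placeholder module, only for this sketch
    def forward(self, x):
        return x * torch.sigmoid(x)


model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.Sigmoid(), Swish())
dummy = torch.randn(1, 3, 224, 224)

# Map module types to the custom counting hooks defined above;
# custom_ops entries override thop's default handlers.
macs, params = profile(
    model,
    inputs=(dummy,),
    custom_ops={nn.Sigmoid: count_sigmoid, Swish: count_swish},
)
print(macs, params)
```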
from pytorch-opcounter.
Hi Yue,
I agree that Sigmoid is an important function. Could you provide a reference for an existing implementation in a modern DL framework? I am not sure whether they use the Maclaurin series, since it is usually slow.
from pytorch-opcounter.
You are right that the Maclaurin approach is too slow. I tried to track down the low-level implementations of the sigmoid and exp operations, but failed, since it seems to involve a lot of details... Maybe some day I'll find them.
I investigated the cost of both relu and sigmoid in a GPU environment by simply running them iteratively (a sketch of the timing loop follows the table below). The results show that relu runs faster than sigmoid as the input size gets larger, but they do not show a fixed ratio between the cost of relu and sigmoid; I guess it is affected by other factors. In any case, sigmoid is not as much slower than relu as I expected.
- Tested on GPU (1x V100)
approach | rounds | input size | run time (s) | input size | run time (s) |
---|---|---|---|---|---|
relu | 1000 | 1,3,224,224 | 0.015681 | 128,256,32,32 | 0.015775 |
sigmoid | 1000 | 1,3,224,224 | 0.014912 | 128,256,32,32 | 0.322097 |
relu | 2000 | 1,3,224,224 | 0.029916 | 128,256,32,32 | 0.339013 |
sigmoid | 2000 | 1,3,224,224 | 0.029156 | 128,256,32,32 | 0.691235 |
relu | 10000 | 1,3,224,224 | 0.144602 | 128,256,32,32 | 3.093490 |
sigmoid | 10000 | 1,3,224,224 | 0.143373 | 128,256,32,32 | 3.465683 |
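For reference, a minimal sketch of the kind of timing loop I used (my assumption of the setup; `torch.cuda.synchronize()` is needed around the timer because CUDA kernels launch asynchronously):

```python
import time
import torch


def time_op(op, shape, rounds, device="cuda"):
    x = torch.randn(*shape, device=device)
    op(x)                        # warm-up launch
    torch.cuda.synchronize()     # wait for pending kernels
    start = time.time()
    for _ in range(rounds):
        op(x)
    torch.cuda.synchronize()     # make sure all rounds finished
    return time.time() - start


for shape in [(1, 3, 224, 224), (128, 256, 32, 32)]:
    for rounds in [1000, 2000, 10000]:
        t_relu = time_op(torch.relu, shape, rounds)
        t_sig = time_op(torch.sigmoid, shape, rounds)
        print(shape, rounds, f"relu={t_relu:.6f}s sigmoid={t_sig:.6f}s")
```

Without the synchronization, the measured time can reflect kernel launch overhead rather than actual compute, which might explain why the relu number for 1000 rounds on the large input looks out of line with the 2000-round one.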
from pytorch-opcounter.
I checked the documentation for Eigen and found that they use a series to approximate.
> The cost of the computation is approximately $20 n^3$ for matrices of size $n$. The number 20 depends weakly on the norm of the matrix.
https://eigen.tuxfamily.org/dox/unsupported/group__MatrixFunctions__Module.html#matrixbase_exp
For a single number, I agree 20 is a reasonable count. Can you draft a PR to support Sigmoid?
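As a quick sanity check on the per-element estimate (my own sketch, not from the library), here is the 10-term series from the first comment compared against `torch.sigmoid`:

```python
import math
import torch


def series_sigmoid(x, terms=10):
    # sigmoid(x) = 1 / (1 + exp(-x)); exp is replaced by the 10-term
    # Maclaurin series exp(t) = 1 + t + t^2/2! + ... + t^9/9!
    t = -x
    e = sum(t**k / math.factorial(k) for k in range(terms))
    return 1.0 / (1.0 + e)


x = torch.linspace(-3, 3, 7)
print(series_sigmoid(x))
print(torch.sigmoid(x))
print((series_sigmoid(x) - torch.sigmoid(x)).abs().max())
```

Near zero the truncated series tracks `torch.sigmoid` closely, with the error growing toward the ends of the range, so treating sigmoid as roughly 20 ops per element looks like a reasonable ballpark for the counter.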
from pytorch-opcounter.