Comments (6)
Thanks for the issue. The main reason we swapped to torch.sparse.mm is to utilize the reduce argument that PyTorch introduced for the CPU path (see the sketch after the benchmark output below). Besides that, on GPU, both functions should map to the same underlying implementation. You can verify this by correcting your benchmark: when benchmarking on GPUs, you need to avoid measuring warm-up times. The following code fixes this:
from time import time
import torch

torch.manual_seed(2022)

a = torch.rand(512, 512, dtype=torch.double).to_sparse().cuda()
b = torch.rand(512, 512, dtype=torch.double).to_sparse().cuda()

for i in range(100):
    if i == 20:
        # start the clock only after the first 20 iterations, so GPU warm-up is excluded
        sparse_start_time = time()
    y1 = torch.sparse.mm(a, b)
sparse_end_time = time()

for i in range(100):
    if i == 20:
        spmm_start_time = time()
    y2 = torch.spmm(a, b)
spmm_end_time = time()

print(y1)
print("=============================================")
print(y2)
print("=============================================")
print("torch.sparse.mm: ", (sparse_end_time - sparse_start_time), "s")
print("torch.spmm: ", (spmm_end_time - spmm_start_time), "s")
Output:
torch.sparse.mm: 5.9526426792144775 s
torch.spmm: 5.965423583984375 s
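For reference, here is a minimal sketch of the reduce argument mentioned above. It assumes a recent PyTorch release in which torch.sparse.mm accepts a reduce keyword for CSR-layout inputs on the CPU path; the exact supported layouts, reductions, and versions may differ from what is shown here:

import torch

torch.manual_seed(2022)

# sparsify a random matrix and convert to CSR (the reduce path targets CPU/CSR inputs)
a = torch.rand(8, 8)
a[a < 0.7] = 0
a_csr = a.to_sparse_csr()
b = torch.rand(8, 4)

out_sum = torch.sparse.mm(a_csr, b, reduce='sum')    # behaves like the ordinary sparse-dense matmul
out_mean = torch.sparse.mm(a_csr, b, reduce='mean')  # averages the contributions per output row
print(out_sum.shape, out_mean.shape)

With reduce='sum' the call matches the plain matmul; the other reductions replace the summation over nonzeros in each output row, which is what the CPU path of this library takes advantage of.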
Thank you very much for your prompt reply. I redid the test as per your reminder, and the result is consistent with what you said! 👍
import torch
import numpy as np
from time import time
import scipy.sparse as sp

torch.manual_seed(2022)
np.random.seed(2022)

a = np.random.rand(512, 512)
row = np.random.choice(np.arange(a.shape[0]), replace=False, size=int(a.shape[0] * 0.5))
col = np.random.choice(np.arange(a.shape[1]), replace=False, size=int(a.shape[1] * 0.5))
# zero out a random subset of entries so the matrix has some sparsity
a[row, col] = 0
sp_a = sp.coo_matrix(a)

for i in range(10000):
    if i == 20:
        # start timing after the first 20 iterations, as suggested above
        np_scipy_start_time = time()
    y0 = sp_a.dot(sp_a)
np_scipy_end_time = time()
# y0 = sp.coo_matrix(sp_a * sp_a)

a = torch.tensor(a).to_sparse().cuda()

for i in range(10000):
    if i == 20:
        sparse_start_time = time()
    y1 = torch.sparse.mm(a, a)
sparse_end_time = time()

for i in range(10000):
    if i == 20:
        spmm_start_time = time()
    y2 = torch.spmm(a, a)
spmm_end_time = time()

print("np_scipy: ", (np_scipy_end_time - np_scipy_start_time), "s")
print("torch.sparse.mm: ", (sparse_end_time - sparse_start_time), "s")
print("torch.spmm: ", (spmm_end_time - spmm_start_time), "s")
Output:
np_scipy: 1961.8122715950012 s
torch.sparse.mm: 275.561555147171 s
torch.spmm: 275.70165848731995 s
Hi, why does your approach avoid warm-up times? Thanks.
Btw, does the backend kernel for spmm call cuSPARSE kernels?
Warm-up times are avoided by simply measuring time from the 20th iteration onwards. I don't know if there is a better way to do this, but that's what I constantly use and have found to work quite well.
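To make that concrete, here is a small sketch of the same idea written out explicitly: run a few untimed warm-up iterations first, then time the remaining ones. The torch.cuda.synchronize() calls make sure queued GPU kernels have actually finished before the clock is read; they are an extra precaution on top of the pattern above, not something the benchmark in this thread relied on:

from time import time
import torch

a = torch.rand(512, 512, dtype=torch.double).to_sparse().cuda()
b = torch.rand(512, 512, dtype=torch.double).to_sparse().cuda()

warmup, iters = 20, 80
for _ in range(warmup):
    torch.sparse.mm(a, b)      # warm-up: CUDA context creation, kernel loading, caching

torch.cuda.synchronize()       # wait for the warm-up kernels to finish
start = time()
for _ in range(iters):
    y = torch.sparse.mm(a, b)
torch.cuda.synchronize()       # wait for the timed kernels to finish before stopping the clock
end = time()

print("torch.sparse.mm:", end - start, "s")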
The backward pass basically does two things: (1) compute the transposed version of the sparse matrix and (2) perform grad_mat = sparse_mat.t() @ grad_out.
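As an illustration only (this is a sketch of the idea, not the actual pytorch_sparse implementation), a custom autograd Function following that recipe for the gradient of the dense operand could look like the code below; note that returning None for the sparse operand means no gradient reaches the sparse matrix, which is the limitation discussed further below:

import torch

class SpMM(torch.autograd.Function):
    """Sketch of y = sparse_mat @ dense_mat with the backward rule described above."""

    @staticmethod
    def forward(ctx, sparse_mat, dense_mat):
        ctx.save_for_backward(sparse_mat)
        return torch.sparse.mm(sparse_mat, dense_mat)

    @staticmethod
    def backward(ctx, grad_out):
        sparse_mat, = ctx.saved_tensors
        t = sparse_mat.t().coalesce()            # (1) transposed sparse matrix
        grad_mat = torch.sparse.mm(t, grad_out)  # (2) grad_mat = sparse_mat.t() @ grad_out
        return None, grad_mat                    # no gradient w.r.t. the sparse matrix in this sketch

# usage sketch
a = torch.rand(16, 16).to_sparse()
x = torch.rand(16, 8, requires_grad=True)
SpMM.apply(a, x).sum().backward()
print(x.grad.shape)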
Thank you!
That's what ge-spmm did (https://github.com/hgyhungry/ge-spmm/blob/master/pytorch-custom/op.py).
However, I also found a post (https://discuss.pytorch.org/t/manually-calculate-the-gradient-of-a-sparse-matrix/86203/2?u=jiuhnny) which says this only gives the gradient of the dense matrix. The gradient of the sparse matrix is also required, namely df/dA = (df/dY) @ B.t(), as stated in the post.
Yes, that is correct.
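For completeness, that formula can be checked numerically with plain dense tensors. This only verifies the math; when A is actually stored sparsely, its gradient is typically restricted to A's nonzero positions as well:

import torch

torch.manual_seed(0)

A = torch.rand(4, 4)
A[A < 0.5] = 0                    # "sparse" pattern, but kept in dense storage for the check
A.requires_grad_()
B = torch.rand(4, 3, requires_grad=True)

Y = A @ B
Y.sum().backward()

grad_Y = torch.ones_like(Y)       # gradient of sum() w.r.t. Y is all ones
print(torch.allclose(A.grad, grad_Y @ B.t()))   # df/dA = (df/dY) @ B.t()
print(torch.allclose(B.grad, A.t() @ grad_Y))   # df/dB = A.t() @ (df/dY)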