Comments (14)
Thanks for reporting. It would be much appreciated if you could provide the model test_full_save.pt
used in your example. Even better if you could pinpoint which function causes the memory leak. Feel free to send a PR if you can, as I am in a very slow-response mode. Thank you!
from gotch.
Can you give me an email address? I will send the model to you. I tried to find which function it is, but it's hard for me; I am not familiar with C++.
from gotch.
I would try the following things:
- Run a little longer (from the graph, your run lasted 10 minutes and memory increased by ~1%). It may plateau?
- Try moving the input tensor tf outside the for-loop. Something like:
tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)
for i := 0; i < N; i++ {
	ts.NoGrad(func() {
		res, err := m.Forward(tf)
		if err != nil {
			panic(err)
		}
		res.MustDrop()
	})
	// tf.MustDrop()
	if i%1000 == 0 {
		fmt.Printf("Done %d \n", i)
	}
}
If there is no leak, then the problem is in tensor initialization with ts.MustRand.
The for-loop itself could be the problem. Depending on how you compose your server, could you try a real use case rather than a for-loop?
from gotch.
[11:51:13] Top 10 stacks with outstanding allocations:
1056 bytes in 1 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
1344 bytes in 7 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
1920 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
3520 bytes in 5 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
3520 bytes in 5 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
4560 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
5280 bytes in 55 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
5280 bytes in 5 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
19008 bytes in 99 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
23150 bytes in 125 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:19] Top 10 stacks with outstanding allocations:
1920 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
1920 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
3840 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
7040 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
7040 bytes in 10 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
7392 bytes in 7 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
9120 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
10560 bytes in 110 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
37824 bytes in 197 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
46300 bytes in 250 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:24] Top 10 stacks with outstanding allocations:
2880 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
2880 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
5760 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
9504 bytes in 9 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
10560 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
10560 bytes in 15 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
13680 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
15840 bytes in 165 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
56832 bytes in 296 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
69450 bytes in 375 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:29] Top 10 stacks with outstanding allocations:
3840 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
3840 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
7680 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
13728 bytes in 13 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
14080 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
14080 bytes in 20 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
18240 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
21120 bytes in 220 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
75840 bytes in 395 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
92600 bytes in 500 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:34] Top 10 stacks with outstanding allocations:
4800 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
4800 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
9600 bytes in 50 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
15840 bytes in 15 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
17600 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
17600 bytes in 25 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
22800 bytes in 50 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
26400 bytes in 275 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
94656 bytes in 493 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
115750 bytes in 625 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:39] Top 10 stacks with outstanding allocations:
5760 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
5776 bytes in 19 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
11520 bytes in 60 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
19008 bytes in 18 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
21120 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
21120 bytes in 30 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
27360 bytes in 60 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
31680 bytes in 330 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
113664 bytes in 592 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
138900 bytes in 750 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:44] Top 10 stacks with outstanding allocations:
6720 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
6720 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
13440 bytes in 70 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
23232 bytes in 22 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
24640 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
24640 bytes in 35 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
31920 bytes in 70 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
36960 bytes in 385 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
132672 bytes in 691 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
162050 bytes in 875 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:49] Top 10 stacks with outstanding allocations:
7680 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
7680 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
15360 bytes in 80 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
25344 bytes in 24 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
28160 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
28160 bytes in 40 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
36480 bytes in 80 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
42240 bytes in 440 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
151680 bytes in 790 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
185200 bytes in 1000 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:54] Top 10 stacks with outstanding allocations:
8640 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
8640 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
17280 bytes in 90 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
28512 bytes in 27 allocations from stack
c10::SmallVectorBase<unsigned int>::mallocForGrow(unsigned long, unsigned long, unsigned long&)+0x2f [libc10.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
31680 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
31680 bytes in 45 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
41040 bytes in 90 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)+0x32 [libtorch_cpu.so]
at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&)+0x45e [libtorch_cpu.so]
[unknown]
[unknown]
c10::TensorImpl::~TensorImpl() [clone .localalias.356]+0x0 [libc10.so]
[unknown]
47520 bytes in 495 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>)+0x23 [libtorch_cpu.so]
170688 bytes in 889 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[unknown]
208350 bytes in 1125 allocations from stack
operator new(unsigned long)+0x18 [libstdc++.so.6.0.25]
[11:51:59] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
[11:52:04] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
[11:52:09] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
[11:52:14] Top 10 stacks with outstanding allocations:
0 bytes in 48 allocations from stack
[unknown]
256 bytes in 8 allocations from stack
[unknown]
416 bytes in 2 allocations from stack
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
704 bytes in 2 allocations from stack
[unknown]
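The snapshots above can be read as a trend. The largest stack (the last entry of each snapshot) goes from 23150 bytes at 11:51:13 to 208350 bytes at 11:51:54. Below is a minimal Go sketch of that arithmetic; the byte counts are copied from the log, and the helper name `deltas` is chosen here purely for illustration.

```go
package main

import "fmt"

// largestStackBytes holds the byte count of the largest stack in each
// snapshot from 11:51:13 to 11:51:54, copied from the log above.
var largestStackBytes = []int64{
	23150, 46300, 69450, 92600, 115750, 138900, 162050, 185200, 208350,
}

// deltas returns the difference between each pair of consecutive samples.
// (Hypothetical helper, not part of any profiling tool.)
func deltas(samples []int64) []int64 {
	if len(samples) < 2 {
		return nil
	}
	out := make([]int64, 0, len(samples)-1)
	for i := 1; i < len(samples); i++ {
		out = append(out, samples[i]-samples[i-1])
	}
	return out
}

func main() {
	for i, d := range deltas(largestStackBytes) {
		fmt.Printf("interval %d: +%d bytes\n", i+1, d)
	}
}
```

Each 5-second interval adds the same 23150 bytes (125 allocations) to that stack, which looks like steady growth rather than a plateau within this window. The snapshots from 11:51:59 onward no longer show these stacks; the log alone does not say why.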
from gotch.
But I think any model may have memory leaks.
from gotch.
Please fork the repo and add your model file to a release, or add an entry in the examples,
and share your fork.
Alternatively, sharing it via Google Drive, Dropbox, or any public file-sharing service would be great. Thanks.
from gotch.
I ran a quick test of your example, putting the forward pass inside ts.NoGrad()
to completely shut down gradient accumulation (because the ts.Randn() op sets grad to true by default). I also increased the size of the tensor to expose any modest leak, and it seems to be fine. Memory on my box seems to be stable for at least 1M cycles.
package main

import (
	"fmt"

	"github.com/sugarme/gotch"
	"github.com/sugarme/gotch/ts"
)

func main() {
	TestModel()
}

func TestModel() {
	N := 1_000_000_000
	m, err := ts.ModuleLoad("test_full_save.pt")
	if err != nil {
		panic(err)
	}
	m.SetEval()
	for i := 0; i < N; i++ {
		// tf := ts.MustRand([]int64{1, 7}, gotch.Float, gotch.CPU)
		tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)
		ts.NoGrad(func() {
			res, err := m.Forward(tf)
			if err != nil {
				panic(err)
			}
			res.MustDrop()
		})
		tf.MustDrop()
		if i%1000 == 0 {
			fmt.Printf("Done %d \n", i)
		}
	}
}
Please always handle errors as well. Let me know if that works on your box.
Note that when running forward() in a for-loop, particularly in Go on CPU, we should expect to see some spiky fluctuation in memory consumption.
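The drop discipline in the example above (res.MustDrop() inside the closure, tf.MustDrop() after it) can be illustrated without libtorch. The sketch below uses a hypothetical stand-in type, `handle`, with a live counter in place of a real tensor. It is not gotch's API; it only shows one way to release every natively backed value on both the success and the error path.

```go
package main

import (
	"errors"
	"fmt"
)

// live counts handles that have been allocated but not yet dropped.
var live int

// handle is a hypothetical stand-in for a value backed by native memory.
type handle struct{ dropped bool }

func newHandle() *handle {
	live++
	return &handle{}
}

// Drop releases the handle; calling it twice is harmless.
func (h *handle) Drop() {
	if h.dropped {
		return
	}
	h.dropped = true
	live--
}

// forward is a stand-in for model inference: it either fails or
// returns a new handle that the caller owns.
func forward(in *handle, fail bool) (*handle, error) {
	if fail {
		return nil, errors.New("forward failed")
	}
	return newHandle(), nil
}

// infer allocates an input, runs forward, and releases everything it
// owns before returning, whether or not forward succeeded.
func infer(fail bool) error {
	in := newHandle()
	defer in.Drop()

	out, err := forward(in, fail)
	if err != nil {
		return err
	}
	defer out.Drop()
	return nil
}

func main() {
	for i := 0; i < 1000; i++ {
		_ = infer(i%10 == 0) // every tenth call takes the error path
	}
	fmt.Println("live handles:", live)
}
```

Using defer inside a per-request function keeps the release next to the allocation, so an early return on error cannot skip it. In a tight loop without a function boundary, explicit drops as in the example above serve the same purpose.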
from gotch.
@sugarme I used valgrind, and it still finds a memory leak. I rewrote your code and found through stress testing that memory is still growing, while the QPS has not increased.
from gotch.
@sugarme My service handles over 5000 QPS per node, so it easily reaches 1M cycles. It's a 20C/32G node.
from gotch.
@sugarme Sorry, I was on my way home just now. The service using gotch is already online. For now the cluster is restarted every day to make sure there is no OOM. The service code is not a for-loop; it computes once per request. I used valgrind to run 100 loops and detected a memory leak of 18B. I also ran a stress test for 2 days last week: service memory grew to 95% of memory, and then the service hit OOM and restarted.
Here is my online code. The request sends a [][]float64 array and I need to convert it to a [][]float64 tensor:
xPredict := tensors["x"].([][]float32)
for _, v := range xPredict {
	modelInput = append(modelInput, v...)
}
tf := ts.MustOfSlice(modelInput).MustView([]int64{int64(len(xPredict)), int64(len(xPredict[0]))}, true)
forward, err := model.Forward(tf)
if err != nil {
	log.V2.Error().Str("local inf fail").With(ctx).Error(err).Emit()
} else {
	//toString, _ := forward.ToString(10)
	log.V2.Info().With(ctx).Str("local inf success").Emit()
}
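The flattening step in the snippet above indexes xPredict[0] and assumes every row has the same length. Below is a minimal, libtorch-free sketch of that step with those assumptions checked. `flatten` is a hypothetical helper name and is not part of gotch.

```go
package main

import (
	"errors"
	"fmt"
)

// flatten copies a rectangular [][]float32 into one row-major slice and
// returns the {rows, cols} shape that a later view call would need.
// It rejects empty and ragged input instead of panicking or silently
// producing a slice whose length does not match the shape.
func flatten(rows [][]float32) (data []float32, shape []int64, err error) {
	if len(rows) == 0 || len(rows[0]) == 0 {
		return nil, nil, errors.New("empty input")
	}
	cols := len(rows[0])
	// A fresh slice per call, so nothing accumulates across requests.
	data = make([]float32, 0, len(rows)*cols)
	for i, r := range rows {
		if len(r) != cols {
			return nil, nil, fmt.Errorf("row %d has %d values, want %d", i, len(r), cols)
		}
		data = append(data, r...)
	}
	return data, []int64{int64(len(rows)), int64(cols)}, nil
}

func main() {
	data, shape, err := flatten([][]float32{{1, 2, 3}, {4, 5, 6}})
	fmt.Println(data, shape, err)
}
```

Where modelInput is declared is not shown in the snippet; if it outlives a single request, appending to it would grow it on every call. Note also that the snippet never drops tf or forward, whereas the maintainer's test loop earlier in the thread calls MustDrop on both the input and the result. Whether either point accounts for the growth reported here is not established in the thread.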
from gotch.
![image](https://private-user-images.githubusercontent.com/28078734/333236253-f32b24b5-b6c9-42f2-8dc8-3fb34f5ac641.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MjcyNTIsIm5iZiI6MTcxOTQyNjk1MiwicGF0aCI6Ii8yODA3ODczNC8zMzMyMzYyNTMtZjMyYjI0YjUtYjZjOS00MmYyLThkYzgtM2ZiMzRmNWFjNjQxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE4MzU1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE5YjMzMDgyMDFhMzc1NjJjODdhNWYzNmI2MTY1NDIxM2Y3ZGVmMDUyYWMxOGFlOTQ2MWM0ZTQ5OTg4MDhkYTUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.WzocdWvKXcjeASFB1SWXxaq_fouA5D4Pk2YpTdkCyoo)
from gotch.
I suspect this is related to the closed issue #102.
It would be great if you could try the suggested solution:
func Rand(...) (...) {
var untypedPtr uintptr
ptr := (*lib.Ctensor)(unsafe.Pointer(&untypedPtr))
// Some C call that stores an allocated tensor at *ptr.
retVal = &Tensor{ctensor: *ptr}
return retVal, err
}
Thank you.
from gotch.
@sugarme I tried it, but it does not work. You can check with valgrind: it still leaks memory. I also ran a stress test, and memory still keeps going up.
from gotch.
@sugarme With the above changes applied, there is still a memory leak. Please help me solve it.
from gotch.