Comments (19)
We will try to reproduce it in our lab by enabling recording and setting memory limits on the Docker container.
from janus-gateway.
Could we have a partial debug_locks implementation, only for the locks on session creation, to debug it?
Under some circumstances janus stops handling any api calls via ws
Do other requests work though? E.g. "info" request or attaching a handle for a different plugin. That could help understand whether the deadlock lies in the transport (WebSocket) or somewhere else.
but create_session has timeouts and janus doesn't work
Do you mean plugin->create_session? If that's the case and you are using the videoroom plugin, then it's very likely a deadlock on videoroom sessions_mutex.
We tried to enable debug_locks, but the janus server then ate all memory and I/O and got stuck, so we hit performance problems before the issue appeared
The debug_locks option has a massive impact on verbosity, which is probably the reason for the performance issue. If you are using Janus in a containerized environment with cgroups v2, the rapidly growing log file might increase the allocated memory (because pages are kept in the buffer cache), which might explain the OOM.
Could we have a partial debug_locks implementation, only for the locks on session creation, to debug it?
There is no such option available. You might try customizing the code to log just the sessions_mutex lock/unlock (assuming my previous guess is correct).
Do you mean plugin->create_session? If that's the case and you are using the videoroom plugin, then it's very likely a deadlock on videoroom sessions_mutex.
I mean the following request isn't working
{
"janus" : "create",
"transaction" : "<random alphanumeric string>"
}
Do other requests work though? E.g. "info" request or attaching a handle for a different plugin. That could help understand whether the deadlock lies in the transport (WebSocket) or somewhere else.
I haven't tried this, but I can try next time this happens.
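For the next occurrence, a minimal sketch of the two probe messages could look like the following. It only builds the JSON payloads; sending them over the existing WebSocket connection is left out. The helper names are hypothetical, and the shape of the "info" request is assumed to mirror the "create" request shown above.

```python
import json
import secrets
import string


def make_transaction(length: int = 12) -> str:
    """Return a random alphanumeric transaction string."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def build_request(janus_type: str) -> str:
    """Build a session-less Janus API request as a JSON string.

    Assumption: "info" uses the same top-level fields as "create".
    """
    return json.dumps({"janus": janus_type, "transaction": make_transaction()})


# The request that currently times out, and a second one to compare against.
create_msg = build_request("create")
info_msg = build_request("info")
```

If "info" still gets an answer while "create" times out, the transport is probably alive and the problem is more likely in session handling.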
Taking a closer look at the logs you shared, the issue seems to start with some errors:
Didn't receive audio for more than 1 second(s)...
Didn't receive video for more than 1 second(s)...
SRTCP unprotect error: srtp_err_status_replay_fail
Error setting ICE locally
Those remind me of situations where the host memory is being exhausted (like the issue about cgroups v2 I already mentioned).
Are you running janus in containers with a memory limit?
Are you doing long lasting recordings?
If you suspect a memory leak, try running your janus app in a lab environment under valgrind.
Are you running janus in containers with a memory limit?
we are running it in containers, but without memory limits set
Are you doing long lasting recordings?
yes
All right, this is a long shot, but can you check the status of the memory in the containers?
cat /sys/fs/cgroup/system.slice/docker-<long ID>.scope/memory.max
cat /sys/fs/cgroup/system.slice/docker-<long ID>.scope/memory.stat
Replace <long ID> with the ID of the Docker container.
If you see the "file" bytes amount approaching the memory.max limit, then containers under cgroups v2 will start having issues with memory allocation, since they share the same memory for network and file buffers.
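As a rough sketch of that check, the following parses the contents of the two files and reports how much of the limit the "file" bytes take up. The function names are hypothetical. It assumes the cgroup v2 format, where memory.stat is a list of "key value" lines and memory.max holds either a byte count or the literal string max when no limit is configured.

```python
from typing import Dict, Optional


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def parse_memory_max(text: str) -> Optional[int]:
    """Return the limit in bytes, or None when the file says 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def file_cache_ratio(stat_text: str, max_text: str) -> Optional[float]:
    """Fraction of the memory limit used by file-backed pages, if a limit exists."""
    limit = parse_memory_max(max_text)
    if limit is None:
        return None
    return parse_memory_stat(stat_text).get("file", 0) / limit


# Sample values only; read the real files from the container's cgroup directory.
sample_stat = "anon 21372928\nfile 4861952\nsock 0\n"
ratio_unlimited = file_cache_ratio(sample_stat, "max\n")
ratio_limited = file_cache_ratio(sample_stat, "9723904\n")
```

A ratio close to 1.0 while the problem is happening would support the buffer-pressure theory; None means no limit is set on that cgroup.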
We have proposed a potential patch (in the PR) for long-lasting recordings where basically we flush the buffers and tell the kernel to release the used pages every ~2MB of written data.
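The idea can be illustrated with a small Python sketch. This is not the code from the PR, which is not shown here: the class name, the exact 2 MiB threshold and the use of fsync plus posix_fadvise(POSIX_FADV_DONTNEED) are assumptions about one way to flush buffers and ask the kernel to drop cached pages.

```python
import os
import tempfile

FLUSH_THRESHOLD = 2 * 1024 * 1024  # assumed value for "every ~2MB"


class PageReleasingWriter:
    """File writer that periodically flushes and drops its cached pages."""

    def __init__(self, path: str, threshold: int = FLUSH_THRESHOLD) -> None:
        self._file = open(path, "wb")
        self._threshold = threshold
        self._unflushed = 0
        self.flush_count = 0

    def write(self, data: bytes) -> None:
        self._file.write(data)
        self._unflushed += len(data)
        if self._unflushed >= self._threshold:
            self._release()

    def _release(self) -> None:
        # Push user-space buffers to the kernel, then to disk.
        self._file.flush()
        fd = self._file.fileno()
        os.fsync(fd)
        # Hint that the cached pages of this file are no longer needed
        # (length 0 means "to the end of the file"). Not available everywhere.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        self._unflushed = 0
        self.flush_count += 1

    def close(self) -> None:
        self._release()
        self._file.close()


# Usage: three 1 MiB writes cross the threshold once before close().
with tempfile.TemporaryDirectory() as tmp:
    writer = PageReleasingWriter(os.path.join(tmp, "recording.bin"))
    for _ in range(3):
        writer.write(b"\0" * (1024 * 1024))
    flushes_before_close = writer.flush_count
    writer.close()
```

The fsync matters in this sketch because dirty pages generally cannot be dropped until they have been written back.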
We got this with another customer. We also run this in our own production, so if it happens on our servers we will enable debug_locks there; our load is not as high as on the customers' servers, so we will try to catch it that way.
got info from our customer, after janus restart
# cat /sys/fs/cgroup/system.slice/docker-9efbad4747099c922f9d8c1dc16a74e029e871a1e21cdcea1b5b8bfc7de47546.scope/memory.max
max
# cat /sys/fs/cgroup/system.slice/docker-9efbad4747099c922f9d8c1dc16a74e029e871a1e21cdcea1b5b8bfc7de47546.scope/memory.stat
anon 21372928
file 4861952
kernel_stack 294912
pagetables 131072
percpu 288
sock 0
shmem 0
file_mapped 0
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 21368832
active_anon 4096
inactive_file 4861952
active_file 0
unevictable 0
slab_reclaimable 444944
slab_unreclaimable 357096
slab 802040
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 48048
pgmajfault 0
pgrefill 0
pgscan 0
pgsteal 0
pgactivate 0
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 0
thp_collapse_alloc 0
We are unable to reproduce it on our test stand, but it consistently recurs in customers' environments.
got info from our customer, after janus restart
Those data are useless after a restart, we need them while the issue exists (btw, the max output is wrong, it should be a number).
We are unable to reproduce it on our test stand, but it consistently recurs in customers' environments.
If you suspect a deadlock, wait for the issue and then provide the output of this gdb snippet:
gdb -p "$(pidof janus)" --batch -ex "set print pretty on" -ex "set pagination off" -ex "thread apply all bt full"
Got a new log for the issue. Unfortunately the customer's janus is running in Docker with autoheal, so we can't get gdb output yet, but there is something strange in the log:
janus-1 | 2024-05-15T08:41:17.866180063Z Stopping server, please wait...
janus-1 | 2024-05-15T08:41:17.866215282Z Ending sessions timeout watchdog...
janus-1 | 2024-05-15T08:41:17.866220027Z Sessions watchdog stopped
janus-1 | 2024-05-15T08:41:17.866348571Z Closing transport plugins:
janus-1 | 2024-05-15T08:41:17.866984830Z WebSockets thread ended
Full log attached; the restart of janus by autoheal happened at 2024-05-15T08:41:28.
janus-20240515.txt
I see a whole bunch of Error setting ICE locally errors, which suggest a possibly broken management of handles and PeerConnections on the client side. A new PeerConnection is attempted on a handle that already has one (which seems to be confirmed by mentions of restart, which is what Janus thinks it is because it gets an SDP with unknown ICE credentials).
I see a whole bunch of Error setting ICE locally errors, which suggest a possibly broken management of handles and PeerConnections on the client side. A new PeerConnection is attempted on a handle that already has one (which seems to be confirmed by mentions of restart, which is what Janus thinks it is because it gets an SDP with unknown ICE credentials).
It seems to happen after janus restarts. Our clients are not connected directly to janus over ws and don't manage janus handles; we use server-side ws connections to janus and server-side handle management. It now seems that clients tried to ICE-restart their subscribe connections after the janus restart, but our server doesn't handle janus restarts correctly yet: we don't detect them, we don't recreate the necessary rooms, and our app logic doesn't recover from such a situation.