Giter VIP home page Giter VIP logo

Comments (16)

tomaswolf avatar tomaswolf commented on June 2, 2024 1

I guess we can close this issue then? I think it's clear by now that this is a race condition in the way Apache Karaf closes its shell/session, not some bug in Apache MINA SSHD.

from mina-sshd.

tomaswolf avatar tomaswolf commented on June 2, 2024

Can you provide a full server-side log at DEBUG or TRACE level including timestamps and everything from the moment the sever sent the EOF? (I.e., of the EOF message, exit code message, and everything after that.)

From just these log excerpts it's not possible to even guess what might going on.

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Can you provide a full server-side log at DEBUG or TRACE level (...)

Here comes the problem with that: As soon as debug logging is active, the problem is not reproducible anymore (at least on lots of tries).
So it seems to be a race-condition.
Will try to reproduce it and then provide such a log.

For what it's worth, here is a log without the problem arising (exit code 0) with the Closing gracefully:
sshdserver-exit-code-0.log

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

(...) Will try to reproduce it and then provide such a log. (...)

Attaching a log with problem occuring. To note again: Very less likely to occur with debug logging active.

Doing a diff between the problematic and successful ones shows the following (left=exit-code-0, right=exit-code-1). The order of the 2 unregisterChannel and the 2 close calls differ.
image

Log:
sshdserver-exit-code-1.log

from mina-sshd.

tomaswolf avatar tomaswolf commented on June 2, 2024

From the original report it already appears that the server closes the network connection before the final SSH_MSG_CHANNEL_CLOSE handshake. However, I'm missing a number of other expected log lines in these logs. Are these filtered somehow? (Possibly through the logging config, which might set DEBUG or TRACE only for certain paths?) If so, please provide unfiltered logs.

In both logs (successful or not) it is in any case very strange to see a session closing immediately after having sent the SSH_MSG_CHANNEL_CLOSE message. That should not happen. It looks as if the session gets shut down before the channel has been fully closed. Did you re-configure CoreModuleProperties.CHANNEL_CLOSE_TIMEOUT?

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Struggled a bit with the logger config, so log files contain only org.apache.sshd.server.session so far.
Will try to include the rest as well.

Did you re-configure CoreModuleProperties.CHANNEL_CLOSE_TIMEOUT?

No, CHANNEL_CLOSE_TIMEOUT is not set and should default to 5 sec.

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Let it run in a loop until occurence - with activated trace logs for all sshd-packages, it took 674 iterations to hit it.

Attaching log, including last successful commands and finishing with the problematic ones.
There are 2 exceptions showing-up at the end, especially org.apache.sshd.common.SshException: Write attempt on closing session: SSH_MSG_CHANNEL_CLOSE

logs-exit-1.zip

Test-script (just for info): plink-test.txt

from mina-sshd.

tomaswolf avatar tomaswolf commented on June 2, 2024

Thank you. That is useful and gives me a starting point to analyze this deeper. The log confirms my initial suspicion: it looks as if closing the channel by mistake also closes the session. Plus there's a race between this closing of the session (which is too early) and sending the CLOSE message for the channel. If the session closes earlier, the client doesn't get the CLOSE message and complains, plus we get that "write on closing session" exception on the server. If sending the CLOSE is earlier, the client might be happy, especially if it manages to send back its own CLOSE before the network connection goes down.

I'll try to come up with some unit test, and I'll have to dig deeper to figure out why the session is closed at all at that point. It should not be closed yet.

Not sure the race and bug are new since 2.10.0; but it's possible that changes made since then make an old bug surface now more frequently.

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Thank you!

Not knowing the code in detail, but maybe this change in 2.11.0 somehow could be related (just a wild guess based on changes there) #410

from mina-sshd.

tomaswolf avatar tomaswolf commented on June 2, 2024
  1. The issue is Windows-specific. The exception message "The specified network name is no longer available" comes from the Windows Java NIO2 implementation.
  2. The exception indicates that the client unexpectedly closed the session. Thus the server's read waiting on the client's SSH_MSG_CHANNEL_CLOSE is interrupted, which gives this exception. Then the server-side session is closed.
  3. I am not able to reproduce this error with plink 0.77 or 0.80 running against an Apache MINA SSHD 2.11.0 server.
  4. I do not see any bug or race in the channel closing code in Apache MINA SSHD (server part). I do see two (minor) bugs in our client-side code, but they cannot and do not affect the server-side handling of shells/commands.

Can you share a minimal server implementation with which you can reproduce this problem?

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Hi,
Created a simple example server code and can confirm, that I am also not able to reproduce the issue with that.

The SSH-server which shows the issue is the one that Apache Karaf starts-up for its client-shell.
You see I oriented on their way of starting it up, code, difference between my simple test and Karaf-integrated SSH-server are the implementations for authentication etc. so issue could also lay in the combination with those.
Will look into that a bit more.

from mina-sshd.

tomaswolf avatar tomaswolf commented on June 2, 2024

We may be getting somewhere now... I find the ShellFactoryImpl interesting.

This destroy() method kills the SSH session. (Immediate close; which is what we see in your logs.) I suspect this is the culprit: if this closing of the session occurs before the client has sent back its SSH_MSG_CHANNEL_CLOSE reply, then putty won't be happy.

Now destroy() is normally being called when the ChannelSession has been closed (the SSH_MSG_CHANNEL_CLOSE request/reply dance having been done). But I also see it being passed to the sessionFactory in line 106/107. I don't know when that runs. If that closes the SSH session before the SSH_MSG_CHANNEL_CLOSE exchange has completed, then putty will complain.

So let me modify my point (2) from above: it's not the client that unexpectedly closes the session, it's the server itself via this destroy().

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Ah, remembering this change apache/karaf#1427 which was added after upgrade to sshd 2.5.1. Maybe that should not be done anymore in combination with the recent versions.

from mina-sshd.

tomaswolf avatar tomaswolf commented on June 2, 2024

Line 122 itself is not a problem. I don't know why it was added or whether it is still necessary; I don't remember any bug report about that Karaf issue to Apache MINA SSHD. In any case, when destroy(ChannelSession channel) is called from Apache MINA SSHD, the SSH_MSG_CHANNEL_CLOSE exchange has been fully done. So unless you know it's no longer needed and why I would recommend leaving line 122 as is.

I'm worried about line 107, which passes this destroy() method as a "closeCallback". Without digging what where why and on which thread calls that closeCallback when, I would suspect that it gets invoked by Karaf while Apache MINA SSHD is still waiting for the client's SSH_MSG_CHANNEL_CLOSE to arrive. I would try passing null as a callback to SessionFactory.create() and leave the rest to the Apache MINA SSHD framework.

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Thanks for this hint!

I was testing with having the closeCallback null in line 107, but that did not change anything for me.

While looking around however, I came across closing of the session in ShellCommand and was wondering if there was accidentally the field named session closed instead of local variable also named session. Made this changes as a test: code and test works perfectly now - no wrong exit codes anymore.
Now I'm not entirely sure this is the issue, maybe I just created a resource leak, but asked JB Onofré to share his thoughts on this.

from mina-sshd.

MrEasy avatar MrEasy commented on June 2, 2024

Agree with that - thank you for your time.

from mina-sshd.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.