
Comments (29)

sabi0 commented on June 26, 2024

I think the problem you're getting is not due to the JDK but to something preventing processes from binding to local ports. I suspect a firewall rule, perhaps?

Then changing gradle\testing\randomization\policies\tests.policy from

permission java.net.SocketPermission "127.0.0.1:0", "accept,listen,resolve";

to

permission java.net.SocketPermission "127.0.0.1:1024-", "accept,listen,resolve";

wouldn't have helped, I suppose.
But it did.

What do you think of catching this AccessControlException and wrapping it with AssumptionViolatedException?

from lucene.

rmuir commented on June 26, 2024

sorry, i'm late to the party. yes, the entire purpose of this is to ensure tests only use ephemeral ports when binding. otherwise there will be port conflicts. so we should not be lenient about it.

seems like any issue here is in the JDK not respecting the operating system's configuration: not in lucene.

dweiss commented on June 26, 2024

This must be something on your system preventing socket opening. No idea what though.

dweiss commented on June 26, 2024

I've checked the repro line and seed, works for me. On a second glance, it looks like the security policy of your Java is restricting socket accept. Is it possible that you have such a policy in place (some corporate setup, perhaps)?

sabi0 commented on June 26, 2024

I have a vanilla Amazon Corretto JDK 17. And it has this:

    // allows anyone to listen on dynamic ports
    permission java.net.SocketPermission "localhost:0", "listen";

As far as I can see this is a common practice. Zulu JDK 11 also has the exact same permission. As well as JDK 8.

There is also gradle\testing\randomization\policies\tests.policy in the project itself that opens it a bit more:

  // TestLockFactoriesMultiJVM opens a random port on 127.0.0.1 (port 0 = ephemeral port range):
  permission java.net.SocketPermission "127.0.0.1:0", "accept,listen,resolve";

sabi0 commented on June 26, 2024

After changing the project's tests.policy to

permission java.net.SocketPermission "127.0.0.1:1024-", "accept,listen,resolve";

the tests pass on my side.

The special port value 0 refers to the entire ephemeral port range. This is a fixed range of ports a system may use to allocate dynamic ports from. The actual range may be system dependent.

I guess the "ephemeral range" on my system does not include 200xx ports?

Shall I open a PR with this change?

sabi0 commented on June 26, 2024

Though I see the server also uses a dynamic port:

      s.bind(new InetSocketAddress(hostname, 0));

Maybe the "ephemeral range" in the server JVM is different from the ranges in the clients' JVMs?

dweiss commented on June 26, 2024

You're the first person among many (including CIs) to have experienced this problem, so I'd look at what exactly is causing this first - is it the JDK distribution, is it something else? Port "0" indicates any available port so it should work fine in my opinion - I'm not a network guru though.
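
For anyone who wants to see which ports the OS actually hands out, here is a minimal sketch. It binds a server socket to port 0 on the loopback address, connects a client that does not choose its own port, and reports both numbers. The class and method names are made up for illustration. As far as I understand it, the "accept" permission is checked against the remote (client) port, which would match the trace in this thread, where the server listens on one port and the denied permission names a different one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EphemeralPortDemo {

    /** Returns {serverPort, clientPort}, both chosen by the operating system. */
    static int[] observePorts() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket()) {
            // Port 0 asks the OS to pick any free port for the listener.
            server.bind(new InetSocketAddress(loopback, 0));
            int serverPort = server.getLocalPort();

            // The client never binds explicitly, so the OS picks its local port too.
            try (Socket client = new Socket(loopback, serverPort);
                 Socket accepted = server.accept()) {
                // On the accepting side, getPort() is the remote port,
                // i.e. the client's OS-assigned local port.
                int clientPort = accepted.getPort();
                return new int[] {serverPort, clientPort};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = observePorts();
        System.out.println("server listening on port " + ports[0]);
        System.out.println("accepted connection from port " + ports[1]);
    }
}
```

Running this a few times on the affected machine should show whether the reported ports stay inside the range the JVM expects.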

uschindler commented on June 26, 2024

Hi,
please let's not change this without understanding what the problem is. We have not seen this issue anywhere (not even on Solr where this is used for almost every test).
Can you check with another, non-Corretto JDK? I have the feeling that maybe Corretto applied some changes to the permissions. If that's the case, report it to them.

Though I see the server also uses a dynamic port:

      s.bind(new InetSocketAddress(hostname, 0));

Maybe the "ephemeral range" in the server JVM is different from the ranges in the clients' JVMs?

The client and the server are the same JVM version with same options.

dweiss commented on June 26, 2024

I've downloaded Corretto (Windows 10):

>java -version
openjdk version "11.0.21" 2023-10-17 LTS
OpenJDK Runtime Environment Corretto-11.0.21.9.1 (build 11.0.21+9-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.21.9.1 (build 11.0.21+9-LTS, mixed mode)

and I ran the repro line:

gradlew test --tests TestStressLockFactories.testNativeFSLockFactory -Dtests.seed=2D42F3FDF1FAF153 -Dtests.locale=sg -Dtests.timezone=Australia/Lindeman -Dtests.asserts=true -Dtests.file.encoding=UTF-8

Works for me. It's got to be something other than the JDK, I guess?

sabi0 commented on June 26, 2024

How did you run it with Java 11?
When I try that I get

ERROR: java version must be between 17 and 21, your version: 11

sabi0 commented on June 26, 2024

I ran the test with Oracle's Java 17 and it failed in the same way:

  1> Listening on /127.0.0.1:12778...
   >     java.security.AccessControlException: access denied ("java.net.SocketPermission" "127.0.0.1:12779" "accept,resolve")
   >         at __randomizedtesting.SeedInfo.seed([2D42F3FDF1FAF153:31931CEF68004D20]:0)
   >         at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)

dweiss commented on June 26, 2024

How did you run it with Java 11? When I try that I get

ERROR: java version must be between 17 and 21, your version: 11

Hmm... I might have been on branch_9x - didn't check, sorry.

I think the problem you're getting is not due to the JDK but to something preventing processes from binding to local ports. I suspect a firewall rule, perhaps? Nobody else is getting this exception... Can you try it on a different system, perhaps?

dweiss commented on June 26, 2024

I checked again, this time making sure it's Java 21 and the main branch:
image

uschindler commented on June 26, 2024

Hi,

Sorry, no. The test is fine. This test and many more have existed like this for years. There's no need to change them or the policy file. Passing 0 as the port number in the policy file is correct, because we want to prevent anybody from writing a test with a fixed port number. All ports must be ephemeral.

Unless you give a clear explanation why it fails for you and there is no workaround, we won't change this test. This is definitely a problem in your setup. This test does not fail with any JDK out there.

Thanks,
Uwe

uschindler commented on June 26, 2024

You can always work around it by running the tests without security manager. Read the gradle documentation about the responsible system properties, e.g. -Ptests.useSecurityManager=false.

sabi0 commented on June 26, 2024

My understanding of the situation is the following:
Dynamic / ephemeral is only applicable to a local port. Thus the permission 127.0.0.1:0/listen allows binding to a dynamic local port.
But when accepting a connection from some remote port, the local system's "ephemeral port range" is not applicable, and the permission 127.0.0.1:0/accept does not work.

I have no idea why this only happens to me. And I understand your position of not wanting to change the test or the policy.

Just in case this might help someone else the tests also pass with the following permissions:

  permission java.net.SocketPermission "127.0.0.1:0", "listen,resolve";
  permission java.net.SocketPermission "127.0.0.1:*", "accept,resolve";

dweiss commented on June 26, 2024

I'd like to understand why your system is different than mine (or Uwe's)... It's great that you've found a workaround but it doesn't explain what's happening and - as Uwe mentioned - it's been working fine for everyone for years - there's something different in your setup that requires this workaround and it'd be interesting to figure out what it is!

dweiss commented on June 26, 2024

Do you use multiple network interfaces? Are these normal network adapters or something else? It's really unfortunate that it doesn't work for you out of the box. Strange!

sabi0 commented on June 26, 2024

My assumption was wrong. When the permission has port 0 the remote port number is validated against the local system's "ephemeral port range":

        if (policyLow == 0 && policyHigh == 0) {
            // ephemeral range only
            return targetLow >= ephemeralLow && targetHigh <= ephemeralHigh;
        }

The range itself is defined by jdk.net.ephemeralPortRange.low / jdk.net.ephemeralPortRange.high system properties.
And when those are not set the range defaults to 49152 - 65535:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/SocketPermission.java#L1228

So on my system this prints "false":

		SocketPermission policy = new SocketPermission("127.0.0.1:0", "accept,listen");
		SocketPermission request = new SocketPermission("127.0.0.1:20022", "accept");
		System.out.println(policy.implies(request));

and this prints "true":

		SocketPermission policy = new SocketPermission("127.0.0.1:0", "accept,listen");
		SocketPermission request = new SocketPermission("127.0.0.1:50123", "accept");
		System.out.println(policy.implies(request));

Probably the "ephemeral port range" in the network stack and in the SocketPermission are somehow out of sync?

I found this snippet in DNSDatagramSocketFactory.open() javadoc:

if binding a socket to port 0 binds it to a random port) then the underlying OS implementation is used. Otherwise, this method will allocate and bind a socket on a randomly selected ephemeral port in the dynamic range.

So when OS allocates a random port it does not necessarily fall in the JVM's ephemeral port range?
This does not break 127.0.0.1:0/listen because the permission is checked before binding (when the actual port number is still not known). But 127.0.0.1:0/accept is out of luck.
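
The two snippets above depend on whatever ephemeral range the JVM assumes on a given machine. A platform-independent way to see the same effect, as a rough sketch, is to spell the range out in the policy string instead of using port 0. The class and method names below are hypothetical; only SocketPermission and its implies method come from the JDK.

```java
import java.net.SocketPermission;

public class PermissionRangeSketch {

    /** True if a policy entry for the given host:portrange covers "accept" from 127.0.0.1:port. */
    static boolean implies(String policyTarget, int port) {
        SocketPermission policy = new SocketPermission(policyTarget, "accept,listen");
        SocketPermission request = new SocketPermission("127.0.0.1:" + port, "accept");
        return policy.implies(request);
    }

    public static void main(String[] args) {
        // Explicit range equal to the 49152-65535 default mentioned above.
        System.out.println(implies("127.0.0.1:49152-65535", 20022)); // false
        System.out.println(implies("127.0.0.1:49152-65535", 50123)); // true
        // Open-ended range as in the modified tests.policy ("1024-" means 1024 and above).
        System.out.println(implies("127.0.0.1:1024-", 20022));       // true
    }
}
```

With explicit ranges the result no longer depends on the ephemeral range, which is why the "1024-" variant of the policy let the tests pass.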

uschindler commented on June 26, 2024

Note: I corrected this answer.

Hi,
it looks like your Linux kernel has an extended ephemeral port range. The RFC defines it to be 49152-65535 (see RFC 6335).

The range itself is defined by jdk.net.ephemeralPortRange.low / jdk.net.ephemeralPortRange.high system properties.
And when those are not set the range defaults to 49152 - 65535:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/SocketPermission.java#L1228

This is not fully true. If the system properties are undefined (which is the default), it only uses 49152-65535 on Windows (as Windows adheres to the standard). The default range is defined in the PortConfig class, which has several implementations depending on the operating system. For Linux it uses this class: https://github.com/openjdk/jdk/blob/28c82bf18d85be00bea45daf81c6a9d665ac676f/src/java.base/unix/classes/sun/net/PortConfig.java#L36; for Windows it uses: https://github.com/openjdk/jdk/blob/28c82bf18d85be00bea45daf81c6a9d665ac676f/src/java.base/windows/classes/sun/net/PortConfig.java#L33

In short, the default range is defined in a platform-dependent way. On Linux it starts from a hardcoded default range, but later it also reads the platform's settings (see below):

            case LINUX:
                defaultLower = 32768;
                defaultUpper = 61000;

This matches the defaults in Linux kernel variables (see also Linux source code):

# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768    60999

As one can change those variables with the sysctl command or by using /etc/sysctl.conf or /etc/sysctl.d, the native code also reads the values of the sysctl variables from the /proc filesystem: https://github.com/openjdk/jdk/blob/28c82bf18d85be00bea45daf81c6a9d665ac676f/src/java.base/unix/native/libnet/portconfig.c#L50-L62

But this could fail, if for example the /proc/sys/net path is not available/readable (due to selinux, firewall, virus scanner/...). It then returns -1 and then the code falls back to the hardcoded default 32768-61000. So make sure that Java has enough access rights to read the /proc filesystem. If you use Docker images or similar you are on your own.

So please print what your current kernel uses by executing sysctl net.ipv4.ip_local_port_range and if it does not adhere to the default please fix your config. Alternatively set the system properties.
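
To make the read-then-fall-back behaviour concrete, here is a small sketch in plain Java. It is not the JDK's code (that part is native); it only mirrors the idea described above. The class and method names are hypothetical, and the path /proc/sys/net/ipv4/ip_local_port_range is my assumption for where the sysctl variable net.ipv4.ip_local_port_range is exposed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinuxPortRangeSketch {

    // Hardcoded Linux defaults quoted above.
    static final int DEFAULT_LOWER = 32768;
    static final int DEFAULT_UPPER = 61000;

    /** Parses "low<whitespace>high"; returns null if the content is malformed. */
    static int[] parseRange(String content) {
        String[] parts = content.trim().split("\\s+");
        if (parts.length != 2) {
            return null;
        }
        try {
            return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Reads the range from the given file, falling back to the hardcoded defaults. */
    static int[] readRange(Path procFile) {
        try {
            int[] parsed = parseRange(Files.readString(procFile));
            if (parsed != null) {
                return parsed;
            }
        } catch (IOException | SecurityException e) {
            // File missing or unreadable: fall through to the defaults.
        }
        return new int[] {DEFAULT_LOWER, DEFAULT_UPPER};
    }

    public static void main(String[] args) {
        int[] range = readRange(Path.of("/proc/sys/net/ipv4/ip_local_port_range"));
        System.out.println(range[0] + "-" + range[1]);
    }
}
```

If the printed range differs from what sysctl reports, that would point to the read failing and the fallback kicking in.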

uschindler commented on June 26, 2024

Oh I see you have Windows. On Windows it uses a hardcoded range. It is not dynamic, so it looks like your windows system has changed it away from the defaults!

We can't help with that. Please fix your Windows installation or open a bug report with OpenJDK asking them to make the Windows range dynamic.

uschindler commented on June 26, 2024

On Windows the defaults can also be changed, but OpenJDK does not read those settings. Here is my Windows example (Windows 10):

> netsh int ipv4 show dynamicport tcp

Protocol tcp dynamic port range
---------------------------------
Start port      : 49152
Number of ports : 16384

So either fix your Windows network stack to use the defaults or open a bug report to fix the hard coded range in OpenJDK.

Here's how to change the settings (you may need to persist them): https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/default-dynamic-port-range-tcpip-chang

sabi0 commented on June 26, 2024

Indeed, my machine has this:

Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 1024
Number of Ports : 20977

I do not know why this was changed by our corp. IT. I guess they had a reason to.
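
For reference, the netsh output gives a start port and a count, so the last port is start + count - 1. A quick sketch of the arithmetic for both outputs shown in this thread (the class and method names are made up for illustration):

```java
public class DynamicPortRangeSketch {

    /** Last port of a range given as a start port and a number of ports. */
    static int lastPort(int startPort, int numberOfPorts) {
        return startPort + numberOfPorts - 1;
    }

    /** True if the port lies inside the range described by start port and count. */
    static boolean contains(int startPort, int numberOfPorts, int port) {
        return port >= startPort && port <= lastPort(startPort, numberOfPorts);
    }

    public static void main(String[] args) {
        // Windows default from the earlier comment: 49152 + 16384 ports.
        System.out.println(lastPort(49152, 16384));           // 65535
        // This machine: 1024 + 20977 ports.
        System.out.println(lastPort(1024, 20977));            // 22000
        // Port from the failing test's stack trace.
        System.out.println(contains(1024, 20977, 12779));     // true
        System.out.println(contains(49152, 16384, 12779));    // false
    }
}
```

So the OS here allocates dynamic ports from 1024-22000, which does not overlap at all with the 49152-65535 range the JDK assumes on Windows.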

I agree that OpenJDK not reading the OS settings is the root cause. But suggesting everyone to "fix their Windows" or wait for some Java 27 to fix this or otherwise run the tests with the SecurityManager completely off is unnecessarily rigid IMO.

I believe this is the perfect fit for the AssumptionViolatedException.
The test assumes the ports are allocated within JDK's ephemeral port range. Now we know for a fact that this might not be the case on Windows. And that's the assumption violation, not the test failure.
I can open a PR for that if you want. Otherwise I suggest us all to move on.

P.S. Another similar case I ran into recently: creating a symlink on Windows with UAC requires elevated permissions causing Elasticsearch test to fail.

uschindler commented on June 26, 2024

Sorry, your computer does not conform to standards. Please report this to your organisation.

There is no reason to change anything in Lucene.

uschindler commented on June 26, 2024

Please also run the Apache Solr tests. They use the same config. Applying your proposed fix will disable all integration tests. This is not the correct way to fix this.

So an "assume" here isn't the correct way to fix it.

uschindler commented on June 26, 2024

P.S.: in Lucene we won't throw assumption violation exceptions. We have LTC#assumeTrue for this.

uschindler commented on June 26, 2024

Thanks Robert.

To clarify: This test is so important for data safety in Lucene that silently disabling it because of a highly incompetent sysadmin's decisions is a No-Go.

Please don't open any more issues or PRs about this. Thanks.

dweiss commented on June 26, 2024

But suggesting everyone to "fix their Windows" or wait for some Java 27 to fix this or otherwise run the tests with the SecurityManager completely off is unnecessarily rigid IMO.

Thank you for a thorough investigation into the cause of the failure - it is really enlightening. I agree with the others that making exceptions for broken system setups is probably not the right way to go. What happened to you has never been reported before, so feel unique. :) The Lucene test case setup is quite strict but this strictness has a purpose - find the problems early. This issue is a testament to how weird the real world systems can be and that the test infrastructure is actually doing its job quite well!

If you need a more permanent workaround, you can turn off the security manager in your locally generated gradle.properties - sure, you won't be running the full test suite but any PR will do it anyway, so it seems fine. Thanks again for your time spent on this.
