Code Review Request 7142596: RMI JPRT tests are failing

Fri Jul 13 00:13:59 UTC 2012

OK, I took a look at the revised webrev. Most stuff is good. A couple changes 
are necessary though.

*** MultipleRegistries.java

Not a big deal, but the comment at lines 65-67 is no longer necessary.

*** TestLibrary.java

I think the reserved port range stuff still needs to be fixed up.

The comment at lines 81-82 talks about a range 64000-64100 which isn't really 
present anywhere. The comment should instead say that PORT_MIN and PORT_MAX 
should be kept adjusted so that they specify the range of ports defined by the 
constants following. Make sure to do this adjustment; PORT_MAX is 64002 here.

The previous webrev had PORT_MIN and PORT_MAX specify a range of 1024-64000 as 
the *allowed* range for ports returned by getUnusedRandomPort(). But as I 
described previously, this is difficult to enforce given the varying port 
allocation schemes on different OSes.

Instead I think we need to invert the sense of the range PORT_MIN..PORT_MAX and 
make it be the range of ports reserved by tests that require fixed ports (64001 
through 64005). Then, getUnusedRandomPort() needs to be changed to *disallow* 
ports in this range, that is, to retry if the port is within this range. Thus 
the condition on the while-loop has to be inverted, something like

     while (... unusedRandomPort >= PORT_MIN && unusedRandomPort <= PORT_MAX)

So, how could getUnusedRandomPort() have worked in its present state? Well I 
think it ends up retrying 10 times, then at the end of the loop 
unusedRandomPort is actually some legal port number -- albeit possibly one 
within the disallowed range -- and almost certainly not -1, and so the method 
returns it and the caller uses it. So, the success-testing logic here will also 
have to change so that it retests whether the port is in the disallowed range.
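
To make the inverted check concrete, here is a rough sketch of what I have in 
mind. It is not the webrev code: the class name PortPicker, the helper 
isReserved() and the constant MAX_TRIES are made up for illustration, and I'm 
assuming the reserved range is exactly 64001 through 64005.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch, not the actual TestLibrary code.
class PortPicker {
    // Assumed: the range of ports reserved by tests that need fixed ports.
    static final int PORT_MIN = 64001;
    static final int PORT_MAX = 64005;
    // Assumed retry limit, matching the "10 times" mentioned above.
    static final int MAX_TRIES = 10;

    /** Returns true if the port lies inside the reserved (disallowed) range. */
    static boolean isReserved(int port) {
        return port >= PORT_MIN && port <= PORT_MAX;
    }

    /**
     * Asks the OS for an ephemeral port and retries only while the port
     * falls inside the reserved range. Throws instead of returning -1.
     */
    static int getUnusedRandomPort() {
        for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
            int port;
            try (ServerSocket ss = new ServerSocket(0)) {
                port = ss.getLocalPort();
            } catch (IOException e) {
                // Do not swallow the failure; hand it to the caller.
                throw new RuntimeException("Unable to obtain an ephemeral port", e);
            }
            // The success test is the same range check as the retry condition.
            if (!isReserved(port)) {
                return port;
            }
        }
        throw new RuntimeException("No port outside the reserved range "
                + PORT_MIN + ".." + PORT_MAX + " after " + MAX_TRIES + " tries");
    }
}
```

Note that the port is released when the socket is closed, so another process 
could in principle grab it before the caller binds to it. That window exists 
in any version of this method.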

The comment on getUnusedRandomPort() also needs to be updated to reflect its 
new policy on the reserved range, as well as throwing an exception on failure. 
Also (nitpick) it should say "less than" not "less then".

Looking at getUnusedRandomPort() again more closely [sorry] I think the catch 
of Exception that does nothing is suspicious. I'm not sure why getting a 
ServerSocket would fail, but if it does (maybe the system is out of ports??) we 
should just throw out to the caller. Perhaps ideally we'd just have this method 
throw IOException, but if that requires all callers to be updated, it might be 
easier just to catch IOException and throw a RuntimeException that wraps the 
caught IOException.
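
The two options above could be sketched as follows. This is illustrative only; 
the class PortProbe and both method names are hypothetical and do not appear in 
the webrev.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch of the two ways to report a socket failure.
class PortProbe {
    /** Option 1: declare the checked exception; every caller must handle it. */
    static int openOn(int port) throws IOException {
        try (ServerSocket ss = new ServerSocket(port)) {
            return ss.getLocalPort();
        }
    }

    /**
     * Option 2: keep the existing signature and wrap the IOException.
     * The original exception is kept as the cause, so nothing is lost.
     */
    static int openOnUnchecked(int port) {
        try {
            return openOn(port);
        } catch (IOException e) {
            throw new RuntimeException("Cannot open ServerSocket on port " + port, e);
        }
    }
}
```

Either way the stack trace of the underlying failure reaches the test log, 
which is the point.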

Similar comments apply to the catch(Exception)/do-nothing code in the other 
utility methods.

Certainly getRegistryPort() should just throw (or possibly wrap and rethrow) 
any exception caught.

For createRegistryOnUnusedPort(), the catching of ExportException is handled 
properly. The second catch clause of the outer try-block, and the catch of the 
inner try-block, both ignore exceptions. The code will then end up retrying. Is 
it reasonable that retrying in these cases will result in a different outcome? 
My guess is that it's more likely that something is seriously wrong that will 
cause all the retries to fail, in which case this method will discard the 
exceptions it has just caught and throw a *new* RemoteException instance.

I'm particularly sensitive to this; as you might recall a couple weeks ago I 
was having a wrestling match with the jstatd tests (I finished the match, but 
I'm not sure I won). The primary problem was that several layers of code would 
catch and discard exceptions, which made diagnosing the problem incredibly 
difficult. So, I'd recommend removing the "do nothing" catch clauses.

Thinking further about createRegistryOnUnusedPort(), I'm not sure that retrying 
10 (or some other number of times) actually makes sense. It does for 
getUnusedRandomPort(), which has to retry in order to get a port outside the 
disallowed range. But for creating a registry, createRegistry(0) will usually 
work the first time. If it throws ExportException, it does so for a specific 
reason, so we should retry once on an unused random port. But if this still 
fails, I don't think retrying repeatedly makes sense.
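
Roughly, the single-retry shape I'm describing might look like the sketch 
below. The class name RegistryFactory and the private helper unusedPort() are 
hypothetical stand-ins (the helper stands in for getUnusedRandomPort() and 
omits the reserved-range check for brevity).

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.ExportException;

// Hypothetical sketch, not the actual TestLibrary code.
class RegistryFactory {
    /**
     * Tries createRegistry(0) first. On ExportException, retries exactly once
     * on a freshly chosen port. Any other failure propagates unchanged.
     */
    static Registry createRegistryOnUnusedPort() throws RemoteException {
        try {
            return LocateRegistry.createRegistry(0);
        } catch (ExportException first) {
            int port = unusedPort();
            try {
                return LocateRegistry.createRegistry(port);
            } catch (RemoteException second) {
                // Keep the first failure attached rather than discarding it.
                second.addSuppressed(first);
                throw second;
            }
        }
    }

    /** Stand-in for getUnusedRandomPort(); range check omitted here. */
    private static int unusedPort() {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        } catch (IOException e) {
            throw new RuntimeException("Unable to obtain an ephemeral port", e);
        }
    }
}
```

There are no empty catch blocks here, so if both attempts fail the caller sees 
the second exception with the first one recorded as suppressed.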

s'marks

On 7/10/12 2:14 PM, Darryl Mocek wrote:
>
> On 07/09/2012 04:41 PM, Stuart Marks wrote:
>> OK, here's the review for the second half of the files in the webrev. I saw
>> your reply to the first half (to which I'll reply separately), and I don't
>> think there's anything here that's affected by them.
>>
>>
>> *** AppleUserImpl.java
>> *** ApplicationServer.java
>>
>>
>> REGISTRY_PORT should be a local variable; also rename to use mixed case.
> Changed to a private registryPort (see next issue).
>>
>> Eh, whoops, after looking at ApplicationServer.java I see that it accesses
>> the REGISTRY_PORT field directly. This is why direct field access is a bad
>> idea. :-) Now the question is, has REGISTRY_PORT been initialized before
>> ApplicationServer needs it? It turns out that it has been -- but only in some
>> cases.
>>
>> It seems like the test is trying to support two modes, one that runs in two
>> threads in the same JVM, and the other that runs in two separate JVMs. If
>> they are in separate JVMs, things will no longer work because in the JVM that
>> runs ApplicationServer.main(), AppleUserImpl.REGISTRY_PORT will be -1. I
>> suspect that our test environment doesn't support the separate JVM mode, but
>> it seems unwise to break it.
>>
>> I'd suggest that in two-JVM mode the classes fall back to using a
>> "well-known" default registry port number, which in this case seems like 2006.
>>
>> In single-JVM mode, AppleUserImpl creates an instance of ApplicationServer,
>> so I'd suggest adding a method to ApplicationServer that allows AppleUserImpl
>> to store the randomly-assigned registry port number into it, overriding the
>> default value.
>>
>> This seems like the simplest way to preserve the two modes of
>> operation but to support the random port selection model we're trying to
>> achieve.
> Rather than going the "fixed port" route, which is what we're trying to get
> away from, I've changed the implementation of both AppleUserImpl and
> ApplicationServer so ApplicationServer requires a port and AppleUserImpl
> supplies the port on construction of ApplicationServer. I thought of modifying
> ApplicationServer's constructor to create a port using
> TestLibrary.getUnusedRandomPort, but decided requiring a port is better as
> ApplicationServer's job is to look for already exported AppleUser objects.
>>
>>
>> *** activatable/EchoImpl.java
>>
>>
>> int registryPort = new Integer(System.getProperty("rmi.registry.port"));
>>
>> I'd suggest using Integer.parseInt() instead of new Integer(). Not a huge
>> deal, but it's probably more conventional to use parseInt() and it avoids
>> boxing.
>>
>> One could probably do Integer.getInteger("rmi.registry.port") but this
>> seems pretty obscure to me even though it's more succinct.
>>
>> The same also applies to the following:
>> - HelloImpl.java
>> - unicast/EchoImpl.java
>> - ShutdownImpl.java
>> - SelfTerminator.java
>> - CheckFQDNClient.java
>> - LeaseLeakClient.java
>> - dgcDeadLock/TestImpl.java
>>
> Integer.parseInt returns a primitive (which is what the return is assigned to)
> and it appears Integer.parseInt is "faster" than creating a new Integer.
> Changed to Integer.parseInt in all places referenced.
>>
>> *** FiniteGCLatency.java
>>
>>
>> The pattern here is a bit odd, as the test creates the registry, throws away
>> the returned reference, and then calls getRegistry() to get another Registry
>> reference. It *seems* like they're identical references, but in fact the
>> first is apparently a reference to the actual Registry implementation,
>> whereas the second is a remote stub.
>>
>> The tests seem to do all the actual work using the remote stub, which seems
>> proper.
>>
>> This is confusing, though, as it looks like there's a redundant Registry
>> reference now. This might lead someone in the future to "simplify" the test
>> by not getting the remote stub, which in turn might invalidate some tests.
>> (In fact I was going to suggest this but I decided to investigate further
>> first.)
>>
>> At the very least, I'd suggest renaming the variable that holds the newly
>> created Registry to something like "registryImpl" to make it clear that it's
>> different from the thing returned by getRegistry(), even though they have
>> the same type.
>>
>> Another possibility is to rearrange the TestLibrary API so that there is a
>> single utility method that combines createRegistryOnUnusedPort() and
>> getRegistryPort(). That is, it creates a new registry and simply returns the
>> port on which it was created, not a reference to the registry implementation.
>> I don't think the registry implementation is actually ever used by the tests,
>> and it might simplify things a bit as well.
>>
>> Possibly similar issues with:
>> - UnreferencedContext.java
>> - NoConsoleOutput.java
>>
>>
>> *** HttpSocketTest.java
>>
>>
>> Unnecessary call to TestLibrary.getUnusedRandomPort()?
> Looks like extra code left over from the change from using
> TestLibrary.getUnusedRandomPort/LocateRegistry.createRegistry(randomPort) to
> TestLibrary.createRegistryOnUnusedPort...removed.
>>
>>
>> *** TestLibrary.java
>>
>> Mostly pretty straightforward, but I do have some concerns about the random
>> port selection and a potential clash with the "reserved port range" as
>> defined in this test library.
>>
>> The getUnusedRandomPort() method attempts to get a socket within the range
>> (1024,64000) and will retry 10 times if it can't. Unfortunately, MacOS
>> allocates ports more-or-less sequentially in the range [49152, 65536) which
>> means that when the kernel's internal counter gets to 64000,
>> getUnusedRandomPort()'s retries will fail, causing tests to fail until the
>> counter wraps around.
>>
>> Other systems behave differently; Linux seems to allocate them randomly in
>> the range [32768,65536) and Windows XP SP3 allocates them sequentially in the
>> range (1024,5000]. So it's probably not a problem for them.
>>
>> I think the thing to do is to check only for "reserved ports" that are
>> actually used by tests here. These are in the range [64001,64005]. In
>> getUnusedRandomPort(), it should only need to retry if the returned port is
>> within this narrow, reserved range. If it's anything else it should be OK.
> I'll try setting the range this narrow, but I don't know how many sequential
> tests will be run at a time and I'm concerned 5 is too few. The -concurrency
> option on jtreg allows you to specify how many concurrent tests will be run. We
> should have enough test ports reserved to satisfy any concurrency request. I've
> run the tests with -concurrency=8 (I have a dual-core system showing 4 CPU's).
> I tried reducing the port range to 64001/64002 and concurrency=4 and all passed
> fine, so maybe we're OK with just 5.
>>
>> On another topic, the three utility methods here:
>> - createRegistryOnUnusedPort
>> - getRegistryPort
>> - getUnusedRandomPort
>>
>> all catch exceptions and then return illegal values (null or -1), sometimes
>> after printing some diagnostic information. The problem is that the caller
>> will attempt to soldier on with the illegal return value and will stumble
>> over something later, such as NullPointerException or
>> IllegalArgumentException. This will probably be obvious but it's equally
>> likely to be confusing.
>>
>> Since these utilities are all called from test code, and the tests are
>> relying on them to return valid results, I'd suggest just throwing exceptions
>> from the utility methods if they fail. This will (should) cause the test to
>> error out, but that's OK, as it never could have succeeded anyway if the
>> utility call had failed.
> I already modified createRegistryOnUnusedPort to throw an exception as part of
> the MultipleRegistries change. I'm now throwing a RuntimeException for
> getRegistryPort and getUnusedRandomPort if they fail.
>
> See updated webrev: http://cr.openjdk.java.net/~dmocek/7142596/webrev.03
>
> Darryl
>>
>> s'marks
>>
>>
>>
>> On 7/5/12 2:22 PM, Darryl Mocek wrote:
>>> Hello core-libs. Please review this webrev to fix Bugs #7142596 and 7161503.
>>> Webrev can be found here: http://cr.openjdk.java.net/~dmocek/7142596/webrev.02.
>>> This commit fixes concurrency issues with the RMI tests.
>>>
>>> - Added TestLibrary.createRegistryOnUnusedPort method. This creates an
>>> RMIRegistry on an unused port. It will try up to 10 times before giving up.
>>> - Added a TestLibrary.getRegistryPort(Registry) method to get the port number
>>> of the registry.
>>> - Changed almost all tests from using hard port numbers to using random port
>>> numbers for running the RMI Registry and RMID.
>>> - Removed othervm from those tests which don't need it.
>>> - Added parameters for tests which spawn a separate VM to pass RMI Registry and
>>> RMID ports in cases where needed.
>>> - Added PropertyPermission to security policy files where needed.
>>> - Removed java/rmi and sun/rmi from tests which cannot be run concurrently.
>>> - Added java/rmi/Naming to list of tests which cannot be run concurrently.
>>>
>>> Thanks,
>>> Darryl
>>>
>
>