Code Review Request 7142596: RMI JPRT tests are failing

Mon Jul 16 22:54:57 UTC 2012

Here's webrev.05 
(http://cr.openjdk.java.net/~dmocek/7142596/webrev.05).  Changes:

- Simplify createRegistryOnUnusedPort()
- Simplify getUnusedRandomPort()
- Add isReservedPort() method to determine if a port is a reserved port 
# (not just one of the FIXED_PORT's, but also any port which shouldn't 
be used).

Passes all tests.

Darryl

On 07/16/2012 08:42 AM, Stuart Marks wrote:
> Hi Darryl,
>
> Sorry, I'm going to have to ask you to update this again. The issues 
> are in TestLibrary.java and are all related to catching exceptions and 
> throwing new exception instances, discarding the original exception 
> (and probably relevant diagnostic information) in the process.
>
> Let's get together today and work on it. I expect it will take only 
> half an hour or so to knock things out.
>
> s'marks
>
>
>
>
> On 7/13/12 1:40 PM, Darryl Mocek wrote:
>>
>> On 07/12/2012 05:13 PM, Stuart Marks wrote:
>>> OK, I took a look at the revised webrev. Most stuff is good. A 
>>> couple changes
>>> are necessary though.
>>>
>>>
>>> *** MultipleRegistries.java
>>>
>>>
>>> Not a big deal, but the comment at lines 65-67 is no longer necessary.
>> Comment removed.
>>>
>>>
>>> *** TestLibrary.java
>>>
>>>
>>> I think the reserved port range stuff still needs to be fixed up.
>>>
>>> The comment at lines 81-82 talks about a range 64000-64100 which 
>>> isn't really
>>> present anywhere. The comment should instead say that PORT_MIN and 
>>> PORT_MAX
>>> should be kept adjusted so that they specify the range of ports 
>>> defined by
>>> the constants following. Make sure to do this adjustment; PORT_MAX 
>>> is 64002
>>> here.
>> I reserved ports 64000-64100 (even though only 64001-64005 are 
>> currently being
>> used) for fixed port tests (for tests which require fixed ports) so 
>> any future
>> tests which require fixed ports can be assured there are ports 
>> available within
>> a range for their use. In fixing the tests, I saw that those who 
>> wrote tests
>> and used fixed ports grabbed random port numbers all over the place. 
>> Providing
>> a reserved range allows them to just grab the next number and to keep 
>> the fixed
>> port numbers in a group, even when new tests are added. If the ports are
>> defined in a central place, like TestLibrary, then hopefully 
>> developers will
>> come to TestLibrary to reserve the ports rather then define them in 
>> the tests
>> themselves, like was done previously. I chose 100 ports for
>> "future-proofing"...I don't think 100 is too many, but I've reduced 
>> it to 10
>> and lowered the max to 64010. I was also thinking of looking for 
>> tests which
>> used fixed ports outside of RMI tests and standardizing those ports 
>> here, but
>> that's possible future work and I'm not sure it's necessary.
>>>
>>> The previous webrev had PORT_MIN and PORT_MAX specify a range of 
>>> 1024-64000
>>> as the *allowed* range for ports returned by getUnusedRandomPort(). 
>>> But as I
>>> described previously, this is difficult to enforce given the varying 
>>> port
>>> allocation schemes on different OSes.
>>>
>>> Instead I think we need to invert the sense of the range 
>>> PORT_MIN..PORT_MAX
>>> and make it be the range of ports reserved by tests that require 
>>> fixed ports
>>> (64001 through 64005). Then, getUnusedRandomPort() needs to be 
>>> changed to
>>> *disallow* ports in this range, that is, to retry if the port is 
>>> within this
>>> range. Thus the condition on the while-loop has to be inverted, 
>>> something like
>>>
>>> while (... unusedRandomPort >= PORT_MIN && unusedRandomPort <= 
>>> PORT_MAX)
>> Yes, this is correct and it has been changed.
>>>
>>> So, how could getUnusedRandomPort() have worked in its present 
>>> state? Well I
>>> think it ends up retrying 10 times, then at the end of the loop
>>> unusedRandomPort is actually some legal port number -- albeit 
>>> possibly one
>>> within the disallowed range -- and almost certainly not -1, and so 
>>> the method
>>> returns it and the caller uses it. So, the success-testing logic 
>>> here will
>>> also have to change so that it retests whether the port is in the 
>>> disallowed
>>> range.
>> Correct...done.
>>>
>>> The comment on getUnusedRandomPort() also needs to be updated to 
>>> reflect its
>>> new policy on the reserved range, as well as throwing an exception on
>>> failure. Also (nitpick) it should say "less than" not "less then".
>> Yes, I always do this. :(
>>>
>>> Looking at getUnusedRandomPort() again more closely [sorry] I think 
>>> the catch
>>> of Exception that does nothing is suspicious. I'm not sure why 
>>> getting a
>>> ServerSocket would fail, but if it does (maybe the system is out of 
>>> ports??)
>>> we should just throw out to the caller. Perhaps ideally we'd just 
>>> have this
>>> method throw IOException, but if that requires all callers to be 
>>> updated, it
>>> might be easier just to catch IOException and throw a 
>>> RuntimeException that
>>> wraps the caught IOException.
>> I see no harm in trying again if it fails. If we have run out of 
>> ports, then
>> possibly one or more used ports will be closed and freed and a 
>> subsequent try
>> will succeed. I don't know all of the cases where a failure might occur.
>> Anyway, if this is the prevailing way of handling this, I'll change it.
>>>
>>> Similar comments apply to the catch(Exception)/do-nothing code in 
>>> the other
>>> utility methods.
>> Now throwing RuntimeException in getRegistryPort.
>>>
>>> Certainly getRegistryPort() should just throw (or possibly wrap and 
>>> rethrow)
>>> any exception caught.
>>>
>>> For createRegistryOnUnusedPort(), the catching of ExportException is 
>>> handled
>>> properly. The second catch clause of the outer try-block, and the 
>>> catch of
>>> the inner try-block, both ignore exceptions. The code will then end up
>>> retrying. Is it reasonable that retrying in these cases will result 
>>> in a
>>> different outcome? My guess is that it's more likely that something is
>>> seriously wrong that will cause all the retries to fail, in which 
>>> case this
>>> method will discard the exceptions it has just caught and throw a *new*
>>> RemoteException instance.
>> This was a holdover from webrev.00 which should have been removed. I 
>> agree with
>> Alan and you here that multiple tries of 
>> LocateRegistry.createRegistry(0) is
>> unnecessary. I've modified this to make a single attempt at using
>> LocateRegistry.createRegistry(0) and if it fails, make a single 
>> attempt at
>> getting an unused random port and creating a registry on this port, then
>> failing with a RuntimeException.
>>>
>>> I'm particularly sensitive to this; as you might recall a couple 
>>> weeks ago I
>>> was having a wrestling match with the jstatd tests (I finished the 
>>> match, but
>>> I'm not sure I won). The primary problem was that several layers of 
>>> code
>>> would catch and discard exceptions, which made diagnosing the problem
>>> incredibly difficult. So, I'd recommend removing the "do nothing" 
>>> catch clauses.
>>>
>>> Thinking further about createRegistryOnUnusedPort(), I'm not sure that
>>> retrying 10 (or some other number of times) actually makes sense. It 
>>> does for
>>> getUnusedRandomPort(), which has to retry in order to get a port 
>>> outside the
>>> disallowed range. But for creating a registry, createRegistry(0) 
>>> will usually
>>> work the first time. If it throws ExportException, it does so for a 
>>> specific
>>> reason, so we should retry once on an unused random port. But if 
>>> this still
>>> fails, don't think retrying repeatedly makes sense.
>> See above.
>>
>> Changes pass all tests. Updated webrev at
>> http://cr.openjdk.java.net/~dmocek/7142596/webrev.04
>>
>> Darryl
>>>
>>> s'marks
>>>
>>>
>>>
>>>
>>> On 7/10/12 2:14 PM, Darryl Mocek wrote:
>>>>
>>>> On 07/09/2012 04:41 PM, Stuart Marks wrote:
>>>>> OK, here's the review for the second half of the files in the 
>>>>> webrev. I saw
>>>>> your reply to the first half (to which I'll reply separately), and 
>>>>> I don't
>>>>> think there's anything here that's affected by them.
>>>>>
>>>>>
>>>>> *** AppleUserImpl.java
>>>>> *** ApplicationServer.java
>>>>>
>>>>>
>>>>> REGISTRY_PORT should be a local variable; also rename to use mixed 
>>>>> case.
>>>> Changed to a private registryPort (see next issue).
>>>>>
>>>>> Eh, whoops, after looking at ApplicationServer.java I see that it 
>>>>> accesses
>>>>> the REGISTRY_PORT field directly. This is why direct field access 
>>>>> is a bad
>>>>> idea. :-) Now the question is, has REGISTRY_PORT been initialized 
>>>>> before
>>>>> ApplicationServer needs it? It turns out that it has been -- but 
>>>>> only in some
>>>>> cases.
>>>>>
>>>>> It seems like the test is trying to support two modes, one that 
>>>>> runs in two
>>>>> threads in the same JVM, and the other that runs in two separate 
>>>>> JVMs. If
>>>>> they are in separate JVMs, things will no longer work because in 
>>>>> the JVM that
>>>>> runs ApplicationServer.main(), AppleUserImpl.REGISTRY_PORT will be 
>>>>> -1. I
>>>>> suspect that our test environment doesn't support the separate JVM 
>>>>> mode, but
>>>>> it seems unwise to break it.
>>>>>
>>>>> I'd suggest that in two-JVM mode the classes fall back to using a
>>>>> "well-known" default registry port number, which in this case 
>>>>> seems like 2006.
>>>>>
>>>>> In single-JVM mode, AppleUserImpl creates an instance of 
>>>>> ApplicationServer,
>>>>> so I'd suggest adding a method to ApplicationServer that allows 
>>>>> AppleUserImpl
>>>>> to store the randomly-assigned registry port number into it, 
>>>>> overriding the
>>>>> default value.
>>>>>
>>>>> This seems like this is the simplest way to preserve the two modes of
>>>>> operation but to support the random port selection model we're 
>>>>> trying to
>>>>> achieve.
>>>> Rather then going the "fixed port" route, which is what we're 
>>>> trying to get
>>>> away from, I've changed the implementation of both AppletUserImpl's 
>>>> and
>>>> ApplicationServer so ApplicationServer requires a port and 
>>>> AppleUserImpl
>>>> supplies the port on construction of ApplicationServer. I thought 
>>>> of modifying
>>>> ApplicationServer's constructor to create a port using
>>>> TestLibrary.getUnusedRandomPort, but decided requiring a port is 
>>>> better as
>>>> ApplicationServer's job is to look for already exported AppleUser 
>>>> objects.
>>>>>
>>>>>
>>>>> *** activatable/EchoImpl.java
>>>>>
>>>>>
>>>>> int registryPort = new 
>>>>> Integer(System.getProperty("rmi.registry.port"));
>>>>>
>>>>> I'd suggest using Integer.parseInt() instead of new Integer(). Not 
>>>>> a huge
>>>>> deal, but it's probably more conventional to use parseInt() and it 
>>>>> avoids
>>>>> boxing.
>>>>>
>>>>> One could probably do Integer.getInteger("rmi.registry.port") but 
>>>>> this is
>>>>> seems pretty obscure to me even though it's more succinct.
>>>>>
>>>>> The same also applies to the following:
>>>>> - HelloImpl.java
>>>>> - unicast/EchoImpl.java
>>>>> - ShutdownImpl.java
>>>>> - SelfTerminator.java
>>>>> - CheckFQDNClient.java
>>>>> - LeaseLeakClient.java
>>>>> - dgcDeadLock/TestImpl.java
>>>>>
>>>> Integer.parseInt returns a primitive (which is what the return is 
>>>> assigned to)
>>>> and it appears Integer.parseInt is "faster" then creating a new 
>>>> Integer.
>>>> Changed to Integer.parseInt in all places referenced.
>>>>>
>>>>> *** FiniteGCLatency.java
>>>>>
>>>>>
>>>>> The pattern here is a bit odd, as the test creates the registry, 
>>>>> throws away
>>>>> the returned reference, and then calls getRegistry() to get 
>>>>> another Registry
>>>>> reference. It *seems* like they're identical references, but in 
>>>>> fact the
>>>>> first is apparently a reference to the actual Registry 
>>>>> implementation,
>>>>> whereas the second is a remote stub.
>>>>>
>>>>> The tests seem to do all the actual work using the remote stub, 
>>>>> which seems
>>>>> proper.
>>>>>
>>>>> This is confusing, though, as it looks like there's a redundant 
>>>>> Registry
>>>>> reference now. This might lead someone in the future to "simplify" 
>>>>> the test
>>>>> by not getting the remote stub, which in turn might invalidate 
>>>>> some tests.
>>>>> (In fact I was going to suggest this but I decided to investigate 
>>>>> further
>>>>> first.)
>>>>>
>>>>> At the very least, I'd suggest renaming the variable that holds 
>>>>> the newly
>>>>> created Registry to something like "registryImpl" to make it clear 
>>>>> that it's
>>>>> different from the thing returned by getRegistry(), even though 
>>>>> they have a
>>>>> the same time.
>>>>>
>>>>> Another possibility is to rearrange the TestLibrary API so that 
>>>>> there is a
>>>>> single utility method that combines createRegistryOnUnusedPort() and
>>>>> getRegistryPort(). That is, it creates a new registry and simply 
>>>>> returns the
>>>>> port on which it was created, not a reference to the registry 
>>>>> implementation.
>>>>> I don't think the registry implementation is actually ever used by 
>>>>> the tests,
>>>>> and it might simplify things a bit as well.
>>>>>
>>>>> Possibly similar issues with:
>>>>> - UnreferencedContext.java
>>>>> - NoConsoleOutput.java
>>>>>
>>>>>
>>>>> *** HttpSocketTest.java
>>>>>
>>>>>
>>>>> Unnecessary call to TestLibrary.getUnusedRandomPort()?
>>>> Looks like extra code left over from the change from using
>>>> TestLibrary.getUnusedRandomPort/LocateRegistry.createRegistry(randomPort) 
>>>> to
>>>> TestLibrary.createRegistryOnUnusedPort...removed.
>>>>>
>>>>>
>>>>> *** TestLibrary.java
>>>>>
>>>>> Mostly pretty straightforward, but I do have some concerns about 
>>>>> the random
>>>>> port selection and a potential clash with the "reserved port 
>>>>> range" as
>>>>> defined in this test library.
>>>>>
>>>>> The getUnusedRandomPort() method attempts to get a socket within 
>>>>> the range
>>>>> (1024,64000) and will retry 10 times if it can't. Unfortunately, 
>>>>> MacOS
>>>>> allocates ports more-or-less sequentially in the range [49152, 
>>>>> 65536) which
>>>>> means that when the kernel's internal counter gets to 64000,
>>>>> getUnusedRandomPort()'s retries will fail, causing tests to fail 
>>>>> until the
>>>>> counter wraps around.
>>>>>
>>>>> Other systems behave differently; Linux seems to allocate them 
>>>>> randomly in
>>>>> the range [32768,65536) and Windows XP SP3 allocates them 
>>>>> sequentially in the
>>>>> range (1024,5000]. So it's probably not a problem for them.
>>>>>
>>>>> I think the thing to do is to check only for "reserved ports" that 
>>>>> are
>>>>> actually used by tests here. These are in the range [64001,64005]. In
>>>>> getUnusedRandomPort(), it should only need to retry if the 
>>>>> returned port is
>>>>> within this narrow, reserved range. If it's anything else it 
>>>>> should be OK.
>>>> I'll try setting the range this narrow, but I don't know how many 
>>>> sequential
>>>> tests will be run at a time and I'm concerned 5 is too few. The 
>>>> -concurrency
>>>> option on jtreg allows you to specify how many concurrent tests 
>>>> will be run. We
>>>> should have enough test ports reserved to satisfy any concurrency 
>>>> request. I've
>>>> run the tests with -concurrency=8 (I have a dual-core system 
>>>> showing 4 CPU's).
>>>> I tried reducing the port range to 64001/64002 and concurrency=4 
>>>> and all passed
>>>> fine, so maybe we're OK with just 5.
>>>>>
>>>>> On another topic, the three utility methods here:
>>>>> - createRegistryOnUnusedPort
>>>>> - getRegistryPort
>>>>> - getUnusedRandomPort
>>>>>
>>>>> all catch exceptions and then return illegal values (null or -1), 
>>>>> sometimes
>>>>> after printing some diagnostic information. The problem is that 
>>>>> the caller
>>>>> will attempt to soldier on with the illegal return value and will 
>>>>> stumble
>>>>> over something later, such as NullPointerException or
>>>>> IllegalArgumentException. This will probably be obvious but it's 
>>>>> equally
>>>>> likely to be confusing.
>>>>>
>>>>> Since these utilities are all called from test code, and the tests 
>>>>> are
>>>>> relying on them to return valid results, I'd suggest just throwing 
>>>>> exceptions
>>>>> from the utility methods if they fail. This will (should) cause 
>>>>> the test to
>>>>> error out, but that's OK, as it never could have succeeded anyway 
>>>>> if the
>>>>> utility call had failed.
>>>> I already modified createRegistryOnUnusedPort to throw an exception 
>>>> as part of
>>>> the MultipleRegistries change. I'm now throwing a RuntimeException for
>>>> getRegistryPort and getUnusedRandomPort if they fail.
>>>>
>>>> See updated webrev: 
>>>> http://cr.openjdk.java.net/~dmocek/7142596/webrev.03
>>>>
>>>> Darryl
>>>>>
>>>>> s'marks
>>>>>
>>>>>
>>>>>
>>>>> On 7/5/12 2:22 PM, Darryl Mocek wrote:
>>>>>> Hello core-libs. Please review this webrev to fix Bugs #7142596 
>>>>>> and 7161503.
>>>>>> Webrev can be found here:
>>>>>> http://cr.openjdk.java.net/~dmocek/7142596/webrev.02.
>>>>>> This commit fixes concurrency issues with the RMI tests.
>>>>>>
>>>>>> - Added TestLibrary.createRegistryOnUnusedPort method. This 
>>>>>> creates an
>>>>>> RMIRegistry on an unused port. It will try up to 10 times before 
>>>>>> giving up.
>>>>>> - Added a TestLibrary.getRegistryPort(Registry) method to get the 
>>>>>> port number
>>>>>> of the registry.
>>>>>> - Changed almost all tests from using hard port numbers to using 
>>>>>> random port
>>>>>> numbers for running the RMI Registry and RMID.
>>>>>> - Removed othervm from those tests which don't need it.
>>>>>> - Added parameters for tests which spawn a separate VM to pass RMI
>>>>>> Registry and
>>>>>> RMID ports in cases where needed.
>>>>>> - Added PropertyPermission to security policy files where needed.
>>>>>> - Removed java/rmi and sun/rmi from tests which cannot be run 
>>>>>> concurrently.
>>>>>> - Added java/rmi/Naming to list of tests which cannot be run 
>>>>>> concurrently.
>>>>>>
>>>>>> Thanks,
>>>>>> Darryl
>>>>>>
>>>>
>>>>
>>
>>