RFR: JDK-8032050: TEST_BUG: java/rmi/activation/Activatable/shutdownGracefully/ShutdownGracefully.java fails intermittently

Stuart Marks stuart.marks at oracle.com
Sat Jan 25 02:20:03 UTC 2014


On 1/23/14 10:34 PM, Tristan Yan wrote:
> Hi All
> Could you review the bug fix for JDK-8032050.
>
> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
>
> Description:
> This rare happened failure caused because when RMID starts. It don't guarantee
> sun.rmi.server.Activation.startActivation finishes.
> Fix by adding a iterative getSystem with a 5 seconds timeout.

Hi Tristan,

Adding a timing/retry loop into this test isn't the correct approach for fixing 
this test.

The timing loop isn't necessary because there is already a timing loop in 
RMID.start() in the RMI test library. (There's another timing loop in 
ActivationLibrary.rmidRunning() which should probably be removed.) So the intent 
of this library call is that it start rmid and wait for it to become ready. That 
logic doesn't need to be added to the test.

In the bug report JDK-8032050 you had mentioned that the NullPointerException 
was suspicious. You're right! I took a look and it seemed like it was related to 
JDK-8023541, and I added a note to this effect to the bug report. The problem 
here is that rmid can come up and transiently return null instead of the stub of 
the activation system. That's what JDK-8023541 covers. I think that rmid itself 
needs to be fixed, though modifying the timing loop in the RMI test library to 
wait for rmid to come up *and* for the lookup to return non-null is an easy way 
to fix the problem. (Or at least cover it up.)

The next step in the analysis is to determine, given that 
ActivationLibrary.getSystem can sometimes return null, whether this has actually 
caused this test failure. This is pretty easy to determine; just hack in a line 
"system = null" in the right place and run the test. I've done this, and the 
test times out and the output log is pretty much identical to the one in the bug 
report. (I recommend you try this yourself.) So I think it's fairly safe to say 
that the problem in JDK-8023541 has caused the failure listed in JDK-8032050.

I can see a couple ways to proceed here. One way is just to close this out as a 
duplicate of JDK-8023541 since that bug caused this failure.

Another is that this test could use some cleaning up. This bug certainly covers 
a failure, but the messages emitted are confusing and in some cases completely 
wrong. For example, the "rmid has shutdown" message at line 180 is incorrect, 
because in this case rmid is still running and the wait() call has timed out. 
Most of the code here can be replaced with calls to various bits of the RMI test 
library. There are a bunch of other things in this test that could be cleaned up 
as well.

It's up to you how you'd like to proceed.

s'marks



More information about the core-libs-dev mailing list