RFR: JDK-8032050: TEST_BUG: java/rmi/activation/Activatable/shutdownGracefully/ShutdownGracefully.java fails intermittently

Tristan Yan tristan.yan at oracle.com
Sun Jan 26 04:57:00 UTC 2014


Hi Stuart
Thank you for your review and suggestion.
Yes, since this failure mode is very hard to be reproduced. I guess it's 
make sense  to do some hack. And I also noticed in 
ActivationLibrary.rmidRunning. It does try to look up ActivationSystem 
but doesn't check if it's null. So I add the logic to make sure we will 
look up the non-null ActivationSystem. Also I did some cleanup if you 
don't mind. Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which 
we can have a better waitFor control.
I appreciate you can review the code again.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
Thank you
Tristan


On 01/25/2014 10:20 AM, Stuart Marks wrote:
> On 1/23/14 10:34 PM, Tristan Yan wrote:
>> Hi All
>> Could you review the bug fix for JDK-8032050.
>>
>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
>>
>> Description:
>> This rare happened failure caused because when RMID starts. It don't 
>> guarantee
>> sun.rmi.server.Activation.startActivation finishes.
>> Fix by adding a iterative getSystem with a 5 seconds timeout.
>
> Hi Tristan,
>
> Adding a timing/retry loop into this test isn't the correct approach 
> for fixing this test.
>
> The timing loop isn't necessary because there is already a timing loop 
> in RMID.start() in the RMI test library. (There's another timing loop 
> in ActivationLibrary.rmidRunning() which should probably be removed.) 
> So the intent of this library call is that it start rmid and wait for 
> it to become ready. That logic doesn't need to be added to the test.
>
> In the bug report JDK-8032050 you had mentioned that the 
> NullPointerException was suspicious. You're right! I took a look and 
> it seemed like it was related to JDK-8023541, and I added a note to 
> this effect to the bug report. The problem here is that rmid can come 
> up and transiently return null instead of the stub of the activation 
> system. That's what JDK-8023541 covers. I think that rmid itself needs 
> to be fixed, though modifying the timing loop in the RMI test library 
> to wait for rmid to come up *and* for the lookup to return non-null is 
> an easy way to fix the problem. (Or at least cover it up.)
>
> The next step in the analysis is to determine, given that 
> ActivationLibrary.getSystem can sometimes return null, whether this 
> has actually caused this test failure. This is pretty easy to 
> determine; just hack in a line "system = null" in the right place and 
> run the test. I've done this, and the test times out and the output 
> log is pretty much identical to the one in the bug report. (I 
> recommend you try this yourself.) So I think it's fairly safe to say 
> that the problem in JDK-8023541 has caused the failure listed in 
> JDK-8032050.
>
> I can see a couple ways to proceed here. One way is just to close this 
> out as a duplicate of JDK-8023541 since that bug caused this failure.
>
> Another is that this test could use some cleaning up. This bug 
> certainly covers a failure, but the messages emitted are confusing and 
> in some cases completely wrong. For example, the "rmid has shutdown" 
> message at line 180 is incorrect, because in this case rmid is still 
> running and the wait() call has timed out. Most of the code here can 
> be replaced with calls to various bits of the RMI test library. There 
> are a bunch of other things in this test that could be cleaned up as 
> well.
>
> It's up to you how you'd like to proceed.
>
> s'marks




More information about the core-libs-dev mailing list