RFR: JDK-8032050: TEST_BUG: java/rmi/activation/Activatable/shutdownGracefully/ShutdownGracefully.java fails intermittently
Tristan Yan
tristan.yan at oracle.com
Sun Jan 26 04:57:00 UTC 2014
Hi Stuart
Thank you for your review and suggestion.
Yes, since this failure mode is very hard to reproduce, I guess it
makes sense to use a hack. I also noticed that
ActivationLibrary.rmidRunning tries to look up the ActivationSystem
but doesn't check whether the result is null, so I added logic to make
sure the lookup returns a non-null ActivationSystem. I also did some
cleanup, if you don't mind: I added a waitFor(long timeout, TimeUnit unit)
method to JavaVM, which gives us better control over waiting.
I would appreciate it if you could review the code again.
http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
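The idea of waiting for a non-null ActivationSystem can be sketched roughly as below. This is only an illustration, not the code in the webrev: the class NonNullLookup, the method awaitNonNull, and its parameters are hypothetical names, and the Supplier stands in for whatever call actually performs the lookup in the RMI test library.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, not part of the RMI test library.
final class NonNullLookup {
    private NonNullLookup() {}

    /**
     * Calls lookup repeatedly until it returns a non-null value or the
     * timeout elapses. A null result is treated as "not ready yet".
     */
    static <T> T awaitNonNull(Supplier<T> lookup, long timeout,
                              TimeUnit unit, long pollMillis) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            T result = lookup.get();
            if (result != null) {
                return result;              // lookup finally succeeded
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                    "lookup still returned null after " + timeout + " " + unit);
            }
            try {
                Thread.sleep(pollMillis);   // back off before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

A caller would pass the real lookup as the supplier, for example a lambda wrapping the ActivationSystem lookup, and treat the exception as "rmid did not become ready in time".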
Thank you
Tristan
On 01/25/2014 10:20 AM, Stuart Marks wrote:
> On 1/23/14 10:34 PM, Tristan Yan wrote:
>> Hi All
>> Could you review the bug fix for JDK-8032050.
>>
>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
>>
>> Description:
>> This rare failure happens because when RMID starts, it doesn't
>> guarantee that
>> sun.rmi.server.Activation.startActivation has finished.
>> The fix adds an iterative getSystem with a 5-second timeout.
>
> Hi Tristan,
>
> Adding a timing/retry loop into this test isn't the correct approach
> for fixing this test.
>
> The timing loop isn't necessary because there is already a timing loop
> in RMID.start() in the RMI test library. (There's another timing loop
> in ActivationLibrary.rmidRunning() which should probably be removed.)
> So the intent of this library call is that it start rmid and wait for
> it to become ready. That logic doesn't need to be added to the test.
>
> In the bug report JDK-8032050 you had mentioned that the
> NullPointerException was suspicious. You're right! I took a look and
> it seemed like it was related to JDK-8023541, and I added a note to
> this effect to the bug report. The problem here is that rmid can come
> up and transiently return null instead of the stub of the activation
> system. That's what JDK-8023541 covers. I think that rmid itself needs
> to be fixed, though modifying the timing loop in the RMI test library
> to wait for rmid to come up *and* for the lookup to return non-null is
> an easy way to fix the problem. (Or at least cover it up.)
>
> The next step in the analysis is to determine, given that
> ActivationLibrary.getSystem can sometimes return null, whether this
> has actually caused this test failure. This is pretty easy to
> determine; just hack in a line "system = null" in the right place and
> run the test. I've done this, and the test times out and the output
> log is pretty much identical to the one in the bug report. (I
> recommend you try this yourself.) So I think it's fairly safe to say
> that the problem in JDK-8023541 has caused the failure listed in
> JDK-8032050.
>
> I can see a couple ways to proceed here. One way is just to close this
> out as a duplicate of JDK-8023541 since that bug caused this failure.
>
> Another is that this test could use some cleaning up. This bug
> certainly covers a failure, but the messages emitted are confusing and
> in some cases completely wrong. For example, the "rmid has shutdown"
> message at line 180 is incorrect, because in this case rmid is still
> running and the wait() call has timed out. Most of the code here can
> be replaced with calls to various bits of the RMI test library. There
> are a bunch of other things in this test that could be cleaned up as
> well.
>
> It's up to you how you'd like to proceed.
>
> s'marks