RFR(S): 8199811: com/sun/jdi/ProcessAttachTest.java fails intermittently: Remote thread failed for unknown reason

David Holmes david.holmes at oracle.com
Mon Aug 6 00:33:42 UTC 2018


Hi Chris,

On 4/08/2018 4:23 AM, Chris Plummer wrote:
> Hello,
> 
> Please review the following fix for JDK12:
> 
> http://cr.openjdk.java.net/~cjplummer/8199811/webrev.00
> 
> https://bugs.openjdk.java.net/browse/JDK-8199811
> 
> The root of the problem is that there is no code in the solaris or 
> windows AttachListener support that ensures that the listener is done 
> being initialized before attaching and attempting to enqueue the first 
> command. The enqueue operation fails when it sees that the listener is 
> not attached yet.
> 
> I was able to force this failure to happen every time by adding a 10 
> second sleep in attach_listener_thread_entry() just before the call to 
> AttachListener::set_initialized(). This did not cause macosx or linux to 
> fail, but did make solaris fail (failures had not been noted previously) 
> and windows to fail (failures previously had been observed, but very 
> rarely).

I'm having trouble seeing the complete code paths here to understand the 
control flow for initialization and subsequent use. How do we get to the 
enqueue logic (that fails) if the initialization logic has not yet 
completed? Is the init logic asynchronous? (If so I would expect many 
more failures of this nature.)

> The proposed fix is to have the enqueue code sleep for up to 20 seconds 
> (in 1 second intervals) waiting for initialization to be complete. I 
> found this fixed the problem, even with the 10 second sleep in 
> attach_listener_thread_entry() still in place. A shorter sleep is 
> probably fine. I'm open to suggestions. Since this timing issue was so 
> rare, my guess is that a single 1 second sleep is likely to always fix 
> it, but since it is so hard to reproduce (without the 10 second sleep in 
> place), I can't say for sure.

That seems reasonable.
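
For illustration, the bounded wait described above could look roughly 
like the following. This is a standalone sketch in standard C++, not the 
code in the webrev: g_initialized, wait_for_initialized() and 
enqueue_command() are hypothetical names, and the atomic flag only stands 
in for whatever state AttachListener::set_initialized() records.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the listener's "initialized" state.
std::atomic<bool> g_initialized{false};

// Poll the flag up to max_attempts times, sleeping `interval` between
// checks. Returns true if initialization was observed, false on timeout.
bool wait_for_initialized(int max_attempts,
                          std::chrono::milliseconds interval) {
    for (int i = 0; i < max_attempts; ++i) {
        if (g_initialized.load(std::memory_order_acquire)) {
            return true;
        }
        std::this_thread::sleep_for(interval);
    }
    // One last check after the final sleep.
    return g_initialized.load(std::memory_order_acquire);
}

// Hypothetical enqueue: 0 on success, -1 if the listener never became
// ready within the allowed time.
int enqueue_command(int max_attempts, std::chrono::milliseconds interval) {
    if (!wait_for_initialized(max_attempts, interval)) {
        return -1;
    }
    // ... the real code would queue the operation here ...
    return 0;
}
```

With max_attempts = 20 and a 1 second interval this matches the "up to 20 
seconds in 1 second intervals" behaviour; once the flag is set, the wait 
ends at the next check instead of using the whole budget.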

Not sure what the interruption issue is that you and Gary discussed. The 
OS-level sleep function can only be interrupted by signals, and this 
thread shouldn't be receiving any signals in general. So it's not 
something I would be concerned about.

> Another approach to fixing this would be to use some sort of 
> synchronization between the init and enqueue code, like a condition 
> variable. I think I know how to do this with pthread_cond_wait() and 
> pthread_cond_signal(), although it gets to be a bit tricky since I'd 
> probably have to make the enqueue code create the condvar if 
> initialization is not yet complete, and then have the initialization 
> code check for the existence of the condvar when initialization is 
> complete, and signal on it if it exists. I'm pretty sure there's a 
> potential for a race condition in there. I haven't thought it through 
> enough to say for sure. I also looked a bit at condition variable 
> support on windows, and it looks like I could do something similar there 
> too. However, I think the sleep approach I have implemented is far more 
> straightforward and less error prone, so I'd prefer to stick with it if 
> others approve.

Can't comment on this without understanding exactly where the race is.

Thanks,
David

> 
> thanks,
> 
> Chris
> 
> 