CR 6860309 - solaris timing issue on thread startup

David Holmes david.holmes at oracle.com
Mon Nov 14 00:42:16 UTC 2011


Alan,

On 12/11/2011 9:58 PM, Alan Bateman wrote:
> On 11/11/2011 16:56, Gary Adams wrote:
>> CR 6860309 - TEST_BUG: Insufficient sleep time in
>> java/lang/Runtime/exec/StreamsSurviveDestroy.java
>>
>> A timing problem is reported for slow solaris systems for this
>> test to start up a process and systematically torture the underlying
>> threads processing data from the running process.
>>
>> On my fast solaris machine I can not reproduce the error,
>> but it is reasonable to assume that on a slower machine there
>> could be scheduling issues that could delay the thread startup
>> past the designated 100 millisecond delay in the main thread.
>>
>> This webrev suggests gating the process destruction until both
>> worker threads are alive.
>>
>> http://cr.openjdk.java.net/~gadams/6860309/
>>
>>
> -Xcomp on a slow machine, always fun when testing the untestable.
>
> I agree with David but I don't think there is a perfect solution. I would
> suggest using a CountDownLatch or other synchronization so that the main
> thread waits until the Copier thread is just about to do the read. Then
> do a sleep in the main thread before invoking the destroy method. I
> suspect that is the best that you can do, as it can't be guaranteed that
> the Copier thread is blocked in the underlying read.

Will the exec'd process block until the copier threads read from its 
output streams? If not then the copier threads (well stdin anyway) could 
read their input and have terminated before the main thread even reaches 
the original sleep() call.

I don't think this test can be written correctly as-is. Even using a 
CountDownLatch won't help because you have to sync with two copier 
threads, so the first could be finished before the second signals the latch.
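
To make the latch problem concrete, here is a minimal sketch. It is not the
actual test: LatchRaceSketch, copier() and the in-memory streams are
hypothetical stand-ins for the Copier threads and the process streams, and
the join() in the second copier is there only to force the unlucky schedule
every time.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.CountDownLatch;

public class LatchRaceSketch {

    // Hypothetical stand-in for the test's copier: signal the latch, then
    // drain the stream until EOF.
    static Thread copier(InputStream in, CountDownLatch ready, Thread waitFor) {
        return new Thread(() -> {
            try {
                if (waitFor != null) {
                    waitFor.join();   // demo only: force the unlucky schedule
                }
                ready.countDown();    // "I am just about to read"
                while (in.read() != -1) {
                    // discard the data
                }
            } catch (IOException | InterruptedException e) {
                // ignored in this sketch
            }
        });
    }

    // Returns whether the first copier is still running at the moment the
    // main thread is released by the latch.
    static boolean firstCopierStillAlive() throws Exception {
        CountDownLatch ready = new CountDownLatch(2);

        // Data is already available and EOF follows, so this copier can finish at once.
        InputStream quick = new ByteArrayInputStream(new byte[] {1, 2, 3});

        // Nothing is ever written here, so this copier blocks in read().
        PipedOutputStream writer = new PipedOutputStream();
        InputStream blocking = new PipedInputStream(writer);

        Thread first = copier(quick, ready, null);
        Thread second = copier(blocking, ready, first);
        first.start();
        second.start();

        ready.await();                    // both copiers have signalled
        boolean alive = first.isAlive();  // the first may already be gone

        writer.close();                   // EOF lets the second copier finish
        second.join();
        return alive;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("first copier alive after await: " + firstCopierStillAlive());
    }
}
```

With the schedule forced like this, the main thread comes out of await()
after the first copier has already terminated, so a destroy() issued at that
point would not be testing that thread at all.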

I would think we would need to exec our own process (a Java one, of 
course) that assists with the synchronization issue, i.e. by not 
terminating until it receives an input token. At least that way we know 
the copier threads cannot proceed past the read() calls, even if we 
can't be 100% certain they are in the read at the time the process is 
destroyed.
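
A rough sketch of what such a helper might look like follows. The class
name TokenGatedChild, the token value and the way the child JVM command is
built are all assumptions for illustration, not part of the existing test.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

public class TokenGatedChild {

    // Hypothetical token; any agreed byte value would do.
    static final int TOKEN = 'X';

    // Blocks until TOKEN arrives or the stream hits EOF.
    // Returns true only if the token was seen.
    static boolean awaitToken(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            if (b == TOKEN) {
                return true;
            }
        }
        return false;
    }

    // Command line to run this class in a child JVM, assuming the parent's
    // class path is also usable by the child.
    static List<String> childCommand() {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        return Arrays.asList(java, "-cp", System.getProperty("java.class.path"),
                TokenGatedChild.class.getName());
    }

    // Parent side: start the child; the caller attaches copier threads to
    // getInputStream()/getErrorStream() and later calls destroy().
    static Process launch() throws IOException {
        return new ProcessBuilder(childCommand()).start();
    }

    // Child side: write nothing and stay alive until the parent sends the
    // token, closes our stdin, or destroys us.
    public static void main(String[] args) throws IOException {
        awaitToken(System.in);
    }
}
```

Because the child writes nothing and does not exit on its own, copier
threads reading its stdout and stderr have nothing to consume and cannot
reach EOF before destroy() is called. Whether they are actually inside
read() at that instant is still not guaranteed.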

Gary: while fixing timing bugs is a worthwhile goal in terms of test 
stability etc., it is rarely, if ever, "low hanging fruit", as you have found.

David

> -Alan.


