CR 6860309 - solaris timing issue on thread startup
David Holmes
david.holmes at oracle.com
Mon Nov 14 00:42:16 UTC 2011
Alan,
On 12/11/2011 9:58 PM, Alan Bateman wrote:
> On 11/11/2011 16:56, Gary Adams wrote:
>> CR 6860309 - TEST_BUG: Insufficient sleep time in
>> java/lang/Runtime/exec/StreamsSurviveDestroy.java
>>
>> A timing problem is reported for slow solaris systems for this
>> test to start up a process and systematically torture the underlying
>> threads processing data from the running process.
>>
>> On my fast solaris machine I can not reproduce the error,
>> but it is reasonable to assume that on a slower machine there
>> could be scheduling issues that could delay the thread startup
>> past the designated 100 millisecond delay in the main thread.
>>
>> This webrev suggests gating the process destruction until both
>> worker threads are alive.
>>
>> http://cr.openjdk.java.net/~gadams/6860309/
>>
>>
> -Xcomp on a slow machine, always fun when testing the untestable.
>
> I agree with David but I don't think there is a perfect solution. I would
> suggest using a CountDownLatch or other synchronization so that the main
> thread waits until the Copier thread is just about to do the read. Then
> do a sleep in the main thread before invoking the destroy method. I
> suspect that is the best that you can do, as it can't be guaranteed that the
> Copier thread is blocked in the underlying read.
Will the exec'd process block until the copier threads read from its
output streams? If not then the copier threads (well stdin anyway) could
read their input and have terminated before the main thread even reaches
the original sleep() call.
I don't think this test can be written correctly as-is. Even using a
CountDownLatch won't help because you have to sync with two copier
threads, so the first could be finished before the second signals the latch.
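
To make the limitation concrete, here is a minimal sketch of latch gating with two copier threads. It is not the code in the webrev: the Copier class, the startAndAwait helper and the in-memory streams are hypothetical stand-ins for the test's real threads and the process streams. The latch only tells the main thread that both copiers have started and are about to read. It says nothing about whether either one is blocked in read(), or has already hit end-of-stream and exited.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchGateSketch {

    // Hypothetical stand-in for the test's Copier thread.
    static class Copier extends Thread {
        private final InputStream in;
        private final CountDownLatch aboutToRead;

        Copier(InputStream in, CountDownLatch aboutToRead) {
            this.in = in;
            this.aboutToRead = aboutToRead;
            setDaemon(true);
        }

        @Override
        public void run() {
            // Signals "about to read", which is weaker than "blocked in read".
            aboutToRead.countDown();
            try {
                while (in.read() != -1) {
                    // Drain the stream; with a short-lived process this loop
                    // can finish before the other copier has even started.
                }
            } catch (IOException e) {
                // A destroyed process may surface here as an I/O error.
            }
        }
    }

    // Starts both copiers and waits until each has counted down the latch.
    // Returns false on timeout or interruption.
    static boolean startAndAwait(InputStream out, InputStream err, long timeoutMillis) {
        CountDownLatch aboutToRead = new CountDownLatch(2);
        new Copier(out, aboutToRead).start();
        new Copier(err, aboutToRead).start();
        try {
            return aboutToRead.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Empty in-memory streams stand in for the process's stdout and stderr.
        boolean started = startAndAwait(
                new ByteArrayInputStream(new byte[0]),
                new ByteArrayInputStream(new byte[0]),
                5000);
        System.out.println("both copiers started: " + started);
    }
}
```

With empty streams both copiers count down and then terminate at once, which is exactly the case where a later destroy() would exercise nothing.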
I would think we would need to exec our own process (a Java one of
course) that assists with the synchronization issue, i.e. by not
terminating until it receives an input token. At least that way we know
the copier threads cannot proceed past the read() calls, even if we
can't be 100% certain they are in the read at the time the process is
destroyed.
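
A rough sketch of what such a helper could look like, assuming a hypothetical class name TokenGatedChild and a one-byte token (neither is from the existing test). The child writes nothing and simply blocks on its stdin, so its stdout and stderr stay open and a copier's read() in the parent cannot return until the process is destroyed or the token is sent.

```java
import java.io.IOException;
import java.io.InputStream;

public class TokenGatedChild {

    // Blocks until one byte (the token) arrives on the given stream.
    // Returns true if a token was read, false on end-of-stream or I/O error.
    static boolean waitForToken(InputStream in) {
        try {
            return in.read() != -1;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Produce no output and stay alive until the parent sends a token
        // or closes our stdin.
        waitForToken(System.in);
    }
}
```

The parent would exec this class, start the copier threads on Process.getInputStream() and Process.getErrorStream(), and then call destroy(). Writing a byte to Process.getOutputStream() would only be needed if the test wants the child to exit on its own.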
Gary: while fixing timing bugs is a worthwhile goal in terms of test
stability etc., it is rarely, if ever, "low hanging fruit", as you have found.
David
> -Alan.