RFR: 8211851: (ch) java/nio/channels/AsynchronousSocketChannel/StressLoopback.java times out (aix) [v2]

Martin Doerr mdoerr at openjdk.org
Mon Dec 16 14:24:39 UTC 2024


On Tue, 10 Dec 2024 13:48:16 GMT, Joachim Kern <jkern at openjdk.org> wrote:

>> Test /jdk/java/nio/channels/AsynchronousSocketChannel/StressLoopback.java times out on aix.
>> 
>> The problem is that in queueControlEvent() for every event added to the controlQueue one byte is written to the ctlSP pipe to wakeup the poll thread running in poll(). The poll() function of the poll thread is just taking this one byte from the pipe but handling then all events in the controlQueue. Afterwards the controlQueue is empty, but the pipe is only 1 byte shorter. This is an asymmetric and therefore false behavior.  So I changed the drain1() to a drain() meaning drain_all() which heals this asymmetry.
>> The problem leading to the timeout was, that more and more (StressLoopback: nomen est omen) wakeup bytes were written to the pipe, but the poll thread runs in much lower frequency. So, at some time the pipe was full on OS side, resulting in a blocking Pollset.interrupt(ctlSp[1]) in queueControlEvent(). Unfortunately, at this moment the controlQueue sync object is in locked state, resulting that the poll thread cannot acquire this lock to read from the pipe. No reading from the pipe means the pipe remains full and the dead lock is there.
>> Additionally, I found a flaw in the emulator of the ‘one shot’ semantic. Here the original implantation was not save against deleting a just registered fd before the next poll. To fix this I separated the emulation of the ‘one shot’ from the rest of the poll() actions. Now, the ‘one shot’ emulation is encapsulated in the controlLock.lock()block, which guards the poll itself.
>
> Joachim Kern has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use IOUtil.configureBlocking

Looks reasonable. I have a couple of questions and suggestions.

src/java.base/aix/classes/sun/nio/ch/AixPollPort.java line 32:

> 30: import sun.nio.ch.Pollset;
> 31: import sun.nio.ch.IOUtil;
> 32: import java.io.FileDescriptor;

Imports are typically sorted alphabetically.

src/java.base/aix/classes/sun/nio/ch/AixPollPort.java line 310:

> 308:             try {
> 309:                 for (;;) {
> 310:                     int n, m;

I'd declare `m` in the `try` block. It's not used outside of it.

src/java.base/aix/classes/sun/nio/ch/AixPollPort.java line 322:

> 320:                             // the file descriptor here.
> 321:                             if (fd != sp[0] && fd != ctlSp[0]) {
> 322:                                 synchronized (controlQueue) {

Ok, the lock order `controlLock` before `controlQueue` seems to be consistent. Otherwise, we might get deadlocks. (Exception is `controlLock.tryLock()` which is ok.)
Shouldn't `synchronized (controlQueue)` better get removed from `processControlQueue()`? The lock has always been acquired before it is called. If it didn't, it would make the lock order inconsistent and possibly cause deadlocks.

src/java.base/aix/native/libnio/ch/Pollset.c line 170:

> 168: 
> 169:     for (;;) {
> 170:         int n = read(fd, buf, sizeof(buf));

Shouldn't we handle `errno == EINTR`? Other code uses `RESTARTABLE`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/22633#pullrequestreview-2506278053
PR Review Comment: https://git.openjdk.org/jdk/pull/22633#discussion_r1886871594
PR Review Comment: https://git.openjdk.org/jdk/pull/22633#discussion_r1886872793
PR Review Comment: https://git.openjdk.org/jdk/pull/22633#discussion_r1886887817
PR Review Comment: https://git.openjdk.org/jdk/pull/22633#discussion_r1886896261


More information about the nio-dev mailing list