RFR(s): 8228580: DnsClient TCP socket timeout
Pavel Rappo
pavel.rappo at oracle.com
Tue Sep 24 10:51:05 UTC 2019
Milan,
Thanks for looking into this. I think you should ask a question on the expected timing semantics and guarantees on net-dev (with maybe a cc to nio-dev).
As for our test. I agree with you that we should simply work a possibility of early returns into the check.
...
/* The acceptable variation of early returns from timed socket operations. */
private static final long PREMATURE_RETURN = adjustTimeout(100);
...
long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos);
if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) {
String msg = String.format(
"elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s",
timeout, timeout, TOLERANCE, PREMATURE_RETURN);
throw new RuntimeException(msg);
}
...
Thoughts?
-Pavel
> On 24 Sep 2019, at 09:12, Milan Mimica <milan.mimica at gmail.com> wrote:
>
> Hi Pavel
>
> Wow, I find this awesome. I don't have a Windows machine to play with,
> but I think I may have found something.
> The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix
> it uses poll(2), on Windows it uses select(2). Regarding timeouts,
> poll() has "wait at least" semantics and overruns by design[1], while
> select() on windows has "waits at most" semantics, or how they put
> it[2]: "specifies the maximum time that select should wait before
> returning.". It returns early by design! Is this a known thing? I
> don't think there is much one can do here. It probably makes no sense
> to loop it and wait for time remainder.
> Java's soTimeout does not specify[3] should it wait at least or at
> most the specified timeout, so it's fine I guess. Old, "plain" socket
> impl are not much different.
>
> If the above is correct, should I just add a tolerance for the lower bound?
>
> [1] http://man7.org/linux/man-pages/man2/poll.2.html
> [2] https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
> [3] https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int-
>
> On Mon, 23 Sep 2019 at 16:15, Pavel Rappo <pavel.rappo at oracle.com> wrote:
>>
>> Milan,
>>
>> I'm observing the latest version (.04) of this test failing quite frequently (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test passes fine on macOS and Linux. Here's the typical output I see in the logs:
>>
>> java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
>> java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
>> java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
>> java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
>> ...
>>
>> Now, there might be many reasons for that. One of which would be that the DnsClient code is buggy. The other reason would be that the accuracy guaranteed by Windows implementation of `read` is not what we would expect. Would you be able to investigate that?
>>
>> P.S. The good news is that the CSR has been approved:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8230965
>>
>>> On 23 Sep 2019, at 14:20, Milan Mimica <milan.mimica at gmail.com> wrote:
>>>
>>> Got it. Thanks Pavel!
>>>
>>>
>>> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo <pavel.rappo at oracle.com> wrote:
>>>>
>>>> Milan,
>>>>
>>>> How do you check which tests are run? That's what I see in the /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr file after I have run the test locally on my machine:
>>>>
>>>> ----------messages:(5/233)----------
>>>> command: main TcpTimeout
>>>> reason: User specified action: run main TcpTimeout
>>>> Mode: othervm
>>>> Additional options from @modules: --add-modules java.base --add-exports java.base/sun.security.util=ALL-UNNAMED
>>>> elapsed time (seconds): 1.751
>>>>
>>>> ...
>>>>
>>>> ----------messages:(5/313)----------
>>>> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>>>> reason: User specified action: run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>>>> Mode: othervm
>>>> Additional options from @modules: --add-modules java.base --add-exports java.base/sun.security.util=ALL-UNNAMED
>>>> elapsed time (seconds): 5.498
>>>>
>>>> ------------------------------------
>>>>
>>>> Which is consistent with what I would expect given the timeout values.
>>>>
>>>> The following output does not tell the full story, just the name of the test:
>>>>
>>>> ==============================
>>>> Test summary
>>>> ==============================
>>>> TEST TOTAL PASS FAIL ERROR
>>>> jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
>>>> 1 1 0 0
>>>> ==============================
>>>> TEST SUCCESS
>>>>
>>>> -Pavel
>>>>
>>>>> On 20 Sep 2019, at 15:42, Milan Mimica <milan.mimica at gmail.com> wrote:
>>>>>
>>>>> Pavel,
>>>>>
>>>>> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
>>>>> I don't see the test is run twice when I execute "make test
>>>>> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
>>>>> I missing something?
>>>>>
>>>
>>>
>>> --
>>> Milan Mimica
>>
>
>
> --
> Milan Mimica
More information about the core-libs-dev
mailing list