RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set
Vyom Tewari
vyom.tewari at oracle.com
Wed Sep 7 15:09:41 UTC 2016
Hi All,
Please find the latest
webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html
<http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>).
there were some test failures in JPRT and Chris also pointed the same
issue in modified code. Now with latest code JPRT is clean.
Thanks,
Vyom
On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote:
>
>
> On 07/09/16 07:45, Vyom Tewari wrote:
>> Hi Chris,
>>
>> Please find the latest
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
>> did some code refactoring, JPRT is in progress.
>
> In terms of NET_Timeout**, I'm happy with this version, but I
> have an additional comment.
>
> If NET_ReadWithTimeout returns -1 because NET_TimeoutWithCurrentTime
> returns <= 0, then a JNI pending exception will be set. So the code
> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1,
> immediately rather than continuing and possibly trying to set another
> JNI pending exception.
>
> One option is to check the return value from NET_ReadWithTimeout and
> also (*env)->ExceptionCheck(env). Another is to possibly consolidate
> the setting of JNI pending exceptions in one place, towards the end
> of Java_java_net_SocketInputStream_socketRead0, for example EBADF is
> handled in two places.
>
> -Chris.
>
>> I need help from some one to build and test latest changes on AIX, may
>> be Christoph can help.
>>
>> Thanks,
>>
>> Vyom
>>
>>
>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
>>> Vyom,
>>>
>>>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Please find the latest
>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html
>>>>
>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>).
>>>> I
>>>> have incorporated the review comments.
>>> Your changes completely avoid NET_Timeout, which is quite complex on
>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
>>> async close mechanism to signal/interrupt a thread blocked in a poll /
>>> select ). Also, select is used on Mac, which will use poll after your
>>> changes ( I do remember that there was a bug in poll on Mac around
>>> the time of the original Mac port ).
>>>
>>> Would is be better, rather than replicating the logic in NET_Timeout,
>>> to have a version that supports passing a function pointer, to run if
>>> the poll/select returns before the timeout? Just an idea.
>>>
>>> -Chris.
>>>
>>>> Thanks,
>>>>
>>>> Vyom
>>>>
>>>>
>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>>>> if the desire is to avoid making double calls on gettimeofday in the
>>>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>>>> Yes, the desire is to make no more calls to gettimeofday, than are
>>>>> already done.
>>>>>
>>>>>> then consider a modification to NET_Timeout and create a version
>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>>>> current_time)
>>>>>>
>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>>>> current_time) {
>>>>>> int result;
>>>>>> struct timeval t;
>>>>>> long prevtime, newtime;
>>>>>> struct pollfd pfd;
>>>>>> pfd.fd = s;
>>>>>> pfd.events = POLLIN;
>>>>>>
>>>>>> if (timeout > 0) {
>>>>>> prevtime = (current_time->tv_sec * 1000) +
>>>>>> current_time->tv_usec / 1000;
>>>>>> }
>>>>>>
>>>>>> for(;;) {
>>>>>> result = poll(&pfd, 1, timeout);
>>>>>> if (result < 0 && errno == EINTR) {
>>>>>> if (timeout > 0) {
>>>>>> gettimeofday(&t, NULL);
>>>>>> newtime = (t.tv_sec * 1000) + t.tv_usec /1000;
>>>>>> timeout -= newtime - prevtime;
>>>>>> if (timeout <= 0)
>>>>>> return 0;
>>>>>> prevtime = newtime;
>>>>>> }
>>>>>> } else {
>>>>>> return result;
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> the NET_ReadWithTimeout is modified with.
>>>>>>
>>>>>> while (timeout > 0) {
>>>>>> result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>>>
>>>>>> in any case there is the potential for multiple invocation of
>>>>>> gettimeofday due to a need to make
>>>>>> poll restartable,
>>>>> Yes, and that is fine. Just no more than is already there.
>>>>>
>>>>> and adjust the originally specified timeout
>>>>>> accordingly, and for the edge case
>>>>>> after the non blocking read.
>>>>> After an initial skim, your suggestion seems reasonable.
>>>>>
>>>>> -Chris.
>>>>>
>>>>>> regards
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>>>> Vyom,
>>>>>>>
>>>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> There is some concern about the potential performance effect, etc,
>>>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>>>> NET_Timeout, but effectively inline its interruptible operation.
>>>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>>>> execute. Rather than effectively two loops conditioned on the
>>>>>>> timeout. Disclaimer: I have not actually tried to do this, but
>>>>>>> it seems worth considering / evaluating.
>>>>>>>
>>>>>>> -Chris.
>>>>>>>
>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>>>> hi Dimitry,
>>>>>>>>
>>>>>>>> thanks for review, I did consider to use a monotonically
>>>>>>>> increasing
>>>>>>>> clock like "clock_gettime" but existing nearby
>>>>>>>> code("NET_Timeout") uses
>>>>>>>> "gettimeofday" so i choose to be consistent with the existing
>>>>>>>> code.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Vyom
>>>>>>>>
>>>>>>>>
>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff wrote:
>>>>>>>>> Vyom,
>>>>>>>>>
>>>>>>>>> Did you consider to use select() to calculate timeout instead of
>>>>>>>>> gettimeofday ?
>>>>>>>>>
>>>>>>>>> gettimeofday is affected by system time changes, so running ntpd
>>>>>>>>> can
>>>>>>>>> cause unpredictable behavior of this code. Also it's rather
>>>>>>>>> expensive
>>>>>>>>> syscall.
>>>>>>>>>
>>>>>>>>> -Dmitry
>>>>>>>>>
>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>>>> please find the updated
>>>>>>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I
>>>>>>>>>> incorporated the review comments.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Vyom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>>>> Hi
>>>>>>>>>>> perhaps there is an opportunity to do some refactoring
>>>>>>>>>>> here (...
>>>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>>>
>>>>>>>>>>> along the lines
>>>>>>>>>>>
>>>>>>>>>>> if (timeout) {
>>>>>>>>>>> nread = NET_ReadWithTimeout(...);
>>>>>>>>>>> } else {
>>>>>>>>>>> nread = NET_Read(...);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>>>> restructuring of
>>>>>>>>>>> your goto loop
>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>>>> if (nread <= 0) {
>>>>>>>>>>> if (nread == 0) {
>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>> "SocketTimeoutException",
>>>>>>>>>>> "Read timed out");
>>>>>>>>>>> } else if (nread == -1) {
>>>>>>>>>>> if (errno == EBADF) {
>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>>>> } else if (errno == ENOMEM) {
>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env,
>>>>>>>>>>> "NET_Timeout
>>>>>>>>>>> native heap allocation failed");
>>>>>>>>>>> } else {
>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>>>> (env, JNU_JAVANETPKG
>>>>>>>>>>> "SocketException",
>>>>>>>>>>> "select/poll failed");
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> // release buffer in main call flow
>>>>>>>>>>> // if (bufP != BUF) {
>>>>>>>>>>> // free(bufP);
>>>>>>>>>>> // }
>>>>>>>>>>> nread = -1;
>>>>>>>>>>> break;
>>>>>>>>>>> } else {
>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
>>>>>>>>>>> EWOULDBLOCK))) {
>>>>>>>>>>> gettimeofday(&t, NULL);
>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>>>> if(_timeout > 0){
>>>>>>>>>>> prevtime = newtime;
>>>>>>>>>>> }
>>>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>>>
>>>>>>>>>>> e&oe
>>>>>>>>>>>
>>>>>>>>>>> regards
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>>>
>>>>>>>>>>>> Vyom
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bug :
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>>>
>>>>>>>>>>>>> webrev :
>>>>>>>>>>>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even with
>>>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes that
>>>>>>>>>>>>> read()
>>>>>>>>>>>>> won't block after poll() reports that a read is possible.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This assumption does not hold, as noted on the man page for
>>>>>>>>>>>>> select
>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux, select()
>>>>>>>>>>>>> may
>>>>>>>>>>>>> report a socket file descriptor as "ready for reading", while
>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for example
>>>>>>>>>>>>> happen
>>>>>>>>>>>>> when data has arrived but upon examination has wrong checksum
>>>>>>>>>>>>> and is
>>>>>>>>>>>>> discarded.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>
More information about the net-dev
mailing list