RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set

Vyom Tewari vyom.tewari at oracle.com
Wed Sep 7 06:45:50 UTC 2016


Hi Chris,

Please find the latest 
webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html 
<http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I 
did some code refactoring, JPRT is in progress.

I need help from some one to build and test latest changes on AIX, may 
be Christoph can help.

Thanks,

Vyom


On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
> Vyom,
>
>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com> wrote:
>>
>> Hi,
>>
>> Please find the latest webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>). I have incorporated the review comments.
> Your changes completely avoid NET_Timeout, which is quite complex on
> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
> async close mechanism to signal/interrupt a thread blocked in a poll /
> select ). Also, select is used on Mac, which will use poll after your
> changes ( I do remember that there was a bug in poll on Mac around
> the time of the original Mac port ).
>
> Would is be better, rather than replicating the logic in  NET_Timeout,
> to have a version that supports passing a function pointer, to run if
> the poll/select returns before the timeout? Just an idea.
>
> -Chris.
>
>> Thanks,
>>
>> Vyom
>>
>>
>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>> if the desire is to avoid making double calls on gettimeofday in the
>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>> Yes, the desire is to make no more calls to gettimeofday, than are
>>> already done.
>>>
>>>> then consider a modification to NET_Timeout and create a version
>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>> current_time)
>>>>
>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>> current_time) {
>>>>     int result;
>>>>     struct timeval t;
>>>>     long prevtime, newtime;
>>>>     struct pollfd pfd;
>>>>     pfd.fd = s;
>>>>     pfd.events = POLLIN;
>>>>
>>>>     if (timeout > 0) {
>>>>         prevtime = (current_time->tv_sec * 1000)  +
>>>> current_time->tv_usec / 1000;
>>>>     }
>>>>
>>>>     for(;;) {
>>>>         result = poll(&pfd, 1, timeout);
>>>>         if (result < 0 && errno == EINTR) {
>>>>             if (timeout > 0) {
>>>>                 gettimeofday(&t, NULL);
>>>>                 newtime = (t.tv_sec * 1000)  +  t.tv_usec /1000;
>>>>                 timeout -= newtime - prevtime;
>>>>                 if (timeout <= 0)
>>>>                     return 0;
>>>>                 prevtime = newtime;
>>>>             }
>>>>         } else {
>>>>             return result;
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> the NET_ReadWithTimeout is modified with.
>>>>
>>>>    while (timeout > 0) {
>>>>         result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>
>>>> in any case there is the potential for multiple invocation of
>>>> gettimeofday due to a need to make
>>>> poll restartable,
>>> Yes, and that is fine. Just no more than is already there.
>>>
>>> and adjust the originally specified timeout
>>>> accordingly, and for the edge case
>>>> after the non blocking read.
>>> After an initial skim, your suggestion seems reasonable.
>>>
>>> -Chris.
>>>
>>>> regards
>>>> Mark
>>>>
>>>>
>>>>
>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>> Vyom,
>>>>>
>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>
>>>>> There is some concern about the potential performance effect, etc,
>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>> NET_Timeout, but effectively inline its interruptible operation.
>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>> execute. Rather than effectively two loops conditioned on the
>>>>> timeout.  Disclaimer: I have not actually tried to do this, but
>>>>> it seems worth considering / evaluating.
>>>>>
>>>>> -Chris.
>>>>>
>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>> hi Dimitry,
>>>>>>
>>>>>> thanks for review, I did consider to use  a monotonically increasing
>>>>>> clock like "clock_gettime" but  existing nearby code("NET_Timeout") uses
>>>>>> "gettimeofday"  so i choose to be consistent with the existing code.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Vyom
>>>>>>
>>>>>>
>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff wrote:
>>>>>>> Vyom,
>>>>>>>
>>>>>>> Did you consider to use select() to calculate timeout instead of
>>>>>>> gettimeofday ?
>>>>>>>
>>>>>>> gettimeofday is affected by system time changes, so running ntpd can
>>>>>>> cause unpredictable behavior of this code. Also it's rather expensive
>>>>>>> syscall.
>>>>>>>
>>>>>>> -Dmitry
>>>>>>>
>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>> please find the updated
>>>>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>
>>>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
>>>>>>>> I
>>>>>>>> incorporated the review comments.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Vyom
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>> Hi
>>>>>>>>>    perhaps there is an opportunity to do  some refactoring here (...
>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>
>>>>>>>>> along the lines
>>>>>>>>>
>>>>>>>>> if (timeout) {
>>>>>>>>>      nread =  NET_ReadWithTimeout(...);
>>>>>>>>> } else {
>>>>>>>>>       nread = NET_Read(...);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>> restructuring of
>>>>>>>>> your goto loop
>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>>            if (nread <= 0) {
>>>>>>>>>                if (nread == 0) {
>>>>>>>>>                    JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>> "SocketTimeoutException",
>>>>>>>>>                                "Read timed out");
>>>>>>>>>                } else if (nread == -1) {
>>>>>>>>>                    if (errno == EBADF) {
>>>>>>>>>                         JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>>                    } else if (errno == ENOMEM) {
>>>>>>>>>                        JNU_ThrowOutOfMemoryError(env, "NET_Timeout
>>>>>>>>> native heap allocation failed");
>>>>>>>>>                    } else {
>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>>                            (env, JNU_JAVANETPKG "SocketException",
>>>>>>>>> "select/poll failed");
>>>>>>>>>                    }
>>>>>>>>>                }
>>>>>>>>>                  // release buffer in main call flow
>>>>>>>>> //              if (bufP != BUF) {
>>>>>>>>> //                  free(bufP);
>>>>>>>>> //             }
>>>>>>>>>               nread = -1;
>>>>>>>>>               break;
>>>>>>>>>            } else {
>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno == EWOULDBLOCK))) {
>>>>>>>>>                gettimeofday(&t, NULL);
>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>> if(_timeout > 0){
>>>>>>>>> prevtime = newtime;
>>>>>>>>>                    }
>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>
>>>>>>>>> e&oe
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>
>>>>>>>>>> Vyom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>
>>>>>>>>>>> Bug         : https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>
>>>>>>>>>>> webrev    :
>>>>>>>>>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even with
>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes that read()
>>>>>>>>>>> won't block after poll() reports that a read is possible.
>>>>>>>>>>>
>>>>>>>>>>> This assumption does not hold, as noted on the man page for select
>>>>>>>>>>> (referenced by the man page for poll): Under Linux, select() may
>>>>>>>>>>> report a socket file descriptor as "ready for reading", while
>>>>>>>>>>> nevertheless a subsequent read blocks. This could for example
>>>>>>>>>>> happen
>>>>>>>>>>> when data has arrived but upon examination has wrong checksum
>>>>>>>>>>> and is
>>>>>>>>>>> discarded.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Vyom
>>>>>>>>>>>
>>>>>>>>>>>



More information about the net-dev mailing list