RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set

Vyom Tewari vyom.tewari at oracle.com
Thu Sep 8 08:58:10 UTC 2016


sending this mail again as my laptop clock got screwed.

Vyom


On Thursday 08 September 2016 08:49 AM, Vyom Tewari wrote:
> Hi All,
>
> Please find the latest 
> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.5/index.html 
> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.5/index.html>).
>
> I fixed the AIX flag issue and move some come common code to 
> "net_util_md.c" file.
>
> Thanks,
> Vyom
>
> On 9/8/2016 12:32 PM, Langer, Christoph wrote:
>> Hi Vyom,
>>
>> ok, thanks. I have one addition to my proposal: I don't think there's 
>> a need for a global NET_GetCurrentTime function. This one should 
>> probably be a static function inside net_util_md.c (static long 
>> getCurrentTime()).
>>
>> Best
>> Christoph
>>
>>> -----Original Message-----
>>> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
>>> Sent: Mittwoch, 7. September 2016 18:31
>>> To: Langer, Christoph <christoph.langer at sap.com>
>>> Cc: net-dev at openjdk.java.net; Chris Hegarty <chris.hegarty at oracle.com>
>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even 
>>> with
>>> soTimeout set
>>>
>>> Hi Chris,
>>>
>>> Sorry I was mostly focusing on our test  failure first, i will check
>>> your suggestions.
>>>
>>> Thanks,
>>>
>>> Vyom
>>>
>>>
>>> On Wednesday 07 September 2016 08:44 PM, Langer, Christoph wrote:
>>>> Hi Vyom,
>>>>
>>>> did you have a look at my suggestions regarding AIX and 
>>>> refactoring? I can't
>>> find none of it in the new webrev nor did you comment on it.
>>>> Best regards
>>>> Christoph
>>>>
>>>>> -----Original Message-----
>>>>> From: net-dev [mailto:net-dev-bounces at openjdk.java.net] On Behalf Of
>>> Vyom
>>>>> Tewari
>>>>> Sent: Mittwoch, 7. September 2016 17:10
>>>>> To: Chris Hegarty <chris.hegarty at oracle.com>
>>>>> Cc: net-dev at openjdk.java.net
>>>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
>>> with
>>>>> soTimeout set
>>>>>
>>>>> Hi All,
>>>>>
>>>>> Please find the latest
>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html 
>>>>>
>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>). 
>>>>>
>>>>> there were some test failures in JPRT and Chris also pointed the same
>>>>> issue in modified code. Now with latest code JPRT is clean.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Vyom
>>>>>
>>>>>
>>>>> On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote:
>>>>>> On 07/09/16 07:45, Vyom Tewari wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> Please find the latest
>>>>>>>
>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). 
>>> I
>>>>>>> did some code refactoring, JPRT is in progress.
>>>>>> In terms of NET_Timeout**, I'm happy with this version, but I
>>>>>> have an additional comment.
>>>>>>
>>>>>> If NET_ReadWithTimeout returns -1 because
>>> NET_TimeoutWithCurrentTime
>>>>>> returns <= 0, then a JNI pending exception will be set. So the code
>>>>>> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1,
>>>>>> immediately rather than continuing and possibly trying to set 
>>>>>> another
>>>>>> JNI pending exception.
>>>>>>
>>>>>> One option is to check the return value from NET_ReadWithTimeout and
>>>>>> also (*env)->ExceptionCheck(env). Another is to possibly consolidate
>>>>>> the setting of JNI pending exceptions in one place, towards the end
>>>>>> of Java_java_net_SocketInputStream_socketRead0, for example EBADF is
>>>>>> handled in two places.
>>>>>>
>>>>>> -Chris.
>>>>>>
>>>>>>> I need help from some one to build and test latest changes on 
>>>>>>> AIX, may
>>>>>>> be Christoph can help.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Vyom
>>>>>>>
>>>>>>>
>>>>>>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
>>>>>>>> Vyom,
>>>>>>>>
>>>>>>>>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com>
>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Please find the latest
>>>>>>>>>
>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html 
>>>>>
>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>). 
>>>>>
>>>>>>>>> I
>>>>>>>>> have incorporated the review comments.
>>>>>>>> Your changes completely avoid NET_Timeout, which is quite 
>>>>>>>> complex on
>>>>>>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will 
>>>>>>>> use the
>>>>>>>> async close mechanism to signal/interrupt a thread blocked in a 
>>>>>>>> poll /
>>>>>>>> select ). Also, select is used on Mac, which will use poll 
>>>>>>>> after your
>>>>>>>> changes ( I do remember that there was a bug in poll on Mac around
>>>>>>>> the time of the original Mac port ).
>>>>>>>>
>>>>>>>> Would is be better, rather than replicating the logic in 
>>>>>>>> NET_Timeout,
>>>>>>>> to have a version that supports passing a function pointer, to 
>>>>>>>> run if
>>>>>>>> the poll/select returns before the timeout? Just an idea.
>>>>>>>>
>>>>>>>> -Chris.
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Vyom
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>>>>>>>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>>>>>>>>> if the desire is to avoid making double calls on 
>>>>>>>>>>> gettimeofday in the
>>>>>>>>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>>>>>>>>> Yes, the desire is to make no more calls to gettimeofday, 
>>>>>>>>>> than are
>>>>>>>>>> already done.
>>>>>>>>>>
>>>>>>>>>>> then consider a modification to NET_Timeout and create a 
>>>>>>>>>>> version
>>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct 
>>>>>>>>>>> timeval *
>>>>>>>>>>> current_time)
>>>>>>>>>>>
>>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct 
>>>>>>>>>>> timeval *
>>>>>>>>>>> current_time) {
>>>>>>>>>>>       int result;
>>>>>>>>>>>       struct timeval t;
>>>>>>>>>>>       long prevtime, newtime;
>>>>>>>>>>>       struct pollfd pfd;
>>>>>>>>>>>       pfd.fd = s;
>>>>>>>>>>>       pfd.events = POLLIN;
>>>>>>>>>>>
>>>>>>>>>>>       if (timeout > 0) {
>>>>>>>>>>>           prevtime = (current_time->tv_sec * 1000)  +
>>>>>>>>>>> current_time->tv_usec / 1000;
>>>>>>>>>>>       }
>>>>>>>>>>>
>>>>>>>>>>>       for(;;) {
>>>>>>>>>>>           result = poll(&pfd, 1, timeout);
>>>>>>>>>>>           if (result < 0 && errno == EINTR) {
>>>>>>>>>>>               if (timeout > 0) {
>>>>>>>>>>>                   gettimeofday(&t, NULL);
>>>>>>>>>>>                   newtime = (t.tv_sec * 1000) + t.tv_usec 
>>>>>>>>>>> /1000;
>>>>>>>>>>>                   timeout -= newtime - prevtime;
>>>>>>>>>>>                   if (timeout <= 0)
>>>>>>>>>>>                       return 0;
>>>>>>>>>>>                   prevtime = newtime;
>>>>>>>>>>>               }
>>>>>>>>>>>           } else {
>>>>>>>>>>>               return result;
>>>>>>>>>>>           }
>>>>>>>>>>>       }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> the NET_ReadWithTimeout is modified with.
>>>>>>>>>>>
>>>>>>>>>>>      while (timeout > 0) {
>>>>>>>>>>>           result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>>>>>>>>
>>>>>>>>>>> in any case there is the potential for multiple invocation of
>>>>>>>>>>> gettimeofday due to a need to make
>>>>>>>>>>> poll restartable,
>>>>>>>>>> Yes, and that is fine. Just no more than is already there.
>>>>>>>>>>
>>>>>>>>>> and adjust the originally specified timeout
>>>>>>>>>>> accordingly, and for the edge case
>>>>>>>>>>> after the non blocking read.
>>>>>>>>>> After an initial skim, your suggestion seems reasonable.
>>>>>>>>>>
>>>>>>>>>> -Chris.
>>>>>>>>>>
>>>>>>>>>>> regards
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>>>>>>>>> Vyom,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html 
>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> There is some concern about the potential performance 
>>>>>>>>>>>> effect, etc,
>>>>>>>>>>>> of gettimeofday, maybe there is a way out of this. The 
>>>>>>>>>>>> reuse of
>>>>>>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It 
>>>>>>>>>>>> seems that
>>>>>>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>>>>>>>>> NET_Timeout, but effectively inline its interruptible 
>>>>>>>>>>>> operation.
>>>>>>>>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>>>>>>>>> execute. Rather than effectively two loops conditioned on the
>>>>>>>>>>>> timeout.  Disclaimer: I have not actually tried to do this, 
>>>>>>>>>>>> but
>>>>>>>>>>>> it seems worth considering / evaluating.
>>>>>>>>>>>>
>>>>>>>>>>>> -Chris.
>>>>>>>>>>>>
>>>>>>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>>>>>>>>> hi Dimitry,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for review, I did consider to use a monotonically
>>>>>>>>>>>>> increasing
>>>>>>>>>>>>> clock like "clock_gettime" but  existing nearby
>>>>>>>>>>>>> code("NET_Timeout") uses
>>>>>>>>>>>>> "gettimeofday"  so i choose to be consistent with the 
>>>>>>>>>>>>> existing
>>>>>>>>>>>>> code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff wrote:
>>>>>>>>>>>>>> Vyom,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did you consider to use select() to calculate timeout 
>>>>>>>>>>>>>> instead of
>>>>>>>>>>>>>> gettimeofday ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gettimeofday is affected by system time changes, so 
>>>>>>>>>>>>>> running ntpd
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>> cause unpredictable behavior of this code. Also it's rather
>>>>>>>>>>>>>> expensive
>>>>>>>>>>>>>> syscall.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Dmitry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>>>>>>>>> please find the updated
>>>>>>>>>>>>>>>
>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html 
>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>). 
>>>>>
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>> incorporated the review comments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>      perhaps there is an opportunity to do  some 
>>>>>>>>>>>>>>>> refactoring
>>>>>>>>>>>>>>>> here (...
>>>>>>>>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> along the lines
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> if (timeout) {
>>>>>>>>>>>>>>>>        nread = NET_ReadWithTimeout(...);
>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>         nread = NET_Read(...);
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>>>>>>>>> restructuring of
>>>>>>>>>>>>>>>> your goto loop
>>>>>>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>>>>>>>>>              if (nread <= 0) {
>>>>>>>>>>>>>>>>                  if (nread == 0) {
>>>>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>> "SocketTimeoutException",
>>>>>>>>>>>>>>>> "Read timed out");
>>>>>>>>>>>>>>>>                  } else if (nread == -1) {
>>>>>>>>>>>>>>>>                      if (errno == EBADF) {
>>>>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>>>>>>>>>                      } else if (errno == ENOMEM) {
>>>>>>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env,
>>>>>>>>>>>>>>>> "NET_Timeout
>>>>>>>>>>>>>>>> native heap allocation failed");
>>>>>>>>>>>>>>>>                      } else {
>>>>>>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>>>>>>>>>                              (env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>> "SocketException",
>>>>>>>>>>>>>>>> "select/poll failed");
>>>>>>>>>>>>>>>>                      }
>>>>>>>>>>>>>>>>                  }
>>>>>>>>>>>>>>>>                    // release buffer in main call flow
>>>>>>>>>>>>>>>> //              if (bufP != BUF) {
>>>>>>>>>>>>>>>> //                  free(bufP);
>>>>>>>>>>>>>>>> //             }
>>>>>>>>>>>>>>>>                 nread = -1;
>>>>>>>>>>>>>>>>                 break;
>>>>>>>>>>>>>>>>              } else {
>>>>>>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
>>>>>>>>>>>>>>>> EWOULDBLOCK))) {
>>>>>>>>>>>>>>>> gettimeofday(&t, NULL);
>>>>>>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>>>>>>>>> if(_timeout > 0){
>>>>>>>>>>>>>>>> prevtime = newtime;
>>>>>>>>>>>>>>>>                      }
>>>>>>>>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> e&oe
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> regards
>>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bug         :
>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> webrev    :
>>>>>>>>>>>>>>>>>>
>>>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>>>>>>>>>>>>>>
>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even
>>> with
>>>>>>>>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes
>>> that
>>>>>>>>>>>>>>>>>> read()
>>>>>>>>>>>>>>>>>> won't block after poll() reports that a read is 
>>>>>>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This assumption does not hold, as noted on the man 
>>>>>>>>>>>>>>>>>> page for
>>>>>>>>>>>>>>>>>> select
>>>>>>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux, 
>>>>>>>>>>>>>>>>>> select()
>>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>> report a socket file descriptor as "ready for 
>>>>>>>>>>>>>>>>>> reading", while
>>>>>>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for
>>> example
>>>>>>>>>>>>>>>>>> happen
>>>>>>>>>>>>>>>>>> when data has arrived but upon examination has wrong
>>>>> checksum
>>>>>>>>>>>>>>>>>> and is
>>>>>>>>>>>>>>>>>> discarded.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>



More information about the net-dev mailing list