RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set
Vyom Tewari
vyom.tewari at oracle.com
Thu Sep 8 09:43:05 UTC 2016
On Thursday 08 September 2016 02:50 PM, Langer, Christoph wrote:
> Hi Vyom,
>
> this looks good.
>
> Is there any particular reason why NET_ReadWithTimeout should remain in SocketInputStream.c and not also move to net_util_md.c? That way you could also have a "static long getCurrentTime()" inside net_util_md.c, instead of an exported NET_GetCurrentTime().
I thought this "NET_ReadWithTimeou" is specific to SocketInputStream so
i will be good if we put this method to socketinputstream.c.
In future we will in using the monotonic increasing clock so
"NET_GetCurrentTime()" will help , Just change the implementation you
are done.
Vyom
> And no "#include <sys/time.h>" would be needed in SocketInputStream.c - maybe not even now as it is.
>
> Best regards
> Christoph
>
>> -----Original Message-----
>> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
>> Sent: Donnerstag, 8. September 2016 05:20
>> To: Langer, Christoph <christoph.langer at sap.com>
>> Cc: net-dev at openjdk.java.net
>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even with
>> soTimeout set
>>
>> Hi All,
>>
>> Please find the latest
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.5/index.html
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.5/index.html>).
>>
>> I fixed the AIX flag issue and move some come common code to
>> "net_util_md.c" file.
>>
>> Thanks,
>> Vyom
>>
>> On 9/8/2016 12:32 PM, Langer, Christoph wrote:
>>> Hi Vyom,
>>>
>>> ok, thanks. I have one addition to my proposal: I don't think there's a need for
>> a global NET_GetCurrentTime function. This one should probably be a static
>> function inside net_util_md.c (static long getCurrentTime()).
>>> Best
>>> Christoph
>>>
>>>> -----Original Message-----
>>>> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
>>>> Sent: Mittwoch, 7. September 2016 18:31
>>>> To: Langer, Christoph <christoph.langer at sap.com>
>>>> Cc: net-dev at openjdk.java.net; Chris Hegarty <chris.hegarty at oracle.com>
>>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
>> with
>>>> soTimeout set
>>>>
>>>> Hi Chris,
>>>>
>>>> Sorry I was mostly focusing on our test failure first, i will check
>>>> your suggestions.
>>>>
>>>> Thanks,
>>>>
>>>> Vyom
>>>>
>>>>
>>>> On Wednesday 07 September 2016 08:44 PM, Langer, Christoph wrote:
>>>>> Hi Vyom,
>>>>>
>>>>> did you have a look at my suggestions regarding AIX and refactoring? I
>> can't
>>>> find none of it in the new webrev nor did you comment on it.
>>>>> Best regards
>>>>> Christoph
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: net-dev [mailto:net-dev-bounces at openjdk.java.net] On Behalf Of
>>>> Vyom
>>>>>> Tewari
>>>>>> Sent: Mittwoch, 7. September 2016 17:10
>>>>>> To: Chris Hegarty <chris.hegarty at oracle.com>
>>>>>> Cc: net-dev at openjdk.java.net
>>>>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
>>>> with
>>>>>> soTimeout set
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Please find the latest
>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>).
>>>>>> there were some test failures in JPRT and Chris also pointed the same
>>>>>> issue in modified code. Now with latest code JPRT is clean.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Vyom
>>>>>>
>>>>>>
>>>>>> On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote:
>>>>>>> On 07/09/16 07:45, Vyom Tewari wrote:
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> Please find the latest
>>>>>>>>
>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
>>>>>>>> did some code refactoring, JPRT is in progress.
>>>>>>> In terms of NET_Timeout**, I'm happy with this version, but I
>>>>>>> have an additional comment.
>>>>>>>
>>>>>>> If NET_ReadWithTimeout returns -1 because
>>>> NET_TimeoutWithCurrentTime
>>>>>>> returns <= 0, then a JNI pending exception will be set. So the code
>>>>>>> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1,
>>>>>>> immediately rather than continuing and possibly trying to set another
>>>>>>> JNI pending exception.
>>>>>>>
>>>>>>> One option is to check the return value from NET_ReadWithTimeout and
>>>>>>> also (*env)->ExceptionCheck(env). Another is to possibly consolidate
>>>>>>> the setting of JNI pending exceptions in one place, towards the end
>>>>>>> of Java_java_net_SocketInputStream_socketRead0, for example EBADF
>> is
>>>>>>> handled in two places.
>>>>>>>
>>>>>>> -Chris.
>>>>>>>
>>>>>>>> I need help from some one to build and test latest changes on AIX, may
>>>>>>>> be Christoph can help.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Vyom
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
>>>>>>>>> Vyom,
>>>>>>>>>
>>>>>>>>>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com>
>>>> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Please find the latest
>>>>>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>).
>>>>>>>>>> I
>>>>>>>>>> have incorporated the review comments.
>>>>>>>>> Your changes completely avoid NET_Timeout, which is quite complex
>> on
>>>>>>>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
>>>>>>>>> async close mechanism to signal/interrupt a thread blocked in a poll /
>>>>>>>>> select ). Also, select is used on Mac, which will use poll after your
>>>>>>>>> changes ( I do remember that there was a bug in poll on Mac around
>>>>>>>>> the time of the original Mac port ).
>>>>>>>>>
>>>>>>>>> Would is be better, rather than replicating the logic in NET_Timeout,
>>>>>>>>> to have a version that supports passing a function pointer, to run if
>>>>>>>>> the poll/select returns before the timeout? Just an idea.
>>>>>>>>>
>>>>>>>>> -Chris.
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Vyom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>>>>>>>>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>>>>>>>>>> if the desire is to avoid making double calls on gettimeofday in the
>>>>>>>>>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>>>>>>>>>> Yes, the desire is to make no more calls to gettimeofday, than are
>>>>>>>>>>> already done.
>>>>>>>>>>>
>>>>>>>>>>>> then consider a modification to NET_Timeout and create a version
>>>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct
>> timeval *
>>>>>>>>>>>> current_time)
>>>>>>>>>>>>
>>>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct
>> timeval *
>>>>>>>>>>>> current_time) {
>>>>>>>>>>>> int result;
>>>>>>>>>>>> struct timeval t;
>>>>>>>>>>>> long prevtime, newtime;
>>>>>>>>>>>> struct pollfd pfd;
>>>>>>>>>>>> pfd.fd = s;
>>>>>>>>>>>> pfd.events = POLLIN;
>>>>>>>>>>>>
>>>>>>>>>>>> if (timeout > 0) {
>>>>>>>>>>>> prevtime = (current_time->tv_sec * 1000) +
>>>>>>>>>>>> current_time->tv_usec / 1000;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> for(;;) {
>>>>>>>>>>>> result = poll(&pfd, 1, timeout);
>>>>>>>>>>>> if (result < 0 && errno == EINTR) {
>>>>>>>>>>>> if (timeout > 0) {
>>>>>>>>>>>> gettimeofday(&t, NULL);
>>>>>>>>>>>> newtime = (t.tv_sec * 1000) + t.tv_usec /1000;
>>>>>>>>>>>> timeout -= newtime - prevtime;
>>>>>>>>>>>> if (timeout <= 0)
>>>>>>>>>>>> return 0;
>>>>>>>>>>>> prevtime = newtime;
>>>>>>>>>>>> }
>>>>>>>>>>>> } else {
>>>>>>>>>>>> return result;
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> the NET_ReadWithTimeout is modified with.
>>>>>>>>>>>>
>>>>>>>>>>>> while (timeout > 0) {
>>>>>>>>>>>> result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>>>>>>>>>
>>>>>>>>>>>> in any case there is the potential for multiple invocation of
>>>>>>>>>>>> gettimeofday due to a need to make
>>>>>>>>>>>> poll restartable,
>>>>>>>>>>> Yes, and that is fine. Just no more than is already there.
>>>>>>>>>>>
>>>>>>>>>>> and adjust the originally specified timeout
>>>>>>>>>>>> accordingly, and for the edge case
>>>>>>>>>>>> after the non blocking read.
>>>>>>>>>>> After an initial skim, your suggestion seems reasonable.
>>>>>>>>>>>
>>>>>>>>>>> -Chris.
>>>>>>>>>>>
>>>>>>>>>>>> regards
>>>>>>>>>>>> Mark
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>>>>>>>>>> Vyom,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>>>>> There is some concern about the potential performance effect,
>> etc,
>>>>>>>>>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
>>>>>>>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
>>>>>>>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>>>>>>>>>> NET_Timeout, but effectively inline its interruptible operation.
>>>>>>>>>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>>>>>>>>>> execute. Rather than effectively two loops conditioned on the
>>>>>>>>>>>>> timeout. Disclaimer: I have not actually tried to do this, but
>>>>>>>>>>>>> it seems worth considering / evaluating.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Chris.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>>>>>>>>>> hi Dimitry,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks for review, I did consider to use a monotonically
>>>>>>>>>>>>>> increasing
>>>>>>>>>>>>>> clock like "clock_gettime" but existing nearby
>>>>>>>>>>>>>> code("NET_Timeout") uses
>>>>>>>>>>>>>> "gettimeofday" so i choose to be consistent with the existing
>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff
>> wrote:
>>>>>>>>>>>>>>> Vyom,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did you consider to use select() to calculate timeout instead of
>>>>>>>>>>>>>>> gettimeofday ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> gettimeofday is affected by system time changes, so running
>> ntpd
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> cause unpredictable behavior of this code. Also it's rather
>>>>>>>>>>>>>>> expensive
>>>>>>>>>>>>>>> syscall.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Dmitry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>>>>>>>>>> please find the updated
>>>>>>>>>>>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>>>>>>>>
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> incorporated the review comments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>> perhaps there is an opportunity to do some refactoring
>>>>>>>>>>>>>>>>> here (...
>>>>>>>>>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> along the lines
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> if (timeout) {
>>>>>>>>>>>>>>>>> nread = NET_ReadWithTimeout(...);
>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>> nread = NET_Read(...);
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>>>>>>>>>> restructuring of
>>>>>>>>>>>>>>>>> your goto loop
>>>>>>>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>>>>>>>>>> if (nread <= 0) {
>>>>>>>>>>>>>>>>> if (nread == 0) {
>>>>>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>>> "SocketTimeoutException",
>>>>>>>>>>>>>>>>> "Read timed out");
>>>>>>>>>>>>>>>>> } else if (nread == -1) {
>>>>>>>>>>>>>>>>> if (errno == EBADF) {
>>>>>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>>>>>>>>>> } else if (errno == ENOMEM) {
>>>>>>>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env,
>>>>>>>>>>>>>>>>> "NET_Timeout
>>>>>>>>>>>>>>>>> native heap allocation failed");
>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>>>>>>>>>> (env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>>> "SocketException",
>>>>>>>>>>>>>>>>> "select/poll failed");
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> // release buffer in main call flow
>>>>>>>>>>>>>>>>> // if (bufP != BUF) {
>>>>>>>>>>>>>>>>> // free(bufP);
>>>>>>>>>>>>>>>>> // }
>>>>>>>>>>>>>>>>> nread = -1;
>>>>>>>>>>>>>>>>> break;
>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
>>>>>>>>>>>>>>>>> EWOULDBLOCK))) {
>>>>>>>>>>>>>>>>> gettimeofday(&t, NULL);
>>>>>>>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>>>>>>>>>> if(_timeout > 0){
>>>>>>>>>>>>>>>>> prevtime = newtime;
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> e&oe
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> regards
>>>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Bug :
>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> webrev :
>>>>>>>>>>>>>>>>>>>
>>>>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even
>>>> with
>>>>>>>>>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes
>>>> that
>>>>>>>>>>>>>>>>>>> read()
>>>>>>>>>>>>>>>>>>> won't block after poll() reports that a read is possible.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This assumption does not hold, as noted on the man page
>> for
>>>>>>>>>>>>>>>>>>> select
>>>>>>>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux,
>> select()
>>>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>>> report a socket file descriptor as "ready for reading",
>> while
>>>>>>>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for
>>>> example
>>>>>>>>>>>>>>>>>>> happen
>>>>>>>>>>>>>>>>>>> when data has arrived but upon examination has wrong
>>>>>> checksum
>>>>>>>>>>>>>>>>>>> and is
>>>>>>>>>>>>>>>>>>> discarded.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/net-dev/attachments/20160908/5298dd78/attachment-0001.html>
More information about the net-dev
mailing list