RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set
Vyom Tewari
vyom.tewari at oracle.com
Wed Sep 7 06:45:50 UTC 2016
Hi Chris,
Please find the latest
webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
<http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
did some code refactoring, JPRT is in progress.
I need help from some one to build and test latest changes on AIX, may
be Christoph can help.
Thanks,
Vyom
On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
> Vyom,
>
>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com> wrote:
>>
>> Hi,
>>
>> Please find the latest webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>). I have incorporated the review comments.
> Your changes completely avoid NET_Timeout, which is quite complex on
> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
> async close mechanism to signal/interrupt a thread blocked in a poll /
> select ). Also, select is used on Mac, which will use poll after your
> changes ( I do remember that there was a bug in poll on Mac around
> the time of the original Mac port ).
>
> Would is be better, rather than replicating the logic in NET_Timeout,
> to have a version that supports passing a function pointer, to run if
> the poll/select returns before the timeout? Just an idea.
>
> -Chris.
>
>> Thanks,
>>
>> Vyom
>>
>>
>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>> if the desire is to avoid making double calls on gettimeofday in the
>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>> Yes, the desire is to make no more calls to gettimeofday, than are
>>> already done.
>>>
>>>> then consider a modification to NET_Timeout and create a version
>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>> current_time)
>>>>
>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>> current_time) {
>>>> int result;
>>>> struct timeval t;
>>>> long prevtime, newtime;
>>>> struct pollfd pfd;
>>>> pfd.fd = s;
>>>> pfd.events = POLLIN;
>>>>
>>>> if (timeout > 0) {
>>>> prevtime = (current_time->tv_sec * 1000) +
>>>> current_time->tv_usec / 1000;
>>>> }
>>>>
>>>> for(;;) {
>>>> result = poll(&pfd, 1, timeout);
>>>> if (result < 0 && errno == EINTR) {
>>>> if (timeout > 0) {
>>>> gettimeofday(&t, NULL);
>>>> newtime = (t.tv_sec * 1000) + t.tv_usec /1000;
>>>> timeout -= newtime - prevtime;
>>>> if (timeout <= 0)
>>>> return 0;
>>>> prevtime = newtime;
>>>> }
>>>> } else {
>>>> return result;
>>>> }
>>>> }
>>>> }
>>>>
>>>> the NET_ReadWithTimeout is modified with.
>>>>
>>>> while (timeout > 0) {
>>>> result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>
>>>> in any case there is the potential for multiple invocation of
>>>> gettimeofday due to a need to make
>>>> poll restartable,
>>> Yes, and that is fine. Just no more than is already there.
>>>
>>> and adjust the originally specified timeout
>>>> accordingly, and for the edge case
>>>> after the non blocking read.
>>> After an initial skim, your suggestion seems reasonable.
>>>
>>> -Chris.
>>>
>>>> regards
>>>> Mark
>>>>
>>>>
>>>>
>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>> Vyom,
>>>>>
>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>
>>>>> There is some concern about the potential performance effect, etc,
>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>> NET_Timeout, but effectively inline its interruptible operation.
>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>> execute. Rather than effectively two loops conditioned on the
>>>>> timeout. Disclaimer: I have not actually tried to do this, but
>>>>> it seems worth considering / evaluating.
>>>>>
>>>>> -Chris.
>>>>>
>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>> hi Dimitry,
>>>>>>
>>>>>> thanks for review, I did consider to use a monotonically increasing
>>>>>> clock like "clock_gettime" but existing nearby code("NET_Timeout") uses
>>>>>> "gettimeofday" so i choose to be consistent with the existing code.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Vyom
>>>>>>
>>>>>>
>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff wrote:
>>>>>>> Vyom,
>>>>>>>
>>>>>>> Did you consider to use select() to calculate timeout instead of
>>>>>>> gettimeofday ?
>>>>>>>
>>>>>>> gettimeofday is affected by system time changes, so running ntpd can
>>>>>>> cause unpredictable behavior of this code. Also it's rather expensive
>>>>>>> syscall.
>>>>>>>
>>>>>>> -Dmitry
>>>>>>>
>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>> please find the updated
>>>>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>
>>>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
>>>>>>>> I
>>>>>>>> incorporated the review comments.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Vyom
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>> Hi
>>>>>>>>> perhaps there is an opportunity to do some refactoring here (...
>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>
>>>>>>>>> along the lines
>>>>>>>>>
>>>>>>>>> if (timeout) {
>>>>>>>>> nread = NET_ReadWithTimeout(...);
>>>>>>>>> } else {
>>>>>>>>> nread = NET_Read(...);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>> restructuring of
>>>>>>>>> your goto loop
>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>> if (nread <= 0) {
>>>>>>>>> if (nread == 0) {
>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>> "SocketTimeoutException",
>>>>>>>>> "Read timed out");
>>>>>>>>> } else if (nread == -1) {
>>>>>>>>> if (errno == EBADF) {
>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>> } else if (errno == ENOMEM) {
>>>>>>>>> JNU_ThrowOutOfMemoryError(env, "NET_Timeout
>>>>>>>>> native heap allocation failed");
>>>>>>>>> } else {
>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>> (env, JNU_JAVANETPKG "SocketException",
>>>>>>>>> "select/poll failed");
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> // release buffer in main call flow
>>>>>>>>> // if (bufP != BUF) {
>>>>>>>>> // free(bufP);
>>>>>>>>> // }
>>>>>>>>> nread = -1;
>>>>>>>>> break;
>>>>>>>>> } else {
>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno == EWOULDBLOCK))) {
>>>>>>>>> gettimeofday(&t, NULL);
>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>> if(_timeout > 0){
>>>>>>>>> prevtime = newtime;
>>>>>>>>> }
>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>
>>>>>>>>> e&oe
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>
>>>>>>>>>> Vyom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>
>>>>>>>>>>> Bug : https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>
>>>>>>>>>>> webrev :
>>>>>>>>>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even with
>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes that read()
>>>>>>>>>>> won't block after poll() reports that a read is possible.
>>>>>>>>>>>
>>>>>>>>>>> This assumption does not hold, as noted on the man page for select
>>>>>>>>>>> (referenced by the man page for poll): Under Linux, select() may
>>>>>>>>>>> report a socket file descriptor as "ready for reading", while
>>>>>>>>>>> nevertheless a subsequent read blocks. This could for example
>>>>>>>>>>> happen
>>>>>>>>>>> when data has arrived but upon examination has wrong checksum
>>>>>>>>>>> and is
>>>>>>>>>>> discarded.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Vyom
>>>>>>>>>>>
>>>>>>>>>>>
More information about the net-dev
mailing list