RFR 8170541: serviceability/jdwp/AllModulesCommandTest.java fails intermittently on Windows and Solaris

David Holmes david.holmes at oracle.com
Sat Mar 3 04:41:42 UTC 2018


On 3/03/2018 8:57 AM, Chris Plummer wrote:
> Finally got around to reading up on this. At first I was expecting to 
> see it originally sent one large packet, but from reading up on the 
> issue it looks like the problem was it sent a packet for each field of 
> the JDWP header, resulting in too many small packets, and the fix for 
> this was to coalesce the header into one packet. So this CR doesn't 
> explain why two packets are sent (when the data is large) instead of one.

No there is no verbiage to explain it, but that was the fix that put in 
the code:

496 static jdwpTransportError JNICALL
497socketTransport_writePacket(jdwpTransportEnv* env, const jdwpPacket 
*packet)
498{
499    jint len, data_len, id;
500    /*
501     * room for header and up to MAX_DATA_SIZE data bytes
502     */
503    char header[HEADER_SIZE + MAX_DATA_SIZE];
504    jbyte *data;
505
506    /* packet can't be null */
507    if (packet == NULL) {
508	RETURN_ERROR(JDWPTRANSPORT_ERROR_ILLEGAL_ARGUMENT, "packet is NULL");
509    }
510
511    len = packet->type.cmd.len;		/* includes header */
512    data_len = len - HEADER_SIZE;
513
514    /* bad packet */
515    if (data_len < 0) {
516	RETURN_ERROR(JDWPTRANSPORT_ERROR_ILLEGAL_ARGUMENT, "invalid length");
517    }
518
519    /* prepare the header for transmission */
520    len = (jint)dbgsysHostToNetworkLong(len);
521    id = (jint)dbgsysHostToNetworkLong(packet->type.cmd.id);
522
523    memcpy(header + 0, &len, 4);
524    memcpy(header + 4, &id, 4);
525    header[8] = packet->type.cmd.flags;
526    if (packet->type.cmd.flags & JDWPTRANSPORT_FLAGS_REPLY) {
527	jshort errorCode =
528            dbgsysHostToNetworkShort(packet->type.reply.errorCode);
529        memcpy(header + 9, &errorCode, 2);
530    } else {
531        header[9] = packet->type.cmd.cmdSet;
532        header[10] = packet->type.cmd.cmd;
533    }
534
535    data = packet->type.cmd.data;
536    /* Do one send for short packets, two for longer ones */
537    if (data_len <= MAX_DATA_SIZE) {
538        memcpy(header + HEADER_SIZE, data, data_len);
539        if (dbgsysSend(socketFD, (char *)&header, HEADER_SIZE + 
data_len, 0) !=
540            HEADER_SIZE + data_len) {
541            RETURN_IO_ERROR("send failed");
542        }
543    } else {
544        memcpy(header + HEADER_SIZE, data, MAX_DATA_SIZE);
545        if (dbgsysSend(socketFD, (char *)&header, HEADER_SIZE + 
MAX_DATA_SIZE, 0) !=
546            HEADER_SIZE + MAX_DATA_SIZE) {
547            RETURN_IO_ERROR("send failed");
548        }
549        /* Send the remaining data bytes right out of the data area. */
550        if (dbgsysSend(socketFD, (char *)data + MAX_DATA_SIZE,
551                       data_len - MAX_DATA_SIZE, 0) != data_len - 
MAX_DATA_SIZE) {
552            RETURN_IO_ERROR("send failed");
553        }
554    }
555
556    return JDWPTRANSPORT_ERROR_NONE;
557}

David
------

> Chris
> 
> On 2/26/18 8:20 PM, David Holmes wrote:
>> The two-step send came in with:
>>
>> https://bugs.openjdk.java.net/browse/JDK-6401245
>>
>> "Small JDWP packets with the socket transport causes slow debugging on 
>> linux 2.6.15 kernel and newer"
>>
>> David
>> -----
>>
>> On 27/02/2018 9:29 AM, serguei.spitsyn at oracle.com wrote:
>>> On 2/26/18 15:06, Chris Plummer wrote:
>>>> On 2/26/18 3:00 PM, daniil.x.titov at oracle.com wrote:
>>>>>
>>>>>
>>>>> On 2/26/18 12:16 PM, Chris Plummer wrote:
>>>>>> On 2/26/18 11:51 AM, daniil.x.titov at oracle.com wrote:
>>>>>>> Hi David and Sergei,
>>>>>>>
>>>>>>> On 2/20/18 10:16 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/20/18 20:02, David Holmes wrote:
>>>>>>>>> Hi Daniil,
>>>>>>>>>
>>>>>>>>> Good find on this!
>>>>>>>>>
>>>>>>>>> What does the actual spec say about the length of things and 
>>>>>>>>> how they may be split across multiple packets? Are we 
>>>>>>>>> guaranteed that at most two packets will be involved?
>>>>>>>
>>>>>>> The JDWP spec 
>>>>>>> (https://docs.oracle.com/javase/9/docs/specs/jdwp/jdwp-spec.html) 
>>>>>>> says nothing about splitting JDWP reply packets at all but the 
>>>>>>> implementation limits the max number of the sent packets to two 
>>>>>>> packets max. The implementation is dated back to the initial load 
>>>>>>> that happened in 2007 and the information about the related Jira 
>>>>>>> issue is missing.
>>>>>>>
>>>>>>> open/src/jdk.jdwp.agent/share/native/libdt_socket/socketTransport.c
>>>>>>>
>>>>>>> 836    data = packet->type.cmd.data;
>>>>>>> 837    /* Do one send for short packets, two for longer ones */
>>>>>>> 838    if (data_len <= MAX_DATA_SIZE) {
>>>>>>> 839        memcpy(header + JDWP_HEADER_SIZE, data, data_len);
>>>>>>> 840        if (send_fully(socketFD, (char *)&header, 
>>>>>>> JDWP_HEADER_SIZE + data_len) !=
>>>>>>> 841            JDWP_HEADER_SIZE + data_len) {
>>>>>>> 842            RETURN_IO_ERROR("send failed");
>>>>>>> 843        }
>>>>>>> 844    } else {
>>>>>>> 845        memcpy(header + JDWP_HEADER_SIZE, data, MAX_DATA_SIZE);
>>>>>>> 846        if (send_fully(socketFD, (char *)&header, 
>>>>>>> JDWP_HEADER_SIZE + MAX_DATA_SIZE) !=
>>>>>>> 847            JDWP_HEADER_SIZE + MAX_DATA_SIZE) {
>>>>>>> 848            RETURN_IO_ERROR("send failed");
>>>>>>> 849        }
>>>>>>> 850        /* Send the remaining data bytes right out of the data 
>>>>>>> area. */
>>>>>>> 851        if (send_fully(socketFD, (char *)data + MAX_DATA_SIZE,
>>>>>>> 852                       data_len - MAX_DATA_SIZE) != data_len - 
>>>>>>> MAX_DATA_SIZE) {
>>>>>>> 853            RETURN_IO_ERROR("send failed");
>>>>>>> 854        }
>>>>>>> 855    }
>>>>>>>
>>>>>> Curious. First packet is limited to MAX_DATA_SIZE, 2nd packet has 
>>>>>> no size limit. What's the point then of splitting it then? Is 
>>>>>> there a desire to get the header transmitted in a smaller packet.
>>>>>>
>>>>>> Chris
>>>>>
>>>>> It looks as the goal was to somehow improve the responsiveness in 
>>>>> case of the large data but I am not sure about this. I could not 
>>>>> locate any traces in Jira related to this implementation.
>>>> I was thinking it might be something like that too. Get the header 
>>>> across the wire quickly. Maybe the user just wants the header (with 
>>>> size info) initially, and will allocate a large buffer for the rest 
>>>> if necessary.
>>>
>>> It was my guess too.
>>> At least, it is the best explanation for this design that looks 
>>> reasonable to me.
>>>
>>>
>>>> Chris
>>>>> Probably Serguei has some info what is the history behind this design.
>>>
>>> I don't know the history here.
>>> This was implemented in very early days, most likely, before JDK 1.5 
>>> or even 1.4.2.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>>>>>>>
>>>>>>>>>  68     protected byte[] readJdwpString(DataInputStream ds) 
>>>>>>>>> throws IOException {
>>>>>>>>>   69         byte[] str = null;
>>>>>>>>>   70         int len = ds.readInt();
>>>>>>>>>   71         if (len > 0) {
>>>>>>>>>   72             str = new byte[len];
>>>>>>>>>   73             ds.read(str, 0, len);
>>>>>>>>>   74         }
>>>>>>>>>
>>>>>>>>> might we get a short-read of the string if it is split across 
>>>>>>>>> multiple packets?
>>>>>>>>
>>>>>>> This and all other reads happen not directly from the socket 
>>>>>>> input stream but rather from the  DataInputStream object that is 
>>>>>>> constructed in JdwpReply.initFromStream(InputStream) method. With 
>>>>>>> the proposed fix we do ensure that the created DataInputStream 
>>>>>>> object contains data from both packets in cases when the reply 
>>>>>>> was split in two packets.
>>>>>>>
>>>>>>>> Nice catch!
>>>>>>>> Even though this fix is enough to resolve this problem now, 
>>>>>>>> there is a chance,
>>>>>>>> it can fail in the future when more modules are added to the 
>>>>>>>> platform.
>>>>>>>>
>>>>>>>>
>>>>>>>>> I'm wondering if all these reads should be loops, ensuring we 
>>>>>>>>> read the expected amount of data.
>>>>>>>>>
>>>>>>> Since the implementation of the socket transport limits the max 
>>>>>>> number of packets the reply might be split in to two packets I 
>>>>>>> don't think we really need it here.
>>>>>>>>> One further comment - not sure why we need the print out for 
>>>>>>>>> when we do read multiple packets?
>>>>>>>>> That would seem to be a debugging aid.
>>>>>>>>
>>>>>>>> Yes, it helps to understand what happens.
>>>>>>>> Many tests have a lack of tracing which makes it harder to debug 
>>>>>>>> and understand failures.
>>>>>>> That is correct.  This additional tracing was added to help to 
>>>>>>> understand the possible failures in the future.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>> Thanks,
>>>>>>> Daniil
>>>>>>>
>>>>>>>>> On 21/02/2018 10:14 AM, Daniil Titov wrote:
>>>>>>>>>> Hi Serguei,
>>>>>>>>>>
>>>>>>>>>> A new version of the webrev that has these strings reformatted 
>>>>>>>>>> is at http://cr.openjdk.java.net/~dtitov/8170541/webrev.02/
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Daniil
>>>>>>>>>>
>>>>>>>>>> *From: *"serguei.spitsyn at oracle.com" <serguei.spitsyn at oracle.com>
>>>>>>>>>> *Date: *Tuesday, February 20, 2018 at 3:00 PM
>>>>>>>>>> *To: *Daniil Titov <daniil.x.titov at oracle.com>, 
>>>>>>>>>> "serviceability-dev at openjdk.java.net" 
>>>>>>>>>> <serviceability-dev at openjdk.java.net>
>>>>>>>>>> *Subject: *Re: RFR 8170541: 
>>>>>>>>>> serviceability/jdwp/AllModulesCommandTest.java fails 
>>>>>>>>>> intermittently on Windows and Solaris
>>>>>>>>>>
>>>>>>>>>> Hi Daniil,
>>>>>>>>>>
>>>>>>>>>> Interesting issue...
>>>>>>>>>> Thank you for finding to the root cause so quickly!
>>>>>>>>>>
>>>>>>>>>> The fix looks good.
>>>>>>>>>> Could I ask you to reformat these lines to make the L54 
>>>>>>>>>> shorter ?:
>>>>>>>>>>
>>>>>>>>>>    54                 System.out.println("[" + 
>>>>>>>>>> getClass().getName() + "] Only " + bytesRead + " bytes of " + 
>>>>>>>>>> dataLength +
>>>>>>>>>>
>>>>>>>>>>    55                         " were read in the first packet. 
>>>>>>>>>> Reading the rest...");
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2/20/18 09:24, Daniil Titov wrote:
>>>>>>>>>>
>>>>>>>>>>     Please review the changes that fix intermittent failure of
>>>>>>>>>>     serviceability/jdwp/AllModulesCommandTest.java test.
>>>>>>>>>>
>>>>>>>>>>     The problem here is that for a large data the JDWP agent
>>>>>>>>>>     (socketTransport_writePacket() method in
>>>>>>>>>> src/jdk.jdwp.agent/share/native/libdt_socket/socketTransport.c )
>>>>>>>>>>     sends 2 packets and in some cases only the first packet is 
>>>>>>>>>> received
>>>>>>>>>>     at the time when the test reads the reply from the JDWP 
>>>>>>>>>> agent. Since
>>>>>>>>>>     the test does not check that all data is received in the 
>>>>>>>>>> first
>>>>>>>>>>     packet the correlation between commands and replies became 
>>>>>>>>>> broken
>>>>>>>>>>     (the unread second packet is read by the next command and 
>>>>>>>>>> the reply
>>>>>>>>>>     for the next command is read by the next after next 
>>>>>>>>>> command and so on).
>>>>>>>>>>
>>>>>>>>>>     Bug: https://bugs.openjdk.java.net/browse/JDK-8170541
>>>>>>>>>>
>>>>>>>>>>     Webrev: http://cr.openjdk.java.net/~dtitov/8170541/webrev.01
>>>>>>>>>>
>>>>>>>>>>     The tests ran successfully with Mach5.
>>>>>>>>>>
>>>>>>>>>>     Best regards,
>>>>>>>>>>
>>>>>>>>>>     Daniil
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
> 
> 


More information about the serviceability-dev mailing list