RFR(XS): 8165881 - Backout JDK-8164913

Thu Sep 15 13:13:10 UTC 2016

On 15/09/2016 9:01 PM, Yasumasa Suenaga wrote:
> Hi David,
>
> I agree the call sequence which you write in the case of "jcmd" operation.
> However, we should think about "load" operation in AttachListener.
>
> We can access JvmtiExport::load_agent_library() through two routes:
>
>  Route "load": AttachListener -> JvmtiExport::load_agent_library()
>  Route "jcmd": AttachListener -> jcmd() in attachListener.cpp ->
> DCmd::parse_and_execute()
>
> "load" sets error code (it means first data in the stream) when
> JvmtiExport::load_agent_library() returns JNI_ERR.
> OTOH "jcmd" sets error code if exception is occurred ONLY.

In attachListener we have:

static jint jcmd(AttachOperation* op, outputStream* out) {
   Thread* THREAD = Thread::current();
   // All the supplied jcmd arguments are stored as a single
   // string (op->arg(0)). This is parsed by the Dcmd framework.
   DCmd::parse_and_execute(DCmd_Source_AttachAPI, out, op->arg(0), ' ', 
THREAD);
   if (HAS_PENDING_EXCEPTION) {
     java_lang_Throwable::print(PENDING_EXCEPTION, out);
     out->cr();
     CLEAR_PENDING_EXCEPTION;
     return JNI_ERR;
   }
   return JNI_OK;
}

If there is an exception pending it comes from Dcmd::parse_and_execute 
itself, not from the operation that was requested. Again we will call 
op->complete(res, &st) so this error code is first in the stream, and as 
with the other analysis, it is followed  (if 0) by the real operation 
result code and possibly an error message.

The stream is read back in VirtualMachineImpl.execute which reads the 
completion status first:

         // Create an input stream to read reply
         SocketInputStream sis = new SocketInputStream(s);

         // Read the command completion status
         int completionStatus;
         try {
             completionStatus = readInt(sis);
         } catch (IOException x) {
             sis.close();
             if (ioe != null) {
                 throw ioe;
             } else {
                 throw x;
             }
         }

         if (completionStatus != 0) {
             // read from the stream and use that as the error message
             String message = readErrorMessage(sis);
             ...
         }
         return sis;

The completion status will be 0 because we did manage to communicate 
with the target VM and process the DCmd request. It passes sis back up 
to the caller. So looking at JCmd.java I see:

            try (InputStream in = hvm.executeJCmd(line);) {
                 // read to EOF and just print output
                 byte b[] = new byte[256];
                 int n;
                 boolean messagePrinted = false;
                 do {
                     n = in.read(b);
                     if (n > 0) {
                         String s = new String(b, 0, n, "UTF-8");
                         System.out.print(s);
                         messagePrinted = true;
                     }
                 } while (n > 0);
                 if (!messagePrinted) {
                     System.out.println("Command executed successfully");
                 }
             }

and that logic, AFAICS, is not parsing out a return code followed by an 
error message!

David
-----

> If we try to load invalid JVMTI agent, "load" writes JNI_ERR to stream,
> however "jcmd" writes JNI_OK because pending exception is not available
> at this point.
> Currently, JCmd.java shows "Command executed successfully" when it
> cannot read the message from the socket. In case of this issue,
> JVMTIAgentLoadDCmd does not write any message because it cannot attach
> the agent.
> If return code which is in the top of stream should mean whether
> communication with AttachListener is succeeded, I think HotSpot should
> write error message or error code to the stream for JCmd.java .
>
> I'm not sure this email is the answer for you.
> Sorry for my English.
>
>
> Yasumasa
>
>
> On 2016/09/15 18:43, David Holmes wrote:
>> Hi Yasumasa,
>>
>> On 15/09/2016 1:56 PM, Yasumasa Suenaga wrote:
>>> Hi,
>>>
>>>> If this is done through jcmd then jcmd needs to know how to "unwrap"
>>>> the information it gets back from the Dcmd.
>>>
>>> In case of JVMTI.agent_load dcmd, I think we should be able to get the
>>> response as below:
>>>
>>>   1. Invalid JVMTI agent
>>>   2. Agent_OnAttach() did not return JNI_OK
>>>   3. Succeeded
>>>
>>> Currently, JvmtiExport::load_agent_library() returns JNI_OK to caller,
>>> and jcmd() in attachListener.cpp returns JNI_ERR if exception is
>>> occurred (in Agent_OnAttach()).
>>
>> Can you respond to my comment in the bug report about the chain of
>> calls and explain exactly where you think things are going wrong in
>> this scenario. I'm having a lot of trouble trying to see how the
>> information flows back through the layers.
>>
>>> IMHO, HotSpot should return error code (in top of the stream: written by
>>> AttachListener at first). So we should be able to get some error code in
>>> case 1. and 2. Then the client (jcmd or other methods) can decide to
>>> parse text information in the stream.
>>
>> It returns the high-level response code first "did communication with
>> the target VM succeed", then the actual action response code. That
>> seems the right thing to do to me. It is up to the callers reading the
>> stream to understand how to read it. To me the issue lies somewhere on
>> the jcmd/dcmd side.
>>
>> Thanks,
>> David
>>
>>>
>>> Sorry for my English.
>>>
>>> Yasumasa
>>>
>>>
>>> On 2016/09/14 7:03, David Holmes wrote:
>>>> On 13/09/2016 10:31 PM, Yasumasa Suenaga wrote:
>>>>> Thanks David!
>>>>> If we should not change the meaning of return code from
>>>>> JvmtiExport::load_agent_library(), we should print return code as
>>>>> below:
>>>>> -------------------
>>>>> diff -r 0cf03b9d9b1f src/share/vm/prims/jvmtiExport.cpp
>>>>> --- a/src/share/vm/prims/jvmtiExport.cpp    Mon Sep 12 18:59:13 2016
>>>>> +0000
>>>>> +++ b/src/share/vm/prims/jvmtiExport.cpp    Tue Sep 13 21:12:14 2016
>>>>> +0900
>>>>> @@ -2412,6 +2412,10 @@
>>>>>        result = JNI_OK;
>>>>>      }
>>>>>    }
>>>>> + // Print error code if Agent_OnAttach cannot be executed
>>>>> + if (result != JNI_OK) {
>>>>> + st->print_cr("%d", result);
>>>>> + }
>>>>>    return result;
>>>>>  }
>>>>> -------------------
>>>>
>>>> Not sure I see the point. "return result" will put the error code into
>>>> the socket stream and that error will be seen and responded to. I
>>>> don't expect anything to then read further into the stream to see the
>>>> "result" repeated a second time. In other words if execute() doesn't
>>>> see a zero result then you go no further; if execute sees a zero
>>>> result then the actual action may have had an issue so you proceed to
>>>> read the "action's result".
>>>>
>>>>> It can pass com/sun/tools/attach/BasicTests.java, and we can get -1 if
>>>>> we try to attach invalid agent.
>>>>
>>>> Can you please outline your test case for this again. If this is done
>>>> through jcmd then jcmd needs to know how to "unwrap" the information
>>>> it gets back from the Dcmd.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2016/09/13 17:06, Dmitry Samersoff wrote:
>>>>>> David,
>>>>>>
>>>>>> Thank you for the evaluation.
>>>>>>
>>>>>>> With that in mind I suspect the real fix for the original issue
>>>>>>> is to
>>>>>>> have Dcmd recognize that it also has to read two "results".
>>>>>>> Though I'm
>>>>>>> not sure how exactly.
>>>>>>
>>>>>> This logic seems completely broken for me. But I don't see
>>>>>> anything we
>>>>>> can do right now - for jdk 9.
>>>>>>
>>>>>> It requires careful changes of both - code and tests.
>>>>>>
>>>>>> I can help with this task but not today.
>>>>>>
>>>>>> -Dmitry
>>>>>>
>>>>>> On 2016-09-13 08:08, David Holmes wrote:
>>>>>>> On 13/09/2016 1:53 PM, David Holmes wrote:
>>>>>>>> On 13/09/2016 1:30 PM, Yasumasa Suenaga wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I could reproduce and I added a comment to JBS:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8165869?focusedCommentId=14000623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14000623
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In case of BasicTests.java, I think it is a test bug.
>>>>>>>>
>>>>>>>> I don't agree as yet. I have not been able to isolate the exact
>>>>>>>> difference between what happens with your fix and what happens
>>>>>>>> without.
>>>>>>>> I know it relates to the presence of the error code, but also
>>>>>>>> how we
>>>>>>>> get
>>>>>>>> AgentInitializationException instead of AgentLoadException. I need
>>>>>>>> to be
>>>>>>>> able to run the test outside of jtreg but that is proving to be
>>>>>>>> very
>>>>>>>> difficult to arrange. :(
>>>>>>>
>>>>>>> I finally managed to connect all the pieces on this.
>>>>>>>
>>>>>>> The basic problem is that with the proposed change
>>>>>>> VirtualMachineImpl.execute() will throw AgentLoadException, which
>>>>>>> then
>>>>>>> leads to the InternalError. Without the change we reach this
>>>>>>> block in
>>>>>>> HotspotVirtualMachine.loadAgentLibrary:
>>>>>>>
>>>>>>> int result = readInt(in);
>>>>>>> if (result != 0) {
>>>>>>>    throw new AgentInitializationException("Agent_OnAttach failed",
>>>>>>> result);
>>>>>>> }
>>>>>>>
>>>>>>> and so get the expected AgentInitializationException.
>>>>>>>
>>>>>>> Looking at the proposed change in jvmtiExport.cpp if we have the
>>>>>>> original:
>>>>>>>
>>>>>>>       st->print_cr("%d", result);
>>>>>>>       result = JNI_OK;
>>>>>>>
>>>>>>> then execute() manages to read a zero completion status from the
>>>>>>> stream,
>>>>>>> and then loadAgentLibrary is able to read the actual "result" -
>>>>>>> whether
>>>>>>> zero or not - from the stream. This is because the AttachListener
>>>>>>> code
>>>>>>> calls op->complete(result, &st) where st is the stream where we
>>>>>>> wrote
>>>>>>> the result value above in print_cr. Then if we look at, for example,
>>>>>>> LinuxAttachOperation::complete, we will write "result" to the socket
>>>>>>> first, followed by the contents of st. Hence on a successful
>>>>>>> operation
>>>>>>> we can read 0 for execute, and then 0 for loadAgent. On error we
>>>>>>> read 0
>>>>>>> for execute and the actual error code for loadAgent. With the
>>>>>>> proposed
>>>>>>> change execute() sees the real result (instead of JNI_OK) and so
>>>>>>> throws
>>>>>>> AgentLoadException.
>>>>>>>
>>>>>>> So in summary there are two different error indicators written into
>>>>>>> the
>>>>>>> stream, and we need the first to be zero unless some major error
>>>>>>> with
>>>>>>> the actual attach functionality occurred - otherwise even if the
>>>>>>> "load"
>>>>>>> failed in some other way, it is treated as a secondary error.
>>>>>>>
>>>>>>> With that in mind I suspect the real fix for the original issue
>>>>>>> is to
>>>>>>> have Dcmd recognize that it also has to read two "results".
>>>>>>> Though I'm
>>>>>>> not sure how exactly.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> If it is accepted, I will upload webrev for this redo task.
>>>>>>>>> However, I cannot access (and fix) Oracle internal test. Can
>>>>>>>>> someone
>>>>>>>>> help me?
>>>>>>>>> (I cannot access JPRT. So I need a sponsor.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2016/09/13 9:00, David Holmes wrote:
>>>>>>>>>> I think I see the problem. The attach framework tries to load the
>>>>>>>>>> "instrument" agent library. Prior to 8164913 if this fails it
>>>>>>>>>> actually
>>>>>>>>>> appears to succeed. Now it fails, and that error propagates
>>>>>>>>>> through.
>>>>>>>>>> I'm guessing, at the moment, that the failure is due to a missing
>>>>>>>>>> module related option.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 13/09/2016 9:54 AM, David Holmes wrote:
>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>
>>>>>>>>>>> On 13/09/2016 9:23 AM, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm, it has been backouted...
>>>>>>>>>>>>
>>>>>>>>>>>> I agree to David. This error seems to be a test bug.
>>>>>>>>>>>> Can you share the test? I want to evaluate it.
>>>>>>>>>>>
>>>>>>>>>>> Sorry we can't share the tests. I took a quick look and it
>>>>>>>>>>> seems it
>>>>>>>>>>> may
>>>>>>>>>>> be related to the result code parsing (that I thought we
>>>>>>>>>>> ended up
>>>>>>>>>>> not
>>>>>>>>>>> touching!).
>>>>>>>>>>>
>>>>>>>>>>> Can you run a test of HotSpotVirtualMachine.loadAgent(null) and
>>>>>>>>>>> see
>>>>>>>>>>> what
>>>>>>>>>>> happens? We see AgentLoadException from the code that internally
>>>>>>>>>>> tries
>>>>>>>>>>> to load the "instrument" agent:
>>>>>>>>>>>
>>>>>>>>>>> Exception in thread "main" Failure: Unexpected exception during
>>>>>>>>>>> test
>>>>>>>>>>> execution: java.lang.InternalError: instrument library is
>>>>>>>>>>> missing in
>>>>>>>>>>> target VM
>>>>>>>>>>> ...
>>>>>>>>>>> Caused by: java.lang.InternalError: instrument library is
>>>>>>>>>>> missing in
>>>>>>>>>>> target VM
>>>>>>>>>>> ...
>>>>>>>>>>> Caused by: com.sun.tools.attach.AgentLoadException: Failed to
>>>>>>>>>>> load
>>>>>>>>>>> agent
>>>>>>>>>>> library
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I do not agree to fix this bug in JDK 10 because VM might lie
>>>>>>>>>>>> when
>>>>>>>>>>>> the
>>>>>>>>>>>> JVMTI agent could not be attached. IMHO this bug should be
>>>>>>>>>>>> fixed
>>>>>>>>>>>> in 9
>>>>>>>>>>>> GA, and we should fix testcase(s).
>>>>>>>>>>>
>>>>>>>>>>> I agree. It has to be noted that "we" failed to run all the
>>>>>>>>>>> appropriate
>>>>>>>>>>> tests before pushing this, which would have discovered the
>>>>>>>>>>> problem -
>>>>>>>>>>> assuming it is actually a problem with the change and not an
>>>>>>>>>>> unrelated
>>>>>>>>>>> issue in our testing environment.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately I have some high priority tasks to handle right
>>>>>>>>>>> now,
>>>>>>>>>>> so I
>>>>>>>>>>> can't go into this in any more depth at the moment.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2016/09/13 6:15, David Holmes wrote:
>>>>>>>>>>>>> For the record it looks like the tests involved are broken, in
>>>>>>>>>>>>> that
>>>>>>>>>>>>> because the VM used to lie about the success of attaching the
>>>>>>>>>>>>> agent, the
>>>>>>>>>>>>> tests expected different exceptions to occur.
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 13/09/2016 3:11 AM, harold seigel wrote:
>>>>>>>>>>>>>> Looks good.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Harold
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/12/2016 1:05 PM, Christian Tornqvist wrote:
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please review this (clean) backout of the change for
>>>>>>>>>>>>>>> JDK-8164913, it
>>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>> several tests in the nightly testing. The failures are
>>>>>>>>>>>>>>> tracked in:
>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8165869
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ctornqvi/webrev/8165881/webrev.00/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8165881
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>
>>>>>>