[9] RFR(S): 8068945: Use RBP register as proper frame pointer in JIT compiled code on x86
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Apr 22 21:47:37 UTC 2015
Looks like 2 issues left.
First, after discussion on mailing list lets set the flag off by default
and ask SQE to add it to rotation flags for Nightly testing (after you
push changes).
Second, I am concern about vframeStreamForte() call change. Call stack
may have several Call stubs. They should be skipped to get all java
frames on stack as we do in other places. Otherwise we can get incorrect
profiling information.
Do you mean "can NOT happen"?:
> Moreover, if the stack is walked synchronously (e.g., at safepoints), no
> problems appear either, because the synchronous interruption can happen
> while execution is within the method handle intrinsic.
Vladimir
On 4/22/15 9:26 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> I managed to do some more work on this enhancement. Please see details
> below.
>
> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> thank you for the feedback!
>>>>
>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>
>>>>> PreserveFramePointer will mean that compiled (or other) code will use
>>>>> that register only as Frame pointer.
>>>>
>>>> I will change the flag's name to PreserveFramePointer and will also
>>>> update the description.
>
> I changed the flag's name to PreserveFramePointer, just as you suggested.
>
>>>>
>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>>>> can #ifdef _LP64 there too. I don't understand why you only set it to
>>>>> true on linux-x64.
>>>>
>>>> I remembered that the original discussion with Brendan Gregg mentioned
>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>> pointers. So I was unsure whether to enable proper frame pointers by
>>>> default on other x64 platforms as well.
>>>>
>>>> But if you think it would be better to have proper frame pointers on
>>>> all
>>>> x64 platforms, I will change the code to set PreserveFramePointer to
>>>> true for all x64 platforms. Just please let me know.
>
> The current webrev sets the PreserveFramePointer flag to to true on all
> x86_64 platforms and to false on all other platforms.
>
>>>
>>> Currently compiled code for all x86 platforms is almost the same
>>> (win64 has difference in registers usage) and we should keep it that
>>> way.
>>>
>>> Also the original request was to have flag to enable such behavior
>>> (use RBP only as FP). So to have it off by default is acceptable. If
>>> performance group or someone find a regression (or bug) due to this
>>> change we can switch the flag off by default before jdk9 release.
>>>
>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>> report correct call stack with compiled java methods. And JFR.
>>> Also it would be nice to run SunStudio analyzer to verify that it works.
>>
>> I ran all tools you've suggested. JFR and jstack is unaffected, pstack
>> produces nice stack traces (it did not always do so before).
>
> I tested the current webrev with the following setup: I used two tests,
> one that generates a long chain of lambda form invocations and an other
> one that generates a long chain of "regular" method invocations. Both
> tests were executed on an x64 machine in four configurations: with +/-
> Xcomp and with +/- PreserveFramePointer.
>
> Just as before, JFR and jstack stack traces are unaffected for both
> tests, pstack can now produce stack traces with both tests if
> PreserveFramePointer is enabled.
>
>> However, I've encountered a problem with SunStudio: Two asserts fail
>> in the fastdebug build. Both of them "soft" failures, as neither the
>> VM nor SunStudio crash with the product build. I worked on the problem
>> today and have a partial understanding of the issue, but more
>> investigation is needed to have a patch that preserves the correct
>> behavior of SunStudio as well.
>
> I was able to track down the problems with SunStudio. I had to change
> the code at two places.
>
>
> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>
> *** 222,232 ****
> }
>
> if (sender_blob->is_nmethod()) {
> nmethod* nm = sender_blob->as_nmethod_or_null();
> if (nm != NULL) {
> ! if (nm->is_deopt_mh_entry(sender_pc) ||
> nm->is_deopt_entry(sender_pc)) {
> return false;
> }
> }
> }
>
> --- 222,233 ----
> }
>
> if (sender_blob->is_nmethod()) {
> nmethod* nm = sender_blob->as_nmethod_or_null();
> if (nm != NULL) {
> ! if (nm->is_deopt_mh_entry(sender_pc) ||
> nm->is_deopt_entry(sender_pc) ||
> ! nm->method()->is_method_handle_intrinsic()) {
> return false;
> }
> }
> }
>
> The reason for this change is the following. Method handle intrinsics
> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
> _linkToSpecial, and _linkToInterface) do not allocate stack space when
> invoked, but they can extend the stack space of their caller "temporarily".
>
> For example, if VerifyMethodHandles is enabled, some stack space is used
> during verification. The temporarily used stack space is released before
> the intrinsic jumps to its target. As a result, the target of a method
> handle intrinsic will have a correct SP when it returns and the
> program's control flow is correct.
>
> Moreover, if the stack is walked synchronously (e.g., at safepoints), no
> problems appear either, because the synchronous interruption can happen
> while execution is within the method handle intrinsic.
>
> The problem is that the SunStudio analyzer can interrupt the VM
> asynchonously and walk the stack. If execution of a thread is
> interrupted while the thread is in a method handle intrinsic, the SP
> might contain an invalid value.
>
> The new webrev adds a check that marks the current frame unsafe for
> sender if the frame belongs to a method handle intrinsic
> (frame::safe_for_sender returns false in this case).
>
>
> Change #2 (in src/share/vm/prims/forte.cpp):
>
> *** 425,435 ****
>
> RegisterMap map(thd, false);
> initial_Java_frame = initial_Java_frame.sender(&map);
> }
>
> ! vframeStreamForte st(thd, initial_Java_frame, false);
>
> for (; !st.at_end() && count < depth; st.forte_next(), count++) {
> bci = st.bci();
> method = st.method();
>
> --- 425,435 ----
>
> RegisterMap map(thd, false);
> initial_Java_frame = initial_Java_frame.sender(&map);
> }
>
> ! vframeStreamForte st(thd, initial_Java_frame, true);
>
> for (; !st.at_end() && count < depth; st.forte_next(), count++) {
> bci = st.bci();
> method = st.method();
>
> The problem is that the following assert in forte.cpp on line 103
>
> assert(filled_in, "invariant");
>
> fails. The problem appears if we have a stack trace like:
>
> V [libjvm.so+0x1c98c4a] void VMError::report(outputStream*)+0xb1a
> V [libjvm.so+0x1c9a3e8] void VMError::report_and_die()+0x748
> V [libjvm.so+0x1003c8e] void report_vm_error(const char*,int,const
> char*,const char*)+0x7e
> V [libjvm.so+0x10efa22] vframeStreamForte::vframeStreamForte #Nvariant
> 1(JavaThread*,frame,bool)+0xe2
> --> (Frame #5) V [libjvm.so+0x10f0bb9] void
> forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789
>
> V [libjvm.so+0x10f1436] AsyncGetCallTrace+0x246
> C [libcollector.so+0x272a8] __collector_ext_jstack_unwind+0xb8
> C [libcollector.so+0x277df] __collector_get_frame_info+0x27f
> C [libcollector.so+0x2f093] __collector_getUserCtx+0x13
> C [libcollector.so+0x1abc7] __collector_ext_profile_handler+0x127
> C [libcollector.so+0x17535] collector_sigprof_dispatcher+0x85
> C [libc.so.1+0x122476] __sighndlr+0x6
> C [libc.so.1+0x115972] call_user_handler+0x2ce
> C [libc.so.1+0x115e1b] sigacthandler+0xdb
> C 0xffffffffffffffff
> V [libjvm.so+0x1959089] void os::PlatformEvent::park()+0xd9
> V [libjvm.so+0x18a6b34] int ParkCommon(ParkEvent*,long)+0x34
> V [libjvm.so+0x18a7657] int Monitor::IWait(Thread*,long)+0xb7
> V [libjvm.so+0x18a8b86] bool Monitor::wait(bool,long,bool)+0x346
> V [libjvm.so+0xf7073d] void
> CompileBroker::wait_for_completion(CompileTask*)+0xad
> V [libjvm.so+0xf6f6b6] void
> CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const
> char*,Thread*)+0x406
> V [libjvm.so+0xf6fd96]
> nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const
> char*,Thread*)+0x586
> V [libjvm.so+0xbbad72] void
> AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
>
> V [libjvm.so+0x1aef92f] void
> SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
>
> V [libjvm.so+0xbbb00f] void
> AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
>
> V [libjvm.so+0x1aef765]
> nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
>
> V [libjvm.so+0xdb93bc] unsigned
> char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
> v ~RuntimeStub::counter_overflow Runtime1 stub
> --> (Frame #29) J 143 C1
> java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @
> 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
> --> (Frame #30) v ~StubRoutines::call_stub
> V [libjvm.so+0x13ca50b] void
> JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
>
> V [libjvm.so+0x152a111] JVM_DoPrivileged+0xfb1
> C [libjava.so+0x12f42]
> Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
>
> J 142
> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;
> (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c\
> ]
> J 134 C1
> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;
> (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
> ... more stack frames
>
> The forte_fill_call_trace_given_top() method (Frame #5) first checks if
> the first Java frame found is fully decipherable (line 395 in
> forte.cpp). In our case the first Java frame is Frame #29 (the
> C1-compiled version of java.net.URLClassLoader$1.run).
>
> In our case Frame #29 is not decipherable, because
> java.net.URLClassLoader$1.run has been made "not entrant" (a C2-compiled
> version of the same method has been produced shortly before).
>
> Afterwards, forte_fill_call_trace_given_top() checks if the method is
> "safe for sender" (line 424 in forte.cpp). The caller of the
> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub, which
> is considered "safe for sender" by the VM.
>
> Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub
> (line 430). This does not seem to be correct because the stub is not a
> Java method and causes the assert(filled_in, "invariant") in the
> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
> (because the frame cannot be filled from a stub).
>
> To avoid this failure, I propose to call the constructor of
> vframeStreamForte with parameter stop_at_java_call_stub set to true
> (instead of false) so that the VM stops walking the stack if a call stub
> has been reached.
>
>
> Here is the updated webrev:
>
> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>
> In addition to testing the changeset with the tools mentioned before, I
> executed
> - all JPRT tests, all pass;
> - all java/lang/invoke and compiler JTREG tests; all tests that pass
> with the unmodified source trace pass with the changes as well.
>
> Thank you very much in advance!
>
> Best regards,
>
>
> Zoltan
>
>>
>> So that will put this RFR on hold for a while, unfortunately.
>>
>> Thank you for the feedback and suggestions so far!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>> Hi Ed,
>>>>>>
>>>>>>
>>>>>> thank you for your feedback! Please see comments below.
>>>>>>
>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>> Hi Zoltán,
>>>>>>>
>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>>> tests and
>>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All
>>>>>>>> tests
>>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>>
>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>> infrastructure for
>>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>>> statistically significant performance degradation due to having
>>>>>>>> proper
>>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer set to
>>>>>>>> false by default on x86_64 (and set to true on all other
>>>>>>>> platforms).
>>>>>>> This patch looks good, however I think there is a problem with the
>>>>>>> logic of OmitFramePointer.
>>>>>>>
>>>>>>> Here is my test case.
>>>>>>>
>>>>>>> --- CUT HERE ---
>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>
>>>>>>> public class fibo {
>>>>>>> public static void main(String args[]) {
>>>>>>> int N = Integer.parseInt(args[0]);
>>>>>>> System.out.println(fib(N));
>>>>>>> }
>>>>>>> public static int fib(int n) {
>>>>>>> if (n < 2) return(1);
>>>>>>> return( fib(n-2) + fib(n-1) );
>>>>>>> }
>>>>>>> }
>>>>>>> --- CUT HERE ---
>>>>>>>
>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>
>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>> -XX:+PrintCompilation
>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>> # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>> # parm0: rsi = int
>>>>>>> # [sp+0x30] (sp of caller)
>>>>>>> 0x00007fc625071100: mov %eax,-0x14000(%rsp)
>>>>>>> 0x00007fc625071107: push %rbp
>>>>>>> 0x00007fc625071108: mov %rsp,%rbp
>>>>>>> 0x00007f836907110b: sub $0x20,%rsp ;*synchronization entry
>>>>>>>
>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is
>>>>>>> using
>>>>>>> the frame pointer
>>>>>>>
>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>
>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>> -XX:+PrintCompilation
>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>> # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>> # parm0: rsi = int
>>>>>>> # [sp+0x30] (sp of caller)
>>>>>>> 0x00007f14e1071100: mov %eax,-0x14000(%rsp)
>>>>>>> 0x00007f14e1071107: push %rbp
>>>>>>> 0x00007f14e1071108: sub $0x20,%rsp ;*synchronization entry
>>>>>>>
>>>>>>> which is correct, it is ID(+) OmitFramePointer, therefore it does
>>>>>>> not
>>>>>>> use a frame pointer.
>>>>>>>
>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer altogether, IE
>>>>>>>
>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>> -XX:+PrintCompilation
>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>> # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>> # parm0: rsi = int
>>>>>>> # [sp+0x30] (sp of caller)
>>>>>>> 0x00007f0c75071100: mov %eax,-0x14000(%rsp)
>>>>>>> 0x00007f0c75071107: push %rbp
>>>>>>> 0x00007f0c75071108: sub $0x20,%rsp ;*synchronization entry
>>>>>>>
>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>>>
>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>> default
>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>> whereas OmitFramePointer actually seems to be set to true on x86_64
>>>>>>>
>>>>>>> I think the problem may be with the declaration and definition of
>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>
>>>>>>> In globals.hpp it does
>>>>>>>
>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>
>>>>>>> In globals_x86.hpp it does
>>>>>>>
>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>
>>>>>>> I am not sure that you can mix product(...) and product_pd(...) like
>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>> product(...).
>>>>>>
>>>>>> You are right, mixing product and product_pd does not make sense
>>>>>> at all.
>>>>>> Thank you for doing additional testing and for drawing attention
>>>>>> to the
>>>>>> problem.
>>>>>>
>>>>>> I updated the code to use product_pd and define_pd_global on all
>>>>>> relevant platforms.
>>>>>>
>>>>>>> Aside: In general, I do not like options which include a negative in
>>>>>>> them because I have to do a double think when I see something like,
>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>>>> therefore it is using a frame pointer. How about FramePointer so we
>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>
>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>> complication, so
>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>> suggested.
>>>>>>
>>>>>>>
>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>
>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>> 701408733
>>>>>>>
>>>>>>> real 0m1.545s
>>>>>>> user 0m1.571s
>>>>>>> sys 0m0.015s
>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>> 701408733
>>>>>>>
>>>>>>> real 0m1.504s
>>>>>>> user 0m1.527s
>>>>>>> sys 0m0.019s
>>>>>>>
>>>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>>>> difference on this test case.
>>>>>>
>>>>>> Thank you for the performance measurements!
>>>>>>
>>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>>> possible change its name) the patch looks good to me.
>>>>>>
>>>>>> Here is the updated webrev (the same webrev that was already included
>>>>>> into my reply to Roland):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>
>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>
>>>>>> That would be great!
>>>>>>
>>>>>> Thank you and best regards,
>>>>>>
>>>>>>
>>>>>> Zoltán
>>>>>>
>>>>>>>
>>>>>>> All the best,
>>>>>>> Ed.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>
More information about the hotspot-compiler-dev
mailing list