Regarding 8033626: C2: Merging exception state: reexecute=true vs reexecute=false
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Mon Jun 9 11:16:09 UTC 2014
Here's the call sequence (excerpts from compiled method disassembly [1]
and copy-paste from debugging session [2]):
- compiled method calls _new_instance_Java (@ 0x10519d791)
(OptoRuntime::new_instance_C)
- OptoRuntime::new_instance_C throws OOM exception and
_new_instance_Java calls StubRoutines::forward exception
- StubRoutines::forward exception ==>
SharedRuntime::exception_handler_for_return_address ==>
SharedRuntime::raw_exception_handler_for_return_address, which returns
nm->exception_begin() (== 0x10519d860)
- exception handler in compiled method calls exception blob, which
calls OptoRuntime::handle_exception_C ==>
OptoRuntime::handle_exception_C_helper
- OptoRuntime::handle_exception_C_helper returns the address of
exception handler with uncommon trap in compiled method ( == 0x10519d83a)
Best regards,
Vladimir Ivanov
[1]
...
0x000000010519d77c: mov 0x70(%r15),%rbx
0x000000010519d780: mov %rbx,%r10
0x000000010519d783: add $0x330,%r10
0x000000010519d78a: cmp 0x80(%r15),%r10
0x000000010519d791: jae 0x000000010519d809
...
# _new_instance_Java
0x000000010519d809: movabs $0x7c0060240,%rsi ;
{metadata(synchronized 'Test$A')}
0x000000010519d813: callq 0x0000000105166260 ; OopMap{rbp=Oop off=184}
;*invokevirtual clone
; - Test::f at 1 (line 67)
; {runtime_call}
_new_istance_Java
...
# exception handler
0x000000010519d83a: mov $0xffffffa7,%esi
0x000000010519d83f: callq 0x00000001051074e0 ; OopMap{rbp=Oop off=228}
;*invokevirtual clone
; - Test::f at 1 (line 67)
; {runtime_call}
UncommonTrapBlob
0x000000010519d844: callq 0x0000000101c045f6 ;*invokevirtual clone
; - Test::f at 1 (line 67)
; {runtime_call}
os::breakpoint()
...
[Exception Handler]
[Stub Code]
0x000000010519d860: jmpq 0x00000001051669e0 ; {no_reloc}
[2] http://cr.openjdk.java.net/~vlivanov/8033626/oom_handling.txt
On 6/7/14 3:12 AM, Vladimir Ivanov wrote:
>
> >>> The test checks deoptimization for null check at the beginning of
>>>> intrinsic. Additional testing should be done for OOM exception coming
>>>> from allocation (during runtime call). Trigger OOM and run with
>>>> StressCompiledExceptionHandlers. With this flag code generated by
>>>> GraphKit::gen_stub() will be executed and
>>>> StubRoutines::forward_exception_entry() will be called. What will
>>>> happen
>>>> in this case?
>>> I tested such case manually (instrumented new_instance_C to throw OOM).
>>> The exception is forwarded (and pending exception is cleared) to the
>>> handler with contains uncommon trap. After deoptimization is done,
>>
>> So StubRoutines::forward_exception_entry() will call
>> SharedRuntime::raw_exception_handler_for_return_address(). Which one
>> code succeed?:
>>
>> nm->is_deopt_pc(return_address) and unpack_with_exception()
>> or it calls nm->exception_begin()
>>
>> I am trying to understand how it jumps to the uncommon trap code. Can
>> you show here the sequence of calls/jumps?
> It calls nm->exception_begin(). I'll send you complete sequence on Monday.
>
> Best regards,
> Vladimir Ivanov
>
>>
>> thanks,
>> Vladimir
>>
>>> Object::clone is reexecuted and it goes to JVM_Clone (diagnostic output
>>> [1]).
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1]
>>> new_instance_C: OOM
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x0000000115766790)
>>> thrown
>>> [/Users/vladimir/ws/jdk/ws/jdk9-dev/hotspot/src/share/vm/opto/runtime.cpp,
>>>
>>>
>>> line 241]
>>> for thread 0x00007f8dc881e800
>>> 1 [Exception (): a 'java/lang/OutOfMemoryError' in {method}
>>> {0x00000001566011e8} 'f' '(LTest;)Ljava/lang/Object;' in public
>>> synchronized 'Test' at 0x0000000111ff8818]
>>> Uncommon trap occurred in Test::f (@0x0000000111ff8844) thread=6403
>>> reason=unhandled action=none unloaded_class_index=-1
>>> DEOPT PACKING thread 0x00007f8dc881e800 Compiled frame
>>> (sp=0x0000000111d6b4c0 unextended sp=0x0000000111d6b4c0,
>>> fp=0x0000000115766bc0, real_fp=0x0000000111d6b4e0,
>>> pc=0x0000000111ff8844)
>>> nmethod 1568 1 Test::f (5 bytes)
>>>
>>> Virtual frames (innermost first):
>>> 0 - frame( sp=0x0000000111d6b4c0,
>>> unextended_sp=0x0000000111d6b4c0, fp=0x0000000115766bc0,
>>> pc=0x0000000111ff8844)
>>> Test.f(Test.java:67) - invokevirtual @ bci 1
>>> {method} {0x00000001566011e8} 'f' '(LTest;)Ljava/lang/Object;' in
>>> public synchronized 'Test'
>>> bci: 1
>>> locals:
>>> 0 conflict
>>> expressions:
>>> 0 a 'Test$A' <0x0000000115766bc0>
>>>
>>> Created vframeArray 0x00007f8dc882a618
>>> DEOPT UNPACKING thread 0x00007f8dc881e800 vframeArray 0x00007f8dc882a618
>>> mode 2
>>> [1 Interpreted Frame]
>>> Interpreted frame (sp=0x0000000111d6b488 unextended
>>> sp=0x0000000111d6b488, fp=0x0000000111d6b4d0,
>>> real_fp=0x0000000111d6b4d0, pc=0x0000000111e66b00)
>>> (~deoptimization entry points [0x0000000111e66b00, 0x0000000111e75a20]
>>> 61216 bytes
>>> )
>>> BufferBlob (0x0000000111e60590) used for Interpreter
>>>
>>> - local [0x0000000000000000]; #0
>>> - stack [0x0000000115766bc0]; #0
>>> - monitor[0x0000000111d6b490]
>>> - bcp [0x00000001566011c9]; @1
>>> - locals [0x0000000111d6b4e0]
>>> - method [0x00000001566011e8]; static jobject Test.f(jobject)
>>> {method} {0x00000001566011e8} 'f' '(LTest;)Ljava/lang/Object;' in
>>> public synchronized 'Test'
>>> bci: 1
>>> locals:
>>> 0 NULL <0x0000000000000000>
>>> locals size 1
>>> expression size 1
>>> {method} {0x00000001566011e8} 'f' '(LTest;)Ljava/lang/Object;' in public
>>> synchronized 'Test'
>>> JVM_Clone
>>>
>>>> Thanks,
>>>> Vladimir K
>>>>
>>>>
>>>> On 6/6/14 10:46 AM, Vladimir Ivanov wrote:
>>>>> I played with uncommon trap approach and it works fine.
>>>>>
>>>>> While I'm finishing extensive testing, I'd be grateful for early
>>>>> feedback.
>>>>>
>>>>> Preliminary fix:
>>>>> http://cr.openjdk.java.net/~vlivanov/8033626/webrev.00/
>>>>>
>>>>> The fix is to bail out to interpreter if any exception is generated in
>>>>> new_instance_C. There's no need to update exception state. When
>>>>> reexecuting the call, native version of Object::clone (JVM_Clone) will
>>>>> be invoked.
>>>>>
>>>>> Controlling exception handling explicitly by passing
>>>>> deoptimize_on_exception parameters looks cleaner to me, than having a
>>>>> check (is_LibraryCallKit() or JVMS reexecute bit) burden in
>>>>> GraphKit::make_slow_call_ex.
>>>>>
>>>>> I fixed only Object::clone case, since I don't see other places where
>>>>> similar IR shape can be produced.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> On 6/3/14 2:27 AM, Vladimir Kozlov wrote:
>>>>>> Unfortunately it may not be a whole solution. The uncommon trap may
>>>>>> still needs to be recorded as an exceptions handler for compiled
>>>>>> code so
>>>>>> that when we return from new_instance_C() runtime call with OOM
>>>>>> exception it will find it. See GraphKit::gen_stub(),
>>>>>> generate_catch_exception() and
>>>>>> SharedRuntime::raw_exception_handler_for_return_address()
>>>>>>
>>>>>> Also do additional tests with StressCompiledExceptionHandlers
>>>>>> otherwise
>>>>>> we deoptimize immediately on the return from new_instance_C().
>>>>>>
>>>>>> In short, please investigate all return paths from new_instance_C()
>>>>>> when
>>>>>> there is pending OOM exception and we have uncommon trap instead of
>>>>>> CreateExNode.
>>>>>>
>>>>>> thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 6/2/14 12:21 PM, Vladimir Ivanov wrote:
>>>>>>> Yes, uncommon trap should solve the problem.
>>>>>>> Thanks for excellent idea!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov
>>>>>>>
>>>>>>> On 5/31/14 6:30 AM, Vladimir Kozlov wrote:
>>>>>>>> I thought about this more. I think we should not generate
>>>>>>>> exceptions in
>>>>>>>> intrinsic when we are going to re-execute the method. Interpreter
>>>>>>>> will
>>>>>>>> throw exception if needed and intrinsic has wrong bci anyway
>>>>>>>> (bci of
>>>>>>>> call site and not allocation). *We should generate plain
>>>>>>>> "intrinsic"
>>>>>>>> uncommon trap in such case*.
>>>>>>>> There are GraphKit::is_LibraryCallKit() and is_Parse() methods
>>>>>>>> which
>>>>>>>> allow to check where the request for exception generation is coming
>>>>>>>> from.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 5/29/14 7:00 PM, Vladimir Kozlov wrote:
>>>>>>>>> Now I understand the problem. I think we need to keep separate an
>>>>>>>>> exception code which requires re-execution. Currently
>>>>>>>>> we don't check for should_reexecute() when we merge or search for
>>>>>>>>> existing exception code (we only compare type).
>>>>>>>>>
>>>>>>>>> Vladimir K.
>>>>>>>>>
>>>>>>>>> On 5/29/14 2:14 AM, Vladimir Ivanov wrote:
>>>>>>>>>> The problem is with exception state of two branches [1].
>>>>>>>>>> Exception on fast path comes from GraphKit::new_instance during
>>>>>>>>>> intrinsic construction in LibraryCallKit::inline_native_clone.
>>>>>>>>>> It is
>>>>>>>>>> executed under should_reexecute=true. PreserveReexecuteState
>>>>>>>>>> doesn't
>>>>>>>>>> touch exception state, so
>>>>>>>>>> kit.add_exception_states_from(new_jvms) (@
>>>>>>>>>> callGenerator.cpp:703) tries to accumulate incompatible
>>>>>>>>>> exceptions
>>>>>>>>>> from
>>>>>>>>>> fast & slow paths.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>
>>>>>>>>>> [1] FAST PATH (intrinsic):
>>>>>>>>>>
>>>>>>>>>> JVMS depth=1 loc=5 stk=9 arg=10 mon=14 scalar=14 end=14
>>>>>>>>>> mondepth=0
>>>>>>>>>> sp=1
>>>>>>>>>> bci=12 reexecute=false method=virtual jobject
>>>>>>>>>>
>>>>>>>>>> Exception state:
>>>>>>>>>>
>>>>>>>>>> JVMS depth=1 loc=5 stk=9 arg=9 mon=14 scalar=14 end=14 mondepth=0
>>>>>>>>>> sp=0
>>>>>>>>>> bci=12 reexecute=true method=virtual jobject
>>>>>>>>>>
>>>>>>>>>> SLOW PATH
>>>>>>>>>>
>>>>>>>>>> JVMS depth=1 loc=5 stk=9 arg=10 mon=14 scalar=14 end=14
>>>>>>>>>> mondepth=0
>>>>>>>>>> sp=1
>>>>>>>>>> bci=12 reexecute=false method=virtual jobject
>>>>>>>>>>
>>>>>>>>>> Exception state:
>>>>>>>>>>
>>>>>>>>>> JVMS depth=1 loc=5 stk=9 arg=9 mon=14 scalar=14 end=14 mondepth=0
>>>>>>>>>> sp=0
>>>>>>>>>> bci=12 reexecute=false method=virtual jobject
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/29/14 2:18 AM, Vladimir Kozlov wrote:
>>>>>>>>>>> PredictedIntrinsicGenerator was added to have a general
>>>>>>>>>>> predicate.
>>>>>>>>>>> For
>>>>>>>>>>> example, AES crypto predicate checks klass of an object pointed
>>>>>>>>>>> by a
>>>>>>>>>>> field. PredictedCallGenerator checks klass of method's holder
>>>>>>>>>>> only.
>>>>>>>>>>>
>>>>>>>>>>> Both cases would have the same problem with reexecute. Using
>>>>>>>>>>> PredictedIntrinsicGenerator will not help you.
>>>>>>>>>>>
>>>>>>>>>>> But you should not have this problem at the merge point, because
>>>>>>>>>>> reexecute should be reset back after intrinsic is generated:
>>>>>>>>>>>
>>>>>>>>>>> { PreserveReexecuteState preexecs(this);
>>>>>>>>>>> jvms()->set_should_reexecute(true);
>>>>>>>>>>>
>>>>>>>>>>> It is strange that it is still true after PreserveReexecuteState
>>>>>>>>>>> destructor.
>>>>>>>>>>>
>>>>>>>>>>> Vladimir K
>>>>>>>>>>>
>>>>>>>>>>> On 5/28/14 9:37 AM, Vladimir Ivanov wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm looking at 8033626 [1]:
>>>>>>>>>>>> "assert(ex_map->jvms()->same_calls_as(_exceptions->jvms()))
>>>>>>>>>>>> failed:
>>>>>>>>>>>> all
>>>>>>>>>>>> collected exceptions must come from the same place".
>>>>>>>>>>>>
>>>>>>>>>>>> 8014447 [2] introduced new code shape for virtual
>>>>>>>>>>>> intrinsifiable
>>>>>>>>>>>> methods
>>>>>>>>>>>> (Object::hashCode & Object::clone): "invokevirtual
>>>>>>>>>>>> Object.clone()"
>>>>>>>>>>>> can
>>>>>>>>>>>> be represented as a PredictedCallGenerator with
>>>>>>>>>>>> LibraryIntrinsic on
>>>>>>>>>>>> fast
>>>>>>>>>>>> path and VirtualCallGenerator on slow path.
>>>>>>>>>>>>
>>>>>>>>>>>> The problem arises for Object::clone, because for OOM case
>>>>>>>>>>>> intrinsified
>>>>>>>>>>>> version requires reexecution (reexecute=true), but slow path
>>>>>>>>>>>> (virtual
>>>>>>>>>>>> dispatch) doesn't (reexecute=false). Further attempt to finish
>>>>>>>>>>>> the
>>>>>>>>>>>> diamond and merge 2 exception states fires the assertion.
>>>>>>>>>>>>
>>>>>>>>>>>> So, I have 2 questions:
>>>>>>>>>>>> (1) I don't see any way to merge reexecute=true &
>>>>>>>>>>>> reexecute=false
>>>>>>>>>>>> states, so both branches should have the same reexecute status.
>>>>>>>>>>>> Am I
>>>>>>>>>>>> missing something?
>>>>>>>>>>>>
>>>>>>>>>>>> (2) Does PredictedCallGenerator actually support such shape
>>>>>>>>>>>> (intrinsic on fast path)? There are 2 call generators:
>>>>>>>>>>>> PredictedIntrinsicGenerator & PredictedCallGenerator. What's
>>>>>>>>>>>> the
>>>>>>>>>>>> difference?
>>>>>>>>>>>>
>>>>>>>>>>>> I see there's a way to customize predicate in
>>>>>>>>>>>> LibraryCallKit::try_to_predicate() for
>>>>>>>>>>>> PredictedIntrinsicGenerator,
>>>>>>>>>>>> but
>>>>>>>>>>>> is it the only reason why PredictedIntrinsicGenerator was
>>>>>>>>>>>> introduced?
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8033626
>>>>>>>>>>>>
>>>>>>>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8014447
>>>>>>>>>>>> "Object.hashCode intrinsic breaks inline caches"
More information about the hotspot-compiler-dev
mailing list