RFR(L): 8185265 [MVT] improve performance of return of value types with new calling convention
Tobias Hartmann
tobias.hartmann at oracle.com
Wed Aug 9 12:15:28 UTC 2017
Hi Roland,
nice work but lots of changes :) I tried to have a detailed look.
sharedRuntime_x86_64.cpp
- What is MacroAssembler* masm used for?
- Maybe call the buffer "value type transform" or "value type packing/unpacking" instead of "value types"
macro.cpp
- I think the comment in line 2643 should be adjusted because we may now either call into the runtime or into the
pack_handler.
- There is a UseTLAB assert but I don't see where this is checked when adding the macro node. Also, we could fall back
to the runtime call if UseTLAB is disabled (like we do in the interpreter), right?
- Your usage of "Node* n" vs. "Node *n" is a bit inconsistent (I prefer "Node* n")
- The "needgc" naming is a bit confusing, this isn't really about the GC but about if we have enough space in the TLAB.
castnode.cpp
- Please add a comment to the mask check explaining the two cases (returns as fields vs. returned as oop)
templateInterpreterGenerator_x86.cpp
- As I understand, a pack/unpack handler is always set if the value type can be returned in registers. Why do we need
the pack_handler != 0 check? Isn't the __ testptr(rax, 1) sufficient?
ValueTypeTestBench.java:
- Why did you change COMP_LEVEL_ANY to -2?
- Should we enable StressValueTypeReturnedAsFields in our tests?
I've run this through JPRT and found the following crash during class unloading with some internal test on Solaris and
Windows:
# SIGSEGV (0xb) at pc=0xffff80fb73dff109, pid=28909, tid=19
#
# JRE version: OpenJDK Runtime Environment (10.0) (build 10-internal+0-2017-08-08-063310.tobias.valhalla)
# Java VM: OpenJDK 64-Bit Server VM (10-internal+0-2017-08-08-063310.tobias.valhalla, compiled mode, compressed oops, g1
gc, solaris-amd64)
# Problematic frame:
# V [libjvm.so+0xfff109] void BufferBlob::free(BufferBlob*)+0xd9
Stack: [0xffff80ffb2efd000,0xffff80ffb2ffd000], sp=0xffff80ffb2ffbe50, free space=1019k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xfff109] void BufferBlob::free(BufferBlob*)+0xd9
V [libjvm.so+0xfabf43] _$c1A.__1cPClassLoaderDataQvalue_classes_do6MpFpnKValueKlass__v_v_+0x33
V [libjvm.so+0xfab033] bool ClassLoaderDataGraph::do_unloading(BoolObjectClosure*,bool)+0x233
V [libjvm.so+0x1ae61e6] bool SystemDictionary::do_unloading(BoolObjectClosure*,bool)+0x16
V [libjvm.so+0x11b91d7] void G1MarkSweep::mark_sweep_phase1(bool&,bool)+0x417
V [libjvm.so+0x11b8cfb] void G1MarkSweep::invoke_at_safepoint(ReferenceProcessor*,bool)+0x7b
V [libjvm.so+0x118a045] bool G1CollectedHeap::do_full_collection(bool,bool)+0x465
V [libjvm.so+0x1bd7d6c] void VM_G1CollectFull::doit()+0x6c
V [libjvm.so+0x1bd5d48] void VM_Operation::evaluate()+0xd8
V [libjvm.so+0x1bd3d61] void VMThread::evaluate_operation(VM_Operation*)+0x161
V [libjvm.so+0x1bd4589] void VMThread::loop()+0x2d9
V [libjvm.so+0x1bd3a3e] void VMThread::run()+0x7e
V [libjvm.so+0x1905060] thread_native_entry+0x240
C [libc.so.1+0x125221] _thrp_setup+0xa5
C [libc.so.1+0x1254c0] _lwp_start+0x0
Best regards and greetings from Bangalore,
Tobias
On 05.08.2017 01:12, Roland Westrelin wrote:
>
> (that patch includes some runtime/interpreter changes)
>
> http://cr.openjdk.java.net/~roland/8185265/webrev.00/
>
> JDK-8184795 disabled compilation of LFs as compilation root if they
> return a value type. The logic in CheckCastPPNode::Ideal() expects
> values to be returned in registers but if the called method is a lambda
> form, a value can be returned as a buffered value. I fixed the logic in
> CheckCastPPNode::Ideal() to properly handle return of a buffered value.
>
> One of Sergey's micro benchmarks takes forever to warmup. I found it's
> related to the return of value types and, in particular, calling the
> runtime to pack/unpack values (i.e. load fields in registers on return,
> allocate a buffered value and initialize it once returned to the
> caller). It causes a ~10x slow down during warmup. To address this, the
> VM now generates little pieces of assembly code to perform the packing
> or unpacking. On return, unpacking a buffered value is performed by
> calling the value klass's unpack unhandler. When returned in the caller,
> a fast path allocation is attempted for a new value from the TLAB (I
> left off heap buffers out for now). If that allocation succeeds the
> value klass pack handler is called an initializes the value. If the
> allocation fails, we fall back to the same runtime call that is used
> currently (that calls knows there can be live oops, where to look for
> them and how to preserve them across a GC).
>
> C2 doesn't support calling to an address that is loaded in a register. I
> hacked support for it from the existing CallLeafNoFP node. It's somewhat
> ugly but I would say it's good enough for now.
>
> With this change, AFAICT, the warmup issue caused by return of value
> types in Sergey's test is gone. It doesn't mean it warmups fast (we
> still have some disabled compilation of LF for another reason) but it's
> now decent.
>
> Roland.
>
More information about the valhalla-dev
mailing list