Critical JNI and (Shenandoah) pinning questions
David Holmes
david.holmes at oracle.com
Wed Aug 21 11:42:45 UTC 2019
Hi Zhengyu,
I don't subscribe to shenandoah-dev so this reply will only go to
yourself, Ioannis and Aleksey (initially anyway).
On 21/08/2019 9:19 pm, Zhengyu Gu wrote:
> Hi David,
>
> I did not run the benchmark, Ioannis (cc'd) provided the numbers.
It would be good to have information added to the bug - thanks.
> What Ioannis proposed, does make sense to me. The implementation is
> rather trivial, e.g. for x86_64:
>
> diff -r 348f7933e2cc src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> --- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp Tue Aug 20
> 15:40:49 2019 +0100
> +++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp Wed Aug 21
> 07:12:55 2019 -0400
> @@ -2044,6 +2044,8 @@
>
> // Now the space for the inbound oop handle area
> int total_save_slots = 6 * VMRegImpl::slots_per_word; // 6
> arguments passed in registers
> + bool has_array_params = false;
> +
> if (is_critical_native) {
> // Critical natives may have to call out so they need a save area
> // for register arguments.
> @@ -2059,6 +2061,7 @@
> case T_CHAR:
> case T_INT: single_slots++; break;
> case T_ARRAY: // specific to LP64 (7145024)
> + has_array_params = true;
> case T_LONG: double_slots++; break;
> default: ShouldNotReachHere();
> }
> @@ -2231,7 +2234,7 @@
>
> const Register oop_handle_reg = r14;
>
> - if (is_critical_native &&
> !Universe::heap()->supports_object_pinning()) {
> + if (is_critical_native && has_array_params &&
> !Universe::heap()->supports_object_pinning()) {
> check_needs_gc_for_critical_native(masm, stack_slots,
> total_c_args, total_in_args,
> oop_handle_offset, oop_maps,
> in_regs, in_sig_bt);
> }
>
>
> What do you think?
It seems like a simple proposition but I'm not familiar enough with this
aspect of critical natives to really comment at this stage.
Thanks,
David
> Thanks,
>
> -Zhengyu
>
>
>
> On 8/19/19 3:52 PM, Zhengyu Gu wrote:
>> Hi Ioannis,
>>
>> Ah, I got what you proposed now. I filed following bug, hopefully, to
>> spark some discussions.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8229895
>>
>> Thanks,
>>
>> -Zhengyu
>>
>> On 8/19/19 12:05 PM, Ioannis Tsakpinis wrote:
>>> Hey Zhengyu,
>>>
>>>> If there are no array parameters, there is no point to use
>>>> Get/SetXXXArrayCritical or native critical methods in the first place.
>>>
>>> It's true that CriticalJNINatives were added as an efficient way to
>>> access Java arrays from JNI code. However, the overhead of JNI calls
>>> affects all methods, especially methods that accept or return primitive
>>> values only and the JNI code does nothing but pass the arguments to
>>> another native function.
>>>
>>> There are thousands of JNI functions in LWJGL and almost all are like
>>> that, they simply cast arguments to the appropriate type and pass them
>>> to a target native function. Libraries like JNR and other JNI binding
>>> generators also look the same.
>>>
>>> The major benefit of using CriticalJNINatives for such functions is the
>>> removal of the first two standard JNI parameters: JNIEnv* and jclass.
>>> Normally that would only mean less register pressure, which may help in
>>> some cases. In practice though, native compilers are able to optimize
>>> away any argument shuffling and convert everything to a simple
>>> tail-call, i.e. a single jump instruction.
>>>
>>> We go from this for standard JNI:
>>>
>>> Java -> shuffle arguments -> JNI -> shuffle arguments -> native call
>>>
>>> to this for critical JNI:
>>>
>>> Java -> shuffle arguments -> JNI -> native call
>>>
>>> Example code and assembly output: https://godbolt.org/z/qZRIi1
>>>
>>> This has a measurable effect on JNI call overhead and becomes more
>>> important the simpler the target native function is. With Project Panama
>>> there is no JNI function and it should be possible to optimize the first
>>> argument shuffling too. Until then, this is the best we can do, unless
>>> there are opportunities to slim down the JNI wrapper even further for
>>> critical native methods (e.g. remove the safepoint polling if it's safe
>>> to do so).
>>>
>>> To sum up, the motivation is reduced JNI overhead. My argument is that
>>> primitive-only functions could benefit from significant overhead
>>> reduction with CriticalJNINatives. However, the GC locking effect is a
>>> major and unnecessary disadvantage. Shenandoah does a perfect job here
>>> because it supports region pinning and there's no actual locking
>>> happening in primitive-only functions. Every other GC though will choke
>>> hard with applications that make heavy use of critical natives (such as
>>> typical LWJGL applications). So, two requests:
>>>
>>> - PRIMARY: Skip check_needs_gc_for_critical_native() in primitive-only
>>> functions, regardless of GC algorithm and object-pinning support.
>>>
>>> - BONUS: JNI call overhead is significantly higher (3-4ns) on Java 10+
>>> compared to Java 8 (with or without critical natives). I went through
>>> the timeline of sharedRuntime_x86_64.cpp but couldn't spot anything that
>>> would justify such a difference (thread-local handshakes maybe?). I was
>>> wondering if this is a performance regression that needs to be looked
>>> into.
>>>
>>> Thank you,
>>>
>>> - Ioannis
>>>
More information about the shenandoah-dev
mailing list