Critical JNI and (Shenandoah) pinning questions
Zhengyu Gu
zgu at redhat.com
Wed Aug 21 11:19:54 UTC 2019
Hi David,
I did not run the benchmark, Ioannis (cc'd) provided the numbers.
What Ioannis proposed, does make sense to me. The implementation is
rather trivial, e.g. for x86_64:
diff -r 348f7933e2cc src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
--- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp Tue Aug 20
15:40:49 2019 +0100
+++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp Wed Aug 21
07:12:55 2019 -0400
@@ -2044,6 +2044,8 @@
// Now the space for the inbound oop handle area
int total_save_slots = 6 * VMRegImpl::slots_per_word; // 6
arguments passed in registers
+ bool has_array_params = false;
+
if (is_critical_native) {
// Critical natives may have to call out so they need a save area
// for register arguments.
@@ -2059,6 +2061,7 @@
case T_CHAR:
case T_INT: single_slots++; break;
case T_ARRAY: // specific to LP64 (7145024)
+ has_array_params = true;
case T_LONG: double_slots++; break;
default: ShouldNotReachHere();
}
@@ -2231,7 +2234,7 @@
const Register oop_handle_reg = r14;
- if (is_critical_native && !Universe::heap()->supports_object_pinning()) {
+ if (is_critical_native && has_array_params &&
!Universe::heap()->supports_object_pinning()) {
check_needs_gc_for_critical_native(masm, stack_slots,
total_c_args, total_in_args,
oop_handle_offset, oop_maps,
in_regs, in_sig_bt);
}
What do you think?
Thanks,
-Zhengyu
On 8/19/19 3:52 PM, Zhengyu Gu wrote:
> Hi Ioannis,
>
> Ah, I got what you proposed now. I filed following bug, hopefully, to
> spark some discussions.
>
> https://bugs.openjdk.java.net/browse/JDK-8229895
>
> Thanks,
>
> -Zhengyu
>
> On 8/19/19 12:05 PM, Ioannis Tsakpinis wrote:
>> Hey Zhengyu,
>>
>>> If there are no array parameters, there is no point to use
>>> Get/SetXXXArrayCritical or native critical methods in the first place.
>>
>> It's true that CriticalJNINatives were added as an efficient way to
>> access Java arrays from JNI code. However, the overhead of JNI calls
>> affects all methods, especially methods that accept or return primitive
>> values only and the JNI code does nothing but pass the arguments to
>> another native function.
>>
>> There are thousands of JNI functions in LWJGL and almost all are like
>> that, they simply cast arguments to the appropriate type and pass them
>> to a target native function. Libraries like JNR and other JNI binding
>> generators also look the same.
>>
>> The major benefit of using CriticalJNINatives for such functions is the
>> removal of the first two standard JNI parameters: JNIEnv* and jclass.
>> Normally that would only mean less register pressure, which may help in
>> some cases. In practice though, native compilers are able to optimize
>> away any argument shuffling and convert everything to a simple
>> tail-call, i.e. a single jump instruction.
>>
>> We go from this for standard JNI:
>>
>> Java -> shuffle arguments -> JNI -> shuffle arguments -> native call
>>
>> to this for critical JNI:
>>
>> Java -> shuffle arguments -> JNI -> native call
>>
>> Example code and assembly output: https://godbolt.org/z/qZRIi1
>>
>> This has a measurable effect on JNI call overhead and becomes more
>> important the simpler the target native function is. With Project Panama
>> there is no JNI function and it should be possible to optimize the first
>> argument shuffling too. Until then, this is the best we can do, unless
>> there are opportunities to slim down the JNI wrapper even further for
>> critical native methods (e.g. remove the safepoint polling if it's safe
>> to do so).
>>
>> To sum up, the motivation is reduced JNI overhead. My argument is that
>> primitive-only functions could benefit from significant overhead
>> reduction with CriticalJNINatives. However, the GC locking effect is a
>> major and unnecessary disadvantage. Shenandoah does a perfect job here
>> because it supports region pinning and there's no actual locking
>> happening in primitive-only functions. Every other GC though will choke
>> hard with applications that make heavy use of critical natives (such as
>> typical LWJGL applications). So, two requests:
>>
>> - PRIMARY: Skip check_needs_gc_for_critical_native() in primitive-only
>> functions, regardless of GC algorithm and object-pinning support.
>>
>> - BONUS: JNI call overhead is significantly higher (3-4ns) on Java 10+
>> compared to Java 8 (with or without critical natives). I went through
>> the timeline of sharedRuntime_x86_64.cpp but couldn't spot anything that
>> would justify such a difference (thread-local handshakes maybe?). I was
>> wondering if this is a performance regression that needs to be looked
>> into.
>>
>> Thank you,
>>
>> - Ioannis
>>
More information about the shenandoah-dev
mailing list