Critical JNI and (Shenandoah) pinning questions

Zhengyu Gu zgu at redhat.com
Wed Aug 21 11:19:54 UTC 2019


Hi David,

I did not run the benchmark; Ioannis (cc'd) provided the numbers.

What Ioannis proposed makes sense to me. The implementation is rather 
trivial, e.g. for x86_64:

diff -r 348f7933e2cc src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
--- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp      Tue Aug 20 15:40:49 2019 +0100
+++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp      Wed Aug 21 07:12:55 2019 -0400
@@ -2044,6 +2044,8 @@

   // Now the space for the inbound oop handle area
   int total_save_slots = 6 * VMRegImpl::slots_per_word;  // 6 arguments passed in registers
+  bool has_array_params = false;
+
   if (is_critical_native) {
     // Critical natives may have to call out so they need a save area
     // for register arguments.
@@ -2059,6 +2061,7 @@
           case T_CHAR:
           case T_INT:  single_slots++; break;
           case T_ARRAY:  // specific to LP64 (7145024)
+            has_array_params = true;
           case T_LONG: double_slots++; break;
           default:  ShouldNotReachHere();
         }
@@ -2231,7 +2234,7 @@

   const Register oop_handle_reg = r14;

-  if (is_critical_native && !Universe::heap()->supports_object_pinning()) {
+  if (is_critical_native && has_array_params && !Universe::heap()->supports_object_pinning()) {
     check_needs_gc_for_critical_native(masm, stack_slots, total_c_args, total_in_args,
                                        oop_handle_offset, oop_maps, in_regs, in_sig_bt);
   }
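
For illustration, roughly how the new flag behaves for two hypothetical
critical natives (signatures invented for this example, not taken from
any real library):

  // Illustrative only -- not part of the patch above.
  //
  //   static native long sum(long a, long b);
  //     in_sig_bt contains only T_LONG entries, the T_ARRAY case is
  //     never taken, has_array_params stays false, and the GC check is
  //     skipped even on collectors without object-pinning support.
  //
  //   static native void fill(int[] array, int value);
  //     the int[] argument shows up as T_ARRAY, has_array_params becomes
  //     true, and the GC check is emitted exactly as it is today on
  //     collectors without pinning support.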


What do you think?

Thanks,

-Zhengyu



On 8/19/19 3:52 PM, Zhengyu Gu wrote:
> Hi Ioannis,
> 
> Ah, I got what you proposed now. I filed the following bug, hopefully 
> to spark some discussion:
> 
> https://bugs.openjdk.java.net/browse/JDK-8229895
> 
> Thanks,
> 
> -Zhengyu
> 
> On 8/19/19 12:05 PM, Ioannis Tsakpinis wrote:
>> Hey Zhengyu,
>>
>>> If there are no array parameters, there is no point in using
>>> Get/SetXXXArrayCritical or native critical methods in the first place.
>>
>> It's true that CriticalJNINatives were added as an efficient way to
>> access Java arrays from JNI code. However, JNI call overhead affects
>> all methods, especially methods that accept or return only primitive
>> values, where the JNI code does nothing but pass the arguments on to
>> another native function.
>>
>> There are thousands of JNI functions in LWJGL and almost all are like
>> that: they simply cast the arguments to the appropriate types and pass
>> them to a target native function. Libraries like JNR and other JNI
>> binding generators look the same.
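>>
>> As a rough sketch (library, class, and function names invented for
>> this illustration, not taken from LWJGL), such a wrapper is nothing
>> more than a cast-and-forward:
>>
>> #include <jni.h>
>>
>> extern "C" int foo(void *handle, int count);  // the target native function
>>
>> extern "C" JNIEXPORT jint JNICALL
>> Java_org_example_Lib_nfoo(JNIEnv *env, jclass clazz, jlong handle, jint count) {
>>     // env and clazz are unused; the wrapper only casts and forwards.
>>     return (jint)foo((void *)handle, (int)count);
>> }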
>>
>> The major benefit of using CriticalJNINatives for such functions is the
>> removal of the first two standard JNI parameters: JNIEnv* and jclass.
>> Normally that would only mean less register pressure, which may help in
>> some cases. In practice though, native compilers are able to optimize
>> away any argument shuffling and convert everything to a simple
>> tail-call, i.e. a single jump instruction.
>>
>> We go from this for standard JNI:
>>
>> Java -> shuffle arguments -> JNI -> shuffle arguments -> native call
>>
>> to this for critical JNI:
>>
>> Java -> shuffle arguments -> JNI -> native call
>>
>> Example code and assembly output: https://godbolt.org/z/qZRIi1
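>>
>> Sticking with the earlier sketch (again with invented names, not the
>> exact code behind the link), the critical counterpart is the
>> JavaCritical_-prefixed entry point that the VM looks up for critical
>> natives. It drops JNIEnv*/jclass, so on x86_64 the arguments already
>> arrive in the registers foo() expects and the compiler can reduce the
>> whole wrapper to a single jmp:
>>
>> extern "C" JNIEXPORT jint JNICALL
>> JavaCritical_org_example_Lib_nfoo(jlong handle, jint count) {
>>     // Same body as the standard wrapper, but with no leading JNIEnv*
>>     // and jclass there is nothing to shuffle -- this compiles to a
>>     // tail-call (jmp foo).
>>     return (jint)foo((void *)handle, (int)count);
>> }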
>>
>> This has a measurable effect on JNI call overhead and becomes more
>> important the simpler the target native function is. With Project Panama
>> there is no JNI function and it should be possible to optimize the first
>> argument shuffling too. Until then, this is the best we can do, unless
>> there are opportunities to slim down the JNI wrapper even further for
>> critical native methods (e.g. remove the safepoint polling if it's safe
>> to do so).
>>
>> To sum up, the motivation is reduced JNI overhead. My argument is that
>> primitive-only functions could benefit from significant overhead
>> reduction with CriticalJNINatives. However, the GC locking effect is a
>> major and unnecessary disadvantage. Shenandoah does a perfect job here
>> because it supports region pinning and there's no actual locking
>> happening in primitive-only functions. Every other GC, though, will
>> choke hard in applications that make heavy use of critical natives
>> (such as typical LWJGL applications). So, two requests:
>>
>> - PRIMARY: Skip check_needs_gc_for_critical_native() in primitive-only
>> functions, regardless of GC algorithm and object-pinning support.
>>
>> - BONUS: JNI call overhead is significantly higher (3-4ns) on Java 10+
>> compared to Java 8 (with or without critical natives). I went through
>> the timeline of sharedRuntime_x86_64.cpp but couldn't spot anything that
>> would justify such a difference (thread-local handshakes maybe?). I was
>> wondering if this is a performance regression that needs to be looked
>> into.
>>
>> Thank you,
>>
>> - Ioannis
>>

