7013347: allow crypto functions to be called inline to enhance performance

Mon Jan 30 22:25:11 PST 2012

On Jan 30, 2012, at 8:10 PM, Vladimir Kozlov wrote:

> In NativeLookup::lookup_critical_style() jni_name is only used when (dll != NULL). Should the name construction be moved inside this check?

I rearranged it.

address NativeLookup::lookup_critical_style(methodHandle method, char* pure_name, const char* long_name, int args_size, bool os_style) {
  if (!method->has_native_function()) {
    return NULL;
  }

  address current_entry = method->native_function();

  char dll_name[MAXPATHLEN];
  int offset;
  if (os::dll_address_to_library_name(current_entry, dll_name, sizeof(dll_name), &offset)) {
    char ebuf[32];
    void* dll = os::dll_load(dll_name, ebuf, sizeof(ebuf));
    if (dll != NULL) {
      // Compute complete JNI name for style
      stringStream st;
      if (os_style) os::print_jni_name_prefix_on(&st, args_size);
      st.print_raw(pure_name);
      st.print_raw(long_name);
      if (os_style) os::print_jni_name_suffix_on(&st, args_size);
      char* jni_name = st.as_string();
      return (address)os::dll_lookup(dll, jni_name);
    }
  }

  return NULL;
}
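
As a rough standalone illustration of the name assembly above (not HotSpot code): the sketch below uses std::string in place of stringStream, and build_jni_name, prefix and suffix are hypothetical names standing in for whatever os::print_jni_name_prefix_on and os::print_jni_name_suffix_on emit on a given platform.

```cpp
#include <string>

// Hypothetical helper, not part of HotSpot. It mirrors the order in which
// lookup_critical_style() assembles the symbol name:
//   [prefix] pure_name long_name [suffix]
// prefix and suffix are assumed placeholders for the platform decoration
// and are only applied when os_style is true.
std::string build_jni_name(const std::string& pure_name,
                           const std::string& long_name,
                           bool os_style,
                           const std::string& prefix,
                           const std::string& suffix) {
  std::string name;
  if (os_style) name += prefix;
  name += pure_name;
  name += long_name;
  if (os_style) name += suffix;
  return name;
}
```

With os_style false the result is just pure_name followed by long_name, which matches the undecorated lookup.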

> 
> The same for critical_name in lookup_critical_entry().

I moved it down below the test for object.

> 
> In check_for_lazy_critical_native() should you check that stub_cb != NULL?

I can.

> Also this code has a typo (ifdef?):
> + #ifndef ASSERT

Yes.  Thanks.

> 
> 
> Why don't you use the new branch instructions in your sparc assembler code?

I forgot they existed.

> The next code loads one byte but you test all bits:
> 
> +   __ load_bool_contents(sync_state, G3_scratch);
> +   __ cmp(G3_scratch, 0);
> +   __ brx(Assembler::equal, false, Assembler::pt, cont);
> +   __ delayed()->nop();
> 
> you can replace it with:
> 
> +   __ load_bool_contents(sync_state, G3_scratch);
> +   __ cmp_zero_and_br(Assembler::equal, G3_scratch, cont);
> 
> ----
> +   __ ba(done);
> +   __ delayed()->nop();
> 
> with:
> 
> +   __ ba_short(done);

Thanks.

> 
> 
> Don't use the next short forward branch in both x86_32 and x86_64 files since the code could be large depending on the number of arguments:
> 
> +   __ cmp8(ExternalAddress((address)GC_locker::needs_gc_address()), false);
> +   __ jccb(Assembler::equal, cont);

Code is only emitted for arguments that are in registers so it should be relatively small but maybe it's not small enough.

Thanks.
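
For context on the short-branch concern: on x86 the short forms of jcc and jmp encode the target as a signed 8-bit displacement, so the target has to lie within -128 to +127 bytes of the end of the branch instruction. A minimal sketch of that range check follows; fits_in_short_branch is a hypothetical name used for illustration, not an assembler API.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only.
// disp is the distance in bytes from the end of the branch instruction
// to its target. A short branch stores it in one signed byte.
bool fits_in_short_branch(std::int64_t disp) {
  return disp >= INT8_MIN && disp <= INT8_MAX;
}
```

If the argument handling emitted between the branch and cont can exceed that range, the long form is the safe choice.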

> 
> 
> Thanks,
> Vladimir
> 
> Tom Rodriguez wrote:
>> http://cr.openjdk.java.net/~never/7013347
>> 1133 lines changed: 979 ins; 56 del; 98 mod; 35796 unchg
>> 7013347: allow crypto functions to be called inline to enhance performance
>> Reviewed-by:
>> This is a long one.
>> The synopsis of this is slightly misleading.  This doesn't allow
>> direct calls to native routines from Java but it does attempt to
>> reduce the overhead of using JNI for specific use cases while still
>> maintaining the safety invariants that JNI provides.  For native code
>> that runs in a bounded time JNI provides a function called
>> GetPrimitiveArrayCritical which may provide direct access to the body
>> of Java arrays of primitives.  In Hotspot this is accomplished by
>> suppressing garbage collection while these pointers are exposed to
>> native code.  This is accomplished with the GC_locker class which is
>> basically a readers/writers lock.  Note that the GC_locker doesn't
>> suppress safepointing, just garbage collections.  There are many
>> operations which require a safepoint to make forward progress, so
>> suppressing them indefinitely isn't acceptable.
>> This RFE provides a shorthand for the use of
>> GetPrimitiveArrayCritical by defining an alternate native calling
>> convention that only allows the use of primitives or arrays of
>> primitives.  The native method must also be static since non-static
>> methods are passed the receiver as an argument and Java objects aren't
>> allowed.  Synchronization and exceptions aren't allowed either.  The
>> Java code calling these natives is free to use all of those features so
>> it's not that onerous of a restriction.
>> The benefits of this approach are that the JVM can more quickly do the
>> work inline that would normally be done by the
>> GetPrimitiveArrayCritical/ReleasePrimitiveArrayCritical function calls.
>> Calling back into the JVM through JNI requires synchronization with
>> the JVM and each upcall adds a minimum overhead to the native routine.
>> This helps to reduce the overhead to a more fixed cost per call.  It
>> also simplifies the work that the caller must do since synchronization
>> and exceptions aren't allowed.  For now this work is being done in the
>> existing native wrapper generation but with some more simplification
>> this could be more easily inlined directly into the caller.
>> The signature of the native routine follows the same name mangling as
>> normal JNI methods but they start with JavaCritical_ instead of Java_.
>> Any array arguments are unpacked into a pair of arguments, the length
>> followed by a pointer to the body of the array.  If the incoming array
>> is NULL then the body pointer is NULL and the length is 0.
>> Currently this is a JDK private interface while we gain some
>> experience with it but it will likely become a more standard
>> extension.  It's also an optional extension so a native library is
>> required to provide the normal entry point in addition to the alternate
>> entry point.
>> The changes consist of three parts.  The first is the lookup logic
>> that finds the alternate native entry point.  JNI critical natives
>> currently can only be found through dynamic lookup.  JNI
>> RegisterNatives doesn't know about these functions so there's no way
>> to provide the alternate entry point.
>> The second part is the lazy critical entry logic.  The fix for 7129164
>> introduced code that computed the JNI active count during
>> safepointing.  Now as part of that computation, if a thread is seen to
>> be in thread_in_native state and the nmethod on the top of stack is a
>> critical native wrapper, then the critical count for that thread is
>> incremented and the suspend flags are set so that when the nmethod
>> returns from the native code it will call back into the runtime and do the
>> unlock of the critical native.
>> The last part is the native wrappers themselves.  When compiling a
>> critical native wrapper, they emit a new check of GC_locker::_needs_gc
>> and they call into the runtime if it's true.  This keeps them from
>> starting new JNI critical sections if a GC has been requested.  The
>> arguments are unpacked following the alternate calling convention and
>> the method is called as it normally would be.
>> On return the wrapper checks the suspend flags as it normally would
>> and calls back into the runtime where it might have to block and force
>> a GC if it's the last thread exiting the GC_locker.  This required
>> some slightly different handling of the final transition back to
>> thread_in_Java since we have to allow blocking.
>> The wrappers are only generated differently if they are compiling a
>> critical native so it shouldn't have much effect on normal execution.
>> The only library currently taking advantage of this is the new ucrypto
>> provider on Solaris.  For some crypto operations it improves
>> throughput by 20% or more because the crypto routines are fast enough
>> that the JNI overhead is significant.  It's expected that other parts
>> of the JDK will take advantage of it going forward and hopefully it
>> can be tightened up further.
>> Tested with new crypto provider and microbenchmark test case.  Also
>> ran runthese.
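
To make the alternate calling convention described above concrete, here is a minimal sketch of what a critical entry point might look like for a hypothetical method `static native int sum(byte[] data)` in a hypothetical class com.example.Checksum. The jint and jbyte typedefs are stand-ins for the ones in jni.h so the sketch compiles on its own. It also assumes the JNIEnv* and jclass arguments of a regular JNI entry point are not passed, which the description suggests but does not spell out.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the JNI typedefs from jni.h (assumption, for self-containment).
typedef int         jint;
typedef signed char jbyte;

// Hypothetical critical entry point. The byte[] argument arrives unpacked as
// a (length, body pointer) pair, length first. The library would still have
// to export the regular Java_com_example_Checksum_sum entry point as well;
// it is omitted here.
extern "C" jint JavaCritical_com_example_Checksum_sum(jint data_length,
                                                      jbyte* data) {
  // A NULL Java array is passed as a NULL body pointer with length 0.
  if (data == NULL) {
    assert(data_length == 0);
    return 0;
  }
  jint total = 0;
  for (jint i = 0; i < data_length; i++) {
    total += data[i];
  }
  return total;
}
```

The function takes only primitives and raw array bodies, does no synchronization and throws no exceptions, which is what the restrictions above require.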