Request for review (M): 7171890: C1: add Class.isInstance intrinsic
Rémi Forax
forax at univ-mlv.fr
Thu May 31 15:55:53 PDT 2012
On 06/01/2012 12:05 AM, Christian Thalinger wrote:
>
> On May 31, 2012, at 6:26 AM, Krystal Mok wrote:
>
>> On Thu, May 31, 2012 at 3:10 PM, John Rose <john.r.rose at oracle.com> wrote:
>>
>>> On May 30, 2012, at 7:55 PM, Krystal Mok wrote:
>>>
>>>> Yes, it's doable. I'll just take the same approach for
>>>> clazz1.isAssignableFrom(clazz2).
>>>
>>> It's trickier, since you can't just repurpose the C1 InstanceOf
>>> node. It looks like you'll have to refactor machine-dependent
>>> code to cut in the new logic.
>>
>> Yes, I have noticed that already. It's not the same as the
>> Class.isInstance(). The (quick-n-dirty) plan was to fold the case
>> where both clazz1 and clazz2 are constants, and emit a leaf runtime
>> call for non-constant cases. Since Class.isAssignableFrom() only
>> throws NPE if any of clazz1 or clazz2 is null, it should be okay to
>> make a leaf call after null-checking them.
>>
>> The complete solution, as you suggested, would have to involve
>> changes to platform-dependent code.
>>
>>> For a comparison, see inline_native_subtype_check in C2, versus
>>> the "_isInstance" cases of inline_native_Class_query. The
>>> intrinsic for Class.isAssignableFrom is surprisingly more complex
>>> and specialized than the intrinsic for Class.isInstance.
>>>
>>> (For C2-ish reasons, the intrinsic logic in library_call.cpp is
>>> machine-independent, so it's easier to do than in C1.)
>>
>> I did notice this from the start, too. It's so tempting to add
>> finer-grain IR to C1 so that it can do more optimizations; but that
>> feels against the theme of C1.
>>
>>> Unless you find a simple way to manage the C1 changes, you might
>>> want to stick with isInstance only, this time around.
>>
>> Yes, I agree. I wouldn't mind making a more complete solution for C1
>> Class intrinsics in future changes.
>>
>>> In any case, we'll try what you have done already; I am confident
>>> it will do good things for our dynamic language codes.
>>
>> Thanks, really looking forward to the numbers :-)
>
> The numbers are okay; it shaves off about 9% of run time:
>
> cthaling at intelsdv03.us.oracle.com:~/mlvm/jruby$ jruby -J-client -J-showversion -X+C bench/bench_red_black.rb
> Picked up _JAVA_OPTIONS: -XX:+UnlockExperimentalVMOptions -XX:-AllowChainedMethodHandles -XX:+AllowLambdaForms -Xverify:all -esa -Xbootclasspath/p:/home/cthaling/mlvm/jdk/classes/
> java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b40)
> Java HotSpot(TM) Client VM (build 24.0-b08-internal, mixed mode)
>
> 17.479
> GC.count = 46
> 10.25
> GC.count = 54
> 9.732
> GC.count = 61
> 9.633
> GC.count = 68
> 9.568
> GC.count = 74
> 9.499
> GC.count = 81
> 9.634
> GC.count = 87
> 9.787
> GC.count = 94
> 9.739
> GC.count = 101
> 9.702
> GC.count = 107
>
>
> Flat profile of 114.73 secs (10532 total ticks): main
>
> Interpreted + native Method
> 2.6% 0 + 277 java.io.FileOutputStream.open
> 2.1% 0 + 221 java.io.FileOutputStream.close0
> 0.8% 87 + 0 bench.bench_red_black.method__27$RUBY$insert_helper
> 0.6% 65 + 0 bench.bench_red_black.method__16$RUBY$minimum
> 0.1% 0 + 15 java.lang.Class.forName0
> 0.1% 0 + 12 sun.misc.Unsafe.defineAnonymousClass
> 0.1% 1 + 11 org.jruby.Ruby.initCore
> 0.1% 10 + 0 bench.bench_red_black.method__25$RUBY$left_rotate
> 0.1% 7 + 0 bench.bench_red_black.method__14$RUBY$insert
> 0.1% 3 + 3 java.lang.Class.getDeclaredConstructors0
> 0.1% 0 + 6 org.jruby.parser.Ruby19Parser.<clinit>
> 0.0% 0 + 5 java.io.UnixFileSystem.getBooleanAttributes0
> 0.0% 2 + 3 java.lang.ClassLoader.defineClass1
> 0.0% 1 + 2 java.security.AccessController.doPrivileged
> 0.0% 3 + 0 org.objectweb.asm.Label.a
> 0.0% 3 + 0 java.lang.invoke.Invokers.lookupInvoker
> 0.0% 3 + 0 com.sun.xml.internal.ws.org.objectweb.asm.ClassWriter.<init>
> 0.0% 0 + 2 java.io.FileInputStream.open
> 0.0% 0 + 2 java.lang.invoke.MethodHandleNatives.resolve
> 0.0% 0 + 2 java.lang.Throwable.fillInStackTrace
> 0.0% 0 + 2 java.lang.ClassLoader.findBootstrapClass
> 0.0% 0 + 2 org.jruby.parser.Ruby19Parser.<init>
> 0.0% 2 + 0 org.jruby.util.ByteList.bytes
> 0.0% 2 + 0 org.jruby.internal.runtime.methods.InvocationMethodFactory.invokeCallConfigPost
> 0.0% 2 + 0 org.objectweb.asm.Frame.d
> 8.4% 277 + 606 Total interpreted (including elided)
>
> Compiled + native Method
> 2.8% 1 + 295 bench.bench_red_black.method__16$RUBY$minimum
> 2.5% 3 + 261 bench.bench_red_black.method__27$RUBY$insert_helper
> 1.5% 0 + 163 bench.bench_red_black.method__25$RUBY$left_rotate
> 1.0% 0 + 109 bench.bench_red_black.method__14$RUBY$insert
> 0.0% 0 + 1 java.io.FilterInputStream.read
> 0.0% 1 + 0 org.jruby.runtime.CompiledBlockLight19.pre
> 0.0% 0 + 1 org.jruby.internal.runtime.methods.CompiledMethod.call
> 0.0% 1 + 0 org.jruby.runtime.invokedynamic.InvocationLinker.testRealClass
> 0.0% 1 + 0 org.jruby.RubyBasicObject.op_not_equal
> 0.0% 0 + 1 org.jruby.internal.runtime.methods.DynamicMethod.call
> 0.0% 1 + 0 org.jruby.runtime.scope.ManyVarsDynamicScope.getValueOrNil
> 0.0% 1 + 0 java.lang.invoke.LambdaForm$LFI39.invoke
> 0.0% 1 + 0 org.jruby.ast.executable.RuntimeCache.isCachedFrom
> 0.0% 0 + 1 org.jruby.javasupport.util.RuntimeHelpers.restructureBlockArgs19
> 0.0% 1 + 0 org.objectweb.asm.ClassWriter.toByteArray
> 0.0% 0 + 1 bench.bench_red_black.method__17$RUBY$maximum
> 0.0% 0 + 1 java.lang.String.replace
> 0.0% 1 + 0 bench.bench_red_black.method__2$RUBY$initialize
> 0.0% 1 + 0 bench.bench_red_black.method__26$RUBY$right_rotate
> 8.0% 13 + 834 Total compiled
>
> Stub + native Method
> 80.6% 0 + 8494 java.lang.Class.isInstance
> 0.8% 1 + 87 java.io.FileOutputStream.open
> 0.8% 0 + 85 java.io.FileOutputStream.close0
> 0.3% 0 + 29 java.lang.Class.isPrimitive
> 0.1% 0 + 15 sun.misc.Unsafe.ensureClassInitialized
> 0.1% 0 + 9 java.lang.String.intern
> 0.1% 0 + 8 java.lang.Class.isArray
> 0.1% 0 + 7 java.lang.invoke.MethodHandleNatives.resolve
> 0.0% 2 + 3 java.security.AccessController.doPrivileged
> 0.0% 0 + 5 java.lang.System.arraycopy
> 0.0% 0 + 4 java.lang.Class.isInterface
> 0.0% 0 + 3 java.lang.Thread.currentThread
> 0.0% 0 + 3 java.lang.Throwable.fillInStackTrace
> 0.0% 0 + 3 sun.misc.Unsafe.defineAnonymousClass
> 0.0% 0 + 2 java.lang.Object.getClass
> 0.0% 0 + 2 java.lang.Class.getComponentType
> 0.0% 0 + 2 java.lang.ClassLoader.findLoadedClass0
> 0.0% 0 + 2 java.io.FileOutputStream.writeBytes
> 0.0% 0 + 1 java.lang.Thread.holdsLock
> 0.0% 0 + 1 sun.reflect.Reflection.getCallerClass
> 0.0% 0 + 1 java.lang.Class.getEnclosingMethod0
> 0.0% 0 + 1 java.lang.Object.clone
> 0.0% 0 + 1 java.lang.reflect.Array.newArray
> 0.0% 0 + 1 java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal
> 0.0% 0 + 1 sun.misc.Unsafe.compareAndSwapInt
> 83.3% 3 + 8771 Total stub (including elided)
>
>
>
> cthaling at intelsdv03.us.oracle.com:~/mlvm/jruby$ jruby -J-client -J-showversion -X+C bench/bench_red_black.rb
> Picked up _JAVA_OPTIONS: -XX:+UnlockExperimentalVMOptions -XX:-AllowChainedMethodHandles -XX:+AllowLambdaForms -Xverify:all -esa -Xbootclasspath/p:/home/cthaling/mlvm/jdk/classes/
> java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b40)
> Java HotSpot(TM) Client VM (build 24.0-b08-internal, mixed mode)
>
> 15.966
> GC.count = 46
> 9.629
> GC.count = 54
> 9.135
> GC.count = 61
> 8.905
> GC.count = 68
> 8.775
> GC.count = 74
> 8.851
> GC.count = 80
> 8.782
> GC.count = 87
> 8.78
> GC.count = 93
> 8.794
> GC.count = 100
> 8.81
> GC.count = 106
>
>
> Flat profile of 108.49 secs (9711 total ticks): main
>
> Interpreted + native Method
> 14.3% 1387 + 0 bench.bench_red_black.method__27$RUBY$insert_helper
> 13.3% 1294 + 0 bench.bench_red_black.method__16$RUBY$minimum
> 3.1% 0 + 303 java.io.FileOutputStream.open
> 2.2% 0 + 210 java.io.FileOutputStream.close0
> 1.8% 170 + 0 bench.bench_red_black.method__14$RUBY$insert
> 0.1% 0 + 13 java.lang.Class.forName0
> 0.1% 0 + 12 sun.misc.Unsafe.defineAnonymousClass
> 0.1% 2 + 10 org.jruby.Ruby.initCore
> 0.1% 0 + 8 java.io.UnixFileSystem.getBooleanAttributes0
> 0.1% 1 + 6 org.jruby.parser.Ruby19Parser.<clinit>
> 0.1% 7 + 0 bench.bench_red_black.method__28$RUBY$delete_fixup
> 0.1% 4 + 2 java.lang.Class.getDeclaredConstructors0
> 0.1% 2 + 4 java.lang.ClassLoader.defineClass1
> 0.1% 6 + 0 org.jruby.internal.runtime.methods.DynamicMethod.call
> 0.1% 5 + 0 org.jruby.RubyArray.eachCommon
> 0.1% 5 + 0 org.jruby.RubyArray.realloc
> 0.0% 0 + 4 java.lang.ClassLoader.findBootstrapClass
> 0.0% 0 + 4 java.io.FileOutputStream.writeBytes
> 0.0% 0 + 3 java.lang.invoke.MethodHandleNatives.resolve
> 0.0% 3 + 0 bench.bench_red_black.method__25$RUBY$left_rotate
> 0.0% 0 + 3 org.jruby.Ruby.initRoot
> 0.0% 0 + 2 java.lang.Throwable.fillInStackTrace
> 0.0% 2 + 0 org.jruby.compiler.impl.BaseBodyCompiler.invokeUtilityMethod
> 0.0% 2 + 0 org.jruby.runtime.opto.OptoFactory.newInvocationCompiler
> 0.0% 2 + 0 org.jruby.compiler.impl.InvokeDynamicCacheCompiler.cacheClosure19
> 37.6% 3024 + 623 Total interpreted (including elided)
>
> Compiled + native Method
> 19.7% 321 + 1592 bench.bench_red_black.method__27$RUBY$insert_helper
> 17.1% 79 + 1586 bench.bench_red_black.method__25$RUBY$left_rotate
> 5.4% 147 + 374 bench.bench_red_black.method__14$RUBY$insert
> 4.5% 27 + 409 bench.bench_red_black.method__16$RUBY$minimum
> 3.8% 371 + 0 bench.bench_red_black.method__22$RUBY$search
> 1.4% 133 + 0 bench.bench_red_black.method__17$RUBY$maximum
> 1.1% 102 + 0 org.jruby.RubyBasicObject.op_not_equal
> 0.9% 90 + 0 org.jruby.RubyBasicObject.setVariable
> 0.6% 56 + 0 org.jruby.ast.executable.RuntimeCache.isCachedFrom
> 0.5% 53 + 0 org.jruby.runtime.CompiledBlockLight19.pre
> 0.5% 49 + 0 org.jruby.runtime.CompiledBlock19.yield
> 0.3% 28 + 0 org.jruby.runtime.ThreadContext.pushCallFrame
> 0.3% 25 + 0 org.jruby.runtime.scope.ManyVarsDynamicScope.getValueOrNil
> 0.2% 24 + 0 org.jruby.MetaClass.getRealClass
> 0.2% 21 + 0 bench.bench_red_black.method__18$RUBY$successor
> 0.2% 16 + 0 org.jruby.RubyFixnum.op_plus_one
> 0.2% 16 + 0 bench.bench_red_black.method__19$RUBY$predecessor
> 0.1% 14 + 0 bench.bench_red_black.method__2$RUBY$initialize
> 0.1% 13 + 0 org.jruby.RubyObject$1.allocate
> 0.1% 13 + 0 bench$bench_red_black$method__13$RUBY$add.call
> 0.1% 13 + 0 bench.bench_red_black.method__28$RUBY$delete_fixup
> 0.1% 11 + 0 org.jruby.RubyRandom.randCommon19
> 0.1% 10 + 0 java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.compareAndSet
> 0.1% 9 + 1 org.jruby.RubyFixnum.times
> 0.1% 10 + 0 bench.bench_red_black.method__20$RUBY$inorder_walk
> 59.5% 1801 + 3974 Total compiled (including elided)
>
> Stub + native Method
> 0.9% 0 + 86 java.io.FileOutputStream.open
> 0.7% 0 + 71 java.io.FileOutputStream.close0
> 0.2% 0 + 22 java.lang.Class.isPrimitive
> 0.2% 0 + 15 sun.misc.Unsafe.ensureClassInitialized
> 0.1% 0 + 12 java.lang.Class.isArray
> 0.1% 0 + 7 java.lang.invoke.MethodHandleNatives.resolve
> 0.1% 0 + 5 java.lang.String.intern
> 0.1% 0 + 5 java.lang.Class.isInterface
> 0.0% 0 + 4 java.lang.Class.getComponentType
> 0.0% 0 + 4 java.lang.Object.clone
> 0.0% 1 + 3 java.security.AccessController.doPrivileged
> 0.0% 0 + 4 sun.misc.Unsafe.defineAnonymousClass
> 0.0% 0 + 3 java.lang.System.arraycopy
> 0.0% 0 + 2 java.lang.ClassLoader.findLoadedClass0
> 0.0% 0 + 2 java.util.zip.Inflater.inflateBytes
> 0.0% 0 + 1 java.lang.Object.getClass
> 0.0% 0 + 1 java.lang.Class.getDeclaringClass
> 0.0% 0 + 1 java.lang.Class.isAssignableFrom
> 0.0% 0 + 1 java.lang.Throwable.fillInStackTrace
> 0.0% 0 + 1 java.security.AccessController.doPrivileged
> 0.0% 0 + 1 sun.misc.Unsafe.getObjectVolatile
> 0.0% 0 + 1 sun.misc.Unsafe.getInt
> 0.0% 0 + 1 java.io.FileOutputStream.writeBytes
> 2.6% 1 + 253 Total stub
>
> The reason we don't see a much bigger improvement is that most of
> the time (I'd say 99% of the time) Class.cast isn't inlined in C1
> because it's too big, e.g.:
>
> @ 33 java.lang.invoke.LambdaForm$LFI67/25137260::invoke (32 bytes)
> @ 15 java.lang.invoke.LambdaForm$LFI1/13330648::invoke (31 bytes)
> @ 27 sun.invoke.util.ValueConversions::castReference (6 bytes)
> @ 2 java.lang.Class::cast (27 bytes) callee is too large
> @ 28 org.jruby.RubyBasicObject::op_not (20 bytes)
> @ 1 org.jruby.runtime.ThreadContext::getRuntime (5 bytes)
> @ 5 org.jruby.RubyBasicObject::isTrue (15 bytes)
> @ 16 org.jruby.Ruby::newBoolean (16 bytes)
>
> and we have to go the out-of-line route.
>
> -- Chris
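
As a sanity check on the "about 9%" figure, the per-iteration times from the two runs quoted above can be compared directly. This is only a sketch: it assumes the first run is the baseline (its profile still shows Class.isInstance as a stub), drops the first (warm-up) iteration of each run, and averages the rest. The class and method names are made up for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the patch under review.
class RunTimeReduction {
    // Per-iteration times in seconds, copied from the first run quoted above.
    static final double[] BEFORE =
        {17.479, 10.25, 9.732, 9.633, 9.568, 9.499, 9.634, 9.787, 9.739, 9.702};
    // Per-iteration times in seconds, copied from the second run quoted above.
    static final double[] AFTER =
        {15.966, 9.629, 9.135, 8.905, 8.775, 8.851, 8.782, 8.78, 8.794, 8.81};

    // Mean of the times after skipping the first `skip` (warm-up) iterations.
    static double steadyStateMean(double[] times, int skip) {
        return Arrays.stream(times).skip(skip).average().orElse(Double.NaN);
    }

    // Relative reduction of `after` with respect to `before`, in percent.
    static double reductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        double before = steadyStateMean(BEFORE, 1);
        double after = steadyStateMean(AFTER, 1);
        System.out.printf("steady state: %.3f s -> %.3f s (%.1f%% less)%n",
                          before, after, reductionPercent(before, after));
    }
}
```

Depending on which iterations are included, the reduction comes out at roughly 8 to 9 percent, which is consistent with the number given.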
There are several possible solutions:
- Charles should use invokedynamic for the unary operator '!', so there
  is no need to cast to a ThreadContext anymore.
- The cast should not be needed anyway. I have no idea whether it's there
  because the threadContext is not declared as ThreadContext in
  invokedynamic, or because your implementation is not able to see that
  the threadContext object is never altered, so its type is constant in
  the method handle blob. Maybe it's because the blob is fully specified
  in Java (there is no invokeExact anymore?).
- The inlining algorithm should not use the bytecode size but a top-down
  algorithm that (roughly) doesn't count code that throws an exception,
  or the code it depends on, or code that is part of an exception handler
  (if the normal path and the exception handler path are not shared).
- The method metadata (profile) should not be tied to a method, so that
  when a method handle blob is interpreted it can have its own profile.
  Then you will not try to inline the slow-path code together with the
  fast path (it will also help to devirtualize in tiered compilation
  mode).
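
To illustrate the third point: in a cast-like method, most of the bytecode belongs to the failure path. The sketch below is not the JDK source; CastSketch, castInline, castOutlined and failCast are names invented here. Both variants behave the same, but in the second one the exception construction is moved into a separate method, so the hot method contains little more than the check. That is roughly the size an inliner would see if it did not count code that only throws.

```java
// Illustration only; all names in this class are hypothetical.
class CastSketch {
    // Failure path written inline: building the message and throwing
    // both add to the bytecode size of this method.
    static <T> T castInline(Class<T> type, Object obj) {
        if (obj != null && !type.isInstance(obj)) {
            throw new ClassCastException(
                "Cannot cast " + obj.getClass().getName() + " to " + type.getName());
        }
        @SuppressWarnings("unchecked")
        T result = (T) obj;
        return result;
    }

    // Same semantics, but the failure path lives in failCast, so this
    // method is left with the null check, the type check and one call.
    static <T> T castOutlined(Class<T> type, Object obj) {
        if (obj != null && !type.isInstance(obj)) {
            throw failCast(type, obj);
        }
        @SuppressWarnings("unchecked")
        T result = (T) obj;
        return result;
    }

    // Cold path: only reached when the cast fails.
    private static ClassCastException failCast(Class<?> type, Object obj) {
        return new ClassCastException(
            "Cannot cast " + obj.getClass().getName() + " to " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(castOutlined(String.class, "ok"));
        try {
            castOutlined(Integer.class, "not an int");
        } catch (ClassCastException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Outlining by hand like this is only a workaround in the library code; the suggestion above is that the inlining heuristic itself should discount the throwing path.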
Rémi