Performance of instanceof with interfaces is multiple times slower than with classes

Ioi Lam ioi.lam at oracle.com
Fri Aug 7 21:13:48 UTC 2020


One thing that I found very useful in learning the C1 compiler is 
something like this:

public class InterfaceTest {
   public static void main(String args[]) {
     test(new Runnable() { public void run() {} });
   }
   static void test (Runnable r) {
     r.run();
   }
}


Build a slow-debug version of the JVM, and then:


$ gdb --args myjdk/bin/java -cp ~/tmp -Xcomp -XX:TieredStopAtLevel=1 \
     -XX:CompileCommand=print,InterfaceTest::test \
     -XX:CompileCommand=compileonly,InterfaceTest::test InterfaceTest

(gdb) b nmethod::print_nmethod
(gdb) r
Thread 13 "C1 CompilerThre" hit Breakpoint 4, nmethod::print_nmethod (....)
(gdb) finish



This will print out the content of the compiled method, like:


[Verified Entry Point]
   # {method} {0x00007fffa17fc3a0} 'test' '(Ljava/lang/Runnable;)V' in 'InterfaceTest'
   # parm0:    rsi:rsi   = 'java/lang/Runnable'
   #           [sp+0x40]  (sp of caller)
  ;;  block B1 [0, 0]
   0x00007fffd8cf61c0:   mov    %eax,-0x16000(%rsp)
   0x00007fffd8cf61c7:   push   %rbp
   0x00007fffd8cf61c8:   sub    $0x30,%rsp ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}



Now you can set a breakpoint there and single-step through the
instructions to see what's happening.


(gdb) b *0x00007fffd8cf61c0
Breakpoint 5 at 0x7fffd8cf61c0
(gdb) c
Continuing.
[Switching to Thread 0x7ffff7fb6700 (LWP 24183)]

Thread 2 "java" hit Breakpoint 5, 0x00007fffd8cf61c0 in ?? ()
(gdb) display/i $pc
1: x/i $pc
=> 0x7fffd8cf61c0:    mov    %eax,-0x16000(%rsp)
(gdb) si
0x00007fffd8cf61c7 in ?? ()
1: x/i $pc
=> 0x7fffd8cf61c7:    push   %rbp
(gdb) si
0x00007fffd8cf61c8 in ?? ()
1: x/i $pc
=> 0x7fffd8cf61c8:    sub    $0x30,%rsp
(gdb) si
0x00007fffd8cf61cc in ?? ()
1: x/i $pc
=> 0x7fffd8cf61cc:    nop


You can also set breakpoints at places like emit_typecheck_helper.
Because only a very small number of bytecodes are compiled now, you
will have fewer distractions and can focus on what happens with your
test case.
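
The InterfaceTest class above only performs an interface call, so it may
never reach the type-check code. A minimal variant, offered as a sketch,
puts an instanceof against an interface into the compiled method. The
names InstanceofTest, Marker and Impl are made up for illustration, and
whether a breakpoint at emit_typecheck_helper is actually hit depends on
your build and flags:

```java
// Hypothetical test class: all names here are illustrative only.
final class InstanceofTest {
    // Interface used as the instanceof target.
    interface Marker {}

    // A class that implements the interface.
    static final class Impl implements Marker, Runnable {
        public void run() {}
    }

    // The only method we want C1 to compile and print.
    static boolean test(Object o) {
        return o instanceof Marker;
    }

    public static void main(String[] args) {
        // Create Impl first so that Marker is already loaded when test() is compiled.
        Object yes = new Impl();
        Object no = "not a Marker";
        System.out.println(test(yes));
        System.out.println(test(no));
    }
}
```

Run it with the same command line as before, substituting
InstanceofTest::test in both CompileCommand options and InstanceofTest
as the main class.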


It won't be easy but will definitely be fun!

(And the C2 compiler is a completely different beast .....)

- Ioi



On 8/7/20 12:41 PM, Christoph Dreis wrote:
> Hi John,
>
> thanks for your elaborate answer. That helps a lot indeed and explains some dead ends that I, as a non-HotSpot engineer, could not have solved without some hints.
>
> I will dig a bit deeper into this now and also follow up on the "big discussion" in the past.
>
> Thank you for taking the time - I really appreciate it.
>
> Cheers,
> Christoph
>
> From: John Rose <john.r.rose at oracle.com>
> Date: Friday, 7 August 2020 at 21:27
> To: Christoph Dreis <christoph.dreis at freenet.de>
> Cc: hotspot-runtime-dev <hotspot-runtime-dev at openjdk.java.net>
> Subject: Re: Performance of instanceof with interfaces is multiple times slower than with classes
>
> On Aug 6, 2020, at 6:56 AM, Christoph Dreis <christoph.dreis at freenet.de> wrote:
>
> I have in fact looked at the code already before the first mail and found emit_typecheck_helper in src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp, which I think is involved - please correct me if I'm wrong.
> I also found MacroAssembler::check_klass_subtype_fast_path in src/hotspot/cpu/x86/macroAssembler_x86.cpp, and so on, depending on the architecture (x86, aarch64, etc.) of course.
>
> Like you said, there are many things involved, and I was really hoping for a good starting point from an experienced developer, or a rough explanation of what might be involved in the example. I understand now that there doesn't seem to be a "simple" explanation. Unfortunately, the C++ side of things is not as well documented as the Java side, and that makes it relatively complicated for people like me, who aren't familiar with the code, to follow the flows sometimes. I can assure you: I do want to look at code, hence the (admittedly clumsy) question.
>
> I’ll explain how I would handle it, assuming I didn’t know the
> code in question already.  (Counterfactual, since I wrote a bunch
> of it.  But I want to explain how to do the exploration, which would
> be a better use of both our time, than for me to run a class over email
> on HotSpot type checking.)
>
> When exploring HotSpot code, you often have to follow dependencies
> many layers deep.  You started in a good place at emit_typecheck_helper,
> perhaps after tracing callees from where c1 processes Bytecodes::_checkcast
> or some similar bytecode.  You then observed that there is a call to
> MacroAssembler::check_klass_subtype_fast_path, and hopefully also
> that there’s a “slow path” mentioned in emit_typecheck_helper.
>
> In MacroAssembler, if you nosed around a little, you might have noticed
> that the function following  check_klass_subtype_fast_path is called
> check_klass_subtype_slow_path, or perhaps you find the latter function
> by tracing through callees.
>
> You could then verify your understanding about the roles of these two
> functions by grepping for them in the whole source base, and note that
> they are used in a few crucial places.
>
> $ grep  -nH -e  check_klass_subtype_[slowfast]*_path $(find src/hotspot/{cpu/x86,share} -type f)
>
> The next step is to ask, “what do the macro assembler functions call?”
> You might think that’s a dead end, but in fact those functions make
> use of some symbolic offsets.  This is something HotSpot engineers
> learn to look for and exploit.  See these two lines:
>
>    int sc_offset = in_bytes(Klass::secondary_super_cache_offset());
>    int sco_offset = in_bytes(Klass::super_check_offset_offset());
>
> And later on in the slow path:
>
>    int ss_offset = in_bytes(Klass::secondary_supers_offset());
>    int sc_offset = in_bytes(Klass::secondary_super_cache_offset());
>
> They are static calls into klass.cpp which produce offsets in
> the Klass metadata structure.  That is the next link to explore
> in the callee chain.
>
> Next stop, in klass.hpp there is a very brief comment explaining
> the data structure, plus this little gem:
>
>    // … See big discussion
>    // in doc/server_compiler/checktype.txt
>
> There are two ways forward from this point.  First, look for
> the places where the relevant fields of Klass are initialized,
> and you will see algorithms for filling them with the correct
> data.
>
> Second, find out where that checktype.txt went.  Here’s how
> I did it; your mileage may vary.  First “git ls-files '**/*checktype*'”
> comes up empty.  So then google “hotspot checktype.txt”
> and see a bunch of Wi-Fi mismatches.  The winning google
> query is “java hotspot checktype.txt”.  First hit points to
> some public discourse a couple years ago from people asking
> questions similar to the one you are asking.
>
> The necessary clues really are in the code base, usually.
> It seems they were in this case.
>
> As has been discussed at other times (probably on this alias),
> the algorithm we use is good enough for many but not all
> use cases, and has known weaknesses.  One or two complicated
> things could be done to patch the weaknesses, involving
> a better look-aside cache (less thread contention and/or
> more cache slots) in front of the O(N) slow path, or a less
> than O(N) slow path, involving complicated tables.  There’s
> literature out there (which I don’t have handy; it’s old stuff)
> about perfect hashing and other coloring schemes that
> reach a slow and complex but honorable O(1) speed.  I
> will guess that an O(log N) binary search through the supers
> array, maybe using metadata address order, would be
> simpler and just as good, in practice.
>
> It’s a matter for further research.  One key constraint that
> some people don’t see at first:  The resulting algorithm has
> to be maintainable over the long haul.  It can’t complicate
> unrelated paths in the JVM, so it can’t be overly “heroic”
> in order to maintain some complicated perfect coloring
> scheme.  It has to be self-contained, and easy to work with
> in isolation.
>
> I hope this helps.
>
> — John
>
>
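
The fast path, the one-element cache and the O(N) slow path that John
describes in the quoted message can be modelled roughly as below. This
is a simplified Java sketch and not HotSpot's code: the field names only
echo the Klass fields mentioned above, DISPLAY_SIZE = 8 is an assumed
limit, deep class hierarchies are not handled, and slowPathScans is a
counter added purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the subtype check; all names are illustrative.
final class SubtypeCheckModel {
    static final int DISPLAY_SIZE = 8;   // assumed size of the primary-supers display
    static int slowPathScans = 0;        // illustration only: counts linear scans

    static final class KlassModel {
        final String name;
        final int depth;                 // slot in the display, or -1 for an interface
        final KlassModel[] primarySupers = new KlassModel[DISPLAY_SIZE];
        final List<KlassModel> secondarySupers = new ArrayList<>();
        KlassModel secondarySuperCache;  // one-element look-aside cache

        KlassModel(String name, boolean isInterface,
                   KlassModel superClass, KlassModel... interfaces) {
            this.name = name;
            if (superClass != null) {
                // Inherit the superclass display and its secondary supers.
                System.arraycopy(superClass.primarySupers, 0,
                                 primarySupers, 0, DISPLAY_SIZE);
                secondarySupers.addAll(superClass.secondarySupers);
            }
            if (isInterface) {
                depth = -1;              // interfaces are always "secondary" here
            } else {
                depth = (superClass == null) ? 0 : superClass.depth + 1;
                if (depth >= DISPLAY_SIZE) {
                    throw new IllegalArgumentException("hierarchy too deep for this model");
                }
                primarySupers[depth] = this;
            }
            for (KlassModel i : interfaces) {
                addSecondary(i);
                for (KlassModel s : i.secondarySupers) {
                    addSecondary(s);     // superinterfaces, transitively
                }
            }
        }

        private void addSecondary(KlassModel k) {
            if (!secondarySupers.contains(k)) {
                secondarySupers.add(k);
            }
        }

        boolean isSubtypeOf(KlassModel target) {
            if (target.depth >= 0) {
                // Fast path: one load from the display and one compare.
                return primarySupers[target.depth] == target;
            }
            if (this == target || secondarySuperCache == target) {
                return true;             // self check or cache hit
            }
            slowPathScans++;             // slow path: linear scan, O(N)
            for (KlassModel k : secondarySupers) {
                if (k == target) {
                    secondarySuperCache = target;  // remember the last hit
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        KlassModel object = new KlassModel("Object", false, null);
        KlassModel runnable = new KlassModel("Runnable", true, object);
        KlassModel comparable = new KlassModel("Comparable", true, object);
        KlassModel impl = new KlassModel("Impl", false, object, runnable, comparable);

        // Alternating between two interfaces defeats the one-element cache.
        for (int i = 0; i < 4; i++) {
            impl.isSubtypeOf(runnable);
            impl.isSubtypeOf(comparable);
        }
        System.out.println("slow-path scans: " + slowPathScans);
    }
}
```

In this model a check against a class never leaves the fast path, while
a check against an interface is cheap only when it repeats the previous
successful check. That matches the weakness described above, where a
larger look-aside cache or a sub-linear search would help.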


