Performance of instanceof with interfaces is multiple times slower than with classes
Christoph Dreis
christoph.dreis at freenet.de
Fri Aug 7 19:41:27 UTC 2020
Hi John,
Thanks for your elaborate answer. That helps a lot indeed and explains some dead ends that I, as a non-HotSpot engineer, could not have solved without some hints.
I will dig a bit deeper into this now and also follow up on the "big discussion" in the past.
Thank you for taking the time - I really appreciate it.
Cheers,
Christoph
From: John Rose <john.r.rose at oracle.com>
Date: Friday, 7 August 2020 at 21:27
To: Christoph Dreis <christoph.dreis at freenet.de>
Cc: hotspot-runtime-dev <hotspot-runtime-dev at openjdk.java.net>
Subject: Re: Performance of instanceof with interfaces is multiple times slower than with classes
On Aug 6, 2020, at 6:56 AM, Christoph Dreis <christoph.dreis at freenet.de> wrote:
I have in fact looked at the code already before the first mail and found emit_typecheck_helper in src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp, which I think is involved - please correct me if I'm wrong.
And MacroAssembler::check_klass_subtype_fast_path in src/hotspot/cpu/x86/macroAssembler_x86.cpp etc. Depending on the architecture (x86, aarch etc.) of course.
Like you said, there are many things involved, and I was really hoping for a good starting point from an experienced developer, or a rough explanation of what might be involved in the example. I understand now that there doesn't seem to be a "simple" explanation. Unfortunately, the C++ side of things is not as well documented as the Java side, and that sometimes makes it relatively complicated for people like me, who aren't familiar with the code, to follow the flows. I can assure you: I do want to look at code, hence the (admittedly clumsy) question.
I’ll explain how I would handle it, assuming I didn’t know the
code in question already. (Counterfactual, since I wrote a bunch
of it. But I want to explain how to do the exploration, which would
be a better use of both our time, than for me to run a class over email
on HotSpot type checking.)
When exploring HotSpot code, you often have to follow dependencies
many layers deep. You started in a good place at emit_typecheck_helper,
perhaps after tracing callees from where c1 processes Bytecodes::_checkcast
or some similar bytecode. You then observed that there is a call to
MacroAssembler::check_klass_subtype_fast_path, and hopefully also
that there’s a “slow path” mentioned in emit_typecheck_helper.
In MacroAssembler, if you nosed around a little, you might have noticed
that the function following check_klass_subtype_fast_path is called
check_klass_subtype_slow_path, or perhaps you find the latter function
by tracing through callees.
You could then verify your understanding about the roles of these two
functions by grepping for them in the whole source base, and note that
they are used in a few crucial places.
$ grep -nH -e 'check_klass_subtype_[slowfast]*_path' $(find src/hotspot/{cpu/x86,share} -type f)
The next step is to ask, “who do the macro assembler functions call?”
You might think that’s a dead end, but in fact those functions make
use of some symbolic offsets. This is something HotSpot engineers
learn to look for and exploit. See these two lines:
int sc_offset = in_bytes(Klass::secondary_super_cache_offset());
int sco_offset = in_bytes(Klass::super_check_offset_offset());
And later on in the slow path:
int ss_offset = in_bytes(Klass::secondary_supers_offset());
int sc_offset = in_bytes(Klass::secondary_super_cache_offset());
They are static calls into klass.cpp which produce offsets in
the Klass metadata structure. That is the next link to explore
in the callee chain.
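
To make the role of those offsets concrete, here is a heavily simplified sketch of a subtype check driven by them. It is not HotSpot code: ToyKlass, its layout, the display depth of 8, and every helper name are assumptions made for illustration, and only the field names echo the offset functions quoted above. The idea shown is that the candidate supertype supplies a byte offset to probe in the subtype's metadata; the probe lands either in a fixed-size display of class ancestors or in a one-element cache that sits in front of a linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the Klass metadata; not the real layout.
struct ToyKlass {
    static constexpr std::size_t kDisplayDepth = 8;  // assumed display size
    std::size_t super_check_offset;            // byte offset to probe in a subtype
    ToyKlass*   secondary_super_cache;         // one-element look-aside cache
    ToyKlass*   primary_supers[kDisplayDepth]; // class ancestors, self included
    ToyKlass**  secondary_supers;              // e.g. implemented interfaces
    std::size_t secondary_count;
};

// Symbolic offsets, in the spirit of the sc_offset lines quoted above.
constexpr std::size_t kSecondaryCacheOffset =
    offsetof(ToyKlass, secondary_super_cache);

inline std::size_t display_offset(std::size_t depth) {
    assert(depth < ToyKlass::kDisplayDepth);
    return offsetof(ToyKlass, primary_supers) + depth * sizeof(ToyKlass*);
}

// Load the ToyKlass* stored `offset` bytes into `k`.
inline ToyKlass* load_klass_at(const ToyKlass* k, std::size_t offset) {
    ToyKlass* value;
    std::memcpy(&value, reinterpret_cast<const char*>(k) + offset, sizeof value);
    return value;
}

// Slow path: O(N) scan of the secondary supers; a hit refills the cache.
inline bool subtype_slow_path(ToyKlass* sub, ToyKlass* super) {
    for (std::size_t i = 0; i < sub->secondary_count; ++i) {
        if (sub->secondary_supers[i] == super) {
            sub->secondary_super_cache = super;
            return true;
        }
    }
    return false;
}

// Fast path: one load and compare, with the offset chosen by the supertype.
inline bool is_subtype_of(ToyKlass* sub, ToyKlass* super) {
    const std::size_t offset = super->super_check_offset;
    if (load_klass_at(sub, offset) == super) return true;  // display or cache hit
    if (offset != kSecondaryCacheOffset) return false;     // display miss is final
    if (sub == super) return true;                         // a type is its own subtype
    return subtype_slow_path(sub, super);
}

// Set up a class at `depth` below `parent` (nullptr for the root class).
inline void init_class(ToyKlass& k, const ToyKlass* parent, std::size_t depth) {
    k = ToyKlass{};
    if (parent != nullptr) {
        for (std::size_t i = 0; i < depth; ++i) {
            k.primary_supers[i] = parent->primary_supers[i];
        }
    }
    k.primary_supers[depth] = &k;
    k.super_check_offset = display_offset(depth);
}

// Set up an interface: checks against it always probe the cache slot.
// (Simplification: the toy interface has no class ancestors of its own.)
inline void init_interface(ToyKlass& k) {
    k = ToyKlass{};
    k.super_check_offset = kSecondaryCacheOffset;
}
```

In this toy model a check against a class is always settled by a single load, while a check against an interface can fall through to the scan whenever the one cache slot holds something else. That difference is at least consistent with the interface-versus-class gap in the subject line, though the model makes no claim about actual timings.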
Next stop, in klass.hpp there is a very brief comment explaining
the data structure, plus this little gem:
// … See big discussion
// in doc/server_compiler/checktype.txt
There are two ways forward from this point. First, look for
the places where the relevant fields of Klass are initialized,
and you will see algorithms for filling them with the correct
data.
Second, find out where that checktype.txt went. Here’s how
I did it; your mileage may vary. First “git ls-files '**/*checktype*'”
comes up empty. So then google “hotspot checktype.txt”
and see a bunch of WIFI mismatches. The winning google
query is “java hotspot checktype.txt”. First hit points to
some public discourse a couple years ago from people asking
questions similar to the one you are asking.
The necessary clues really are in the code base, usually.
It seems they were in this case.
As has been discussed at other times (probably on this alias),
the algorithm we use is good enough for many but not all
use cases, and has known weaknesses. One or two complicated
things could be done to patch the weaknesses, involving
a better look-aside cache (less thread contention and/or
more cache slots) in front of the O(N) slow path, or a less
than O(N) slow path, involving complicated tables. There’s
literature out there (which I don’t have handy; it’s old stuff)
about perfect hashing and other coloring schemes that
reach a slow and complex but honorable O(1) speed. I
will guess that an O(log N) binary search through the supers
array, maybe using metadata address order, would be
simpler and just as good, in practice.
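
As a rough illustration of that last guess, the sketch below keeps a supers array sorted by metadata address and looks a candidate up with a binary search. It is only a sketch under stated assumptions: ToyMetadata, SupersArray, and both function names are hypothetical, the array is a plain std::vector rather than anything HotSpot uses, and it assumes metadata addresses stay stable once the array is sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical placeholder for a piece of class metadata.
struct ToyMetadata {
    const char* name;
};

using SupersArray = std::vector<const ToyMetadata*>;

// std::less gives a total order over pointers, even to unrelated objects.
using ByAddress = std::less<const ToyMetadata*>;

// One-time preparation: order the supers by metadata address.
inline void sort_supers_by_address(SupersArray& supers) {
    std::sort(supers.begin(), supers.end(), ByAddress());
}

// O(log N) membership test; the array must already be sorted by address.
inline bool has_secondary_super(const SupersArray& sorted_supers,
                                const ToyMetadata* target) {
    assert(std::is_sorted(sorted_supers.begin(), sorted_supers.end(), ByAddress()));
    return std::binary_search(sorted_supers.begin(), sorted_supers.end(),
                              target, ByAddress());
}
```

The lookup is self-contained and needs no cache in front of it, which fits the maintainability constraint described next. Whether it would beat a cached linear scan in practice is exactly the open research question.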
It’s a matter for further research. One key constraint that
some people don’t see at first: The resulting algorithm has
to be maintainable over the long haul. It can’t complicate
unrelated paths in the JVM, so it can’t be overly “heroic”
in order to maintain some complicated perfect coloring
scheme. It has to be self-contained, and easy to work with
in isolation.
I hope this helps.
— John