type-recursive inlining oddity

Mon Sep 10 05:08:16 PDT 2012

Hi there,

We are in the middle of performance analysis for lambdas/bulk operations
coming in JDK8, and stumbled across understandable, but weird oddity in
our inlining policy.

In short, there are cases when we delegate the method call to the same
method of the same actual class. I.e.:

    public static class DelegatingIterator1<T> implements Iterator<T> {
        private final Iterator<T> i;

        public DelegatingIterator1(Iterator<T> i) {
            this.i = i;
        }

        @Override
        public boolean hasNext() {
            return i.hasNext();
        }

        @Override
        public T next() {
            return i.next();
        }

        @Override
        public void remove() {
            i.remove();
        }
    }

This has somewhat surprising effect when you stack the delegates.
Basically, if you have the exact copy of DelegatingIterator1 as
DelegatingIterator2, then these three cases;

  empty = Collections.emptyIterator();
  chain11 = new DelegatingIterator1<>(new DelegatingIterator1<>(empty));
  chain12 = new DelegatingIterator1<>(new DelegatingIterator2<>(empty));

    empty.hasNext(); // repeatedly call this, and measure
  chain11.hasNext(); // repeatedly call this, and measure
  chain12.hasNext(); // repeatedly call this, and measure

...have drastically different performance.

With proper warmup, cpufreq, etc. etc. on my 2.0 Ghz i5-2520M, Linux
x86, JDK 7u4:
     empty: 1.511 +- 0.003 nsec/op
   chain11: 4.055 +- 0.060 nsec/op
   chain12: 2.545 +- 0.135 nsec/op

This difference could easily be explained by this inlining tree:

 GeneratedRecursiveBench::chain11 @ 6
   @ 8   RecursiveBench::chain11  inline (hot)
     @ 4   RecursiveBench$DelegatingIterator1::hasNext   inline (hot)
       @ 4   java.util.Collections$EmptyIterator::hasNext inline (hot)
       @ 4   RecursiveBench$DelegatingIterator1::hasNext  inline (hot)
         @ 4   java.util.Collections$EmptyIterator::hasNext inline (hot)
         @ 4   RecursiveBench$DelegatingIterator1::hasNext recursively
inlining too deep

...as compared with:

 GeneratedRecursiveBench::chain12 @ 6
   @ 8   RecursiveBench::chain12  inline (hot)
     @ 4   RecursiveBench$DelegatingIterator1::hasNext  inline (hot)
       @ 4   RecursiveBench$DelegatingIterator2::hasNext  inline (hot)
         @ 4   java.util.Collections$EmptyIterator::hasNext inline (hot)

What had happened in chain11? We compile the v-call in
DelegatingIterator1.hasNext(); it appears that the receivers in
i.hasNext() are either DelegatingIterator1 or EmptyIterator; so we
devirtualize this bimorphic call, inline both; then, we proceed with
processing v-calls in new inlined methods, and voila, there is the
v-call for i.hasNext() in the same method again, so we devirtualize and
inline further, until MaxRecursiveInlineLevel tells us to stop.
Twiddling MaxRecursiveInlineLevel has no effect on performance (except
for setting in to 0, when chain11 drops even further).

Note that it is not the "true" "control" recursion, when I pass the
control to the same instance method in the same _instance_; it is rather
the "type" recursion when I call the same instance method in the same
_class_.

All this actually tells me, Hotspot does not account the position in the
call tree when recording profiles. Should it do that, we could actually
differentiate the v-calls made in first instance vs. v-calls made in
second instance. Is there a known/existing explanation whether it is
feasible to do? Or, is there another solution for this problem, except
for splitting the types?

-Aleksey.