A loop optimization sometimes results in pessimization

Marko Topolnik marko.topolnik at gmail.com
Tue Feb 18 03:29:46 PST 2014


I have found that the following code:

    for (int i = 0; i < array.length; i++) {
      final int entry = array[i];
      result ^= i + array[i];
      if (entry == 0) break;
    }

executes slower if I comment out the last line

      if (entry == 0) break;

I have ensured that the array has no zero values.

After some analysis, I have isolated the cause to be an additional transformation which is performed in the latter case, when the loop has no premature exit points (subject to the elimination of array bounds check, which does happen in my case). The transformation collates the operations in the unrolled loop, so that array access for all unrolled steps is performed first, followed by all the XORing into the accumulator.

The slowdown occurs because there is a lot of temporary storage needed to remember all the results of memory reads until they are used for XOR-ing. Some of the reads overflow onto the stack.

In detail, I have gathered the following table which correlates execution times with the -XX:LoopUnrollLimit command-line argument:

LoopUnrollLimit:   14 15 18 19 22 23 60
withExitPoint:     96 95 95 79 80 80 69   1/100 ns
withoutExitPoint:  94 64 64 63 64 77 75   1/100 ns

The following sudden changes can be observed:

- on the transition from 14 to 15, the withoutExitPoint variant receives a beneficial transformation and is significantly sped up;

- on 18->19, the withExitPoint variant receives a speedup, lesser than the above;

- on 22->23, the withoutExitPoint variant is slowed down. At this point I see the above described collation starting to happen.

The default loopUnrollLimit for my setup is 60, so I present its results in the last column. The withExitPoint variant is faster, even though it has one more line of code.


Is this a known effect? Would it be worth fixing?


-Marko Topolnik


P.S. This is the JMH code I have used for measurements:

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(Measure.ARRAY_SIZE)
@Warmup(iterations = 20, time = 200, timeUnit=MILLISECONDS)
@Measurement(iterations = 20, time = 200, timeUnit=MILLISECONDS)
@State(Scope.Thread)
@Threads(1)
@Fork(2)
public class Measure
{
  public static final int ARRAY_SIZE = 1024;
  private final int[] array = new int[ARRAY_SIZE];

  @Setup public void setup() {
    final Random random = new Random();
    for (int i = 0; i < ARRAY_SIZE; ++i) {
      final int x = random.nextInt();
      array[i] = x == 0? 1 : x;
    }
  }

  @GenerateMicroBenchmark public int withoutExitPoint() {
    final int[] array = this.array;
    int result = 0;
    for (int i = 0; i < array.length; i++) {
      final int entry = array[i];
      result ^= i + entry;
    }
    return result;
  }

  @GenerateMicroBenchmark public int withExitPoint() {
    final int[] array = this.array;
    int result = 0;
    for (int i = 0; i < array.length; i++) {
      final int entry = array[i];
      result ^= i + array[i];
      if (entry == 0) break;
    }
    return result;
  }
}



More information about the hotspot-compiler-dev mailing list