RFR (XS): Bump the inlining limits for concurrent mark

Tue Jan 24 15:01:12 UTC 2017

Hi,

For the last few days, we have been struggling with GCC inlining in the concurrent
mark code. We are very close to the default GCC inlining budget, and every recent
patch has had to rearrange code in some way to stay under it. The problem is
compounded by the many templated closures that must be inlined to get decent
performance.

This repeated balancing act makes already hard performance work even harder.
For example, I wasted almost an entire day yesterday trying to find a method
split that made GCC happy, and even that was not quite enough.

With that, I would like us to surrender, bow before the compiler, and
<strike>burn it to ashes</strike> bump the GCC inlining limits for a single file:
  http://cr.openjdk.java.net/~shade/shenandoah/concmark-bump-inline/webrev.01/

This is not unprecedented in the HotSpot codebase: the same build file already
has a similar line for psPromotionManager.cpp.

The effect is clearly visible in the profiled disassembly. Here are sample
improvements in concurrent mark times for the model tests (total time,
(a)verage and (n)umber of mark phases):

*) 20M HashMap marking:

  before: 133.05 s (a =  1243493 us) (n =   107)
            (lvls, us =   568359,  1210938,  1230469,  1269531,  1390970)

  after:  117.95 s (a =  1082074 us) (n =   109)
            (lvls, us =   921875,  1054688,  1074219,  1093750,  1155972)

*) 20M Tree marking:

  before:  82.91 s (a =   637769 us) (n =   130)
            (lvls, us =   587891,   615234,   626953,   632812,   726433)

  after:   59.86 s (a =   436915 us) (n =   137)
            (lvls, us =   296875,   425781,   431641,   437500,   482738)

*) 20M Array marking:

  before:  22.06 s (a =   176497 us) (n =   125)
            (lvls, us =   169922,   171875,   173828,   177734,   188691)

  after:   16.47 s (a =   129720 us) (n =   127)
            (lvls, us =   123047,   125000,   126953,   132812,   149198)

Static footprint increased slightly, by about 126K (0.6%):
  before:  20.634.880  libjvm.so
  after:   20.761.304  libjvm.so

Testing: hotspot_gc_shenandoah, targeted benchmarks

Thanks,
-Aleksey