RFR (XS): Optimize branch frequency of G1's write post-barrier in C2

Thu Jul 11 23:35:26 UTC 2019

Thanks Thomas for the review and running experiments!

> - can you share the code changes to generate the statistics? It would
> be nice to confirm these on a few more applications and play around
> with them a bit :)
> I would like to confirm some very old numbers we have for other older
> benchmarks that this is indeed the best probability distribution.
> Particularly I do not understand that from these numbers we did not
> change the probabilities as you suggested :( There were other changes
> mostly related to barrier elision in that time frame, but it seems
> likelihood changes were not attempted.

It is here: http://cr.openjdk.java.net/~manc/8225776/branch_profiling/
I also added a comment in https://bugs.openjdk.java.net/browse/JDK-8225776
to clarify the methodology.

> - these numbers (and yours) also indicate that the not-young check is
> very likely to be not taken (i.e. you jump over the storeload). Did you
> also perform some experiments changing the order a bit?
> It might be detrimental for this particular case where the StoreLoad is
> expensive, and the xor/non-null filter out at least some additional of
> those, but maybe
> if (young) -> exit
> if (different-region) -> exit
> if (non-null) -> exit
> StoreLoad
> ...
> may be better to do? I am aware that the "young" check adds a load,
> which is also expensive (but not as much as the StoreLoad), but it
> seems to be an interesting case to look at.
>
> In our old results (as far as I can interpret them) it did not seem to
> have any advantage/disadvantage, so I am just curious whether you did
> such tests and their conclusion.

Yes, I did this experiment. The load from the card table on the fast path
turns out to be expensive for several benchmarks:
https://cr.openjdk.java.net/~manc/8225776/20190516-jdk11G1WriteBarrier-dacapoDefault4G-YoungCheckFirst.html
For this experiment, I used a 4G heap with -XX:NewRatio=1, so most
writes go to young objects and GC happens very infrequently.
The implementation had a bug that caused some benchmarks to crash while
running. I didn't look into fixing it, as this direction does not seem
worthwhile.
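
To illustrate why moving the young check first puts a card table load on
the fast path, here is a toy Java model of the two filter orderings. It is
only a sketch, not HotSpot code and not the patch used in the experiment:
the class name, the 1 MB region size, the 512-byte card size and the card
values are all assumptions made up for this example. It simply counts
card table loads and StoreLoad fences per store.

```java
// Toy model only. Names, constants and card values are illustrative
// assumptions, not taken from HotSpot or from the experiment's patch.
final class BarrierOrderSketch {
    static final int LOG_REGION_BYTES = 20;  // assumed 1 MB regions
    static final int CARD_SHIFT = 9;         // assumed 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1, YOUNG = 2;  // made-up values

    final byte[] cards;   // simulated card table, all CLEAN initially
    int cardLoads;        // number of loads from the card table
    int storeLoads;       // number of StoreLoad fences "executed"

    BarrierOrderSketch(int heapBytes) {
        cards = new byte[heapBytes >>> CARD_SHIFT];
    }

    // Mark every card covering [from, to) as belonging to a young region.
    void markYoung(long from, long to) {
        for (long a = from; a < to; a += 1L << CARD_SHIFT) {
            cards[(int) (a >>> CARD_SHIFT)] = YOUNG;
        }
    }

    // Ordering with the cross-region (xor) and null filters first.
    void xorFirst(long field, long newVal) {
        if (((field ^ newVal) >>> LOG_REGION_BYTES) == 0) return; // same region
        if (newVal == 0) return;                                  // null store
        int idx = (int) (field >>> CARD_SHIFT);
        cardLoads++;
        if (cards[idx] == YOUNG) return;
        dirtyCard(idx);
    }

    // Ordering with the young check first: always loads the card.
    void youngFirst(long field, long newVal) {
        int idx = (int) (field >>> CARD_SHIFT);
        cardLoads++;
        if (cards[idx] == YOUNG) return;
        if (((field ^ newVal) >>> LOG_REGION_BYTES) == 0) return; // same region
        if (newVal == 0) return;                                  // null store
        dirtyCard(idx);
    }

    // Common tail: fence, re-read the card, dirty it if needed.
    // (Enqueueing the card for refinement is omitted in this model.)
    private void dirtyCard(int idx) {
        storeLoads++;
        cardLoads++;
        if (cards[idx] == DIRTY) return;
        cards[idx] = DIRTY;
    }
}
```

In this model a same-region store costs no card table load with the xor
filter first, but one load with the young check first. That load is the
extra fast-path cost referred to above; how much it matters in practice
depends on the benchmark.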

> - internal (quick) perf testing showed no overall score changes, except
> that maxJOPS on SpecJBB2015 seemed to improve by ~1.2% (only had time
> for very few experiments at this time, will rerun, so there is some
> chance that this has been a fluke) which is definitely nice.

Good to hear that!

-Man