RFR (XS): Optimize branch frequency of G1's write post-barrier in C2

Thomas Schatzl thomas.schatzl at oracle.com
Sat Aug 3 19:27:07 UTC 2019


Ping to the compiler team: could someone have a quick look?

Thanks,
   Thomas

On 11.07.19 16:35, Man Cao wrote:
> Thanks Thomas for the review and running experiments!
> 
>  > - can you share the code changes to generate the statistics? It would
>  > be nice to confirm these on a few more applications and play around
>  > with them a bit :)
>  > I would like to confirm some very old numbers we have for other older
>  > benchmarks that this is indeed the best probability distribution.
>  > In particular, I do not understand why, given these numbers, we did
>  > not change the probabilities as you suggested :( There were other
>  > changes, mostly related to barrier elision, in that time frame, but it
>  > seems likelihood changes were not attempted.
> 
> It is here: http://cr.openjdk.java.net/~manc/8225776/branch_profiling/
> I also added a comment in 
> https://bugs.openjdk.java.net/browse/JDK-8225776 to clarify the methodology.
> 
>  > - these numbers (and yours) also indicate that the not-young check is
>  > very likely to be not taken (i.e. you jump over the storeload). Did you
>  > also perform some experiments changing the order a bit?
>  > It might be detrimental in this particular case, where the StoreLoad is
>  > expensive and the xor/non-null checks filter out at least some more of
>  > those, but maybe
>  > if (young) -> exit
>  > if (different-region) -> exit
>  > if (non-null) -> exit
>  > StoreLoad
>  > ...
>  > may be better to do? I am aware that the "young" check adds a load,
>  > which is also expensive (but not as much as the StoreLoad), but it
>  > seems to be an interesting case to look at.
>  >
>  > In our old results (as far as I can interpret them) it did not seem to
>  > have any advantage/disadvantage, so I am just curious whether you did
>  > such tests and their conclusion.
> 
> Yes, I did this experiment. The load from card table on the fast path 
> turns out to be expensive for several benchmarks:
> https://cr.openjdk.java.net/~manc/8225776/20190516-jdk11G1WriteBarrier-dacapoDefault4G-YoungCheckFirst.html
> For this experiment, I set a 4G heap with -XX:NewRatio=1, so most
> writes go to young objects, and GC happens very infrequently.
> The implementation had a bug that caused some benchmarks to crash while
> running. I didn't look into fixing the bug, as this direction does not
> seem worthwhile.
> 
>  > - internal (quick) perf testing showed no overall score changes, except
>  > that maxJOPS on SpecJBB2015 seemed to improve by ~1.2% (only had time
>  > for very few experiments at this time, will rerun, so there is some
>  > chance that this has been a fluke) which is definitely nice.
> 
> Good to hear that!
> -Man
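
For readers following the thread, the two filter orderings discussed in the quoted text can be sketched as a small Java model. This is only an illustration under stated assumptions, not the HotSpot implementation: the class name, the card values, the 1 MiB region size, and the sample writes are all hypothetical. It assumes the filters exit when the write stays within one region and when the new value is null, and it counts only how often the card load and the StoreLoad sit on the executed path. It says nothing about actual cost.

```java
// Hypothetical model of the two orderings; names, constants, and sample data
// are illustrative assumptions and are not taken from HotSpot.
final class BarrierOrderSketch {
    enum Outcome { SAME_REGION, NULL_VALUE, YOUNG_CARD, ALREADY_DIRTY, ENQUEUED }

    static final int LOG_REGION_BYTES = 20; // assumed 1 MiB regions
    static final byte YOUNG = 2;            // placeholder card values
    static final byte DIRTY = 0;
    static final byte CLEAN = -1;

    static final class Stats {
        int cardLoads;  // card loads executed for the young check
        int storeLoads; // StoreLoad fences executed
    }

    // The "xor" filter: true when field and new value lie in the same region.
    static boolean sameRegion(long fieldAddr, long newValAddr) {
        return ((fieldAddr ^ newValAddr) >>> LOG_REGION_BYTES) == 0;
    }

    // Ordering A: xor and null filters run before the card is loaded.
    static Outcome xorFirst(long fieldAddr, long newValAddr, byte card, Stats s) {
        if (sameRegion(fieldAddr, newValAddr)) return Outcome.SAME_REGION;
        if (newValAddr == 0) return Outcome.NULL_VALUE;
        s.cardLoads++;
        if (card == YOUNG) return Outcome.YOUNG_CARD;
        return afterFilters(card, s);
    }

    // Ordering B: the young check (and its card load) comes first.
    static Outcome youngFirst(long fieldAddr, long newValAddr, byte card, Stats s) {
        s.cardLoads++;
        if (card == YOUNG) return Outcome.YOUNG_CARD;
        if (sameRegion(fieldAddr, newValAddr)) return Outcome.SAME_REGION;
        if (newValAddr == 0) return Outcome.NULL_VALUE;
        return afterFilters(card, s);
    }

    // Common tail: fence, then skip the enqueue if the card is already dirty.
    static Outcome afterFilters(byte card, Stats s) {
        s.storeLoads++;
        return card == DIRTY ? Outcome.ALREADY_DIRTY : Outcome.ENQUEUED;
    }

    // Synthetic writes: {fieldAddr, newValAddr (0 = null), card value}.
    static final long[][] SAMPLE = {
        {0x100000L, 0x100040L, YOUNG}, // same region, young card
        {0x100000L, 0L,        CLEAN}, // null store
        {0x100000L, 0x300000L, CLEAN}, // cross-region, clean card
        {0x100000L, 0x300000L, YOUNG}, // cross-region, young card
    };

    // Replays SAMPLE through one ordering and returns the path counts.
    static Stats replay(boolean youngCheckFirst) {
        Stats s = new Stats();
        for (long[] w : SAMPLE) {
            byte card = (byte) w[2];
            if (youngCheckFirst) youngFirst(w[0], w[1], card, s);
            else xorFirst(w[0], w[1], card, s);
        }
        return s;
    }
}
```

On this synthetic sample, `replay(false)` executes 2 card loads and `replay(true)` executes 4, with 1 StoreLoad in each case. That is the shape of the trade-off raised above: moving the young check first puts a card load on every write, including the ones the xor and null filters would have rejected without touching memory.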



More information about the hotspot-compiler-dev mailing list