RFR (XS): Optimize branch frequency of G1's write post-barrier in C2
Man Cao
manc at google.com
Mon Aug 5 20:15:58 UTC 2019
Thanks for the reviews!
-Man
On Mon, Aug 5, 2019 at 1:04 PM <dean.long at oracle.com> wrote:
> Looks OK to me
>
> dl
>
> On 8/3/19 12:27 PM, Thomas Schatzl wrote:
> > ping at compiler team to have a quick look.
> >
> > Thanks,
> > Thomas
> >
> > On 11.07.19 16:35, Man Cao wrote:
> >> Thanks Thomas for the review and running experiments!
> >>
> >> > - can you share the code changes used to generate the statistics? It
> >> > would be nice to confirm these on a few more applications and play
> >> > around with them a bit :)
> >> > I would like to check some very old numbers we have for other, older
> >> > benchmarks to confirm that this is indeed the best probability
> >> > distribution. In particular, I do not understand why, given these
> >> > numbers, we did not change the probabilities as you suggested :(
> >> > There were other changes, mostly related to barrier elision, in that
> >> > time frame, but it seems likelihood changes were not attempted.
> >>
> >> The profiling code is here: http://cr.openjdk.java.net/~manc/8225776/branch_profiling/
> >> I also added a comment in
> >> https://bugs.openjdk.java.net/browse/JDK-8225776 to clarify the
> >> methodology.
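The profiling code itself lives at the link above. As a hedged illustration of how per-filter counts turn into the branch probabilities that a compiler hint would encode, the sketch below assumes that each profiled store is summarized by four booleans, one per filter, in the order the current C2 post-barrier is assumed to check them: cross-region (xor), non-null, not-young card, and, after the StoreLoad, not-dirty card. The `StoreSample` type, its field names, and `pass_probabilities` are hypothetical and not HotSpot code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreSample:
    """One profiled post-barrier execution (hypothetical record, not HotSpot code)."""
    cross_region: bool    # (field address ^ new value) >> region shift != 0
    non_null: bool        # the stored value is not null
    card_not_young: bool  # the field's card is not the young card
    card_not_dirty: bool  # card re-read after the StoreLoad is not yet dirty


# Filters in the order a store encounters them (assumed C2 order).
FILTERS = ("cross_region", "non_null", "card_not_young", "card_not_dirty")


def pass_probabilities(samples):
    """Return, per filter, P(store continues | store reached this filter)."""
    remaining = list(samples)
    probs = {}
    for name in FILTERS:
        passed = [s for s in remaining if getattr(s, name)]
        # Guard against a filter that no profiled store ever reached.
        probs[name] = len(passed) / len(remaining) if remaining else 0.0
        remaining = passed
    return probs


samples = [
    StoreSample(False, True, True, True),  # same-region store: exits at the xor check
    StoreSample(True, False, True, True),  # null store: exits at the null check
    StoreSample(True, True, False, True),  # young card: exits before the StoreLoad
    StoreSample(True, True, True, False),  # card already dirty after the StoreLoad
    StoreSample(True, True, True, True),   # card gets dirtied and enqueued
]
print(pass_probabilities(samples))
```

A filter whose pass probability sits close to 0 or 1 across many applications is the kind of evidence that would justify marking the corresponding branch as unlikely or likely in the compiled barrier.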
> >>
> >> > - these numbers (and yours) also indicate that the not-young check is
> >> > very likely to be not taken (i.e. you jump over the StoreLoad). Did you
> >> > also perform some experiments changing the order a bit?
> >> > It might be detrimental for this particular case where the StoreLoad is
> >> > expensive, and the xor/non-null checks filter out at least some of
> >> > those stores as well, but maybe
> >> > if (young) -> exit
> >> > if (same-region) -> exit
> >> > if (null) -> exit
> >> > StoreLoad
> >> > ...
> >> > may be better to do? I am aware that the "young" check adds a load,
> >> > which is also expensive (but not as much as the StoreLoad), but it
> >> > seems to be an interesting case to look at.
> >> >
> >> > In our old results (as far as I can interpret them) it did not seem to
> >> > have any advantage/disadvantage, so I am just curious whether you did
> >> > such tests and what you concluded.
> >>
> >> Yes, I did this experiment. The load from the card table on the fast
> >> path turns out to be expensive for several benchmarks:
> >>
> >> https://cr.openjdk.java.net/~manc/8225776/20190516-jdk11G1WriteBarrier-dacapoDefault4G-YoungCheckFirst.html
> >>
> >> For this experiment, I used a 4G heap with -XX:NewRatio=1, so most
> >> writes go to young objects and GC happens very infrequently.
> >> The implementation had a bug that caused some benchmarks to crash. I
> >> didn't look into fixing it, as this direction does not seem
> >> worthwhile.
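The trade-off between the two filter orders can be illustrated with a deliberately simplified cost model. The model is an assumption of this sketch, not something measured in the thread: it counts only card-table loads and StoreLoad fences, and ignores the cheap xor and null compares as well as branch mispredictions. Because the three filters are independent conditions that all have to hold before the StoreLoad, reordering them cannot change how many stores reach the fence; it only changes how many stores pay for the card-table load. The `Store` type, `barrier_cost`, and the two order tuples are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    """Which pre-StoreLoad filters one store passes (hypothetical model)."""
    cross_region: bool    # field and new value are in different regions
    non_null: bool        # the stored value is not null
    card_not_young: bool  # the field's card is not the young card


CURRENT_ORDER = ("cross_region", "non_null", "card_not_young")
YOUNG_FIRST = ("card_not_young", "cross_region", "non_null")


def barrier_cost(stores, order):
    """Count card-table loads and StoreLoad fences for one filter order.

    Only the young-card filter reads the card table. A store reaches the
    StoreLoad only if it passes every filter, whatever the order.
    """
    card_loads = 0
    fences = 0
    for store in stores:
        for name in order:
            if name == "card_not_young":
                card_loads += 1
            if not getattr(store, name):
                break  # filtered out: exit the barrier
        else:
            fences += 1  # all filters passed: execute the StoreLoad
    return card_loads, fences


stores = [
    Store(False, True, False),  # same-region store into a young object
    Store(False, True, False),
    Store(True, False, False),  # null store into a young object
    Store(True, True, False),   # cross-region store into a young object
    Store(True, True, True),    # cross-region store into an old object
]
print("current order:", barrier_cost(stores, CURRENT_ORDER))
print("young check first:", barrier_cost(stores, YOUNG_FIRST))
```

In this toy workload both orders execute the same single fence, but checking the young card first performs five card-table loads instead of two. That is consistent with the observation above that an extra card-table load on the fast path can hurt, although real costs also depend on cache behavior the model does not capture.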
> >>
> >> > - internal (quick) perf testing showed no overall score changes, except
> >> > that maxJOPS on SPECjbb2015 seemed to improve by ~1.2% (I only had time
> >> > for very few experiments so far and will rerun them, so there is some
> >> > chance that this was a fluke), which is definitely nice.
> >>
> >> Good to hear that!
> >> -Man
> >
>
>