RFR (XS): Optimize branch frequency of G1's write post-barrier in C2

Mon Aug 5 20:04:09 UTC 2019

Looks OK to me

dl

On 8/3/19 12:27 PM, Thomas Schatzl wrote:
> ping the compiler team for a quick look.
>
> Thanks,
>   Thomas
>
> On 11.07.19 16:35, Man Cao wrote:
>> Thanks Thomas for the review and running experiments!
>>
>>  > - can you share the code changes to generate the statistics? It would
>>  > be nice to confirm these on a few more applications and play around
>>  > with them a bit :)
>>  > I would like to confirm some very old numbers we have for other older
>>  > benchmarks that this is indeed the best probability distribution.
>>  > In particular, I do not understand why, given these numbers, we did
>>  > not change the probabilities as you suggested :( There were other changes
>>  > mostly related to barrier elision in that time frame, but it seems
>>  > likelihood changes were not attempted.
>>
>> It is here: http://cr.openjdk.java.net/~manc/8225776/branch_profiling/
>> I also added a comment in https://bugs.openjdk.java.net/browse/JDK-8225776
>> to clarify the methodology.
>>
>>  > - these numbers (and yours) also indicate that the not-young check is
>>  > very likely to be not taken (i.e. you jump over the StoreLoad). Did you
>>  > also perform some experiments changing the order a bit?
>>  > It might be detrimental in this particular case, where the StoreLoad is
>>  > expensive and the xor/non-null checks already filter out some
>>  > additional stores, but maybe
>>  > if (young) -> exit
>>  > if (same-region) -> exit
>>  > if (null) -> exit
>>  > StoreLoad
>>  > ...
>>  > may be better? I am aware that the "young" check adds a load,
>>  > which is also expensive (but not as much as the StoreLoad), but it
>>  > seems to be an interesting case to look at.
>>  >
>>  > In our old results (as far as I can interpret them) it did not seem to
>>  > have any advantage/disadvantage, so I am just curious whether you did
>>  > such tests and what you concluded.
>>
>> Yes, I did this experiment. The load from the card table on the fast
>> path turns out to be expensive for several benchmarks:
>> https://cr.openjdk.java.net/~manc/8225776/20190516-jdk11G1WriteBarrier-dacapoDefault4G-YoungCheckFirst.html 
>>
>> For this experiment, I used a 4G heap with -XX:NewRatio=1, so most
>> writes go to young objects, and GCs happen very infrequently.
>> The implementation had a bug that caused some benchmarks to crash. I
>> didn't look into fixing it, as this direction does not seem
>> worthwhile.
>>
>>  > - internal (quick) perf testing showed no overall score changes, except
>>  > that max-jOPS on SPECjbb2015 seemed to improve by ~1.2% (I only had time
>>  > for very few experiments so far and will rerun, so there is some
>>  > chance that this was a fluke), which is definitely nice.
>>
>> Good to hear that!
>> -Man
>
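To make the trade-off between the two filter orders discussed above concrete, here is a minimal C++ model of the post-barrier as a chain of filters. It is a sketch, not HotSpot code: the region and card sizes, the card values, the two-region heap layout, and every name (`post_barrier_default`, `post_barrier_young_first`, `run_trace`, `BarrierStats`, `CardTable`) are hypothetical. The filters exit on a same-region store and on a null store, and the model counts how many stores reach the card-table load and the StoreLoad under each order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of the filters in G1's write post-barrier.
// All constants are illustrative assumptions, not HotSpot's real values.
constexpr int kLogRegionBytes = 20;                            // assume 1 MB regions
constexpr int kLogCardBytes = 9;                               // assume 512-byte cards
constexpr std::uint8_t kDirtyCard = 0;                         // assumed "dirty" marker
constexpr std::uint8_t kYoungCard = 2;                         // assumed "young" marker
constexpr std::uint8_t kCleanCard = 0xff;                      // assumed "clean" marker
constexpr std::uintptr_t kHeapBase = std::uintptr_t(1) << 28;  // region-aligned start

struct BarrierStats {
  std::size_t card_loads = 0;   // stores that executed the card-table load
  std::size_t store_loads = 0;  // stores that reached the StoreLoad fence
  std::size_t enqueued = 0;     // stores that dirtied a card
};

struct CardTable {
  std::uintptr_t base;
  std::vector<std::uint8_t> cards;

  std::uint8_t& card_for(std::uintptr_t field) {
    std::size_t index = (field - base) >> kLogCardBytes;
    assert(index < cards.size());  // field must lie inside the modeled heap
    return cards[index];
  }
};

// Current order: register-only filters first, card load only when needed.
void post_barrier_default(CardTable& ct, std::uintptr_t field,
                          std::uintptr_t new_val, BarrierStats& s) {
  if (((field ^ new_val) >> kLogRegionBytes) == 0) return;  // same region
  if (new_val == 0) return;                                   // null store
  std::uint8_t& card = ct.card_for(field);
  s.card_loads++;
  if (card == kYoungCard) return;                             // young region
  s.store_loads++;  // the real barrier issues a StoreLoad fence here
  if (card == kDirtyCard) return;                             // already dirty
  card = kDirtyCard;
  s.enqueued++;     // the real barrier would enqueue the card address
}

// Proposed order: young check first, so every store pays for the card load.
void post_barrier_young_first(CardTable& ct, std::uintptr_t field,
                              std::uintptr_t new_val, BarrierStats& s) {
  std::uint8_t& card = ct.card_for(field);
  s.card_loads++;
  if (card == kYoungCard) return;
  if (((field ^ new_val) >> kLogRegionBytes) == 0) return;
  if (new_val == 0) return;
  s.store_loads++;
  if (card == kDirtyCard) return;
  card = kDirtyCard;
  s.enqueued++;
}

struct Store {
  std::uintptr_t field;
  std::uintptr_t new_val;
};

// Replays a trace on a two-region heap: region 0 young, region 1 old.
BarrierStats run_trace(const std::vector<Store>& trace, bool young_check_first) {
  constexpr std::size_t kCardsPerRegion =
      std::size_t(1) << (kLogRegionBytes - kLogCardBytes);
  CardTable ct{kHeapBase, std::vector<std::uint8_t>(2 * kCardsPerRegion, kCleanCard)};
  std::fill(ct.cards.begin(), ct.cards.begin() + kCardsPerRegion, kYoungCard);
  BarrierStats stats;
  for (const Store& st : trace) {
    if (young_check_first) {
      post_barrier_young_first(ct, st.field, st.new_val, stats);
    } else {
      post_barrier_default(ct, st.field, st.new_val, stats);
    }
  }
  return stats;
}
```

For a five-store trace (a store into a young object, a same-region store, a null store, and two cross-region stores into the same old card), this model reports 3 card-table loads with the current order versus 5 with the young-check-first order, while both reach the StoreLoad twice. That is the effect described in the thread: young-check-first moves a memory load onto the fast path for stores the register-only filters would have discarded. The model says nothing about actual cycle costs, which depend on the hardware and the benchmark.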