RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]

Mon Aug 12 08:46:16 UTC 2024

On Fri, 9 Aug 2024 14:05:43 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> I agree that using indirect memory operands is the most readable choice, and is slightly less wasteful from a register usage perspective. However, when I tried this choice a couple of months ago, I observed timeouts in some CTW runs, which as far as I remember occurred when LCM processed huge basic blocks with lots of memory writes (e.g. arising from static initializations of large String arrays such as [here](https://github.com/apache/lucene/blob/ea562f6ef2b32fe6eadf57c6381d9a69acb043c7/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemData1.java#L47-L748)), in combination with C2 stress options. In these scenarios, the large number of additional Mach nodes seemed to cause the timeouts. I settled for materializing the store address internally to guard against such corner cases. I did not see any significant performance difference between the two choices in my benchmark results.
>> 
>> I would like to study whether LCM can be made more robust in this scenario, which would enable using indirect memory operands here, but I think this would be best addressed in a separate RFE. Would it be OK for now to extend the code comment with the details provided in the above explanation?
>
> Ok, doing it in a separate RFE is fine with me. This sounds like a C2 problem which should get investigated. It may cause other performance problems, too. Maybe a native profiler can show what takes too much time.

Thanks Martin, I have added this to my list of follow-up tasks and extended the comment in the code with some more details (commit d21104ca8).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1713372749