RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]

Mon Sep 16 13:31:17 UTC 2024

On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>>> @rkennke Can you please explain the changes in these tests:
>>> 
>>> ```
>>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>>> ```
>>> 
>>> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
>>> 
>>> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
>>> 
>>> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
>>> 
>>> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
>>> 
>>> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
>> 
>> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
>> 
>> I will re-evaluate those tests, and add comments or remove the restrictions.
>
>> > > @rkennke Can you please explain the changes in these tests:
>> > > ```
>> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>> > > ```
>> > > 
>> > > 
>> > >     
>> > >       
>> > >     
>> > > 
>> > >       
>> > >     
>> > > 
>> > >     
>> > >   
>> > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
>> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
>> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
>> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
>> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
>> > 
>> > 
>> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
>> > I will re-evaluate those tests, and add comments or remove the restrictions.
>> 
>> If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ...

> `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")?

No, this is not what I tried. I tried to completely expand LoadNKlass, and replace it with the lower nodes that load and shift the mark-word right there, in ideal graph. But your approach is saner: there is so much implicit knowledge about Load(N)Klass, and even klass_offset_in_bytes(), all over the place, it would be very hard to get this right without breaking something.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352926265