From rkennke at openjdk.java.net Tue Feb 1 13:15:13 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 1 Feb 2022 13:15:13 GMT Subject: [master] RFR: Load narrowKlass from header, AArch64 assembler implementation [v7] In-Reply-To: References: Message-ID: <6LYRawtbDd5YBa1hVWkWmDHHb0t3FQDVTOWDnbBANi4=.104adeed-dfdf-4ba5-8eec-3e9b469b6088@github.com> > This implements MacroAssembler::load_klass() to load the (narrow)Klass* from object header. Just like the x86_64 implementation, it checks whether it can take the fast path (object unlocked -> load from upper 32bits of header), or else calls the runtime to get a stable header and load from that. > > It adds a runtime call stub, which will also be used in the C2 implementation. It also adds nklass_offset_in_bytes() which will also be used in C2 impl. The part in generate_verify_oop() is a little nasty, I added a comment that explains what's going on. > > Testing: > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Cleanup remaining uses of klass_offset_in_bytes in aarch64 ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/36/files - new: https://git.openjdk.java.net/lilliput/pull/36/files/a43c0c85..e2e7d9df Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=36&range=06 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=36&range=05-06 Stats: 61 lines in 7 files changed: 7 ins; 27 del; 27 mod Patch: https://git.openjdk.java.net/lilliput/pull/36.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/36/head:pull/36 PR: https://git.openjdk.java.net/lilliput/pull/36 From rkennke at openjdk.java.net Tue Feb 1 13:17:03 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 1 Feb 2022 13:17:03 GMT Subject: [master] RFR: Load narrowKlass from header, AArch64 assembler implementation [v6] In-Reply-To: 
<87qYiKAzku357po-1g6UqJbiHs1cXqW6u2lv4OArBl4=.ece82d1f-c9dd-4158-8103-7067d52ef152@github.com>
References: <87qYiKAzku357po-1g6UqJbiHs1cXqW6u2lv4OArBl4=.ece82d1f-c9dd-4158-8103-7067d52ef152@github.com>
Message-ID:

On Mon, 31 Jan 2022 12:59:12 GMT, Roman Kennke wrote:

>> This implements MacroAssembler::load_klass() to load the (narrow)Klass* from object header. Just like the x86_64 implementation, it checks whether it can take the fast path (object unlocked -> load from upper 32bits of header), or else calls the runtime to get a stable header and load from that.
>>
>> It adds a runtime call stub, which will also be used in the C2 implementation. It also adds nklass_offset_in_bytes() which will also be used in C2 impl. The part in generate_verify_oop() is a little nasty, I added a comment that explains what's going on.
>>
>> Testing:
>> - [x] tier1 (aarch64)
>> - [x] tier2 (aarch64)
>> - [x] tier3 (aarch64)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Cleanups and document load_klass(), preserve rscratch1 and rscratch2 in runtime call stub

I pushed more changes:
- Split load_klass() and load_nklass() (we need the nklass version separately in a few places)
- Cleanup remaining direct loads from klass_offset_in_bytes() in aarch64, replacing them with load_[n]klass()
- Pass correct offset to null-checks that are preceding load_klass()
- Trim code size estimate

@theRealAph and/or @shipilev please re-review?

Thanks!
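For readers who want the fast-path/slow-path split in plain code, here is a minimal C++ sketch of the idea described in the PR text. It is a toy model and not the HotSpot assembler code: the `ToyObject` type, the `stable_header_slow()` helper, and the lock-bit constants are assumptions made for illustration only.

```cpp
#include <cstdint>

// Toy model (assumption): a 64-bit header whose low two bits hold the lock
// state and whose upper 32 bits hold the narrow Klass*.
constexpr uint64_t kLockMask = 0x3;  // assumed lock-bit mask
constexpr uint64_t kUnlocked = 0x1;  // assumed "unlocked" bit pattern

struct ToyObject {
    uint64_t header;            // may be overwritten while the object is locked
    uint64_t displaced_header;  // stand-in for what the runtime would look up
};

// Hypothetical slow path: stands in for the runtime call stub that returns
// a stable header for a locked object.
inline uint64_t stable_header_slow(const ToyObject& obj) {
    return obj.displaced_header;
}

// Fast path: if the object is unlocked, the header is stable and the narrow
// Klass* can be taken from its upper 32 bits. Otherwise ask the slow path.
inline uint32_t load_nklass(const ToyObject& obj) {
    uint64_t h = obj.header;
    if ((h & kLockMask) != kUnlocked) {
        h = stable_header_slow(obj);
    }
    return static_cast<uint32_t>(h >> 32);
}
```

In this model an unlocked object never reaches `stable_header_slow()`, which mirrors why the generated code only pays for the runtime call when the header is in use for locking.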
-------------

PR: https://git.openjdk.java.net/lilliput/pull/36

From aph at openjdk.java.net Tue Feb 1 18:31:47 2022
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 1 Feb 2022 18:31:47 GMT
Subject: [master] RFR: Load narrowKlass from header, AArch64 assembler implementation [v7]
In-Reply-To: <6LYRawtbDd5YBa1hVWkWmDHHb0t3FQDVTOWDnbBANi4=.104adeed-dfdf-4ba5-8eec-3e9b469b6088@github.com>
References: <6LYRawtbDd5YBa1hVWkWmDHHb0t3FQDVTOWDnbBANi4=.104adeed-dfdf-4ba5-8eec-3e9b469b6088@github.com>
Message-ID:

On Tue, 1 Feb 2022 13:15:13 GMT, Roman Kennke wrote:

>> This implements MacroAssembler::load_klass() to load the (narrow)Klass* from object header. Just like the x86_64 implementation, it checks whether it can take the fast path (object unlocked -> load from upper 32bits of header), or else calls the runtime to get a stable header and load from that.
>>
>> It adds a runtime call stub, which will also be used in the C2 implementation. It also adds nklass_offset_in_bytes() which will also be used in C2 impl. The part in generate_verify_oop() is a little nasty, I added a comment that explains what's going on.
>>
>> Testing:
>> - [x] tier1 (aarch64)
>> - [x] tier2 (aarch64)
>> - [x] tier3 (aarch64)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Cleanup remaining uses of klass_offset_in_bytes in aarch64

It looks good now. Having said that, the convention is mostly that macros can use rscratchN registers freely internally, and that any caller needs to be aware of that. Here is the opposite of that convention: `load_nklass` carefully avoids clobbering `rscratch1` and `rscratch2` in order to make things easier for the caller. So, while this is something of a surprise for the maintainer, it's an elegant way to solve the problem.

-------------

Marked as reviewed by aph (Committer).
PR: https://git.openjdk.java.net/lilliput/pull/36 From rkennke at openjdk.java.net Tue Feb 1 18:56:36 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 1 Feb 2022 18:56:36 GMT Subject: [master] RFR: Load narrowKlass from header, AArch64 assembler implementation [v7] In-Reply-To: References: <6LYRawtbDd5YBa1hVWkWmDHHb0t3FQDVTOWDnbBANi4=.104adeed-dfdf-4ba5-8eec-3e9b469b6088@github.com> Message-ID: On Tue, 1 Feb 2022 18:28:42 GMT, Andrew Haley wrote: > It looks good now. Having said that, the convention is mostly that macros can use rscratchN registers freely internally, and that any caller needs to be aware of that. Here is the opposite of that convention: `load_nklass` carefully avoids clobbering `rscratch1` and `rscratch2` in order to make things easier for the caller. So, while this is something of a surprise for the maintainer, it's an elegant way to solve the problem. Thanks! I believe the trouble is that load_klass() is used in some C1 generated code paths, and C1 register allocator doesn't ignore rscratch1 and rscratch2. I may be wrong, though. ------------- PR: https://git.openjdk.java.net/lilliput/pull/36 From aph at openjdk.java.net Wed Feb 2 10:27:36 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 2 Feb 2022 10:27:36 GMT Subject: [master] RFR: Load narrowKlass from header, AArch64 assembler implementation [v7] In-Reply-To: References: <6LYRawtbDd5YBa1hVWkWmDHHb0t3FQDVTOWDnbBANi4=.104adeed-dfdf-4ba5-8eec-3e9b469b6088@github.com>

Message-ID: On Tue, 1 Feb 2022 18:53:14 GMT, Roman Kennke wrote: > Thanks! I believe the trouble is that load_klass() is used in some C1 generated code paths, and C1 register allocator doesn't ignore rscratch1 and rscratch2. I may be wrong, though. I hope you are! - None of the register allocators can touch the scratch registers. Lots of hand-written code depends on that. ------------- PR: https://git.openjdk.java.net/lilliput/pull/36 From rkennke at openjdk.java.net Wed Feb 2 14:58:45 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 2 Feb 2022 14:58:45 GMT Subject: [master] RFR: Load narrowKlass from header, AArch64 assembler implementation [v7] In-Reply-To: <6LYRawtbDd5YBa1hVWkWmDHHb0t3FQDVTOWDnbBANi4=.104adeed-dfdf-4ba5-8eec-3e9b469b6088@github.com> References: <6LYRawtbDd5YBa1hVWkWmDHHb0t3FQDVTOWDnbBANi4=.104adeed-dfdf-4ba5-8eec-3e9b469b6088@github.com> Message-ID: On Tue, 1 Feb 2022 13:15:13 GMT, Roman Kennke wrote: >> This implements MacroAssembler::load_klass() to load the (narrow)Klass* from object header. Just like the x86_64 implementation, it checks whether it can take the fast path (object unlocked -> load from upper 32bits of header), or else calls the runtime to get a stable header and load from that. >> >> It adds a runtime call stub, which will also be used in the C2 implementation. It also adds nklass_offset_in_bytes() which will also be used in C2 impl. The part in generate_verify_oop() is a little nasty, I added a comment that explains what's going on. >> >> Testing: >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup remaining uses of klass_offset_in_bytes in aarch64 > > Thanks! I believe the trouble is that load_klass() is used in some C1 generated code paths, and C1 register allocator doesn't ignore rscratch1 and rscratch2. I may be wrong, though. > > I hope you are! 
- None of the register allocators can touch the scratch registers. Lots of hand-written code depends on that.

I had a look at the C1 register allocator, and could verify that I was wrong. Thanks for reviewing and helping!

-------------

PR: https://git.openjdk.java.net/lilliput/pull/36

From rkennke at openjdk.java.net Wed Feb 2 14:58:46 2022
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 2 Feb 2022 14:58:46 GMT
Subject: [master] Integrated: Load narrowKlass from header, AArch64 assembler implementation
In-Reply-To:
References:
Message-ID:

On Wed, 26 Jan 2022 10:57:46 GMT, Roman Kennke wrote:

> This implements MacroAssembler::load_klass() to load the (narrow)Klass* from object header. Just like the x86_64 implementation, it checks whether it can take the fast path (object unlocked -> load from upper 32bits of header), or else calls the runtime to get a stable header and load from that.
>
> It adds a runtime call stub, which will also be used in the C2 implementation. It also adds nklass_offset_in_bytes() which will also be used in C2 impl. The part in generate_verify_oop() is a little nasty, I added a comment that explains what's going on.
>
> Testing:
> - [x] tier1 (aarch64)
> - [x] tier2 (aarch64)
> - [x] tier3 (aarch64)

This pull request has now been integrated.
Changeset: e3b8f983 Author: Roman Kennke URL: https://git.openjdk.java.net/lilliput/commit/e3b8f9837d0b405a712a323a53ed92de2632eb64 Stats: 160 lines in 12 files changed: 91 ins; 27 del; 42 mod Load narrowKlass from header, AArch64 assembler implementation Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/lilliput/pull/36 From rkennke at openjdk.java.net Wed Feb 2 15:25:05 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 2 Feb 2022 15:25:05 GMT Subject: [master] RFR: Remaining changes to load_[n]klass implementation in x86 assembly Message-ID: This change addresses some remaining problems in the x86 implementation of load_klass: - Split load_klass() and load_nklass() - Fix arraycopy stubs to use load_nklass() instead of direct access of the Klass* field - Use call stub for the slowpath runtime call (that reduces generated code size) - Save/restore FPU registers around runtime call Testing: - [x] tier1 - [ ] tier2 - [ ] tier3 ------------- Commit messages: - Cleanups - Merge branch 'master' into arraycopy-nklass - Trim code size estimate - Clean up and remaining fixes/improvements - Re-arrange 64/32bit code - Use movl to load nklass, not movptr - Improve handling of nklass_offset_in_bytes - Merge branch 'master' into arraycopy-nklass - Fix 32 bit - Merge branch 'master' into arraycopy-nklass - ... 
and 1 more: https://git.openjdk.java.net/lilliput/compare/e3b8f983...7cdb274a

Changes: https://git.openjdk.java.net/lilliput/pull/31/files
 Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=31&range=00
  Stats: 130 lines in 5 files changed: 80 ins; 26 del; 24 mod
  Patch: https://git.openjdk.java.net/lilliput/pull/31.diff
  Fetch: git fetch https://git.openjdk.java.net/lilliput pull/31/head:pull/31

PR: https://git.openjdk.java.net/lilliput/pull/31

From stuefe at openjdk.java.net Fri Feb 4 08:28:53 2022
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 4 Feb 2022 08:28:53 GMT
Subject: [master] RFR: Fix CDS LotsOfClasses tests on 32-bit
Message-ID:

68fbdb32af390762a01380b277a3ae30f864fdb4 introduced class pointer reduction to 22-bit. Since that change, `CompressedKlassPointers::encode_not_null` validates the given Klass* - checks that it can be properly encoded with 22 bits. That means `CompressedKlassPointers::encode_not_null` should only be called if UseCompressedClassPointers is true (so, only but unconditionally on 64-bit).

Note that this assert just puts our nose onto the question of what to do with Klass pointers on 32-bit. If we need to encode them into a smaller type, we need to find a way to do so for 32-bit too. There are several possibilities, ranging from introducing a class space (or class-space-like feature) for 32-bit too, allocating them wherever like today, but with a larger alignment, up to something completely different. But that is out of scope for this tiny fix.
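To make the "can this Klass* be encoded in 22 bits" check concrete, here is a minimal C++ sketch of that kind of validation. It is an assumption-laden illustration, not the code of `CompressedKlassPointers::encode_not_null`: the function name `try_encode_klass`, the base-plus-shift scheme, and the shift value are all hypothetical. Only the 22-bit width comes from the message above.

```cpp
#include <cstdint>

constexpr unsigned kNarrowKlassBits = 22;  // width named in the message above
constexpr unsigned kShift = 9;             // assumption: 512-byte alignment

// Hypothetical encoder: stores (addr - base) >> kShift in *out and returns
// true only if the address is representable in kNarrowKlassBits bits.
inline bool try_encode_klass(uint64_t addr, uint64_t base, uint32_t* out) {
    if (addr < base) {
        return false;  // below the encoding base
    }
    const uint64_t delta = addr - base;
    if ((delta & ((uint64_t{1} << kShift) - 1)) != 0) {
        return false;  // not aligned to the encoding granule
    }
    const uint64_t narrow = delta >> kShift;
    if (narrow >= (uint64_t{1} << kNarrowKlassBits)) {
        return false;  // does not fit into 22 bits
    }
    *out = static_cast<uint32_t>(narrow);
    return true;
}
```

A 32-bit build that allocates Klass structures at arbitrary addresses has no such base or alignment guarantee, which is why an unconditional validation of this kind would trip there.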
------------- Commit messages: - Fix cds LotsOfClasses tests on 32-bit Changes: https://git.openjdk.java.net/lilliput/pull/38/files Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=38&range=00 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/lilliput/pull/38.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/38/head:pull/38 PR: https://git.openjdk.java.net/lilliput/pull/38 From shade at openjdk.java.net Fri Feb 4 09:02:39 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 4 Feb 2022 09:02:39 GMT Subject: [master] RFR: Remaining changes to load_[n]klass implementation in x86 assembly In-Reply-To: References: Message-ID: <2mlxC7NHdkYcdGrJm9aUY0jAr0m_4M49sz0P5_2FRPU=.7c0d194f-b8f9-4961-b741-54964587da72@github.com> On Tue, 14 Dec 2021 11:37:22 GMT, Roman Kennke wrote: > This change addresses some remaining problems in the x86 implementation of load_klass: > - Split load_klass() and load_nklass() > - Fix arraycopy stubs to use load_nklass() instead of direct access of the Klass* field > - Use call stub for the slowpath runtime call (that reduces generated code size) > - Save/restore FPU registers around runtime call > > Testing: > - [x] tier1 (x86_64, x86_32) > - [ ] tier2 > - [ ] tier3 Looks fine to me. ------------- Marked as reviewed by shade (Committer). PR: https://git.openjdk.java.net/lilliput/pull/31 From rkennke at openjdk.java.net Fri Feb 4 10:42:35 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 4 Feb 2022 10:42:35 GMT Subject: [master] RFR: Fix CDS LotsOfClasses tests on 32-bit In-Reply-To: References: Message-ID: On Fri, 4 Feb 2022 08:23:30 GMT, Thomas Stuefe wrote: > 68fbdb32af390762a01380b277a3ae30f864fdb4 introduced class pointer reduction to 22-bit. Since that change, `CompressedKlassPointers::encode_not_null` validates the given Klass* - checks that it can be properly encoded with 22 bits. 
That means `CompressedKlassPointers::encode_not_null` should only be called if UseCompressedClassPointers is true (so, only but unconditionally on 64-bit).
>
> Note that this assert just puts our nose onto the question of what to do with Klass pointers on 32-bit. If we need to encode them into a smaller type, we need to find a way to do so for 32-bit too. There are several possibilities, ranging from introducing a class space (or class-space-like feature) for 32-bit too, allocating them wherever like today, but with a larger alignment, up to something completely different. But that is out of scope for this tiny fix.

Looks good to me. Yes, for now, "encoding" Klass* in 32bit is ok. Eventually we're going to need something to encode Klass* in ~22 bits in 32bit builds, too. But not now.

Thanks!

-------------

Marked as reviewed by rkennke (Lead).

PR: https://git.openjdk.java.net/lilliput/pull/38

From rkennke at openjdk.java.net Fri Feb 4 10:45:38 2022
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 4 Feb 2022 10:45:38 GMT
Subject: [master] RFR: Remaining changes to load_[n]klass implementation in x86 assembly
In-Reply-To:
References:
Message-ID:

On Tue, 14 Dec 2021 11:37:22 GMT, Roman Kennke wrote:

> This change addresses some remaining problems in the x86 implementation of load_klass:
> - Split load_klass() and load_nklass()
> - Fix arraycopy stubs to use load_nklass() instead of direct access of the Klass* field
> - Use call stub for the slowpath runtime call (that reduces generated code size)
> - Save/restore FPU registers around runtime call
>
> Testing:
> - [x] tier1 (x86_64, x86_32)
> - [ ] tier2
> - [ ] tier3

Thanks!
------------- PR: https://git.openjdk.java.net/lilliput/pull/31 From rkennke at openjdk.java.net Fri Feb 4 10:45:39 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 4 Feb 2022 10:45:39 GMT Subject: [master] Integrated: Remaining changes to load_[n]klass implementation in x86 assembly In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 11:37:22 GMT, Roman Kennke wrote: > This change addresses some remaining problems in the x86 implementation of load_klass: > - Split load_klass() and load_nklass() > - Fix arraycopy stubs to use load_nklass() instead of direct access of the Klass* field > - Use call stub for the slowpath runtime call (that reduces generated code size) > - Save/restore FPU registers around runtime call > > Testing: > - [x] tier1 (x86_64, x86_32) > - [ ] tier2 > - [ ] tier3 This pull request has now been integrated. Changeset: caa6e789 Author: Roman Kennke URL: https://git.openjdk.java.net/lilliput/commit/caa6e78946c2e238ef0d0945ad210dcdc24f46dd Stats: 130 lines in 5 files changed: 80 ins; 26 del; 24 mod Remaining changes to load_[n]klass implementation in x86 assembly Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/lilliput/pull/31 From stuefe at openjdk.java.net Fri Feb 4 10:59:25 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 4 Feb 2022 10:59:25 GMT Subject: [master] RFR: Fix CDS LotsOfClasses tests on 32-bit In-Reply-To: References:

Message-ID:

On Fri, 4 Feb 2022 10:39:21 GMT, Roman Kennke wrote:

> Looks good to me. Yes, for now, "encoding" Klass* in 32bit is ok. Eventually we're going to need something to encode Klass* in ~22 bits in 32bit builds, too. But not now. Thanks!

Thank you, @rkennke.

-------------

PR: https://git.openjdk.java.net/lilliput/pull/38

From stuefe at openjdk.java.net Fri Feb 4 11:02:42 2022
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 4 Feb 2022 11:02:42 GMT
Subject: [master] Integrated: Fix CDS LotsOfClasses tests on 32-bit
In-Reply-To:
References:
Message-ID:

On Fri, 4 Feb 2022 08:23:30 GMT, Thomas Stuefe wrote:

> 68fbdb32af390762a01380b277a3ae30f864fdb4 introduced class pointer reduction to 22-bit. Since that change, `CompressedKlassPointers::encode_not_null` validates the given Klass* - checks that it can be properly encoded with 22 bits. That means `CompressedKlassPointers::encode_not_null` should only be called if UseCompressedClassPointers is true (so, only but unconditionally on 64-bit).
>
> Note that this assert just puts our nose onto the question of what to do with Klass pointers on 32-bit. If we need to encode them into a smaller type, we need to find a way to do so for 32-bit too. There are several possibilities, ranging from introducing a class space (or class-space-like feature) for 32-bit too, allocating them wherever like today, but with a larger alignment, up to something completely different. But that is out of scope for this tiny fix.

This pull request has now been integrated.
Changeset: cf682984 Author: Thomas Stuefe URL: https://git.openjdk.java.net/lilliput/commit/cf682984f696da5c23c1ae69e8c85101425f3be8 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Fix CDS LotsOfClasses tests on 32-bit Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/lilliput/pull/38 From rkennke at openjdk.java.net Fri Feb 4 15:58:55 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 4 Feb 2022 15:58:55 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation Message-ID: This implements loading the compressed Klass* from the object header in C2. This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. Testing: - [x] tier1 (x86_64, aarch64) - [ ] tier2 - [ ] tier3 ------------- Commit messages: - Improve aarch64 LoadNKlass impl - Improve x86 LoadNKlass - Remove duplicate generate_load_nklass() - Merge remote-tracking branch 'upstream/master' into klass-from-header-c2 - Merge conflicts - Merge remote-tracking branch 'origin/klass-from-header-c2' into klass-from-header-c2 - Merge branch 'master' into klass-from-header-c2 - Remove klass_offset memory edge - Restore memory edge to klass_offset, we still need it - Remove assert - ... 
and 26 more: https://git.openjdk.java.net/lilliput/compare/cf682984...42d01cb2 Changes: https://git.openjdk.java.net/lilliput/pull/29/files Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=00 Stats: 65 lines in 17 files changed: 15 ins; 3 del; 47 mod Patch: https://git.openjdk.java.net/lilliput/pull/29.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/29/head:pull/29 PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Fri Feb 4 16:19:31 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 4 Feb 2022 16:19:31 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 17:39:43 GMT, Roman Kennke wrote: > This implements loading the compressed Klass* from the object header in C2. > This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. > > Testing: > - [x] tier1 (x86_64, aarch64) > - [ ] tier2 > - [ ] tier3 Also, @rwestrel please review and comment. Unfortunately, you don't seem to be a Lilliput committer, so can't officially approve. We should probably change that. 
------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Mon Feb 7 14:50:40 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 7 Feb 2022 14:50:40 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation In-Reply-To: References:

Message-ID: On Mon, 7 Feb 2022 14:37:46 GMT, Roland Westrelin wrote: > Have you considered doing this in the IR graph instead of in platform dependent code? Maybe in Compile::final_graph_reshaping_impl() converting LoadNKlass to the markword load + logic to extract the class. Or do you think there's not much value in that approach? The current implementation is ok AFAICT. Yes. That seems much more complex, though, especially considering that this is most likely a temporary solution anyway. At least as far as I know, efforts are ongoing upstream to get rid of the locking stuff in the mark word altogether, at which point we no longer need the special handling of load-klass. However, this might well be several years away, and I wanted to build a prototype now, and we already have the assembly code to handle LoadNKlass, so I thought the path of least resistance would be to reuse that. It should not make a measurable difference perf-wise, I think. ------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Wed Feb 9 10:23:34 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 9 Feb 2022 10:23:34 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 17:39:43 GMT, Roman Kennke wrote: > This implements loading the compressed Klass* from the object header in C2. > This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. 
We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. > > Testing: > - [x] tier1 (x86_64, aarch64) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) @shipilev @theRealAph do you think this is good to go? ------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From shade at openjdk.java.net Wed Feb 9 15:32:43 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 9 Feb 2022 15:32:43 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 17:39:43 GMT, Roman Kennke wrote: > This implements loading the compressed Klass* from the object header in C2. > This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. 
>
> Testing:
> - [x] tier1 (x86_64, aarch64)
> - [x] tier2 (x86_64, aarch64)
> - [x] tier3 (x86_64, aarch64)

I have comments :)

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3716:

> 3714:
> 3715: bind(slow);
> 3716: enter();

Why are `enter()` and `leave()` removed? I thought those manage the frame pointer to allow stack walk? IOW, it is not enough to just save the `lr`, we also need to manage `rfp`?

src/hotspot/cpu/x86/x86_64.ad line 12268:

> 12266: //instruct compN_mem_imm_klass(rFlagsRegU cr, memory mem, immNKlass src)
> 12267: //%{
> 12268: // match(Set cr (CmpN src (LoadNKlass mem)));

I think you can do `predicate(false)` to disable rules, while still compiling `ins_encode`, thus capturing refactoring errors there. No biggie if you want to comment out the rule.

src/hotspot/share/oops/oop.hpp line 310:

> 308: // for code generation
> 309: static int mark_offset_in_bytes() { return offset_of(oopDesc, _mark); }
> 310: static int nklass_offset_in_bytes() {

Feels like it should be the other way around: always ask for `klass_offset_in_bytes()`. Return the Lilliput offset under `_LP64`, and the usual `offset_of(oopDesc, _metadata._klass)` otherwise. Saves the headache of replacing `klass_offset_in_bytes` with `nklass_offset_in_bytes` everywhere?

src/hotspot/share/opto/macro.cpp line 1676:

> 1674: rawmem = make_store(control, rawmem, object, oopDesc::mark_offset_in_bytes(), mark_node, TypeX_X->basic_type());
> 1675: rawmem = make_store(control, rawmem, object, oopDesc::nklass_offset_in_bytes(), klass_node, T_METADATA);
> 1676: rawmem = make_store(control, rawmem, object, oopDesc::klass_offset_in_bytes(), klass_node, T_METADATA);

This looks patently weird. What do we store to, `nklass` or `klass`? Why both here?
------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Wed Feb 9 17:10:47 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 9 Feb 2022 17:10:47 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation In-Reply-To: References:

Message-ID:

On Wed, 9 Feb 2022 15:21:08 GMT, Aleksey Shipilev wrote:

>> This implements loading the compressed Klass* from the object header in C2.
>> This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0.
>>
>> I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*.
>>
>> We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register.
>>
>> Testing:
>> - [x] tier1 (x86_64, aarch64)
>> - [x] tier2 (x86_64, aarch64)
>> - [x] tier3 (x86_64, aarch64)
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3716:
>
>> 3714:
>> 3715: bind(slow);
>> 3716: enter();
>
> Why are `enter()` and `leave()` removed? I thought those manage the frame pointer to allow stack walk? IOW, it is not enough to just save the `lr`, we also need to manage `rfp`?

enter() saves lr and rfp on the stack, and then overwrites rfp with the current sp. leave() reverts that. C2 doesn't seem to like messing with rfp, but we still need to preserve lr at this point, so that subroutine calls can find their way back. The stub does its own enter/leave around the slow-path call.
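The enter()/leave() behaviour described in this reply can be modelled in a few lines of C++. This is a toy simulation under stated assumptions, not AArch64 code: `ToyCpu`, its index-based stack, and the `call()` helper are hypothetical names that only mirror the description above (save rfp and lr, point rfp at the new frame, and undo both on leave).

```cpp
#include <cstdint>
#include <vector>

// Toy register/stack model (assumption): `fp` and `lr` stand in for rfp and
// lr, and the stack pointer is simply the current size of `stack`.
struct ToyCpu {
    std::vector<uint64_t> stack;
    uint64_t fp = 0;
    uint64_t lr = 0;

    // Models enter(): push fp and lr, then point fp at the current "sp".
    void enter() {
        stack.push_back(fp);
        stack.push_back(lr);
        fp = stack.size();
    }

    // Models leave(): drop everything above the frame record, then pop
    // lr and fp in reverse order.
    void leave() {
        stack.resize(fp);
        lr = stack.back();
        stack.pop_back();
        fp = stack.back();
        stack.pop_back();
    }

    // Models a subroutine call: the return address overwrites lr.
    void call(uint64_t return_address) { lr = return_address; }
};
```

The model shows why lr has to be saved somewhere before a nested call: `call()` overwrites it, and only the copy pushed earlier lets the code find its way back.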

> src/hotspot/cpu/x86/x86_64.ad line 12268:
>
>> 12266: //instruct compN_mem_imm_klass(rFlagsRegU cr, memory mem, immNKlass src)
>> 12267: //%{
>> 12268: // match(Set cr (CmpN src (LoadNKlass mem)));
>
> I think you can do `predicate(false)` to disable rules, while still compiling `ins_encode`, thus capturing refactoring errors there. No biggie if you want to comment out the rule.

Ok, good suggestion. I will change it.

> src/hotspot/share/oops/oop.hpp line 310:
>
>> 308: // for code generation
>> 309: static int mark_offset_in_bytes() { return offset_of(oopDesc, _mark); }
>> 310: static int nklass_offset_in_bytes() {
>
> Feels like it should be the other way around: always ask for `klass_offset_in_bytes()`. Return the Lilliput offset under `_LP64`, and the usual `offset_of(oopDesc, _metadata._klass)` otherwise. Saves the headache of replacing `klass_offset_in_bytes` with `nklass_offset_in_bytes` everywhere?

Yes, I can do that as soon as the Klass* is eliminated. Currently we still need to deal with both. See below.

> src/hotspot/share/opto/macro.cpp line 1676:
>
>> 1674: rawmem = make_store(control, rawmem, object, oopDesc::mark_offset_in_bytes(), mark_node, TypeX_X->basic_type());
>> 1675: rawmem = make_store(control, rawmem, object, oopDesc::nklass_offset_in_bytes(), klass_node, T_METADATA);
>> 1676: rawmem = make_store(control, rawmem, object, oopDesc::klass_offset_in_bytes(), klass_node, T_METADATA);
>
> This looks patently weird. What do we store to, `nklass` or `klass`? Why both here?

The Klass* will be eliminated in the subsequent PR. Until then, we have to keep both.

-------------

PR: https://git.openjdk.java.net/lilliput/pull/29

From rkennke at openjdk.java.net Wed Feb 9 18:20:10 2022
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 9 Feb 2022 18:20:10 GMT
Subject: [master] RFR: Load Klass* from header, C2 implementation [v2]
In-Reply-To:
References:
Message-ID:

> This implements loading the compressed Klass* from the object header in C2.
> This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. > > Testing: > - [x] tier1 (x86_64, aarch64) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/klass-from-header-c2' into klass-from-header-c2 - Disable compN_mem_imm_klass by predicate, not commenting out ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/29/files - new: https://git.openjdk.java.net/lilliput/pull/29/files/42d01cb2..0488ff4b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=01 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=00-01 Stats: 11 lines in 1 file changed: 1 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/lilliput/pull/29.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/29/head:pull/29 PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Thu Feb 10 15:01:10 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 10 Feb 2022 15:01:10 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v3] In-Reply-To: 
References: Message-ID: <2SwjT4KpUuf6FMZdLWLuU5gEWLJcWZfFnay4ARU0npI=.9bb37e73-caa1-4013-be82-0fed42622734@github.com> > This implements loading the compressed Klass* from the object header in C2. > This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. 
> > Testing: > - [x] tier1 (x86_64, aarch64) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove store to nklass_offset ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/29/files - new: https://git.openjdk.java.net/lilliput/pull/29/files/0488ff4b..d51468a0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=02 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/lilliput/pull/29.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/29/head:pull/29 PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Thu Feb 10 15:01:11 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 10 Feb 2022 15:01:11 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v3] In-Reply-To: References:

Message-ID: On Wed, 9 Feb 2022 17:07:39 GMT, Roman Kennke wrote: >> src/hotspot/share/opto/macro.cpp line 1676: >> >>> 1674: rawmem = make_store(control, rawmem, object, oopDesc::mark_offset_in_bytes(), mark_node, TypeX_X->basic_type()); >>> 1675: rawmem = make_store(control, rawmem, object, oopDesc::nklass_offset_in_bytes(), klass_node, T_METADATA); >>> 1676: rawmem = make_store(control, rawmem, object, oopDesc::klass_offset_in_bytes(), klass_node, T_METADATA); >> >> This looks patently weird. What do we store to, `nklass` or `klass`? Why both here? > > The Klass* will be eliminated in the subsequent PR. Until then, we have to keep both. I believe I can safely remove the store to the nklass_offset. The narrow Klass* in the header is already initialized by storing the initial mark word. I only put the store to nklass_offset there to set-up a memory edge to subsequent load-klass. However, the initializer should always emit a membar, which provides the required memory ordering between the initializing store of the narrowKlass* and subsequent loads. ------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Thu Feb 10 18:55:07 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 10 Feb 2022 18:55:07 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v4] In-Reply-To: References: Message-ID: > This implements loading the compressed Klass* from the object header in C2. > This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. 
> > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. > > Testing: > - [x] tier1 (x86_64, aarch64) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Revert unnecessary change - Experiment to keep using 8 as offset for LoadNKlass ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/29/files - new: https://git.openjdk.java.net/lilliput/pull/29/files/d51468a0..12d09bed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=03 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=02-03 Stats: 35 lines in 16 files changed: 3 ins; 7 del; 25 mod Patch: https://git.openjdk.java.net/lilliput/pull/29.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/29/head:pull/29 PR: https://git.openjdk.java.net/lilliput/pull/29 From shade at openjdk.java.net Fri Feb 11 07:16:42 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 11 Feb 2022 07:16:42 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v4] In-Reply-To: References:

Message-ID: On Thu, 10 Feb 2022 18:55:07 GMT, Roman Kennke wrote: >> This implements loading the compressed Klass* from the object header in C2. >> This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. >> >> I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. >> >> We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. >> >> Testing: >> - [x] tier1 (x86_64, aarch64) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Revert unnecessary change > - Experiment to keep using 8 as offset for LoadNKlass This PR is suddenly much shorter, is it still correct? The change seems fine, though. src/hotspot/share/opto/matcher.cpp line 2980: > 2978: } > 2979: tty->print("--N: "); > 2980: _leaf->dump(3); Leftover debugging crumb? ------------- Marked as reviewed by shade (Committer). PR: https://git.openjdk.java.net/lilliput/pull/29 From shade at openjdk.java.net Fri Feb 11 07:16:45 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 11 Feb 2022 07:16:45 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v4] In-Reply-To: References:

Message-ID: On Wed, 9 Feb 2022 17:05:06 GMT, Roman Kennke wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3716: >> >>> 3714: >>> 3715: bind(slow); >>> 3716: enter(); >> >> Why `enter()` and `leave()` are removed? I thought those manage the frame pointer to allow stack walk? IOW, it is not enough to just save the `lr`, we also need to manage `rfp`? > > enter() saves/restores lr and rfp, and then overrides rfp with current sp. leave() reverts that. C2 doesn't seem to like messing with rfp, but we still need to preserve lr at this point, so that subroutine calls can find their way back. The stub does its own enter/leave around the slow-path call. OK. ------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Fri Feb 11 09:54:43 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 11 Feb 2022 09:54:43 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v4] In-Reply-To: References:

Message-ID: <4dS3sp-ZQTp5PtBoqlJSrl06ajr2uYZ3gwB1A7lMK8M=.977be8f5-4f9b-4f17-a05b-3e619aa1cb0d@github.com> On Fri, 11 Feb 2022 07:11:40 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Revert unnecessary change >> - Experiment to keep using 8 as offset for LoadNKlass > > src/hotspot/share/opto/matcher.cpp line 2980: > >> 2978: } >> 2979: tty->print("--N: "); >> 2980: _leaf->dump(3); > > Leftover debugging crumb? The whole last change is intended for debugging with @rwestrel, I am not sure yet if we can convince C2 to work with 8 offset. We might revert back to offset 4 for all of C2. ------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Fri Feb 11 15:31:09 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 11 Feb 2022 15:31:09 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v5] In-Reply-To: References: Message-ID: > This implements loading the compressed Klass* from the object header in C2. > This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. 
> > Testing: > - [x] tier1 (x86_64, aarch64) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Revert "Experiment to keep using 8 as offset for LoadNKlass" This reverts commit d247c3e285efb012595b879ec64f3314e26d6318. ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/29/files - new: https://git.openjdk.java.net/lilliput/pull/29/files/12d09bed..58f8071c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=04 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=29&range=03-04 Stats: 29 lines in 15 files changed: 3 ins; 1 del; 25 mod Patch: https://git.openjdk.java.net/lilliput/pull/29.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/29/head:pull/29 PR: https://git.openjdk.java.net/lilliput/pull/29 From shade at openjdk.java.net Mon Feb 14 07:59:43 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 14 Feb 2022 07:59:43 GMT Subject: [master] RFR: Load Klass* from header, C2 implementation [v5] In-Reply-To: References:

Message-ID: <8lZ7xM2nPfjTdB4NOQTbV4Moy336TfX7mIiQssaq3ks=.4d8e2c31-1431-4109-a56f-80353a3d4ef3@github.com> On Fri, 11 Feb 2022 15:31:09 GMT, Roman Kennke wrote: >> This implements loading the compressed Klass* from the object header in C2. >> This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. >> >> I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. >> >> We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. >> >> Testing: >> - [x] tier1 (x86_64, aarch64) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Experiment to keep using 8 as offset for LoadNKlass" > > This reverts commit d247c3e285efb012595b879ec64f3314e26d6318. All right, let's do this. ------------- Marked as reviewed by shade (Committer). PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Wed Feb 16 11:28:22 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 16 Feb 2022 11:28:22 GMT Subject: [master] Integrated: Load Klass* from header, C2 implementation In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 17:39:43 GMT, Roman Kennke wrote: > This implements loading the compressed Klass* from the object header in C2. 
> This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. > > Testing: > - [x] tier1 (x86_64, aarch64) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) This pull request has now been integrated. Changeset: 9c474685 Author: Roman Kennke URL: https://git.openjdk.java.net/lilliput/commit/9c47468527d4b81180a284d0b1367d91fe49ed50 Stats: 51 lines in 16 files changed: 13 ins; 3 del; 35 mod Load Klass* from header, C2 implementation Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From rkennke at openjdk.java.net Mon Feb 21 15:15:24 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 21 Feb 2022 15:15:24 GMT Subject: [master] RFR: Eliminate Klass* word Message-ID: This change removes the dedicated Klass* word in the oop structure, at least in 64 bit builds. It has been unused, except for verification, since integration of #29. The change makes regular objects smaller, the field layouter can now use the extra space and start laying out fields at offset 8 (instead of 12 or 16, depending on class-compression).
However, arrays remain unchanged because they still store the array length in offset 8, and current array code expects the array elements to start at 64bit aligned address, even for smaller data types. I will address this in a subsequent change. Much of the change is reverting the klass_offset -> nklass_offset change by #29, as promised. I also disabled s390x and ppc64le GHA builds, because they are broken by this change. I problem-listed a number of SA tests. The SA is broken by this change because it needs to load the Klass* from a regular field. I tried to change it, but it looks very difficult: it doesn't have a way to safely access the object mark word in the face of concurrent activity by the locking subsystem. Fixing the SA is not very high priority at this point, it may ultimately not even be necessary when/if the locking subsystem doesn't use the mark word anymore (or at least, does no longer displace the header). Testing: - [x] tier1 (x86_64, x86_32, aarch64) - [x] tier2 (x86_64, x86_32, aarch64) - [ ] tier3 (x86_64, x86_32, aarch64) ------------- Commit messages: - Remove remainder of s390x and ppc64le build defs - Merge remote-tracking branch 'origin/no-init-klass-word2' into no-init-klass-word2 - Aarch64 support - Problem-list spuriously failing ToolTabSnippetTest.java - Remove broken s390x and ppc64le builds from GHA - Eliminate Klass* word Changes: https://git.openjdk.java.net/lilliput/pull/40/files Webrev: https://webrevs.openjdk.java.net/?repo=lilliput&pr=40&range=00 Stats: 255 lines in 43 files changed: 62 ins; 129 del; 64 mod Patch: https://git.openjdk.java.net/lilliput/pull/40.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/40/head:pull/40 PR: https://git.openjdk.java.net/lilliput/pull/40 From rkennke at openjdk.java.net Mon Feb 21 16:34:07 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 21 Feb 2022 16:34:07 GMT Subject: [master] RFR: Eliminate Klass* word [v2] In-Reply-To: References: Message-ID: > This change 
removes the dedicated Klass* word in the oop structure, at least in 64 bit builds. It has been unused, except for verification, since integration of #29. > > The change makes regular objects smaller, the field layouter can now use the extra space and start laying out fields at offset 8 (instead of 12 or 16, depending on class-compression). However, arrays remain unchanged because they still store the array length in offset 8, and current array code expects the array elements to start at 64bit aligned address, even for smaller data types. I will address this in a subsequent change. > > Much of the change is reverting the klass_offset -> nklass_offset change by #29, as promised. I also disabled s390x and ppc64le GHA builds, because they are broken by this change. > > I problem-listed a number of SA tests. The SA is broken by this change because it needs to load the Klass* from a regular field. I tried to change it, but it looks very difficult: it doesn't have a way to safely access the object mark word in the face of concurrent activity by the locking subsystem. Fixing the SA is not very high priority at this point, it may ultimately not even be necessary when/if the locking subsystem doesn't use the mark word anymore (or at least, does no longer displace the header).
> > Testing: > - [x] tier1 (x86_64, x86_32, aarch64) > - [x] tier2 (x86_64, x86_32, aarch64) > - [ ] tier3 (x86_64, x86_32, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix test that failed because of smaller object size ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/40/files - new: https://git.openjdk.java.net/lilliput/pull/40/files/105cb3b5..fa9dcf62 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=40&range=01 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=40&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/lilliput/pull/40.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/40/head:pull/40 PR: https://git.openjdk.java.net/lilliput/pull/40 From shade at openjdk.java.net Tue Feb 22 08:16:23 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 22 Feb 2022 08:16:23 GMT Subject: [master] RFR: Eliminate Klass* word [v2] In-Reply-To: References:

Message-ID: On Mon, 21 Feb 2022 16:34:07 GMT, Roman Kennke wrote: >> This change removes the dedicated Klass* word in the oop structure, at least in 64 bit builds. It has been unused, except for verification, since integration of #29. >> >> The change makes regular objects smaller, the field layouter can now use the extra space and start laying out fields at offset 8 (instead of 12 or 16, depending on class-compression). However, arrays remain unchanged because they still store the array length in offset 8, and current array code expects the array elements to start at 64bit aligned address, even for smaller data types. I will address this in a subsequent change. >> >> Much of the change is reverting the klass_offset -> nklass_offset change by #29, as promised. I also disabled s390x and ppc64le GHA builds, because they are broken by this change. >> >> I problem-listed a number of SA tests. The SA is broken by this change because it needs to load the Klass* from a regular field. I tried to change it, but it looks very difficult: it doesn't have a way to safely access the object mark word in the face of concurrent activity by the locking subsystem. Fixing the SA is not very high priority at this point, it may ultimately not even be necessary when/if the locking subsystem doesn't use the mark word anymore (or at least, does no longer displace the header). >> >> Testing: >> - [x] tier1 (x86_64, x86_32, aarch64) >> - [x] tier2 (x86_64, x86_32, aarch64) >> - [ ] tier3 (x86_64, x86_32, aarch64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix test that failed because of smaller object size Looks fine to me for the experimental code. src/hotspot/share/gc/parallel/psOldGen.cpp line 390: > 388: > 389: virtual void do_object(oop obj) { > 390: HeapWord* test_addr = cast_from_oop(obj); A bit confused by this change, why?
test/langtools/ProblemList.txt line 76: > 74: > 75: > 76: jdk/jshell/ToolTabSnippetTest.java 1234567 generic-all This still fails? ------------- Marked as reviewed by shade (Committer). PR: https://git.openjdk.java.net/lilliput/pull/40 From rkennke at openjdk.java.net Tue Feb 22 08:27:05 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 22 Feb 2022 08:27:05 GMT Subject: [master] RFR: Eliminate Klass* word [v2] In-Reply-To: References:

Message-ID: On Tue, 22 Feb 2022 08:12:28 GMT, Aleksey Shipilev wrote: > Looks fine to me for the experimental code. Thanks! I want to fix one more failing (because of changed object size) test in tier3 before I push it. > src/hotspot/share/gc/parallel/psOldGen.cpp line 390: > >> 388: >> 389: virtual void do_object(oop obj) { >> 390: HeapWord* test_addr = cast_from_oop(obj); > > A bit confused by this change, why? The code there checks whether a routine can find the start of an object (by looking up in an offset table), starting from somewhere in the middle of an object. It allocates a minimal object, which is 2 words length, increases offset by 1, and starts searching from there. However, with this change, objects may be a single word, and increasing offset by one would find the next object instead. TBH, I wasn't quite sure what to do with that code. > test/langtools/ProblemList.txt line 76: > >> 74: >> 75: >> 76: jdk/jshell/ToolTabSnippetTest.java 1234567 generic-all > > This still fails? Yeah, intermittently, in GHA. It's not related to this change, but I wanted GHA green. ------------- PR: https://git.openjdk.java.net/lilliput/pull/40 From rkennke at openjdk.java.net Tue Feb 22 11:52:54 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 22 Feb 2022 11:52:54 GMT Subject: [master] RFR: Eliminate Klass* word [v3] In-Reply-To: References: Message-ID: > This change removes the dedicated Klass* word in the oop structure, at least in 64 bit builds. It has been unused, except for verification, since integration of #29. > > The change makes regular objects smaller, the field layouter can now use the extra space and start laying out fields at offset 8 (instead of 12 or 16, depending on class-compression). However, arrays remain unchanged because they still store the array length in offset 8, and current array code expects the array elements to start at 64bit aligned address, even for smaller data types. I will address this in a subsequent change.
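For readers following the offsets in the paragraph above, here is a rough sketch of the arithmetic, not HotSpot code. The enum, the constants and the function names are all made up for illustration; the sketch assumes a 64-bit build with an 8-byte mark word, an 8-byte Klass* or 4-byte narrow Klass in the old layouts, and a 4-byte array length.

```cpp
#include <cstddef>

// Hypothetical names for illustration only; none of these exist in HotSpot.
enum class HeaderLayout {
  LegacyFullKlass,        // mark word followed by an 8-byte Klass*
  LegacyCompressedKlass,  // mark word followed by a 4-byte narrow Klass
  KlassInMarkWord         // mark word only; narrow Klass lives in its upper half
};

const std::size_t kMarkWordBytes    = 8;  // assumption: 64-bit mark word
const std::size_t kArrayLengthBytes = 4;  // assumption: 32-bit array length

// Round 'value' up to the next multiple of 'alignment' (a power of two).
inline std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// First offset at which the field layouter could place an instance field.
inline std::size_t first_field_offset(HeaderLayout layout) {
  switch (layout) {
    case HeaderLayout::LegacyFullKlass:       return kMarkWordBytes + 8;  // 16
    case HeaderLayout::LegacyCompressedKlass: return kMarkWordBytes + 4;  // 12
    case HeaderLayout::KlassInMarkWord:       return kMarkWordBytes;      // 8
  }
  return 0;  // unreachable for valid enum values
}

// Array elements in the new layout: the length sits at offset 8 and the
// elements are expected to start at a 64-bit aligned address.
inline std::size_t array_base_offset_klass_in_mark_word() {
  const std::size_t length_offset = kMarkWordBytes;               // 8
  return align_up(length_offset + kArrayLengthBytes, 8);          // 16
}
```

Under these assumptions a plain object can start its fields at 8, while an array still starts its elements at 16, which matches the remark that arrays are unchanged for now.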
> > Much of the change is reverting the klass_offset -> nklass_offset change by #29, as promised. I also disabled s390x and ppc64le GHA builds, because they are broken by this change. > > I problem-listed a number of SA tests. The SA is broken by this change because it needs to load the Klass* from a regular field. I tried to change it, but it looks very difficult: it doesn't have a way to safely access the object mark word in the face of concurrent activity by the locking subsystem. Fixing the SA is not very high priority at this point, it may ultimately not even be necessary when/if the locking subsystem doesn't use the mark word anymore (or at least, does no longer displace the header). > > Testing: > - [x] tier1 (x86_64, x86_32, aarch64) > - [x] tier2 (x86_64, x86_32, aarch64) > - [ ] tier3 (x86_64, x86_32, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix failing test ------------- Changes: - all: https://git.openjdk.java.net/lilliput/pull/40/files - new: https://git.openjdk.java.net/lilliput/pull/40/files/fa9dcf62..10ce5b11 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=lilliput&pr=40&range=02 - incr: https://webrevs.openjdk.java.net/?repo=lilliput&pr=40&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/lilliput/pull/40.diff Fetch: git fetch https://git.openjdk.java.net/lilliput pull/40/head:pull/40 PR: https://git.openjdk.java.net/lilliput/pull/40 From rkennke at openjdk.java.net Wed Feb 23 12:28:25 2022 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 23 Feb 2022 12:28:25 GMT Subject: [master] Integrated: Eliminate Klass* word In-Reply-To: References: Message-ID: <0Vo66tM1g2mWrmtUTBmAGCV4cFz21rA8nL2RF36y0GU=.69c4d595-42a0-4f8f-8c1e-1be8fdfde07d@github.com> On Thu, 17 Feb 2022 17:48:16 GMT, Roman Kennke wrote: > This change removes the dedicated Klass* word in the oop structure, at least in 64 bit builds. 
It has been unused, except for verification, since integration of #29. > > The change makes regular objects smaller, the field layouter can now use the extra space and start laying out fields at offset 8 (instead of 12 or 16, depending on class-compression). However, arrays remain unchanged because they still store the array length in offset 8, and current array code expects the array elements to start at 64bit aligned address, even for smaller data types. I will address this in a subsequent change. > > Much of the change is reverting the klass_offset -> nklass_offset change by #29, as promised. I also disabled s390x and ppc64le GHA builds, because they are broken by this change. > > I problem-listed a number of SA tests. The SA is broken by this change because it needs to load the Klass* from a regular field. I tried to change it, but it looks very difficult: it doesn't have a way to safely access the object mark word in the face of concurrent activity by the locking subsystem. Fixing the SA is not very high priority at this point, it may ultimately not even be necessary when/if the locking subsystem doesn't use the mark word anymore (or at least, does no longer displace the header). > > Testing: > - [x] tier1 (x86_64, x86_32, aarch64) > - [x] tier2 (x86_64, x86_32, aarch64) > - [ ] tier3 (x86_64, x86_32, aarch64) This pull request has now been integrated.
Changeset: 35980384 Author: Roman Kennke URL: https://git.openjdk.java.net/lilliput/commit/35980384ef2b78b38190233bb2a54aeb21f12abe Stats: 259 lines in 45 files changed: 62 ins; 129 del; 68 mod Eliminate Klass* word Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/lilliput/pull/40 From roland at openjdk.java.net Mon Feb 7 14:40:41 2022 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 07 Feb 2022 14:40:41 -0000 Subject: [master] RFR: Load Klass* from header, C2 implementation In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 17:39:43 GMT, Roman Kennke wrote: > This implements loading the compressed Klass* from the object header in C2. > This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0. > > I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*. > > We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register. > > Testing: > - [x] tier1 (x86_64, aarch64) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) Have you considered doing this in the IR graph instead of in platform dependent code? Maybe in Compile::final_graph_reshaping_impl() converting LoadNKlass to the markword load + logic to extract the class. Or do you think there's not much value in that approach? The current implementation is ok AFAICT. 
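As a rough illustration of the "markword load + logic to extract the class" discussed above, written as plain C++ rather than C2 IR or assembler: load the 64-bit header from offset 0, check the lock bits, and take the upper 32 bits as the narrow Klass when the object is unlocked. This is only a sketch under stated assumptions. The constants (lock mask 0b11, unlocked value 0b01, shift of 32) are assumed here, and `load_narrow_klass` and `StableHeaderFn` are hypothetical names, with the callback standing in for the runtime call that returns a stable header.

```cpp
#include <cstdint>
#include <functional>

// Assumed bit layout, for illustration only.
const std::uint64_t kLockMask         = 0x3;  // low two bits hold the lock state
const std::uint64_t kUnlockedValue    = 0x1;  // 0b01: object is unlocked
const unsigned      kNarrowKlassShift = 32;   // narrow Klass in the upper half

// Hypothetical stand-in for the runtime call that returns a stable header
// when the mark word may currently be displaced by locking.
using StableHeaderFn = std::function<std::uint64_t(const std::uint64_t*)>;

// Sketch of the load: 'header' points at offset 0 of the object.
inline std::uint32_t load_narrow_klass(const std::uint64_t* header,
                                       const StableHeaderFn& slow_path) {
  std::uint64_t mark = *header;  // load the whole mark word from offset 0
  if ((mark & kLockMask) != kUnlockedValue) {
    // Slow path: the header in the object cannot be trusted to hold the
    // narrow Klass right now, so ask for a stable copy.
    mark = slow_path(header);
  }
  // Fast path falls through: the upper 32 bits are the narrow Klass.
  return static_cast<std::uint32_t>(mark >> kNarrowKlassShift);
}
```

Whether this logic lives in the matcher rules or in final graph reshaping, the shape of the check would presumably stay the same; only the place where it is expanded changes.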
------------- PR: https://git.openjdk.java.net/lilliput/pull/29 From roland at openjdk.java.net Mon Feb 7 15:01:59 2022 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 07 Feb 2022 15:01:59 -0000 Subject: [master] RFR: Load Klass* from header, C2 implementation In-Reply-To: References:

Message-ID: On Mon, 7 Feb 2022 14:47:14 GMT, Roman Kennke wrote: > It should not make a measurable difference perf-wise, I think. Right. ------------- PR: https://git.openjdk.java.net/lilliput/pull/29