From rkennke at openjdk.org Tue Jan 21 11:41:47 2025
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 21 Jan 2025 11:41:47 GMT
Subject: [master] RFR: 8346011: [Lilliput] Compact Full-GC Forwarding [v3]
In-Reply-To:
References:
Message-ID: <78re__Z2m3TNvFy9TZewfl8yOcBN-JqowUxDevwXVyo=.8e62f45c-dd7e-4c97-990f-29951b5820bc@github.com>

> The current forwarding scheme that is used during full-GC overwrites much of the mark-word. This is a problem for current compact headers when the heap is larger than 8 TB (in which case we currently disable compact headers altogether) and becomes a much worse issue with 4-byte compact headers, because it would overwrite the compressed class-pointer in the upper header bits.
>
> This implementation uses a side-table (currently 1/512th of the heap size) to store part of the target addresses, and 9 bits in the header for the rest of it. For the full description of the algorithm, see the top of fullGCForwarding.hpp.
>
> Some performance testing results: https://gist.github.com/rkennke/5a53d21337fc6e696041062d6b972dd6
> Interpretation: The new full-GC forwarding is slightly slower across the board. The difference is almost entirely a constant ~10ms. This corresponds to the setup costs of allocating and clearing the side-table. That cost could be reduced by allocating and clearing the table up-front, when the GC gets initialized, and then only ever clearing the table when full-GC is finished. That brings down the numbers to almost exactly baseline level, but I think we wouldn't want to hold on to that memory all the time, especially not in G1 and Shenandoah, where full-GC is an exceptional mode that is not intended to run frequently.
>
> Testing:
> - [x] hotspot_gc
> - [x] tier1
> - [x] tier1 +UseSerialGC
> - [x] tier1 +UseParallelGC
> - [x] tier1 +UseShenandoahGC

Roman Kennke has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8346011: [Lilliput] Compact Full-GC Forwarding ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/191/files - new: https://git.openjdk.org/lilliput/pull/191/files/42ca1c80..28a662cc Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=191&range=02 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=191&range=01-02 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/lilliput/pull/191.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/191/head:pull/191 PR: https://git.openjdk.org/lilliput/pull/191 From stuefe at openjdk.org Tue Jan 21 12:18:07 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 Jan 2025 12:18:07 GMT Subject: [master] RFR: 8346011: [Lilliput] Compact Full-GC Forwarding [v3] In-Reply-To: <78re__Z2m3TNvFy9TZewfl8yOcBN-JqowUxDevwXVyo=.8e62f45c-dd7e-4c97-990f-29951b5820bc@github.com> References: <78re__Z2m3TNvFy9TZewfl8yOcBN-JqowUxDevwXVyo=.8e62f45c-dd7e-4c97-990f-29951b5820bc@github.com> Message-ID: On Tue, 21 Jan 2025 11:41:47 GMT, Roman Kennke wrote: >> The current forwarding scheme that is used during full-GC overrides much of the mark-word. This is a problem for current compact headers when the heap is larger than 8TB (in which case we currently disable compact headers altogether) and becomes a much worse issue with 4-byte-compact-headers, because it would override the compressed class-pointer in the upper header bits. >> >> This implementation uses a side-table (currently 1/512th of the heap size) to store part of the target addresses, and 9 bits in the header for the rest of it. For the full description of the algorithm, see top of fullGCForwarding.hpp. 
>> >> Some performance testing results: https://gist.github.com/rkennke/5a53d21337fc6e696041062d6b972dd6 >> Interpretation: The new full-GC forwarding is slightly slower across the board. The different is almost completely a constant ~10ms. This corresponds to the setup costs of allocating and clearing the side-table. That cost could be reduced by allocating and clearing the table up-front, when the GC gets initialized, and then only ever clearing the table when full-GC is finished. That brings down the numbers to almost exactly baseline level, but I think we wouldn't want to hold on to that memory all the time, especially not in G1 and Shenandoah, where full-GC is an exceptional mode that is not intended to run frequently. >> >> Testing: >> - [x] hotspot_gc >> - [x] tier1 >> - [x] tier1 +UseSerialGC >> - [x] tier1 +UseParallelGC >> - [x] tier1 +UseShenandoahGC > > Roman Kennke has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8346011: [Lilliput] Compact Full-GC Forwarding Looks reasonable. src/hotspot/share/gc/shared/fullGCForwarding.hpp line 140: > 138: // block. > 139: static const uintptr_t FALLBACK_PATTERN = right_n_bits(NUM_OFFSET_BITS); > 140: static const uintptr_t FALLBACK_PATTERN_IN_PLACE = FALLBACK_PATTERN << OFFSET_BITS_SHIFT; I would make all constants constexpr ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/lilliput/pull/191#pullrequestreview-2564414971 PR Review Comment: https://git.openjdk.org/lilliput/pull/191#discussion_r1923603764 From rkennke at openjdk.org Tue Jan 21 12:51:25 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 21 Jan 2025 12:51:25 GMT Subject: [master] RFR: 8346011: [Lilliput] Compact Full-GC Forwarding [v4] In-Reply-To: References: Message-ID: <3zi-U5f1dbmvGuqj_rBOLuL5Pf9pc3G5tBK5DXyBoO4=.fc7b8861-5fcc-4b0c-92aa-69696484fa73@github.com> > The current forwarding scheme that is used during full-GC overrides much of the mark-word. This is a problem for current compact headers when the heap is larger than 8TB (in which case we currently disable compact headers altogether) and becomes a much worse issue with 4-byte-compact-headers, because it would override the compressed class-pointer in the upper header bits. > > This implementation uses a side-table (currently 1/512th of the heap size) to store part of the target addresses, and 9 bits in the header for the rest of it. For the full description of the algorithm, see top of fullGCForwarding.hpp. > > Some performance testing results: https://gist.github.com/rkennke/5a53d21337fc6e696041062d6b972dd6 > Interpretation: The new full-GC forwarding is slightly slower across the board. The different is almost completely a constant ~10ms. This corresponds to the setup costs of allocating and clearing the side-table. That cost could be reduced by allocating and clearing the table up-front, when the GC gets initialized, and then only ever clearing the table when full-GC is finished. That brings down the numbers to almost exactly baseline level, but I think we wouldn't want to hold on to that memory all the time, especially not in G1 and Shenandoah, where full-GC is an exceptional mode that is not intended to run frequently. 
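The numbers in the description, together with the constexpr suggestion from the review, can be made concrete with a small sketch. This is not the code from fullGCForwarding.hpp: low_bits_mask is a hypothetical stand-in for right_n_bits (assumed to return a mask of the n lowest bits), the value of OFFSET_BITS_SHIFT is a placeholder picked for illustration, and side_table_bytes is a made-up helper that only applies the 1/512 ratio quoted above.

```cpp
#include <cstdint>

// Hypothetical stand-in for right_n_bits(): a mask with the n lowest bits set.
constexpr std::uint64_t low_bits_mask(unsigned n) {
  return (std::uint64_t(1) << n) - 1;
}

// From the description: 9 header bits carry part of the forwarding target.
constexpr unsigned NUM_OFFSET_BITS = 9;

// ASSUMPTION: placeholder shift, chosen only so the example compiles.
// The real value is defined in fullGCForwarding.hpp.
constexpr unsigned OFFSET_BITS_SHIFT = 2;

// The two constants from the review comment, declared constexpr as suggested.
constexpr std::uint64_t FALLBACK_PATTERN = low_bits_mask(NUM_OFFSET_BITS);
constexpr std::uint64_t FALLBACK_PATTERN_IN_PLACE =
    FALLBACK_PATTERN << OFFSET_BITS_SHIFT;

// Hypothetical helper: side-table footprint at the quoted ratio of 1/512th
// of the heap size.
constexpr std::uint64_t side_table_bytes(std::uint64_t heap_bytes) {
  return heap_bytes / 512;
}

// With 9 bits, the all-ones pattern is 2^9 - 1.
static_assert(FALLBACK_PATTERN == 511, "all nine offset bits set");

// An 8 GiB heap would need a 16 MiB side table at this ratio.
static_assert(side_table_bytes(std::uint64_t(8) << 30) ==
                  (std::uint64_t(16) << 20),
              "8 GiB / 512 = 16 MiB");
```

One practical effect of constexpr here is that derived values can be checked with static_assert at compile time, as the last two statements do.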
> > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier1 +UseSerialGC > - [x] tier1 +UseParallelGC > - [x] tier1 +UseShenandoahGC Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make all constants constexpr ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/191/files - new: https://git.openjdk.org/lilliput/pull/191/files/28a662cc..e3604fa7 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=191&range=03 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=191&range=02-03 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/lilliput/pull/191.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/191/head:pull/191 PR: https://git.openjdk.org/lilliput/pull/191 From rkennke at openjdk.org Tue Jan 21 12:53:29 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 21 Jan 2025 12:53:29 GMT Subject: [pr/191] RFR: 8320761: [Lilliput] Implement compact identity hashcode Message-ID: This reimplements identity hash-code such that it allocates the space for the i-hash only on-demand. The idea is that most objects never get i-hashed, and therefore we don't want to penalize those objects which don't by carrying 32 unused bits for no reason. I'm proposing that we only reserve two bits in the header to track the state of an object: - One bit to indicate that an object has been i-hashed - Another bit to indicate that an object has been 'expanded' to hold the i-hash in a field. The transition would then be as follows: 1. Object starts out with state 00: not i-hashed, and not expanded. 2. The first time when Object.identityHashCode() is called, advance the state to 01 (i-hashed, but not expanded). Generate an i-hash based on the object's address, and return that. 3. Whenever Object.identityHashCode() is called in that state, and as long as that object stays at its address, keep generating the i-hash based on its address. 4. 
As soon as the object gets moved to a new address (by the GC), advance the state to 11 (i-hashed and expanded), generate the i-hash based on the original address one last time, and write it to the hidden field in the expanded object.
5. Whenever Object.identityHashCode() is called in that state, read the hidden field and return the i-hash that has been stored there.

Transitioning to the 'expanded' state does not necessarily mean that the object actually needs to be expanded. The offset of the hidden field for any class is computed at class-loading time. That field may fit into any gap in the field layout, including alignment gaps at the end of an object. Only when no such gap is found do we need to actually expand the object during GC. That appears to happen approximately 50% of the time.

For a discussion of why it is not a problem to expand objects during GC, see here: [https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode](https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode)

I ran some i-hash-heavy benchmarks (e.g. SPECjvm compiler benchmarks), and have not seen a performance difference. It would be nice to make some micro-benchmarks for the various scenarios (e.g. is it faster to compute the i-hash or read it from memory?), and compare that with the old implementation, but I haven't done that yet. I will do so in a follow-up PR.
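The five transition steps above can be read as a small state machine. The following is a minimal sketch of just the transitions, not the HotSpot implementation: the type and function names are hypothetical, and it does not model how the hash is computed from the address or where the hidden field lives.

```cpp
#include <cstdint>

// The two header bits from the description. Only the values 00, 01 and 11
// come from the text; the enumerator names are hypothetical.
enum class HashState : std::uint8_t {
  NotHashed      = 0,  // 00: not i-hashed, not expanded
  Hashed         = 1,  // 01: i-hashed, hash still derived from the address
  HashedExpanded = 3   // 11: i-hashed, hash stored in the hidden field
};

// Steps 2 and 3: a request for the identity hash marks an unhashed object as
// hashed and leaves every other state unchanged.
constexpr HashState on_identity_hash(HashState s) {
  return s == HashState::NotHashed ? HashState::Hashed : s;
}

// Step 4: when the GC moves an object that is hashed but not yet expanded,
// the hash is written to the hidden field. Objects in the other two states
// keep their state when they move.
constexpr HashState on_gc_move(HashState s) {
  return s == HashState::Hashed ? HashState::HashedExpanded : s;
}

// Step 5: only the expanded state reads the hash from the hidden field.
constexpr bool reads_hidden_field(HashState s) {
  return s == HashState::HashedExpanded;
}

// Hashing an object and then moving it ends in state 11.
static_assert(on_gc_move(on_identity_hash(HashState::NotHashed)) ==
                  HashState::HashedExpanded,
              "hash, then move, ends in the expanded state");
```

Note that an object that is never hashed stays in state 00 no matter how often it moves, which is the case the proposal optimizes for.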
Testing: - tier1 (-UCOH) - tier2 (-UCOH) - tier1 (+UCOH) - tier2 (+UCOH) ------------- Depends on: https://git.openjdk.org/lilliput/pull/191 Commit messages: - Merge branch 'JDK-8346011' into JDK-8320761 - 8320761: [Lilliput] Implement compact identity hashcode Changes: https://git.openjdk.org/lilliput/pull/192/files Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=192&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320761 Stats: 889 lines in 65 files changed: 725 ins; 16 del; 148 mod Patch: https://git.openjdk.org/lilliput/pull/192.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/192/head:pull/192 PR: https://git.openjdk.org/lilliput/pull/192 From rkennke at openjdk.org Tue Jan 21 13:16:11 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 21 Jan 2025 13:16:11 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers Message-ID: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Packs/reduces header bits to just 4 bytes. This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. 
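The "~500,000 classes" figure follows directly from the bit count. A quick sketch of the arithmetic, assuming every 19-bit pattern is usable as a class encoding (the description does not say whether any values are reserved, so this is an upper bound); the constant and function names are made up for illustration.

```cpp
#include <cstdint>

constexpr unsigned HEADER_BITS = 32;  // a 4-byte header
constexpr unsigned KLASS_BITS  = 19;  // from the description above

// Number of distinct values an n-bit field can encode.
constexpr std::uint64_t encodable_values(unsigned bits) {
  return std::uint64_t(1) << bits;
}

// Upper bound on the number of distinct class encodings.
constexpr std::uint64_t MAX_KLASSES = encodable_values(KLASS_BITS);

// Bits left in the 4-byte header for everything other than the class.
constexpr unsigned REMAINING_BITS = HEADER_BITS - KLASS_BITS;

static_assert(MAX_KLASSES == 524288, "2^19 encodings, i.e. roughly 500,000");
static_assert(REMAINING_BITS == 13, "32 - 19 bits remain");
```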
Testing: - tier1 (-UCOH) - tier2 (-UCOH) - tier1 (+UCOH) - tier2 (+UCOH) ------------- Commit messages: - Merge branch 'JDK-8320761' into JDK-8347710 - Merge branch 'JDK-8346011' into JDK-8320761 - Make all constants constexpr - 8347710: [Lilliput] Implement 4 byte headers - 8320761: [Lilliput] Implement compact identity hashcode - 8346011: [Lilliput] Compact Full-GC Forwarding Changes: https://git.openjdk.org/lilliput/pull/193/files Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347710 Stats: 2295 lines in 132 files changed: 1312 ins; 667 del; 316 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From stuefe at openjdk.org Tue Jan 21 14:43:54 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 Jan 2025 14:43:54 GMT Subject: [pr/191] RFR: 8320761: [Lilliput] Implement compact identity hashcode In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 15:07:01 GMT, Roman Kennke wrote: > This reimplements identity hash-code such that it allocates the space for the i-hash only on-demand. The idea is that most objects never get i-hashed, and therefore we don't want to penalize those objects which don't by carrying 32 unused bits for no reason. > > I'm proposing that we only reserve two bits in the header to track the state of an object: > > - One bit to indicate that an object has been i-hashed > - Another bit to indicate that an object has been 'expanded' to hold the i-hash in a field. > > The transition would then be as follows: > 1. Object starts out with state 00: not i-hashed, and not expanded. > 2. The first time when Object.identityHashCode() is called, advance the state to 01 (i-hashed, but not expanded). Generate an i-hash based on the object's address, and return that. > 3. 
Whenever Object.identityHashCode() is called in that state, and as long as that object stays at its address, keep generating the i-hash based on its address.
> 4. As soon as the object gets moved to a new address (by the GC), advance the state to 11 (i-hashed and expanded), generate the i-hash based on the original address one last time, and write it to the hidden field in the expanded object.
> 5. Whenever Object.identityHashCode() is called in that state, read the hidden field and return the i-hash that has been stored there.
>
> Transitioning to the 'expanded' state does not necessarily mean that the object actually needs to be expanded. The offset of the hidden field for any class is computed at class-loading time. That field may fit into any gap in the field layout, including alignment gaps at the end of an object. Only when no such gap is found do we need to actually expand the object during GC. That appears to happen approximately 50% of the time.
>
> For a discussion of why it is not a problem to expand objects during GC, see here: [https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode](https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode)
>
> One caveat is that ParallelGC is currently not supported. Parallel's full-GC precalculates block sizes based on the assumption that objects don't grow. I will propose a fix for this as a follow-up, but it's likely a somewhat complex change.
>
> I ran some i-hash-heavy benchmarks (e.g. SPECjvm compiler benchmarks), and have not seen a performance difference. It would be nice to make some micro-benchmarks for the various scenarios (e.g. is it faster to compute the i-hash or read it from memory?), and compare that with the ...

Marked as reviewed by stuefe (Reviewer).
------------- PR Review: https://git.openjdk.org/lilliput/pull/192#pullrequestreview-2564843877 From stuefe at openjdk.org Tue Jan 21 14:44:47 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 Jan 2025 14:44:47 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: On Tue, 21 Jan 2025 13:11:53 GMT, Roman Kennke wrote: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. > > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Marked as reviewed by stuefe (Reviewer). ------------- PR Review: https://git.openjdk.org/lilliput/pull/193#pullrequestreview-2564845254 From rkennke at openjdk.org Wed Jan 22 11:48:55 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 11:48:55 GMT Subject: [master] RFR: 8346011: [Lilliput] Compact Full-GC Forwarding [v4] In-Reply-To: <3zi-U5f1dbmvGuqj_rBOLuL5Pf9pc3G5tBK5DXyBoO4=.fc7b8861-5fcc-4b0c-92aa-69696484fa73@github.com> References: <3zi-U5f1dbmvGuqj_rBOLuL5Pf9pc3G5tBK5DXyBoO4=.fc7b8861-5fcc-4b0c-92aa-69696484fa73@github.com> Message-ID: On Tue, 21 Jan 2025 12:51:25 GMT, Roman Kennke wrote: >> The current forwarding scheme that is used during full-GC overrides much of the mark-word. This is a problem for current compact headers when the heap is larger than 8TB (in which case we currently disable compact headers altogether) and becomes a much worse issue with 4-byte-compact-headers, because it would override the compressed class-pointer in the upper header bits. >> >> This implementation uses a side-table (currently 1/512th of the heap size) to store part of the target addresses, and 9 bits in the header for the rest of it. 
For the full description of the algorithm, see top of fullGCForwarding.hpp. >> >> Some performance testing results: https://gist.github.com/rkennke/5a53d21337fc6e696041062d6b972dd6 >> Interpretation: The new full-GC forwarding is slightly slower across the board. The different is almost completely a constant ~10ms. This corresponds to the setup costs of allocating and clearing the side-table. That cost could be reduced by allocating and clearing the table up-front, when the GC gets initialized, and then only ever clearing the table when full-GC is finished. That brings down the numbers to almost exactly baseline level, but I think we wouldn't want to hold on to that memory all the time, especially not in G1 and Shenandoah, where full-GC is an exceptional mode that is not intended to run frequently. >> >> Testing: >> - [x] hotspot_gc >> - [x] tier1 >> - [x] tier1 +UseSerialGC >> - [x] tier1 +UseParallelGC >> - [x] tier1 +UseShenandoahGC > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make all constants constexpr Thanks! ------------- PR Comment: https://git.openjdk.org/lilliput/pull/191#issuecomment-2607028019 From rkennke at openjdk.org Wed Jan 22 11:48:55 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 11:48:55 GMT Subject: [master] Integrated: 8346011: [Lilliput] Compact Full-GC Forwarding In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 15:46:08 GMT, Roman Kennke wrote: > The current forwarding scheme that is used during full-GC overrides much of the mark-word. This is a problem for current compact headers when the heap is larger than 8TB (in which case we currently disable compact headers altogether) and becomes a much worse issue with 4-byte-compact-headers, because it would override the compressed class-pointer in the upper header bits. 
> > This implementation uses a side-table (currently 1/512th of the heap size) to store part of the target addresses, and 9 bits in the header for the rest of it. For the full description of the algorithm, see top of fullGCForwarding.hpp. > > Some performance testing results: https://gist.github.com/rkennke/5a53d21337fc6e696041062d6b972dd6 > Interpretation: The new full-GC forwarding is slightly slower across the board. The different is almost completely a constant ~10ms. This corresponds to the setup costs of allocating and clearing the side-table. That cost could be reduced by allocating and clearing the table up-front, when the GC gets initialized, and then only ever clearing the table when full-GC is finished. That brings down the numbers to almost exactly baseline level, but I think we wouldn't want to hold on to that memory all the time, especially not in G1 and Shenandoah, where full-GC is an exceptional mode that is not intended to run frequently. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier1 +UseSerialGC > - [x] tier1 +UseParallelGC > - [x] tier1 +UseShenandoahGC This pull request has now been integrated. Changeset: 33d90b2c Author: Roman Kennke URL: https://git.openjdk.org/lilliput/commit/33d90b2ced7f07baab28846ce4afbcc132a31ab6 Stats: 548 lines in 14 files changed: 469 ins; 31 del; 48 mod 8346011: [Lilliput] Compact Full-GC Forwarding Reviewed-by: stuefe ------------- PR: https://git.openjdk.org/lilliput/pull/191 From rkennke at openjdk.org Wed Jan 22 11:54:44 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 11:54:44 GMT Subject: [master] RFR: 8320761: [Lilliput] Implement compact identity hashcode [v2] In-Reply-To: References: Message-ID: <8XLm3RiR_i7sNpUaVX1191KXW61P8abKK2MXm1FzGeQ=.e8858012-aced-4fbe-b0ea-b17c81fbdb66@github.com> > This reimplements identity hash-code such that it allocates the space for the i-hash only on-demand. 
The idea is that most objects never get i-hashed, and therefore we don't want to penalize those objects which don't by carrying 32 unused bits for no reason. > > I'm proposing that we only reserve two bits in the header to track the state of an object: > > - One bit to indicate that an object has been i-hashed > - Another bit to indicate that an object has been 'expanded' to hold the i-hash in a field. > > The transition would then be as follows: > 1. Object starts out with state 00: not i-hashed, and not expanded. > 2. The first time when Object.identityHashCode() is called, advance the state to 01 (i-hashed, but not expanded). Generate an i-hash based on the object's address, and return that. > 3. Whenever Object.identityHashCode() is called in that state, and as long as that object stays at its address, keep generating the i-hash based on its address. > 4. As soon as the object gets moved to a new address (by the GC), advance the state to 11 (i-hashed and expanded), generate the i-hash based on the original address one last time, and write it to the hidden field in the expanded object. > 5. Whenever Object.identityHashCode() is called in that state, read the hidden field and return the i-hashed that has been stored there. > > Transitioning to the 'expanded' state does not necessarily mean that the object actually needs to be expanded. The offset of the hidden field for any class is computed at class-loading-time. That field may fit into any gaps in the field layout, including alignment gaps at the end of an object. Only when no such gaps are found, we need to actually expand the object during GC. That appears to happen approx 50% of the time. > > For a discussion why it is not a problem to expand objects during GC, see here: [https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode](https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode) > > One caveat is that ParallelGC is currently does not supported. 
Parallel's full-GC precalculates block sizes based on the assumption that objects don't grow. I will propose a fix for this as a follow-up, but it's likely a somewhat complex change. > > I ran some I-hash heavy benchmarks (e.g. SPECjvm compiler benchmarks), and have not seen a performance difference. It would be nice to make some micro-benchmarks for the various scenarios (e.g. is it faster to compute the i-hash or read it from memory?), and compare that with the ... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'JDK-8346011' into JDK-8320761 - Make all constants constexpr ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/192/files - new: https://git.openjdk.org/lilliput/pull/192/files/5d693277..5d693277 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=192&range=01 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=192&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/lilliput/pull/192.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/192/head:pull/192 PR: https://git.openjdk.org/lilliput/pull/192 From rkennke at openjdk.org Wed Jan 22 12:07:16 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 12:07:16 GMT Subject: [master] RFR: 8320761: [Lilliput] Implement compact identity hashcode [v3] In-Reply-To: References: Message-ID: <2vkf2zsP2rLDLBc1zt5_9nu7IAV_Q47xFus-aaBPJfk=.505bc92b-9b4b-465f-87a7-050b222660bd@github.com> > This reimplements identity hash-code such that it allocates the space for the i-hash only on-demand. The idea is that most objects never get i-hashed, and therefore we don't want to penalize those objects which don't by carrying 32 unused bits for no reason. 
> > I'm proposing that we only reserve two bits in the header to track the state of an object: > > - One bit to indicate that an object has been i-hashed > - Another bit to indicate that an object has been 'expanded' to hold the i-hash in a field. > > The transition would then be as follows: > 1. Object starts out with state 00: not i-hashed, and not expanded. > 2. The first time when Object.identityHashCode() is called, advance the state to 01 (i-hashed, but not expanded). Generate an i-hash based on the object's address, and return that. > 3. Whenever Object.identityHashCode() is called in that state, and as long as that object stays at its address, keep generating the i-hash based on its address. > 4. As soon as the object gets moved to a new address (by the GC), advance the state to 11 (i-hashed and expanded), generate the i-hash based on the original address one last time, and write it to the hidden field in the expanded object. > 5. Whenever Object.identityHashCode() is called in that state, read the hidden field and return the i-hashed that has been stored there. > > Transitioning to the 'expanded' state does not necessarily mean that the object actually needs to be expanded. The offset of the hidden field for any class is computed at class-loading-time. That field may fit into any gaps in the field layout, including alignment gaps at the end of an object. Only when no such gaps are found, we need to actually expand the object during GC. That appears to happen approx 50% of the time. > > For a discussion why it is not a problem to expand objects during GC, see here: [https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode](https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode) > > One caveat is that ParallelGC is currently does not supported. Parallel's full-GC precalculates block sizes based on the assumption that objects don't grow. I will propose a fix for this as a follow-up, but it's likely a somewhat complex change. 
> > I ran some I-hash heavy benchmarks (e.g. SPECjvm compiler benchmarks), and have not seen a performance difference. It would be nice to make some micro-benchmarks for the various scenarios (e.g. is it faster to compute the i-hash or read it from memory?), and compare that with the ... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Cast size_t to int ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/192/files - new: https://git.openjdk.org/lilliput/pull/192/files/5d693277..42b53dae Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=192&range=02 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=192&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/lilliput/pull/192.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/192/head:pull/192 PR: https://git.openjdk.org/lilliput/pull/192 From rkennke at openjdk.org Wed Jan 22 12:15:34 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 12:15:34 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v2] In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. 
> > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8320761' into JDK-8347710 - Cast size_t to int ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/193/files - new: https://git.openjdk.org/lilliput/pull/193/files/a1c5b29f..cc978ad3 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=01 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From rkennke at openjdk.org Wed Jan 22 13:28:16 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 13:28:16 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v3] In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. 
> > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix 32 bit build ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/193/files - new: https://git.openjdk.org/lilliput/pull/193/files/cc978ad3..54c5fe38 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=02 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=01-02 Stats: 8 lines in 4 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From rkennke at openjdk.org Wed Jan 22 14:53:27 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 14:53:27 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v4] In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. 
> > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove some override that cause build failures ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/193/files - new: https://git.openjdk.org/lilliput/pull/193/files/54c5fe38..3ed5d1b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=03 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From rkennke at openjdk.org Wed Jan 22 16:46:16 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 16:46:16 GMT Subject: [master] RFR: 8320761: [Lilliput] Implement compact identity hashcode [v3] In-Reply-To: <2vkf2zsP2rLDLBc1zt5_9nu7IAV_Q47xFus-aaBPJfk=.505bc92b-9b4b-465f-87a7-050b222660bd@github.com> References: <2vkf2zsP2rLDLBc1zt5_9nu7IAV_Q47xFus-aaBPJfk=.505bc92b-9b4b-465f-87a7-050b222660bd@github.com> Message-ID: <26kMi2TYoXreXDPfN6li0nkDFdMZfru3BpSVIzvk0Ds=.dead2ea5-c620-415e-83c8-ac517e2e440c@github.com> On Wed, 22 Jan 2025 12:07:16 GMT, Roman Kennke wrote: >> This reimplements identity hash-code such that it allocates the space for the i-hash only on-demand. The idea is that most objects never get i-hashed, and therefore we don't want to penalize those objects which don't by carrying 32 unused bits for no reason. >> >> I'm proposing that we only reserve two bits in the header to track the state of an object: >> >> - One bit to indicate that an object has been i-hashed >> - Another bit to indicate that an object has been 'expanded' to hold the i-hash in a field. >> >> The transition would then be as follows: >> 1. Object starts out with state 00: not i-hashed, and not expanded. 
>> 2. The first time Object.identityHashCode() is called, advance the state to 01 (i-hashed, but not expanded). Generate an i-hash based on the object's address, and return that. >> 3. Whenever Object.identityHashCode() is called in that state, and as long as that object stays at its address, keep generating the i-hash based on its address. >> 4. As soon as the object gets moved to a new address (by the GC), advance the state to 11 (i-hashed and expanded), generate the i-hash based on the original address one last time, and write it to the hidden field in the expanded object. >> 5. Whenever Object.identityHashCode() is called in that state, read the hidden field and return the i-hash that has been stored there. >> >> Transitioning to the 'expanded' state does not necessarily mean that the object actually needs to be expanded. The offset of the hidden field for any class is computed at class-loading time. That field may fit into any gaps in the field layout, including alignment gaps at the end of an object. Only when no such gap is found do we need to actually expand the object during GC. That appears to happen approximately 50% of the time. >> >> For a discussion of why it is not a problem to expand objects during GC, see here: [https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode](https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode) >> >> One caveat is that ParallelGC is currently not supported. Parallel's full-GC precalculates block sizes based on the assumption that objects don't grow. I will propose a fix for this as a follow-up, but it's likely a somewhat complex change. >> >> I ran some i-hash-heavy benchmarks (e.g. SPECjvm compiler benchmarks), and have not seen a performance difference. It would be nice to make some micro-benchmarks for the various scenarios (e.g. is it faster to compute the i-hash or read it... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Cast size_t to int Thanks! ------------- PR Comment: https://git.openjdk.org/lilliput/pull/192#issuecomment-2607739612 From rkennke at openjdk.org Wed Jan 22 16:46:16 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Jan 2025 16:46:16 GMT Subject: [master] Integrated: 8320761: [Lilliput] Implement compact identity hashcode In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 15:07:01 GMT, Roman Kennke wrote: > This reimplements identity hash-code such that it allocates the space for the i-hash only on-demand. The idea is that most objects never get i-hashed, and therefore we don't want to penalize those objects which don't by carrying 32 unused bits for no reason. > > I'm proposing that we only reserve two bits in the header to track the state of an object: > > - One bit to indicate that an object has been i-hashed > - Another bit to indicate that an object has been 'expanded' to hold the i-hash in a field. > > The transition would then be as follows: > 1. Object starts out with state 00: not i-hashed, and not expanded. > 2. The first time Object.identityHashCode() is called, advance the state to 01 (i-hashed, but not expanded). Generate an i-hash based on the object's address, and return that. > 3. Whenever Object.identityHashCode() is called in that state, and as long as that object stays at its address, keep generating the i-hash based on its address. > 4. As soon as the object gets moved to a new address (by the GC), advance the state to 11 (i-hashed and expanded), generate the i-hash based on the original address one last time, and write it to the hidden field in the expanded object. > 5. Whenever Object.identityHashCode() is called in that state, read the hidden field and return the i-hash that has been stored there. 
> > Transitioning to the 'expanded' state does not necessarily mean that the object actually needs to be expanded. The offset of the hidden field for any class is computed at class-loading time. That field may fit into any gaps in the field layout, including alignment gaps at the end of an object. Only when no such gap is found do we need to actually expand the object during GC. That appears to happen approximately 50% of the time. > > For a discussion of why it is not a problem to expand objects during GC, see here: [https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode](https://wiki.openjdk.org/display/lilliput/Compact+Identity+Hashcode) > > One caveat is that ParallelGC is currently not supported. Parallel's full-GC precalculates block sizes based on the assumption that objects don't grow. I will propose a fix for this as a follow-up, but it's likely a somewhat complex change. > > I ran some i-hash-heavy benchmarks (e.g. SPECjvm compiler benchmarks), and have not seen a performance difference. It would be nice to make some micro-benchmarks for the various scenarios (e.g. is it faster to compute the i-hash or read it from memory?), and compare that with the ... This pull request has now been integrated. 
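For readers following the thread, the five-step transition in the description above can be modelled roughly as below. This is only a sketch under stated assumptions: `HashState`, `ModelObject`, `hash_from_address`, `identity_hash` and `gc_move` are hypothetical names made up for illustration, the address-based hash is a placeholder, and none of this is taken from the HotSpot sources. It also ignores the distinction between reusing a layout gap and physically growing the object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two header bits from the description.
// Names are illustrative only; they are not HotSpot identifiers.
enum class HashState : uint8_t {
  NotHashed         = 0b00,  // not i-hashed, not expanded
  Hashed            = 0b01,  // i-hashed, hash still derived from the address
  HashedAndExpanded = 0b11   // i-hashed, hash stored in the hidden field
};

struct ModelObject {
  uintptr_t address = 0;                   // simulated current address
  HashState state = HashState::NotHashed;
  uint32_t  hidden_hash = 0;               // stands in for the hidden field
};

// Placeholder address-based hash (an assumption; the real function is not
// given in the thread). It only needs to be deterministic for this sketch.
inline uint32_t hash_from_address(uintptr_t addr) {
  uint64_t x = static_cast<uint64_t>(addr) * 0x9E3779B97F4A7C15ull;
  return static_cast<uint32_t>(x >> 32);
}

// Steps 2, 3 and 5: behaviour of an identityHashCode() call in each state.
inline uint32_t identity_hash(ModelObject& o) {
  switch (o.state) {
    case HashState::NotHashed:
      o.state = HashState::Hashed;          // 00 -> 01 on first call
      return hash_from_address(o.address);
    case HashState::Hashed:
      return hash_from_address(o.address);  // recompute while object is unmoved
    case HashState::HashedAndExpanded:
      return o.hidden_hash;                 // read the stored value
  }
  return 0;  // not reached; keeps compilers quiet
}

// Step 4: the GC moves the object to a new address.
inline void gc_move(ModelObject& o, uintptr_t new_address) {
  if (o.state == HashState::Hashed) {
    o.hidden_hash = hash_from_address(o.address);  // last use of old address
    o.state = HashState::HashedAndExpanded;        // 01 -> 11
  }
  o.address = new_address;  // objects that were never hashed stay at 00
}
```

In this model the hash value observed before a move and after it is the same, and an object that was never hashed stays in state 00 when it is moved, so it never needs the hidden field.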
Changeset: 33f69df3 Author: Roman Kennke URL: https://git.openjdk.org/lilliput/commit/33f69df31c403e2e5c030e7b5a9f1b475cee74fb Stats: 889 lines in 65 files changed: 725 ins; 16 del; 148 mod 8320761: [Lilliput] Implement compact identity hashcode Reviewed-by: stuefe ------------- PR: https://git.openjdk.org/lilliput/pull/192 From rkennke at openjdk.org Thu Jan 23 19:04:47 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 23 Jan 2025 19:04:47 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v5] In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. > > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix CompressedClassPointersEncodingScheme.java ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/193/files - new: https://git.openjdk.org/lilliput/pull/193/files/3ed5d1b5..16702484 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=04 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From rkennke at openjdk.org Fri Jan 24 09:47:01 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 24 Jan 2025 09:47:01 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v6] In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: 
<3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. > > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Comment-out unimplemented RISCV parts ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/193/files - new: https://git.openjdk.org/lilliput/pull/193/files/16702484..81c71cac Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=05 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=04-05 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From rkennke at openjdk.org Fri Jan 24 15:08:35 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 24 Jan 2025 15:08:35 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v7] In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: <9RojHpoCw_jOGW0_RCFj5uXTB4EYLNv7PneOuFtQ68g=.37150841-c544-4ef5-9663-0ad228d9d1b8@github.com> > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. > > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: - Merge remote-tracking branch 'origin/master' into JDK-8347710 - Comment-out unimplemented RISCV parts - Fix CompressedClassPointersEncodingScheme.java - Remove some override that cause build failures - Fix 32 bit build - Merge branch 'JDK-8320761' into JDK-8347710 - Cast size_t to int - Merge branch 'JDK-8320761' into JDK-8347710 - Merge branch 'JDK-8346011' into JDK-8320761 - Make all constants constexpr - ... and 3 more: https://git.openjdk.org/lilliput/compare/33f69df3...b4b3e4f7 ------------- Changes: https://git.openjdk.org/lilliput/pull/193/files Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=06 Stats: 861 lines in 71 files changed: 109 ins; 620 del; 132 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From rkennke at openjdk.org Fri Jan 24 15:34:20 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 24 Jan 2025 15:34:20 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v8] In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. 
> > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix build ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/193/files - new: https://git.openjdk.org/lilliput/pull/193/files/b4b3e4f7..36f08820 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=07 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=193&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/lilliput/pull/193.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/193/head:pull/193 PR: https://git.openjdk.org/lilliput/pull/193 From rkennke at openjdk.org Fri Jan 24 20:39:19 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 24 Jan 2025 20:39:19 GMT Subject: [master] Integrated: 8347710: [Lilliput] Implement 4 byte headers In-Reply-To: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: On Tue, 21 Jan 2025 13:11:53 GMT, Roman Kennke wrote: > Packs/reduces header bits to just 4 bytes. > > This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. > > Testing: > > - tier1 (-UCOH) > - tier2 (-UCOH) > - tier1 (+UCOH) > - tier2 (+UCOH) This pull request has now been integrated. 
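As a quick check of the "~500,000 classes" figure quoted in the description: 19 bits can encode 2^19 = 524,288 distinct values. The snippet below only restates that arithmetic; the constant names are hypothetical and not taken from the patch, and the number of classes actually usable may be lower if some encodings are reserved.

```cpp
#include <cassert>
#include <cstdint>

// Number of narrow-Klass bits quoted in the PR description.
constexpr unsigned narrow_klass_bits = 19;

// Upper bound on the distinct values that fit into that many bits.
constexpr uint32_t max_encodable_classes = uint32_t{1} << narrow_klass_bits;

static_assert(max_encodable_classes == 524288,
              "2^19 distinct narrow Klass values");
```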
Changeset: 6eb5ce82 Author: Roman Kennke URL: https://git.openjdk.org/lilliput/commit/6eb5ce82a1feb6737fe31a99e927e36991a8a5bd Stats: 863 lines in 71 files changed: 109 ins; 620 del; 134 mod 8347710: [Lilliput] Implement 4 byte headers Reviewed-by: stuefe ------------- PR: https://git.openjdk.org/lilliput/pull/193 From aph at openjdk.org Mon Jan 27 10:37:08 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 Jan 2025 10:37:08 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v8] In-Reply-To: References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: On Fri, 24 Jan 2025 15:34:20 GMT, Roman Kennke wrote: >> Packs/reduces header bits to just 4 bytes. >> >> This reduces the number of Klass* bits to 19 bits, which allows for ~500,000 classes. >> >> Testing: >> >> - tier1 (-UCOH) >> - tier2 (-UCOH) >> - tier1 (+UCOH) >> - tier2 (+UCOH) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix build src/hotspot/cpu/aarch64/aarch64.ad line 6717: > 6715: %{ > 6716: match(Set dst (LoadNKlass mem)); > 6717: predicate(!needs_acquiring_load(n) && UseCompactObjectHeaders); Is there ever a time when we have an acquiring load on a narrow klass pointer? Also, do we ever generate code which loads a narrow klass pointer from something other than an Object? Maybe there's a use somewhere of `LoadNKlass` from somewhere other than `obj-start + klass_offset_in_bytes`? 
------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/193#discussion_r1930307977 From adinn at openjdk.org Mon Jan 27 11:02:16 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 27 Jan 2025 11:02:16 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v8] In-Reply-To: References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: On Mon, 27 Jan 2025 10:34:38 GMT, Andrew Haley wrote: > Is there ever a time when we have an acquiring load on a narrow klass pointer? That's a good question! This predicate was added to handle volatile instance field loads (there is a similar set of cases for volatile stores). I believe I unthinkingly allowed for the possibility of the acquire case for Klass and NKlass loads (likewise stores) when I first added the rules that used the predicate (I think I just blindly changed all the rules that did a load/store). At that stage the test was done by traversing the graph but even then it was probably always going to fail for a Klass load/store. In the current implementation the predicate is implemented by checking field `_mo` of the relevant `LoadNode`, which is set at node creation and does not appear ever to get reset. Looking at the opto sources it appears that `LoadKlassNode` and `LoadNKlassNode` only ever seem to be created with `_mo == unordered`. So, I think the Klass load/store rules which check for acquire/release semantics can be dropped. It would be worth seeking confirmation of that from @rwestrel who was the one who changed the implementation so it relied on the `_mo` fields in `Load/StoreNode`. 
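The reasoning in the reply above can be illustrated with a small stand-alone model. It is a deliberately simplified sketch, not the C2 code: `MemOrd`, `LoadNodeModel` and `make_narrow_klass_load` are hypothetical stand-ins, and the model simply assumes what the reply observes, namely that the ordering is fixed when the node is created and that klass loads are only ever created unordered.

```cpp
#include <cassert>

// Simplified stand-ins for illustration; these are NOT the C2 classes.
enum class MemOrd { unordered, acquire };

struct LoadNodeModel {
  MemOrd mo;  // fixed at construction and never reset in this model
  explicit constexpr LoadNodeModel(MemOrd m) : mo(m) {}
};

// Model of the predicate helper: true only for acquiring loads.
constexpr bool needs_acquiring_load(const LoadNodeModel& n) {
  return n.mo == MemOrd::acquire;
}

// Hypothetical factory encoding the assumption that narrow klass loads
// are only ever created with unordered memory ordering.
constexpr LoadNodeModel make_narrow_klass_load() {
  return LoadNodeModel(MemOrd::unordered);
}

// Under that assumption the `!needs_acquiring_load(n)` half of the
// predicate is always true for klass loads, i.e. it is redundant.
static_assert(!needs_acquiring_load(make_narrow_klass_load()),
              "klass loads never need acquire semantics in this model");
```

If that assumption holds in the real sources, only the `UseCompactObjectHeaders` half of the predicate would ever decide which rule matches for a narrow klass load.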
------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/193#discussion_r1930339522 From rkennke at openjdk.org Mon Jan 27 12:12:04 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 27 Jan 2025 12:12:04 GMT Subject: [master] RFR: 8347710: [Lilliput] Implement 4 byte headers [v8] In-Reply-To: References: <3F1pXSToMPvaacJejJ-8CXQSORY-x4WIDAeT26g1gqE=.afdadc1c-3e5e-4be6-81e7-bbc31716da04@github.com> Message-ID: On Mon, 27 Jan 2025 10:57:58 GMT, Andrew Dinn wrote: > Is there ever a time when we have an acquiring load on a narrow klass pointer? I don't think so, but I copied it from the original loadNKlass code. > Also, do we ever generate code which loads a narrow klass pointer from something other than an Object? Maybe there's a use somewhere of `LoadNKlass` from somewhere other than `obj-start + klass_offset_in_bytes`? No, I don't think so, I've checked the C2 code for that. Any paths that load from non-objects are using LoadKlass. That said, I would love to have a cleaner way to load the (narrow) Klass* out of an object. There has been some discussion around that problem earlier: https://bugs.openjdk.org/browse/JDK-8340453 ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/193#discussion_r1930425913