From rcastanedalo at openjdk.org  Mon Sep  2 06:38:07 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 2 Sep 2024 06:38:07 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v12]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <k7DlJff0R897dvd7QdfElCAU3sjEqKXMVsedQIBHBSI=.3949cbbc-aecf-4ab0-a99e-1c4c54d3ce9d@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision:

 - Merge jdk-24+13
 - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type'
 - Remark relation between compiler optimization and barrier filter
 - Make 'refine_barrier_by_new_val_type' static and its input argument 'const'
 - Replace 'the null' with 'null' in comment
 - Remove redundant redefinitions of '__'
 - Replace 'already dirty' with 'young' in post-barrier fast path comment
 - Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names
 - Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP
 - Assert that no implicit null checks are generated for memory accesses with barriers
 - ... and 8 more: https://git.openjdk.org/jdk/compare/52ffcda1...4ee450ad

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/57adcfb0..4ee450ad

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=10-11

  Stats: 30577 lines in 938 files changed: 18592 ins; 8033 del; 3952 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Mon Sep  2 06:38:07 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 2 Sep 2024 06:38:07 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v11]
In-Reply-To: <ncax4vA_nYYtSkhxvx6yXmQfbjO6N3_5AwC3Hi9TULM=.bb20fb5f-e369-49f3-ba62-d73c49144cc0@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <zacvUd_7RrmTAvSAF5g8ffGtm0ecXt65yLkrXwkMmQo=.9a53e4c0-96bb-4404-bedf-a5dcbdbcd8f5@github.com>
 <ncax4vA_nYYtSkhxvx6yXmQfbjO6N3_5AwC3Hi9TULM=.bb20fb5f-e369-49f3-ba62-d73c49144cc0@github.com>
Message-ID: <xc3FCBvFeVTNpazc11pB-o4_v2Vu_FbkBe3r75M1B04=.4f0a5199-6655-46cb-9585-b6015805de23@github.com>

On Fri, 30 Aug 2024 13:23:32 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision:
>> 
>>  - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type'
>>  - Remark relation between compiler optimization and barrier filter
>>  - Make 'refine_barrier_by_new_val_type' static and its input argument 'const'
>>  - Replace 'the null' with 'null' in comment
>>  - Remove redundant redefinitions of '__'
>>  - Replace 'already dirty' with 'young' in post-barrier fast path comment
>
> risc-v port looks good too.

> OK, if there are no objections from @feilongjiang or @snazarkin within a couple of days I will prepare an update to jdk-24+13.

@TheRealMDoerr done (commit 4ee450a).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2323921726

From mbaesken at openjdk.org  Mon Sep  2 12:52:47 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 2 Sep 2024 12:52:47 GMT
Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails
 on ppc64 based platforms
Message-ID: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>

We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) was integrated.
AIX and Linux ppc64le show this error:

[ RUN ] CollectorPolicy.young_scaled_initial_ergo_vm
test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
Expected equality of these values:
  expected
    Which is: 44695552
  NewSize
    Which is: 41943040

test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
Expected: checker->execute() doesn't generate new fatal failures in the current thread.
  Actual: it does.

[ FAILED ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)

So the decrease from 80M to 40M was too much for these platforms (they differ slightly in ergo/startup behavior).
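
For readers comparing the two byte counts in the failure output, the following minimal sketch converts them to MiB. The helper name `to_mib` is hypothetical and is not part of the gtest; the sketch only shows that the reported `NewSize` is exactly 40M, while the value the test expected on these platforms is somewhat larger.

```cpp
#include <cassert>
#include <cstddef>

// One mebibyte in bytes.
constexpr std::size_t M = 1024 * 1024;

// Hypothetical helper: converts a byte count to MiB.
constexpr double to_mib(std::size_t bytes) {
  return static_cast<double>(bytes) / M;
}

// to_mib(41943040) is 40.0    (the reported NewSize)
// to_mib(44695552) is 42.625  (the value the test expected)
```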

-------------

Commit messages:
 - JDK-8339300

Changes: https://git.openjdk.org/jdk/pull/20820/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20820&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339300
  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20820.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20820/head:pull/20820

PR: https://git.openjdk.org/jdk/pull/20820

From duke at openjdk.org  Mon Sep  2 13:14:27 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Mon, 2 Sep 2024 13:14:27 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
Message-ID: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>

When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset).

At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).

These two operations might interfere, resulting in both threads clearing the memory simultaneously.

This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161 where I experimented with replacing some clears of remsets with free's and got a crash on Windows from memset when operating on free'd memory.

This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.

Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.
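
To illustrate the "clear only if not already handled" idea in isolation, here is a minimal sketch that uses an atomic claim flag so that at most one thread ever runs the memset. This is a generic pattern shown under assumptions, not the mechanism used in the patch: the class `RemsetSketch` and all of its members are hypothetical and do not correspond to ZGC data structures.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a page's remembered set; not ZGC code.
class RemsetSketch {
  std::vector<unsigned char> _bits;
  std::atomic<bool> _claimed{false};  // set by whichever collection handles the remset first

public:
  // Start with all bits set so that a clear is observable.
  explicit RemsetSketch(std::size_t bytes) : _bits(bytes, 0xFF) {}

  // Returns true only for the first caller; later callers must not touch the memory.
  bool try_claim() {
    bool expected = false;
    return _claimed.compare_exchange_strong(expected, true);
  }

  // Clears the bits if, and only if, this caller won the claim.
  bool clear_if_unclaimed() {
    if (!try_claim()) {
      return false;  // already handled by the other collection
    }
    std::memset(_bits.data(), 0, _bits.size());
    return true;
  }

  // Returns true when every byte of the remset is zero.
  bool is_clear() const {
    for (unsigned char b : _bits) {
      if (b != 0) return false;
    }
    return true;
  }
};
```

In this sketch the young collection's "concurrent mark" phase and the old collection would both call `clear_if_unclaimed()`; whichever arrives second sees `false` and skips the memset, so the two clears can no longer overlap.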

-------------

Commit messages:
 - 8339163: Race in clearing of remembered sets

Changes: https://git.openjdk.org/jdk/pull/20821/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20821&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339163
  Stats: 26 lines in 2 files changed: 14 ins; 9 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20821.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20821/head:pull/20821

PR: https://git.openjdk.org/jdk/pull/20821

From stefank at openjdk.org  Mon Sep  2 13:31:17 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 2 Sep 2024 13:31:17 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
References: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
Message-ID: <KYZHOCj7EwLNGG_dGq4xsSCIy4hjfF5OeV9OR8Mzh84=.627bacb8-ecb6-4982-9b89-da37b7e04ad3@github.com>

On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström <duke at openjdk.org> wrote:

> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset).
> 
> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
> 
> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
> 
> This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161 where I experimented with replacing some clears of remsets with free's and got a crash on Windows from memset when operating on free'd memory.
> 
> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
> 
> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.

Looks good. Great that you found this!

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20821#pullrequestreview-2275671530

From duke at openjdk.org  Mon Sep  2 16:21:29 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Mon, 2 Sep 2024 16:21:29 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting pages
Message-ID: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>

There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.

Tested with tiers 1-3.

-------------

Commit messages:
 - 8339399: ZGC: Remove unnecessary page reset when splitting pages

Changes: https://git.openjdk.org/jdk/pull/20824/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20824&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339399
  Stats: 11 lines in 2 files changed: 1 ins; 10 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20824.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20824/head:pull/20824

PR: https://git.openjdk.org/jdk/pull/20824

From stefank at openjdk.org  Mon Sep  2 16:59:18 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 2 Sep 2024 16:59:18 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting
 pages
In-Reply-To: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
References: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
Message-ID: <heNKhjTN9YGOx9dafhe_G4W1LPhz8AlPmF2I6rjFEZY=.5cb6cfe5-c0a4-44a0-b985-07276ba0275b@github.com>

On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström <duke at openjdk.org> wrote:

> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
> 
> Tested with tiers 1-3.

Looks good.

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20824#pullrequestreview-2275990889

From eosterlund at openjdk.org  Mon Sep  2 17:17:19 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 2 Sep 2024 17:17:19 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting
 pages
In-Reply-To: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
References: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
Message-ID: <eenIC8qz8QSl0-MxpW3yZjUHVTHSBDIrlJqEJyDDGbM=.8a5a9aaa-dde6-4424-8d6d-ec9631713d9f@github.com>

On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström <duke at openjdk.org> wrote:

> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
> 
> Tested with tiers 1-3.

Looks good.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20824#pullrequestreview-2276004131

From eosterlund at openjdk.org  Mon Sep  2 17:22:18 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 2 Sep 2024 17:22:18 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
References: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
Message-ID: <7fbmqs1TZO5HZnvMz46ppNfQCv3lnB4Pu9zEeSzuQGY=.14d7dbb3-8243-4299-96b0-32eb85c78aa4@github.com>

On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström <duke at openjdk.org> wrote:

> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset).
> 
> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
> 
> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
> 
> This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161 where I experimented with replacing some clears of remsets with free's and got a crash on Windows from memset when operating on free'd memory.
> 
> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
> 
> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.

Looks good!

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20821#pullrequestreview-2276006890

From aboldtch at openjdk.org  Mon Sep  2 17:38:23 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 2 Sep 2024 17:38:23 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting
 pages
In-Reply-To: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
References: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
Message-ID: <R7To3y5HTwfRlWVpfx74-SFVLOWbN49lNG1NXCrKhI4=.8c5b8a61-980f-4c8f-ba53-5594a1583f09@github.com>

On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström <duke at openjdk.org> wrote:

> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
> 
> Tested with tiers 1-3.

lgtm.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20824#pullrequestreview-2276017500

From aboldtch at openjdk.org  Mon Sep  2 17:39:18 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 2 Sep 2024 17:39:18 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
References: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
Message-ID: <8_W87_RCGbZgF4P8_vDMeL1qO4MCKeVlgFSDZa-0wuY=.5b9a0480-5c31-42bb-bb19-00da295c43b2@github.com>

On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström <duke at openjdk.org> wrote:

> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset).
> 
> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
> 
> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
> 
> This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161 where I experimented with replacing some clears of remsets with free's and got a crash on Windows from memset when operating on free'd memory.
> 
> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
> 
> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.

lgtm.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20821#pullrequestreview-2276018164

From rcastanedalo at openjdk.org  Tue Sep  3 07:26:00 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 3 Sep 2024 07:26:00 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v13]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <M2O8S3yutUbRUOg7bpIshy8fREgJrhuR_i3d3NhXjDs=.639bd779-3ebc-4598-b968-84a17a1ac35c@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision:

 - Increase test coverage of new-object stores with different type information
 - Refactor the two post-barrier removal cases into a single expression
 - Remove unnecessary early null-based post-barrier elision
 - Make store capturability test G1-specific and more precise

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/4ee450ad..1ea2862f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11-12

  Stats: 88 lines in 5 files changed: 66 ins; 7 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Tue Sep  3 07:26:00 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 3 Sep 2024 07:26:00 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v10]
In-Reply-To: <accteWMpcC02sG1CeN1wXY3slh1ifD3LE3XLxUdh_GQ=.39d4c21b-df6f-4b82-898b-01444fd6b537@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <HyDyhSjB3aFOOF2fxzDj-LFpIk0eVdbwHQfoS02IwhQ=.c39c70b9-91b8-429d-bcd7-bd734124f921@github.com>
 <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
 <mayrMafB7Guo27uHTgI4kiuyGzIrAzZMYhiAlDMmcng=.1fca4ec4-fe9a-4aee-b9e0-909fc0a96120@github.com>
 <e7QQ2Ish-sftMHnAkYTf8R8dmGs8-DXF3KqvaXgJcNY=.1c121e00-4533-4bbc-bbb7-bc76037f5738@github.com>
 <accteWMpcC02sG1CeN1wXY3slh1ifD3LE3XLxUdh_GQ=.39d4c21b-df6f-4b82-898b-01444fd6b537@github.com>
Message-ID: <zkFnN-aMUsc40sD_HW7i-t9W_g1b-r-qq8fqtJty4-w=.36f73b0f-ae35-4220-b77e-592a11d20497@github.com>

On Fri, 30 Aug 2024 08:23:44 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> I will study if the check in get_store_barrier is superseded by that in refine_barrier_by_new_val_type. If I can convince myself that this is the case I will consider removing the former.

This was indeed the case, so I have removed the compile-time null check from `G1BarrierSetC2::get_store_barrier` (commit deac05d7) and simplified the code around it (commit 6f4027bf). I also added a few extra test cases to exercise stores on newly-allocated objects with different nullness information (commit 1ea2862f).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741555725

From rcastanedalo at openjdk.org  Tue Sep  3 07:26:00 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 3 Sep 2024 07:26:00 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v10]
In-Reply-To: <rLcA4r_iAClrOKg6lOTU6PVw-77l_pUxL36aRf7SO6k=.7fcc84d1-34cb-4908-a4a1-c81cc1209a83@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <HyDyhSjB3aFOOF2fxzDj-LFpIk0eVdbwHQfoS02IwhQ=.c39c70b9-91b8-429d-bcd7-bd734124f921@github.com>
 <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
 <rLcA4r_iAClrOKg6lOTU6PVw-77l_pUxL36aRf7SO6k=.7fcc84d1-34cb-4908-a4a1-c81cc1209a83@github.com>
Message-ID: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>

On Fri, 30 Aug 2024 13:49:10 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question.

@kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2325782979

From rcastanedalo at openjdk.org  Tue Sep  3 07:26:01 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 3 Sep 2024 07:26:01 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9]
In-Reply-To: <KDIImsRzO3MgeQF4fpbdhjQii81YbhNveIPnHsPYeSc=.a5b3688e-ba64-4127-b4e3-ef47fe9a90f4@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <WYYat4iyCbhlvNx5tCMDPgMhANCXWsULPcCGr4rLe5U=.afd6c854-f75f-478e-83de-fbc0eba42dae@github.com>
 <AoNG0zizOZLdXE4bahOqL7ep_wNbzCWtCFS_km8p9IM=.df3872f0-0f4b-4b8e-a131-972aa83aca30@github.com>
 <KDIImsRzO3MgeQF4fpbdhjQii81YbhNveIPnHsPYeSc=.a5b3688e-ba64-4127-b4e3-ef47fe9a90f4@github.com>
Message-ID: <sa3W85PqHub9AsQ5hwb1_83NRdLyI7wvgh5UodN-jx0=.39a6f609-514c-4cd9-9854-bc66df61b693@github.com>

On Fri, 30 Aug 2024 13:40:24 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>  A perhaps more principled solution might be extending store-capturing analysis to reject stores with late-expanded barriers. I will give it a try.

This option proved to be infeasible because other GCs (ZGC) rely on store capturing for barrier elision. Furthermore, this would prevent eliding G1 barriers that are found to be elidable only after the program is simplified by C2's intermediate optimizations, even if `ReduceInitialCardMarks` is enabled (I found a few such cases, e.g. where range check elimination is the enabling simplification).

Instead, I have opted to remove the `ReduceInitialCardMarks` condition from `StoreNode::Ideal` and introduce a GC-specific test to determine whether a store can be captured and used for object initialization (commit 6b9954979). For G1, this is true iff the store does not have any barrier or it does have barriers but `ReduceInitialCardMarks` is enabled. For all other GCs the test is always true, which preserves the original mainline behavior. To summarize, this option makes the logic clearer, improves analysis precision, and isolates the changes to G1.
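
The capturability rule described above can be written down as a small predicate. The sketch below only restates the condition from this message; the names `GCKind` and `can_capture_store` and the plain boolean parameters are hypothetical and do not reflect the actual HotSpot interface introduced by the commit.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not HotSpot code.
enum class GCKind { G1, Other };

// Returns whether a store may be captured and used for object initialization.
bool can_capture_store(GCKind gc, bool store_has_barriers, bool reduce_initial_card_marks) {
  if (gc != GCKind::G1) {
    return true;  // all other GCs: always true, preserving the original mainline behavior
  }
  // G1: true iff the store has no barriers, or it has barriers but
  // ReduceInitialCardMarks is enabled.
  return !store_has_barriers || reduce_initial_card_marks;
}
```

Under this rule, the only combination that blocks capturing is a G1 store that carries barriers while `ReduceInitialCardMarks` is disabled.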

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741554994

From mdoerr at openjdk.org  Tue Sep  3 12:06:27 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 3 Sep 2024 12:06:27 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v13]
In-Reply-To: <M2O8S3yutUbRUOg7bpIshy8fREgJrhuR_i3d3NhXjDs=.639bd779-3ebc-4598-b968-84a17a1ac35c@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <M2O8S3yutUbRUOg7bpIshy8fREgJrhuR_i3d3NhXjDs=.639bd779-3ebc-4598-b968-84a17a1ac35c@github.com>
Message-ID: <eSsCiMnHnlntWJ9SFMRSrudc8lg7rJmQL4rufSE2CWY=.8ca92584-7279-43aa-8acc-9a46690687c7@github.com>

On Tue, 3 Sep 2024 07:26:00 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision:
> 
>  - Increase test coverage of new-object stores with different type information
>  - Refactor the two post-barrier removal cases into a single expression
>  - Remove unnecessary early null-based post-barrier elision
>  - Make store capturability test G1-specific and more precise

src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646:

> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr)
> 645: %{
> 646:   predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0);

Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157
Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time.
Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64.
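
To illustrate the remark, here is a small model of the predicate. It is a hypothetical sketch, not HotSpot code: `LoadModel`, `barrier_data_model`, `needs_acquiring_load_model`, `matches_g1_load_volatile` and `make_load` are made-up names, and it assumes (as stated above) that barrier data is only attached to loads of `Reference.referent`, which is not a volatile field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the node properties the predicate inspects.
// None of these names exist in HotSpot; they only model the remark.
struct LoadModel {
  bool is_referent_load;  // load of java.lang.ref.Reference.referent
  bool is_volatile;       // load requires acquire semantics
};

// Assumption taken from the remark: only referent loads carry barrier data.
inline uint8_t barrier_data_model(const LoadModel& n) {
  return n.is_referent_load ? 1 : 0;
}

inline bool needs_acquiring_load_model(const LoadModel& n) {
  return n.is_volatile;
}

// Same shape as the g1LoadPVolatile predicate, with UseG1GC left out.
inline bool matches_g1_load_volatile(const LoadModel& n) {
  return needs_acquiring_load_model(n) && barrier_data_model(n) != 0;
}

// Builds only loads that can occur under the assumption above: since the
// referent field is not volatile, a referent load is never volatile.
inline LoadModel make_load(bool is_referent_load, bool is_volatile_field) {
  return LoadModel{is_referent_load, !is_referent_load && is_volatile_field};
}
```

Under these assumptions no constructible load satisfies both conditions, which is why an unreachable-code guard in the encoding is one way to document the situation.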

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741937425

From mdoerr at openjdk.org  Tue Sep  3 12:15:27 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 3 Sep 2024 12:15:27 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]
In-Reply-To: <mlYdKpFRum8G-akzgp8OLxElAXlSSES2-nLZtMtVU50=.e02066c2-580c-4f58-adec-8a78b0011bb8@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sEEh7ndeGQznpxAqNtapGJU6dT96EXBNoS3QyVcOn_g=.e0a3525d-690c-4f2c-aca1-48c4975bfb65@github.com>
 <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com>
 <t06fvbqCJBUnGk9YZGE26EMtnVqdHMYshlRV_Z-I2iw=.c14937b3-a611-40ed-833e-d3e3da129328@github.com>
 <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com>
 <T5t_M-KcfDewUXw0xM4N1x2cOPykPlVIbt1kH12Z6Kw=.7d514184-c161-4d36-99fd-770dc777a1c4@github.com>
 <fM6RYG2e8CrXNEZclaTDr0pXNH2edetYX1m0us_NfgQ=.db1ccec4-369a-43f0-a910-6c55c2e2c67e@github.com>
 <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com>
 <Lm9uGIt-n5tGBS-R4204b_e5YxnD0lz1wZ-ZJcztjPY=.897d4b56-e3d6-4a82-9b8d-3050d2a9a60d@github.com>
 <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com>
 <Q3mAOdgALD4uNcOfeN_o3i6ENMq0DqzsjHnu6VOCSAU=.2202a64c-2ba0-412b-b1a7-f46c499cadb0@github.com>
 <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com>
 <s-KyFqzO_MBY1r5oOlKPw99fZ3gNVlNTkAwQr_iJUEc=.9fd71de1-d4db-45b8-9a54-ae612b9499cf@github.com>
 <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com>
 <d_BgBoepV_e8m-GTtakzkGQDp9utf7A9m1jYAmMseLg=.28e24395-a3d6-4e7e-b61d-e585f1a3993d@github.com>
 <rCCaLGJ7GKPL3H9aNiFMN9VhYY5t9Ty6xGujLGRa01Q=.d4b69386-b78d-4391-9845-f68656ddf4db@github.com>
 <A8Kw-WQgHXGK2BYvSAaVIpcltY9pHXqzUCfwAUcjkzM=.d4a38bc7-2752-4453-b745-036d8bb6b5f8@github.com>
 <qLVtgTrJCsi_yO2-P9eJIlICuuuc3bnObQa1a25Kd_0=.7a67e980-92d2-40d1-9822-05b29ca6e802@github.com>
 <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com>
 <mlYdKpFRum8G-akzgp8OLxElAXlSSES2-nLZtMtVU50=.e02066c2-580c-4f58-adec-8a78b0011bb8@github.com>
Message-ID: <NtG5E_jfErEEKt4zXEuiNnB8idyobwFQeh9dYJHXs2E=.8145bacb-e183-4c60-b205-feec1fa19a0d@github.com>

On Thu, 29 Aug 2024 08:37:24 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code).
>
> I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.)
> 
> If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable?

@albertnetymk: FYI: The basic idea was to make compressed Oops optimizations easier. It allows using shorter decoding sequences and removing redundant null checks in the fast path. I've implemented it on PPC64: https://github.com/TheRealMDoerr/jdk/blob/ed9c0232f53a15d768804348e1d8a111fed9a19e/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L471
But, I'm ok with postponing it.
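
A rough sketch of the idea, for readers who have not looked at the PPC64 code: if the barrier receives the new value still in compressed form, it can test the narrow value for null once and then use the shorter not-null decoding. Everything below is illustrative only. The heap base, the shift, and the function names are assumptions, and the region check is a simplified version of the post-barrier fast path that omits the card checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heap layout parameters, chosen only for this example.
constexpr uint64_t kHeapBase = 0x100000000ULL;
constexpr unsigned kShift    = 3;

using narrow_oop = uint32_t;

// General decoding must map a null narrow oop to a null pointer, so it
// needs a test (or a conditional move) in addition to the shift and add.
inline uint64_t decode(narrow_oop v) {
  return v == 0 ? 0 : kHeapBase + (static_cast<uint64_t>(v) << kShift);
}

// Shorter sequence, valid only if the caller already knows that v != 0.
inline uint64_t decode_not_null(narrow_oop v) {
  return kHeapBase + (static_cast<uint64_t>(v) << kShift);
}

// Simplified fast path: one null test on the narrow value, then the
// not-null decoding, then the cross-region test (xor and shift).
inline bool post_barrier_needs_slow_path(uint64_t store_addr,
                                         narrow_oop new_val,
                                         unsigned region_shift) {
  if (new_val == 0) {
    return false;  // storing null: nothing to record
  }
  uint64_t new_obj = decode_not_null(new_val);
  return ((store_addr ^ new_obj) >> region_shift) != 0;
}
```

The null test happens once on the 32-bit value, so the decoding itself no longer has to preserve null.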

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741950634

From mdoerr at openjdk.org  Tue Sep  3 12:20:25 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 3 Sep 2024 12:20:25 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v10]
In-Reply-To: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <HyDyhSjB3aFOOF2fxzDj-LFpIk0eVdbwHQfoS02IwhQ=.c39c70b9-91b8-429d-bcd7-bd734124f921@github.com>
 <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
 <rLcA4r_iAClrOKg6lOTU6PVw-77l_pUxL36aRf7SO6k=.7fcc84d1-34cb-4908-a4a1-c81cc1209a83@github.com>
 <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
Message-ID: <yHuzoQPsVCuM6xWygtXCqi6dSly0MsntEVgbfDZXmSM=.850e274e-c02b-4fdc-b082-75281915cb41@github.com>

On Tue, 3 Sep 2024 07:22:32 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>>> I've only looked at the changes in gc directories (shared and cpu-specific).
>> 
>> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question.
>
>> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question.
> 
> @kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review.

@robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2326378191

From iwalulya at openjdk.org  Tue Sep  3 13:56:29 2024
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Tue, 3 Sep 2024 13:56:29 GMT
Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails with
 "Missing rem set entry" when using
 "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
Message-ID: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>

Please review this patch to reset the per region cardsets in the later phases of the full-gc.  This ensures that Remset verification can proceed without considering whether the cardsets are combined or not.

Testing: passes the test cited in the bug report and Tiers 1-3

-------------

Commit messages:
 - init

Changes: https://git.openjdk.org/jdk/pull/20835/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20835&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339369
  Stats: 3 lines in 2 files changed: 2 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20835.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20835/head:pull/20835

PR: https://git.openjdk.org/jdk/pull/20835

From mdoerr at openjdk.org  Tue Sep  3 14:22:19 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 3 Sep 2024 14:22:19 GMT
Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest
 fails on ppc64 based platforms
In-Reply-To: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
References: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
Message-ID: <w5-FSJ1s2g7rgwAu9aCuU43-cNL8SpqVRg-0TcRfD94=.f326dc2f-368e-4c18-bc69-97ef97f6dfd9@github.com>

On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in.
> AIX / Linux ppc64le show this error :
> 
> [ RUN ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
> 
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> 
> [ FAILED ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
> 
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

LGTM.
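
For reference, the two byte counts in the quoted failure convert as follows. This is only a unit conversion of the numbers printed by the test; it makes no assumption about how the test computes `expected`, and the constant names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t M = 1024 * 1024;        // one MiB in bytes

constexpr uint64_t kExpected = 44695552;   // "expected" in the gtest output
constexpr uint64_t kNewSize  = 41943040;   // "NewSize" in the gtest output

// Converts a byte count to MiB as a floating-point value.
constexpr double to_mib(uint64_t bytes) {
  return static_cast<double>(bytes) / static_cast<double>(M);
}

// NewSize is exactly the new 40M value mentioned in the description.
static_assert(kNewSize == 40 * M, "NewSize is 40 MiB");
```

So the test expected 42.625 MiB while `NewSize` was 40 MiB, a gap of 2.625 MiB.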

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20820#pullrequestreview-2277598668

From lucy at openjdk.org  Tue Sep  3 17:38:18 2024
From: lucy at openjdk.org (Lutz Schmidt)
Date: Tue, 3 Sep 2024 17:38:18 GMT
Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest
 fails on ppc64 based platforms
In-Reply-To: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
References: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
Message-ID: <3cJc7O8zzVLBQ5Vyg88TOtUeRMwm2xBf2XsN0-_G7HA=.e5c77738-44d2-49c9-aa76-46150860903a@github.com>

On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in.
> AIX / Linux ppc64le show this error :
> 
> [ RUN ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
> 
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> 
> [ FAILED ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
> 
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

LGTM.

-------------

Marked as reviewed by lucy (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20820#pullrequestreview-2278070784

From mbaesken at openjdk.org  Wed Sep  4 07:12:23 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 4 Sep 2024 07:12:23 GMT
Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest
 fails on ppc64 based platforms
In-Reply-To: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
References: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
Message-ID: <ItmIkA7CnGjnf5_VUx1buc_4k83rZoC4akRVOcNJ3io=.f3503f69-7e0b-48da-a84a-2b39db03a5bb@github.com>

On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in.
> AIX / Linux ppc64le show this error :
> 
> [ RUN ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
> 
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> 
> [ FAILED ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
> 
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

Hi Lutz and Martin, thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20820#issuecomment-2328083370

From mbaesken at openjdk.org  Wed Sep  4 07:12:23 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 4 Sep 2024 07:12:23 GMT
Subject: Integrated: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm
 gtest fails on ppc64 based platforms
In-Reply-To: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
References: <pvzsZVc6iIbQ-MWhSxLrtG91nS639azFKG48x_TUwsY=.baaa4862-f4c8-4556-868f-914ca9995e54@github.com>
Message-ID: <ChTOJu4_a3nInpZIUAYHJ9vcYoFjPOwhSvjrDKpGlbE=.3dbdfedf-ba1f-4ca2-a762-d01bcb46b4c4@github.com>

On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in.
> AIX / Linux ppc64le show this error :
> 
> [ RUN ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
> 
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> 
> [ FAILED ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
> 
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

This pull request has now been integrated.

Changeset: f2c992c5
Author:    Matthias Baesken <mbaesken at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/f2c992c5af021ab0ff8429fd261314bc7e01f7df
Stats:     3 lines in 1 file changed: 0 ins; 0 del; 3 mod

8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms

Reviewed-by: mdoerr, lucy

-------------

PR: https://git.openjdk.org/jdk/pull/20820

From tschatzl at openjdk.org  Wed Sep  4 08:04:17 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 4 Sep 2024 08:04:17 GMT
Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails
 with "Missing rem set entry" when using
 "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>
References: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>
Message-ID: <T5oZFYVFss0ddVME0NVJBbTDLM8r2527GoN9VcP2AIg=.54c2424f-75b8-4fd6-9372-137b4d63838a@github.com>

On Tue, 3 Sep 2024 12:12:56 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:

> Please review this patch to reset the per region cardsets in the later phases of the full-gc.  This ensures that Remset verification can proceed without considering whether the cardsets are combined or not.
> 
> Testing: passes the test cited in the bug report and Tiers 1-3

lgtm.

The original code basically dropped the remsets to the young gen which failed by uninstalling them (the remaining remset is empty after all).

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20835#pullrequestreview-2279294921

From duke at openjdk.org  Wed Sep  4 08:09:34 2024
From: duke at openjdk.org (duke)
Date: Wed, 4 Sep 2024 08:09:34 GMT
Subject: Withdrawn: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing
In-Reply-To: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
Message-ID: <v90uSxM5nXxLwu4UrI8HpQ7piNGYiAZ2F7acpTHVUwA=.431a2cca-767c-4646-8c88-aecbbc912ac3@github.com>

On Thu, 20 Jun 2024 11:35:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> See JBS issue.
> 
> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
> 
> The patch:
> - exposes os::available_memory via Whitebox
> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
> 
> I have some misgivings about this solution, though:
> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
> 
> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
> 
> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19803

From duke at openjdk.org  Wed Sep  4 08:51:20 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Wed, 4 Sep 2024 08:51:20 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To: <KYZHOCj7EwLNGG_dGq4xsSCIy4hjfF5OeV9OR8Mzh84=.627bacb8-ecb6-4982-9b89-da37b7e04ad3@github.com>
References: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
 <KYZHOCj7EwLNGG_dGq4xsSCIy4hjfF5OeV9OR8Mzh84=.627bacb8-ecb6-4982-9b89-da37b7e04ad3@github.com>
Message-ID: <BJB_2GsFoB2LVLSLzIHk-x2VVn1FECPyIHXu4esz2Mo=.3f2cfc02-abcd-4bf7-9b07-24046c30dbf9@github.com>

On Mon, 2 Sep 2024 13:28:48 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset).
>> 
>> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
>> 
>> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
>> 
>> This race was found in connection to https://bugs.openjdk.org/browse/JDK-8339161 where I experimented replacing some clears of remsets with free's and got a crash on Windows from memset when operating on free'd memory.
>> 
>> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
>> 
>> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.
>
> Looks good. Great that you found this!

Thank you for the reviews! @stefank @fisk @xmas92
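
For readers following along, the general pattern of "only one side clears" can be sketched as below. This is not the code from the patch and not ZGC's data structure: `RemsetModel`, `claimed`, `clears` and `clear_once` are hypothetical names, and an atomic claim flag is just one possible way to make sure a shared buffer is cleared by at most one of two collectors.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical remembered-set model with an ownership flag.
struct RemsetModel {
  std::vector<unsigned char> bits;
  std::atomic<bool> claimed{false};
  std::atomic<int> clears{0};  // counts how many clears actually ran

  explicit RemsetModel(std::size_t n) : bits(n, 0xFF) {}

  // Returns true only for the single caller that wins the claim; that
  // caller is the only one that touches the memory with memset.
  bool clear_once() {
    bool expected = false;
    if (!claimed.compare_exchange_strong(expected, true)) {
      return false;  // the other collector already handled this remset
    }
    std::memset(bits.data(), 0, bits.size());
    clears.fetch_add(1);
    return true;
  }
};
```

Because the claim is a single atomic compare-and-exchange, calling `clear_once()` from a young-collection thread and an old-collection thread at the same time results in exactly one `memset`.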

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20821#issuecomment-2328277567

From duke at openjdk.org  Wed Sep  4 08:51:21 2024
From: duke at openjdk.org (duke)
Date: Wed, 4 Sep 2024 08:51:21 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
References: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
Message-ID: <tQdhjCd7RznF9xIeINLhrOUFDCNGyiJU1lXol9w8vNM=.8785828f-b1b3-4653-b449-c75d04993442@github.com>

On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström <duke at openjdk.org> wrote:

> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset).
> 
> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
> 
> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
> 
> This race was found in connection to https://bugs.openjdk.org/browse/JDK-8339161 where I experimented replacing some clears of remsets with free's and got a crash on Windows from memset when operating on free'd memory.
> 
> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
> 
> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.

@jsikstro 
Your change (at version 109ee7e0fbc088b555f55012e766b7c444ee8fbf) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20821#issuecomment-2328278552

From duke at openjdk.org  Wed Sep  4 08:53:21 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Wed, 4 Sep 2024 08:53:21 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting
 pages
In-Reply-To: <heNKhjTN9YGOx9dafhe_G4W1LPhz8AlPmF2I6rjFEZY=.5cb6cfe5-c0a4-44a0-b985-07276ba0275b@github.com>
References: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
 <heNKhjTN9YGOx9dafhe_G4W1LPhz8AlPmF2I6rjFEZY=.5cb6cfe5-c0a4-44a0-b985-07276ba0275b@github.com>
Message-ID: <TOUkmMfmr6I1eXy7V9kpN_2so4Ece2sa7NtZv4UJ03k=.39f0c96a-a268-4304-abad-6b3ba36425f4@github.com>

On Mon, 2 Sep 2024 16:56:27 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
>> 
>> Tested with tiers 1-3.
>
> Looks good.

Thank you for the reviews! @stefank @fisk @xmas92

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20824#issuecomment-2328279272

From duke at openjdk.org  Wed Sep  4 08:53:22 2024
From: duke at openjdk.org (duke)
Date: Wed, 4 Sep 2024 08:53:22 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting
 pages
In-Reply-To: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
References: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
Message-ID: <UT_hTgdIR9NADyPC1WZZWm_08yR17fjTdA69jmURNRs=.59e188e2-6b12-4afc-82e7-b412a39e80ad@github.com>

On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström <duke at openjdk.org> wrote:

> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
> 
> Tested with tiers 1-3.

@jsikstro 
Your change (at version b6fee02735ad4124e1f6e9eb1ab2654ad7444ddf) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20824#issuecomment-2328282425

From duke at openjdk.org  Wed Sep  4 08:58:22 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Wed, 4 Sep 2024 08:58:22 GMT
Subject: Integrated: 8339399: ZGC: Remove unnecessary page reset when splitting
 pages
In-Reply-To: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
References: <pSoFmEr2nB_zJe6GVTbNfq7FSYDFdAq8eneGfXfzfXc=.17f37411-e87d-48fb-a4f6-f7f6276a45ed@github.com>
Message-ID: <qiBWISY5Dx45gTPA3HKzohFdpzHK7FioxCnxgL_WgvY=.37e8e54f-d8d0-4d17-95aa-3dfa0f708bd2@github.com>

On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström <duke at openjdk.org> wrote:

> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
> 
> Tested with tiers 1-3.

This pull request has now been integrated.

Changeset: a6186051
Author:    Joel Sikström <joel.sikstrom at oracle.com>
Committer: Stefan Karlsson <stefank at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/a61860511f67038962c54e114599948ca103dae8
Stats:     11 lines in 2 files changed: 1 ins; 10 del; 0 mod

8339399: ZGC: Remove unnecessary page reset when splitting pages

Reviewed-by: stefank, eosterlund, aboldtch

-------------

PR: https://git.openjdk.org/jdk/pull/20824

From rcastanedalo at openjdk.org  Wed Sep  4 09:06:45 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 4 Sep 2024 09:06:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v14]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <uqFMBxG31t7tna4W2ePWZBRNNhNuhoRp1eD-SjYgvDU=.7589248a-5993-45db-93da-45e754661bb3@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  8334111: Implementation of Late Barrier Expansion for G1: ppc port

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/1ea2862f..ed9c0232

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12-13

  Stats: 1036 lines in 5 files changed: 947 ins; 64 del; 25 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Wed Sep  4 09:10:27 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 4 Sep 2024 09:10:27 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v10]
In-Reply-To: <yHuzoQPsVCuM6xWygtXCqi6dSly0MsntEVgbfDZXmSM=.850e274e-c02b-4fdc-b082-75281915cb41@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <HyDyhSjB3aFOOF2fxzDj-LFpIk0eVdbwHQfoS02IwhQ=.c39c70b9-91b8-429d-bcd7-bd734124f921@github.com>
 <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
 <rLcA4r_iAClrOKg6lOTU6PVw-77l_pUxL36aRf7SO6k=.7fcc84d1-34cb-4908-a4a1-c81cc1209a83@github.com>
 <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
 <yHuzoQPsVCuM6xWygtXCqi6dSly0MsntEVgbfDZXmSM=.850e274e-c02b-4fdc-b082-75281915cb41@github.com>
Message-ID: <SkS1lfhV1nEt7Evm7FMTEHenfabzl4VgX7jEItIKtBY=.aca672bb-6f6d-40cb-94eb-b6a8ba65892a@github.com>

On Tue, 3 Sep 2024 12:17:58 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
> Do you prefer integrating it soon?

That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2328319555

From duke at openjdk.org  Wed Sep  4 09:12:23 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Wed, 4 Sep 2024 09:12:23 GMT
Subject: Integrated: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
References: <cx0c1Zz45FbeaOxJd8PqcvvDrY1d3Os1O5TFt0ryIMs=.6a8dc54e-b5d1-4241-b553-3163504df844@github.com>
Message-ID: <qOLZnH4PGjAYT55S6yr3PyDML48yGyrNWKEuk-hyuic=.1640f7d2-f3c7-44c2-bc61-00960e74dc7e@github.com>

On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström <duke at openjdk.org> wrote:

> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset).
> 
> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
> 
> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
> 
> This race was found in connection to https://bugs.openjdk.org/browse/JDK-8339161 where I experimented replacing some clears of remsets with free's and got a crash on Windows from memset when operating on free'd memory.
> 
> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
> 
> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.

This pull request has now been integrated.

Changeset: 7ad61605
Author:    Joel Sikström <joel.sikstrom at oracle.com>
Committer: Stefan Karlsson <stefank at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7ad61605f1669f51a97f4f263a7afaa9ab7706be
Stats:     26 lines in 2 files changed: 14 ins; 9 del; 3 mod

8339163: ZGC: Race in clearing of remembered sets

Reviewed-by: stefank, eosterlund, aboldtch

-------------

PR: https://git.openjdk.org/jdk/pull/20821

From kbarrett at openjdk.org  Wed Sep  4 19:37:19 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 4 Sep 2024 19:37:19 GMT
Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails
 with "Missing rem set entry" when using
 "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>
References: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>
Message-ID: <7hxjryXYK8mRcjIc9ph0g0FJEqknP9-UBRZZhizFJSY=.9609fa18-6a9a-485c-a813-3919834ffbd4@github.com>

On Tue, 3 Sep 2024 12:12:56 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:

> Please review this patch to reset the per region cardsets in the later phases of the full-gc.  This ensures that Remset verification can proceed without considering whether the cardsets are combined or not.
> 
> Testing: passes the test cited in the bug report and Tiers 1-3

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20835#pullrequestreview-2281081437

From iwalulya at openjdk.org  Thu Sep  5 08:21:00 2024
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 5 Sep 2024 08:21:00 GMT
Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails
 with "Missing rem set entry" when using
 "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To: <T5oZFYVFss0ddVME0NVJBbTDLM8r2527GoN9VcP2AIg=.54c2424f-75b8-4fd6-9372-137b4d63838a@github.com>
References: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>
 <T5oZFYVFss0ddVME0NVJBbTDLM8r2527GoN9VcP2AIg=.54c2424f-75b8-4fd6-9372-137b4d63838a@github.com>
Message-ID: <DP_omp6qFywIwI-zjLRVq02E-6_ph-ecDSANRQC16Ws=.f63a550e-b031-4712-aed7-5d250aee3a7a@github.com>

On Wed, 4 Sep 2024 08:01:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Please review this patch to reset the per region cardsets in the later phases of the full-gc.  This ensures that Remset verification can proceed without considering whether the cardsets are combined or not.
>> 
>> Testing: passes the test cited in the bug report and Tiers 1-3
>
> lgtm.
> 
> The original code basically dropped the remsets to the young gen which failed by uninstalling them (the remaining remset is empty after all).

Thanks @tschatzl and @kimbarrett for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20835#issuecomment-2330898844

From iwalulya at openjdk.org  Thu Sep  5 08:21:02 2024
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 5 Sep 2024 08:21:02 GMT
Subject: Integrated: 8339369: G1: TestVerificationInConcurrentCycle.java fails
 with "Missing rem set entry" when using
 "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>
References: <nNpC_jmImHt9CrCZBEFJxQ9iGpGJDDHffs9wjNn4PlE=.11c86736-4500-4aae-bfb7-ad610797e873@github.com>
Message-ID: <_Y9Z3b2iSPXZQ82AeH8cH144Jb8zyqnTGIMzW2xqvOY=.d64c1478-bb7c-4b42-99c0-2aec1802e09b@github.com>

On Tue, 3 Sep 2024 12:12:56 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:

> Please review this patch to reset the per region cardsets in the later phases of the full-gc.  This ensures that Remset verification can proceed without considering whether the cardsets are combined or not.
> 
> Testing: passes the test cited in the bug report and Tiers 1-3

This pull request has now been integrated.

Changeset: 96a0502d
Author:    Ivan Walulya <iwalulya at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/96a0502d624e3eff1b00a7c63e8b3a27870b475e
Stats:     3 lines in 2 files changed: 2 ins; 1 del; 0 mod

8339369: G1: TestVerificationInConcurrentCycle.java fails with "Missing rem set entry" when using "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"

Reviewed-by: tschatzl, kbarrett

-------------

PR: https://git.openjdk.org/jdk/pull/20835

From rcastanedalo at openjdk.org  Thu Sep  5 10:05:12 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 5 Sep 2024 10:05:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v15]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Remove unnecessary g1LoadXVolatile instructions in aarch64

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/ed9c0232..9821e795

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13-14

  Stats: 71 lines in 2 files changed: 4 ins; 51 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Thu Sep  5 10:09:55 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 5 Sep 2024 10:09:55 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v13]
In-Reply-To: <eSsCiMnHnlntWJ9SFMRSrudc8lg7rJmQL4rufSE2CWY=.8ca92584-7279-43aa-8acc-9a46690687c7@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <M2O8S3yutUbRUOg7bpIshy8fREgJrhuR_i3d3NhXjDs=.639bd779-3ebc-4598-b968-84a17a1ac35c@github.com>
 <eSsCiMnHnlntWJ9SFMRSrudc8lg7rJmQL4rufSE2CWY=.8ca92584-7279-43aa-8acc-9a46690687c7@github.com>
Message-ID: <xRoKF1fOkzG16ss8s72y3ilvF82tfjHK_FiyCqfWDWY=.b138df58-53b0-4504-870a-096c779a1e09@github.com>

On Tue, 3 Sep 2024 12:04:09 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision:
>> 
>>  - Increase test coverage of new-object stores with different type information
>>  - Refactor the two post-barrier removal cases into a single expression
>>  - Remove unnecessary early null-based post-barrier elision
>>  - Make store capturability test G1-specific and more precise
>
> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646:
> 
>> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr)
>> 645: %{
>> 646:   predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0);
> 
> Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157
> Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time.
> Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64.

Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745185394

From mdoerr at openjdk.org  Thu Sep  5 10:45:55 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Sep 2024 10:45:55 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v13]
In-Reply-To: <xRoKF1fOkzG16ss8s72y3ilvF82tfjHK_FiyCqfWDWY=.b138df58-53b0-4504-870a-096c779a1e09@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <M2O8S3yutUbRUOg7bpIshy8fREgJrhuR_i3d3NhXjDs=.639bd779-3ebc-4598-b968-84a17a1ac35c@github.com>
 <eSsCiMnHnlntWJ9SFMRSrudc8lg7rJmQL4rufSE2CWY=.8ca92584-7279-43aa-8acc-9a46690687c7@github.com>
 <xRoKF1fOkzG16ss8s72y3ilvF82tfjHK_FiyCqfWDWY=.b138df58-53b0-4504-870a-096c779a1e09@github.com>
Message-ID: <bjUziksC4EvtMMp2Zh7Yk18j9dsaG1VtlAm4vx1_NWA=.596a2b32-492e-4db9-b5de-41befc6a6258@github.com>

On Thu, 5 Sep 2024 10:07:14 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646:
>> 
>>> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr)
>>> 645: %{
>>> 646:   predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0);
>> 
>> Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157
>> Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time.
>> Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64.
>
> Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds.

Correct. Only the error message may not be so nice ("bad AD file").
PPC64 still has `g1LoadP_acq` and `g1LoadN_acq` which could also be replaced by a comment. But it's not important.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745230285

From duke at openjdk.org  Thu Sep  5 12:23:17 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Thu, 5 Sep 2024 12:23:17 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets
 being cleared
Message-ID: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>

https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.

The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) depends on the `_current` variable in `ZRememberedSet` not being altered between the two `clear_*` calls in `ZRememberedSet::clear_all()`. If an old collection is clearing the remembered set and has already cleared the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in `ZRememberedSet`. The subsequent call to `clear_previous()` would then clear the same bitmap again, leaving one of the two bitmaps not cleared at all. This later crashes in an assert that checks that both bitmaps are empty/clear when the page is being freed.

Code in question:
```c++
void ZRememberedSet::clear_all() {
  clear_current();
  clear_previous();
}
```

This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.
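
The interleaving described above can be replayed deterministically in a toy model. This is only a sketch: `ToyRememberedSet`, its `std::bitset` storage, and the helper functions are assumptions made for illustration and do not mirror the real `ZRememberedSet` layout. It replays the race on a single thread and contrasts index-relative clearing with clearing both bitmaps directly.

```cpp
#include <atomic>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model: two bitmaps plus an index saying which one is "current".
class ToyRememberedSet {
  std::bitset<64> _bitmap[2];
  std::atomic<int> _current{0};

public:
  void set_current(std::size_t i)  { _bitmap[_current.load()].set(i); }
  void set_previous(std::size_t i) { _bitmap[_current.load() ^ 1].set(i); }

  // What a young collection does when it swaps current/previous.
  void flip() { _current.fetch_xor(1); }

  // Index-relative clears: each one re-reads _current.
  void clear_current()  { _bitmap[_current.load()].reset(); }
  void clear_previous() { _bitmap[_current.load() ^ 1].reset(); }

  // Index-independent clear: touches both bitmaps directly.
  void clear_all_direct() {
    _bitmap[0].reset();
    _bitmap[1].reset();
  }

  bool is_cleared() const { return _bitmap[0].none() && _bitmap[1].none(); }
};

// Replays: clear_current(), then a flip, then clear_previous().
// Both clears hit bitmap 0, so bitmap 1 keeps its bit.
bool racy_interleaving_leaves_dirty_bitmap() {
  ToyRememberedSet rs;
  rs.set_current(3);
  rs.set_previous(7);
  rs.clear_current();   // old collection clears bitmap 0
  rs.flip();            // young collection makes bitmap 1 "current"
  rs.clear_previous();  // "previous" is now bitmap 0 again
  return !rs.is_cleared();
}

// A flip at any point cannot change which bitmaps the direct clear touches.
bool direct_clear_survives_flip() {
  ToyRememberedSet rs;
  rs.set_current(3);
  rs.set_previous(7);
  rs.flip();
  rs.clear_all_direct();
  return rs.is_cleared();
}
```

Both helpers return `true` in this model: the first shows the stale bitmap left behind by the flip, the second shows that the direct clear is unaffected by it.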

-------------

Commit messages:
 - 8339579: ZGC: Race results in only one of two remembered sets being cleared

Changes: https://git.openjdk.org/jdk/pull/20869/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20869&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339579
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20869.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20869/head:pull/20869

PR: https://git.openjdk.org/jdk/pull/20869

From stefank at openjdk.org  Thu Sep  5 12:44:49 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 5 Sep 2024 12:44:49 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets
 being cleared
In-Reply-To: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>
References: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>
Message-ID: <99UpwbYB_lIljopYQicstFc76LlA0icGylPglJpGW9Q=.8405b963-ea12-4a14-893b-74c645a3b52a@github.com>

On Thu, 5 Sep 2024 12:18:48 GMT, Joel Sikström <duke at openjdk.org> wrote:

> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
> 
> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) depends on the `_current` variable in `ZRememberedSet` not being altered between the two `clear_*` calls in `ZRememberedSet::clear_all()`. If an old collection is clearing the remembered set and has already cleared the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in `ZRememberedSet`. The subsequent call to `clear_previous()` would then clear the same bitmap again, leaving one of the two bitmaps not cleared at all. This later crashes in an assert that checks that both bitmaps are empty/clear when the page is being freed.
> 
> Code in question:
> ```c++
> void ZRememberedSet::clear_all() {
>   clear_current();
>   clear_previous();
> }
> ```
> 
> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.

Looks good!

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20869#pullrequestreview-2282883401

From sjohanss at openjdk.org  Thu Sep  5 13:18:54 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Thu, 5 Sep 2024 13:18:54 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets
 being cleared
In-Reply-To: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>
References: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>
Message-ID: <wedavz42do2gXOQAETQV5sF4mk-q1abJKxk4_tRBlNU=.c1aee146-2cb4-4c8d-b49d-07f28c5fa320@github.com>

On Thu, 5 Sep 2024 12:18:48 GMT, Joel Sikström <duke at openjdk.org> wrote:

> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
> 
> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) depends on the `_current` variable in `ZRememberedSet` not being altered between the two `clear_*` calls in `ZRememberedSet::clear_all()`. If an old collection is clearing the remembered set and has already cleared the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in `ZRememberedSet`. The subsequent call to `clear_previous()` would then clear the same bitmap again, leaving one of the two bitmaps not cleared at all. This later crashes in an assert that checks that both bitmaps are empty/clear when the page is being freed.
> 
> Code in question:
> ```c++
> void ZRememberedSet::clear_all() {
>   clear_current();
>   clear_previous();
> }
> ```
> 
> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.

Looks good.

-------------

Marked as reviewed by sjohanss (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20869#pullrequestreview-2283029963

From duke at openjdk.org  Thu Sep  5 13:42:53 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Thu, 5 Sep 2024 13:42:53 GMT
Subject: Integrated: 8339579: ZGC: Race results in only one of two remembered
 sets being cleared
In-Reply-To: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>
References: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>
Message-ID: <ltXyW_anu8zR3U_VwnqLg7w201pQ8AW4wbm9xDBFsfI=.f92befda-48d0-444d-8597-642926062d2a@github.com>

On Thu, 5 Sep 2024 12:18:48 GMT, Joel Sikström <duke at openjdk.org> wrote:

> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
> 
> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) depends on the `_current` variable in `ZRememberedSet` not being altered between the two `clear_*` calls in `ZRememberedSet::clear_all()`. If an old collection is clearing the remembered set and has already cleared the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in `ZRememberedSet`. The subsequent call to `clear_previous()` would then clear the same bitmap again, leaving one of the two bitmaps not cleared at all. This later crashes in an assert that checks that both bitmaps are empty/clear when the page is being freed.
> 
> Code in question:
> ```c++
> void ZRememberedSet::clear_all() {
>   clear_current();
>   clear_previous();
> }
> ```
> 
> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.
> 
> Tested with tier5, where the failures/crashes occurred before this fix, and with a reproducer of the crash as well.

This pull request has now been integrated.

Changeset: ab656c3a
Author:    Joel Sikström <joel.sikstrom at oracle.com>
Committer: Stefan Karlsson <stefank at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/ab656c3aab8157ed8e70bc126881cbadc825de93
Stats:     2 lines in 1 file changed: 0 ins; 0 del; 2 mod

8339579: ZGC: Race results in only one of two remembered sets being cleared

Reviewed-by: stefank, sjohanss

-------------

PR: https://git.openjdk.org/jdk/pull/20869

From duke at openjdk.org  Thu Sep  5 14:03:54 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Thu, 5 Sep 2024 14:03:54 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets
 being cleared
In-Reply-To: <99UpwbYB_lIljopYQicstFc76LlA0icGylPglJpGW9Q=.8405b963-ea12-4a14-893b-74c645a3b52a@github.com>
References: <MHwQmhgqQHpcWUxjJ--SMAxQluFKbj5UZBMNP3htwHs=.3c9b9a85-3c70-4915-b9d3-ff551f452cd6@github.com>
 <99UpwbYB_lIljopYQicstFc76LlA0icGylPglJpGW9Q=.8405b963-ea12-4a14-893b-74c645a3b52a@github.com>
Message-ID: <iMpdNpppFogesYYXGI9ZS_FyrMUgMYIrwoYkbNjhxuM=.3977f886-3bfc-435b-897b-cf7ec0d1acd9@github.com>

On Thu, 5 Sep 2024 12:42:25 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
>> 
>> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) depends on the `_current` variable in `ZRememberedSet` not being altered between the two `clear_*` calls in `ZRememberedSet::clear_all()`. If an old collection is clearing the remembered set and has already cleared the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in `ZRememberedSet`. The subsequent call to `clear_previous()` would then clear the same bitmap again, leaving one of the two bitmaps not cleared at all. This later crashes in an assert that checks that both bitmaps are empty/clear when the page is being freed.
>> 
>> Code in question:
>> ```c++
>> void ZRememberedSet::clear_all() {
>>   clear_current();
>>   clear_previous();
>> }
>> ```
>> 
>> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.
>> 
>> Tested with tier5, where the failures/crashes occurred before this fix, and with a reproducer of the crash as well.
>
> Looks good!

Thank you for the reviews! @stefank @kstefanj

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20869#issuecomment-2331767183

From fjiang at openjdk.org  Thu Sep  5 14:56:02 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Thu, 5 Sep 2024 14:56:02 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v10]
In-Reply-To: <SkS1lfhV1nEt7Evm7FMTEHenfabzl4VgX7jEItIKtBY=.aca672bb-6f6d-40cb-94eb-b6a8ba65892a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <HyDyhSjB3aFOOF2fxzDj-LFpIk0eVdbwHQfoS02IwhQ=.c39c70b9-91b8-429d-bcd7-bd734124f921@github.com>
 <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
 <rLcA4r_iAClrOKg6lOTU6PVw-77l_pUxL36aRf7SO6k=.7fcc84d1-34cb-4908-a4a1-c81cc1209a83@github.com>
 <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
 <yHuzoQPsVCuM6xWygtXCqi6dSly0MsntEVgbfDZXmSM=.850e274e-c02b-4fdc-b082-75281915cb41@github.com>
 <SkS1lfhV1nEt7Evm7FMTEHenfabzl4VgX7jEItIKtBY=.aca672bb-6f6d-40cb-94eb-b6a8ba65892a@github.com>
Message-ID: <m9AkvkF9aM4PMSX8WlgGX098Ws91SZm7UdMaCqKj3og=.772b24ba-ca65-41d7-95b1-1dab3dbd7e0b@github.com>

On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
>> Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further.
>
>> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
> Do you prefer integrating it soon?
> 
> That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation.

Hi @robcasloz, here is the implementation for RISC-V: https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6
We are still testing the latest changes, results will be updated later.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2331932063

From rcastanedalo at openjdk.org  Thu Sep  5 16:06:59 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 5 Sep 2024 16:06:59 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v10]
In-Reply-To: <SkS1lfhV1nEt7Evm7FMTEHenfabzl4VgX7jEItIKtBY=.aca672bb-6f6d-40cb-94eb-b6a8ba65892a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <HyDyhSjB3aFOOF2fxzDj-LFpIk0eVdbwHQfoS02IwhQ=.c39c70b9-91b8-429d-bcd7-bd734124f921@github.com>
 <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
 <rLcA4r_iAClrOKg6lOTU6PVw-77l_pUxL36aRf7SO6k=.7fcc84d1-34cb-4908-a4a1-c81cc1209a83@github.com>
 <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
 <yHuzoQPsVCuM6xWygtXCqi6dSly0MsntEVgbfDZXmSM=.850e274e-c02b-4fdc-b082-75281915cb41@github.com>
 <SkS1lfhV1nEt7Evm7FMTEHenfabzl4VgX7jEItIKtBY=.aca672bb-6f6d-40cb-94eb-b6a8ba65892a@github.com>
Message-ID: <yGcw0wtBi6C0PPSbCGDj3Bb4V8xui6xoFpAx8nIgUGQ=.822b7451-2a73-43eb-8063-e11c5e0aa525@github.com>

On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
>> Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further.
>
>> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
> Do you prefer integrating it soon?
> 
> That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation.

> Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later.

Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332119624

From mdoerr at openjdk.org  Thu Sep  5 18:18:56 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Sep 2024 18:18:56 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v15]
In-Reply-To: <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
Message-ID: <is3PXG-bGpZ6vKcxGLC-ttQjvEhu43bQr48LyMcyN5s=.f7907623-d8ca-45e7-9b9b-9ea1b63f837f@github.com>

On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove unnecessary g1LoadXVolatile instructions in aarch64

I've implemented the same cleanup as on aarch64: https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2
Would be nice if you could apply it. Thanks!
In case you want to merge further updates from head, I have no objections.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332365001

From duke at openjdk.org  Thu Sep  5 20:45:57 2024
From: duke at openjdk.org (halkosajtarevic)
Date: Thu, 5 Sep 2024 20:45:57 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v15]
In-Reply-To: <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
Message-ID: <Fl_ko9RfCnlLwHEvb9oPp3ld1mAb_POxQZVr7cQA0Nw=.08dbde25-f357-4d07-8005-cff37f329388@github.com>

On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove unnecessary g1LoadXVolatile instructions in aarch64

Sorry, one maybe dumb question, hopefully matching the context here:
Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332586175

From sjohanss at openjdk.org  Fri Sep  6 07:20:20 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Fri, 6 Sep 2024 07:20:20 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation
Message-ID: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>

Please review this change to synchronize medium page allocations in ZGC.

**Summary**
In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.

This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
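
A minimal sketch of the idea, not the code from this PR: the allocation fast path stays lock-free, and only the slow path that installs a new shared page is serialized, re-checking the shared page after taking the lock. `Page`, `MediumAllocator`, and all member names are hypothetical, and a plain `std::mutex` stands in for whatever lock the real change uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a medium page: just a bump pointer.
struct Page {
  explicit Page(size_t capacity) : _capacity(capacity), _top(0) {}

  // Lock-free bump allocation; returns false if the request does not fit.
  bool try_alloc(size_t size) {
    size_t old_top = _top.load();
    while (old_top + size <= _capacity) {
      // On failure, old_top is refreshed and the bound is re-checked.
      if (_top.compare_exchange_weak(old_top, old_top + size)) {
        return true;
      }
    }
    return false;
  }

  const size_t _capacity;
  std::atomic<size_t> _top;
};

// Hypothetical allocator with one shared page for all threads.
class MediumAllocator {
public:
  explicit MediumAllocator(size_t page_size)
    : _page_size(page_size), _shared(nullptr) {}

  bool alloc(size_t size) {
    // Fast path: allocate from the current shared page without locking.
    Page* page = _shared.load(std::memory_order_acquire);
    if (page != nullptr && page->try_alloc(size)) {
      return true;
    }

    // Slow path: only one thread at a time may install a new page.
    std::lock_guard<std::mutex> guard(_lock);

    // Re-check: another thread may have installed a fresh page meanwhile.
    page = _shared.load(std::memory_order_acquire);
    if (page != nullptr && page->try_alloc(size)) {
      return true;
    }
    if (size > _page_size) {
      return false;  // can never fit in a medium page
    }

    // Old pages stay alive in _pages, since other threads may still use them.
    _pages.push_back(std::make_unique<Page>(_page_size));
    Page* fresh = _pages.back().get();
    const bool ok = fresh->try_alloc(size);
    assert(ok);
    (void)ok;
    _shared.store(fresh, std::memory_order_release);
    return true;
  }

  size_t pages_created() {
    std::lock_guard<std::mutex> guard(_lock);
    return _pages.size();
  }

private:
  const size_t _page_size;
  std::atomic<Page*> _shared;
  std::mutex _lock;
  std::vector<std::unique_ptr<Page>> _pages;
};
```

With a 100-byte page, `alloc(60)` and `alloc(40)` share one page and a following `alloc(10)` installs a second one. Threads that lose the race for the lock find the new page in the re-check instead of allocating (and later undoing) a page of their own.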

**Testing**
* Functional testing through mach5 tier1-7 using ZGC
* Performance testing through aurora to verify no regressions occur
* Manual testing to verify performance
* Manual testing to verify we avoid page cache flushing

-------------

Commit messages:
 - StefanK comments and reuse of share page addr
 - 8339387: ZGC: Synchronize medium page allocation

Changes: https://git.openjdk.org/jdk/pull/20883/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20883&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339387
  Stats: 57 lines in 2 files changed: 49 ins; 5 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20883.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20883/head:pull/20883

PR: https://git.openjdk.org/jdk/pull/20883

From eosterlund at openjdk.org  Fri Sep  6 08:25:52 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Fri, 6 Sep 2024 08:25:52 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation
In-Reply-To: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
Message-ID: <GU0p3_i28AH9XcirdMjlpp9Ip7P63oaJn3GFomcDbI8=.b1206841-19a4-4a4b-b882-51abf462be90@github.com>

On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change to synchronize medium page allocations in ZGC.
> 
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
> 
> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
> 
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing

Looks good! Thanks for fixing.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2285376634

From sjohanss at openjdk.org  Fri Sep  6 08:25:52 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Fri, 6 Sep 2024 08:25:52 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation
In-Reply-To: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
Message-ID: <-eVvccFIXfKeQNnEsbFJpW_C8WUmYx4fArqTUuTBoY4=.8551f684-57ed-4ba3-aff1-db367532a5b9@github.com>

On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change to synchronize medium page allocations in ZGC.
> 
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
> 
> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
> 
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing

As mentioned in the summary, there is no direct performance improvement seen in most benchmarks by this change. But looking at memory usage from our logs we can see improvements in how ZGC uses memory. 

The statistics logging below, taken from the end of a benchmark run where medium objects are in use, shows some of the improvements. Even if they don't translate into a score improvement, they will improve the latency of some allocation operations.


Baseline:
[369.264s][info][gc,stats    ]                                      Last 10s              Last 10m 
[369.264s][info][gc,stats    ]                                     Avg / Max             Avg / Max
[369.264s][info][gc,stats    ] Memory: Allocation Rate             438 / 950             684 / 2846            684 / 2846            684 / 2846        MB/s
[369.264s][info][gc,stats    ] Memory: Defragment                    0 / 0                18 / 190              18 / 190              18 / 190         ops/s
[369.264s][info][gc,stats    ] Memory: Page Cache Flush              0 / 0                36 / 380              36 / 380              36 / 380         MB/s
[369.264s][info][gc,stats    ] Memory: Undo Page Allocation          0 / 1                 2 / 71                2 / 71                2 / 71          ops/s

With this change:
[369.104s][info][gc,stats    ] Memory: Allocation Rate             465 / 620             612 / 1086            612 / 1086            612 / 1086        MB/s
[369.104s][info][gc,stats    ] Memory: Defragment                    0 / 0                 0 / 0                 0 / 0                 0 / 0           ops/s
[369.104s][info][gc,stats    ] Memory: Page Cache Flush              0 / 0                 0 / 0                 0 / 0                 0 / 0           MB/s
[369.104s][info][gc,stats    ] Memory: Undo Page Allocation          0 / 0                 0 / 8                 0 / 8                 0 / 8           ops/s


Additional details about the different lines:
**Allocation rate** - The maximum allocation rate is down, because it's not inflated by many unnecessary medium page allocations happening at once.
**Defragment** - ZGC tries to defragment the virtual address space by remapping memory used by small pages from high addresses to low. This will only happen when the page cache only caches medium and large pages, which might be the case after a set of medium page allocations that are later undone. In this run all such defragmentations were avoided.
**Page Cache Flush** - When there are no medium (or large) pages available in the cache, the cache needs to be flushed to allow the creation of a new page. When not doing the unnecessary allocations, ZGC is able to avoid flushing in this benchmark.
**Undo Page Allocation** - When a page is allocated but later found to not be needed, we undo the page allocation. This can happen for small pages as well, so we still have some undos. But the ones for medium pages are avoided.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20883#issuecomment-2333513053

From rcastanedalo at openjdk.org  Fri Sep  6 08:49:35 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 6 Sep 2024 08:49:35 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v16]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion
 - Cleanup g1_ppc.ad

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/9821e795..22e07ef0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14-15

  Stats: 40 lines in 1 file changed: 4 ins; 30 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Fri Sep  6 08:49:35 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 6 Sep 2024 08:49:35 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v15]
In-Reply-To: <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
Message-ID: <hYNygMwTW-jr3X5lFlHsggCznUnkzFovoppv-__zdcs=.7cd31bf4-b3f0-43e2-ad67-fcd88d94db72@github.com>

On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove unnecessary g1LoadXVolatile instructions in aarch64

> I've implemented the same cleanup as on aarch64: [TheRealMDoerr at ad662a2](https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2) Would be nice if you could apply it. Thanks!

Sure, merged now (commit 22e07ef03a).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333553391

From rcastanedalo at openjdk.org  Fri Sep  6 09:43:57 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 6 Sep 2024 09:43:57 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v15]
In-Reply-To: <Fl_ko9RfCnlLwHEvb9oPp3ld1mAb_POxQZVr7cQA0Nw=.08dbde25-f357-4d07-8005-cff37f329388@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
 <Fl_ko9RfCnlLwHEvb9oPp3ld1mAb_POxQZVr7cQA0Nw=.08dbde25-f357-4d07-8005-cff37f329388@github.com>
Message-ID: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>

On Thu, 5 Sep 2024 20:36:01 GMT, halkosajtarevic <duke at openjdk.org> wrote:

> Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?

Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example?


  (...)

  public enum Day {MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY};

  static class MyObject {
    Day day;
  }

  public static void storeEnum(MyObject o, Day d) {
    o.day = d;
  }

  (...)

    MyObject o = new MyObject();
    Day d = Day.TUESDAY;
    storeEnum(o, d);

  (...)


If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333674779

From duke at openjdk.org  Fri Sep  6 10:14:59 2024
From: duke at openjdk.org (halkosajtarevic)
Date: Fri, 6 Sep 2024 10:14:59 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v16]
In-Reply-To: <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>
Message-ID: <VTMCuZaZozrhGO7p4LQzRoEwvEfpoDaF1l2ka8oGzis=.6feefbef-d25d-447c-adfa-e8dd84c5d013@github.com>

On Fri, 6 Sep 2024 08:49:35 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion
>  - Cleanup g1_ppc.ad

Yes exactly, that was what I meant.
I was just wondering whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyway, and (though I may be wrong) will never be collected.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333731725

From mbaesken at openjdk.org  Fri Sep  6 10:32:01 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 6 Sep 2024 10:32:01 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
Message-ID: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>

The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
shows this error when running with ubsan enabled 

src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
    #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
    #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
    #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
    #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
    #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
    #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
    #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

-------------

Commit messages:
 - JDK-8339648

Changes: https://git.openjdk.org/jdk/pull/20888/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339648
  Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20888.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888

PR: https://git.openjdk.org/jdk/pull/20888

From mbaesken at openjdk.org  Fri Sep  6 10:38:49 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 6 Sep 2024 10:38:49 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
In-Reply-To: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
Message-ID: <JD9hTl3H_zCVH7YsmUeTdPRTt02BlK5LPQBPBM_JbuY=.d8242ac6-9d74-4857-b0cd-ceb4a762760f@github.com>

On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

So we should avoid the division when the divisor is zero, and rewrite the code a bit.
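
A guard of this kind could look like the following sketch. The helper name, the operand types, and the choice to return infinity for a zero divisor are assumptions made for illustration only; the actual change to `zDirector.cpp` is in the linked PR.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: GC cost per byte freed.
// If nothing was freed, the cost per byte is treated as unbounded instead
// of evaluating the division, so the division never sees a zero divisor.
inline double gc_seconds_per_byte_freed(double gc_seconds, size_t bytes_freed) {
  if (bytes_freed == 0) {
    return std::numeric_limits<double>::infinity();
  }
  return gc_seconds / static_cast<double>(bytes_freed);
}
```

Callers that compare the result against a threshold then need no special case of their own, since infinity compares greater than any finite value.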

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2333770996

From amitkumar at openjdk.org  Fri Sep  6 10:43:54 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 6 Sep 2024 10:43:54 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v15]
In-Reply-To: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <k_fBpG4Ihb9993DGWcwnI7Co9qDzh-ajQU8rQuUBTYk=.fd889de7-87f5-44a8-8b0e-c4301771cc5c@github.com>
 <Fl_ko9RfCnlLwHEvb9oPp3ld1mAb_POxQZVr7cQA0Nw=.08dbde25-f357-4d07-8005-cff37f329388@github.com>
 <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>
Message-ID: <5SBKgUwrPmIXH0hA64aKRsYZiHMg0M0uh_IjFq_xdAo=.f323ec69-adf3-4722-a5cb-0c49cfb8c5b1@github.com>

On Fri, 6 Sep 2024 09:40:56 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Sorry, one maybe dumb question, hopefully matching the context here:
>> Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?
>
>> Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?
> 
> Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example?
> 
> 
>   (...)
> 
>   public enum Day {MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY};
> 
>   static class MyObject {
>     Day day;
>   }
> 
>   public static void storeEnum(MyObject o, Day d) {
>     o.day = d;
>   }
> 
>   (...)
> 
>     MyObject o = new MyObject();
>     Day d = Day.TUESDAY;
>     storeEnum(o, d);
> 
>   (...)
> 
> 
> If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind?

Hi @robcasloz, 
you can pick up the s390x patch from here: https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333779374

From rcastanedalo at openjdk.org  Fri Sep  6 12:07:57 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 6 Sep 2024 12:07:57 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v16]
In-Reply-To: <VTMCuZaZozrhGO7p4LQzRoEwvEfpoDaF1l2ka8oGzis=.6feefbef-d25d-447c-adfa-e8dd84c5d013@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>
 <VTMCuZaZozrhGO7p4LQzRoEwvEfpoDaF1l2ka8oGzis=.6feefbef-d25d-447c-adfa-e8dd84c5d013@github.com>
Message-ID: <YCqd-SwWRTx-gxIqnJcGrfDs_BY4yVLqgnky67P2nUg=.4db33253-d4c2-4b81-91e1-7005be6f3d98@github.com>

On Fri, 6 Sep 2024 10:12:19 GMT, halkosajtarevic <duke at openjdk.org> wrote:

> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.

As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.
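
For illustration, here is a simplified model of the filters at the start of G1's post-write barrier, as I understand them; it is a sketch and not code from this changeset. The function name and the fixed 2 MB region size are assumptions (the real region size is chosen at startup), and the pre-write (SATB) barrier and the later card checks are left out.

```cpp
#include <cassert>
#include <cstdint>

// Assumed region size of 2^21 bytes (2 MB), for illustration only.
constexpr unsigned kLogRegionBytes = 21;

// Returns true if a reference store of new_value into the field at
// field_addr would proceed to the card-marking part of the barrier.
// Both filters look only at addresses, never at the type of the object.
inline bool needs_card_marking(uintptr_t field_addr, uintptr_t new_value) {
  if (new_value == 0) {
    return false;  // storing null: nothing to remember
  }
  if (((field_addr ^ new_value) >> kLogRegionBytes) == 0) {
    return false;  // field and referenced object are in the same region
  }
  return true;     // cross-region reference
}
```

In this model, a stored enum instance that lives in a different region than the object holding the field is a cross-region reference, which is the case that is not filtered out.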

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333907222

From duke at openjdk.org  Fri Sep  6 12:49:01 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Fri, 6 Sep 2024 12:49:01 GMT
Subject: RFR: 8339661: ZGC: Move some page resets and verification to callsites
Message-ID: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>

Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This makes the checks somewhat hard to understand and the edge cases hard to follow.

By moving some of the reset logic that is now part of `ZPage::reset()` to the callsites, the reasons behind some operations become easier to understand when reading the code.

Main highlights:
- Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`.
- `ZPage::clone_limited()` retains the value of the top-pointer.
- The kind of remset verification is now decided at the callsites:
  - Allocations from the page cache, and only if the page got a remset
  - Old-to-old in-place relocations, where only the inactive remset is checked

-------------

Commit messages:
 - 8339661: ZGC: Move some page resets and verification to callsites

Changes: https://git.openjdk.org/jdk/pull/20890/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20890&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339661
  Stats: 127 lines in 6 files changed: 34 ins; 64 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/20890.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20890/head:pull/20890

PR: https://git.openjdk.org/jdk/pull/20890

From aboldtch at openjdk.org  Fri Sep  6 12:51:55 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 6 Sep 2024 12:51:55 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
In-Reply-To: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
Message-ID: <qWnxK1XZU3BABIrcVqkL9mbqMLWoE0fJj5rJc7cJf3M=.78c31c28-be84-4118-9807-2ea549bc76eb@github.com>

On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

src/hotspot/share/gc/z/zDirector.cpp line 524:

> 522:     const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
> 523:     old_garbage_is_cheaper = current_old_gc_time_per_bytes_freed < current_young_gc_time_per_bytes_freed;
> 524:   }

Ending up with `old_garbage_is_cheaper == true` when  `reclaimed_per_old_gc == 0` seems wrong to me. 

Division by 0.0 is weird in C++. Do we even build for systems where it would not be supported? Regardless, I feel like the change here should be more like:

-  const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
+  const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : double(old_gc_time) / double(reclaimed_per_old_gc);

Which is the behaviour I expect us to currently have, given that `old_gc_time` should be a positive number (`>0.0`).  The `!stats._old_stats._cycle._is_time_trustable` check above should protect against `0.0`.

I expect that the division by zero we see happens when we have run a warmup major collection which did not reclaim any memory.  And this change would trigger us to try and promote a minor collection to a major collection.

I am no expert on our supported platforms matrix w.r.t. floating-point numbers and `std::numeric_limits<T>::has_infinity`.
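
To make the suggestion concrete, here is a minimal standalone sketch of the guarded division. The helper names `gc_time_per_byte_freed` and `old_garbage_is_cheaper` (as a function) are hypothetical and only for illustration; they are not the actual code in `zDirector.cpp`. The sketch assumes the platform's `double` can represent infinity, which the `static_assert` checks at compile time.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of zDirector.cpp): cost of a GC cycle in
// seconds per byte reclaimed. A cycle that reclaimed nothing is treated as
// infinitely expensive instead of dividing by zero.
inline double gc_time_per_byte_freed(double gc_time, size_t reclaimed_bytes) {
  static_assert(std::numeric_limits<double>::has_infinity,
                "this sketch assumes double can represent infinity");
  if (reclaimed_bytes == 0) {
    return std::numeric_limits<double>::infinity();
  }
  return gc_time / static_cast<double>(reclaimed_bytes);
}

// Hypothetical wrapper: old garbage is cheaper only if reclaiming a byte of
// it costs strictly less time than reclaiming a byte of young garbage.
inline bool old_garbage_is_cheaper(double old_gc_time,
                                   size_t reclaimed_per_old_gc,
                                   double young_gc_time_per_byte_freed) {
  return gc_time_per_byte_freed(old_gc_time, reclaimed_per_old_gc) <
         young_gc_time_per_byte_freed;
}
```

With this shape, an old collection that reclaimed zero bytes can never compare as cheaper, because infinity is not less than any finite cost.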

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1747065752

From stefank at openjdk.org  Fri Sep  6 13:03:48 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 6 Sep 2024 13:03:48 GMT
Subject: RFR: 8339661: ZGC: Move some page resets and verification to
 callsites
In-Reply-To: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
References: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
Message-ID: <ML3aOke915yqdmmoJTjsiIk48NBHywdeEsObLozaE8w=.e631d22d-b040-4eeb-aace-ba81046afd51@github.com>

On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikström <duke at openjdk.org> wrote:

> Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This makes the checks somewhat hard to understand and the edge cases hard to follow.
> 
> Moving some of the reset logic that is currently part of `ZPage::reset()` to the callsites makes the reason behind each operation easier to understand when reading the code.
> 
> Main highlights:
> - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`.
> - `ZPage::clone_limited()` retains the value of the top-pointer.
> - The kind of remset verification is now decided at the callsites:
>   - Allocations from the page cache, and only if the page got a remset
>   - Old-to-old in-place relocations, where only the inactive remset is checked

Looks good!

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20890#pullrequestreview-2286216362

From rcastanedalo at openjdk.org  Fri Sep  6 14:15:41 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 6 Sep 2024 14:15:41 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  s390 port : late barrier expansion

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/22e07ef0..6663433c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15-16

  Stats: 896 lines in 8 files changed: 837 ins; 32 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Fri Sep  6 14:15:42 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 6 Sep 2024 14:15:42 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v16]
In-Reply-To: <YCqd-SwWRTx-gxIqnJcGrfDs_BY4yVLqgnky67P2nUg=.4db33253-d4c2-4b81-91e1-7005be6f3d98@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>
 <VTMCuZaZozrhGO7p4LQzRoEwvEfpoDaF1l2ka8oGzis=.6feefbef-d25d-447c-adfa-e8dd84c5d013@github.com>
 <YCqd-SwWRTx-gxIqnJcGrfDs_BY4yVLqgnky67P2nUg=.4db33253-d4c2-4b81-91e1-7005be6f3d98@github.com>
Message-ID: <qpMwBJRKmb9KvTqHW-H89OfTMzGEslJQqiGm3lAqx-8=.961cb6bb-baa7-4d4b-ab57-1968bd7f48a1@github.com>

On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Yes exactly, that was what I meant.
>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
>
>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
> 
> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.

> Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034)

Done, thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334130205

From kbarrett at openjdk.org  Fri Sep  6 20:26:09 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 6 Sep 2024 20:26:09 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v16]
In-Reply-To: <YCqd-SwWRTx-gxIqnJcGrfDs_BY4yVLqgnky67P2nUg=.4db33253-d4c2-4b81-91e1-7005be6f3d98@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>
 <VTMCuZaZozrhGO7p4LQzRoEwvEfpoDaF1l2ka8oGzis=.6feefbef-d25d-447c-adfa-e8dd84c5d013@github.com>
 <YCqd-SwWRTx-gxIqnJcGrfDs_BY4yVLqgnky67P2nUg=.4db33253-d4c2-4b81-91e1-7005be6f3d98@github.com>
Message-ID: <jjyV3PHO3r3l2mHYkeDCTLWZrZ2bzbnybsaIXc02SIY=.bea36c1b-8299-4f24-a30b-86621cbb23d9@github.com>

On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
> 
> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.

@robcasloz is correct, the GCs don't have any special knowledge about enum instances.  They are ordinary objects,
though probably long-lived so will eventually migrate to the old generation.  Trying to do anything special with them
seems very unlikely to provide a benefit worth the costs involved.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334754544

From duke at openjdk.org  Fri Sep  6 20:26:10 2024
From: duke at openjdk.org (halkosajtarevic)
Date: Fri, 6 Sep 2024 20:26:10 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v16]
In-Reply-To: <jjyV3PHO3r3l2mHYkeDCTLWZrZ2bzbnybsaIXc02SIY=.bea36c1b-8299-4f24-a30b-86621cbb23d9@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>
 <VTMCuZaZozrhGO7p4LQzRoEwvEfpoDaF1l2ka8oGzis=.6feefbef-d25d-447c-adfa-e8dd84c5d013@github.com>
 <YCqd-SwWRTx-gxIqnJcGrfDs_BY4yVLqgnky67P2nUg=.4db33253-d4c2-4b81-91e1-7005be6f3d98@github.com>
 <jjyV3PHO3r3l2mHYkeDCTLWZrZ2bzbnybsaIXc02SIY=.bea36c1b-8299-4f24-a30b-86621cbb23d9@github.com>
Message-ID: <WSLtFFTnot2EC1dTVC7a6Z4_0iKvgeBvtsg6IdEzHGI=.d1604e15-1a14-4818-82c3-eb6532f2d2b8@github.com>

On Fri, 6 Sep 2024 20:21:11 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> > > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
> > 
> > 
> > As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.
> 
> @robcasloz is correct, the GCs don't have any special knowledge about enum instances. They are ordinary objects, though probably long-lived so will eventually migrate to the old generation. Trying to do anything special with them seems very unlikely to provide a benefit worth the costs involved.

Thank you very much for the insights!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334756865

From kbarrett at openjdk.org  Sat Sep  7 04:15:14 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 7 Sep 2024 04:15:14 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
Message-ID: <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>

On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   s390 port : late barrier expansion

I've reviewed the non-compiler GC changes.  I've looked over the compiler changes,
but can't claim to have reviewed them.  I've also reviewed the x64 changes, and
looked over the aarch64 changes.

src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176:

> 174:   __ jcc(Assembler::zero, runtime);                           // jump to runtime if index == 0 (full buffer)
> 175:   // The buffer is not full, store value into it.
> 176:   __ subptr(temp, wordSize);                                  // temp := next index

Instead of 

  __ testptr(temp, temp);
  __ jcc(Assembler::zero, runtime);
  __ subptr(temp, wordSize);

it seems like this might be better

  __ subptr(temp, wordSize);
  __ jcc(Assembler::below, runtime);

I think the code in the PR matches what the early expansion generates, so I think a change here
can be deferred to a followup.
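
To illustrate why the two sequences take the same branch, here is a small C++ model of the flag logic (not HotSpot code). It assumes the buffer index is an unsigned byte offset that is always a multiple of `wordSize`; `kWordSize`, `Step`, and both function names are made up for this sketch.

```cpp
#include <cstdint>

// Stands in for HotSpot's wordSize in this model.
constexpr uintptr_t kWordSize = sizeof(void*);

struct Step {
  bool take_runtime_path;  // true if the barrier would jump to `runtime`
  uintptr_t temp;          // value left in `temp` afterwards
};

// Models: testptr(temp, temp); jcc(zero, runtime); subptr(temp, wordSize);
inline Step test_then_subtract(uintptr_t index) {
  if (index == 0) {
    return {true, index};  // temp is left untouched on the runtime path
  }
  return {false, index - kWordSize};
}

// Models: subptr(temp, wordSize); jcc(below, runtime);
// "below" is taken when the unsigned subtraction borrowed,
// i.e. when index < wordSize.
inline Step subtract_then_branch(uintptr_t index) {
  const bool borrow = index < kWordSize;
  return {borrow, index - kWordSize};  // unsigned arithmetic wraps on borrow
}
```

For indices that are multiples of the word size, both variants agree on the branch and on the next index. One difference the model makes visible: in the shorter sequence `temp` has already wrapped around when the runtime path is taken, so that path must not rely on `temp` still holding the old index.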

src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354:

> 352:   __ bind(runtime);
> 353:   // save the live input values
> 354:   RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread));

I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here.
Also not sure why we're saving `thread` here for 32bit platforms.
Something to think about for the future.  Though maybe the 32bit case will be gone by then :)

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112:

> 110:           // The answer is that stores of different sizes can co-exist
> 111:           // in the same sequence of RawMem effects.  We sometimes initialize
> 112:           // a whole 'tile' of array elements with a single jint or jlong.)

I'm having trouble making sense of this comment.  I guess a jlong could be used to null-initialize two
32bit oops/narrowOops?  But that doesn't have anything to do with jints.
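
The guess about a jlong covering two 32-bit slots can be pictured with a standalone C++ sketch. This only illustrates the store-width idea and is not how C2 emits initializing stores; the function name is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical illustration: clears two adjacent 32-bit slots (think of two
// narrowOop-sized array elements) with a single 64-bit wide zero store.
inline void clear_pair_with_one_wide_store(uint32_t* slots) {
  const uint64_t zero = 0;
  // One 8-byte copy covers both slots[0] and slots[1].
  std::memcpy(slots, &zero, sizeof(zero));
}
```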

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2287188386
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747741376
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747824868
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747898995

From mdoerr at openjdk.org  Sat Sep  7 12:40:10 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Sat, 7 Sep 2024 12:40:10 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
Message-ID: <j3yMY_VrnBkrCj3tqg1ZwcpXwKVPjML5xTlcBlPSqzY=.09d4c71f-89c0-4f85-9fa0-c1ac887dc0ce@github.com>

On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   s390 port : late barrier expansion

I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2335174688

From fjiang at openjdk.org  Mon Sep  9 06:09:12 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Mon, 9 Sep 2024 06:09:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v16]
In-Reply-To: <qpMwBJRKmb9KvTqHW-H89OfTMzGEslJQqiGm3lAqx-8=.961cb6bb-baa7-4d4b-ab57-1968bd7f48a1@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sorgHV1J-RQA4-2AD90uObuw5qhEzJYzBwqOkH-n7uA=.4be0ed36-d17e-4c7e-9c28-da79951b57ef@github.com>
 <VTMCuZaZozrhGO7p4LQzRoEwvEfpoDaF1l2ka8oGzis=.6feefbef-d25d-447c-adfa-e8dd84c5d013@github.com>
 <YCqd-SwWRTx-gxIqnJcGrfDs_BY4yVLqgnky67P2nUg=.4db33253-d4c2-4b81-91e1-7005be6f3d98@github.com>
 <qpMwBJRKmb9KvTqHW-H89OfTMzGEslJQqiGm3lAqx-8=.961cb6bb-baa7-4d4b-ab57-1968bd7f48a1@github.com>
Message-ID: <2Iqb8t5nI61Zq22PafvY9QUUw_9OZ7oHygSdOY6QCX8=.f1338ef5-d646-45aa-bcb6-54f0dd13bc87@github.com>

On Fri, 6 Sep 2024 14:02:58 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
>> 
>> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.
>
>> Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034)
> 
> Done, thanks!

> > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later.
> 
> Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset.

Tier1-3 and hotspot:tier4 test results are clean on the linux-riscv64 platform. No performance regression observed. (Applied on JDK head.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337203016

From aboldtch at openjdk.org  Mon Sep  9 06:18:08 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 9 Sep 2024 06:18:08 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation
In-Reply-To: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
Message-ID: <UeTJNewXK3MbRUbhbEXJcIcGfIbU38eKZJHW-j6Amjw=.897225dc-f1e3-426e-aaeb-01d23f78d665@github.com>

On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change to synchronize medium page allocations in ZGC.
> 
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
> 
> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
> 
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing

lgtm.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2288903518

From sjohanss at openjdk.org  Mon Sep  9 06:46:25 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Mon, 9 Sep 2024 06:46:25 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2]
In-Reply-To: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
Message-ID: <IrUKgazsEx8Bq48S3M4E91q4dzzMhyoItwSDQ7HGdtA=.b20ae5ae-4f11-40fb-8e54-c78ccdcc37fa@github.com>

> Please review this change to synchronize medium page allocations in ZGC.
> 
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
> 
> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
> 
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing
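
The synchronization idea described in the summary can be sketched in standalone C++ as a lock-free fast path plus a locked slow path that re-checks the shared page before creating a new one. This is only an illustration under stated assumptions, not the ZGC implementation: `Page`, `SharedPageAllocator`, and all member names are hypothetical, `std::mutex` stands in for whatever lock type the patch uses, and retired pages are simply leaked to keep the sketch short.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a shared page with bump-pointer allocation.
struct Page {
  explicit Page(size_t capacity) : _capacity(capacity), _top(0) {}

  // Reserves `size` bytes and returns true if the page still has room.
  bool try_alloc(size_t size) {
    size_t old_top = _top.load(std::memory_order_relaxed);
    while (old_top + size <= _capacity) {
      // On failure, old_top is refreshed and the room check is repeated.
      if (_top.compare_exchange_weak(old_top, old_top + size)) {
        return true;
      }
    }
    return false;
  }

  const size_t _capacity;
  std::atomic<size_t> _top;
};

class SharedPageAllocator {
public:
  explicit SharedPageAllocator(size_t page_capacity)
    : _page_capacity(page_capacity), _shared(nullptr), _pages_created(0) {}

  bool alloc(size_t size) {
    if (size > _page_capacity) {
      return false;
    }
    // Fast path: allocate in the current shared page without taking the lock.
    Page* page = _shared.load(std::memory_order_acquire);
    if (page != nullptr && page->try_alloc(size)) {
      return true;
    }
    // Slow path: only one thread at a time may install a new shared page.
    std::lock_guard<std::mutex> guard(_lock);
    // Re-check: another thread may have installed a fresh page while this
    // thread was waiting for the lock.
    page = _shared.load(std::memory_order_acquire);
    if (page != nullptr && page->try_alloc(size)) {
      return true;
    }
    Page* fresh = new Page(_page_capacity);  // old page is leaked in this sketch
    fresh->try_alloc(size);                  // cannot fail: size <= capacity
    _pages_created.fetch_add(1, std::memory_order_relaxed);
    _shared.store(fresh, std::memory_order_release);
    return true;
  }

  size_t pages_created() const {
    return _pages_created.load(std::memory_order_relaxed);
  }

private:
  const size_t _page_capacity;
  std::atomic<Page*> _shared;
  std::atomic<size_t> _pages_created;
  std::mutex _lock;
};
```

Threads that lose the race for the lock find the freshly installed page on the re-check and allocate from it, so a full page leads to one new page rather than one per contending thread.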

Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:

 - Review - use explicit null checks
 - StefanK review - change lock type

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20883/files
  - new: https://git.openjdk.org/jdk/pull/20883/files/66a9a238..fd5ad8b1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20883&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20883&range=00-01

  Stats: 19 lines in 2 files changed: 7 ins; 8 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/20883.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20883/head:pull/20883

PR: https://git.openjdk.org/jdk/pull/20883

From aboldtch at openjdk.org  Mon Sep  9 07:11:04 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 9 Sep 2024 07:11:04 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2]
In-Reply-To: <IrUKgazsEx8Bq48S3M4E91q4dzzMhyoItwSDQ7HGdtA=.b20ae5ae-4f11-40fb-8e54-c78ccdcc37fa@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
 <IrUKgazsEx8Bq48S3M4E91q4dzzMhyoItwSDQ7HGdtA=.b20ae5ae-4f11-40fb-8e54-c78ccdcc37fa@github.com>
Message-ID: <bXq1qS0_HRKSq1lid4RFwIXCzFX9G9c4cFZKQiuJPqU=.d3a71156-d00d-4baa-a831-9d40c157e6ff@github.com>

On Mon, 9 Sep 2024 06:46:25 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> Please review this change to synchronize medium page allocations in ZGC.
>> 
>> **Summary**
>> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
>> 
>> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
>> 
>> **Testing**
>> * Functional testing through mach5 tier1-7 using ZGC
>> * Performance testing through aurora to verify no regressions occur
>> * Manual testing to verify performance
>> * Manual testing to verify we avoid page cache flushing
>
> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Review - use explicit null checks
>  - StefanK review - change lock type

Marked as reviewed by aboldtch (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2288987407

From stefank at openjdk.org  Mon Sep  9 07:11:05 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 9 Sep 2024 07:11:05 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2]
In-Reply-To: <IrUKgazsEx8Bq48S3M4E91q4dzzMhyoItwSDQ7HGdtA=.b20ae5ae-4f11-40fb-8e54-c78ccdcc37fa@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
 <IrUKgazsEx8Bq48S3M4E91q4dzzMhyoItwSDQ7HGdtA=.b20ae5ae-4f11-40fb-8e54-c78ccdcc37fa@github.com>
Message-ID: <H93ufBEri5QTXokS2bH7EObhPmasRjoZAQnrbch6kYE=.076dd915-0527-4be1-b5f5-aa04f33f1a10@github.com>

On Mon, 9 Sep 2024 06:46:25 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> Please review this change to synchronize medium page allocations in ZGC.
>> 
>> **Summary**
>> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
>> 
>> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
>> 
>> **Testing**
>> * Functional testing through mach5 tier1-7 using ZGC
>> * Performance testing through aurora to verify no regressions occur
>> * Manual testing to verify performance
>> * Manual testing to verify we avoid page cache flushing
>
> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Review - use explicit null checks
>  - StefanK review - change lock type

Looks good!

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2288990050

From rcastanedalo at openjdk.org  Mon Sep  9 07:44:13 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 9 Sep 2024 07:44:13 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <j3yMY_VrnBkrCj3tqg1ZwcpXwKVPjML5xTlcBlPSqzY=.09d4c71f-89c0-4f85-9fa0-c1ac887dc0ce@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
 <j3yMY_VrnBkrCj3tqg1ZwcpXwKVPjML5xTlcBlPSqzY=.09d4c71f-89c0-4f85-9fa0-c1ac887dc0ce@github.com>
Message-ID: <RcdzC_ZSoH3GHNlJmtXfxDb-naN3C4EoPzuA8tdZ_ZE=.42ba05ac-0951-4dd5-afd2-5075fbb1030e@github.com>

On Sat, 7 Sep 2024 12:37:54 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

> I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed.

Great, thanks for testing Martin!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337362381

From mbaesken at openjdk.org  Mon Sep  9 07:46:06 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 9 Sep 2024 07:46:06 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
In-Reply-To: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
Message-ID: <FzjRlMQGFuKD0Qxn47nW0I9nrDRBVOVvxsffqiQUDzQ=.64030d8d-2fb9-4ae7-88bd-98708927499d@github.com>

On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

It looks like for clang
https://bugs.llvm.org/show_bug.cgi?id=17000#c1
floating-point division by zero became defined behavior, but this might differ for other compilers. I think it depends not only on the platform but also on the compiler. See the discussion here: https://stackoverflow.com/questions/42926763/the-behaviour-of-floating-point-division-by-zero

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2337366232

From rkennke at openjdk.org  Mon Sep  9 10:29:55 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 10:29:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v7]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <okQKnqlC8qh08kXMl6p_Vbr1RFA1VVgFRUtc4wRysM0=.1e9da30a-5e40-46e9-bd15-edccea4af792@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:

 - Fix compiler/c2/irTests/TestPadding.java for +COH
 - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes
 - Nit in header_size
 - GC code tweaks
 - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java
 - Fix jdk/tools/jlink/plugins/CDSPluginTest.java
 - Cleanup markWord bits and comments
 - x86_64: Fix loadNKlassCompactHeaders
 - aarch64: Fix loadNKlassCompactHeaders
 - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders
 - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383

-------------

Changes: https://git.openjdk.org/jdk/pull/20677/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06
  Stats: 4465 lines in 189 files changed: 3175 ins; 678 del; 612 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677
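
The "40 bits, up to 8TB" figure quoted above for full GC forwarding can be checked with a little arithmetic. The following is only a sketch: it assumes that a forwarding target is encoded as an offset counted in 8-byte words from a heap base (in the spirit of "similar to compressed-oops"), which the description does not spell out. The names `kForwardingBits`, `kWordSize`, `encode_forwarding` and `decode_forwarding` are hypothetical and do not come from the PR.

```cpp
#include <cstdint>

// Number of bits available for the forwarding encoding (from the PR description).
constexpr int kForwardingBits = 40;
// Assumption: targets are 8-byte aligned, so offsets are counted in words.
constexpr uint64_t kWordSize = 8;

constexpr uint64_t kTiB = uint64_t{1} << 40;
// 2^40 word offsets * 8 bytes per word = 2^43 bytes.
constexpr uint64_t kMaxEncodableBytes = (uint64_t{1} << kForwardingBits) * kWordSize;
static_assert(kMaxEncodableBytes == 8 * kTiB, "40 bits of word offsets cover 8 TiB");

// Hypothetical encoding: word offset of the target from the heap base.
inline uint64_t encode_forwarding(uint64_t heap_base, uint64_t target) {
  return (target - heap_base) / kWordSize;
}

// Hypothetical decoding: inverse of encode_forwarding.
inline uint64_t decode_forwarding(uint64_t heap_base, uint64_t encoded) {
  return heap_base + encoded * kWordSize;
}
```

Under these assumptions, any word-aligned target within 8 TiB of the base round-trips through the 40-bit encoding; a larger heap would need more bits, which matches the stated 8TB limit.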

From rcastanedalo at openjdk.org  Mon Sep  9 11:15:47 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 9 Sep 2024 11:15:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v18]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <IeVZPQjaeWE6AThbR5sjTIlartNQ_nlI20cPp3DF_Dw=.7f799fc4-aab6-4cb3-9b4e-a1a2d0288362@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
 - riscv port for JEP 475

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/6663433c..94145917

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16-17

  Stats: 860 lines in 4 files changed: 771 ins; 49 del; 40 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Mon Sep  9 11:15:47 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 9 Sep 2024 11:15:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <RcdzC_ZSoH3GHNlJmtXfxDb-naN3C4EoPzuA8tdZ_ZE=.42ba05ac-0951-4dd5-afd2-5075fbb1030e@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
 <j3yMY_VrnBkrCj3tqg1ZwcpXwKVPjML5xTlcBlPSqzY=.09d4c71f-89c0-4f85-9fa0-c1ac887dc0ce@github.com>
 <RcdzC_ZSoH3GHNlJmtXfxDb-naN3C4EoPzuA8tdZ_ZE=.42ba05ac-0951-4dd5-afd2-5075fbb1030e@github.com>
Message-ID: <tUlIoyGz5Yi5DSOENUtNN8b7BCtaXTGLgJZub8fAB-4=.5824bf04-428e-49db-ab29-dc3a0ebec848@github.com>

On Mon, 9 Sep 2024 07:41:06 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed.
> 
> Great, thanks for testing Martin!

> > > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later.
> > 
> > 
> > Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset.
> 
> Tier1-3 & hotspot:tier4 test result is clean on linux-riscv64 platform. No regression observed for performance. (Applied on JDK head).

Thanks @feilongjiang, merged now (commit 94145917).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337824882

From sjohanss at openjdk.org  Mon Sep  9 11:17:12 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Mon, 9 Sep 2024 11:17:12 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2]
In-Reply-To: <GU0p3_i28AH9XcirdMjlpp9Ip7P63oaJn3GFomcDbI8=.b1206841-19a4-4a4b-b882-51abf462be90@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
 <GU0p3_i28AH9XcirdMjlpp9Ip7P63oaJn3GFomcDbI8=.b1206841-19a4-4a4b-b882-51abf462be90@github.com>
Message-ID: <NTX9wvusXQnioZ4E0DVdEc_tyQLB3Xz5QexVwk-Sa2k=.27699575-be95-4975-924f-d96b16ad3d40@github.com>

On Fri, 6 Sep 2024 08:22:53 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Review - use explicit null checks
>>  - StefanK review - change lock type
>
> Looks good! Thanks for fixing.

Thanks for the reviews @fisk, @xmas92 and @stefank.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20883#issuecomment-2337828395

From sjohanss at openjdk.org  Mon Sep  9 11:17:14 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Mon, 9 Sep 2024 11:17:14 GMT
Subject: Integrated: 8339387: ZGC: Synchronize medium page allocation
In-Reply-To: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
References: <ziblxZciM5RxCBkyeLaCrbokDcOiymECZfmvwqrhGOc=.e19013cb-e105-45b8-878f-b27ccea55944@github.com>
Message-ID: <qjdKTQ0OFEPRTL-HRQ3jIWp8e8VhwDi1LxwggX7Bsr0=.3d4efee1-3b8e-4e15-9701-c4f4d666ef5e@github.com>

On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change to synchronize medium page allocations in ZGC.
> 
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
> 
> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
> 
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing

This pull request has now been integrated.

Changeset: 347d5728
Author:    Stefan Johansson <sjohanss at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/347d5728e69ae1f7d1a24820cc2c17bb0b8c0af5
Stats:     47 lines in 2 files changed: 44 ins; 1 del; 2 mod

8339387: ZGC: Synchronize medium page allocation

Reviewed-by: aboldtch, stefank, eosterlund

-------------

PR: https://git.openjdk.org/jdk/pull/20883
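
For readers skimming the archive, the idea in the summary above (let only one thread allocate the replacement medium page in the common case) can be sketched roughly as follows. This is not the code from the changeset: `Page`, `MediumPageAllocator` and all member names are hypothetical, and the sketch uses a plain `std::mutex` with a re-check under the lock as one possible way to get that behavior.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical bump-pointer page; a stand-in, not the real ZGC page type.
class Page {
 public:
  explicit Page(size_t capacity)
      : _storage(new char[capacity]), _capacity(capacity), _top(0) {}

  // Lock-free bump allocation; returns nullptr when the page cannot fit size.
  void* alloc(size_t size) {
    size_t old_top = _top.load(std::memory_order_relaxed);
    while (size <= _capacity - old_top) {
      if (_top.compare_exchange_weak(old_top, old_top + size)) {
        return _storage.get() + old_top;
      }
    }
    return nullptr;
  }

 private:
  std::unique_ptr<char[]> _storage;
  size_t _capacity;
  std::atomic<size_t> _top;
};

// Hypothetical allocator with one shared page, as in the description.
class MediumPageAllocator {
 public:
  explicit MediumPageAllocator(size_t page_capacity)
      : _page_capacity(page_capacity) {}

  void* allocate(size_t size) {
    // Fast path: allocate from the currently installed shared page.
    if (void* p = try_shared(size)) return p;

    // Slow path: serialize page replacement.
    std::lock_guard<std::mutex> guard(_lock);
    // Re-check: another thread may have installed a fresh page meanwhile.
    if (void* p = try_shared(size)) return p;

    _pages.push_back(std::unique_ptr<Page>(new Page(_page_capacity)));
    Page* fresh = _pages.back().get();
    void* p = fresh->alloc(size);  // nullptr if size exceeds the page capacity
    _shared.store(fresh, std::memory_order_release);
    return p;
  }

  size_t pages_created() {
    std::lock_guard<std::mutex> guard(_lock);
    return _pages.size();
  }

 private:
  void* try_shared(size_t size) {
    Page* page = _shared.load(std::memory_order_acquire);
    return page != nullptr ? page->alloc(size) : nullptr;
  }

  size_t _page_capacity;
  std::mutex _lock;
  std::atomic<Page*> _shared{nullptr};
  std::vector<std::unique_ptr<Page>> _pages;  // keeps retired pages alive
};
```

Without the lock and the re-check, every thread that finds the shared page full would allocate its own replacement and all but one would be discarded, which is the waste the summary describes.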

From rcastanedalo at openjdk.org  Mon Sep  9 11:35:13 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 9 Sep 2024 11:35:13 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
 <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>
Message-ID: <oZYpQUywTbi2gwA924iVKttcygCPfJL5ccKM6a_4yxk=.5cc990b6-99eb-4d3d-997b-0b4da984e07c@github.com>

On Fri, 6 Sep 2024 21:33:42 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   s390 port : late barrier expansion
>
> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176:
> 
>> 174:   __ jcc(Assembler::zero, runtime);                           // jump to runtime if index == 0 (full buffer)
>> 175:   // The buffer is not full, store value into it.
>> 176:   __ subptr(temp, wordSize);                                  // temp := next index
> 
> Instead of 
> 
>   __ testptr(temp, temp);
>   __ jcc(Assembler::zero, runtime);
>   __ subptr(temp, wordSize);
> 
> it seems like this might be better
> 
>   __ subptr(temp, wordSize);
>   __ jcc(Assembler::below, runtime);
> 
> I think the code in the PR matches what the early expansion generates, so I think a change here
> can be deferred to a followup.

Good point, thanks! I made a note for follow-up work.
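
For the follow-up, the equivalence behind the suggested two-instruction sequence can be modelled in plain C++. This is only a sketch of the flag logic, assuming the queue index is always a multiple of the word size; `Outcome`, `test_then_subtract` and `subtract_then_check_borrow` are hypothetical names, not HotSpot code.

```cpp
#include <cstdint>

constexpr uintptr_t kWordSize = sizeof(void*);

struct Outcome {
  bool to_runtime;       // true if the slow (runtime) path is taken
  uintptr_t next_index;  // value left in the temp register
};

// Models: testptr(temp, temp); jcc(zero, runtime); subptr(temp, wordSize);
inline Outcome test_then_subtract(uintptr_t index) {
  if (index == 0) return {true, index};
  return {false, index - kWordSize};
}

// Models: subptr(temp, wordSize); jcc(below, runtime);
// "below" is taken when the unsigned subtraction borrows (carry flag set).
inline Outcome subtract_then_check_borrow(uintptr_t index) {
  const bool borrow = index < kWordSize;
  const uintptr_t next = index - kWordSize;  // wraps around on borrow
  return {borrow, next};
}
```

Both variants take the runtime path exactly when the index is zero and agree on the next index otherwise. One difference to keep in mind: in the second variant the temp register has already been decremented (and has wrapped around) when the runtime path is entered, so that path must not rely on its value.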

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750088920

From mbaesken at openjdk.org  Mon Sep  9 11:37:41 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 9 Sep 2024 11:37:41 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
Message-ID: <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  Adjust division following suggestion by xmas

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20888/files
  - new: https://git.openjdk.org/jdk/pull/20888/files/c66c089e..21fe3ca7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=00-01

  Stats: 6 lines in 1 file changed: 1 ins; 4 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20888.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888

PR: https://git.openjdk.org/jdk/pull/20888

From mbaesken at openjdk.org  Mon Sep  9 11:41:05 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 9 Sep 2024 11:41:05 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
Message-ID: <5oKpxfys6Bj1vhYQURKt_TYMXqJ1u-R2FrMXwZJrUng=.3ddf9704-cbc1-44ae-b871-b3b5b7bd821d@github.com>

On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust division following suggestion by xmas

Hi Axel, I adjusted the code following your suggestion.
By the way, is there already a template function somewhere that performs such a division while handling a zero divisor? This is probably not the only place in the codebase where this can happen.
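
To illustrate the kind of helper meant here, a minimal sketch could look like the following. The name `divide_or_default` and its signature are hypothetical; this is not an existing HotSpot function, and it is not necessarily what the updated patch does.

```cpp
#include <type_traits>

// Hypothetical helper: returns fallback instead of dividing by zero.
template <typename T>
constexpr T divide_or_default(T dividend, T divisor, T fallback) {
  static_assert(std::is_arithmetic<T>::value, "arithmetic types only");
  // For floating-point types, -0.0 also compares equal to zero here.
  return divisor == T(0) ? fallback : dividend / divisor;
}
```

A call such as `divide_or_default(bytes, seconds, 0.0)` would then make the zero-divisor case explicit at the call site instead of depending on how a particular compiler treats floating-point division by zero.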

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2337879054

From rcastanedalo at openjdk.org  Mon Sep  9 11:48:11 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 9 Sep 2024 11:48:11 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
 <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>
Message-ID: <GUeq1XYVIzCILnzmFx_CK7CP9-RGEBvuMEoEh0JDyV0=.b9cb72f5-d604-4cd0-966b-7439545a4c89@github.com>

On Fri, 6 Sep 2024 23:57:59 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   s390 port : late barrier expansion
>
> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354:
> 
>> 352:   __ bind(runtime);
>> 353:   // save the live input values
>> 354:   RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread));
> 
> I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here.
> Also not sure why we're saving `thread` here for 32bit platforms.
> Something to think about for the future.  Though maybe the 32bit case will be gone by then :)

I'm not sure either, this is in any case pre-existing interpreter code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750105760

From rkennke at openjdk.org  Mon Sep  9 11:55:52 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 11:55:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Try to avoid lea in loadNklass (aarch64)
 - Fix release build error

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/49126383..70f492d3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06-07

  Stats: 24 lines in 5 files changed: 12 ins; 1 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677
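
As a rough illustration of the description's point that the compressed Klass* lives in the upper 22 bits of the mark-word and that a dedicated self-forwarding bit leaves those bits intact, consider the sketch below. It assumes a 64-bit mark-word; the position chosen for `kSelfForwardedBit` and all helper names are hypothetical and do not reflect the actual `markWord` layout in the PR.

```cpp
#include <cstdint>

constexpr int kMarkBits = 64;   // assumption: 64-bit mark-word
constexpr int kKlassBits = 22;  // "upper 22" bits, per the description
constexpr int kKlassShift = kMarkBits - kKlassBits;
constexpr uint64_t kKlassMask = ((uint64_t{1} << kKlassBits) - 1) << kKlassShift;

// Hypothetical position of the self-forwarding bit, somewhere in the low bits.
constexpr uint64_t kSelfForwardedBit = uint64_t{1} << 2;

// Reads the compressed class pointer from the upper bits of the mark-word.
inline uint32_t narrow_klass(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kKlassShift);
}

// Returns a mark-word with the class bits replaced by nk.
inline uint64_t with_narrow_klass(uint64_t mark, uint32_t nk) {
  return (mark & ~kKlassMask) | (static_cast<uint64_t>(nk) << kKlassShift);
}

// Marks the object as self-forwarded without touching the class bits.
inline uint64_t set_self_forwarded(uint64_t mark) {
  return mark | kSelfForwardedBit;
}
```

Because the flag lives outside the class bits, `narrow_klass` yields the same value before and after `set_self_forwarded`, which is the property the description relies on when it says the Klass* bits are preserved.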

From tschatzl at openjdk.org  Mon Sep  9 12:40:13 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 9 Sep 2024 12:40:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v7]
In-Reply-To: <okQKnqlC8qh08kXMl6p_Vbr1RFA1VVgFRUtc4wRysM0=.1e9da30a-5e40-46e9-bd15-edccea4af792@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <okQKnqlC8qh08kXMl6p_Vbr1RFA1VVgFRUtc4wRysM0=.1e9da30a-5e40-46e9-bd15-edccea4af792@github.com>
Message-ID: <RpKHsmiTSMtDUKWez7eTvz_lOtfODGQlZeQ4ilt6-ng=.52e0f840-e732-4f0a-b618-dade28b38957@github.com>

On Mon, 9 Sep 2024 10:29:55 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:
> 
>  - Fix compiler/c2/irTests/TestPadding.java for +COH
>  - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes
>  - Nit in header_size
>  - GC code tweaks
>  - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java
>  - Fix jdk/tools/jlink/plugins/CDSPluginTest.java
>  - Cleanup markWord bits and comments
>  - x86_64: Fix loadNKlassCompactHeaders
>  - aarch64: Fix loadNKlassCompactHeaders
>  - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders
>  - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383

src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 481:

> 479:   Klass* klass = UseCompactObjectHeaders
> 480:       ? old_mark.klass()
> 481:       : old->klass();

To be exact, "promotion" refers only to copying to an older generation, so this comment does not cover objects copied within the same generation.

Suggestion:

  // NOTE: With compact headers, it is not safe to load the Klass* from old, because
  // that would access the mark-word, that might change at any time by concurrent
  // workers.
  // This mark word would refer to a forwardee, which may not yet have completed
  // copying. Therefore we must load the Klass* from the mark-word that we already
  // loaded. This is safe, because we only enter here if not yet forwarded.

src/hotspot/share/gc/parallel/mutableSpace.cpp line 225:

> 223:       // header-based forwarding during promotion. Full GC doesn't
> 224:       // use the object header for forwarding at all.
> 225:       p += obj->forwardee()->size();

Better use `!obj->is_self_forwarded()` here.

src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 174:

> 172:   // may not yet have completed copying. Therefore we must load the Klass* from
> 173:   // the mark-word that we have already loaded. This is safe, because we have checked
> 174:   // that this is not yet forwarded in the caller.)

Same adjustment needed as for G1.

src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 711:

> 709:   // 8  - 32-bit VM
> 710:   // 12 - 64-bit VM, compressed klass
> 711:   // 16 - 64-bit VM, normal klass

The comment needs to be adapted to include the case for compact object headers.
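
For illustration, the extended table could look like the sketch below. The value 8 for compact headers is taken from the PR description (fields may start at offset 8 instead of 12 or 16); `HeaderConfig` and `instance_base_offset` are hypothetical names that stand in for the VM flags, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant VM configuration (not HotSpot code).
struct HeaderConfig {
  bool is_64bit;
  bool compressed_class_pointers;  // models UseCompressedClassPointers
  bool compact_object_headers;     // models UseCompactObjectHeaders
};

// Offset in bytes at which the first instance field may be placed:
//  8  - 32-bit VM
//  8  - 64-bit VM, compact object headers (Klass* lives in the mark-word)
//  12 - 64-bit VM, compressed klass
//  16 - 64-bit VM, normal klass
constexpr int instance_base_offset(HeaderConfig c) {
  return !c.is_64bit                 ? 8
       : c.compact_object_headers    ? 8
       : c.compressed_class_pointers ? 12
       : 16;
}
```

The compact-headers case is checked before the compressed-klass case because, in this sketch, compact headers are assumed to take precedence whenever both flags are set.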

src/hotspot/share/oops/arrayOop.hpp line 83:

> 81:   // The _length field is not declared in C++.  It is allocated after the
> 82:   // declared nonstatic fields in arrayOopDesc if not compressed, otherwise
> 83:   // it occupies the second half of the _klass field in oopDesc.

Needs update.

src/hotspot/share/oops/instanceOop.hpp line 36:

> 34: class instanceOopDesc : public oopDesc {
> 35:  public:
> 36:   // If compressed, the offset of the fields of the instance may not be aligned.

Needs fixing (or removal) wrt compact object headers, or move to the particular case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750046114
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750056160
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750074607
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750080552
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750027009
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750116336

From tschatzl at openjdk.org  Mon Sep  9 12:40:14 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 9 Sep 2024 12:40:14 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
Message-ID: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>

On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix bit counts in GCForwarding

src/hotspot/share/gc/shared/collectedHeap.cpp line 232:

> 230:   }
> 231: 
> 232:   // With compact headers, we can't safely access the class, due

Suggestion:

  // With compact headers, we can't safely access the klass, due


Why is this the case? Because we might not have copied the header yet? Is this method ever actually used while the forwarded object is unstable?
Given that this is used only for verification afaik, we should make an effort to provide that check.

src/hotspot/share/gc/shared/gcForwarding.hpp line 34:

> 32: 
> 33: /*
> 34:  * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in

Suggestion:

 * Implements forwarding for the Full GCs of Serial, Parallel, G1 and Shenandoah in

src/hotspot/share/gc/shared/gcForwarding.hpp line 41:

> 39:  * bits (to indicate 'forwarded' state as usual).
> 40:  */
> 41: class GCForwarding : public AllStatic {

Since this class is only used for Full GCs, it may be useful to include that information in the name, i.e. something like `FullGCForwarding`, to avoid confusion about why it is not used for other GCs too.
(Unless this has been discussed and even rejected by me before).

src/hotspot/share/oops/compressedKlass.hpp line 43:

> 41: 
> 42:   // Tiny-class-pointer mode
> 43:   static int _tiny_cp; // -1, 0=true, 1=false

Suggestion:

  static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false

In addition to that, I am not sure that introducing a new term ("tiny") for compact-object-header-related changes (and just here) makes the code clearer; I would have expected a "_compact_" prefix. Also, all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749995275
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749980748
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749987945
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749969456

From tschatzl at openjdk.org  Mon Sep  9 12:40:18 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 9 Sep 2024 12:40:18 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
Message-ID: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>

On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Try to avoid lea in loadNklass (aarch64)
>  - Fix release build error

src/hotspot/share/oops/klass.hpp line 169:

> 167:                                 // contention that may happen when a nearby object is modified.
> 168:   AccessFlags _access_flags;    // Access flags. The class/interface distinction is stored here.
> 169:                                 // Some flags created by the JVM, not in the class file itself,

Suggestion:

  markWord _prototype_header;   // Used to initialize objects' header with compact headers.


Maybe some comment why this is an instance member.

src/hotspot/share/oops/objArrayKlass.inline.hpp line 74:

> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) {
> 73:   // In this assert, we cannot safely access the Klass* with compact headers.
> 74:   assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array");

If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe?

src/hotspot/share/oops/oop.cpp line 157:

> 155: bool oopDesc::has_klass_gap() {
> 156:   // Only has a klass gap when compressed class pointers are used.
> 157:   // Except when using compact headers.

Suggestion:

  // Only has a klass gap when compressed class pointers are used and not
  // using compact headers.

(Not sure if repeating the fairly simple disjunction below makes sense, but there has been a comment before too)

src/hotspot/share/oops/oop.cpp line 230:

> 228:   // disjunct below to fail if the two comparands are computed across such
> 229:   // a concurrent change.
> 230:   return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC);

Is this still true after the recent changes like JDK-8311163? It might be worth waiting for.

src/hotspot/share/oops/oop.hpp line 103:

> 101:   static inline void set_klass_gap(HeapWord* mem, int z);
> 102: 
> 103:   // size of object header, aligned to platform wordSize

Suggestion:

  // Size of object header, aligned to platform wordSize

Pre-existing

src/hotspot/share/oops/oop.hpp line 108:

> 106:       return sizeof(markWord) / HeapWordSize;
> 107:     } else {
> 108:       return sizeof(oopDesc) / HeapWordSize;

Suggestion:

      return sizeof(oopDesc) / HeapWordSize;

src/hotspot/share/oops/oop.hpp line 134:

> 132:   inline Klass*   forward_safe_klass(markWord m) const;
> 133:   inline size_t   forward_safe_size();
> 134:   inline void     forward_safe_init_mark();

Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them.

Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe".

src/hotspot/share/oops/oop.hpp line 295:

> 293:   // this call returns null for that thread; any other thread has the
> 294:   // value of the forwarding pointer returned and does not modify "this".
> 295:   inline oop forward_to_atomic(oop p, markWord compare, atomic_memory_order order = memory_order_conservative);

Maybe add an assert in the implementation so that it is not used for self-forwarding. Same for `forward_to`.
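
A toy model of what such an assert could look like is sketched below. The bit positions (two low tag bits, self-forwarding bit directly above them), the `ToyObject` type and the `forward_to_self()` name are assumptions made for illustration; this is not the HotSpot `oopDesc` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Toy mark-word layout; the bit positions are assumptions for illustration.
constexpr uintptr_t kLockMask     = 0x3;  // lowest two bits
constexpr uintptr_t kForwardedTag = 0x3;  // 0b11 marks "forwarded"
constexpr uintptr_t kSelfFwdBit   = 0x4;  // assumed self-forwarding bit

struct alignas(8) ToyObject {
  uintptr_t mark;

  bool is_self_forwarded() const { return (mark & kSelfFwdBit) != 0; }
  bool is_forwarded() const {
    return is_self_forwarded() || (mark & kLockMask) == kForwardedTag;
  }

  // Self-forwarding only sets one bit, so the upper (Klass*) bits survive.
  void forward_to_self() { mark |= kSelfFwdBit; }

  // Regular forwarding overwrites the mark with the forwardee address,
  // so it must never be used to forward an object to itself.
  void forward_to(ToyObject* target) {
    assert(target != this && "use forward_to_self() for self-forwarding");
    mark = reinterpret_cast<uintptr_t>(target) | kForwardedTag;
  }

  ToyObject* forwardee() {
    if (is_self_forwarded()) return this;
    return reinterpret_cast<ToyObject*>(mark & ~kLockMask);
  }
};
```

The point of the assert is that `forward_to(this)` would silently destroy the upper mark-word bits, whereas the dedicated self-forwarding path keeps them intact.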

src/hotspot/share/oops/oop.hpp line 356:

> 354:       return mark_offset_in_bytes() + sizeof(markWord) / 2;
> 355:     } else
> 356: #endif

Maybe instead of trying to calculate some random, meaningless value just use some "random" value directly?
I am fine with the existing code, but after first stating directly that "any value" works here, this additional code seems to confuse the message. (Fwiw, the method is also used during Universe initialization).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750118470
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750143956
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750145460
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750150640
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750154114
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750153663
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750157781
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750159516
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750163768

From tschatzl at openjdk.org  Mon Sep  9 12:45:07 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 9 Sep 2024 12:45:07 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
Message-ID: <aebi9KrSXwlD4fPsZ791PmOsMIsJ4gVzmbtLAw3uoZA=.f3b91b1a-4e97-4ba2-b7c0-533f2f8dbfe4@github.com>

On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Try to avoid lea in loadNklass (aarch64)
>  - Fix release build error

Only looked at GC and runtime changes, only very briefly at compiler stuff.

-------------

Changes requested by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289786482
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289800458

From rkennke at openjdk.org  Mon Sep  9 12:52:07 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 12:52:07 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v3]
In-Reply-To: <LwLTyRUwTUB8Kph7ngHJExHVwAdJSvimoXRsh6A2HcM=.1c03eb16-4e7a-42ac-bbe1-425f4f7fed75@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <bKtmNAMLqRlWTG9MBcosxeh2rYNK_yx_gkh_qbNKt6s=.e23c8a75-0072-4c9f-8d8e-7db14826fded@github.com>
 <ySywujYrcJzmYZtj5oflhcQZbzxPv78dm29gJ5B1fqE=.31b98908-bc0f-48af-b67a-75e7351d1b74@github.com>
 <CjcHSbAikUEXe-wpmu-p2od4z6IN-QC54DCKL5hSaZY=.85532a31-e607-447b-8cea-80e97024f994@github.com>
 <VGJOzveV1b0KxnQ_lSugtMP7boOK5TPmiYT7QXgcCoM=.780e110d-491c-4bbc-9ee7-a7db237ee9dc@github.com>
 <LwLTyRUwTUB8Kph7ngHJExHVwAdJSvimoXRsh6A2HcM=.1c03eb16-4e7a-42ac-bbe1-425f4f7fed75@github.com>
Message-ID: <GA5I5QeGCRY4m_9sFnco0CkvJ18upVlhDSA-VyiNhUw=.6cbd78f0-1064-46dc-8791-18c8c63c5e8c@github.com>

On Fri, 30 Aug 2024 18:10:44 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects. It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers.
>> 
>> Because of the code above, the SerialGC clears away the information about what objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE.
>
>> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. 
> 
> True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from.

ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750199051

From rkennke at openjdk.org  Mon Sep  9 13:02:08 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 13:02:08 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v5]
In-Reply-To: <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TzmQRM_6QAc9tdMzNYnzSPR_nENrokTzNx2q-TPuWpw=.cfcbe6f4-4397-4c31-be73-d5a456618090@github.com>
 <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com>
 <BarC_59LJpof-lBzklJ8Dm8liOxnh6jr9cbCn6-NCTI=.64c466f7-c527-44e0-9f5f-2c6928e6a436@github.com>
 <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com>
Message-ID: <tgYxhCcpQNFDFZ3hfWxp7RYksBd-E2ODD4FctuvjVVY=.5ab13f08-736e-47de-9340-78abbf1a2541@github.com>

On Fri, 30 Aug 2024 07:42:39 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO.
>
> Seems we run all into the same thoughts :)
> 
> I added
> 
> Suggestion:
> 
>     FLAG_SET_DEFAULT(UseCompactObjectHeaders, false);
>     warning("Compact object headers require a java heap size smaller than %zu (given: %zu). "
>                  "Disabling compact object headers.", max_narrow_heap_size * HeapWordSize, max_heap_size);

That %zu is SIZE_FORMAT, right? This should probably use proper_unit_for_byte_size()/byte_size_in_proper_unit().
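
As a rough sketch of the idea, the warning could print scaled values with a unit suffix instead of raw byte counts. The two helpers below are simplified stand-ins that only borrow the names of the HotSpot functions mentioned above; their thresholds, the `uint64_t` signatures and the `heap_limit_warning` wrapper are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

constexpr uint64_t K = 1024, M = K * K, G = M * K;

// Simplified stand-ins named after the HotSpot helpers; the real
// implementations may use different thresholds and units.
inline const char* proper_unit_for_byte_size(uint64_t s) {
  if (s >= 100 * G) return "G";
  if (s >= 100 * M) return "M";
  if (s >= 100 * K) return "K";
  return "B";
}

inline uint64_t byte_size_in_proper_unit(uint64_t s) {
  if (s >= 100 * G) return s / G;
  if (s >= 100 * M) return s / M;
  if (s >= 100 * K) return s / K;
  return s;
}

// Hypothetical wrapper that builds the warning text with readable sizes.
inline std::string heap_limit_warning(uint64_t limit_bytes, uint64_t given_bytes) {
  char buf[256];
  std::snprintf(buf, sizeof(buf),
                "Compact object headers require a java heap size smaller than "
                "%llu%s (given: %llu%s). Disabling compact object headers.",
                static_cast<unsigned long long>(byte_size_in_proper_unit(limit_bytes)),
                proper_unit_for_byte_size(limit_bytes),
                static_cast<unsigned long long>(byte_size_in_proper_unit(given_bytes)),
                proper_unit_for_byte_size(given_bytes));
  return buf;
}
```

With a limit of 8192 * G and a requested heap of 16384 * G, the message reads "... smaller than 8192G (given: 16384G) ..." rather than two 14-digit byte counts.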

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750215510

From rkennke at openjdk.org  Mon Sep  9 13:31:10 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 13:31:10 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v5]
In-Reply-To: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TzmQRM_6QAc9tdMzNYnzSPR_nENrokTzNx2q-TPuWpw=.cfcbe6f4-4397-4c31-be73-d5a456618090@github.com>
 <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com>
Message-ID: <dFS8lzKbVvuhVhZjOXyp-XwK8PWhXDPmyA3W04sQ0w4=.30f673b8-7cae-4414-8976-1aa2f922828b@github.com>

On Thu, 22 Aug 2024 19:50:21 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix hash shift for 32 bit builds
>
> src/hotspot/share/gc/shared/gcForwarding.hpp line 36:
> 
>> 34:  * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in
>> 35:  * a way that preserves upper N bits of object mark-words, which contain crucial
>> 36:  * Klass* information when running with compact headers. The encoding is similar to
> 
> This doc suggests this forwarding is only for compact-header so I wonder if we can check `UseCompactObjectHeaders` directly instead of heap-size in `GCForwarding::initialize`.

Right. The original implementation was more complex, and the consensus then was not to sprinkle UseCompactObjectHeaders all over the place, but with the new, simpler implementation it makes sense to simply check the UCOH flag.

> src/hotspot/share/gc/shared/gcForwarding.hpp line 40:
> 
>> 38:  * heap-base, shifts that difference into the right place, and sets the lowest two
>> 39:  * bits (to indicate 'forwarded' state as usual).
>> 40:  */
> 
>> "can use 40 bits for forwardee encoding. That's enough for 8TB of heap."
> 
> I feel this 8T-constraint is significant and should be in the doc.

Done.
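
For reference, the arithmetic behind the 8TB limit can be sketched as follows. The 22 preserved upper bits, the two low tag bits and the 40 offset bits come from this thread; the assumption that the offset is counted in 8-byte heap words and sits directly above the tag bits, as well as all names below, are illustrative and not taken from gcForwarding.hpp.

```cpp
#include <cassert>
#include <cstdint>

constexpr int      kKlassBits  = 22;                          // upper bits to preserve
constexpr int      kTagBits    = 2;                           // lowest two bits: 'forwarded'
constexpr int      kOffsetBits = 64 - kKlassBits - kTagBits;  // 40
constexpr uint64_t kForwarded  = 0x3;
constexpr uint64_t kWordSize   = 8;                           // assumed heap word size in bytes
constexpr uint64_t kOffsetMask = ((uint64_t{1} << kOffsetBits) - 1) << kTagBits;

// 2^40 words of 8 bytes each = 2^43 bytes = 8 TB of addressable heap.
constexpr uint64_t kMaxHeapBytes = (uint64_t{1} << kOffsetBits) * kWordSize;
static_assert(kOffsetBits == 40, "40 bits remain for the forwardee offset");
static_assert(kMaxHeapBytes == (uint64_t{8} << 40), "40 bits cover 8 TB");

// Encode: subtract heap base, shift the word offset into place, set the tag.
inline uint64_t encode_forwarding(uint64_t old_mark, uint64_t heap_base, uint64_t target) {
  uint64_t word_offset = (target - heap_base) / kWordSize;
  assert(word_offset < (uint64_t{1} << kOffsetBits));
  uint64_t klass_part = old_mark & ~(kOffsetMask | kForwarded);  // keep upper 22 bits
  return klass_part | (word_offset << kTagBits) | kForwarded;
}

// Decode: extract the word offset and add the heap base back.
inline uint64_t decode_forwarding(uint64_t mark, uint64_t heap_base) {
  uint64_t word_offset = (mark & kOffsetMask) >> kTagBits;
  return heap_base + word_offset * kWordSize;
}
```

Any heap larger than `kMaxHeapBytes` would need more than 40 offset bits and would spill into the preserved upper bits, which is why the constraint is worth stating in the class comment.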

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750264571
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750265026

From rkennke at openjdk.org  Mon Sep  9 14:11:08 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 14:11:08 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <wVUrQQtzaYdZNPBPyiuln0Zd8C67zG2y0yX2Iik6l_I=.0be5dcf7-c48d-4bb9-9966-23530f0b8d2e@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <utcfd2HiwL8a7ut2UJ7tHb1P_oz5TKXA9bwGQ5G_FnE=.63fa2666-dc3f-406d-b8d2-4f960f05f05d@github.com>
 <n33g0VZINE0Qly_UoJV2zOCDkapbuqFgRyzTYor6TrE=.7acb0ac5-5bbe-4747-bf3c-20110201616c@github.com>
 <wVUrQQtzaYdZNPBPyiuln0Zd8C67zG2y0yX2Iik6l_I=.0be5dcf7-c48d-4bb9-9966-23530f0b8d2e@github.com>
Message-ID: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com>

On Tue, 27 Aug 2024 07:43:07 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet. To avoid duplicate work, please let me know if anyone is interested or has been working on it :-)
>
> Yes, I'm interested in it. Thanks for raising the discussion. :)

If anybody is doing it, please send me a patch, or we can do it as a follow-up PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750345203

From rkennke at openjdk.org  Mon Sep  9 14:11:10 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 14:11:10 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <utcfd2HiwL8a7ut2UJ7tHb1P_oz5TKXA9bwGQ5G_FnE=.63fa2666-dc3f-406d-b8d2-4f960f05f05d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <utcfd2HiwL8a7ut2UJ7tHb1P_oz5TKXA9bwGQ5G_FnE=.63fa2666-dc3f-406d-b8d2-4f960f05f05d@github.com>
Message-ID: <O7zXQ8Jh20lyh29fRklOZcGNxDSoQiSKA3JkFLelsC4=.4c6a050f-4053-43ff-be03-4336cdaa25a2@github.com>

On Fri, 23 Aug 2024 11:38:39 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/hotspot/share/oops/oop.inline.hpp line 94:
> 
>> 92: 
>> 93: void oopDesc::init_mark() {
>> 94:   if (UseCompactObjectHeaders) {
> 
> Seems only `set_mark(prototype_mark());` is fine for both cases?

Right. Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750342555

From rkennke at openjdk.org  Mon Sep  9 14:35:08 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 14:35:08 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
Message-ID: <S-95OBPOZb6GfBEUIDtHxCB4hTFPB2uNf5YmwMerGcE=.83afc323-883e-4ae0-8d10-fe7030cddb33@github.com>

On Mon, 26 Aug 2024 21:52:58 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169:
> 
>> 167:         } else {
>> 168:           visitor.doMetadata(klass, true);
>> 169:         }
> 
> Why is there no `visitor.doMetadata()` call for the compact object header case?

There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt).
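
To make the "extract it" step concrete, here is a minimal C++ sketch (the SA code itself is Java). It assumes the compressed Klass* occupies the upper 22 bits of a 64-bit mark-word, as stated in the PR description; the function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMarkBits        = 64;
constexpr int kNarrowKlassBits = 22;  // "upper 22 bits" per the PR description
constexpr int kKlassShift      = kMarkBits - kNarrowKlassBits;  // 42

// Returns the compressed (narrow) Klass* stored in the upper bits of the mark.
constexpr uint32_t narrow_klass_from_mark(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kKlassShift);
}

static_assert(narrow_klass_from_mark((uint64_t{0x12345} << 42) | 0x1) == 0x12345,
              "lower mark bits do not affect the extracted value");
```

A visitor that only sees the mark-word as a plain integer would have to apply this shift itself to recover the compressed Klass*.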

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750386024

From rcastanedalo at openjdk.org  Mon Sep  9 14:44:17 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 9 Sep 2024 14:44:17 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
 <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>
Message-ID: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com>

On Sat, 7 Sep 2024 03:57:43 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   s390 port : late barrier expansion
>
> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112:
> 
>> 110:           // The answer is that stores of different sizes can co-exist
>> 111:           // in the same sequence of RawMem effects.  We sometimes initialize
>> 112:           // a whole 'tile' of array elements with a single jint or jlong.)
> 
> I'm having trouble making sense of this comment.  I guess a jlong could be used to null-initialize two
> 32bit oops/narrowOops?  But that doesn't have anything to do with jints.

I am not sure the complex overlap test is necessary here. This code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called. The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest leaving it as-is and possibly investigating how to simplify it as a follow-up task.
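
For readers unfamiliar with what an overlap test between stores of different sizes amounts to, here is a minimal sketch. It is not the HotSpot code from `MemNode::find_previous_store()`; the class name, method name, and offsets are hypothetical and chosen only to illustrate the jlong-covers-two-narrow-oops case mentioned above.

```java
// Hypothetical sketch: not HotSpot code. Names and offsets are illustrative only.
final class StoreOverlapSketch {
    /** True if byte ranges [offA, offA+sizeA) and [offB, offB+sizeB) share at least one byte. */
    static boolean overlaps(long offA, int sizeA, long offB, int sizeB) {
        return offA < offB + sizeB && offB < offA + sizeA;
    }

    public static void main(String[] args) {
        // An 8-byte (jlong-sized) store at offset 16 covers two 4-byte narrow-oop slots.
        System.out.println(overlaps(16, 8, 16, 4)); // true
        System.out.println(overlaps(16, 8, 20, 4)); // true
        // Two aligned 4-byte stores at different offsets do not overlap.
        System.out.println(overlaps(16, 4, 20, 4)); // false
    }
}
```

If all stores in the sequence have the same size and alignment, as seems to be the case in the narrower context described above, the test would reduce to comparing offsets for equality.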

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750400106

From eosterlund at openjdk.org  Mon Sep  9 14:47:06 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 9 Sep 2024 14:47:06 GMT
Subject: RFR: 8339661: ZGC: Move some page resets and verification to
 callsites
In-Reply-To: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
References: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
Message-ID: <kvRqcjBNQKGXcfu_WQQw50HB8904f2e2mR9n7rD8LUU=.f61bd9d5-d8b6-4086-af81-779af1c98e57@github.com>

On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikström <duke at openjdk.org> wrote:

> Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This leads to checks being somewhat hard to understand and to follow the edge-cases of.
> 
> By moving some of the reset logic that is now part of `ZPage::reset()` to the callsite, some operations can be made easier to understand the reason behind when reading the code.
> 
> Main highlights:
> - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`.
> - `ZPage::clone_limited()` retains the value of the top-pointer.
> - The kind of verification for remsets are now at callsites:
>   - Allocations from the page cache, and only if the page got a remset
>   - Old-to-old in-place relocations, where only the inactive remset is checked

Nice change! Looks good.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20890#pullrequestreview-2290147705

From stefank at openjdk.org  Mon Sep  9 14:50:10 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 9 Sep 2024 14:50:10 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <ppEMUrlGXfCRo9oZPRt0xf_FrD3-pv5SjJh8eBoe3TU=.92c19564-443c-41d0-b047-0a9a7c6dad10@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <ppEMUrlGXfCRo9oZPRt0xf_FrD3-pv5SjJh8eBoe3TU=.92c19564-443c-41d0-b047-0a9a7c6dad10@github.com>
Message-ID: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com>

On Fri, 30 Aug 2024 08:06:31 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/hotspot/share/cds/filemap.cpp line 2507:
> 
>> 2505:   }
>> 2506: 
>> 2507:   if (compact_headers() != UseCompactObjectHeaders) {
> 
> (Commenting here, but the comment applies to code a bit above) While debugging CDS, it would have been useful to print the value of UseCompactObjectHeaders.
> 
> Could we change the code to be:
> 
>   log_info(cds)("Archive was created with UseCompressedOops = %d, UseCompressedClassPointers = %d, UseCompactObjectHeaders = %d",
>                           compressed_oops(), compressed_class_pointers(), compact_headers());

Resolved.

> src/hotspot/share/cds/filemap.cpp line 2508:
> 
>> 2506: 
>> 2507:   if (compact_headers() != UseCompactObjectHeaders) {
>> 2508:     log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)"
> 
> Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'.

@iklam informed me that some of the info levels (including this line) should be converted to warning.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750408043
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750410679

From rkennke at openjdk.org  Mon Sep  9 15:04:09 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 15:04:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <ppEMUrlGXfCRo9oZPRt0xf_FrD3-pv5SjJh8eBoe3TU=.92c19564-443c-41d0-b047-0a9a7c6dad10@github.com>
 <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com>
Message-ID: <-2JWx3F8EdyQ0Uf-mI62ImLXgjgIy9PEydjtKHhx12Q=.4d944301-6f1c-4270-953c-ec6c86df946a@github.com>

On Mon, 9 Sep 2024 14:47:28 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> src/hotspot/share/cds/filemap.cpp line 2508:
>> 
>>> 2506: 
>>> 2507:   if (compact_headers() != UseCompactObjectHeaders) {
>>> 2508:     log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)"
>> 
>> Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'.
>
> @iklam informed me that some of the info levels (including this line) should be converted to warning.

Yeah that looks inconsistent with other places where we print a warning instead. I'll change it to warning for the UCOH check.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750430001

From stefank at openjdk.org  Mon Sep  9 15:04:12 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 9 Sep 2024 15:04:12 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
Message-ID: <rhloCBhjvBZ4Cs8LLpD_UyHIBR08b8NIjXwIEQlirrs=.8a06186e-a315-4977-961c-b172e077bd9a@github.com>

On Mon, 9 Sep 2024 12:21:19 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Try to avoid lea in loadNklass (aarch64)
>>  - Fix release build error
>
> src/hotspot/share/oops/oop.hpp line 134:
> 
>> 132:   inline Klass*   forward_safe_klass(markWord m) const;
>> 133:   inline size_t   forward_safe_size();
>> 134:   inline void     forward_safe_init_mark();
> 
> Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them.
> 
> Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe".

Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values.

> src/hotspot/share/oops/oop.hpp line 356:
> 
>> 354:       return mark_offset_in_bytes() + sizeof(markWord) / 2;
>> 355:     } else
>> 356: #endif
> 
> Maybe instead of trying to calculate some random, meaningless value just use some "random" value directly?
> I am fine with the existing code, but first stating directly that "any value" works here, this additional code seems to confuse the message. (Fwiw, the method is also used during Universe initialization).

Just to be clear, the second part of the quoted sentence is important:
> could be any value *that is not a valid field offset*

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750428581
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750432186

From tschatzl at openjdk.org  Mon Sep  9 15:04:12 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 9 Sep 2024 15:04:12 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <rhloCBhjvBZ4Cs8LLpD_UyHIBR08b8NIjXwIEQlirrs=.8a06186e-a315-4977-961c-b172e077bd9a@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <rhloCBhjvBZ4Cs8LLpD_UyHIBR08b8NIjXwIEQlirrs=.8a06186e-a315-4977-961c-b172e077bd9a@github.com>
Message-ID: <W-1ANH5uIdfAoNRB5Mp1wbdS_aswZz8JhdHNNzz6pMw=.b3e0db43-1df0-4b0f-a6fe-ca77aa23d8f1@github.com>

On Mon, 9 Sep 2024 15:00:09 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>     could be any value that is not a valid field offset

I understand that that "random value" needs to satisfy this condition.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750433800

From stefank at openjdk.org  Mon Sep  9 15:34:10 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 9 Sep 2024 15:34:10 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v5]
In-Reply-To: <tgYxhCcpQNFDFZ3hfWxp7RYksBd-E2ODD4FctuvjVVY=.5ab13f08-736e-47de-9340-78abbf1a2541@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TzmQRM_6QAc9tdMzNYnzSPR_nENrokTzNx2q-TPuWpw=.cfcbe6f4-4397-4c31-be73-d5a456618090@github.com>
 <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com>
 <BarC_59LJpof-lBzklJ8Dm8liOxnh6jr9cbCn6-NCTI=.64c466f7-c527-44e0-9f5f-2c6928e6a436@github.com>
 <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com>
 <tgYxhCcpQNFDFZ3hfWxp7RYksBd-E2ODD4FctuvjVVY=.5ab13f08-736e-47de-9340-78abbf1a2541@github.com>
Message-ID: <h4Kzrm_0dTMlvlrqQwTW6ZoSifbhEZjrTB0Cyd4Kf7M=.8c70440e-e19a-45c0-ac97-d9fd22f774ed@github.com>

On Mon, 9 Sep 2024 12:59:36 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> That %zu is SIZE_FORMAT, right?

Yes. Reviewers have lately encouraged people to use %zu instead of SIZE_FORMAT.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750482486

From stefank at openjdk.org  Mon Sep  9 15:34:09 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 9 Sep 2024 15:34:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v3]
In-Reply-To: <GA5I5QeGCRY4m_9sFnco0CkvJ18upVlhDSA-VyiNhUw=.6cbd78f0-1064-46dc-8791-18c8c63c5e8c@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <bKtmNAMLqRlWTG9MBcosxeh2rYNK_yx_gkh_qbNKt6s=.e23c8a75-0072-4c9f-8d8e-7db14826fded@github.com>
 <ySywujYrcJzmYZtj5oflhcQZbzxPv78dm29gJ5B1fqE=.31b98908-bc0f-48af-b67a-75e7351d1b74@github.com>
 <CjcHSbAikUEXe-wpmu-p2od4z6IN-QC54DCKL5hSaZY=.85532a31-e607-447b-8cea-80e97024f994@github.com>
 <VGJOzveV1b0KxnQ_lSugtMP7boOK5TPmiYT7QXgcCoM=.780e110d-491c-4bbc-9ee7-a7db237ee9dc@github.com>
 <LwLTyRUwTUB8Kph7ngHJExHVwAdJSvimoXRsh6A2HcM=.1c03eb16-4e7a-42ac-bbe1-425f4f7fed75@github.com>
 <GA5I5QeGCRY4m_9sFnco0CkvJ18upVlhDSA-VyiNhUw=.6cbd78f0-1064-46dc-8791-18c8c63c5e8c@github.com>
Message-ID: <c4JGl_coRGcWROB9KAlYGOt5zBCqqoFH_oYGKtq9z0M=.3cdaecfb-eef0-4d40-9b51-7df223118d10@github.com>

On Mon, 9 Sep 2024 12:49:05 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>>> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. 
>> 
>> True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from.
>
> ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this).

(Just to clarify if others are reading this)

Right, what I referred to above was how we found the object to forward, which is done via the bitmaps:

          while (cur_addr < region_end) {
            cur_addr = mark_bitmap()->find_obj_beg(cur_addr, region_end);

If the Parallel Old collector didn't do that, but instead parsed the heap like Serial does, then the Parallel Young collector would also have to fix the from-space copies of moved objects when it hits a promotion failure, just like Serial does. This was just meant to point out the differences between the two collectors and why the young GC code is different.

I realize that in earlier comments I called the from-space copies of the objects "dead objects", but they are not dead; they are just the stale objects that remain discoverable because a promotion failure keeps the eden and from spaces.
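
To make the difference between the two ways of finding objects concrete, here is a toy sketch. It is not HotSpot's data structures or API: the word-indexed heap, the `sizeAtStart` array, and both method names are assumptions made for illustration. The bitmap walk only ever touches objects whose start bit is set, while the parsing walk has to read a size from every object it steps over, so every header along the way must be in a usable state.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: a toy heap of word-indexed objects, not HotSpot's data structures.
final class HeapWalkSketch {
    /** Visit only objects whose start word is set in the mark bitmap. */
    static List<Integer> walkByBitmap(BitSet markBitmap, int regionEnd) {
        List<Integer> starts = new ArrayList<>();
        int cur = markBitmap.nextSetBit(0);
        while (cur >= 0 && cur < regionEnd) {
            starts.add(cur);
            cur = markBitmap.nextSetBit(cur + 1);
        }
        return starts;
    }

    /** Visit every object by stepping from one object start to the next using its size. */
    static List<Integer> walkByParsing(int[] sizeAtStart, int regionEnd) {
        List<Integer> starts = new ArrayList<>();
        int cur = 0;
        while (cur < regionEnd) {
            starts.add(cur);
            cur += sizeAtStart[cur]; // needs a readable size for every object, marked or not
        }
        return starts;
    }

    public static void main(String[] args) {
        // Three objects starting at words 0, 2, 5 with sizes 2, 3, 3; only 0 and 5 are marked.
        int[] sizeAtStart = {2, 0, 3, 0, 0, 3, 0, 0};
        BitSet marks = new BitSet(8);
        marks.set(0);
        marks.set(5);
        System.out.println(walkByBitmap(marks, 8));        // [0, 5]
        System.out.println(walkByParsing(sizeAtStart, 8)); // [0, 2, 5]
    }
}
```

In this toy model the unmarked object at word 2 is invisible to the bitmap walk, which is one way to read why stale from-space copies would only need fixing for a collector that parses the heap.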

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750480983

From cjplummer at openjdk.org  Mon Sep  9 16:56:08 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 9 Sep 2024 16:56:08 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
Message-ID: <1VACYSoQRtP9m4BJkCVrdFxueC75Kg4Kp3wjGsAA2Dw=.53563f62-70cf-4d93-8d99-69b737812ba6@github.com>

On Mon, 26 Aug 2024 21:30:51 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 85:
> 
>> 83: 
>> 84:   private static Klass getKlass(Mark mark) {
>> 85:     assert(VM.getVM().isCompactObjectHeadersEnabled());
> 
> `mark.getKlass()` already does this assert. I don't see any value in this `getKlass()` method. The caller should just call `getMark().getKlass()` rather than `getKlass(getMark())`.

I'm not sure why this got marked as resolved.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750600652

From cjplummer at openjdk.org  Mon Sep  9 16:56:08 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 9 Sep 2024 16:56:08 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <S-95OBPOZb6GfBEUIDtHxCB4hTFPB2uNf5YmwMerGcE=.83afc323-883e-4ae0-8d10-fe7030cddb33@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
 <S-95OBPOZb6GfBEUIDtHxCB4hTFPB2uNf5YmwMerGcE=.83afc323-883e-4ae0-8d10-fe7030cddb33@github.com>
Message-ID: <ZnddOJnCLP0HABu3ymeWaAvcJNd3mKz8AhFwOtUAYW8=.f534bbd8-c222-4cd7-b6a9-3d6fd644c5e1@github.com>

On Mon, 9 Sep 2024 14:32:49 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169:
>> 
>>> 167:         } else {
>>> 168:           visitor.doMetadata(klass, true);
>>> 169:         }
>> 
>> Why is there no `visitor.doMetadata()` call for the compact object header case?
>
> There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt).

I've been looking into this. It's a bit hard to follow, but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output? Specifically, I need to know if the log includes the following:


hsdb> + inspect 0x00000007cff154b8
instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24)
_mark: 1
_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject
firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80
lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80
this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750598648

From rkennke at openjdk.org  Mon Sep  9 17:45:47 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 17:45:47 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
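
As a quick check of the "40 bits, up to 8TB" figure in the full GC forwarding item above, here is a small arithmetic sketch. It assumes the forwarding offset is counted in 8-byte heap words; that granularity is an assumption made here for illustration and is not stated in the description. The class and constant names are hypothetical.

```java
// Hypothetical sketch: assumes forwarding offsets are counted in 8-byte heap words.
final class ForwardingRangeSketch {
    static final int FORWARDING_BITS = 40; // from the PR description
    static final int HEAP_WORD_BYTES = 8;  // assumption: 64-bit heap words

    /** Largest heap size, in bytes, addressable by a word-granular offset of the given width. */
    static long maxHeapBytes(int bits, int wordBytes) {
        return (1L << bits) * wordBytes;
    }

    public static void main(String[] args) {
        long bytes = maxHeapBytes(FORWARDING_BITS, HEAP_WORD_BYTES);
        System.out.println(bytes >> 40); // size in TiB: 8
    }
}
```

Under that assumption, 2^40 words of 8 bytes each give 2^43 bytes, which matches the 8TB limit quoted above.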

Roman Kennke has updated the pull request incrementally with six additional commits since the last revision:

 - Print as warning when UCOH doesn't match in CDS archive
 - Improve initialization of mark-word in CDS ArchiveHeapWriter
 - Simplify getKlass() in SA
 - Simplify oopDesc::init_mark()
 - Get rid of forward_safe_* methods
 - GCForwarding touch-ups

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/70f492d3..2884499a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07-08

  Stats: 132 lines in 17 files changed: 26 ins; 73 del; 33 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From cjplummer at openjdk.org  Mon Sep  9 18:37:09 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 9 Sep 2024 18:37:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <ZnddOJnCLP0HABu3ymeWaAvcJNd3mKz8AhFwOtUAYW8=.f534bbd8-c222-4cd7-b6a9-3d6fd644c5e1@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
 <S-95OBPOZb6GfBEUIDtHxCB4hTFPB2uNf5YmwMerGcE=.83afc323-883e-4ae0-8d10-fe7030cddb33@github.com>
 <ZnddOJnCLP0HABu3ymeWaAvcJNd3mKz8AhFwOtUAYW8=.f534bbd8-c222-4cd7-b6a9-3d6fd644c5e1@github.com>
Message-ID: <SKA53xBzGbvMWiQxBtEYG9QB2KqZOKEBMzJzbQb_Zr8=.0cd24aa4-3aed-46e6-9092-bbc4b926bb42@github.com>

On Mon, 9 Sep 2024 16:51:35 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt).
>
> I've been looking into this. It's a bit hard to follow, but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output? Specifically, I need to know if the log includes the following:
> 
> 
> hsdb> + inspect 0x00000007cff154b8
> instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24)
> _mark: 1
> _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject
> firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80
> lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80
> this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498

I pulled your changes and I see one slight difference in the output. The following line is missing:

`_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject`

I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output:

_mark: 16294762323640321

So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that, when getValue() is called on it, returns the Klass mirror. Maybe we need a new CompactKlassField type like we currently have a NarrowKlassField field type, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this.
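
As a rough illustration of the decoding step such a field type would need, here is a minimal sketch. It is not the SA implementation: the class and method names are hypothetical, and the 42-bit shift is an assumption derived from the PR description's statement that the compressed Klass* occupies the upper 22 bits of the mark-word.

```java
// Hypothetical sketch: the 42-bit shift assumes the narrow Klass id occupies the upper
// 22 bits of a 64-bit mark word, as the PR description states. Not the SA implementation.
final class CompactMarkSketch {
    static final int KLASS_BITS = 22;
    static final int KLASS_SHIFT = Long.SIZE - KLASS_BITS; // 42

    /** Narrow Klass id stored in the upper bits of the mark word. */
    static long narrowKlass(long mark) {
        return mark >>> KLASS_SHIFT;
    }

    /** Remaining low bits of the mark word, with the Klass bits masked off. */
    static long lowBits(long mark) {
        return mark & ((1L << KLASS_SHIFT) - 1);
    }

    public static void main(String[] args) {
        long mark = 16294762323640321L; // the _mark value from the clhsdb output above
        System.out.println(narrowKlass(mark)); // 3705
        System.out.println(lowBits(mark));     // 1
    }
}
```

Under that assumption the value printed above splits into a narrow Klass id of 3705 and low bits of 1. Turning the id into an actual Klass would still need the encoding base and shift, which this sketch does not model.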

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750743693

From cjplummer at openjdk.org  Mon Sep  9 19:07:10 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 9 Sep 2024 19:07:10 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <SKA53xBzGbvMWiQxBtEYG9QB2KqZOKEBMzJzbQb_Zr8=.0cd24aa4-3aed-46e6-9092-bbc4b926bb42@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
 <S-95OBPOZb6GfBEUIDtHxCB4hTFPB2uNf5YmwMerGcE=.83afc323-883e-4ae0-8d10-fe7030cddb33@github.com>
 <ZnddOJnCLP0HABu3ymeWaAvcJNd3mKz8AhFwOtUAYW8=.f534bbd8-c222-4cd7-b6a9-3d6fd644c5e1@github.com>
 <SKA53xBzGbvMWiQxBtEYG9QB2KqZOKEBMzJzbQb_Zr8=.0cd24aa4-3aed-46e6-9092-bbc4b926bb42@github.com>
Message-ID: <xyQ56u8rEyLnbVv9iP23zziwfEkc0v9bm1MGVQ8BUHY=.352e102f-41d3-4ef6-86d2-2ebc998671d1@github.com>

On Mon, 9 Sep 2024 18:34:10 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> I've been looking into this. It's a bit hard to follow, but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output? Specifically, I need to know if the log includes the following:
>> 
>> 
>> hsdb> + inspect 0x00000007cff154b8
>> instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24)
>> _mark: 1
>> _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject
>> firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80
>> lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80
>> this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498
>
> I pulled your changes and I see one slight difference in the output. The following line is missing:
> 
> `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject`
> 
> I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output:
> 
> _mark: 16294762323640321
> 
> So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that, when getValue() is called on it, returns the Klass mirror. Maybe we need a new CompactKlassField type like we currently have a NarrowKlassField field type, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this.

Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than a CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark as two separate fields: one that remains a CInt as it currently is, and another that treats it as an encoded Klass* like the NarrowKlassField case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750788243

From coleenp at openjdk.org  Mon Sep  9 19:55:16 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 9 Sep 2024 19:55:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
Message-ID: <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>

On Mon, 9 Sep 2024 17:45:47 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. To make this possible, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision:
> 
>  - Print as warning when UCOH doesn't match in CDS archive
>  - Improve initialization of mark-word in CDS ArchiveHeapWriter
>  - Simplify getKlass() in SA
>  - Simplify oopDesc::init_mark()
>  - Get rid of forward_safe_* methods
>  - GCForwarding touch-ups

I reviewed the oops code so far.

src/hotspot/share/oops/compressedKlass.cpp line 116:

> 114:   _range = end - _base;
> 115: 
> 116:   DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);)

Can you refactor so the aarch64 path runs this same code without duplication?

src/hotspot/share/oops/klass.hpp line 173:

> 171: 
> 172:   markWord _prototype_header;   // Used to initialize objects' header
> 173: 

I think you should move this up after ClassLoaderData, as there might be an alignment gap (you can run pahole to check).
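
To illustrate the kind of gap I mean, here is a minimal sketch. The structs and field names are hypothetical stand-ins and not the real Klass layout, and the sizes assume an LP64 platform where 8-byte fields are 8-byte aligned:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: a 4-byte field ahead of an 8-byte field leaves a hole.
struct GapLayout {
  void*         class_loader_data;  // 8 bytes at offset 0
  std::int32_t  some_int;           // 4 bytes at offset 8, then 4 bytes of padding
  std::uint64_t prototype_header;   // 8 bytes, pushed to offset 16 by alignment
  std::int32_t  other_int;          // 4 bytes at offset 24, then 4 bytes of tail padding
};

// Same fields, with the 8-byte field moved up next to the pointer.
struct PackedLayout {
  void*         class_loader_data;  // offset 0
  std::uint64_t prototype_header;   // offset 8, no padding needed
  std::int32_t  some_int;           // offset 16
  std::int32_t  other_int;          // offset 20, fills what was padding before
};

// Assumption: LP64 with 8-byte alignment for std::uint64_t.
static_assert(offsetof(GapLayout, prototype_header) == 16, "4-byte hole before the header");
static_assert(sizeof(GapLayout) == 32, "two 4-byte holes in total");
static_assert(sizeof(PackedLayout) == 24, "reordering removes both holes");
```

pahole reports such holes directly from the debug info, so it can confirm whether the real field placement has this problem.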

src/hotspot/share/oops/klass.hpp line 718:

> 716: 
> 717:   markWord prototype_header() const {
> 718:     assert(UseCompactObjectHeaders, "only use with compact object headers");

Should this unconditionally return _prototype_header since it's initialized to markWord::prototype_header(), or would that decrease performance for the non-compact headers case?

src/hotspot/share/oops/klass.inline.hpp line 54:

> 52: }
> 53: 
> 54: inline void Klass::set_prototype_header(markWord header) {

Can you add a comment that this is only used when dumping the archive?  Otherwise the Klass::_prototype_header field should always be initialized to the right thing (either with the Klass encoded or as markWord::prototype_header()) and doesn't change.

src/hotspot/share/oops/markWord.inline.hpp line 90:

> 88:   ShouldNotReachHere();
> 89:   return markWord();
> 90: #endif

Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits?

src/hotspot/share/oops/oop.inline.hpp line 90:

> 88:   } else {
> 89:     return markWord::prototype();
> 90:   }

Could this be unconditional since prototype_header is initialized for all Klasses?

src/hotspot/share/oops/typeArrayKlass.cpp line 175:

> 173: size_t TypeArrayKlass::oop_size(oop obj) const {
> 174:   // In this assert, we cannot safely access the Klass* with compact headers.
> 175:   assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array");

Why not?  I think I'm missing something.  Klass should be in the markWord and that should be ok (?)

-------------

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2290316150
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750529270
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750727211
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750730078
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750736547
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750739441
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750842383
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750721069

From coleenp at openjdk.org  Mon Sep  9 19:55:19 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 9 Sep 2024 19:55:19 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
Message-ID: <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>

On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. To make this possible, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Try to avoid lea in loadNklass (aarch64)
>  - Fix release build error

src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147:

> 145: #endif
> 146: 
> 147:   return true;

This should only be in the compressedKlass.cpp file.

src/hotspot/share/oops/compressedKlass.cpp line 214:

> 212:     ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)",
> 213:               len, max_encoding_range_size());
> 214:     vm_exit_during_initialization(ss.base());

Why does this exit and not turn off compressed klass pointers and compact object headers?

src/hotspot/share/oops/compressedKlass.cpp line 222:

> 220:     return;
> 221:   }
> 222: #endif

Why not add null pd_initialize to zero to remove this conditional code?

src/hotspot/share/oops/compressedKlass.cpp line 224:

> 222: #endif
> 223: 
> 224:   if (tiny_classpointer_mode()) {

I kind of agree with Thomas Schatzl for this.  Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny.

src/hotspot/share/oops/compressedKlass.cpp line 234:

> 232:     _range = len;
> 233: 
> 234:     constexpr int log_cacheline = 6;

Is 6 the log of DEFAULT_CACHE_LINE_SIZE?
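
If so, the constant could be derived rather than hard-coded. A minimal sketch of what I have in mind, assuming a 64-byte cache line (the real DEFAULT_CACHE_LINE_SIZE is platform-dependent, and the names below are made up for illustration):

```cpp
#include <cstddef>

// Assumed value for illustration only; not taken from any HotSpot header.
constexpr std::size_t assumed_cache_line_size = 64;

// True if value has exactly one bit set.
constexpr bool is_power_of_two(std::size_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

// Floor of log2(value) for value >= 1, written recursively so it stays a
// valid constexpr function under C++11 rules.
constexpr int log2_floor(std::size_t value) {
  return value <= 1 ? 0 : 1 + log2_floor(value >> 1);
}

static_assert(is_power_of_two(assumed_cache_line_size), "cache line size must be a power of two");
static_assert(log2_floor(assumed_cache_line_size) == 6, "64-byte lines give a shift of 6");
```

With a 128-byte line the same expression would yield 7, which is why tying the shift to the cache-line constant seems safer than a literal 6.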

src/hotspot/share/oops/compressedKlass.cpp line 243:

> 241:   } else {
> 242: 
> 243:     // In legacy mode, we try, in order of preference:

Can you not use the word 'legacy' here?  Maybe in "non-compact object header mode"...

src/hotspot/share/oops/compressedKlass.inline.hpp line 100:

> 98:   check_valid_klass(k, base(), shift());
> 99:   // Also assert that k falls into what we know is the valid Klass range. This is usually smaller
> 100:   // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a

1G is the default CompressedClassSpaceSize but can be larger, right?  So the comment isn't quite accurate.  Or with tiny class pointers can it only be 1G?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750527537
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750511912
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750513660
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750515923
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750520712
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750524690
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750662637

From coleenp at openjdk.org  Mon Sep  9 19:55:20 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 9 Sep 2024 19:55:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
Message-ID: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com>

On Mon, 9 Sep 2024 10:02:53 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/hotspot/share/oops/compressedKlass.hpp line 43:
> 
>> 41: 
>> 42:   // Tiny-class-pointer mode
>> 43:   static int _tiny_cp; // -1, 0=true, 1=false
> 
> Suggestion:
> 
>   static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false
> 
> In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style.

I agree with this.  'cp' reads as ConstantPool for me even though this is a different context.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750531167

From stefank at openjdk.org  Mon Sep  9 20:07:13 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 9 Sep 2024 20:07:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
Message-ID: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com>

On Mon, 9 Sep 2024 18:15:38 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision:
>> 
>>  - Print as warning when UCOH doesn't match in CDS archive
>>  - Improve initialization of mark-word in CDS ArchiveHeapWriter
>>  - Simplify getKlass() in SA
>>  - Simplify oopDesc::init_mark()
>>  - Get rid of forward_safe_* methods
>>  - GCForwarding touch-ups
>
> src/hotspot/share/oops/typeArrayKlass.cpp line 175:
> 
>> 173: size_t TypeArrayKlass::oop_size(oop obj) const {
>> 174:   // In this assert, we cannot safely access the Klass* with compact headers.
>> 175:   assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array");
> 
> Why not?  I think I'm missing something.  Klass should be in the markWord and that should be ok (?)

I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens, another thread can win the race, copy the object, and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes.
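
The interleaving can be sketched in a single-threaded model. Everything below is a simplified, hypothetical header encoding, not HotSpot's markWord: only the "narrow Klass* in the upper 22 bits" detail comes from the PR description, and the helper names are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int           klass_shift     = 64 - 22;  // narrow Klass* in the upper 22 bits
constexpr std::uint64_t low_bits_mask   = 0x3;
constexpr std::uint64_t unlocked_value  = 0x1;
constexpr std::uint64_t forwarded_value = 0x3;      // header holds a forwarding pointer

struct FakeObject {
  std::atomic<std::uint64_t> header{0};
};

inline std::uint64_t make_header(std::uint32_t narrow_klass) {
  return (static_cast<std::uint64_t>(narrow_klass) << klass_shift) | unlocked_value;
}

inline bool is_forwarded(std::uint64_t header) {
  return (header & low_bits_mask) == forwarded_value;
}

inline std::uint32_t narrow_klass_of(std::uint64_t header) {
  return static_cast<std::uint32_t>(header >> klass_shift);
}

// What the competing GC thread does once its copy is complete: CAS the
// forwarding pointer over the whole header, klass bits included.
inline bool install_forwardee(FakeObject& from, FakeObject& to) {
  std::uint64_t expected = from.header.load();
  if (is_forwarded(expected)) return false;
  std::uint64_t forwarding =
      static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(&to)) | forwarded_value;
  return from.header.compare_exchange_strong(expected, forwarding);
}

// Reads the header once and follows the forwarding pointer if needed.
inline std::uint32_t narrow_klass_forward_safe(const FakeObject& obj) {
  std::uint64_t header = obj.header.load();
  if (is_forwarded(header)) {
    auto* to = reinterpret_cast<const FakeObject*>(
        static_cast<std::uintptr_t>(header & ~low_bits_mask));
    header = to->header.load();
  }
  return narrow_klass_of(header);
}

void demo_forwarding_race() {
  FakeObject from, to;
  from.header.store(make_header(0x1234));
  to.header.store(make_header(0x1234));       // the copy keeps the original header

  std::uint64_t sampled = from.header.load();  // our thread samples the header ...
  bool installed = install_forwardee(from, to);  // ... then the other thread wins
  assert(installed);
  (void)installed;

  assert(narrow_klass_of(sampled) == 0x1234);   // the sampled value still decodes
  assert(is_forwarded(from.header.load()));     // a fresh load now sees a pointer
  assert(narrow_klass_forward_safe(from) == 0x1234);
}
```

A second, unguarded decode of `from.header` after the install would interpret pointer bits as a narrow Klass*, which is the failure mode hit by the assert.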

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750862842

From coleenp at openjdk.org  Mon Sep  9 20:23:11 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 9 Sep 2024 20:23:11 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com>
Message-ID: <s69LCOhxAaI5gsdJKdqHoyQfdqNlcdftQW9xiyiqG1w=.b754a259-4f44-4844-a926-9b05c7199652@github.com>

On Mon, 9 Sep 2024 20:04:22 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> src/hotspot/share/oops/typeArrayKlass.cpp line 175:
>> 
>>> 173: size_t TypeArrayKlass::oop_size(oop obj) const {
>>> 174:   // In this assert, we cannot safely access the Klass* with compact headers.
>>> 175:   assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array");
>> 
>> Why not?  I think I'm missing something.  Klass should be in the markWord and that should be ok (?)
>
> I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens, another thread can win the race, copy the object, and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes.

I did miss something.  I thought the markWord was never overwritten by the forwarding pointer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750882259

From rkennke at openjdk.org  Tue Sep 10 07:23:13 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 07:23:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
Message-ID: <Ed16htADrBzE1qG3KNbjqiOocZTb_s_n4LG5jfENVEs=.b0ffbe54-30a7-496d-ae7c-9204f97e1eb5@github.com>

On Mon, 9 Sep 2024 10:16:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/hotspot/share/gc/shared/gcForwarding.hpp line 41:
> 
>> 39:  * bits (to indicate 'forwarded' state as usual).
>> 40:  */
>> 41: class GCForwarding : public AllStatic {
> 
> Since this class is only used for Full GCs, it may be useful to include that information, i.e. something like `FullGCForwarding` to avoid confusion why it is not used for other GCs too.
> (Unless this has been discussed and even rejected by me before).

I agree. In fact, that was my original name. It was suggested that I change it to SlidingForwarding when that was the approach we were going to take, but with the new implementation, FullGCForwarding makes the most sense. I'll change it.
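
For context on why the encoding is specific to full GC, the arithmetic behind the "40 bits, up to 8TB" figure from the PR description can be sketched as follows. This is an illustration under the assumption of 8-byte object alignment (a shift of 3, as with compressed oops); the names are hypothetical and not those used in the class:

```cpp
#include <cstdint>

constexpr int forwarding_bits         = 40;  // from the PR description
constexpr int assumed_alignment_shift = 3;   // assumption: objects are 8-byte aligned

constexpr std::uint64_t TiB = std::uint64_t{1} << 40;

// 2^40 encodable slots, each standing for 8 bytes of heap.
constexpr std::uint64_t max_encodable_bytes =
    (std::uint64_t{1} << forwarding_bits) << assumed_alignment_shift;

static_assert(max_encodable_bytes == 8 * TiB, "40 bits with a shift of 3 cover 8 TiB");

// Encode a heap address as an aligned offset from the heap base, and back.
constexpr std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t address) {
  return (address - heap_base) >> assumed_alignment_shift;
}

constexpr std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t encoded) {
  return heap_base + (encoded << assumed_alignment_shift);
}

static_assert(decode_forwardee(0x1000, encode_forwardee(0x1000, 0x1000 + 4096)) == 0x1000 + 4096,
              "aligned addresses round-trip");
```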

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751400378

From rkennke at openjdk.org  Tue Sep 10 07:56:09 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 07:56:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
Message-ID: <Ok_qSVYsLHQokqwR-CoQcmDE2Ai8uZldwMB5tZT8r6Y=.7c8d6e77-45ed-4896-9629-fbb0a6074655@github.com>

On Mon, 9 Sep 2024 10:21:54 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/hotspot/share/gc/shared/collectedHeap.cpp line 232:
> 
>> 230:   }
>> 231: 
>> 232:   // With compact headers, we can't safely access the class, due
> 
> Suggestion:
> 
>   // With compact headers, we can't safely access the klass, due
> 
> 
> This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable?
> Given this is used for verification only afaik, we should make an effort to provide that check.

With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC-forwarding temporarily overwrites the mark-word, and thus the Klass*, with the forwarding pointer, and here we have no way to make a distinction between Full-GC and regular GC forwarding.

I improved the code to make the check when the object is not forwarded. Not sure if we could/should do more (e.g. pass around is_full argument to make the distinction, or find the - possibly few - places where we might call is_oop() on from-space objects in regular GC and do the check in a forwardee-safe way?).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751448814

From rkennke at openjdk.org  Tue Sep 10 08:36:13 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 08:36:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <rhloCBhjvBZ4Cs8LLpD_UyHIBR08b8NIjXwIEQlirrs=.8a06186e-a315-4977-961c-b172e077bd9a@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <rhloCBhjvBZ4Cs8LLpD_UyHIBR08b8NIjXwIEQlirrs=.8a06186e-a315-4977-961c-b172e077bd9a@github.com>
Message-ID: <U4wnNpLlescFdpGPsZwetL2Ymz6ygUyw7lNnEkwiw68=.5118cf4f-c46d-4ac1-8b5b-48f86e590f1c@github.com>

On Mon, 9 Sep 2024 14:58:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> src/hotspot/share/oops/oop.hpp line 134:
>> 
>>> 132:   inline Klass*   forward_safe_klass(markWord m) const;
>>> 133:   inline size_t   forward_safe_size();
>>> 134:   inline void     forward_safe_init_mark();
>> 
>> Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them.
>> 
>> Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe".
>
> Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values.

I've removed those methods.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751514466

From rkennke at openjdk.org  Tue Sep 10 08:40:12 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 08:40:12 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <W-1ANH5uIdfAoNRB5Mp1wbdS_aswZz8JhdHNNzz6pMw=.b3e0db43-1df0-4b0f-a6fe-ca77aa23d8f1@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <rhloCBhjvBZ4Cs8LLpD_UyHIBR08b8NIjXwIEQlirrs=.8a06186e-a315-4977-961c-b172e077bd9a@github.com>
 <W-1ANH5uIdfAoNRB5Mp1wbdS_aswZz8JhdHNNzz6pMw=.b3e0db43-1df0-4b0f-a6fe-ca77aa23d8f1@github.com>
Message-ID: <k5oJ0MxPfzSI0V4d8SGr8ApvuM9Xpuy25WiJsJ51Tl8=.62dd2a4c-497d-42bc-a513-0151786b19c5@github.com>

On Mon, 9 Sep 2024 15:01:10 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Just to be clear, the second part of the quoted sentence is important:
>>> could be any value *that is not a valid field offset*
>
>>     could be any value that is not a valid field offset
> 
> I understand that that "random value" needs to satisfy this condition.

With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it was rejected by somebody (I don't remember who) because it made the C2 diff a little messier, so I kept it as it is now. I would prefer to reinstate it, though.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751522091

From rkennke at openjdk.org  Tue Sep 10 08:44:11 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 08:44:11 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
Message-ID: <bDtRnL6u3kG7s2mNBQyDSrWboOmMhdJVzUwfqd8osig=.c47cbaa9-9036-43f9-9fc7-75781296961b@github.com>

On Mon, 9 Sep 2024 12:12:23 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Try to avoid lea in loadNklass (aarch64)
>>  - Fix release build error
>
> src/hotspot/share/oops/objArrayKlass.inline.hpp line 74:
> 
>> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) {
>> 73:   // In this assert, we cannot safely access the Klass* with compact headers.
>> 74:   assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array");
> 
> If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe?

Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl()). They could be left over from a time when we had to deal with OM and/or stack-locks in the header.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751527745

From mli at openjdk.org  Tue Sep 10 08:54:13 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 10 Sep 2024 08:54:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <utcfd2HiwL8a7ut2UJ7tHb1P_oz5TKXA9bwGQ5G_FnE=.63fa2666-dc3f-406d-b8d2-4f960f05f05d@github.com>
 <n33g0VZINE0Qly_UoJV2zOCDkapbuqFgRyzTYor6TrE=.7acb0ac5-5bbe-4747-bf3c-20110201616c@github.com>
 <wVUrQQtzaYdZNPBPyiuln0Zd8C67zG2y0yX2Iik6l_I=.0be5dcf7-c48d-4bb9-9966-23530f0b8d2e@github.com>
 <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com>
Message-ID: <J6x6kmRiK5huvPUhwooOR4_0blKnZqNvTaP-_kKhGlE=.e9e64c89-297b-42ee-ba51-1b79422d0387@github.com>

On Mon, 9 Sep 2024 14:08:53 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Yes, I'm interested in it. Thanks for raising the discussion. :)
>
> If anybody is doing it, please send me a patch, or we can do it as a follow-up PR.

Thanks. I'll send it to you if I finish it in time, otherwise I will do it in a separate pr.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751544394

From lujianping521 at gmail.com  Tue Sep 10 09:16:51 2024
From: lujianping521 at gmail.com (=?UTF-8?B?6bKB5bu65bmz?=)
Date: Tue, 10 Sep 2024 17:16:51 +0800
Subject: Split Lock Warning with ZGC and -XX:-ClassUnloading on Linux x86_64, 
 JDK 17.0.2
Message-ID: <CAOXKhpUP78qtHA=uhF9bFGdatwbmb+OSU8oOuim3ggnz1oScZw@mail.gmail.com>

HI ALL:

When running JDK 17.0.2 on a Linux x86_64 architecture with ZGC and the JVM
option -XX:-ClassUnloading, I encounter split lock warnings from the Linux
kernel. This issue appears consistently during garbage collection
operations.

Here is the specific warning message from the kernel:
    x86/split lock detection: #AC: ZWorker#0/2154775 took a split_lock trap at address: 0x7f50c6e0433c

Upon investigating the assembly at this address, I identified the following
instruction:
   0x00007f50c6e0433c <+76>: lock cmpxchg %rcx,(%rbx)

This is part of the function:
    Dump of assembler code for function _ZN15ZMarkOopClosure6do_oopEPP7oopDesc:
    0x00007f50c6e0433c <+76>: lock cmpxchg %rcx,(%rbx)

The split lock warning occurs during the execution of the ZWorker thread,
which is responsible for concurrent marking in ZGC. The warning seems to be
triggered specifically when class unloading is disabled with
-XX:-ClassUnloading.

Environment:
JDK Version: OpenJDK 17.0.2
GC: ZGC with -XX:-ClassUnloading
OS: Linux x86_64

I would like to understand whether this behavior is expected when class
unloading is disabled, and whether there are any recommended fixes or
workarounds for avoiding the split lock issue during concurrent garbage collection.
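
For reference, my understanding is that a split lock is reported when the memory operand of a locked instruction spans two cache lines. The check below is a minimal sketch of that condition, assuming 64-byte cache lines; the helper name and the sample addresses are made up for illustration and are not taken from the trace above:

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on typical x86_64 parts.
constexpr std::size_t assumed_line_size = 64;

// True if an access of `size` bytes starting at `address` touches two lines.
constexpr bool crosses_cache_line(std::uintptr_t address, std::size_t size) {
  return (address / assumed_line_size) != ((address + size - 1) / assumed_line_size);
}

// An 8-byte-aligned 8-byte operand always stays within one line ...
static_assert(!crosses_cache_line(0x1038, 8), "last aligned word of a line");
// ... while one that is only 4-byte aligned can straddle the boundary.
static_assert(crosses_cache_line(0x103c, 8), "bytes 0x103c..0x1043 span two lines");
```

Applied to the value of %rbx at the time of the trap, this would show whether the slot passed to `lock cmpxchg` was misaligned.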

From rkennke at openjdk.org  Tue Sep 10 09:31:09 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 09:31:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <k5oJ0MxPfzSI0V4d8SGr8ApvuM9Xpuy25WiJsJ51Tl8=.62dd2a4c-497d-42bc-a513-0151786b19c5@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <rhloCBhjvBZ4Cs8LLpD_UyHIBR08b8NIjXwIEQlirrs=.8a06186e-a315-4977-961c-b172e077bd9a@github.com>
 <W-1ANH5uIdfAoNRB5Mp1wbdS_aswZz8JhdHNNzz6pMw=.b3e0db43-1df0-4b0f-a6fe-ca77aa23d8f1@github.com>
 <k5oJ0MxPfzSI0V4d8SGr8ApvuM9Xpuy25WiJsJ51Tl8=.62dd2a4c-497d-42bc-a513-0151786b19c5@github.com>
Message-ID: <NM8izXWScsGxiKeCqUapYVOoPaUfzdHwJq1GvAH58cs=.28e5b32c-59e0-4b24-8244-9580296d2e84@github.com>

On Tue, 10 Sep 2024 08:37:43 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>>>     could be any value that is not a valid field offset
>> 
>> I understand that that "random value" needs to satisfy this condition.
>
> With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it was rejected by somebody (I don't remember who) because it made the C2 diff a little messier, so I kept it as it is now. I would prefer to reinstate it, though.

> (Fwiw, the method is also used during Universe initialization).

Yes, but only in the -UCOH branch.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751604467

From stefank at openjdk.org  Tue Sep 10 10:05:10 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Tue, 10 Sep 2024 10:05:10 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <bDtRnL6u3kG7s2mNBQyDSrWboOmMhdJVzUwfqd8osig=.c47cbaa9-9036-43f9-9fc7-75781296961b@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <bDtRnL6u3kG7s2mNBQyDSrWboOmMhdJVzUwfqd8osig=.c47cbaa9-9036-43f9-9fc7-75781296961b@github.com>
Message-ID: <WyTjHxMb_l-Hkfe-sTK0e_4UpI9cn0E7BBiCpKDVI34=.282c10e9-a1cd-4d46-9d1f-133030ac9c58@github.com>

On Tue, 10 Sep 2024 08:41:16 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/share/oops/objArrayKlass.inline.hpp line 74:
>> 
>>> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) {
>>> 73:   // In this assert, we cannot safely access the Klass* with compact headers.
>>> 74:   assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array");
>> 
>> If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe?
>
> Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl()). Could be a left-over from a time when we had to deal with OM and/or stack-locks in the header.

FWIW, I've been running tests with this assert restored (and the one in TypeArrayKlass) without hitting any problems.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751656595

From rkennke at openjdk.org  Tue Sep 10 11:29:09 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 11:29:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <Ok_qSVYsLHQokqwR-CoQcmDE2Ai8uZldwMB5tZT8r6Y=.7c8d6e77-45ed-4896-9629-fbb0a6074655@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
 <Ok_qSVYsLHQokqwR-CoQcmDE2Ai8uZldwMB5tZT8r6Y=.7c8d6e77-45ed-4896-9629-fbb0a6074655@github.com>
Message-ID: <SypFb1aukQ3WLqnYAGKUJj1LaS8ZEWucIFA8FzKKWXA=.92647e41-af44-4b88-933f-0f1cc6eff820@github.com>

On Tue, 10 Sep 2024 07:53:23 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/share/gc/shared/collectedHeap.cpp line 232:
>> 
>>> 230:   }
>>> 231: 
>>> 232:   // With compact headers, we can't safely access the class, due
>> 
>> Suggestion:
>> 
>>   // With compact headers, we can't safely access the klass, due
>> 
>> 
>> Why is this the case? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable?
>> Given this is used for verification only afaik, we should make an effort to provide that check.
>
> With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC-forwarding temporarily overwrites the mark-word, and thus the Klass*, with the forwarding pointer, and here we have no way to make a distinction between Full-GC and regular GC forwarding.
> 
> I improved the code to make the check when the object is not forwarded. Not sure if we could/should do more (e.g. pass around is_full argument to make the distinction, or find the - possibly few - places where we might call is_oop() on from-space objects in regular GC and do the check in a forwardee-safe way?).

Ah, I found it! It seems only the ShenandoahVerifier calls oop_iterate() on from_space objects, which can have a forwarding, which would mess with the object's Klass*. We're lucky because that iterator doesn't visit the Klass*. I see the following ways out:
- The caller must ensure that the oop is ok and Klass* is accessible. I could do that in the ShenandoahVerifier. It kinda defeats the point, though: we want the verifier to operate on the 'raw' object, not necessarily the forwardee.
- Next easy way out would be to use 'this' instead of obj->klass(). That should make sense, because it should always be the same. Using 'this' in the assert (this->is_array_klass()) is kinda bogus, though. And asserting (this == obj->klass()) would be nice, but would have the same problem as before, where we would need to exclude UCOH for the case where Shenandoah needs it. In fact, this is done already in oopDesc::oop_iterate_backwards(), but also excluding UCOH.
- We could add a hook in the iterator that gives the Klass* for a given oop, which can then be overridden by the actual iterator to do the right thing, e.g. load the Klass* from the forwardee.

WDYT?
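
To make the problem concrete, here is a toy model of a 64-bit header word. It is only a sketch under stated assumptions: the layout (narrow Klass* in the upper 22 bits, two low tag bits with `0b11` meaning "forwarded") and the names `ToyHeader`, `forward_to`, `klass_bits`, and `klass_bits_safe` are hypothetical and are not the HotSpot definitions. It shows why reading the Klass* bits from a forwarded from-space object yields garbage, and what a forwardee-safe read might look like.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: layout and names are assumptions, not HotSpot's markWord.
constexpr int      kKlassBits    = 22;               // upper 22 bits hold the narrow Klass*
constexpr int      kKlassShift   = 64 - kKlassBits;  // 42
constexpr uint64_t kTagMask      = 0x3;              // two low tag bits
constexpr uint64_t kUnlockedTag  = 0x1;              // assumed "normal" pattern
constexpr uint64_t kForwardedTag = 0x3;              // assumed "forwarded" pattern

struct ToyHeader {
  uint64_t mark;
};

// Build a fresh header carrying the given narrow Klass* value.
inline ToyHeader make_header(uint32_t narrow_klass) {
  return ToyHeader{(static_cast<uint64_t>(narrow_klass) << kKlassShift) | kUnlockedTag};
}

inline bool is_forwarded(const ToyHeader& h) {
  return (h.mark & kTagMask) == kForwardedTag;
}

// Regular (non-full-GC) forwarding in this model: the whole word is replaced
// by the forwardee address plus the tag, so the Klass* bits are lost.
inline void forward_to(ToyHeader& h, uint64_t forwardee_addr) {
  h.mark = forwardee_addr | kForwardedTag;
}

// Naive read: only meaningful while the object is not forwarded.
inline uint32_t klass_bits(const ToyHeader& h) {
  return static_cast<uint32_t>(h.mark >> kKlassShift);
}

// Forwardee-safe read: take the Klass* bits from the forwardee's header when
// the object is forwarded. The forwardee is passed in explicitly because the
// toy model has no heap to dereference.
inline uint32_t klass_bits_safe(const ToyHeader& h, const ToyHeader& forwardee) {
  return is_forwarded(h) ? klass_bits(forwardee) : klass_bits(h);
}
```

In this model the third option above corresponds to routing every Klass* lookup in the iterator through something like `klass_bits_safe`.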

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751770293

From tschatzl at openjdk.org  Tue Sep 10 12:02:11 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 10 Sep 2024 12:02:11 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v18]
In-Reply-To: <IeVZPQjaeWE6AThbR5sjTIlartNQ_nlI20cPp3DF_Dw=.7f799fc4-aab6-4cb3-9b4e-a1a2d0288362@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <IeVZPQjaeWE6AThbR5sjTIlartNQ_nlI20cPp3DF_Dw=.7f799fc4-aab6-4cb3-9b4e-a1a2d0288362@github.com>
Message-ID: <PuajSlua0EuyqwkCV8s7EuUZcqHNbyZ_K-XhtSzak3g=.42af938b-d7e0-4fdb-bcbc-a7985703c60b@github.com>

On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port for JEP 475

src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 210:

> 208:                                             Label& done,
> 209:                                             bool new_val_may_be_null) {
> 210:     // Does store cross heap regions?

Suggestion:

  // Does store cross heap regions?

Indentation

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1751721626

From stuefe at openjdk.org  Tue Sep 10 12:07:09 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 10 Sep 2024 12:07:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
Message-ID: <sUnwVBIPq6l_4sgw5hOnjU4cP_RL4AM7f9RCFIOK6lc=.cb3e7f4b-e862-48f8-a308-e762ea272294@github.com>

On Mon, 9 Sep 2024 15:49:57 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Try to avoid lea in loadNklass (aarch64)
>>  - Fix release build error
>
> src/hotspot/share/oops/compressedKlass.cpp line 214:
> 
>> 212:     ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)",
>> 213:               len, max_encoding_range_size());
>> 214:     vm_exit_during_initialization(ss.base());
> 
> Why does this exit and not turn off compressed klass pointers and compact object headers?

This is tricky. We are already deep in initialization and have made a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise.

Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too.
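
A minimal sketch of the difference, using standard C++ rather than HotSpot code: a debug-only assert is compiled out of release builds, whereas an explicit check runs in every build. The helper name `check_class_space_size` and its return convention (an empty string means the size is acceptable) are hypothetical; the message text mirrors the one in the patch, and a real VM would pass it to something like `vm_exit_during_initialization()` instead of returning it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns an empty string if the class space fits into
// the encoding range, otherwise a human-readable error message.
// Unlike assert(len <= max_len), this check is present in release builds too.
inline std::string check_class_space_size(std::size_t len, std::size_t max_len) {
  if (len > max_len) {
    return "Class space size (" + std::to_string(len) +
           ") exceeds the maximum possible size (" + std::to_string(max_len) + ")";
  }
  return std::string();
}
```

The caller decides what to do with a non-empty result; the point is only that the condition is evaluated regardless of build type.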

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751819814

From stuefe at openjdk.org  Tue Sep 10 12:16:11 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 10 Sep 2024 12:16:11 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
Message-ID: <Qmq5KYm4QolMsEt29X8lhx1oWSkBEApS-3iaIXd3L7o=.42bba1bc-d795-4712-a1b6-dab2781a60f7@github.com>

On Mon, 9 Sep 2024 15:50:50 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Try to avoid lea in loadNklass (aarch64)
>>  - Fix release build error
>
> src/hotspot/share/oops/compressedKlass.cpp line 222:
> 
>> 220:     return;
>> 221:   }
>> 222: #endif
> 
> Why not add null pd_initialize to zero to remove this conditional code?

I can do that. Added to backlog (https://wiki.openjdk.org/display/lilliput/JEP-450+Review+Todo)

> src/hotspot/share/oops/compressedKlass.cpp line 224:
> 
>> 222: #endif
>> 223: 
>> 224:   if (tiny_classpointer_mode()) {
> 
> I kind of agree with Thomas Schatzl for this.  Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny.

Yes, makes sense. Added to backlog.

This coding was developed somewhat independently from +COH at the beginning, but now the two parts (tinycp and the rest of COH) depend on each other anyway. I should just use UseCompactObjectHeaders or a flag directly derived from it.

> src/hotspot/share/oops/compressedKlass.cpp line 234:
> 
>> 232:     _range = len;
>> 233: 
>> 234:     constexpr int log_cacheline = 6;
> 
> Is 6 the log of DEFAULT_CACHE_LINE_SIZE?

64, yes
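
For reference, the relationship is just a power of two. The snippet below is a stand-alone sketch, not the code in compressedKlass.cpp; `cacheline_size` and `align_up_to_cacheline` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr int         log_cacheline  = 6;                                 // value from the patch
constexpr std::size_t cacheline_size = std::size_t{1} << log_cacheline;   // 2^6
static_assert(cacheline_size == 64, "log2(64) == 6");

// Hypothetical helper: round a value up to the next cache-line boundary.
constexpr std::size_t align_up_to_cacheline(std::size_t value) {
  return (value + cacheline_size - 1) & ~(cacheline_size - 1);
}
```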

> src/hotspot/share/oops/compressedKlass.cpp line 243:
> 
>> 241:   } else {
>> 242: 
>> 243:     // In legacy mode, we try, in order of preference:
> 
> Can you not use the word 'legacy' here?  Maybe in "non-compact object header mode"...

okay.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751828214
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831035
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831994
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751833034

From duke at openjdk.org  Tue Sep 10 12:17:07 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Tue, 10 Sep 2024 12:17:07 GMT
Subject: RFR: 8339661: ZGC: Move some page resets and verification to
 callsites
In-Reply-To: <ML3aOke915yqdmmoJTjsiIk48NBHywdeEsObLozaE8w=.e631d22d-b040-4eeb-aace-ba81046afd51@github.com>
References: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
 <ML3aOke915yqdmmoJTjsiIk48NBHywdeEsObLozaE8w=.e631d22d-b040-4eeb-aace-ba81046afd51@github.com>
Message-ID: <PLBbs7vNvfXWKI07Pz75m9bvgGEswON5VXEVJo6ZXW0=.277aa231-f0b3-456f-8153-d0903d8b2957@github.com>

On Fri, 6 Sep 2024 13:01:12 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This makes the checks somewhat hard to understand and their edge cases hard to follow.
>> 
>> Moving some of the reset logic that is currently part of `ZPage::reset()` to the callsites makes the reason behind some operations easier to understand when reading the code.
>> 
>> Main highlights:
>> - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`.
>> - `ZPage::clone_limited()` retains the value of the top-pointer.
>> - The kind of verification for remsets is now at callsites:
>>   - Allocations from the page cache, and only if the page got a remset
>>   - Old-to-old in-place relocations, where only the inactive remset is checked
>
> Looks good!

Thank you for the reviews! @stefank @fisk

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20890#issuecomment-2340530203

From duke at openjdk.org  Tue Sep 10 12:17:09 2024
From: duke at openjdk.org (duke)
Date: Tue, 10 Sep 2024 12:17:09 GMT
Subject: RFR: 8339661: ZGC: Move some page resets and verification to
 callsites
In-Reply-To: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
References: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
Message-ID: <WC96yRAcxKe7cZIpcr60DYUpTPpxhV4F0WDlflbhzkk=.3047ca95-da85-4e26-bc57-6e508308ea3d@github.com>

On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikström <duke at openjdk.org> wrote:

> Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This makes the checks somewhat hard to understand and their edge cases hard to follow.
> 
> Moving some of the reset logic that is currently part of `ZPage::reset()` to the callsites makes the reason behind some operations easier to understand when reading the code.
> 
> Main highlights:
> - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`.
> - `ZPage::clone_limited()` retains the value of the top-pointer.
> - The kind of verification for remsets is now at callsites:
>   - Allocations from the page cache, and only if the page got a remset
>   - Old-to-old in-place relocations, where only the inactive remset is checked

@jsikstro 
Your change (at version d3378b4f21086b4f2eb84d7bf7ecf2e9007acf8d) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20890#issuecomment-2340532527

From coleenp at openjdk.org  Tue Sep 10 12:22:09 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 10 Sep 2024 12:22:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <sUnwVBIPq6l_4sgw5hOnjU4cP_RL4AM7f9RCFIOK6lc=.cb3e7f4b-e862-48f8-a308-e762ea272294@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
 <sUnwVBIPq6l_4sgw5hOnjU4cP_RL4AM7f9RCFIOK6lc=.cb3e7f4b-e862-48f8-a308-e762ea272294@github.com>
Message-ID: <Kr9eGGeMs3LB-ULkPBBkaBBvURU3P7NJItuUR3Gmpmo=.d93bed80-2c06-414d-b8bd-b8f57a4deee3@github.com>

On Tue, 10 Sep 2024 12:03:59 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/oops/compressedKlass.cpp line 214:
>> 
>>> 212:     ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)",
>>> 213:               len, max_encoding_range_size());
>>> 214:     vm_exit_during_initialization(ss.base());
>> 
>> Why does this exit and not turn off compressed klass pointers and compact object headers?
>
> This is tricky. We are already deep in initialization and have made a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise.
> 
> Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too.

Ok, in this case, that's fine if we already asserted.  A fatal error is better.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751840556

From rkennke at openjdk.org  Tue Sep 10 12:42:48 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 12:42:48 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v10]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <RNbGUr1omfddrAOB_cPdxKaNbErq6Pj8cRidmhnmgZY=.f9ee76ff-4dcc-4739-809a-b8da10e4eadf@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan. Specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
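
The 40-bit / 8TB figures in the full-GC forwarding item are consistent with encoding the forwardee as a heap-relative offset in 8-byte words. The sketch below only works through that arithmetic; the 8-byte granularity is an assumption chosen to match the numbers above, and the function names `encode_forwarding` and `decode_forwarding` are hypothetical, not the names used in the PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr int      kForwardingBits = 40;  // bits available for the encoded forwardee (from the description)
constexpr int      kLogWordSize    = 3;   // assumption: offsets are counted in 8-byte words
constexpr uint64_t kMaxHeapBytes   = uint64_t{1} << (kForwardingBits + kLogWordSize);

// 2^40 words * 8 bytes/word = 2^43 bytes = 8 TiB.
static_assert(kMaxHeapBytes == 8ULL * 1024 * 1024 * 1024 * 1024, "40 bits of word offsets cover 8 TiB");

// Hypothetical encoding: word offset of the target address from the heap base.
constexpr uint64_t encode_forwarding(uint64_t heap_base, uint64_t addr) {
  return (addr - heap_base) >> kLogWordSize;
}

// Inverse of encode_forwarding for word-aligned addresses inside the heap.
constexpr uint64_t decode_forwarding(uint64_t heap_base, uint64_t encoded) {
  return heap_base + (encoded << kLogWordSize);
}
```

Any word-aligned address less than 8 TiB above the heap base round-trips through these two functions; beyond that range the offset no longer fits in 40 bits.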

Roman Kennke has updated the pull request incrementally with six additional commits since the last revision:

 - More touch-ups, fix Shenandoah oop iterator
 - Remove asserts in XArrayKlass::oop_oop_iterate()
 - Various touch-ups
 - Improve is_oop()
 - Rename GCForwarding -> FullGCForwarding; some touch-ups
 - Fix comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/2884499a..5da250cf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08-09

  Stats: 238 lines in 36 files changed: 74 ins; 65 del; 99 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From stuefe at openjdk.org  Tue Sep 10 12:42:49 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 10 Sep 2024 12:42:49 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v10]
In-Reply-To: <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
Message-ID: <EZmLU9kQ--5-_Xi99YZG9HsKibIwnot3VcSI50gP6SY=.058873e4-dc41-460f-a5e4-3887d9be9d48@github.com>

On Mon, 9 Sep 2024 15:59:43 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision:
>> 
>>  - More touch-ups, fix Shenandoah oop iterator
>>  - Remove asserts in XArrayKlass::oop_oop_iterate()
>>  - Various touch-ups
>>  - Improve is_oop()
>>  - Rename GCForwarding -> FullGCForwarding; some touch-ups
>>  - Fix comment
>
> src/hotspot/share/oops/compressedKlass.cpp line 116:
> 
>> 114:   _range = end - _base;
>> 115: 
>> 116:   DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);)
> 
> Can you refactor so the aarch64 path runs this same code without duplication?

In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) Aarch64 needs its own handling.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751866773

From stuefe at openjdk.org  Tue Sep 10 12:42:49 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 10 Sep 2024 12:42:49 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com>
 <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com>
Message-ID: <j8Kjz_b9OX7HADy3DxIfM6yOCxA6_Rk4scI6UxTI2Rs=.7b20c9e9-5c39-4f86-9b0c-732c116d323a@github.com>

On Mon, 9 Sep 2024 16:01:10 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> src/hotspot/share/oops/compressedKlass.hpp line 43:
>> 
>>> 41: 
>>> 42:   // Tiny-class-pointer mode
>>> 43:   static int _tiny_cp; // -1, 0=true, 1=false
>> 
>> Suggestion:
>> 
>>   static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false
>> 
>> In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style.
>
> I agree with this.  'cp' reads as ConstantPool for me even though this is a different context.

Okay, I will change that

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751867998

From tschatzl at openjdk.org  Tue Sep 10 13:03:15 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 10 Sep 2024 13:03:15 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v18]
In-Reply-To: <IeVZPQjaeWE6AThbR5sjTIlartNQ_nlI20cPp3DF_Dw=.7f799fc4-aab6-4cb3-9b4e-a1a2d0288362@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <IeVZPQjaeWE6AThbR5sjTIlartNQ_nlI20cPp3DF_Dw=.7f799fc4-aab6-4cb3-9b4e-a1a2d0288362@github.com>
Message-ID: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com>

On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port for JEP 475

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2292405233

From rcastanedalo at openjdk.org  Tue Sep 10 16:26:58 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 10 Sep 2024 16:26:58 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v19]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Fix indentation in generate_post_barrier_fast_path
  
  Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/94145917..0979e41e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17-18

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Tue Sep 10 16:26:59 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 10 Sep 2024 16:26:59 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v18]
In-Reply-To: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <IeVZPQjaeWE6AThbR5sjTIlartNQ_nlI20cPp3DF_Dw=.7f799fc4-aab6-4cb3-9b4e-a1a2d0288362@github.com>
 <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com>
Message-ID: <7epSurWH76D6t-eSs3neVvSHYRdhdGanYobPU0Y_-SM=.5068c4a5-d220-417d-9d8a-0518bfdc61d8@github.com>

On Tue, 10 Sep 2024 13:00:05 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>>  - riscv port for JEP 475
>
> Marked as reviewed by tschatzl (Reviewer).

Thanks for reviewing, @tschatzl!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2341418514

From rkennke at openjdk.org  Tue Sep 10 19:11:30 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 10 Sep 2024 19:11:30 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix FullGCForwarding initialization

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/5da250cf..6abda7bc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09-10

  Stats: 8 lines in 7 files changed: 1 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From stuefe at openjdk.org  Tue Sep 10 19:11:30 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 10 Sep 2024 19:11:30 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
Message-ID: <yk4X4o4WwCsA2Hj5ncidSj7Ng-74NfzKZy4J37uHnyQ=.dcc835de-584a-4922-9234-c7b54d7e35ed@github.com>

On Mon, 9 Sep 2024 17:40:03 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Try to avoid lea in loadNklass (aarch64)
>>  - Fix release build error
>
> src/hotspot/share/oops/compressedKlass.inline.hpp line 100:
> 
>> 98:   check_valid_klass(k, base(), shift());
>> 99:   // Also assert that k falls into what we know is the valid Klass range. This is usually smaller
>> 100:   // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a
> 
> 1G is the default CompressedClassSpaceSize but can be larger, right?  So the comment isn't quite accurate.  Or with tiny class pointers can it only be 1G?

The comment was misleading; it referred to the 1 GB default class space. I recently changed class space (in mainline) to be at most 4 GB (minus whatever little CDS needs), and for +COH this is still true: a 22-bit class pointer with a 10-bit shift still gives us a maximum encoding range size of 4 GB.

I will update the comment. (->backlist)
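
As a quick check of the arithmetic above, here is a minimal sketch. The helper `encoding_range_bytes` is hypothetical and not part of HotSpot; it only assumes that a narrow class pointer of n bits, decoded with a left shift of s bits, can address 2^(n+s) bytes.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a HotSpot function: size in bytes of the address
// range reachable by a narrow Klass pointer of `pointer_bits` bits that is
// left-shifted by `shift` bits when decoded.
inline uint64_t encoding_range_bytes(unsigned pointer_bits, unsigned shift) {
  assert(pointer_bits + shift < 64 && "range must fit in 64 bits");
  return uint64_t{1} << (pointer_bits + shift);
}
```

Under that assumption, `encoding_range_bytes(22, 10)` and `encoding_range_bytes(32, 0)` both evaluate to 2^32 bytes, which is the 4 GB maximum mentioned above.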

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751872461

From duke at openjdk.org  Wed Sep 11 08:11:17 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Wed, 11 Sep 2024 08:11:17 GMT
Subject: Integrated: 8339661: ZGC: Move some page resets and verification to
 callsites
In-Reply-To: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
References: <oiSYmbM4LhTgP1NIAWbOceKhb1PLUEKYJhKj9RbdcJA=.21409309-eb6d-4685-86c5-3371037a3b52@github.com>
Message-ID: <7-ZqaoZ1s-ga06y8iChA-GQHwjluY_5y97aWJ0lsioc=.3bc5ef62-9879-4a43-bae5-6fe926d705ae@github.com>

On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikström <duke at openjdk.org> wrote:

> Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This makes the checks somewhat hard to understand and their edge cases hard to follow.
> 
> Moving some of the reset logic that is now part of `ZPage::reset()` to the callsites makes the reason behind some operations easier to understand when reading the code.
> 
> Main highlights:
> - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`.
> - `ZPage::clone_limited()` retains the value of the top-pointer.
> - The kinds of verification for remsets are now at callsites:
>   - Allocations from the page cache, and only if the page got a remset
>   - Old-to-old in-place relocations, where only the inactive remset is checked

This pull request has now been integrated.

Changeset: ceef161e
Author:    Joel Sikström <joel.sikstrom at oracle.com>
Committer: Stefan Karlsson <stefank at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/ceef161eea51578160b71b20826a9328f9a87a88
Stats:     127 lines in 6 files changed: 34 ins; 64 del; 29 mod

8339661: ZGC: Move some page resets and verification to callsites

Reviewed-by: stefank, eosterlund

-------------

PR: https://git.openjdk.org/jdk/pull/20890

From epeter at openjdk.org  Wed Sep 11 08:28:13 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 11 Sep 2024 08:28:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
Message-ID: <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>

On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix FullGCForwarding initialization

@rkennke Can you please explain the changes in these tests:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java


You added this IR rule restriction:
`@IR(applyIf = {"UseCompactObjectHeaders", "false"},`

This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.

I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?

Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.

Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2342983487

From rcastanedalo at openjdk.org  Wed Sep 11 08:30:02 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 11 Sep 2024 08:30:02 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Fix a few style issues

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/0979e41e..141020e6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18-19

  Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Wed Sep 11 08:32:12 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 11 Sep 2024 08:32:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v19]
In-Reply-To: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com>
Message-ID: <8-IYniHv9GgBnsv9w3GggGF1mKKf3MfwxIxGIjEUh3c=.446607ac-5624-4c16-a1a5-a29187526023@github.com>

On Tue, 10 Sep 2024 16:26:58 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix indentation in generate_post_barrier_fast_path
>   
>   Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com>

I just fixed a few more indentation and code style glitches found by clang-format in commit 141020e6 (thanks @dlunde for helping with the setup).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2342993484

From aboldtch at openjdk.org  Wed Sep 11 09:44:04 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 11 Sep 2024 09:44:04 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
Message-ID: <7yVEKxrYW37Ci3BnoTv836ENs4EoncYnRKolR4ytJTM=.b69666e3-1a76-4ae4-a28c-ef28ae27cac9@github.com>

On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust division following suggestion by xmas

I am fine with this change. But I am not 100% sure about the use of `std::numeric_limits<double>::infinity()`. Maybe someone else can chime in.

I am not sure whether there are any other places where we expect division by zero to result in infinity.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2296226260

From mbaesken at openjdk.org  Wed Sep 11 11:13:04 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 11 Sep 2024 11:13:04 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <7yVEKxrYW37Ci3BnoTv836ENs4EoncYnRKolR4ytJTM=.b69666e3-1a76-4ae4-a28c-ef28ae27cac9@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
 <7yVEKxrYW37Ci3BnoTv836ENs4EoncYnRKolR4ytJTM=.b69666e3-1a76-4ae4-a28c-ef28ae27cac9@github.com>
Message-ID: <kSFW9GXwBqsJeCooNCT--XBlZtEkMPyDinTwAN96eSs=.4a05f5f1-1c21-44a1-9d53-696b0fb4fa93@github.com>

On Wed, 11 Sep 2024 09:41:14 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> I am fine with this change. But I am not 100% sure about the use of `std::numeric_limits<double>::infinity()`. Maybe someone else can chime in.
> 
> I am not sure whether there are any other places where we expect division by zero to result in infinity.

Thanks for the review !

It seems this has existed since C++11 (https://en.cppreference.com/w/cpp/types/numeric_limits/infinity), so usage should be okay.
We also find it in libsimdsort (linux only in OpenJDK however) https://github.com/openjdk/jdk/blob/master/src/java.base/linux/native/libsimdsort/xss-common-includes.h#L47
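
For readers following along, a minimal sketch of the general pattern under discussion. The helper name `divide_or_infinity` is hypothetical and this is not the code from the PR; it only illustrates returning `std::numeric_limits<double>::infinity()` explicitly instead of evaluating a division whose divisor is zero.

```c++
#include <limits>

// infinity() is only meaningful if the floating-point type can represent it.
static_assert(std::numeric_limits<double>::has_infinity,
              "double must be able to represent infinity");

// Hypothetical helper, not the code from the PR: return positive infinity
// when the divisor is zero instead of performing the division.
inline double divide_or_infinity(double numerator, double denominator) {
  if (denominator == 0.0) {
    return std::numeric_limits<double>::infinity();
  }
  return numerator / denominator;
}
```

With this shape the division is never executed with a zero divisor, so a sanitizer has nothing to report on that path, and callers that compare the result against a threshold still see a value larger than any finite one.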

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2343339018

From rkennke at openjdk.org  Wed Sep 11 13:37:16 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 11 Sep 2024 13:37:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>
Message-ID: <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>

On Wed, 11 Sep 2024 08:24:16 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> @rkennke Can you please explain the changes in these tests:
> 
> ```
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> ```
> 
> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> 
> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> 
> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> 
> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> 
> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).

IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
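
To make the offset mismatch concrete, here is a small model of the layout described above. It is a sketch, not HotSpot code: `array_base_offset` is a hypothetical helper, and it assumes an 8-byte compact header, a 4-byte array length at offset 8, and elements aligned to their own size.

```c++
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (a power of two).
inline size_t align_up(size_t value, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical model, not HotSpot code: offset of the first array element
// when the header occupies 8 bytes and the length occupies the next 4.
// Assumption: elements are aligned to their own size.
inline size_t array_base_offset(size_t element_size) {
  const size_t header_bytes = 8;  // compact header (mark word with Klass* bits)
  const size_t length_bytes = 4;  // array length stored at offset 8
  return align_up(header_bytes + length_bytes, element_size);
}
```

In this model `array_base_offset(1)` is 12 for `byte[]`, while `array_base_offset(8)` is 16 for `long[]`, which is the 12 versus 16 difference mentioned above.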

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343693629

From jsjolen at openjdk.org  Wed Sep 11 14:00:17 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Wed, 11 Sep 2024 14:00:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
Message-ID: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>

On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix FullGCForwarding initialization

Hi,

@caspernorrbin and I are reviewing the Metaspace changes (so anything in the `memory` and `metaspace` folders). We have found minor improvements that can be made and some nits, but the code overall looks OK. We are finishing up a first round of review now, and will have a second one.

Thank you for your hard work and your patience with the review process.

src/hotspot/share/memory/classLoaderMetaspace.cpp line 87:

> 85:         klass_alignment_words,
> 86:         "class arena");
> 87:   }

As per my comment in the header file, change the code to this:

```c++
if (class_context != nullptr) {
  // ... Same as in PR
} else {
  _class_space_arena = _non_class_space_arena;
}
```

src/hotspot/share/memory/classLoaderMetaspace.cpp line 115:

> 113:   if (wastage.is_nonempty()) {
> 114:     non_class_space_arena()->deallocate(wastage);
> 115:   }

This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example:

```c++
// Any wasted memory is presumably too small for any class.
// Therefore, give it back to the non-class space arena's free list.
```

src/hotspot/share/memory/classLoaderMetaspace.cpp line 118:

> 116: #ifdef ASSERT
> 117:   if (result.is_nonempty()) {
> 118:     const bool in_class_arena = class_space_arena() != nullptr ? class_space_arena()->contains(result) : false;

If you take my suggestion, this nullptr check becomes unnecessary; otherwise, please use `have_class_space_arena()` here.

src/hotspot/share/memory/classLoaderMetaspace.cpp line 165:

> 163:   MetaBlock bl(ptr, word_size);
> 164:   // If the block would be reusable for a Klass, add to class arena, otherwise to
> 165:   // then non-class arena.

Nit: spelling, "the"

src/hotspot/share/memory/classLoaderMetaspace.hpp line 81:

> 79:   metaspace::MetaspaceArena* class_space_arena() const       { return _class_space_arena; }
> 80: 
> 81:   bool have_class_space_arena() const { return _class_space_arena != nullptr; }

This is unnecessary. Instead of having this accessor and having to remember to check for null each time, just make `_class_space_arena` point to the same arena as `_non_class_space_arena` when we run with `-XX:-UseCompressedClassPointers`.
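
To illustrate the aliasing idea, here is a minimal, self-contained sketch. It is not the PR's code: `Arena` and `LoaderSpace` are hypothetical stand-ins for `metaspace::MetaspaceArena` and `ClassLoaderMetaspace`, and the constructor flag is an assumed simplification of the `class_context != nullptr` check.

```c++
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for metaspace::MetaspaceArena; it only counts words.
struct Arena {
  std::size_t allocated_words = 0;
  void allocate(std::size_t words) { allocated_words += words; }
};

// Hypothetical, simplified stand-in for ClassLoaderMetaspace.
class LoaderSpace {
  Arena _non_class_storage;
  Arena _class_storage;            // only used when a separate class space exists
  Arena* _non_class_space_arena;
  Arena* _class_space_arena;       // never null: aliases the non-class arena if needed

public:
  explicit LoaderSpace(bool use_class_space)
    : _non_class_space_arena(&_non_class_storage),
      _class_space_arena(use_class_space ? &_class_storage : &_non_class_storage) {
    assert(_class_space_arena != nullptr);
  }

  // The arena pointers refer to members of this object, so forbid copying.
  LoaderSpace(const LoaderSpace&) = delete;
  LoaderSpace& operator=(const LoaderSpace&) = delete;

  Arena* non_class_space_arena() { return _non_class_space_arena; }
  Arena* class_space_arena()     { return _class_space_arena; }

  // Callers pick an arena without ever testing for null.
  void allocate(std::size_t words, bool is_class) {
    (is_class ? _class_space_arena : _non_class_space_arena)->allocate(words);
  }
};
```

With `LoaderSpace(false)`, both accessors return the same arena, so a "class" allocation simply lands in the non-class arena and no `have_class_space_arena()` query is needed.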

src/hotspot/share/memory/metaspace.cpp line 656:

> 654:     // Adjust size of the compressed class space.
> 655: 
> 656:     const size_t res_align = reserve_alignment();

Can you change the name to `root_chunk_size`?

src/hotspot/share/memory/metaspace.hpp line 112:

> 110:   static size_t max_allocation_word_size();
> 111: 
> 112:   // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty

Nit: Spelling, "correctly"

src/hotspot/share/memory/metaspace/metablock.hpp line 48:

> 46: 
> 47:   MetaWord* base() const { return _base; }
> 48:   const MetaWord* end() const { return _base + _word_size; }

Consider adding `assert(is_nonempty())` here.

src/hotspot/share/memory/metaspace/metablock.hpp line 51:

> 49:   size_t word_size() const { return _word_size; }
> 50:   bool is_empty() const { return _base == nullptr; }
> 51:   bool is_nonempty() const { return _base != nullptr; }

Can `_base == nullptr` but `_word_size != 0`?
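
One way to rule that state out is to check the invariant at construction time. The following is only a sketch under my own assumptions, not the actual `MetaBlock`: `Block` and `Word` are hypothetical names, and the real class may deliberately allow other states.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uintptr_t;  // hypothetical stand-in for MetaWord

// Hypothetical, simplified block type that enforces:
//   _base == nullptr  <=>  _word_size == 0
class Block {
  Word* _base;
  std::size_t _word_size;

  bool invariant_holds() const {
    return (_base == nullptr) == (_word_size == 0);
  }

public:
  Block() : _base(nullptr), _word_size(0) {}

  Block(Word* base, std::size_t word_size)
    : _base(base), _word_size(word_size) {
    assert(invariant_holds() && "base and size must be both set or both unset");
  }

  bool is_empty() const    { return _base == nullptr; }
  bool is_nonempty() const { return !is_empty(); }

  Word* base() const            { return _base; }
  std::size_t word_size() const { return _word_size; }

  // end() of an empty block has no meaningful value, so assert on it.
  const Word* end() const {
    assert(is_nonempty() && "end() called on an empty block");
    return _base + _word_size;
  }
};
```

With the invariant enforced in the constructor, `is_empty()` and `word_size() == 0` always agree, and `end()` can safely assert `is_nonempty()`.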

src/hotspot/share/memory/metaspace/metablock.hpp line 52:

> 50:   bool is_empty() const { return _base == nullptr; }
> 51:   bool is_nonempty() const { return _base != nullptr; }
> 52:   void reset() { _base = nullptr; _word_size = 0; }

Is this function really necessary? According to my IDE it's only used in tests and even then the `MetaBlock` isn't used afterwards (so it has no effect).

src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44:

> 42: class FreeBlocks;
> 43: 
> 44: struct ArenaStats;

Nit: Sort?

src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84:

> 82:   // between threads and needs to be synchronized in CLMS.
> 83: 
> 84:   const size_t _allocation_alignment_words;

Nit: Document this? All other members are documented.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2296528491
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754335269
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754398993
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754343513
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754459464
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754330432
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754619023
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754508321
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142822
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142098
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754153662
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754192464
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754197251

From epeter at openjdk.org  Wed Sep 11 14:17:16 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 11 Sep 2024 14:17:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>
 <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>
Message-ID: <oLLd3DTbo5HtVFxXEyY-SSiwwJe6n5q5YMvoSf6xiEE=.6c18ec11-9e51-4635-84e7-5a56e47b0445@github.com>

On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> > @rkennke Can you please explain the changes in these tests:
> > ```
> > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > ```
> > 
> > You added this IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> 
> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
> 
> I will re-evaluate those tests, and add comments or remove the restrictions.

If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;)

My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.
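
For reference, the offset mismatch mentioned in the quoted text can be reproduced with a little arithmetic. This is a simplified model, not HotSpot's actual layout code: `align_up` and `array_base_offset` are hypothetical helpers, and I assume a 4-byte length field followed by elements aligned to their own size.

```c++
#include <cstddef>

// Round value up to the next multiple of alignment (alignment must be a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical model: offset of the first array element, given the offset of
// the 4-byte length field and the element size in bytes (1, 2, 4 or 8).
constexpr std::size_t array_base_offset(std::size_t length_offset,
                                        std::size_t element_size) {
  const std::size_t end_of_length = length_offset + 4;
  return align_up(end_of_length, element_size);
}

// Compact headers: the length is stored at offset 8 (per the PR description).
static_assert(array_base_offset(8, 1) == 12, "byte[] starts right after the length");
static_assert(array_base_offset(8, 8) == 16, "long[] needs 8-byte alignment");
```

In this model, a length field at offset 12 (my assumption for the layout with compressed class pointers and no compact headers) gives 16 for both `byte[]` and `long[]`, so the mismatch disappears. A length field at offset 16 (assumed for `-UseCompressedClassPointers`) gives 20 and 24, which matches the remark that the same issue exists there.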

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343797957

From rcastanedalo at openjdk.org  Wed Sep 11 14:17:17 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 11 Sep 2024 14:17:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
Message-ID: <blSYjm8E4iTDbxev4YGU8sm0EFE-dNa87oQYNzneLD8=.9bdad535-f503-443c-b3e6-26899e72b2a7@github.com>

On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix FullGCForwarding initialization

src/hotspot/share/memory/metaspace/binList.hpp line 202:

> 200:         b_last = b;
> 201:       }
> 202:       if (UseNewCode)printf("\n");

I guess this line is a leftover to be removed?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754702742

From rcastanedalo at openjdk.org  Wed Sep 11 14:50:17 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 11 Sep 2024 14:50:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
Message-ID: <GQVVR0K-mDh4evKvrKG9eXwwF4j7zAg-8ai4__gWNGE=.3c502820-717b-44b7-b82d-a24dd7fdd9d5@github.com>

On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix FullGCForwarding initialization

src/hotspot/share/opto/machnode.cpp line 390:

> 388:     t = t->make_ptr();
> 389:   }
> 390:   if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) {

Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754813751

From stuefe at openjdk.org  Wed Sep 11 16:17:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 11 Sep 2024 16:17:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <heQmg7wA6Yrt25T08k27pNGDJ_FAI0c3aGNuqNWt7Qk=.f01b3868-bd28-4635-a3b8-618e93880a91@github.com>

On Wed, 11 Sep 2024 12:47:30 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/classLoaderMetaspace.cpp line 115:
> 
>> 113:   if (wastage.is_nonempty()) {
>> 114:     non_class_space_arena()->deallocate(wastage);
>> 115:   }
> 
> This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example:
> 
> ```c++
> // Any wasted memory is presumably too small for any class.
> // Therefore, give it back to the non-class space arena's free list.
> ```

Yes. Some background:

- wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert)
- wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, it's probably also too small.

Yes, I will write a better comment.
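
To make the background concrete, here is a toy model of where such a gap comes from. It is not the real `MetaspaceArena` logic: `BumpArena` and `Gap` are hypothetical, and offsets are plain word counts rather than addresses.

```c++
#include <cassert>
#include <cstddef>

// A skipped range, in words: [start, start + words).
struct Gap {
  std::size_t start;
  std::size_t words;
};

// Hypothetical bump-pointer arena with a fixed allocation alignment (in words).
class BumpArena {
  std::size_t _top = 0;
  const std::size_t _alignment_words;

public:
  explicit BumpArena(std::size_t alignment_words)
    : _alignment_words(alignment_words) {
    assert(alignment_words > 0);
  }

  // Allocates word_size words at the next aligned offset and returns that
  // offset. The words skipped to reach alignment are reported via wastage.
  std::size_t allocate(std::size_t word_size, Gap& wastage) {
    const std::size_t rem = _top % _alignment_words;
    const std::size_t aligned = (rem == 0) ? _top : _top + (_alignment_words - rem);
    wastage = Gap{_top, aligned - _top};
    _top = aligned + word_size;
    return aligned;
  }
};
```

With an alignment of 8 words, two 5-word allocations leave a gap of 3 words starting at offset 5. That gap is neither aligned to 8 nor large enough to hold an aligned block, so in this model the class arena could never reuse it. With an alignment of 1 word, no gap ever appears, which is consistent with wastage only occurring for the larger Klass* alignments.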

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755111131

From stuefe at openjdk.org  Wed Sep 11 16:17:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 11 Sep 2024 16:17:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <blSYjm8E4iTDbxev4YGU8sm0EFE-dNa87oQYNzneLD8=.9bdad535-f503-443c-b3e6-26899e72b2a7@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <blSYjm8E4iTDbxev4YGU8sm0EFE-dNa87oQYNzneLD8=.9bdad535-f503-443c-b3e6-26899e72b2a7@github.com>
Message-ID: <y70DRIqSP00J6AsD0qAUV3Mn6XgMkMzu4s7ZkO2OqZk=.519190f4-7e32-4351-965b-9aaf8781316a@github.com>

On Wed, 11 Sep 2024 14:15:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace/binList.hpp line 202:
> 
>> 200:         b_last = b;
>> 201:       }
>> 202:       if (UseNewCode)printf("\n");
> 
> I guess this line is a leftover to be removed?

Yep thanks for spotting

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755115905

From stuefe at openjdk.org  Wed Sep 11 16:17:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 11 Sep 2024 16:17:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <y70DRIqSP00J6AsD0qAUV3Mn6XgMkMzu4s7ZkO2OqZk=.519190f4-7e32-4351-965b-9aaf8781316a@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <blSYjm8E4iTDbxev4YGU8sm0EFE-dNa87oQYNzneLD8=.9bdad535-f503-443c-b3e6-26899e72b2a7@github.com>
 <y70DRIqSP00J6AsD0qAUV3Mn6XgMkMzu4s7ZkO2OqZk=.519190f4-7e32-4351-965b-9aaf8781316a@github.com>
Message-ID: <iLETzB3HhQYtpUGFtQ7MQs8_v-US_YxsaOGNrXS7tE8=.d5301449-6481-4af6-b238-bc717831adf9@github.com>

On Wed, 11 Sep 2024 16:14:39 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/memory/metaspace/binList.hpp line 202:
>> 
>>> 200:         b_last = b;
>>> 201:       }
>>> 202:       if (UseNewCode)printf("\n");
>> 
>> I guess this line is a leftover to be removed?
>
> Yep thanks for spotting

So that was causing the empty lines in my logs (facepalm)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755116656

From rkennke at openjdk.org  Wed Sep 11 17:31:54 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 11 Sep 2024 17:31:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v12]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <hB5TC1vwu6WY8By-sEZEBpD3j6an9rGBh14Ef3oyv4s=.aedaeb19-5804-458e-aeb6-3bad716477a7@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Make is_oop() MT-safe
 - Re-enable some vectorization tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/6abda7bc..b6c11f74

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10-11

  Stats: 32 lines in 6 files changed: 7 ins; 8 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rkennke at openjdk.org  Wed Sep 11 17:38:57 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 11 Sep 2024 17:38:57 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Revert accidental change of UCOH default

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/b6c11f74..9e008ac1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11-12

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From coleenp at openjdk.org  Wed Sep 11 21:18:12 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 11 Sep 2024 21:18:12 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
Message-ID: <epFrYZqTWD77IySDqJ9fwFiXZKZ_NCuzA-lfPi6TU_4=.d9c5c483-9a6a-4113-9856-cddcae962f33@github.com>

On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert accidental change of UCOH default

I was starting to understand the concerns with having prototype_header in Klass.  It seems like it would simplify encoding the klass for object allocation.  My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this.  You need to pass a parameter to the Klass() constructor to tell it whether to encode the klass pointer or not.

     diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
     index fd198f54fc9..7aa4bd24948 100644
     --- a/src/hotspot/share/oops/instanceKlass.cpp
     +++ b/src/hotspot/share/oops/instanceKlass.cpp
    @@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() {
     }
     
     InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) :
    -  Klass(kind),
    +  Klass(kind, (!parser.is_interface() && !parser.is_abstract())),
       _nest_members(nullptr),
       _nest_host(nullptr),
       _permitted_subclasses(nullptr),

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2344715540

From stefank at openjdk.org  Thu Sep 12 09:37:22 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 12 Sep 2024 09:37:22 GMT
Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags
Message-ID: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>

Change some ZGC tests to propagate requested vm flags.

I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux.

-------------

Commit messages:
 - 8314842: zgc/genzgc tests ignore vm flags

Changes: https://git.openjdk.org/jdk/pull/20963/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20963&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8314842
  Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20963.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20963/head:pull/20963

PR: https://git.openjdk.org/jdk/pull/20963

From rcastanedalo at openjdk.org  Thu Sep 12 10:20:15 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 12 Sep 2024 10:20:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
Message-ID: <xnL42l4bz2SaiEgSVCDBXXbnZqgnIcYpnSy7tM2WY1w=.094f34bb-4587-4c45-ad44-daf83705515a@github.com>

On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert accidental change of UCOH default

src/hotspot/share/opto/lcm.cpp line 272:

> 270:         const TypePtr* tptr;
> 271:         if ((UseCompressedOops || UseCompressedClassPointers) &&
> 272:             (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) {

Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled:

(!UseCompressedOops,  UseCompressedClassPointers, CompressedKlassPointers::shift() != 0)
( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0)
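
To make the two configurations concrete, here is a minimal standalone sketch (not HotSpot code). `Config`, `quoted_condition`, and `per_mode_condition` are hypothetical names introduced for illustration. The sketch assumes that the shift of a disabled mode reads as 0, and the per-mode form is only a guess at what a condition that ties each shift to its own flag could look like; it is not taken from the PR. Under those assumptions, the condition quoted above is true for both configurations, while the per-mode form is false for both.

```cpp
#include <cassert>

// Hypothetical model of the relevant VM state; not HotSpot code.
struct Config {
  bool use_compressed_oops;            // models UseCompressedOops
  bool use_compressed_class_pointers;  // models UseCompressedClassPointers
  int  oop_shift;                      // models CompressedOops::shift()
  int  klass_shift;                    // models CompressedKlassPointers::shift()
};

// The condition exactly as quoted from lcm.cpp lines 271-272 above.
bool quoted_condition(const Config& c) {
  return (c.use_compressed_oops || c.use_compressed_class_pointers) &&
         (c.oop_shift == 0 || c.klass_shift == 0);
}

// ASSUMPTION: a form that only consults a shift when its own flag is on.
// This is an illustrative guess, not the code in the PR.
bool per_mode_condition(const Config& c) {
  return (c.use_compressed_oops && c.oop_shift == 0) ||
         (c.use_compressed_class_pointers && c.klass_shift == 0);
}

// The two configurations listed above.
// ASSUMPTION: the shift of the disabled mode reads as 0; 3 stands for "!= 0".
Config oops_off_klass_shifted() { return {false, true, 0, 3}; }
Config klass_off_oops_shifted() { return {true, false, 3, 0}; }
```

Calling both functions on `oops_off_klass_shifted()` and `klass_off_oops_shifted()` shows the divergence: the disabled mode's zero shift is enough to satisfy the quoted condition, but not the per-mode one.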

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756570168

From rcastanedalo at openjdk.org  Thu Sep 12 11:49:15 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 12 Sep 2024 11:49:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
Message-ID: <s0uUPQ4WJ3JMVQStLUrhTK_Kjn3joFNQOX7Hm_JRbqE=.44a6c8ee-fad6-4aa6-8e95-cd8d23beab60@github.com>

On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert accidental change of UCOH default

src/hotspot/share/cds/filemap.cpp line 2457:

> 2455:                           compressed_oops(), compressed_class_pointers());
> 2456:   if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) {
> 2457:     log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is "

The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756699774

From tschatzl at openjdk.org  Thu Sep 12 12:57:05 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 12 Sep 2024 12:57:05 GMT
Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags
In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
Message-ID: <l6qi8hONu3_TnsnKzPmarToE2NGg6A0hV01BgLNINb8=.804a9227-c4d0-4a6d-b164-853d2d858eec@github.com>

On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> Change some ZGC tests to propagate requested vm flags.
> 
> I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux.

lgtm

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20963#pullrequestreview-2300218267

From rkennke at openjdk.org  Thu Sep 12 13:16:14 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 12 Sep 2024 13:16:14 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>
 <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>
Message-ID: <hkGaBWSugvpV0PgoR9rnutHCgQzkeXv1ctH7QwV3zK0=.204737d5-fadb-410a-8891-7db59d36830f@github.com>

On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> @rkennke Can you please explain the changes in these tests:
>> 
>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>> 
>> 
>> You added this IR rule restriction:
>> `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
>> 
>> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
>> 
>> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
>> 
>> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
>> 
>> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
>
>> @rkennke Can you please explain the changes in these tests:
>> 
>> ```
>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>> ```
>> 
>> You added this IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
>> 
>> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
>> 
>> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
>> 
>> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
>> 
>> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> 
> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
> 
> I will re-evaluate those tests, and add comments or remove the restrictions.

> > > @rkennke Can you please explain the changes in these tests:
> > > ```
> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > > ```
> > > 
> > > You added this IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> > 
> > 
> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
> > I will re-evaluate those tests, and add comments or remove the restrictions.
> 
> If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;)
> 
> My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.

Indeed, I could re-enable all tests in:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java

but unfortunately not those others:

test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java


I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at an 8-byte-aligned offset.

I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.
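
The alignment constraint can be illustrated with a small standalone sketch (not HotSpot code). `element_offset` and `first_aligned_index` are hypothetical helpers introduced here; the base offsets 12 and 16 are the array start offsets mentioned earlier in this thread. The sketch only does the address arithmetic and assumes nothing about how SuperWord actually decides to vectorize.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of element `index`, measured from the start of the array object.
std::size_t element_offset(std::size_t base_offset,
                           std::size_t element_size,
                           std::size_t index) {
  return base_offset + index * element_size;
}

// Smallest index whose element offset is a multiple of `alignment`.
// Returns -1 if no index within one alignment period qualifies,
// in which case no element is ever aligned.
int first_aligned_index(std::size_t base_offset,
                        std::size_t element_size,
                        std::size_t alignment) {
  for (std::size_t i = 0; i < alignment; ++i) {
    if (element_offset(base_offset, element_size, i) % alignment == 0) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With a base offset of 16, index 0 is already 8-byte aligned for any element size. With a base offset of 12, a byte array first reaches an 8-byte-aligned element at index 4 and an int array at index 1, so a loop starting at index 0 does not begin at an aligned address.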

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346250313

From epeter at openjdk.org  Thu Sep 12 13:23:14 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 12 Sep 2024 13:23:14 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <hkGaBWSugvpV0PgoR9rnutHCgQzkeXv1ctH7QwV3zK0=.204737d5-fadb-410a-8891-7db59d36830f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>
 <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>
 <hkGaBWSugvpV0PgoR9rnutHCgQzkeXv1ctH7QwV3zK0=.204737d5-fadb-410a-8891-7db59d36830f@github.com>
Message-ID: <shoDPnL4nQyaYX4xy-WyzQ7CwsoCbAeCE3c755CkE1o=.caf940aa-5fbb-4cb5-a4a2-68a2452ffe1b@github.com>

On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> > > > @rkennke Can you please explain the changes in these tests:
> > > > ```
> > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > > > ```
> > > > 
> > > > You added this IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> > > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> > > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> > > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> > > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> > > 
> > > 
> > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
> > > I will re-evaluate those tests, and add comments or remove the restrictions.
> > 
> > 
> > If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;)
> > My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.
> 
> Indeed, I could re-enable all tests in:
> 
> ```
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> ```
> 
> but unfortunately not those others:
> 
> ```
> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> ```
> 
> I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at an 8-byte-aligned offset.
> 
> I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.

Excellent, that is what I hoped for! Thanks for filing the bug, I'll look into it once this is integrated. You should probably mark it as "blocked by", not "related to" ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346266568

From stuefe at openjdk.org  Thu Sep 12 15:41:22 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 12 Sep 2024 15:41:22 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <xnL42l4bz2SaiEgSVCDBXXbnZqgnIcYpnSy7tM2WY1w=.094f34bb-4587-4c45-ad44-daf83705515a@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
 <xnL42l4bz2SaiEgSVCDBXXbnZqgnIcYpnSy7tM2WY1w=.094f34bb-4587-4c45-ad44-daf83705515a@github.com>
Message-ID: <jqZ-1tR0WMbnDldGMusdqMnswTGj1hbOIY3JdUFtIsM=.4169760d-d5cf-4335-9e53-baf885d9422f@github.com>

On Thu, 12 Sep 2024 10:17:47 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert accidental change of UCOH default
>
> src/hotspot/share/opto/lcm.cpp line 272:
> 
>> 270:         const TypePtr* tptr;
>> 271:         if ((UseCompressedOops || UseCompressedClassPointers) &&
>> 272:             (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) {
> 
> Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled:
> 
> (!UseCompressedOops,  UseCompressedClassPointers, CompressedKlassPointers::shift() != 0)
> ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0)

Hi @robcasloz 

The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense than relying on the static initialization values of `CompressedOops::_shift`.
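
As a rough illustration of that contract, here is a minimal standalone sketch (not the HotSpot class). `KlassEncodingModel` and `klass_is_unscaled` are hypothetical names; the sketch assumes only that the utility is initialized when the flag is on and that callers must check the flag before querying it.

```cpp
#include <cassert>

// Hypothetical stand-in for a utility that is only set up when its feature
// flag is enabled; querying it without initialization is a programming error.
class KlassEncodingModel {
 public:
  void initialize(int shift) {
    _shift = shift;
    _initialized = true;
  }

  bool is_initialized() const { return _initialized; }

  int shift() const {
    assert(_initialized && "shift() queried without initialization");
    return _shift;
  }

 private:
  int  _shift = -1;  // deliberately not a plausible shift value
  bool _initialized = false;
};

// Callers test the flag first. The short-circuit && guarantees that shift()
// is never evaluated on an uninitialized model when the flag is off.
bool klass_is_unscaled(bool use_compressed_class_pointers,
                       const KlassEncodingModel& model) {
  return use_compressed_class_pointers && model.shift() == 0;
}
```

The point of the design is that a caller that forgets the flag check fails loudly in a debug build instead of silently reading a default value.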

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757126946

From stuefe at openjdk.org  Thu Sep 12 15:46:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 12 Sep 2024 15:46:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <GQVVR0K-mDh4evKvrKG9eXwwF4j7zAg-8ai4__gWNGE=.3c502820-717b-44b7-b82d-a24dd7fdd9d5@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <GQVVR0K-mDh4evKvrKG9eXwwF4j7zAg-8ai4__gWNGE=.3c502820-717b-44b7-b82d-a24dd7fdd9d5@github.com>
Message-ID: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>

On Wed, 11 Sep 2024 14:47:07 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert accidental change of UCOH default
>
> src/hotspot/share/opto/machnode.cpp line 390:
> 
>> 388:     t = t->make_ptr();
>> 389:   }
>> 390:   if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) {
> 
> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`.

I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757135035

From stuefe at openjdk.org  Thu Sep 12 16:08:15 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 12 Sep 2024 16:08:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
Message-ID: <N10Px5txoEoXh2oO1BEYEhSY7gysOWtYjuBxFg_DArk=.fad5b9e3-f602-445a-aa34-180ebe1fd52a@github.com>

On Mon, 9 Sep 2024 15:58:29 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Try to avoid lea in loadNklass (aarch64)
>>  - Fix release build error
>
> src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147:
> 
>> 145: #endif
>> 146: 
>> 147:   return true;
> 
> This should only be in the compressedKlass.cpp file.

Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch) and one case (+UseCCP -UseCOH), so maybe it's not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757169570

From coleenp at openjdk.org  Thu Sep 12 17:37:15 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 12 Sep 2024 17:37:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <N10Px5txoEoXh2oO1BEYEhSY7gysOWtYjuBxFg_DArk=.fad5b9e3-f602-445a-aa34-180ebe1fd52a@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
 <N10Px5txoEoXh2oO1BEYEhSY7gysOWtYjuBxFg_DArk=.fad5b9e3-f602-445a-aa34-180ebe1fd52a@github.com>
Message-ID: <CT8NA5dTl_1inuqDp7v8-SFLzfRqW35LRYMh4FnASnc=.91f561db-d5a7-4f25-bcef-da5a1fb7cb11@github.com>

On Thu, 12 Sep 2024 16:04:45 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147:
>> 
>>> 145: #endif
>>> 146: 
>>> 147:   return true;
>> 
>> This should only be in the compressedKlass.cpp file.
>
> Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch) and one case (+UseCCP -UseCOH), so maybe it's not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`.

Yes, looking at this further, it does seem like a small amount of conditional compilation that sets all the same values that are set in the architecture independent version.  It seems best to move it there.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757300544

From kdnilsen at openjdk.org  Thu Sep 12 20:29:39 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 12 Sep 2024 20:29:39 GMT
Subject: RFR: 8339960: Shenandoah: Fix inconsistencies in generational
 Shenandoah behaviors
Message-ID: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>

This fixes some bugs found in recent code review and playback of an assertion failure.

See also https://github.com/openjdk/shenandoah/pull/497

-------------

Commit messages:
 - Use -1 for rightmost interval when range is empty
 - Check available rather than capacity before logging shortfall
 - Merge branch 'openjdk:master' into master
 - Merge branch 'openjdk:master' into master
 - Merge branch 'openjdk:master' into master
 - Revert "Make GC logging less verbose"
 - Make GC logging less verbose
 - Merge branch 'openjdk:master' into master
 - Merge branch 'openjdk:master' into master
 - Merge branch 'openjdk:master' into master
 - ... and 13 more: https://git.openjdk.org/jdk/compare/81ff91ef...f1ba63f4

Changes: https://git.openjdk.org/jdk/pull/20974/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20974&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339960
  Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20974.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20974/head:pull/20974

PR: https://git.openjdk.org/jdk/pull/20974

From lmesnik at openjdk.org  Fri Sep 13 01:08:06 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 13 Sep 2024 01:08:06 GMT
Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags
In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
Message-ID: <azfHDPgUOJGa3Y1V-5dqXBseaN151TFFE8tTfKPmN8U=.a9161a6d-4f1b-4a67-bb8f-e0b15f2c9b83@github.com>

On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> Change some ZGC tests to propagate requested vm flags.
> 
> I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux.

Marked as reviewed by lmesnik (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20963#pullrequestreview-2301742629

From stefank at openjdk.org  Fri Sep 13 05:50:12 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Sep 2024 05:50:12 GMT
Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags
In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
Message-ID: <7SojoPqqF1B9dH2Dw7FSmdiGxx_eLJ8-pFksb2TP4k8=.b64e2636-6d9e-4581-8395-422e4351e4a4@github.com>

On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> Change some ZGC tests to propagate requested vm flags.
> 
> I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux.

Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20963#issuecomment-2348080550

From stefank at openjdk.org  Fri Sep 13 05:50:12 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Sep 2024 05:50:12 GMT
Subject: Integrated: 8314842: zgc/genzgc tests ignore vm flags
In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>
Message-ID: <5s0GWk65p6QVVc6yXL8F8HCnDp3E1Cs0w6_29EcqUaQ=.01b6f839-312a-439a-9900-43a8908c74e2@github.com>

On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> Change some ZGC tests to propagate requested vm flags.
> 
> I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux.

This pull request has now been integrated.

Changeset: ae75ca05
Author:    Stefan Karlsson <stefank at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/ae75ca05e450da577e712eb7ed9dd9203616b80b
Stats:     6 lines in 6 files changed: 0 ins; 0 del; 6 mod

8314842: zgc/genzgc tests ignore vm flags

Reviewed-by: tschatzl, lmesnik

-------------

PR: https://git.openjdk.org/jdk/pull/20963

From rcastanedalo at openjdk.org  Fri Sep 13 06:46:15 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 13 Sep 2024 06:46:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <GQVVR0K-mDh4evKvrKG9eXwwF4j7zAg-8ai4__gWNGE=.3c502820-717b-44b7-b82d-a24dd7fdd9d5@github.com>
 <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
Message-ID: <P6u4aaxhJr4zjXT4_J993iocS4F11e7lfHjnnGqIJKc=.1bf65c03-fe7f-4b1a-ba08-26c64dafa3f7@github.com>

On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/opto/machnode.cpp line 390:
>> 
>>> 388:     t = t->make_ptr();
>>> 389:   }
>>> 390:   if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) {
>> 
>> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`.
>
> I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP.

I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758270661

From rcastanedalo at openjdk.org  Fri Sep 13 07:49:15 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 13 Sep 2024 07:49:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <jqZ-1tR0WMbnDldGMusdqMnswTGj1hbOIY3JdUFtIsM=.4169760d-d5cf-4335-9e53-baf885d9422f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
 <xnL42l4bz2SaiEgSVCDBXXbnZqgnIcYpnSy7tM2WY1w=.094f34bb-4587-4c45-ad44-daf83705515a@github.com>
 <jqZ-1tR0WMbnDldGMusdqMnswTGj1hbOIY3JdUFtIsM=.4169760d-d5cf-4335-9e53-baf885d9422f@github.com>
Message-ID: <JNixrGzCB7AoiRE73hQkycIX130qFzKq-ibQFMfsu9c=.e926bacd-33dd-421d-8154-4a88fe3232d9@github.com>

On Thu, 12 Sep 2024 15:38:18 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/opto/lcm.cpp line 272:
>> 
>>> 270:         const TypePtr* tptr;
>>> 271:         if ((UseCompressedOops || UseCompressedClassPointers) &&
>>> 272:             (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) {
>> 
>> Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled:
>> 
>> (!UseCompressedOops,  UseCompressedClassPointers, CompressedKlassPointers::shift() != 0)
>> ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0)
>
> Hi @robcasloz 
> 
> The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense than relying on the static initialization values of `CompressedOops::_shift`.

Thanks for the explanation. I wonder if the test is necessary at all, or one could simply use `base->get_ptr_type()` unconditionally, which defaults to `base->bottom_type()->isa_ptr()` anyway for non-compressed pointers. But this simplification would be in any case out of the scope of this changeset.
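
To make the two configurations listed above concrete, here is a minimal standalone sketch (not HotSpot code). The `Config` struct and both function names are hypothetical. The "per-feature" form is only an assumed shape for a check that consults each shift under its own flag; the thread does not show the condition after the change. The sketch also assumes that the shift of a disabled feature reads as 0, its static initialization value.

```cpp
#include <cassert>

// Hypothetical snapshot of the relevant VM state.
struct Config {
  bool use_compressed_oops;
  bool use_compressed_class_pointers;
  int  oop_shift;    // stands in for CompressedOops::shift()
  int  klass_shift;  // stands in for CompressedKlassPointers::shift()
};

// The condition as quoted in the review comment (lcm.cpp lines 271-272).
bool quoted_condition(const Config& c) {
  return (c.use_compressed_oops || c.use_compressed_class_pointers) &&
         (c.oop_shift == 0 || c.klass_shift == 0);
}

// Assumed alternative: each shift is only consulted when its flag is on.
bool per_feature_condition(const Config& c) {
  return (c.use_compressed_oops && c.oop_shift == 0) ||
         (c.use_compressed_class_pointers && c.klass_shift == 0);
}
```

With `Config{false, true, 0, 3}` and `Config{true, false, 3, 0}`, the quoted condition is true only because the shift of the disabled feature is 0, while the per-feature form is false for both.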

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758356268

From rcastanedalo at openjdk.org  Fri Sep 13 07:57:15 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 13 Sep 2024 07:57:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <s0uUPQ4WJ3JMVQStLUrhTK_Kjn3joFNQOX7Hm_JRbqE=.44a6c8ee-fad6-4aa6-8e95-cd8d23beab60@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
 <s0uUPQ4WJ3JMVQStLUrhTK_Kjn3joFNQOX7Hm_JRbqE=.44a6c8ee-fad6-4aa6-8e95-cd8d23beab60@github.com>
Message-ID: <lBpTKXccIP3OSowW43K_kl_a_yVs7FoGT_05fJGxDv8=.f720318f-9a1c-497c-9821-866ed35104c4@github.com>

On Thu, 12 Sep 2024 11:46:35 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert accidental change of UCOH default
>
> src/hotspot/share/cds/filemap.cpp line 2457:
> 
>> 2455:                           compressed_oops(), compressed_class_pointers());
>> 2456:   if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) {
>> 2457:     log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is "
> 
> The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary).

This comment has been marked as "resolved" without any apparent action being taken, is that intentional?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758369787

From rkennke at openjdk.org  Fri Sep 13 08:21:54 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 13 Sep 2024 08:21:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v14]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Hide log timestamps in test to prevent false failures

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/9e008ac1..69f1ef1d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12-13

  Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rkennke at openjdk.org  Fri Sep 13 08:21:55 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 13 Sep 2024 08:21:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v14]
In-Reply-To: <lBpTKXccIP3OSowW43K_kl_a_yVs7FoGT_05fJGxDv8=.f720318f-9a1c-497c-9821-866ed35104c4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
 <s0uUPQ4WJ3JMVQStLUrhTK_Kjn3joFNQOX7Hm_JRbqE=.44a6c8ee-fad6-4aa6-8e95-cd8d23beab60@github.com>
 <lBpTKXccIP3OSowW43K_kl_a_yVs7FoGT_05fJGxDv8=.f720318f-9a1c-497c-9821-866ed35104c4@github.com>
Message-ID: <99QfaesSJzBLGXsBKOdiSwjAdt18pwNMh62Pyhr-6bk=.b27f001b-e3e3-4826-9542-698eef2a9ee3@github.com>

On Fri, 13 Sep 2024 07:54:30 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/share/cds/filemap.cpp line 2457:
>> 
>>> 2455:                           compressed_oops(), compressed_class_pointers());
>>> 2456:   if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) {
>>> 2457:     log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is "
>> 
>> The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary).
>
> This comment has been marked as "resolved" without any apparent action being taken, is that intentional?

I have merged your patch locally but forgot to push it. Sorry.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758407575

From tschatzl at openjdk.org  Fri Sep 13 08:34:09 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 13 Sep 2024 08:34:09 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
Message-ID: <d0ktMAy479FMoekk7SIPAUSN-gJ7XBQVeYlv-SLWUdg=.09f66056-999c-469c-b63c-38eda72fb8cb@github.com>

On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust division following suggestion by xmas

Lgtm but see the additional comment.

src/hotspot/share/gc/z/zDirector.cpp line 490:

> 488: 
> 489:   // Calculate the GC cost for each reclaimed byte
> 490:   const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);

Could this division have the same issue?

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2302481070
PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1758431387

From aboldtch at openjdk.org  Fri Sep 13 09:03:09 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 13 Sep 2024 09:03:09 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <d0ktMAy479FMoekk7SIPAUSN-gJ7XBQVeYlv-SLWUdg=.09f66056-999c-469c-b63c-38eda72fb8cb@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
 <d0ktMAy479FMoekk7SIPAUSN-gJ7XBQVeYlv-SLWUdg=.09f66056-999c-469c-b63c-38eda72fb8cb@github.com>
Message-ID: <S7UCGPx836yVT3PtgcKKMGsuei0jbHbWeruSpYU1sXI=.bbd22098-9bef-4c62-9ab9-55934a32d849@github.com>

On Fri, 13 Sep 2024 08:31:43 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Adjust division following suggestion by xmas
>
> src/hotspot/share/gc/z/zDirector.cpp line 490:
> 
>> 488: 
>> 489:   // Calculate the GC cost for each reclaimed byte
>> 490:   const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);
> 
> Could this division have the same issue?

Yes, it could if no memory has been reclaimed at all (since the VM started). Similar issues would occur in the call to `calculate_extra_young_gc_time` below. And there I think the problem is even worse, because we might end up with `inf - inf == -nan`.
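
As an illustration of the arithmetic concern, here is a minimal standalone sketch (not the `zDirector.cpp` code; the function names are hypothetical). It models the zero-reclaimed case as +infinity explicitly, rather than dividing by zero, and shows that subtracting two such costs gives NaN under IEEE 754 doubles.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for a "GC time per byte freed" computation.
// The zero-reclaimed case is modeled as +infinity instead of actually
// dividing by zero, which is what UBSan reports.
double cost_per_byte(double gc_time, double reclaimed_bytes) {
  if (reclaimed_bytes == 0.0) {
    return std::numeric_limits<double>::infinity();
  }
  return gc_time / reclaimed_bytes;
}

// Difference of two per-byte costs. If both are infinite, the result
// is NaN, because inf - inf has no defined value in IEEE 754.
double cost_delta(double a, double b) {
  return a - b;
}
```

Every ordered comparison against NaN is false, so a heuristic that compares such a delta against a threshold would take the same branch regardless of the threshold.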

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1758476392

From aboldtch at openjdk.org  Fri Sep 13 09:19:04 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 13 Sep 2024 09:19:04 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <S7UCGPx836yVT3PtgcKKMGsuei0jbHbWeruSpYU1sXI=.bbd22098-9bef-4c62-9ab9-55934a32d849@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
 <d0ktMAy479FMoekk7SIPAUSN-gJ7XBQVeYlv-SLWUdg=.09f66056-999c-469c-b63c-38eda72fb8cb@github.com>
 <S7UCGPx836yVT3PtgcKKMGsuei0jbHbWeruSpYU1sXI=.bbd22098-9bef-4c62-9ab9-55934a32d849@github.com>
Message-ID: <FpCekVy5EJUJLfWeO-gUFgT6qiKXyWxKYiAiYQHBlAo=.9bb50256-a0d2-4d85-9282-709e7f0703c3@github.com>

On Fri, 13 Sep 2024 09:00:00 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> src/hotspot/share/gc/z/zDirector.cpp line 490:
>> 
>>> 488: 
>>> 489:   // Calculate the GC cost for each reclaimed byte
>>> 490:   const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);
>> 
>> Could this division have the same issue?
>
> Yes, it could if no memory has been reclaimed at all (since the VM started). Similar issues would occur in the call to `calculate_extra_young_gc_time` below. And there I think the problem is even worse, because we might end up with `inf - inf == -nan`.

The case where we have performed a major collection and no young collection has reclaimed any memory seems like a very degenerate situation. The solution is probably to handle that case separately, and not try to adapt the current heuristics to handle the extreme values.
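
One possible shape for handling the case separately, as a minimal standalone sketch only: the helper name and signature are hypothetical and this is not a proposed patch. The idea is that the caller learns that no per-byte cost exists and can pick its own policy, instead of feeding `inf` or `nan` into the later arithmetic.

```cpp
#include <cassert>

// Hypothetical helper: computes the GC time per reclaimed byte.
// Returns false and leaves *result untouched when nothing has been
// reclaimed, so the degenerate case must be handled by the caller.
bool try_gc_time_per_byte_freed(double gc_time, double reclaimed_bytes, double* result) {
  if (reclaimed_bytes <= 0.0) {
    return false;  // Degenerate: no young collection has reclaimed memory
  }
  *result = gc_time / reclaimed_bytes;
  return true;
}
```

A caller would check the return value first and skip (or special-case) the rule when it is false.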

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1758502503

From stuefe at openjdk.org  Fri Sep 13 09:30:24 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 13 Sep 2024 09:30:24 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <Kr9eGGeMs3LB-ULkPBBkaBBvURU3P7NJItuUR3Gmpmo=.d93bed80-2c06-414d-b8bd-b8f57a4deee3@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
 <sUnwVBIPq6l_4sgw5hOnjU4cP_RL4AM7f9RCFIOK6lc=.cb3e7f4b-e862-48f8-a308-e762ea272294@github.com>
 <Kr9eGGeMs3LB-ULkPBBkaBBvURU3P7NJItuUR3Gmpmo=.d93bed80-2c06-414d-b8bd-b8f57a4deee3@github.com>
Message-ID: <GMDowPeimWiK9fbaAVZsoPsphenfwWpSkMwR2pE0mNw=.818fca12-9537-4ed6-af3b-22ea9c2b5122@github.com>

On Tue, 10 Sep 2024 12:19:32 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> This is tricky. We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. 
>> 
>> Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too.
>
> Ok, in this case, that's fine if we already asserted.  A fatal error is better.

Actually, a lot of the old code had dusty side corners that were UB. Making narrowKlass smaller than 32 bits exposed a lot of them, and a lot of the changes in and around CompressedKlassPointers are about cleanly making explicit what before had been implicit or just broken (e.g. a clear distinction between encoding range and Klass range, and a clear handling of narrowKlass bit width as a runtime value).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758522844

From stuefe at openjdk.org  Fri Sep 13 09:38:15 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 13 Sep 2024 09:38:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <Qmq5KYm4QolMsEt29X8lhx1oWSkBEApS-3iaIXd3L7o=.42bba1bc-d795-4712-a1b6-dab2781a60f7@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <vt3oD_XW4eFNTnPYoSa0-TZiEsFtBXJKfPdsIoqVuXM=.7e14c4df-4c66-4f52-9a59-e86fbcb6e7d9@github.com>
 <Qmq5KYm4QolMsEt29X8lhx1oWSkBEApS-3iaIXd3L7o=.42bba1bc-d795-4712-a1b6-dab2781a60f7@github.com>
Message-ID: <JV_wg0xsmPrv_le1MBjFUxg916tpI5jdP46iDhXGua4=.2b073e9a-0ae7-4031-abdf-f937ee8cd2b1@github.com>

On Tue, 10 Sep 2024 12:13:58 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/oops/compressedKlass.cpp line 243:
>> 
>>> 241:   } else {
>>> 242: 
>>> 243:     // In legacy mode, we try, in order of preference:
>> 
>> Can you not use the word 'legacy' here?  Maybe in "non-compact object header mode"...
>
> okay.

I removed all traces of "legacy" and "tiny", reverting to "standard" or "non-coh" vs "coh". I would prefer to use the shorthand "coh" in some places since "compact object header mode" is a mouthful and gives me RSI :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758533732

From stefank at openjdk.org  Fri Sep 13 09:44:19 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Sep 2024 09:44:19 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v14]
In-Reply-To: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com>
Message-ID: <tiMTCfGoBUlu_Sm2daFxgrO5QLzVXU_jWolWV_Vk7nk=.a9e0ba6e-b436-454d-9a76-173cd1e83639@github.com>

On Fri, 13 Sep 2024 08:21:54 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Hide log timestamps in test to prevent false failures

I went over the oops/ directory and added a few cleanup requests and comments.

src/hotspot/share/oops/instanceOop.hpp line 43:

> 41:     } else {
> 42:       return sizeof(instanceOopDesc);
> 43:     }

This entire function can be removed. It returns the same value as oopDesc::base_offset_in_bytes(), but in a slightly different way.

src/hotspot/share/oops/markWord.hpp line 171:

> 169:     return mask_bits(value(), lock_mask_in_place | self_fwd_mask_in_place) >= static_cast<intptr_t>(marked_value);
> 170:   }
> 171: 

Suggestion to retain code layout.
Suggestion:

src/hotspot/share/oops/markWord.inline.hpp line 29:

> 27: 
> 28: #include "oops/markWord.hpp"
> 29: #include "oops/compressedOops.inline.hpp"

Suggestion:

#include "oops/compressedOops.inline.hpp"
#include "oops/markWord.hpp"

src/hotspot/share/oops/objArrayKlass.cpp line 146:

> 144: 
> 145: size_t ObjArrayKlass::oop_size(oop obj) const {
> 146:   // In this assert, we cannot safely access the Klass* with compact headers.

I would like a comment stating that this assert is turned off because size_given_klass calls oop_size on an object that might be concurrently forwarded.

src/hotspot/share/oops/oop.cpp line 158:

> 156:   // Only has a klass gap when compressed class pointers are used and not
> 157:   // using compact headers.
> 158:   return UseCompressedClassPointers && !UseCompactObjectHeaders;

This comment can just be removed.

src/hotspot/share/oops/oop.hpp line 340:

> 338:       // field offset. Use an offset halfway into the markWord, as the markWord is never
> 339:       // partially loaded from C2.
> 340:       return 4;

I asked around to see what people felt about dropping references to mark_offset_in_bytes(), which we know is 0. There was a request to strive to use mark_offset_in_bytes() for clarity.
Suggestion:

      return mark_offset_in_bytes() + 4;

src/hotspot/share/oops/oop.hpp line 349:

> 347:   static int klass_gap_offset_in_bytes() {
> 348:     assert(has_klass_gap(), "only applicable to compressed klass pointers");
> 349:     assert(!UseCompactObjectHeaders, "don't use klass_gap_offset_in_bytes() with compact headers");

This assert is implied by `has_klass_gap()`. I don't see the need to repeat it here.

src/hotspot/share/oops/oop.hpp line 363:

> 361:       return sizeof(markWord) + sizeof(Klass*);
> 362:     }
> 363:   }

Not a strong request for this PR, but there are many places that calculate almost the same thing, and it might be good to limit the number of places we do similar calculations.

I'm wondering if it wouldn't be better for readability to structure the code as follows:

  static int header_size_in_bytes() {
    if (UseCompactObjectHeaders) {
      return sizeof(markWord);
    } else if (UseCompressedClassPointers) {
      return sizeof(markWord) + sizeof(narrowKlass);
    } else {
      return sizeof(markWord) + sizeof(Klass*);
    }
  }

  // Size of object header, aligned to platform wordSize
  static int header_size() {
    return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize;
  }
...  
  static int base_offset_in_bytes() {
    return header_size_in_bytes();
  }
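
To make the three cases concrete, here is a minimal standalone sketch of the structure suggested above. It is not HotSpot code: the VM flags are passed as plain parameters, and the sizes of `markWord`, `narrowKlass`, `Klass*` and `HeapWordSize` are assumed to be 8, 4, 8 and 8 bytes respectively (a typical 64-bit build). All `k...` names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes on a 64-bit platform (stand-ins, not the HotSpot types).
constexpr std::size_t kMarkWordSize    = 8;  // sizeof(markWord)
constexpr std::size_t kNarrowKlassSize = 4;  // sizeof(narrowKlass)
constexpr std::size_t kKlassPtrSize    = 8;  // sizeof(Klass*)
constexpr std::size_t kHeapWordSize    = 8;  // HeapWordSize

// Round value up to a multiple of alignment (alignment must be a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Mirrors the suggested header_size_in_bytes(), with the flags passed explicitly.
constexpr std::size_t header_size_in_bytes(bool compact_headers,
                                           bool compressed_class_pointers) {
  if (compact_headers) {
    return kMarkWordSize;
  } else if (compressed_class_pointers) {
    return kMarkWordSize + kNarrowKlassSize;
  } else {
    return kMarkWordSize + kKlassPtrSize;
  }
}

// Header size in heap words, rounded up to a whole word.
constexpr std::size_t header_size(bool compact_headers,
                                  bool compressed_class_pointers) {
  return align_up(header_size_in_bytes(compact_headers, compressed_class_pointers),
                  kHeapWordSize) / kHeapWordSize;
}
```

Under these assumptions the header is 8, 12 or 16 bytes, which rounds up to 1, 2 or 2 heap words.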

src/hotspot/share/oops/oop.inline.hpp line 161:

> 159: 
> 160: void oopDesc::set_klass_gap(HeapWord* mem, int v) {
> 161:   assert(!UseCompactObjectHeaders, "don't set Klass* gap with compact headers");

We might want to consider just simplifying the function to: 

void oopDesc::set_klass_gap(HeapWord* mem, int v) {
  assert(has_klass_gap(), "precondition");
  *(int*)(((char*)mem) + klass_gap_offset_in_bytes()) = v;
}

src/hotspot/share/oops/oop.inline.hpp line 295:

> 293: // Used by scavengers
> 294: void oopDesc::forward_to(oop p) {
> 295:   assert(cast_from_oop<oopDesc*>(p) != this,

Do we really need the cast here?

-------------

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2302542279
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758503206
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758482703
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758505713
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758479437
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758478106
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758472909
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758474349
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758528515
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758538380
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758540055

From stefank at openjdk.org  Fri Sep 13 09:44:20 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Sep 2024 09:44:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
Message-ID: <ZDRwCJRCHrlH268CzJ4P7VDJXnGar40Fl4t_W28L99Y=.c9ad4810-4040-494b-8d54-96160fcd6046@github.com>

On Mon, 9 Sep 2024 12:17:17 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Try to avoid lea in loadNklass (aarch64)
>>  - Fix release build error
>
> src/hotspot/share/oops/oop.cpp line 230:
> 
>> 228:   // disjunct below to fail if the two comparands are computed across such
>> 229:   // a concurrent change.
>> 230:   return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC);
> 
> Is this still true after the recent changes like JDK-8311163? It might be worth waiting for.

That bug doesn't fix all cases where the length field is modified.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758477168

From tschatzl at openjdk.org  Fri Sep 13 11:15:15 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 13 Sep 2024 11:15:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <ZDRwCJRCHrlH268CzJ4P7VDJXnGar40Fl4t_W28L99Y=.c9ad4810-4040-494b-8d54-96160fcd6046@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <ZDRwCJRCHrlH268CzJ4P7VDJXnGar40Fl4t_W28L99Y=.c9ad4810-4040-494b-8d54-96160fcd6046@github.com>
Message-ID: <Iv-aWPSvR4eZY9bek0msxpkDVQnvT9XyGcXEAhJ6UWo=.6e03e898-cba2-444f-b075-5b5fae3059c5@github.com>

On Fri, 13 Sep 2024 09:00:32 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> src/hotspot/share/oops/oop.cpp line 230:
>> 
>>> 228:   // disjunct below to fail if the two comparands are computed across such
>>> 229:   // a concurrent change.
>>> 230:   return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC);
>> 
>> Is this still true after the recent changes like JDK-8311163? It might be worth waiting for.
>
> That bug doesn't fix all cases where the length field is modified.

Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163.

The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here.

If I am not missing some case, this whole method is unnecessary now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758672296

From stuefe at openjdk.org  Fri Sep 13 12:51:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 13 Sep 2024 12:51:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v14]
In-Reply-To: <EZmLU9kQ--5-_Xi99YZG9HsKibIwnot3VcSI50gP6SY=.058873e4-dc41-460f-a5e4-3887d9be9d48@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <EZmLU9kQ--5-_Xi99YZG9HsKibIwnot3VcSI50gP6SY=.058873e4-dc41-460f-a5e4-3887d9be9d48@github.com>
Message-ID: <PNLPNSaq1-U9bu683HGzuoBAyEcmJ4yTVE5CIYVIfFA=.b61fd250-10fa-4299-bb36-c52dd3bbd117@github.com>

On Tue, 10 Sep 2024 12:35:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/oops/compressedKlass.cpp line 116:
>> 
>>> 114:   _range = end - _base;
>>> 115: 
>>> 116:   DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);)
>> 
>> Can you refactor so the aarch64 path runs this same code without duplication?
>
> In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) Aarch64 needs its own handling.

I refactored: now we should have no duplication (once my patch hits Roman's PR branch).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758800913

From stefank at openjdk.org  Fri Sep 13 12:51:18 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Sep 2024 12:51:18 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <Iv-aWPSvR4eZY9bek0msxpkDVQnvT9XyGcXEAhJ6UWo=.6e03e898-cba2-444f-b075-5b5fae3059c5@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <ZDRwCJRCHrlH268CzJ4P7VDJXnGar40Fl4t_W28L99Y=.c9ad4810-4040-494b-8d54-96160fcd6046@github.com>
 <Iv-aWPSvR4eZY9bek0msxpkDVQnvT9XyGcXEAhJ6UWo=.6e03e898-cba2-444f-b075-5b5fae3059c5@github.com>
Message-ID: <TSrOBVP-x_1zWFqLynDcmtPSgONPen7W5HZ7bwD4_D4=.406eff89-e0da-40a1-998e-7a77c895fdd9@github.com>

On Fri, 13 Sep 2024 11:10:58 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> That bug doesn't fix all cases where the length field is modified.
>
> Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163.
> 
> The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here.
> 
> If I am not missing some case, this whole method is unnecessary now.

If you've already fixed this for GC then I agree that we could remove this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758805418

From stefank at openjdk.org  Fri Sep 13 12:51:18 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Sep 2024 12:51:18 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <TSrOBVP-x_1zWFqLynDcmtPSgONPen7W5HZ7bwD4_D4=.406eff89-e0da-40a1-998e-7a77c895fdd9@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <ZDRwCJRCHrlH268CzJ4P7VDJXnGar40Fl4t_W28L99Y=.c9ad4810-4040-494b-8d54-96160fcd6046@github.com>
 <Iv-aWPSvR4eZY9bek0msxpkDVQnvT9XyGcXEAhJ6UWo=.6e03e898-cba2-444f-b075-5b5fae3059c5@github.com>
 <TSrOBVP-x_1zWFqLynDcmtPSgONPen7W5HZ7bwD4_D4=.406eff89-e0da-40a1-998e-7a77c895fdd9@github.com>
Message-ID: <v0VLG-aoo8NDw4RxMWa66ktvblMg741lPUUEle69V60=.82823ab0-b6f1-4a81-a29b-58be35fca1a0@github.com>

On Fri, 13 Sep 2024 12:47:09 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163.
>> 
>> The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here.
>> 
>> If I am not missing some case, this whole method is unnecessary now.
>
> If you've already fixed this for GC then I agree that we could remove this.

This seems like something that should be done as a separate patch that gets pushed before this PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758808115

From rkennke at openjdk.org  Fri Sep 13 12:56:17 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 13 Sep 2024 12:56:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v14]
In-Reply-To: <tiMTCfGoBUlu_Sm2daFxgrO5QLzVXU_jWolWV_Vk7nk=.a9e0ba6e-b436-454d-9a76-173cd1e83639@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com>
 <tiMTCfGoBUlu_Sm2daFxgrO5QLzVXU_jWolWV_Vk7nk=.a9e0ba6e-b436-454d-9a76-173cd1e83639@github.com>
Message-ID: <QHJWe_7rjgKSw78ZIDsmkLL5gZJLzUT2M_lJE2WSSLo=.678f9de1-0cdb-46d9-8dbe-9362aeb494fe@github.com>

On Fri, 13 Sep 2024 09:39:23 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Hide log timestamps in test to prevent false failures
>
> src/hotspot/share/oops/oop.inline.hpp line 295:
> 
>> 293: // Used by scavengers
>> 294: void oopDesc::forward_to(oop p) {
>> 295:   assert(cast_from_oop<oopDesc*>(p) != this,
> 
> Do we really need the cast here?

Yes, otherwise the compiler complains about an ambiguous != operator.
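
For readers wondering where the ambiguity comes from, here is a minimal standalone sketch. `ObjDesc`, `Ref`, `unwrap` and `differs` are hypothetical stand-ins, assumed to resemble a checked `oop` wrapper that converts implicitly both to and from the raw pointer type and also defines its own comparison operator. It is not the actual HotSpot class.

```cpp
#include <cassert>

struct ObjDesc {};  // hypothetical stand-in for oopDesc

// Hypothetical stand-in for a checked 'oop' wrapper class: it converts
// implicitly to and from ObjDesc* and also defines its own comparison.
class Ref {
  ObjDesc* _o;
 public:
  Ref(ObjDesc* o) : _o(o) {}                // implicit ObjDesc* -> Ref
  operator ObjDesc*() const { return _o; }  // implicit Ref -> ObjDesc*
  bool operator!=(const Ref& other) const { return _o != other._o; }
};

// Explicit unwrapping, playing the role of cast_from_oop<oopDesc*>(p).
inline ObjDesc* unwrap(const Ref& r) { return static_cast<ObjDesc*>(r); }

inline bool differs(const Ref& p, ObjDesc* self) {
  // return p != self;
  //   Expected to be rejected as ambiguous: the compiler can either convert
  //   'self' to Ref and call Ref::operator!=, or convert 'p' to ObjDesc* and
  //   use the built-in pointer comparison, and neither choice is better.
  return unwrap(p) != self;  // both operands are plain pointers: unambiguous
}
```

With the explicit unwrap, only the built-in pointer comparison is viable, so overload resolution has nothing to choose between.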

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758815451

From rkennke at openjdk.org  Fri Sep 13 13:03:16 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 13 Sep 2024 13:03:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v14]
In-Reply-To: <tiMTCfGoBUlu_Sm2daFxgrO5QLzVXU_jWolWV_Vk7nk=.a9e0ba6e-b436-454d-9a76-173cd1e83639@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com>
 <tiMTCfGoBUlu_Sm2daFxgrO5QLzVXU_jWolWV_Vk7nk=.a9e0ba6e-b436-454d-9a76-173cd1e83639@github.com>
Message-ID: <aww1izsoDFzMbpM1_tYL5WqHRfFKqP_q36aCus6Y5CQ=.4289de67-b7c8-4cf1-b897-ebc3eff67224@github.com>

On Fri, 13 Sep 2024 09:31:39 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Hide log timestamps in test to prevent false failures
>
> src/hotspot/share/oops/oop.hpp line 363:
> 
>> 361:       return sizeof(markWord) + sizeof(Klass*);
>> 362:     }
>> 363:   }
> 
> Not a strong request for this PR, but there are many places that calculate almost the same thing, and it might be good to limit the number of places where we do similar calculations.
> 
> I'm wondering if it wouldn't be better for readability to structure the code as follows:
> 
>   static int header_size_in_bytes() {
>     if (UseCompactObjectHeaders) {
>       return sizeof(markWord);
>     } else if (UseCompressedClassPointers) {
>       return sizeof(markWord) + sizeof(narrowKlass);
>     } else {
>       return sizeof(markWord) + sizeof(Klass*);
>     }
>   }
> 
>   // Size of object header, aligned to platform wordSize
>   static int header_size() {
>     return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize;
>   }
> ...  
>   static int base_offset_in_bytes() {
>     return header_size_in_bytes();
>   }

Ok. I filed: https://bugs.openjdk.org/browse/JDK-8340118 for now, let's see if I can sort this out before integrating this PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758825458

From rkennke at openjdk.org  Fri Sep 13 13:11:45 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 13 Sep 2024 13:11:45 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
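
As a side note on the "40 bits, up to 8TB" figure in the description above: the numbers are consistent with storing the forwarding target as an offset in 8-byte heap words from the heap base. That granularity is an assumption made here to explain the arithmetic, and the functions below are a hypothetical sketch rather than the encoding used in the PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kForwardingBits = 40;  // from the description above
constexpr uint64_t kWordSize = 8;         // assumed 8-byte heap words

// Largest heap (in bytes) whose word offsets fit in kForwardingBits bits.
constexpr uint64_t max_encodable_heap_bytes() {
  return (uint64_t{1} << kForwardingBits) * kWordSize;
}

// Encode an address as a word offset from the heap base (illustrative only).
constexpr uint64_t encode_forwarding(uint64_t heap_base, uint64_t addr) {
  return (addr - heap_base) / kWordSize;
}

// Inverse of encode_forwarding() for word-aligned addresses.
constexpr uint64_t decode_forwarding(uint64_t heap_base, uint64_t encoded) {
  return heap_base + encoded * kWordSize;
}
```

Under that assumption, 2^40 word offsets times 8 bytes per word gives 2^43 bytes, that is, 8 TB.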

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Various touch-ups

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/69f1ef1d..990926f5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13-14

  Stats: 25 lines in 8 files changed: 3 ins; 17 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From stefank at openjdk.org  Fri Sep 13 13:18:16 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Sep 2024 13:18:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v14]
In-Reply-To: <QHJWe_7rjgKSw78ZIDsmkLL5gZJLzUT2M_lJE2WSSLo=.678f9de1-0cdb-46d9-8dbe-9362aeb494fe@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com>
 <tiMTCfGoBUlu_Sm2daFxgrO5QLzVXU_jWolWV_Vk7nk=.a9e0ba6e-b436-454d-9a76-173cd1e83639@github.com>
 <QHJWe_7rjgKSw78ZIDsmkLL5gZJLzUT2M_lJE2WSSLo=.678f9de1-0cdb-46d9-8dbe-9362aeb494fe@github.com>
Message-ID: <FVXBfTUdHro_ESmy-NA-ZfkOetfGUrgoEFGysqI5a6k=.27b7179b-14e5-4500-943c-8b160eb5da5b@github.com>

On Fri, 13 Sep 2024 12:53:29 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/share/oops/oop.inline.hpp line 295:
>> 
>>> 293: // Used by scavengers
>>> 294: void oopDesc::forward_to(oop p) {
>>> 295:   assert(cast_from_oop<oopDesc*>(p) != this,
>> 
>> Do we really need the cast here?
>
> Yes, otherwise the compiler complains about an ambiguous != operator.

OK, we shouldn't need to. It seems like I can silence the compiler by tweaking oopsHierarchy.hpp. I'll deal with that as a follow-up.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758853099

From tschatzl at openjdk.org  Fri Sep 13 13:51:16 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 13 Sep 2024 13:51:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v8]
In-Reply-To: <v0VLG-aoo8NDw4RxMWa66ktvblMg741lPUUEle69V60=.82823ab0-b6f1-4a81-a29b-58be35fca1a0@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <J0g0PSye768dhN3JoXFsZ6FivocNBLPmHNtRUCOmLTw=.5756c64a-4763-4727-9637-ba3fe7b9b17d@github.com>
 <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
 <ZDRwCJRCHrlH268CzJ4P7VDJXnGar40Fl4t_W28L99Y=.c9ad4810-4040-494b-8d54-96160fcd6046@github.com>
 <Iv-aWPSvR4eZY9bek0msxpkDVQnvT9XyGcXEAhJ6UWo=.6e03e898-cba2-444f-b075-5b5fae3059c5@github.com>
 <TSrOBVP-x_1zWFqLynDcmtPSgONPen7W5HZ7bwD4_D4=.406eff89-e0da-40a1-998e-7a77c895fdd9@github.com>
 <v0VLG-aoo8NDw4RxMWa66ktvblMg741lPUUEle69V60=.82823ab0-b6f1-4a81-a29b-58be35fca1a0@github.com>
Message-ID: <HJO2ejuvN1xEutkOZUct_W9ayVEFqSsUt91S1lybKOo=.60019b1c-a1f4-4ce3-b4db-58f0e58e106a@github.com>

On Fri, 13 Sep 2024 12:48:53 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> If you've already fixed this for GC then I agree that we could remove this.
>
> This seems like something that should be done as a separate patch that gets pushed before this PR.

Will remove in JDK-8340119.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758906485

From kvn at openjdk.org  Fri Sep 13 22:12:13 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 13 Sep 2024 22:12:13 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v17]
In-Reply-To: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <DhU_l-0fczpPrZ8h-99waBbHcEOcQs-BmRM8_uIdPQ4=.6e5de896-d87c-4a9d-8c77-b155b29618a1@github.com>
 <V1ue84GBUJsrKD6LtJtHAJpnXiD_Vs_mlJoAOfDkKFw=.3360375c-5367-41c9-b3f8-0d1c6ae98513@github.com>
 <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com>
Message-ID: <-lhMoCYQAGXWEAQ2ySemYzUh_DjKgqi4pG10NdrHils=.b2bc294a-941d-42aa-a00f-149d9260dfeb@github.com>

On Mon, 9 Sep 2024 14:41:25 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112:
>> 
>>> 110:           // The answer is that stores of different sizes can co-exist
>>> 111:           // in the same sequence of RawMem effects.  We sometimes initialize
>>> 112:           // a whole 'tile' of array elements with a single jint or jlong.)
>> 
>> I'm having trouble making sense of this comment.  I guess a jlong could be used to null-initialize two
>> 32bit oops/narrowOops?  But that doesn't have anything to do with jints.
>
> I am not sure the complex overlap test is necessary here. This code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called. The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest leaving it as-is and possibly investigating how to simplify it as a follow-up task.

Yes, the comment refers to combined initialization stores: [memnode.cpp#L4925](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L4925)
which are used only for constant stores of primitive types (integers and floats).
There was also a recent change by Emanuel to combine stores into primitive arrays: [JDK-8335390](https://bugs.openjdk.org/browse/JDK-8335390)

None of the above do anything to oop stores. I agree that this code could be left for now and optimized later.
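
For context, the "tile" idea from the quoted source comment can be sketched outside of C2 as follows: two adjacent 32-bit constant stores are replaced by one 64-bit store covering both slots. This is only an illustration of the concept in plain C++. `make_tile` and `store_tile` are hypothetical names, and nothing here reflects how C2 represents or merges the stores internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Pack two adjacent 32-bit constants into one 64-bit 'tile' in memory order,
// so the result is independent of the platform's endianness.
inline uint64_t make_tile(int32_t first, int32_t second) {
  int32_t pair[2] = {first, second};
  uint64_t tile;
  std::memcpy(&tile, pair, sizeof(tile));
  return tile;
}

// Initialize elements [index, index + 1] with a single 64-bit store.
inline void store_tile(int32_t* array, std::size_t index, uint64_t tile) {
  std::memcpy(array + index, &tile, sizeof(tile));
}
```

The same trick does not carry over to oop stores, which matches the point made above that none of these optimizations touch them.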

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759565105

From wkemper at openjdk.org  Fri Sep 13 22:57:17 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 13 Sep 2024 22:57:17 GMT
Subject: RFR: 8339960: Shenandoah: Fix inconsistencies in generational
 Shenandoah behaviors
In-Reply-To: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
References: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
Message-ID: <FpbbX-7rc4IUjBybSWLEsbalDIINOIULk26piRxX8J4=.dbb3df5a-cf37-4cdc-83b0-b3c4af0e3679@github.com>

On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> This fixes some bugs found in recent code review and playback of an assertion failure.
> 
> See also https://github.com/openjdk/shenandoah/pull/497

Looks good to me.

-------------

Marked as reviewed by wkemper (Committer).

PR Review: https://git.openjdk.org/jdk/pull/20974#pullrequestreview-2304276183

From kvn at openjdk.org  Fri Sep 13 23:23:19 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 13 Sep 2024 23:23:19 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
Message-ID: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>

On Wed, 11 Sep 2024 08:30:02 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix a few style issues

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241:

> 239:   assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP");
> 240:   TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr();
> 241:   uint8_t barrier_data = store->barrier_data();

Should you check barrier data for 0?
`is_ptr()` covers a wide set of types. It includes TypeRawPtr, TypeKlassPtr and TypeMetadataPtr. Where are you filtering them?

src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65:

> 63: #else
> 64:                       make_barrier_set_c2<G1BarrierSetC2>(),
> 65: #endif

I assume this is temporary until all ports are ready (except maybe 32-bit x86). Right?

src/hotspot/share/opto/matcher.cpp line 1821:

> 1819:   if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) {
> 1820:     assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf),
> 1821:            "duplicating node that's already been matched");

Why was it removed?

src/hotspot/share/opto/matcher.cpp line 2845:

> 2843:     n->Opcode() == Op_StoreN &&
> 2844:     m->is_EncodeP();
> 2845: }

Add a comment that `m` is an input of `n`. I thought about adding an assert too, but I will leave it to you.

src/hotspot/share/opto/output.cpp line 2026:

> 2024:     if (n->is_MachNullCheck()) {
> 2025:       assert(n->in(1)->as_Mach()->barrier_data() == 0,
> 2026:              "Implicit null checks on memory accesses with barriers are not yet supported");

I don't see any changes to `lcm.cpp` here that would prevent it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604325
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604944
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593453
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593131
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759605704

From stuefe at openjdk.org  Sun Sep 15 06:17:14 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 15 Sep 2024 06:17:14 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v13]
In-Reply-To: <epFrYZqTWD77IySDqJ9fwFiXZKZ_NCuzA-lfPi6TU_4=.d9c5c483-9a6a-4113-9856-cddcae962f33@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
 <epFrYZqTWD77IySDqJ9fwFiXZKZ_NCuzA-lfPi6TU_4=.d9c5c483-9a6a-4113-9856-cddcae962f33@github.com>
Message-ID: <Mti73HI8rw7uN-bgm4CxqMvC714P2vtnQ2prQDQ-HXo=.064603a1-fae7-49d5-bd33-f905ac33a19c@github.com>

On Wed, 11 Sep 2024 21:15:21 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert accidental change of UCOH default
>
> I was starting to understand the concerns with having prototype_header in Klass.  It seems like it would simplify encoding the klass for object allocation.  My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this.  You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor.
> 
>      diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
>      index fd198f54fc9..7aa4bd24948 100644
>      --- a/src/hotspot/share/oops/instanceKlass.cpp
>      +++ b/src/hotspot/share/oops/instanceKlass.cpp
>     @@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() {
>      }
>      
>      InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) :
>     -  Klass(kind),
>     +  Klass(kind, (!parser.is_interface() && !parser.is_abstract())),
>        _nest_members(nullptr),
>        _nest_host(nullptr),
>        _permitted_subclasses(nullptr),

@coleenp 

I solved this differently (Roman will merge this into his PR).


static markWord make_prototype(const Klass* kls) {
  markWord prototype = markWord::prototype();
#ifdef _LP64
  if (UseCompactObjectHeaders) {
    // With compact object headers, the narrow Klass ID is part of the mark word.
    // We therefore seed the mark word with the narrow Klass ID.
    // Note that only those Klass that can be instantiated have a narrow Klass ID.
    // For those who don't, we leave the klass bits empty and assert if someone
    // tries to use those.
    const narrowKlass nk = CompressedKlassPointers::is_encodable(kls) ?
        CompressedKlassPointers::encode(const_cast<Klass*>(kls)) : 0;
    prototype = prototype.set_narrow_klass(nk);
  }
#endif
  return prototype;
}

inline bool CompressedKlassPointers::is_encodable(const void* address) {
  check_init(_base);
  // An address can only be encoded if:
  //
  // 1) the address lies within the klass range.
  // 2) It is suitably aligned to 2^encoding_shift. This only really matters for
  //    +UseCompactObjectHeaders, since the encoding shift can be large (max 10 bits -> 1KB).
  return is_aligned(address, klass_alignment_in_bytes()) &&
      address >= _klass_range_start && address < _klass_range_end;
}


So, we put an nKlass into the prototype if we can. We can if the Klass address is encodable. It is encodable if it lives in the encoded Klass range and is correctly aligned. There is no need to pass this information via another channel: it's right there, in the Klass address. This works even before Klass is initialized.
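
A minimal, standalone sketch of the same two-part check, for readers who want to try it outside HotSpot. The `KlassRange` struct and its field names are hypothetical stand-ins for illustration; they are not the actual `CompressedKlassPointers` state.

```cpp
#include <cstdint>

// Hypothetical stand-in for the encoding range; not HotSpot's real data layout.
struct KlassRange {
  std::uintptr_t start;  // first address of the klass range (inclusive)
  std::uintptr_t end;    // one past the last address (exclusive)
  unsigned shift;        // encoding shift; required alignment is 2^shift bytes
};

// An address is encodable only if it is aligned to 2^shift
// and lies within [start, end).
inline bool is_encodable(const KlassRange& r, std::uintptr_t addr) {
  const std::uintptr_t alignment = std::uintptr_t{1} << r.shift;
  const bool aligned = (addr & (alignment - 1)) == 0;
  return aligned && addr >= r.start && addr < r.end;
}
```

With a shift of 10, for example, only addresses on 1 KB boundaries inside the range pass the check; everything else would get the empty klass bits.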

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2351399143

From stuefe at openjdk.org  Sun Sep 15 06:17:15 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 15 Sep 2024 06:17:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <rJRoSXcS7-_XJVMfJ59GKXjV68j7MBkmecVLjpnq1Ks=.2d534dd5-d7ec-4f72-af63-8ca00a6a7406@github.com>

On Wed, 11 Sep 2024 11:25:41 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace/metablock.hpp line 51:
> 
>> 49:   size_t word_size() const { return _word_size; }
>> 50:   bool is_empty() const { return _base == nullptr; }
>> 51:   bool is_nonempty() const { return _base != nullptr; }
> 
> Can `_base == nullptr` but `_word_size != 0`?

No

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1759973362

From rcastanedalo at openjdk.org  Mon Sep 16 06:56:18 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 16 Sep 2024 06:56:18 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
Message-ID: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com>

On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Various touch-ups

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576:

> 2574:   } else {
> 2575:     lea(dst, Address(obj, index, Address::lsl(scale)));
> 2576:     ldr(dst, Address(dst, offset));

Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1760617744

From rcastanedalo at openjdk.org  Mon Sep 16 08:07:18 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 16 Sep 2024 08:07:18 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
Message-ID: <mX0ou3n5MzR_tt1nnuLy-cFYj_85cfCK-LHR6KZ-Uwk=.4fca751d-a06a-4c2d-8bb2-eba51d435ef2@github.com>

On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Various touch-ups

> * Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that.

I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.

An alternative that seems promising is to hide the object header klass pointer extraction and make it part of the `LoadNKlass` node semantics, as illustrated in this example:

![alternative-modeling](https://github.com/user-attachments/assets/06243966-3065-4969-a2dd-d05133b36366)

`LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")?
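
As a rough illustration of the "load and shift" expansion for compact headers, here is a standalone sketch. It assumes a 64-bit mark word whose upper 22 bits hold the narrow Klass ID (as stated in the PR description); the constant and function names are hypothetical and do not come from HotSpot.

```cpp
#include <cstdint>

// Assumption: 64-bit mark word with the narrow Klass ID in the upper 22 bits.
constexpr unsigned kMarkWordBits = 64;
constexpr unsigned kKlassBits = 22;
constexpr unsigned kKlassShift = kMarkWordBits - kKlassBits;  // 42

// Compact headers: after loading the mark word, a logical right shift
// leaves only the narrow Klass ID.
inline std::uint32_t narrow_klass_from_mark(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Inverse operation, used here only to build example mark words.
inline std::uint64_t mark_with_narrow_klass(std::uint64_t low_bits,
                                            std::uint32_t nk) {
  const std::uint64_t low_mask = (std::uint64_t{1} << kKlassShift) - 1;
  return (static_cast<std::uint64_t>(nk) << kKlassShift) | (low_bits & low_mask);
}
```

With original headers the expansion would instead be a single load at `klass_offset_in_bytes()`, with no shift.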

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352253326

From lucy at openjdk.org  Mon Sep 16 08:18:07 2024
From: lucy at openjdk.org (Lutz Schmidt)
Date: Mon, 16 Sep 2024 08:18:07 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
Message-ID: <fPSLNr7wPapykkTwLRCujkJ22IKobQdpIuNltMY2I54=.a9d79cb2-1bc5-4eed-b86d-48d5ed485586@github.com>

On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust division following suggestion by xmas

Changes requested by lucy (Reviewer).

src/hotspot/share/gc/z/zDirector.cpp line 491:

> 489:   // Calculate the GC cost for each reclaimed byte
> 490:   const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);
> 491:   const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : double(old_gc_time) / double(reclaimed_per_old_gc);

How about using some parentheses? To my understanding, the division has a higher precedence than the ternary conditional expression. See: https://en.cppreference.com/w/cpp/language/operator_precedence
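
A small sketch of the parenthesized form, assuming the intent is purely readability: since division binds tighter than `?:`, the parentheses do not change how the expression is parsed. The helper name and parameters are hypothetical, not taken from `zDirector.cpp`.

```cpp
#include <limits>

// Hypothetical helper; names are illustrative only.
inline double time_per_byte_freed(double gc_time, double reclaimed_bytes) {
  // The parentheses are redundant for the parser (division has higher
  // precedence than the conditional operator) but make the guarded
  // division explicit for readers.
  return (reclaimed_bytes == 0.0)
             ? std::numeric_limits<double>::infinity()
             : (gc_time / reclaimed_bytes);
}
```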

-------------

PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2306000198
PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1760715673

From rcastanedalo at openjdk.org  Mon Sep 16 08:19:24 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 16 Sep 2024 08:19:24 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID: <F-fkbPBSHGq6IAPtGNj5WmpyzALG6MUfy3t-2_HZAHU=.5335023e-f53c-4db2-aa6a-d8d550944f54@github.com>

On Fri, 13 Sep 2024 23:16:32 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix a few style issues
>
> src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65:
> 
>> 63: #else
>> 64:                       make_barrier_set_c2<G1BarrierSetC2>(),
>> 65: #endif
> 
> I assume it is temporary until all ports are ready (except maybe 32-bit x86). Right?

Right, all code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` will be removed before integration.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760716721

From rcastanedalo at openjdk.org  Mon Sep 16 09:31:13 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 16 Sep 2024 09:31:13 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID: <nzubYA7ESzm-LTxypE-Nx4OSVrK39Mj-wLou2alQxto=.0dfacde3-68f2-47c4-ba54-5dd76f7e91c6@github.com>

On Fri, 13 Sep 2024 23:18:44 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix a few style issues
>
> src/hotspot/share/opto/output.cpp line 2026:
> 
>> 2024:     if (n->is_MachNullCheck()) {
>> 2025:       assert(n->in(1)->as_Mach()->barrier_data() == 0,
>> 2026:              "Implicit null checks on memory accesses with barriers are not yet supported");
> 
> I don't see here changes in `lcm.cpp` which would prevent it.

I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check ([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255).

Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760814745

From shade at openjdk.org  Mon Sep 16 10:40:16 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 16 Sep 2024 10:40:16 GMT
Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in
 is_gc_barrier_node
Message-ID: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>

The name of the call we emit is "shenandoah_clone":
https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806

...yet we test for "shenandoah_clone_barrier" here:
https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688

I think we are better off polling the call target instead of relying on the call name. This change also eliminates the `shenandoah_cas_obj` matcher, for which we have not emitted a call ever since we started doing CAS expansions inline.
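
To illustrate why matching on the call target is more robust than matching on the name, here is a standalone sketch. `CallInfo`, `clone_barrier_stub`, and both predicates are hypothetical stand-ins; they are not the C2 node types or Shenandoah runtime entry points.

```cpp
#include <cstring>

// Hypothetical stand-ins for runtime entry points.
using EntryPoint = void (*)();
void clone_barrier_stub() {}
void other_stub() {}

// Hypothetical stand-in for a call node: a debug name plus the actual target.
struct CallInfo {
  const char* name;
  EntryPoint entry;
};

// Name-based match: silently fails when the emitted name and the tested
// name drift apart ("shenandoah_clone" vs. "shenandoah_clone_barrier").
inline bool is_clone_barrier_by_name(const CallInfo& c) {
  return std::strcmp(c.name, "shenandoah_clone_barrier") == 0;
}

// Target-based match: compares the address the call actually goes to.
inline bool is_clone_barrier_by_target(const CallInfo& c) {
  return c.entry == &clone_barrier_stub;
}
```

A call emitted under the name `"shenandoah_clone"` is missed by the name-based predicate but still recognized by the target-based one.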

Additional testing:
 - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
 - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC`

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/21014/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21014&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340183
  Stats: 20 lines in 3 files changed: 6 ins; 10 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/21014.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21014/head:pull/21014

PR: https://git.openjdk.org/jdk/pull/21014

From shade at openjdk.org  Mon Sep 16 10:44:12 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 16 Sep 2024 10:44:12 GMT
Subject: RFR: 8340186: Shenandoah: Missing
 load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call
Message-ID: <WKF6HOcWBzpASW2-szk8bk_hZ-DN6CGvX9vwkHPM0x0=.7fb65ec8-d648-4b60-ad73-f9c1e1c1b8ce@github.com>

[JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`. It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands.

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/21016/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21016&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340186
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21016/head:pull/21016

PR: https://git.openjdk.org/jdk/pull/21016

From rkennke at openjdk.org  Mon Sep 16 12:38:00 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 16 Sep 2024 12:38:00 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v16]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <BFrQL8ZoZKj-jdrRp_6onv4H7lchv46F6dcQ_vMzYZw=.9bb74940-a2d7-4005-bf25-ae2886c9a119@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
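
The 8TB figure for full GC forwarding in the description above can be checked with a short calculation. This sketch assumes the 40-bit value counts 8-byte units, in the spirit of the compressed-oops-like encoding mentioned there; the constant names are hypothetical.

```cpp
#include <cstdint>

constexpr unsigned kForwardingBits = 40;    // bits available, per the description
constexpr std::uint64_t kHeapWordSize = 8;  // assumption: 8-byte granularity

// Largest heap span addressable with kForwardingBits word-sized offsets.
constexpr std::uint64_t max_encodable_heap_bytes() {
  return (std::uint64_t{1} << kForwardingBits) * kHeapWordSize;
}

constexpr std::uint64_t kTiB = std::uint64_t{1} << 40;
static_assert(max_encodable_heap_bytes() == 8 * kTiB,
              "40 bits of 8-byte words span 8 TiB");
```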

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 53 commits:

 - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
 - Fix loop on aarch64
 - clarify obscure assert in metasapce setup
 - Rework compressedklass encoding
 - remove stray debug output
 - Fixes post 8338526
 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4
 - Various touch-ups
 - Hide log timestamps in test to prevent false failures
 - Revert accidental change of UCOH default
 - ... and 43 more: https://git.openjdk.org/jdk/compare/59778885...49c87547

-------------

Changes: https://git.openjdk.org/jdk/pull/20677/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=15
  Stats: 4605 lines in 190 files changed: 3252 ins; 724 del; 629 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rkennke at openjdk.org  Mon Sep 16 13:28:00 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 16 Sep 2024 13:28:00 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v17]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 54 commits:

 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
 - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
 - Fix loop on aarch64
 - clarify obscure assert in metasapce setup
 - Rework compressedklass encoding
 - remove stray debug output
 - Fixes post 8338526
 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4
 - Various touch-ups
 - Hide log timestamps in test to prevent false failures
 - ... and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81

-------------

Changes: https://git.openjdk.org/jdk/pull/20677/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=16
  Stats: 4598 lines in 190 files changed: 3245 ins; 719 del; 634 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rkennke at openjdk.org  Mon Sep 16 13:31:17 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 16 Sep 2024 13:31:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <hkGaBWSugvpV0PgoR9rnutHCgQzkeXv1ctH7QwV3zK0=.204737d5-fadb-410a-8891-7db59d36830f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>
 <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>
 <hkGaBWSugvpV0PgoR9rnutHCgQzkeXv1ctH7QwV3zK0=.204737d5-fadb-410a-8891-7db59d36830f@github.com>
Message-ID: <cjnJhjaNZpKO7F4Tx0zQEnBuJ_vWFxddjqXrwS35Sx8=.43a14afb-8f60-496b-866b-fac5e6d38ca2@github.com>

On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>>> @rkennke Can you please explain the changes in these tests:
>>> 
>>> ```
>>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>>> ```
>>> 
>>> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
>>> 
>>> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
>>> 
>>> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
>>> 
>>> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
>>> 
>>> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
>> 
>> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
>> 
>> I will re-evaluate those tests, and add comments or remove the restrictions.
>
>> > > @rkennke Can you please explain the changes in these tests:
>> > > ```
>> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>> > > ```
>> > > 
>> > > You added this IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
>> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
>> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
>> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
>> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
>> > 
>> > 
>> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
>> > I will re-evaluate those tests, and add comments or remove the restrictions.
>> 
>> If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ...
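
The offset mismatch discussed above can be illustrated with a small arithmetic model. This is only a sketch under stated assumptions, not HotSpot code: `HeaderLayout`, `align_up` and `array_base_offset` are hypothetical names, the field sizes (8-byte mark word, 4- or 8-byte class pointer, 4-byte length) are assumed for a 64-bit VM, and the alignment rule (elements aligned to their own size, with a 4-byte minimum) is a simplification.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of an array header; not HotSpot code.
struct HeaderLayout {
  std::size_t mark_word_bytes;    // assumed 8 on a 64-bit VM
  std::size_t klass_field_bytes;  // 0 when the class pointer lives in the mark word
  std::size_t length_bytes;       // assumed 4
};

// Round value up to the next multiple of alignment.
inline std::size_t align_up(std::size_t value, std::size_t alignment) {
  assert(alignment > 0);
  return (value + alignment - 1) / alignment * alignment;
}

// Assumed rule: elements start right after the length field, aligned to
// the element size, with a minimum alignment of 4 bytes.
inline std::size_t array_base_offset(const HeaderLayout& h, std::size_t element_bytes) {
  const std::size_t header_end = h.mark_word_bytes + h.klass_field_bytes + h.length_bytes;
  const std::size_t alignment = element_bytes > 4 ? element_bytes : 4;
  return align_up(header_end, alignment);
}

// The three configurations mentioned in the discussion (sizes are assumptions).
const HeaderLayout compact_headers{8, 0, 4};
const HeaderLayout compressed_class_pointers{8, 4, 4};
const HeaderLayout uncompressed_class_pointers{8, 8, 4};
```

In this model, `array_base_offset(compact_headers, 1)` is 12 and `array_base_offset(compact_headers, 8)` is 16, matching the byte[] and long[] offsets quoted above. With `compressed_class_pointers` both element sizes land on the same offset, which is presumably why the affected tests vectorize as expected in the default configuration.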

> `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")?

No, this is not what I tried. I tried to completely expand LoadNKlass, and replace it with the lower nodes that load and shift the mark-word right there, in ideal graph. But your approach is saner: there is so much implicit knowledge about Load(N)Klass, and even klass_offset_in_bytes(), all over the place, it would be very hard to get this right without breaking something.
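
For readers unfamiliar with the "load and shift" expansion mentioned above, here is a minimal sketch of the idea in plain C++. It assumes only what the PR description states (the compressed class pointer occupies the upper 22 bits of a 64-bit mark word); the function and constant names are hypothetical and do not correspond to HotSpot identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: 64-bit mark word, narrow class pointer in the upper 22 bits.
constexpr unsigned kMarkWordBits = 64;
constexpr unsigned kNarrowKlassBits = 22;
constexpr unsigned kNarrowKlassShift = kMarkWordBits - kNarrowKlassBits;  // 42

// "Load and shift": given the loaded mark word, the narrow class pointer
// is obtained with a single logical right shift.
inline std::uint32_t narrow_klass_from_mark(std::uint64_t mark_word) {
  return static_cast<std::uint32_t>(mark_word >> kNarrowKlassShift);
}

// Helper for the illustration only: build a mark word from its two parts.
inline std::uint64_t mark_with_narrow_klass(std::uint64_t low_bits, std::uint32_t narrow_klass) {
  assert(narrow_klass < (1u << kNarrowKlassBits));
  assert(low_bits < (std::uint64_t{1} << kNarrowKlassShift));
  return (static_cast<std::uint64_t>(narrow_klass) << kNarrowKlassShift) | low_bits;
}
```

Round-tripping a value through `mark_with_narrow_klass` and `narrow_klass_from_mark` returns the original narrow class pointer regardless of the low bits, which is the property the late expansion relies on.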

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352926265

From roland at openjdk.org  Mon Sep 16 14:07:07 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 16 Sep 2024 14:07:07 GMT
Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in
 is_gc_barrier_node
In-Reply-To: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
References: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
Message-ID: <I7NhR4jVuklXowTs34Rv02cNZoZiDttwDz-7Cft1iNE=.2fa69349-9b82-474a-ba53-b400f5361bba@github.com>

On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The name of the call we emit is "shenandoah_clone":
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806
> 
> ...yet we test for "shenandoah_clone_barrier" here:
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688
> 
> I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC`

Looks good to me.
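
The difference between matching on the call name and matching on the call target can be sketched as follows. This is a self-contained illustration with hypothetical stand-in types (`RuntimeCall`, `clone_barrier_entry`), not the actual C2 or Shenandoah code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for a runtime call node; not HotSpot types.
using EntryPoint = void (*)();

struct RuntimeCall {
  EntryPoint target;  // address of the runtime routine being called
  const char* name;   // free-form debug name given at emission time
};

void clone_barrier_entry() {}
void unrelated_entry() {}

// Fragile: silently stops matching if the emitted name and the tested
// name drift apart.
inline bool is_clone_barrier_by_name(const RuntimeCall& call) {
  return std::strcmp(call.name, "shenandoah_clone_barrier") == 0;
}

// Robust: compares against the routine that is actually called.
inline bool is_clone_barrier_by_target(const RuntimeCall& call) {
  return call.target == &clone_barrier_entry;
}
```

A call emitted as `RuntimeCall{&clone_barrier_entry, "shenandoah_clone"}` is missed by the name-based check but found by the target-based one, which mirrors the mismatch described in the quoted PR text.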

-------------

Marked as reviewed by roland (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21014#pullrequestreview-2306783149

From aboldtch at openjdk.org  Mon Sep 16 14:17:07 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 16 Sep 2024 14:17:07 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v2]
In-Reply-To: <fPSLNr7wPapykkTwLRCujkJ22IKobQdpIuNltMY2I54=.a9d79cb2-1bc5-4eed-b86d-48d5ed485586@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <ftQ3_zwYKXg1VEzKfx5XqYxCAWZtbhskNJTD47st1Jg=.eadd39e7-5a1e-4001-ab10-901c9ef57581@github.com>
 <fPSLNr7wPapykkTwLRCujkJ22IKobQdpIuNltMY2I54=.a9d79cb2-1bc5-4eed-b86d-48d5ed485586@github.com>
Message-ID: <-fwIuetM6bjQ93mo3QgorOpFNkxkgJ2SH-LbTT0k2h0=.f37f385d-20cf-4cf8-9496-d7256482726d@github.com>

On Mon, 16 Sep 2024 08:15:38 GMT, Lutz Schmidt <lucy at openjdk.org> wrote:

>> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Adjust division following suggestion by xmas
>
> src/hotspot/share/gc/z/zDirector.cpp line 491:
> 
>> 489:   // Calculate the GC cost for each reclaimed byte
>> 490:   const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);
>> 491:   const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : double(old_gc_time) / double(reclaimed_per_old_gc);
> 
> How about using some parentheses? To my understanding, the division has a higher precedence than the ternary conditional expression. See: https://en.cppreference.com/w/cpp/language/operator_precedence

I do not mind parentheses. But the ternary operator has the lowest precedence (if you do not count the `,` which I would almost always say is wrong to use without a surrounding `() / [] / {}`), so to me it seems superfluous.

Just to clarify: the intent of this code is what we get with a higher precedence on division. That is:

  const double current_old_gc_time_per_bytes_freed = ((reclaimed_per_old_gc == 0) ? (std::numeric_limits<double>::infinity()) : (double(old_gc_time) / double(reclaimed_per_old_gc)));


_Side Note:_
I also think I prefer immediately invoked lambdas when the ternaries get this long.

  const double current_old_gc_time_per_bytes_freed = [&]() {
    if (reclaimed_per_old_gc == 0) {
      return std::numeric_limits<double>::infinity();
    }
    return double(old_gc_time) / double(reclaimed_per_old_gc);
  }();
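
As a standalone check of the two forms above, the following sketch wraps each in a function so that they can be compared directly. The function names and the simplified parameter types are assumptions made for the illustration; only the expressions themselves come from the discussion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Unparenthesized ternary: division binds tighter than ?:, so the
// division only appears in the "else" branch and is never evaluated
// when reclaimed == 0.
inline double time_per_byte_ternary(double gc_time, std::size_t reclaimed) {
  return reclaimed == 0 ? std::numeric_limits<double>::infinity()
                        : gc_time / double(reclaimed);
}

// Same logic as an immediately invoked lambda.
inline double time_per_byte_lambda(double gc_time, std::size_t reclaimed) {
  const double result = [&]() {
    if (reclaimed == 0) {
      return std::numeric_limits<double>::infinity();
    }
    return gc_time / double(reclaimed);
  }();
  return result;
}
```

Both functions return infinity for `reclaimed == 0` and the plain quotient otherwise, so the choice between them is purely one of readability.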

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1761248058

From mbaesken at openjdk.org  Mon Sep 16 15:07:42 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 16 Sep 2024 15:07:42 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v3]
In-Reply-To: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
Message-ID: <9B02mnKaYLY90CZog5880J_BmLZT01rc6mEwqY6hWU0=.07e8eb6c-e9c3-40ac-98d1-8f99f32a5c88@github.com>

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  add parentheses

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20888/files
  - new: https://git.openjdk.org/jdk/pull/20888/files/21fe3ca7..6902026f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=01-02

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20888.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888

PR: https://git.openjdk.org/jdk/pull/20888

From kvn at openjdk.org  Mon Sep 16 15:51:15 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 16 Sep 2024 15:51:15 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <nzubYA7ESzm-LTxypE-Nx4OSVrK39Mj-wLou2alQxto=.0dfacde3-68f2-47c4-ba54-5dd76f7e91c6@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
 <nzubYA7ESzm-LTxypE-Nx4OSVrK39Mj-wLou2alQxto=.0dfacde3-68f2-47c4-ba54-5dd76f7e91c6@github.com>
Message-ID: <mQNshQkD1Uajpt2GT3thCvl_VOtyHm3Zmwruem-4s0g=.8060ba41-15a1-407e-a19a-6c6e97322a07@github.com>

On Mon, 16 Sep 2024 09:28:30 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly?

Yes, please.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761413544

From aboldtch at openjdk.org  Mon Sep 16 16:07:05 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 16 Sep 2024 16:07:05 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v3]
In-Reply-To: <9B02mnKaYLY90CZog5880J_BmLZT01rc6mEwqY6hWU0=.07e8eb6c-e9c3-40ac-98d1-8f99f32a5c88@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <9B02mnKaYLY90CZog5880J_BmLZT01rc6mEwqY6hWU0=.07e8eb6c-e9c3-40ac-98d1-8f99f32a5c88@github.com>
Message-ID: <O3LbE1nEdl74be0PVq1cBx477SBkSS0fmaIfy4ZTyn0=.d8304072-8bf7-4428-ac82-a33a16a8fad0@github.com>

On Mon, 16 Sep 2024 15:07:42 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add parentheses

Changes requested by aboldtch (Reviewer).

src/hotspot/share/gc/z/zDirector.cpp line 492:

> 490:   const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);
> 491:   const double current_old_gc_time_per_bytes_freed = ((reclaimed_per_old_gc == 0) ? (std::numeric_limits<double>::infinity())
> 492:                                                                                   : (double(old_gc_time) / double(reclaimed_per_old_gc)));

Suggestion:

  const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : (double(old_gc_time) / double(reclaimed_per_old_gc));


Sorry I probably confused things here. I think this is what was wanted. I just added all the parentheses as a clarification of how this was meant to be parsed by the compiler.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2307097576
PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1761435309

From kvn at openjdk.org  Mon Sep 16 16:11:04 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 16 Sep 2024 16:11:04 GMT
Subject: RFR: 8340186: Shenandoah: Missing
 load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call
In-Reply-To: <WKF6HOcWBzpASW2-szk8bk_hZ-DN6CGvX9vwkHPM0x0=.7fb65ec8-d648-4b60-ad73-f9c1e1c1b8ce@github.com>
References: <WKF6HOcWBzpASW2-szk8bk_hZ-DN6CGvX9vwkHPM0x0=.7fb65ec8-d648-4b60-ad73-f9c1e1c1b8ce@github.com>
Message-ID: <Eeb3ati5QBCdPFcglZ4XQ1KGCNJBXynYlw-BKGVmQwU=.db10e7de-56c6-41c0-930d-a15dac622c5e@github.com>

On Mon, 16 Sep 2024 10:38:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`. It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands.

Trivial ;)

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21016#pullrequestreview-2307108763

From aboldtch at openjdk.org  Mon Sep 16 16:21:20 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 16 Sep 2024 16:21:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v17]
In-Reply-To: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com>
Message-ID: <ofn5UzcoE5CTfZm-tNL_xw1htewyzz9VmVbEOSvplVg=.8f474dd6-b18c-464e-b944-3780281a6717@github.com>

On Mon, 16 Sep 2024 13:28:00 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 54 commits:
> 
>  - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
>  - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
>  - Fix loop on aarch64
>  - clarify obscure assert in metasapce setup
>  - Rework compressedklass encoding
>  - remove stray debug output
>  - Fixes post 8338526
>  - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4
>  - Various touch-ups
>  - Hide log timestamps in test to prevent false failures
>  - ... and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81

src/hotspot/cpu/aarch64/aarch64.ad line 6459:

> 6457:   format %{ "ldrw  $dst, $mem\t# compressed class ptr" %}
> 6458:   ins_encode %{
> 6459:     __ load_nklass_compact_c2($dst$$Register, $mem$$base$$Register, $mem$$index$$Register, $mem$$scale, $mem$$disp);

I wonder if something along these lines is required here.
Suggestion:

    Address addr = mem2address($mem->opcode(), $mem$$base$$Register, $mem$$index, $mem$$scale, $mem$$disp);
    __ load_nklass_compact_c2($dst$$Register, __ adjust_compact_object_header_address_c2(addr, rscratch1));

With `adjust_compact_object_header_address_c2` being:
```C++
Address C2_MacroAssembler::adjust_compact_object_header_address_c2(Address addr, Register tmp) {
  // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract
  // obj-start, so that we can load from the object's mark-word instead. Usually the address
  // comes as obj-start in addr.base() and klass_offset_in_bytes in addr.offset().
  if (addr.getMode() != Address::base_plus_offset) {
    lea(tmp, addr);
    addr = Address(tmp, -oopDesc::klass_offset_in_bytes());
  } else {
    addr = Address(addr.base(), addr.offset() - oopDesc::klass_offset_in_bytes());
  }
  return legitimize_address(addr, 8, tmp);
}
```

Maybe we never hit the case where `$mem->opcode()` is not an `lsl` variant, or where the offset is too far away for an immediate and has to be fixed by `legitimize_address`. But it seems like this would at least make those cases correct, while avoiding the `lea` in the common case.

Maybe someone with better experience in aarch64 macroassembler+ad files and C2 can give an opinion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1761455581

From shade at openjdk.org  Mon Sep 16 16:25:08 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 16 Sep 2024 16:25:08 GMT
Subject: RFR: 8340186: Shenandoah: Missing
 load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call
In-Reply-To: <WKF6HOcWBzpASW2-szk8bk_hZ-DN6CGvX9vwkHPM0x0=.7fb65ec8-d648-4b60-ad73-f9c1e1c1b8ce@github.com>
References: <WKF6HOcWBzpASW2-szk8bk_hZ-DN6CGvX9vwkHPM0x0=.7fb65ec8-d648-4b60-ad73-f9c1e1c1b8ce@github.com>
Message-ID: <2JTmLP8qoq6b358-CVLMhT5fgLK3rB_dWqTdNtwYUXg=.6a991137-e68a-4bdc-b58c-3bfb66774e79@github.com>

On Mon, 16 Sep 2024 10:38:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`. It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands.

Yup :) Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21016#issuecomment-2353368636

From shade at openjdk.org  Mon Sep 16 16:25:09 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 16 Sep 2024 16:25:09 GMT
Subject: Integrated: 8340186: Shenandoah: Missing
 load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call
In-Reply-To: <WKF6HOcWBzpASW2-szk8bk_hZ-DN6CGvX9vwkHPM0x0=.7fb65ec8-d648-4b60-ad73-f9c1e1c1b8ce@github.com>
References: <WKF6HOcWBzpASW2-szk8bk_hZ-DN6CGvX9vwkHPM0x0=.7fb65ec8-d648-4b60-ad73-f9c1e1c1b8ce@github.com>
Message-ID: <vIaPR50C2wPm6-9tPKibr2sCaqIBMc76QHPLR8XzrO0=.3d685ecd-125a-4f41-b9a9-ba28e0ac0faf@github.com>

On Mon, 16 Sep 2024 10:38:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`. It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands.

This pull request has now been integrated.

Changeset: 1640bd26
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/1640bd2676d8d183f02b4f5386ce42c47950e356
Stats:     2 lines in 1 file changed: 1 ins; 0 del; 1 mod

8340186: Shenandoah: Missing load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call

Reviewed-by: kvn

-------------

PR: https://git.openjdk.org/jdk/pull/21016

From rcastanedalo at openjdk.org  Mon Sep 16 16:34:59 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 16 Sep 2024 16:34:59 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v21]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <Fp5ivAsAYYStXkqUlra5GP4v9NMLi1UrRD3NkFIKcH4=.b98dec5b-446e-402b-b28a-54c4725e6fbd@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision:

 - Add missing IR test to test run
 - Skip barrier refining for non-OOP stores and stores without barrier data
 - Assert that m is input to n in Matcher::is_encode_and_store_pattern

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/141020e6..653f9acf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19-20

  Stats: 21 lines in 3 files changed: 16 ins; 1 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Mon Sep 16 16:37:32 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 16 Sep 2024 16:37:32 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID: <nOf0rNbqdUHBEG2QCQ1gq6c1Uxpgvqdm4Aixl-j7sTs=.9b09020c-2da3-4488-81c6-b19d86fa07b4@github.com>

On Fri, 13 Sep 2024 22:51:07 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix a few style issues
>
> src/hotspot/share/opto/matcher.cpp line 2845:
> 
>> 2843:     n->Opcode() == Op_StoreN &&
>> 2844:     m->is_EncodeP();
>> 2845: }
> 
> Add comment that `m` is input of `n`. I thought about adding assert too but I will leave it to you.

Added the assertion (commit a480d70b).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761478462

From rcastanedalo at openjdk.org  Mon Sep 16 16:49:25 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 16 Sep 2024 16:49:25 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID: <cxEm9_XoC9s0Hyhvvk1S9_TbY2AGJzYbgTLpWVsoMsQ=.544a90b2-9ebb-40b0-b4ac-1029170726b5@github.com>

On Fri, 13 Sep 2024 23:14:19 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix a few style issues
>
> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241:
> 
>> 239:   assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP");
>> 240:   TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr();
>> 241:   uint8_t barrier_data = store->barrier_data();
> 
> Should you check barrier data for 0?
> `is_ptr()` covers a wide set of types. It includes TypeRawPtr, TypeKlassPtr and TypeMetadataPtr. Where are you filtering them?

I added the check and excluded other pointers than OOPs, narrow OOPs, and null pointers (needed because null in uncompressed OOP mode is typed as `AnyPtr`) in commit 10bc0d2c. Note that these checks are not strictly required for correctness, because for all other pointers the corresponding barrier data would be 0, and the only potential operations over it would be bit clearing. But I still think they have value in that they communicate more clearly the intent and scope of the optimization.
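
The argument that the checks are not strictly required can be illustrated with a toy model: if the only operation applied to the barrier data is bit clearing, then a value of 0 passes through unchanged with or without an early exit. The bit names and the `refine_*` functions below are hypothetical and chosen for the illustration; they are not the identifiers used in `g1BarrierSetC2.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical barrier flags; the real encoding may differ.
constexpr std::uint8_t kBarrierPre = 1u << 0;
constexpr std::uint8_t kBarrierPost = 1u << 1;

// Bit clearing is the only mutation in this model.
inline std::uint8_t clear_bits(std::uint8_t barrier_data, std::uint8_t bits) {
  return static_cast<std::uint8_t>(barrier_data & ~bits);
}

// Without explicit checks: a store of a known-null value drops the
// (assumed unnecessary) post barrier.
inline std::uint8_t refine_unchecked(std::uint8_t barrier_data, bool value_is_null) {
  return value_is_null ? clear_bits(barrier_data, kBarrierPost) : barrier_data;
}

// With explicit checks: same result, but the scope of the optimization
// is stated up front.
inline std::uint8_t refine_checked(std::uint8_t barrier_data, bool is_oop_store, bool value_is_null) {
  if (!is_oop_store || barrier_data == 0) {
    return barrier_data;
  }
  return refine_unchecked(barrier_data, value_is_null);
}
```

For `barrier_data == 0` both variants return 0, so the explicit check changes readability rather than behavior, in line with the explanation above.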

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761494258

From rkennke at openjdk.org  Mon Sep 16 17:53:09 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 16 Sep 2024 17:53:09 GMT
Subject: RFR: 8339960: GenShen: Fix inconsistencies in generational
 Shenandoah behavior
In-Reply-To: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
References: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
Message-ID: <82yjeweCEIPcfscwWESC3M8c_UTgXKOAROiVXAKF09k=.4273df48-3931-4836-8da7-c25d8fd5a29b@github.com>

On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> This fixes some bugs found in recent code review and playback of an assertion failure.
> 
> See also https://github.com/openjdk/shenandoah/pull/497

Looks good, thanks!

-------------

Marked as reviewed by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20974#pullrequestreview-2307337108

From duke at openjdk.org  Mon Sep 16 18:09:22 2024
From: duke at openjdk.org (duke)
Date: Mon, 16 Sep 2024 18:09:22 GMT
Subject: RFR: 8339960: GenShen: Fix inconsistencies in generational
 Shenandoah behavior
In-Reply-To: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
References: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
Message-ID: <e5qWqDwU24IurRMAe6iqSOrVnZUMFtOrwabMG4krC84=.5b71c284-d4f6-460f-84f4-3d39e53ace31@github.com>

On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> This fixes some bugs found in recent code review and playback of an assertion failure.
> 
> See also https://github.com/openjdk/shenandoah/pull/497

@kdnilsen 
Your change (at version f1ba63f4d58161512ad0262783ceda0916aece3c) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20974#issuecomment-2353576294

From kdnilsen at openjdk.org  Mon Sep 16 19:18:11 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 16 Sep 2024 19:18:11 GMT
Subject: Integrated: 8339960: GenShen: Fix inconsistencies in generational
 Shenandoah behavior
In-Reply-To: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
References: <vzXjdQg2wTPzgTG1aU7_CUnAxtnlt_2o01drZXtI324=.100816d9-1ed5-497c-be51-fd540f496cb7@github.com>
Message-ID: <6MxEO3TVWiI4HzK-mHqwb32Yq8tRq6Gg6PbxePp8Hl8=.ced31c43-4ff8-45af-9293-819a6cc9ab73@github.com>

On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> This fixes some bugs found in recent code review and playback of an assertion failure.
> 
> See also https://github.com/openjdk/shenandoah/pull/497

This pull request has now been integrated.

Changeset: 858b4f12
Author:    Kelvin Nilsen <kdnilsen at openjdk.org>
Committer: Y. Srinivas Ramakrishna <ysr at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/858b4f127ad873666f51f4c54c37fa2d7801c32c
Stats:     4 lines in 1 file changed: 2 ins; 0 del; 2 mod

8339960: GenShen: Fix inconsistencies in generational Shenandoah behavior

Reviewed-by: wkemper, rkennke

-------------

PR: https://git.openjdk.org/jdk/pull/20974

From rcastanedalo at openjdk.org  Tue Sep 17 05:20:30 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 17 Sep 2024 05:20:30 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v22]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <FqaLMxkntOx3RpFcdHQx0jjZ0NSqaUoARbcJDHnlDW4=.a37161d8-b299-4595-98bd-b2c396e5e266@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Discard memory accesses with barrier data as implicit null check candidates

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/653f9acf..71a51bfc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20-21

  Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Tue Sep 17 05:20:30 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 17 Sep 2024 05:20:30 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v20]
In-Reply-To: <mQNshQkD1Uajpt2GT3thCvl_VOtyHm3Zmwruem-4s0g=.8060ba41-15a1-407e-a19a-6c6e97322a07@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
 <nzubYA7ESzm-LTxypE-Nx4OSVrK39Mj-wLou2alQxto=.0dfacde3-68f2-47c4-ba54-5dd76f7e91c6@github.com>
 <mQNshQkD1Uajpt2GT3thCvl_VOtyHm3Zmwruem-4s0g=.8060ba41-15a1-407e-a19a-6c6e97322a07@github.com>
Message-ID: <NLFruqtMmPStxMyPj0VB3LHmhsRz_W6pJOA5rylMn-M=.20a620f4-0f47-43a7-8cc1-f64f618b14b2@github.com>

On Mon, 16 Sep 2024 15:48:32 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check ([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255).
>> 
>> Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly?
>
>> Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly?
> 
> Yes, please.

Done (commit 71a51bfc).
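
For readers following the thread, the explicit filter agreed on above can be pictured with a toy model. This is only a minimal sketch and not the code in commit 71a51bfc: `ToyMemOp`, its fields, and `is_implicit_null_check_candidate` are hypothetical stand-ins for C2's machine nodes and for the candidate selection in `PhaseCFG::implicit_null_check`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a C2 machine memory operation.
struct ToyMemOp {
  std::uint8_t barrier_data;     // 0 means no GC barrier is attached
  bool inputs_dominate_check;    // true if all inputs dominate the null check
};

// Returns true if the operation may be hoisted to act as an implicit null check.
inline bool is_implicit_null_check_candidate(const ToyMemOp& op) {
  // Explicit rule: accesses carrying barrier data are never candidates.
  if (op.barrier_data != 0) {
    return false;
  }
  // Pre-existing rule: every input must dominate the null check.
  return op.inputs_dominate_check;
}
```

In this model the first test makes the rejection independent of where temporaries happen to be placed, which is the point of discarding such accesses explicitly rather than relying on the dominance condition.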

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1762318179

From mbaesken at openjdk.org  Tue Sep 17 07:28:51 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 17 Sep 2024 07:28:51 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v4]
In-Reply-To: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
Message-ID: <pwgljGMCMe3y7tp4okuV2uE8PGyXlOXh_xixkEnyJ60=.2e383566-041f-47f5-8be5-c635ffdef347@github.com>

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  adjust parentheses

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20888/files
  - new: https://git.openjdk.org/jdk/pull/20888/files/6902026f..7ecdb37f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=02-03

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20888.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888

PR: https://git.openjdk.org/jdk/pull/20888

From aboldtch at openjdk.org  Tue Sep 17 07:45:06 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Tue, 17 Sep 2024 07:45:06 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v4]
In-Reply-To: <pwgljGMCMe3y7tp4okuV2uE8PGyXlOXh_xixkEnyJ60=.2e383566-041f-47f5-8be5-c635ffdef347@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <pwgljGMCMe3y7tp4okuV2uE8PGyXlOXh_xixkEnyJ60=.2e383566-041f-47f5-8be5-c635ffdef347@github.com>
Message-ID: <8oaWmqLYOQgXvxb4I9EFR_Jw7IyPWz6O9_nd9i2YlB4=.30e72075-acda-4eba-9a3b-b3589e22df13@github.com>

On Tue, 17 Sep 2024 07:28:51 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   adjust parentheses

Marked as reviewed by aboldtch (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2308661474

From lucy at openjdk.org  Tue Sep 17 09:32:11 2024
From: lucy at openjdk.org (Lutz Schmidt)
Date: Tue, 17 Sep 2024 09:32:11 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v4]
In-Reply-To: <pwgljGMCMe3y7tp4okuV2uE8PGyXlOXh_xixkEnyJ60=.2e383566-041f-47f5-8be5-c635ffdef347@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <pwgljGMCMe3y7tp4okuV2uE8PGyXlOXh_xixkEnyJ60=.2e383566-041f-47f5-8be5-c635ffdef347@github.com>
Message-ID: <lhBKNy_Ir4qBPa-MoBwI3uRhNay67JWmqau7zAuWGvg=.fdadf050-1552-4af6-85d7-bacb76c8608d@github.com>

On Tue, 17 Sep 2024 07:28:51 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   adjust parentheses

Looks good now. Thanks

-------------

Marked as reviewed by lucy (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2309074532

From rkennke at openjdk.org  Tue Sep 17 09:35:02 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 17 Sep 2024 09:35:02 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v18]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <tzW1ygFfWY40OR9lB3z0gVv7H9WKJPbFCZmR0F23-_k=.de6459ea-024c-48de-a062-0136d0af0306@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
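
The 8TB figure in the full-GC forwarding item can be checked with a little arithmetic. The sketch below assumes that the 40-bit encoding counts 8-byte heap words, analogous to compressed oops; that unit is an assumption made for illustration, and the constant names are hypothetical rather than taken from the PR.

```cpp
#include <cstdint>

// Assumption: forwarding targets are encoded in units of 8-byte heap words.
constexpr std::uint64_t kForwardingBits = 40;
constexpr std::uint64_t kBytesPerWord = 8;

// Largest heap size whose word offsets fit in the available bits.
constexpr std::uint64_t kMaxEncodableHeapBytes =
    (std::uint64_t{1} << kForwardingBits) * kBytesPerWord;

constexpr std::uint64_t kTiB = std::uint64_t{1} << 40;

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TiB.
static_assert(kMaxEncodableHeapBytes == 8 * kTiB,
              "40 bits of 8-byte words cover 8 TiB");
```

Under that assumption, a heap larger than 8 TiB would have word offsets that no longer fit in 40 bits, which matches the limit stated in the description.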

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits:

 - fix CompressedClassPointersEncodingScheme yet again for linux aarch64
 - Fixes post-8340184
 - Merge upstream up to and including 8340184
 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
 - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
 - Fix loop on aarch64
 - clarify obscure assert in metasapce setup
 - Rework compressedklass encoding
 - remove stray debug output
 - Fixes post 8338526
 - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed

-------------

Changes: https://git.openjdk.org/jdk/pull/20677/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=17
  Stats: 4518 lines in 190 files changed: 3180 ins; 718 del; 620 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From stuefe at openjdk.org  Tue Sep 17 10:02:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:02:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <IvzUGZlRyrzIiWxydj9nR_z-0seKPQXR3vZgv9fDYbs=.e21f58b1-42bc-42d5-9eb5-797320dcd9ce@github.com>

On Wed, 11 Sep 2024 12:25:37 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/classLoaderMetaspace.hpp line 81:
> 
>> 79:   metaspace::MetaspaceArena* class_space_arena() const       { return _class_space_arena; }
>> 80: 
>> 81:   bool have_class_space_arena() const { return _class_space_arena != nullptr; }
> 
> This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers`

I'd prefer not to. 

This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR. I am not sure what will break if I change this, but I don't want to risk test errors at this point (for example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixed).

This can be done in a follow-up RFE if necessary.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762917467

From stuefe at openjdk.org  Tue Sep 17 10:05:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:05:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <uvkx62POF-wL1HcEf22FxsR8sjbeiW3K2OKzqzcUwmg=.c3c4c165-ff8a-4839-a5bc-d269666a316e@github.com>

On Wed, 11 Sep 2024 13:05:10 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/classLoaderMetaspace.cpp line 165:
> 
>> 163:   MetaBlock bl(ptr, word_size);
>> 164:   // If the block would be reusable for a Klass, add to class arena, otherwise to
>> 165:   // then non-class arena.
> 
> Nit: spelling, "the"

Okay

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762928041

From stuefe at openjdk.org  Tue Sep 17 10:16:24 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:16:24 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <HUXKSlIwpGr-tpd3FY7OBJJ_mUDUYFb4Z51fLmndwuc=.aeff8ad5-6710-4ffa-a377-73eb69811667@github.com>

On Wed, 11 Sep 2024 13:50:59 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace.cpp line 656:
> 
>> 654:     // Adjust size of the compressed class space.
>> 655: 
>> 656:     const size_t res_align = reserve_alignment();
> 
> Can you change the name to `root_chunk_size`?

It feels wrong, since this is a deeply hidden implementation detail.

I will remove this temporary variable, which will also make the diff smaller.

> src/hotspot/share/memory/metaspace.hpp line 112:
> 
>> 110:   static size_t max_allocation_word_size();
>> 111: 
>> 112:   // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty
> 
> Nit: Spelling, "correctly"

Fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762968742
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762972938

From stuefe at openjdk.org  Tue Sep 17 10:23:19 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:23:19 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <McAy2Wmaq0Pz-V1qYn1ejKHxBqB86FLFTmz1ALHa1K0=.1dbcb43f-dd7f-41c3-ac2b-ce1134f802dd@github.com>

On Wed, 11 Sep 2024 11:25:56 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace/metablock.hpp line 48:
> 
>> 46: 
>> 47:   MetaWord* base() const { return _base; }
>> 48:   const MetaWord* end() const { return _base + _word_size; }
> 
> `assert(is_nonempty())`

Raises the question of why here and not in other accessors? 

Note that the only path via which end() is called already asserts for non-empty-ness (MetaspaceArena::contains).
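
The arrangement described here, plain accessors with the emptiness check done once by the caller, can be sketched as follows. `ToyWord`, `ToyBlock`, and `toy_contains` are hypothetical, simplified stand-ins for `MetaWord`, `MetaBlock`, and `MetaspaceArena::contains`; this illustrates the design choice and is not the code under review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using ToyWord = std::uintptr_t;  // hypothetical stand-in for MetaWord

// Hypothetical, simplified block: a base pointer plus a size in words.
class ToyBlock {
  ToyWord* _base = nullptr;
  std::size_t _word_size = 0;
public:
  ToyBlock() = default;
  ToyBlock(ToyWord* base, std::size_t word_size)
    : _base(base), _word_size(word_size) {}

  // Plain accessors: none of them asserts, so they stay uniform.
  ToyWord* base() const { return _base; }
  const ToyWord* end() const { return _base + _word_size; }
  bool is_nonempty() const { return _base != nullptr; }
};

// The single caller of end() checks non-emptiness once, up front.
inline bool toy_contains(const ToyBlock& block, const ToyWord* p) {
  assert(block.is_nonempty() && "caller must pass a non-empty block");
  return p >= block.base() && p < block.end();
}
```

Asserting inside `end()` alone would make it the odd one out among the accessors; keeping the check in the caller leaves the accessors trivial while still catching misuse in debug builds.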

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762985723

From jsjolen at openjdk.org  Tue Sep 17 10:31:19 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 17 Sep 2024 10:31:19 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <IvzUGZlRyrzIiWxydj9nR_z-0seKPQXR3vZgv9fDYbs=.e21f58b1-42bc-42d5-9eb5-797320dcd9ce@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
 <IvzUGZlRyrzIiWxydj9nR_z-0seKPQXR3vZgv9fDYbs=.e21f58b1-42bc-42d5-9eb5-797320dcd9ce@github.com>
Message-ID: <Wzl0buNcLNqNrvVd8m4A8-oPngryjSc5wBcDoaT-6o0=.fbed6a18-5f64-4b2c-93b7-abb65fd5b3ec@github.com>

On Tue, 17 Sep 2024 09:59:49 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/memory/classLoaderMetaspace.hpp line 81:
>> 
>>> 79:   metaspace::MetaspaceArena* class_space_arena() const       { return _class_space_arena; }
>>> 80: 
>>> 81:   bool have_class_space_arena() const { return _class_space_arena != nullptr; }
>> 
>> This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers`
>
> I'd prefer not to. 
> 
> This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR. I am not sure what will break if I change this, but I don't want to risk test errors at this point (for example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixed).
> 
> This can be done in a follow-up RFE if necessary.

OK, that's fine.

>> src/hotspot/share/memory/metaspace.cpp line 656:
>> 
>>> 654:     // Adjust size of the compressed class space.
>>> 655: 
>>> 656:     const size_t res_align = reserve_alignment();
>> 
>> Can you change the name to `root_chunk_size`?
>
> It feels wrong, since this is a deeply hidden implementation detail.
> 
> I will remove this temporary variable, which will also make the diff smaller.

Sounds OK, I wanted the name change to indicate that "hey, deep impl detail where we use this to mean something else".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993568
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994772

From stuefe at openjdk.org  Tue Sep 17 10:31:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:31:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <32_SIVHDWyZyYSvbV1jUHc631MTKUP2Thh_M9Q71jrc=.351aed23-599d-4a53-9cc0-0e9c85ecdf03@github.com>

On Wed, 11 Sep 2024 11:29:38 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace/metablock.hpp line 52:
> 
>> 50:   bool is_empty() const { return _base == nullptr; }
>> 51:   bool is_nonempty() const { return _base != nullptr; }
>> 52:   void reset() { _base = nullptr; _word_size = 0; }
> 
> Is this function really necessary? According to my IDE it's only used in tests and even then the `MetaBlock` isn't used afterwards (so it has no effect).

see test_clms.cpp, test_random function, used in two places there.

> src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84:
> 
>> 82:   // between threads and needs to be synchronized in CLMS.
>> 83: 
>> 84:   const size_t _allocation_alignment_words;
> 
> Nit: Document this? All other members are documented.

ok

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993378
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762995731

From stuefe at openjdk.org  Tue Sep 17 10:31:23 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:31:23 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v18]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <nE6sEWK9Et-vO44iZzCxbuZv-qnE20-DrCXBIdh7e1o=.089d17cb-8501-4e39-884d-9f638920b638@github.com>

On Wed, 11 Sep 2024 11:40:24 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits:
>> 
>>  - fix CompressedClassPointersEncodingScheme yet again for linux aarch64
>>  - Fixes post-8340184
>>  - Merge upstream up to and including 8340184
>>  - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
>>  - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
>>  - Fix loop on aarch64
>>  - clarify obscure assert in metasapce setup
>>  - Rework compressedklass encoding
>>  - remove stray debug output
>>  - Fixes post 8338526
>>  - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed
>
> src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44:
> 
>> 42: class FreeBlocks;
>> 43: 
>> 44: struct ArenaStats;
> 
> Nit: Sort?

ok

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994972

From jsjolen at openjdk.org  Tue Sep 17 10:47:25 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 17 Sep 2024 10:47:25 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v18]
In-Reply-To: <tzW1ygFfWY40OR9lB3z0gVv7H9WKJPbFCZmR0F23-_k=.de6459ea-024c-48de-a062-0136d0af0306@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <tzW1ygFfWY40OR9lB3z0gVv7H9WKJPbFCZmR0F23-_k=.de6459ea-024c-48de-a062-0136d0af0306@github.com>
Message-ID: <sXx4wxL6nlbJFQXLjN83xKY43ZZ1PyuLwk32st0Dfhg=.df5ee7f3-9c92-41d4-a80b-6e7291619a23@github.com>

On Tue, 17 Sep 2024 09:35:02 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits:
> 
>  - fix CompressedClassPointersEncodingScheme yet again for linux aarch64
>  - Fixes post-8340184
>  - Merge upstream up to and including 8340184
>  - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
>  - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
>  - Fix loop on aarch64
>  - clarify obscure assert in metasapce setup
>  - Rework compressedklass encoding
>  - remove stray debug output
>  - Fixes post 8338526
>  - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed

Hi,

We've gone through the rest of the Metaspace code and looked at the tests. It looks OK to us. We would like to see some style cleanups in the tests, but that can wait for a follow-up.

test/hotspot/gtest/metaspace/test_clms.cpp line 193:

> 191: 
> 192:       {
> 193:         // Nonclass arena allocation.

The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow-up RFE.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2309360771
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1763005291

From mbaesken at openjdk.org  Tue Sep 17 12:01:10 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 17 Sep 2024 12:01:10 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
 [v4]
In-Reply-To: <pwgljGMCMe3y7tp4okuV2uE8PGyXlOXh_xixkEnyJ60=.2e383566-041f-47f5-8be5-c635ffdef347@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
 <pwgljGMCMe3y7tp4okuV2uE8PGyXlOXh_xixkEnyJ60=.2e383566-041f-47f5-8be5-c635ffdef347@github.com>
Message-ID: <c4pXOl5bMIfHHBTdjIvNRFFWCIIttMSzm6cdEUv0DwE=.a1d3f932-fe5e-4a94-8971-4938b47ced3c@github.com>

On Tue, 17 Sep 2024 07:28:51 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
>> shows this error when running with ubsan enabled 
>> 
>> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   adjust parentheses

Thanks for the reviews !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2355506623

From mbaesken at openjdk.org  Tue Sep 17 12:01:11 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 17 Sep 2024 12:01:11 GMT
Subject: Integrated: 8339648: ZGC: Division by zero in
 rule_major_allocation_rate
In-Reply-To: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
References: <ElUYTplxxvkJLtxj5OdpcsgumQV8zdml9oyQim2kBP8=.695e816f-a639-4b39-92d2-ea5de5cb1037@github.com>
Message-ID: <Fim9haDyqPjIXwr8nRp0xWhgSCfLK728ms7b-KWDK7M=.b6dcad76-174a-4be8-b382-872e41c33a01@github.com>

On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

This pull request has now been integrated.

Changeset: 80db6e71
Author:    Matthias Baesken <mbaesken at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/80db6e71b092867212147bd369a9fda65dbd4b70
Stats:     2 lines in 1 file changed: 1 ins; 0 del; 1 mod

8339648: ZGC: Division by zero in rule_major_allocation_rate

Reviewed-by: aboldtch, lucy, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/20888

From rkennke at openjdk.org  Tue Sep 17 12:52:03 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 17 Sep 2024 12:52:03 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v19]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <FENW_mCsovAzl3Jpaf-3Sz4Lm8PWbCYlDHEqrDT4qjM=.0f01c195-d868-4ba0-a3f6-2bdf8e95e510@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
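
The "40 bits, up to 8TB" figure for Full GC forwarding in the list above can be checked with a small sketch. It assumes 8-byte object alignment (a shift of 3, as with compressed oops); the constant and function names are hypothetical and are not HotSpot identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration only.
constexpr unsigned kForwardingBits = 40;  // bits available for the encoded target (from the PR text)
constexpr unsigned kAlignmentShift = 3;   // assumption: objects are 8-byte aligned

// Largest heap size in bytes whose aligned offsets fit into kForwardingBits.
constexpr std::uint64_t max_encodable_heap_bytes() {
  return std::uint64_t{1} << (kForwardingBits + kAlignmentShift);
}

// Encode a forwarding target as an aligned offset from the heap base.
inline std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t addr) {
  assert(addr >= heap_base);
  const std::uint64_t offset = addr - heap_base;
  assert(offset % (std::uint64_t{1} << kAlignmentShift) == 0);  // must be aligned
  assert(offset < max_encodable_heap_bytes());                  // must fit in 40 bits
  return offset >> kAlignmentShift;
}

// Decode the stored offset back into a full address.
inline std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t encoded) {
  return heap_base + (encoded << kAlignmentShift);
}

// 2^40 slots of 8 bytes each are 2^43 bytes, i.e. 8 TiB.
static_assert(max_encodable_heap_bytes() == (std::uint64_t{8} << 40), "8 TiB");
```

Under these assumptions, any forwarding target within 8 TiB of the heap base round-trips through `encode_forwardee` and `decode_forwardee`; a larger heap would need more bits or a larger shift.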

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - CompressedKlassPointers::is_encodable shall be callable with -UseCCP
 - Johan review feedback

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/28a26aed..612d3045

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=17-18

  Stats: 39 lines in 7 files changed: 22 ins; 8 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From duke at openjdk.org  Tue Sep 17 12:54:15 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Tue, 17 Sep 2024 12:54:15 GMT
Subject: RFR: 8339161: ZGC: Remove unused remembered sets
Message-ID: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>

In ZGC, when a page becomes old it needs a remembered set (remset) which stores 2 bits per 8 bytes (64 bits) of page memory, a memory overhead of 3.125% (2/64) for every page that has an allocated remset.
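
As a rough illustration of the overhead figure, the sketch below reads 2/64 as two remset bits per 64-bit (8-byte) word of page memory, which is an assumption on my part. `remset_bytes` is a hypothetical helper, not a ZGC function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for a remset that keeps
// `bits_per_word` bits for every 64-bit word of a page.
inline std::size_t remset_bytes(std::size_t page_bytes, std::size_t bits_per_word = 2) {
  assert(page_bytes % 8 == 0);                // a page holds whole 64-bit words
  const std::size_t words = page_bytes / 8;   // number of 64-bit words in the page
  return words * bits_per_word / 8;           // total bits, converted to bytes
}
```

For a 2 MiB page this gives 64 KiB of remset, which is 1/32 = 3.125% of the page size.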

When an old page is potentially freed and inserted in the page cache, it can later be re-used as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs where pages are recycled for a long enough period to have a remset allocated for close to all pages.

The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache-mechanism. As remsets for young pages are unused, it should be considered wasted memory.

![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33)

The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This means that remsets are only stored for old pages that are in use. Pages that are not in use, or that are young, should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear from the plot above, and the total remset memory usage would be dictated by the number of "live" old pages.
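
The idea can be modelled in a few lines. This is only a sketch under simplifying assumptions: `Page`, `PageCache` and their members are hypothetical names and do not mirror the actual ZGC classes touched by the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical page model: the remset pointer is null when none is allocated.
struct Page {
  std::size_t size_bytes = 0;
  std::unique_ptr<std::vector<unsigned char>> remset;

  // Allocate a zeroed remset when the page becomes old (the "init" path).
  void become_old() {
    remset = std::make_unique<std::vector<unsigned char>>(size_bytes / 32);
  }
};

// Hypothetical page cache: inserting a page frees its remset, so cached
// pages and young pages later reused from the cache never carry one.
class PageCache {
public:
  void insert(Page page) {
    page.remset.reset();                 // free the remset instead of keeping it
    pages_.push_back(std::move(page));
  }

  Page take() {
    assert(!pages_.empty());
    Page page = std::move(pages_.back());
    pages_.pop_back();
    return page;
  }

private:
  std::vector<Page> pages_;
};
```

A page taken from the cache and reused as young pays nothing; one that becomes old again pays the allocation cost in `become_old`, which corresponds to the init-versus-clear trade-off measured in this message.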

Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing a remset is on average ~2.3x slower than clearing one, but it avoids the 3.125% memory overhead for pages that do not need a remset. ~89% of the measured initializations are made by GC threads and the rest by mutator threads.

|              | min (ms) | max (ms) | mean (ms)  |
| ------------ | -------- | -------- | ---------- |
| remset init  | 0.000292 | 0.706    | 0.00258083 |
| remset clear | 0.000082 | 0.015    | 0.00111340 |

Tested with tiers 1-7 and a local test that makes sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression or improvement.

-------------

Commit messages:
 - Merge resize and delete for remsets
 - 8339161: ZGC: Remove unused remembered sets

Changes: https://git.openjdk.org/jdk/pull/20947/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20947&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339161
  Stats: 95 lines in 7 files changed: 1 ins; 67 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/20947.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20947/head:pull/20947

PR: https://git.openjdk.org/jdk/pull/20947

From kvn at openjdk.org  Tue Sep 17 16:12:24 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 17 Sep 2024 16:12:24 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v22]
In-Reply-To: <FqaLMxkntOx3RpFcdHQx0jjZ0NSqaUoARbcJDHnlDW4=.a37161d8-b299-4595-98bd-b2c396e5e266@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <FqaLMxkntOx3RpFcdHQx0jjZ0NSqaUoARbcJDHnlDW4=.a37161d8-b299-4595-98bd-b2c396e5e266@github.com>
Message-ID: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com>

On Tue, 17 Sep 2024 05:20:30 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Discard memory accesses with barrier data as implicit null check candidates

Looks good to me.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2310210106

From rcastanedalo at openjdk.org  Wed Sep 18 07:18:12 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 18 Sep 2024 07:18:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v22]
In-Reply-To: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <FqaLMxkntOx3RpFcdHQx0jjZ0NSqaUoARbcJDHnlDW4=.a37161d8-b299-4595-98bd-b2c396e5e266@github.com>
 <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com>
Message-ID: <zEwq3OIUL7m947lCcsUErQJVk9rKlVYhALMjOC-h9HQ=.678f5040-6031-476b-b829-730d9a84b59d@github.com>

On Tue, 17 Sep 2024 16:09:30 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Looks good to me.

Thanks for reviewing, Vladimir!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357686525

From rcastanedalo at openjdk.org  Wed Sep 18 07:49:52 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 18 Sep 2024 07:49:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v23]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <tmnNZFXsAOWiXareBZUTR0NSDWDyPJOm3-MfoOfwsK8=.8ebb7ccc-ef9b-478e-bbeb-28becd9c1c85@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
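
The predicate-based selection described under "ADL Changes" above can be pictured with a small C++ model. This is a sketch, not ADL and not C2 code: `StoreNode`, `Rule` and `select_store_rule` are hypothetical, and the assumption is that the predicates distinguish accesses by whether their barrier tag is zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a C2 store node; only the barrier tag matters here.
struct StoreNode {
  std::uint8_t barrier_data;  // 0 means "no GC barrier required"
};

enum class Rule { StoreP, G1StoreP, None };

// Models two mutually exclusive predicates: the GC-agnostic rule accepts
// only untagged stores, the G1 rule accepts only tagged stores under G1.
inline Rule select_store_rule(const StoreNode& node, bool use_g1) {
  if (node.barrier_data == 0) {
    return Rule::StoreP;    // plain store, no barrier code emitted
  }
  if (use_g1) {
    return Rule::G1StoreP;  // barrier-aware version, emits the G1 barriers
  }
  return Rule::None;        // some other collector's rule would have to match
}
```

Because exactly one rule accepts any given store, a tagged access cannot silently be matched by the barrier-free instruction.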

Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:

 - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
 - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
 - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
 - Restore some asserts
 - Default values for tmp regs of G1PostBarrierStubC2
 -  8334060: [arm32] Implementation of Late Barrier Expansion for G1
 - 8330685: [arm32] share barrier spilling logic

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/71a51bfc..13b93bd9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21-22

  Stats: 614 lines in 12 files changed: 521 ins; 36 del; 57 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Wed Sep 18 08:00:30 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 18 Sep 2024 08:00:30 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v23]
In-Reply-To: <tmnNZFXsAOWiXareBZUTR0NSDWDyPJOm3-MfoOfwsK8=.8ebb7ccc-ef9b-478e-bbeb-28becd9c1c85@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <tmnNZFXsAOWiXareBZUTR0NSDWDyPJOm3-MfoOfwsK8=.8ebb7ccc-ef9b-478e-bbeb-28becd9c1c85@github.com>
Message-ID: <tJsw5FSPMXWKAZ3WXlzGlcJShrMdjNJbvGqQiANonsE=.b93d6ecf-dbd5-41ad-9dfb-359db1389a16@github.com>

On Wed, 18 Sep 2024 07:49:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:
> 
>  - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
>  - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - Restore some asserts
>  - Default values for tmp regs of G1PostBarrierStubC2
>  -  8334060: [arm32] Implementation of Late Barrier Expansion for G1
>  - 8330685: [arm32] share barrier spilling logic

Thanks for the arm 32-bit port, @snazarkin! Merged in commit 3957c03f.
Besides the arm 32-bit port, @snazarkin's changeset adds the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.
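
A minimal sketch of why a defaulted third temporary leaves existing callers untouched. The `Register` and `PostBarrierStub` types below are simplified, hypothetical stand-ins; the real `G1PostBarrierStubC2::initialize_registers` signature may differ.

```cpp
#include <cassert>

// Hypothetical miniature register handle; encoding -1 plays the role of `noreg`.
struct Register {
  int encoding;
  constexpr Register(int e = -1) : encoding(e) {}
  constexpr bool is_valid() const { return encoding >= 0; }
};
constexpr Register noreg{};

class PostBarrierStub {
public:
  // tmp3 defaults to noreg, so platforms that need only two temporaries
  // keep calling this with three arguments, exactly as before.
  void initialize_registers(Register thread, Register tmp1, Register tmp2,
                            Register tmp3 = noreg) {
    assert(thread.is_valid() && tmp1.is_valid() && tmp2.is_valid());
    _thread = thread;
    _tmp1 = tmp1;
    _tmp2 = tmp2;
    _tmp3 = tmp3;
  }

  Register tmp3() const { return _tmp3; }

private:
  Register _thread, _tmp1, _tmp2, _tmp3;
};
```

In this model only a port that passes a fourth argument ever sees a valid `tmp3`; every other caller observes `noreg`.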

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357765066

From rkennke at openjdk.org  Wed Sep 18 12:11:31 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 18 Sep 2024 12:11:31 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com>
Message-ID: <poutu1_cWKZJS_aUjdHdxXdoYBb0nS5Zohb1-_UwFiY=.403c84c9-a971-4b16-b436-e34cc9654321@github.com>

On Mon, 16 Sep 2024 06:53:42 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Various touch-ups
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576:
> 
>> 2574:   } else {
>> 2575:     lea(dst, Address(obj, index, Address::lsl(scale)));
>> 2576:     ldr(dst, Address(dst, offset));
> 
> Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well?

AFAIK, this happens only when using compressed oops with a heap base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem on aarch64 is that we can't encode an address like r27[nklass]+offset in a single instruction, which is why we need to lea the r27[nklass] part first.
Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing.
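
The two-step address computation can be sketched as plain arithmetic. The functions below are hypothetical models of the `lea`/`ldr` pair quoted above, not MacroAssembler code, and they read "scale=8" as a multiplier of 8 (a shift of 3), which is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Step 1, models lea(dst, Address(obj, index, Address::lsl(scale))):
// base plus shifted index, with no displacement.
inline std::uint64_t lea_base_index(std::uint64_t base, std::uint64_t index, unsigned shift) {
  return base + (index << shift);
}

// Step 2, models the address used by ldr(dst, Address(dst, offset)):
// a register plus an immediate displacement.
inline std::uint64_t add_offset(std::uint64_t addr, std::uint64_t offset) {
  return addr + offset;
}

// The combined base + index*scale + offset form that, per the message
// above, x86 can use directly.
inline std::uint64_t full_address(std::uint64_t base, std::uint64_t index,
                                  unsigned shift, std::uint64_t offset) {
  return base + (index << shift) + offset;
}
```

Both routes produce the same effective address; the aarch64 sequence simply spends one extra instruction to materialize the intermediate value.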

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764937842

From rkennke at openjdk.org  Wed Sep 18 12:25:50 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 18 Sep 2024 12:25:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v20]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <1o2b4fxBhqrlRqkNwKqZD1mgRNfTM16_NHZweEbd9SI=.1f68868b-1b98-4f78-9d37-2a805ffc932b@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
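
To make the "upper 22 bits of the mark-word" item in the list above concrete, here is a sketch of packing and extracting a compressed class pointer in a 64-bit word. The helper names and the treatment of the lower 42 bits are assumptions for illustration, not the actual `markWord` layout or API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout constants: only "upper 22 bits" comes from the PR text.
constexpr unsigned kMarkBits   = 64;
constexpr unsigned kKlassBits  = 22;
constexpr unsigned kKlassShift = kMarkBits - kKlassBits;  // 42
constexpr std::uint64_t kLowMask = (std::uint64_t{1} << kKlassShift) - 1;

// Extract the compressed Klass* from the upper 22 bits.
inline std::uint32_t narrow_klass_of(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Install a compressed Klass* while keeping the lower 42 bits unchanged.
inline std::uint64_t with_narrow_klass(std::uint64_t mark, std::uint32_t narrow_klass) {
  assert(narrow_klass < (std::uint32_t{1} << kKlassBits));
  return (mark & kLowMask) | (static_cast<std::uint64_t>(narrow_klass) << kKlassShift);
}

// Update the lower bits (lock, age, forwarding state and so on) while
// protecting the Klass* bits, which is what GC forwarding has to respect.
inline std::uint64_t with_low_bits(std::uint64_t mark, std::uint64_t low) {
  return (mark & ~kLowMask) | (low & kLowMask);
}
```

The point of `with_low_bits` is that any header update that goes through it cannot clobber the class bits, so the class stays recoverable from the header.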

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 60 commits:

 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
 - CompressedKlassPointers::is_encodable shall be callable with -UseCCP
 - Johan review feedback
 - fix CompressedClassPointersEncodingScheme yet again for linux aarch64
 - Fixes post-8340184
 - Merge upstream up to and including 8340184
 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
 - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
 - Fix loop on aarch64
 - clarify obscure assert in metasapce setup
 - ... and 50 more: https://git.openjdk.org/jdk/compare/19b2cee4...bb641621

-------------

Changes: https://git.openjdk.org/jdk/pull/20677/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19
  Stats: 4525 lines in 190 files changed: 3194 ins; 718 del; 613 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From yzheng at openjdk.org  Wed Sep 18 12:25:51 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Wed, 18 Sep 2024 12:25:51 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v19]
In-Reply-To: <FENW_mCsovAzl3Jpaf-3Sz4Lm8PWbCYlDHEqrDT4qjM=.0f01c195-d868-4ba0-a3f6-2bdf8e95e510@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <FENW_mCsovAzl3Jpaf-3Sz4Lm8PWbCYlDHEqrDT4qjM=.0f01c195-d868-4ba0-a3f6-2bdf8e95e510@github.com>
Message-ID: <RfVmBzOwNMSLLg29JMF6QNg_5I_VBr4sYTepEVylQc8=.02dbd983-ddb2-4602-914f-1d0e13c05e29@github.com>

On Tue, 17 Sep 2024 12:52:03 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. To make this possible, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - CompressedKlassPointers::is_encodable shall be callable with -UseCCP
>  - Johan review feedback

Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2358324621

From rkennke at openjdk.org  Wed Sep 18 12:38:21 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 18 Sep 2024 12:38:21 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <xyQ56u8rEyLnbVv9iP23zziwfEkc0v9bm1MGVQ8BUHY=.352e102f-41d3-4ef6-86d2-2ebc998671d1@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
 <S-95OBPOZb6GfBEUIDtHxCB4hTFPB2uNf5YmwMerGcE=.83afc323-883e-4ae0-8d10-fe7030cddb33@github.com>
 <ZnddOJnCLP0HABu3ymeWaAvcJNd3mKz8AhFwOtUAYW8=.f534bbd8-c222-4cd7-b6a9-3d6fd644c5e1@github.com>
 <SKA53xBzGbvMWiQxBtEYG9QB2KqZOKEBMzJzbQb_Zr8=.0cd24aa4-3aed-46e6-9092-bbc4b926bb42@github.com>
 <xyQ56u8rEyLnbVv9iP23zziwfEkc0v9bm1MGVQ8BUHY=.352e102f-41d3-4ef6-86d2-2ebc998671d1@github.com>
Message-ID: <kUw1IOXnjaF1IqloewVZ8vlWvB2qGcSCSgPFCUtK6L0=.997c885c-48b1-4c1c-9c06-4d2eacaf86ea@github.com>

On Mon, 9 Sep 2024 19:04:13 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> I pulled your changes and I see one slight difference in the output. The following line is missing:
>> 
>> `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject`
>> 
>> I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output:
>> 
>> _mark: 16294762323640321
>> 
>> So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that returns the Klass mirror when getValue() is called on it. Maybe we need a new CompactKlassField type, like the NarrowKlassField type we currently have, that decodes the _mark word into a Klass. The current getKlass() is related to this.
>
> Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark as two separate fields: one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case.

Do you think this needs to be addressed before integration? And if so, could you help with implementation? Or could we do it after integration? Then please file a follow-up issue.
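
To make the SA discussion above concrete, here is a minimal sketch of how the `_mark` value quoted in the output could be split into a narrow Klass id and the remaining low bits. It assumes a 64-bit mark word whose upper 22 bits hold the compressed Klass*, as the PR description states, which gives a shift of 42. The names `DecodedMark`, `decode_mark` and `kKlassShift` are hypothetical and are not SA or HotSpot identifiers.

```cpp
#include <cstdint>

// Assumed layout: 64-bit mark word, compressed Klass* in the upper 22 bits.
constexpr unsigned kMarkBits   = 64;
constexpr unsigned kKlassBits  = 22;
constexpr unsigned kKlassShift = kMarkBits - kKlassBits;  // 42

// Hypothetical view of a mark word as two logical fields.
struct DecodedMark {
  uint32_t narrow_klass;  // compressed Klass* id taken from the upper bits
  uint64_t low_bits;      // everything below the Klass* field
};

constexpr DecodedMark decode_mark(uint64_t mark) {
  return DecodedMark{
      static_cast<uint32_t>(mark >> kKlassShift),
      mark & ((uint64_t{1} << kKlassShift) - 1)};
}

// The value printed in the SA output above: 3705 * 2^42 + 1.
static_assert(decode_mark(16294762323640321ULL).narrow_klass == 3705,
              "upper 22 bits hold the narrow Klass id");
static_assert(decode_mark(16294762323640321ULL).low_bits == 1,
              "remaining low bits");
```

Turning the id 3705 into an actual Klass still needs the encoding base and shift of the running VM, which is the part a dedicated SA field type would have to supply.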

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764976086

From rkennke at openjdk.org  Wed Sep 18 12:59:25 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 18 Sep 2024 12:59:25 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
Message-ID: <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>

On Mon, 9 Sep 2024 18:30:21 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision:
>> 
>>  - Print as warning when UCOH doesn't match in CDS archive
>>  - Improve initialization of mark-word in CDS ArchiveHeapWriter
>>  - Simplify getKlass() in SA
>>  - Simplify oopDesc::init_mark()
>>  - Get rid of forward_safe_* methods
>>  - GCForwarding touch-ups
>
> src/hotspot/share/oops/markWord.inline.hpp line 90:
> 
>> 88:   ShouldNotReachHere();
>> 89:   return markWord();
>> 90: #endif
> 
> Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits?

Kind of. The problem is that klass_shift is larger than 31, so shifting by it would be UB and generate a compiler warning. I opted to simply not compile any of that code in 32-bit builds. We could also define klass_shift differently on 32-bit.
Long-term (maybe with Lilliput2/4-byte-headers?) it would be nice to consolidate the header layout between 32 and 64 bit builds and not make any distinction anywhere. E.g. define markWord (or objectHeader?) in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though.
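
The shift problem described above can be sketched as follows. This is an illustration under assumptions, not the code from `markWord.inline.hpp`: the shift value 42 is derived from the "upper 22 bits" in the PR description, the pointer-width test uses `UINTPTR_MAX` instead of HotSpot's `_LP64` so that it compiles anywhere, and `narrow_klass_of` and `kCompactHeadersPossible` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdlib>

constexpr unsigned kKlassShift = 42;  // assumed: 64 - 22

#if UINTPTR_MAX > 0xFFFFFFFFu
// 64-bit build: uintptr_t is wide enough, so the shift is well defined.
constexpr bool kCompactHeadersPossible = true;

inline uint32_t narrow_klass_of(uintptr_t mark) {
  return static_cast<uint32_t>(mark >> kKlassShift);
}
#else
// 32-bit build: shifting a 32-bit value by 42 would be undefined behaviour,
// so the shifting code is never compiled here. This path stands in for the
// ShouldNotReachHere() in the quoted snippet.
constexpr bool kCompactHeadersPossible = false;

inline uint32_t narrow_klass_of(uintptr_t) {
  std::abort();
}
#endif
```

Defining the shift differently on 32-bit, as suggested above, would remove the need for the conditional but would leave dead code in builds where the flag can never be enabled.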

> src/hotspot/share/oops/oop.inline.hpp line 90:
> 
>> 88:   } else {
>> 89:     return markWord::prototype();
>> 90:   }
> 
> Could this be unconditional since prototype_header is initialized for all Klasses?

Yes, but there is an ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765003983
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765006669

From rkennke at openjdk.org  Wed Sep 18 13:23:44 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 18 Sep 2024 13:23:44 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. To make this possible, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
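
The 8TB limit in the full-GC forwarding bullet follows from simple arithmetic, sketched below. The sketch assumes that the 40 bits store an offset from the heap base counted in 8-byte units (the default object alignment), in the spirit of "an encoding similar to compressed-oops"; the PR text does not spell out the exact scheme. All names (`kForwardingBits`, `encode_forwardee`, and so on) are hypothetical.

```cpp
#include <cstdint>

constexpr unsigned kForwardingBits = 40;  // bits available for the forwardee
constexpr unsigned kAlignmentShift = 3;   // assumed 8-byte object alignment

// Largest heap whose every aligned address fits in the 40-bit field.
constexpr uint64_t kMaxHeapBytes =
    uint64_t{1} << (kForwardingBits + kAlignmentShift);
static_assert(kMaxHeapBytes == 8ULL * 1024 * 1024 * 1024 * 1024,
              "2^40 slots of 8 bytes each cover 8 TiB");

// Encode a forwardee address as an aligned offset from the heap base.
constexpr uint64_t encode_forwardee(uint64_t heap_base, uint64_t addr) {
  return (addr - heap_base) >> kAlignmentShift;
}

// Recover the forwardee address from its encoded form.
constexpr uint64_t decode_forwardee(uint64_t heap_base, uint64_t encoded) {
  return heap_base + (encoded << kAlignmentShift);
}

// Heaps above the limit cannot use this encoding.
constexpr bool can_encode_heap(uint64_t heap_bytes) {
  return heap_bytes <= kMaxHeapBytes;
}
```

A heap larger than `kMaxHeapBytes` would need more than 40 bits per forwardee, which matches the fallback described in the bullet for heaps that exceed 8TB.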

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  JVMCI support

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/bb641621..9ad2e62f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19-20

  Stats: 22 lines in 6 files changed: 16 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From shade at openjdk.org  Wed Sep 18 13:55:37 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 18 Sep 2024 13:55:37 GMT
Subject: RFR: 8340381: Shenandoah: Class mirrors verification should check
 forwarded objects
Message-ID: <9vV2xnuP2lgRCLLbB5LWnIg26HtPjS7BOIyt0qaLkwg=.d7975d49-c70b-43e5-89cb-ef1b4f86ac52@github.com>

The from-space objects can be effectively dead, and their backlinks to `InstanceKlass*` are no longer updated. So they can point to garbage.

Additional testing:
 - [x] Some previously failing reproducers are not failing anymore
 - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/21064/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21064&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340381
  Stats: 22 lines in 2 files changed: 9 ins; 0 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/21064.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21064/head:pull/21064

PR: https://git.openjdk.org/jdk/pull/21064

From stuefe at openjdk.org  Wed Sep 18 14:00:25 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 18 Sep 2024 14:00:25 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
Message-ID: <m9Xj-C2ZuCBJfaOrr8zH59Ny5LDERRs-Lw5oVzDGvII=.daaae9b4-b1a3-491c-a5d0-9e327443e3bd@github.com>

On Wed, 11 Sep 2024 12:27:14 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/classLoaderMetaspace.cpp line 87:
> 
>> 85:         klass_alignment_words,
>> 86:         "class arena");
>> 87:   }
> 
> As per my comment in the header file, change the code to this:
> 
> ```c++
> if (class_context != nullptr) {
>   // ... Same as in PR
> } else {
>   _class_space_arena = _non_class_space_arena;
> }
> ```

Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432

> src/hotspot/share/memory/classLoaderMetaspace.cpp line 118:
> 
>> 116: #ifdef ASSERT
>> 117:   if (result.is_nonempty()) {
>> 118:     const bool in_class_arena = class_space_arena() != nullptr ? class_space_arena()->contains(result) : false;
> 
> Unnecessary nullptr check if you take my suggestion, or you should switch to `have_class_space_arena`.

See reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754335269

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113297
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113850

From cjplummer at openjdk.org  Wed Sep 18 16:41:20 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 18 Sep 2024 16:41:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <kUw1IOXnjaF1IqloewVZ8vlWvB2qGcSCSgPFCUtK6L0=.997c885c-48b1-4c1c-9c06-4d2eacaf86ea@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <Og4I5G3w51d7zF27JawLM9pm_XpHwijYt3lJZ4hpeWQ=.fd974e82-2ce0-4ebd-b49e-108ab87690a9@github.com>
 <S-95OBPOZb6GfBEUIDtHxCB4hTFPB2uNf5YmwMerGcE=.83afc323-883e-4ae0-8d10-fe7030cddb33@github.com>
 <ZnddOJnCLP0HABu3ymeWaAvcJNd3mKz8AhFwOtUAYW8=.f534bbd8-c222-4cd7-b6a9-3d6fd644c5e1@github.com>
 <SKA53xBzGbvMWiQxBtEYG9QB2KqZOKEBMzJzbQb_Zr8=.0cd24aa4-3aed-46e6-9092-bbc4b926bb42@github.com>
 <xyQ56u8rEyLnbVv9iP23zziwfEkc0v9bm1MGVQ8BUHY=.352e102f-41d3-4ef6-86d2-2ebc998671d1@github.com>
 <kUw1IOXnjaF1IqloewVZ8vlWvB2qGcSCSgPFCUtK6L0=.997c885c-48b1-4c1c-9c06-4d2eacaf86ea@github.com>
Message-ID: <N6Anv99FjOg7Ebitb2vL8NVA5-dv0gEzLduMfQz7hmc=.eae44a47-a460-4a6a-845b-4c02681ebb87@github.com>

On Wed, 18 Sep 2024 12:35:28 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark as two separate fields: one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case.
>
> Do you think this needs to be addressed before integration? And if so, could you help with implementation? Or could we do it after integration? Then please file a follow-up issue.

Ok. I filed [JDK-8340396](https://bugs.openjdk.org/browse/JDK-8340396).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765387764

From rcastanedalo at openjdk.org  Wed Sep 18 17:45:51 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 18 Sep 2024 17:45:51 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v24]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
 - Remove redundant comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/13b93bd9..d54d67f1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22-23

  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From aboldtch at openjdk.org  Wed Sep 18 18:41:09 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 18 Sep 2024 18:41:09 GMT
Subject: RFR: 8339161: ZGC: Remove unused remembered sets
In-Reply-To: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
References: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
Message-ID: <Aj22qsQpIcj8Mk6dW9ykOHq374R517RV8OYcIiqrK-4=.6471b081-7c66-4743-86f7-d74180821076@github.com>

On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström <duke at openjdk.org> wrote:

> In ZGC, when a page becomes old it needs a remembered set (remset) which stores 2 bits per 8 bytes (64 bits) of page memory, a memory overhead of 3.125% (2/64) per page that stores an allocated remset.
> 
> When an old page is potentially freed and inserted in the page cache, it can later be re-used as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs where pages are recycled for a long enough period to have a remset allocated for close to all pages.
> 
> The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache-mechanism. As remsets for young pages are unused, it should be considered wasted memory.
> 
> ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33)
> 
> The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, to not waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use or are young should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear in the plot above and the total memory usage would be dictated by the number of "live" old pages.
> 
> Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing, but also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are performed by GC threads and the rest by mutator threads.
> 
> |              | min (ms) | max (ms) | mean (ms)  |
> | ------------ | -------- | -------- | ---------- |
> | remset init  | 0.000292 | 0.706    | 0.00258083 |
> | remset clear | 0.000082 | 0.015    | 0.00111340 |
> 
> Tested with tiers 1-7 and local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement.
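
The figures in the description above can be checked with a few lines of arithmetic. The sketch reads the 2/64 ratio as two remset bits per 64 bits of page memory, which is an interpretation of the quoted numbers. The 2 MiB page used in the comment is only an illustrative size, and the function names are hypothetical.

```cpp
#include <cstdint>

// Two remset bits for every 64 bits (8 bytes) of page memory.
constexpr uint64_t kRemsetBitsPerUnit = 2;
constexpr uint64_t kUnitBits          = 64;

// Bytes of remset needed for a page of the given size.
constexpr uint64_t remset_bytes(uint64_t page_bytes) {
  return page_bytes * kRemsetBitsPerUnit / kUnitBits;
}

// Overhead as a percentage of the page size: 100 * 2 / 64 = 3.125.
constexpr double remset_overhead_percent() {
  return 100.0 * kRemsetBitsPerUnit / kUnitBits;
}

// Mean timings copied from the table above, in milliseconds.
constexpr double kInitMeanMs  = 0.00258083;
constexpr double kClearMeanMs = 0.00111340;

// How much slower initializing is than clearing, on average (about 2.3x).
constexpr double init_vs_clear_ratio() {
  return kInitMeanMs / kClearMeanMs;
}

// Example: a 2 MiB page would carry a 64 KiB remset.
static_assert(remset_bytes(2 * 1024 * 1024) == 64 * 1024, "2 MiB -> 64 KiB");
```

The trade-off in the PR is visible here: every recycled young page that keeps its remset pays the 3.125% for nothing, while freeing it costs the slower initialization path when the page becomes old again.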

lgtm. Nicely done.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20947#pullrequestreview-2313491254

From wkemper at openjdk.org  Wed Sep 18 21:09:19 2024
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 18 Sep 2024 21:09:19 GMT
Subject: RFR: 8340400: Shenandoah: Whitebox breakpoint GC requests may cause
 assertions
Message-ID: <eFIBip-WT-4JsLJIJ9F-L66700DZjmakU3P4ZRcWjL0=.712ae3a2-65c6-4f78-a2f5-ab8a92a8aa3a@github.com>

When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the request just has the caller thread run in a busy loop without waiting. What's more, this loop resets the requested gc cause on every iteration, which may lead to gc cycles with a wb_breakpoint cause, but no breakpoint set - which violates assertions.

-------------

Commit messages:
 - Do not block whitebox breakpoint requests for gc

Changes: https://git.openjdk.org/jdk/pull/21074/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21074&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340400
  Stats: 13 lines in 1 file changed: 10 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21074.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21074/head:pull/21074

PR: https://git.openjdk.org/jdk/pull/21074

From coleenp at openjdk.org  Thu Sep 19 00:04:45 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 19 Sep 2024 00:04:45 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
Message-ID: <DDy_r4_vlMDIhKmbAYCyQj4Uu43soArfL9tAJoDX9-I=.b14baeb8-a7ef-4144-8a9c-de8d733e1c4d@github.com>

On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. To make this possible, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JVMCI support

src/hotspot/share/oops/compressedKlass.cpp line 242:

> 240:   } else {
> 241: 
> 242:     // Traditional (non-compact) header mode)

Extra )

src/hotspot/share/oops/compressedKlass.hpp line 175:

> 173:   //   5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding
> 174:   //       base that differs from the reservation base from step (4). That allows us, e.g., to later use
> 175:   //       zero-based encoding.

Not for this PR, but is there really any benefit to zero-based encoding for klass ids?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765888065
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765889975

From coleenp at openjdk.org  Thu Sep 19 00:04:46 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 19 Sep 2024 00:04:46 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
Message-ID: <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>

On Wed, 18 Sep 2024 12:56:16 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/share/oops/oop.inline.hpp line 90:
>> 
>>> 88:   } else {
>>> 89:     return markWord::prototype();
>>> 90:   }
>> 
>> Could this be unconditional since prototype_header is initialized for all Klasses?
>
> yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then.

Yes, I saw that patch.  I'm not sure I like the idea of CPU-dependent code also doing the encoding.  There were some C2 changes related to it, and I couldn't tell whether that scheme required them.  I don't see the downside to having the prototype header pre-encoded in the markWord.  Seems simpler.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765893566

From zgu at openjdk.org  Thu Sep 19 00:40:21 2024
From: zgu at openjdk.org (Zhengyu Gu)
Date: Thu, 19 Sep 2024 00:40:21 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in
 ShenandoahTaskQueue
Message-ID: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>

[JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 

Adopt shared implementation.

-------------

Commit messages:
 - 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue

Changes: https://git.openjdk.org/jdk/pull/21077/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21077&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340408
  Stats: 49 lines in 4 files changed: 0 ins; 47 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/21077.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21077/head:pull/21077

PR: https://git.openjdk.org/jdk/pull/21077

From stefank at openjdk.org  Thu Sep 19 05:00:37 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 19 Sep 2024 05:00:37 GMT
Subject: RFR: 8339161: ZGC: Remove unused remembered sets
In-Reply-To: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
References: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
Message-ID: <SrSTiI6mPtlBOJAJFoLzdO7fur0bTegL_embH62S830=.c753a296-72ba-4b41-b8b2-c47f325ab611@github.com>

On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström <duke at openjdk.org> wrote:

> In ZGC, when a page becomes old it needs a remembered set (remset) which stores 2 bits per 8 bytes (64 bits), a memory overhead of 3.125% (2/64) per page that stores an allocated remset.
> 
> When an old page is potentially freed and inserted in the page cache, it can later be re-used as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs where pages are recycled for a long enough period to have a remset allocated for close to all pages.
> 
> The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache-mechanism. As remsets for young pages are unused, it should be considered wasted memory.
> 
> ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33)
> 
> The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, to not waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use, or are young, should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear in the plot above, and the total memory usage would be dictated by the number of "live" old pages.
> 
> Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing, but also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are made by GC threads and the rest by mutator threads.
> 
> |              | min (ms) | max (ms) | mean (ms)  |
> | ------------ | -------- | -------- | ---------- |
> | remset init  | 0.000292 | 0.706    | 0.00258083 |
> | remset clear | 0.000082 | 0.015    | 0.00111340 |
> 
> Tested with tiers 1-7 and local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement.
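
The overhead figure in the quoted description can be checked with a small worked example. This is only a sketch: it assumes 2 remset bits per 64-bit word (which is what the quoted 2/64 ratio implies), and the helper names and the 2 MiB page size are hypothetical, chosen purely for illustration.

```cpp
#include <cstddef>

// Assumption: 2 remset bits are kept for every 8-byte (64-bit) word.
constexpr std::size_t kWordBytes = 8;
constexpr std::size_t kRemsetBitsPerWord = 2;

// Hypothetical helper: bytes of remset storage needed for one page.
constexpr std::size_t remset_bytes(std::size_t page_bytes) {
  return (page_bytes / kWordBytes) * kRemsetBitsPerWord / 8;
}

// Remset storage as a fraction of the page size.
constexpr double remset_overhead(std::size_t page_bytes) {
  return static_cast<double>(remset_bytes(page_bytes)) /
         static_cast<double>(page_bytes);
}
```

Under these assumptions a 2 MiB page carries 64 KiB of remset, i.e. 2/64 = 3.125% of the page. The table's mean values likewise give 0.00258083 / 0.00111340 ≈ 2.3, matching the quoted "~2.3x slower".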

Looks good! Thanks for fixing.

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20947#pullrequestreview-2314405140

From stefank at openjdk.org  Thu Sep 19 05:06:51 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 19 Sep 2024 05:06:51 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
 <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>
Message-ID: <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>

On Wed, 18 Sep 2024 23:59:39 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then.
>
> Yes, I saw that patch.  I'm not sure I like the idea of CPU-dependent code also doing the encoding.  There were some C2 changes related to it, and I couldn't tell whether that scheme required them.  I don't see the downside to having the prototype header pre-encoded in the markWord.  Seems simpler.

We already have CPU-dependent code for both C1 and the interpreter. Adding CPU-dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, the interpreter, and C2 all call into shared functions in the macro assembler.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766163092

From stefank at openjdk.org  Thu Sep 19 05:53:48 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 19 Sep 2024 05:53:48 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
Message-ID: <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>

On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JVMCI support

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787:

> 785:     // The gap is always equal to min-fill-size, so nothing to do.
> 786:     return;
> 787:   }

Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value:

void PSParallelCompact::fill_dense_prefix_end(SpaceId id) {
  // Comparing two sizes to decide if filling is required:
  //
  // The size of the filler (min-obj-size) is 2 heap words with the default
  // MinObjAlignment, since both markword and klass take 1 heap word.
  //
  // The size of the gap (if any) right before dense-prefix-end is
  // MinObjAlignment.
  //
  // Need to fill in the gap only if it's smaller than min-obj-size, and the
  // filler obj will extend to next region.

  // Note: If min-fill-size decreases to 1, this whole method becomes redundant.
  if (UseCompactObjectHeaders) {
    // The gap is always equal to min-fill-size, so nothing to do.
    return;
  }
  assert(CollectedHeap::min_fill_size() >= 2, "inv");
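
One way to reason about the question is the sketch below. It is a simplified model, not HotSpot code: it assumes the minimum filler size is the header size rounded up to the object alignment, that the gap before dense-prefix-end is exactly one alignment unit, and that a compact header is one heap word. All function names are hypothetical.

```cpp
#include <cstddef>

// Round `size` up to a multiple of `alignment` (both in heap words).
constexpr std::size_t align_up_words(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// Assumed model: the smallest filler object is a bare header,
// rounded up to the object alignment.
constexpr std::size_t min_fill_words(std::size_t header_words,
                                     std::size_t align_words) {
  return align_up_words(header_words, align_words);
}

// The gap is one alignment unit; it needs special handling only
// when it is smaller than the smallest possible filler.
constexpr bool gap_needs_fill(std::size_t header_words,
                              std::size_t align_words) {
  return align_words < min_fill_words(header_words, align_words);
}
```

In this model a two-word header with one-word alignment yields a gap smaller than the filler, so filling is needed. A one-word header rounds up to exactly one alignment unit for any alignment, so the gap always equals the minimum filler, which would make the early return safe for larger alignments too.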

src/hotspot/share/oops/compressedKlass.cpp line 231:

> 229:     // The reason is that we want to avoid, if possible, shifts larger than
> 230:     // a cacheline size.
> 231:     _base = addr;

Why is this important?

src/hotspot/share/oops/compressedKlass.hpp line 261:

> 259:   }
> 260: 
> 261: };

Missing blank line before `#endif`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766185665
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766192688
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766193355

From stefank at openjdk.org  Thu Sep 19 05:53:49 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 19 Sep 2024 05:53:49 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
Message-ID: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com>

On Thu, 19 Sep 2024 05:35:34 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JVMCI support
>
> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787:
> 
>> 785:     // The gap is always equal to min-fill-size, so nothing to do.
>> 786:     return;
>> 787:   }
> 
> Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value:
> 
> void PSParallelCompact::fill_dense_prefix_end(SpaceId id) {
>   // Comparing two sizes to decide if filling is required:
>   //
>   // The size of the filler (min-obj-size) is 2 heap words with the default
>   // MinObjAlignment, since both markword and klass take 1 heap word.
>   //
>   // The size of the gap (if any) right before dense-prefix-end is
>   // MinObjAlignment.
>   //
>   // Need to fill in the gap only if it's smaller than min-obj-size, and the
>   // filler obj will extend to next region.
> 
>   // Note: If min-fill-size decreases to 1, this whole method becomes redundant.
>   if (UseCompactObjectHeaders) {
>     // The gap is always equal to min-fill-size, so nothing to do.
>     return;
>   }
>   assert(CollectedHeap::min_fill_size() >= 2, "inv");

Style note: The added code is inserted between a comment and the code that the comment refers to. It would be nice to tidy this up.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766186545

From shade at openjdk.org  Thu Sep 19 05:58:38 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 19 Sep 2024 05:58:38 GMT
Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in
 is_gc_barrier_node
In-Reply-To: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
References: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
Message-ID: <pYs0mZTeChFp6f7DfK1cU45Bx4-PM3ddUJxqsgFK-TA=.e062b4d5-d26e-409a-ad58-91a6addfb30d@github.com>

On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The name of the call we emit is "shenandoah_clone":
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806
> 
> ...yet we test for "shenandoah_clone_barrier" here:
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688
> 
> I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC`

Need another review here. @rkennke, maybe?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21014#issuecomment-2360040048

From shade at openjdk.org  Thu Sep 19 08:31:35 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 19 Sep 2024 08:31:35 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
Message-ID: <1m4c0G4GgF_uHGxwKvqmilfGjIv1qqsvhCZX3VfKvbo=.5628bbd0-f0c2-4efd-a9f0-c43c0a8ccc64@github.com>

On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
> 
> Adopt shared implementation.

Ah, cool. Thanks.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2314809810

From shade at openjdk.org  Thu Sep 19 08:45:36 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 19 Sep 2024 08:45:36 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
Message-ID: <8ML4TaXKk8S8zA_MiOfZyniZqVP7uyQMaY2SRw5Nsow=.490b258d-3731-4fca-bfb1-07439da5a1a3@github.com>

On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
> 
> Adopt shared implementation.

@earthling-amzn, @kdnilsen, @ysramakrishna -- your turn :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21077#issuecomment-2360383565

From mli at openjdk.org  Thu Sep 19 10:32:50 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 19 Sep 2024 10:32:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
Message-ID: <pOeT98rm-lCu2o8_7HzKpkA7iC800HUxgaJMNTD6K5Q=.03ff309e-c28c-4945-ad71-4462ed1b7f67@github.com>

On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JVMCI support
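
The quoted summary places the compressed Klass* in the upper 22 bits of the mark-word. A minimal sketch of what reading and writing those bits could look like follows. It assumes a 64-bit mark-word; the constant and function names are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

constexpr int kMarkWordBits = 64;  // assumption: 64-bit mark-word
constexpr int kKlassBits = 22;     // from the summary: upper 22 bits
constexpr int kKlassShift = kMarkWordBits - kKlassBits;  // 42

// Extract the compressed Klass* from the top of the mark-word.
constexpr std::uint32_t narrow_klass_from_mark(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Install a compressed Klass* (assumed to fit in 22 bits) while
// leaving the lower 42 bits of the mark-word untouched.
constexpr std::uint64_t mark_with_narrow_klass(std::uint64_t mark,
                                               std::uint32_t narrow_klass) {
  return (mark & ((std::uint64_t{1} << kKlassShift) - 1)) |
         (static_cast<std::uint64_t>(narrow_klass) << kKlassShift);
}
```

The point of the sketch is that any code overwriting the whole mark-word (for example, classic GC forwarding) would destroy the class bits, which is why the summary says forwarding has to protect them.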

src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529:

> 2527:     }
> 2528:     __ decode_klass_not_null(result);
> 2529:   } else {

Could this if/else block be replaced with a simple call of load_klass(...)?

src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522:

> 3520:   {
> 3521:     __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes()));
> 3522:   }

Could this if/else block be replaced with a simple call of load_klass(...)?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766587136
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766582255

From rcastanedalo at openjdk.org  Thu Sep 19 11:02:50 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 19 Sep 2024 11:02:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <mX0ou3n5MzR_tt1nnuLy-cFYj_85cfCK-LHR6KZ-Uwk=.4fca751d-a06a-4c2d-8bb2-eba51d435ef2@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <mX0ou3n5MzR_tt1nnuLy-cFYj_85cfCK-LHR6KZ-Uwk=.4fca751d-a06a-4c2d-8bb2-eba51d435ef2@github.com>
Message-ID: <gPJJkZqhS0S4vgAqP_FoUfNEIAfw_f53uU_kID1N1P8=.a4433f1f-3b2c-4c00-a074-ab650b81b38b@github.com>

On Mon, 16 Sep 2024 08:04:43 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.

What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360673405

From yzheng at openjdk.org  Thu Sep 19 11:12:49 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Thu, 19 Sep 2024 11:12:49 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
 <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>
 <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>
Message-ID: <XBxfeOy1JiEESlmCA3u1_zYc7ss5dUkSzYdJSI1XJJo=.948eb061-5051-4781-a3f0-60daaf58419d@github.com>

On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Yes, I saw that patch.  I'm not sure I like the idea of CPU-dependent code also doing the encoding.  There were some C2 changes related to it, and I couldn't tell whether that scheme required them.  I don't see the downside to having the prototype header pre-encoded in the markWord.  Seems simpler.
>
> We already have CPU-dependent code for both C1 and the interpreter. Adding CPU-dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, the interpreter, and C2 all call into shared functions in the macro assembler.

Could you please point me to the C2 change? Is it going to be integrated in this PR? We in Graal have not yet adopted `Klass::_prototype_header` and will hold off if you decide to get rid of it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766642585

From stuefe at openjdk.org  Thu Sep 19 11:39:50 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 19 Sep 2024 11:39:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <DDy_r4_vlMDIhKmbAYCyQj4Uu43soArfL9tAJoDX9-I=.b14baeb8-a7ef-4144-8a9c-de8d733e1c4d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <DDy_r4_vlMDIhKmbAYCyQj4Uu43soArfL9tAJoDX9-I=.b14baeb8-a7ef-4144-8a9c-de8d733e1c4d@github.com>
Message-ID: <mN9d7VoyWJaDzLQFV0fcTDlYGsLuRpVDGU1Q03N4W5M=.182fb997-ffb1-4445-a9cb-54abf5daa309@github.com>

On Wed, 18 Sep 2024 23:49:34 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JVMCI support
>
> src/hotspot/share/oops/compressedKlass.cpp line 242:
> 
>> 240:   } else {
>> 241: 
>> 242:     // Traditional (non-compact) header mode)
> 
> Extra )

Will fix

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766676702

From rkennke at openjdk.org  Thu Sep 19 11:52:34 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 19 Sep 2024 11:52:34 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v22]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <0mWQW50x4UNwdsRE94w3rZVGnppxQeR9fbe4eUrAGtM=.cca89805-ca82-4605-bc11-4f9ac53d2b90@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
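
The "40 bits, up to 8TB" figure for full GC forwarding in the list above is consistent with encoding the forwarding target as an offset in 8-byte heap words from the heap base, similar to compressed oops. The sketch below works through that arithmetic. The word granularity is an assumption inferred from the numbers, and all names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kForwardingBits = 40;  // from the summary
constexpr std::uint64_t kHeapWordBytes = 8;    // assumption: 8-byte heap words

// Largest heap whose every word can be named by a 40-bit word offset.
constexpr std::uint64_t max_encodable_heap_bytes() {
  return (std::uint64_t{1} << kForwardingBits) * kHeapWordBytes;
}

// Encode a word-aligned address as a word offset from the heap base.
constexpr std::uint64_t encode_forwarding(std::uint64_t heap_base,
                                          std::uint64_t addr) {
  return (addr - heap_base) / kHeapWordBytes;
}

// Recover the address from the encoded word offset.
constexpr std::uint64_t decode_forwarding(std::uint64_t heap_base,
                                          std::uint64_t encoded) {
  return heap_base + encoded * kHeapWordBytes;
}
```

With these assumptions, 2^40 words of 8 bytes each give 2^43 bytes, i.e. 8 TiB, which matches the limit stated in the summary.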

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Simplify LIR_Assembler::emit_load_klass()

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/9ad2e62f..b25a4b69

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20-21

  Stats: 28 lines in 2 files changed: 0 ins; 26 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rkennke at openjdk.org  Thu Sep 19 11:52:34 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 19 Sep 2024 11:52:34 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <gPJJkZqhS0S4vgAqP_FoUfNEIAfw_f53uU_kID1N1P8=.a4433f1f-3b2c-4c00-a074-ab650b81b38b@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <mX0ou3n5MzR_tt1nnuLy-cFYj_85cfCK-LHR6KZ-Uwk=.4fca751d-a06a-4c2d-8bb2-eba51d435ef2@github.com>
 <gPJJkZqhS0S4vgAqP_FoUfNEIAfw_f53uU_kID1N1P8=.a4433f1f-3b2c-4c00-a074-ab650b81b38b@github.com>
Message-ID: <kOLQ21p1NoD9_mYCqEo1W1oUjXx1ITxgMAPP_WvKJOM=.21b5935e-d183-4bda-aaf4-286ab3b4a6c4@github.com>

On Thu, 19 Sep 2024 11:00:20 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.
> 
> What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers?

Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360756796

From rkennke at openjdk.org  Thu Sep 19 11:52:37 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 19 Sep 2024 11:52:37 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <pOeT98rm-lCu2o8_7HzKpkA7iC800HUxgaJMNTD6K5Q=.03ff309e-c28c-4945-ad71-4462ed1b7f67@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <pOeT98rm-lCu2o8_7HzKpkA7iC800HUxgaJMNTD6K5Q=.03ff309e-c28c-4945-ad71-4462ed1b7f67@github.com>
Message-ID: <qt6n1HsHLaG2loSXY0pzBhnSDHGp6owvMZ0cM7DXTP4=.7575154e-2d7f-41bd-beb2-40442f6154ba@github.com>

On Thu, 19 Sep 2024 10:29:11 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JVMCI support
>
> src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529:
> 
>> 2527:     }
>> 2528:     __ decode_klass_not_null(result);
>> 2529:   } else {
> 
> Could this if/else block be replaced with a simple call of load_klass(...)?

Yes, will do.

> src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522:
> 
>> 3520:   {
>> 3521:     __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes()));
>> 3522:   }
> 
> Could this if/else block be replaced with a simple call of load_klass(...)?

Yes, will do.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689169
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689004

From stuefe at openjdk.org  Thu Sep 19 11:52:38 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 19 Sep 2024 11:52:38 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
Message-ID: <spymZDxJXwseFGxL78zfn4V8qe9NpX3gifTgd_eQkiM=.d1af7cf3-b503-4c35-93bb-db00f3c4cfc8@github.com>

On Thu, 19 Sep 2024 05:44:42 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JVMCI support
>
> src/hotspot/share/oops/compressedKlass.cpp line 231:
> 
>> 229:     // The reason is that we want to avoid, if possible, shifts larger than
>> 230:     // a cacheline size.
>> 231:     _base = addr;
> 
> Why is this important?

It lessens the cache effects of Klass hyperaligning.

> src/hotspot/share/oops/compressedKlass.hpp line 261:
> 
>> 259:   }
>> 260: 
>> 261: };
> 
> Missing blank line before `#endif`

Fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684016
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684491

From stuefe at openjdk.org  Thu Sep 19 11:52:39 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 19 Sep 2024 11:52:39 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <spymZDxJXwseFGxL78zfn4V8qe9NpX3gifTgd_eQkiM=.d1af7cf3-b503-4c35-93bb-db00f3c4cfc8@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
 <spymZDxJXwseFGxL78zfn4V8qe9NpX3gifTgd_eQkiM=.d1af7cf3-b503-4c35-93bb-db00f3c4cfc8@github.com>
Message-ID: <U1HQ2ijNoadMNlrW_Adxkju4fM8kfAB4hn-a_gkPrnA=.7e0fee38-8534-4dcc-8aac-77d6559687c8@github.com>

On Thu, 19 Sep 2024 11:43:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/oops/compressedKlass.cpp line 231:
>> 
>>> 229:     // The reason is that we want to avoid, if possible, shifts larger than
>>> 230:     // a cacheline size.
>>> 231:     _base = addr;
>> 
>> Why is this important?
>
> It lessens the cache effects of Klass hyperaligning.

Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10.
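
As a rough illustration of what a fixed shift of 10 would imply, here is a minimal sketch. It assumes a 22-bit narrow Klass id (taken from the "upper 22 bits" mentioned in the PR description); the function names are hypothetical and not part of HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes addressable by a narrow Klass id of
// `bits` bits that is decoded with a left shift of `shift`.
inline uint64_t encoding_range_bytes(unsigned bits, unsigned shift) {
  assert(bits + shift < 64);
  return uint64_t{1} << (bits + shift);
}

// Hypothetical helper: Klass alignment (in bytes) implied by a given shift.
inline uint64_t klass_alignment_bytes(unsigned shift) {
  assert(shift < 64);
  return uint64_t{1} << shift;
}
```

Under these assumptions, `encoding_range_bytes(22, 10)` is 4 GiB and `klass_alignment_bytes(10)` is 1024 bytes, which is the "hyperaligning" referred to above.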

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766688756

From stuefe at openjdk.org  Thu Sep 19 11:52:40 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 19 Sep 2024 11:52:40 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <DDy_r4_vlMDIhKmbAYCyQj4Uu43soArfL9tAJoDX9-I=.b14baeb8-a7ef-4144-8a9c-de8d733e1c4d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <DDy_r4_vlMDIhKmbAYCyQj4Uu43soArfL9tAJoDX9-I=.b14baeb8-a7ef-4144-8a9c-de8d733e1c4d@github.com>
Message-ID: <NrRhACMXbA72b5jdZo12kOK8euCDLSLJgRBDGRUmVoM=.ae5bbb9d-b9f5-4b89-a39d-424ecb8031f0@github.com>

On Wed, 18 Sep 2024 23:53:28 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JVMCI support
>
> src/hotspot/share/oops/compressedKlass.hpp line 175:
> 
>> 173:   //   5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding
>> 174:   //       base that differs from the reservation base from step (4). That allows us, e.g., to later use
>> 175:   //       zero-based encoding.
> 
> Not for this but is there really any benefit for zero based encoding for klass ids?

Yes, I think so. I think the SAP JIT people investigated this when doing the PPC ports. You save at least two instructions, and possibly more, per decode op. You save code size too, since you don't need to materialize the 64-bit base immediate. Especially on x64, this can easily mean 11 fewer bytes.
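
To illustrate where the saving comes from, here is a minimal sketch of the two decode shapes. The struct and function names are hypothetical (this is not HotSpot's actual `CompressedKlassPointers` interface), and real instruction counts depend on the platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an encoding scheme, for illustration only.
struct KlassEncoding {
  uint64_t base;   // encoding base address (0 for zero-based encoding)
  unsigned shift;  // log2 of the Klass alignment
};

// General decode: shift the narrow value, then add the 64-bit base.
// The base has to be materialized as a constant or kept in a register.
inline uint64_t decode(const KlassEncoding& enc, uint32_t narrow_klass) {
  assert(enc.shift < 32);
  return enc.base + (static_cast<uint64_t>(narrow_klass) << enc.shift);
}

// Zero-based decode: only the shift remains; no base constant is needed.
inline uint64_t decode_zero_based(unsigned shift, uint32_t narrow_klass) {
  assert(shift < 32);
  return static_cast<uint64_t>(narrow_klass) << shift;
}
```

With a base of zero both functions yield the same address; the second simply omits the add and the base constant.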

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766681110

From stuefe at openjdk.org  Thu Sep 19 11:52:42 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 19 Sep 2024 11:52:42 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v18]
In-Reply-To: <sXx4wxL6nlbJFQXLjN83xKY43ZZ1PyuLwk32st0Dfhg=.df5ee7f3-9c92-41d4-a80b-6e7291619a23@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <tzW1ygFfWY40OR9lB3z0gVv7H9WKJPbFCZmR0F23-_k=.de6459ea-024c-48de-a062-0136d0af0306@github.com>
 <sXx4wxL6nlbJFQXLjN83xKY43ZZ1PyuLwk32st0Dfhg=.df5ee7f3-9c92-41d4-a80b-6e7291619a23@github.com>
Message-ID: <ZFg-_gMExHPrQiClx2MKLQpk9VN-iemvFgkXxbH8JIE=.35110abe-dd9e-4d41-8d41-dda7b6eae89a@github.com>

On Tue, 17 Sep 2024 10:36:58 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits:
>> 
>>  - fix CompressedClassPointersEncodingScheme yet again for linux aarch64
>>  - Fixes post-8340184
>>  - Merge upstream up to and including 8340184
>>  - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
>>  - Fix  test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
>>  - Fix loop on aarch64
>>  - clarify obscure assert in metasapce setup
>>  - Rework compressedklass encoding
>>  - remove stray debug output
>>  - Fixes post 8338526
>>  - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed
>
> test/hotspot/gtest/metaspace/test_clms.cpp line 193:
> 
>> 191: 
>> 192:       {
>> 193:         // Nonclass arena allocation.
> 
> The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow up RFE.

Okay, will fix

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766686807

From rkennke at openjdk.org  Thu Sep 19 11:57:52 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 19 Sep 2024 11:57:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
 <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>
 <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>
Message-ID: <ubcY5Q-aPRCLmhZV9MjBfzzbs84FegDvNkHx8nwPUd0=.46503fd8-d274-4558-bd0f-cc0c4f199667@github.com>

On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Yes, I saw that patch.  I'm not sure I like the idea of cpu dependent code also doing the encoding.  There were some C2 changes related to it that I didn't understand if that scheme required them.  I don't see the down side to having the prototype header pre-encoded in the markWord.  Seems simpler.
>
> We already have cpu-dependent code for both C1 and the interpreter. Adding cpu-dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, the interpreter, and C2 all call into shared functions in the macro assembler.

We haven't decided whether we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766697849

From rkennke at openjdk.org  Thu Sep 19 12:08:46 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 19 Sep 2024 12:08:46 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v23]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
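
The 8TB limit for full GC forwarding follows from simple arithmetic, sketched below under the assumption that forwarding offsets are counted in 8-byte units (that unit size is an assumption, not stated in the description). All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Largest heap size (in bytes) that can be covered when offsets are stored in
// `bits` bits and counted in units of `unit_bytes` (assumed: 8-byte units).
inline uint64_t max_encodable_heap_bytes(unsigned bits, uint64_t unit_bytes) {
  assert(bits < 64);
  return (uint64_t{1} << bits) * unit_bytes;
}

// Encode a forwardee address as a unit-sized offset from the heap base.
inline uint64_t encode_forwardee(uint64_t heap_base, uint64_t addr, uint64_t unit_bytes) {
  assert(addr >= heap_base && (addr - heap_base) % unit_bytes == 0);
  return (addr - heap_base) / unit_bytes;
}

// Decode an offset back into an address.
inline uint64_t decode_forwardee(uint64_t heap_base, uint64_t encoded, uint64_t unit_bytes) {
  return heap_base + encoded * unit_bytes;
}
```

With 40 bits and 8-byte units, `max_encodable_heap_bytes(40, 8)` is 2^43 bytes, i.e. 8 TiB.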

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4
 - review feedback

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/b25a4b69..0d8a9236

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21-22

  Stats: 10 lines in 3 files changed: 1 ins; 4 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From shade at openjdk.org  Thu Sep 19 12:20:39 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 19 Sep 2024 12:20:39 GMT
Subject: RFR: 8340400: Shenandoah: Whitebox breakpoint GC requests may
 cause assertions
In-Reply-To: <eFIBip-WT-4JsLJIJ9F-L66700DZjmakU3P4ZRcWjL0=.712ae3a2-65c6-4f78-a2f5-ab8a92a8aa3a@github.com>
References: <eFIBip-WT-4JsLJIJ9F-L66700DZjmakU3P4ZRcWjL0=.712ae3a2-65c6-4f78-a2f5-ab8a92a8aa3a@github.com>
Message-ID: <K7fewLuXp389LoFdjN54N4mIBRoUJZRKQqrX7f_gK8c=.0821d3a2-d978-4682-baf5-6eeee97593d7@github.com>

On Wed, 18 Sep 2024 21:02:23 GMT, William Kemper <wkemper at openjdk.org> wrote:

> When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the request just has the caller thread run in a busy loop without waiting. What's more, this loop resets the requested gc cause on every iteration, which may lead to gc cycles with a wb_breakpoint cause, but no breakpoint set - which violates assertions.

Yeah, this makes sense. Any tests fail without this patch?

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21074#pullrequestreview-2315363086

From coleenp at openjdk.org  Thu Sep 19 12:38:48 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 19 Sep 2024 12:38:48 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <U1HQ2ijNoadMNlrW_Adxkju4fM8kfAB4hn-a_gkPrnA=.7e0fee38-8534-4dcc-8aac-77d6559687c8@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
 <spymZDxJXwseFGxL78zfn4V8qe9NpX3gifTgd_eQkiM=.d1af7cf3-b503-4c35-93bb-db00f3c4cfc8@github.com>
 <U1HQ2ijNoadMNlrW_Adxkju4fM8kfAB4hn-a_gkPrnA=.7e0fee38-8534-4dcc-8aac-77d6559687c8@github.com>
Message-ID: <CNrcAK8SrGCzsreT3SdoxaoGaqKqGb989t-waCJUz64=.0662bccb-661f-4dc9-b8b6-24d615035261@github.com>

On Thu, 19 Sep 2024 11:47:21 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> It lessens the cache effects of Klass hyperaligning.
>
> Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10.

Yes, please, not having this code would be really nice.  This is difficult code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766753081

From rcastanedalo at openjdk.org  Thu Sep 19 13:12:49 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 19 Sep 2024 13:12:49 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <kOLQ21p1NoD9_mYCqEo1W1oUjXx1ITxgMAPP_WvKJOM=.21b5935e-d183-4bda-aaf4-286ab3b4a6c4@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <mX0ou3n5MzR_tt1nnuLy-cFYj_85cfCK-LHR6KZ-Uwk=.4fca751d-a06a-4c2d-8bb2-eba51d435ef2@github.com>
 <gPJJkZqhS0S4vgAqP_FoUfNEIAfw_f53uU_kID1N1P8=.a4433f1f-3b2c-4c00-a074-ab650b81b38b@github.com>
 <kOLQ21p1NoD9_mYCqEo1W1oUjXx1ITxgMAPP_WvKJOM=.21b5935e-d183-4bda-aaf4-286ab3b4a6c4@github.com>
Message-ID: <0gatRiYQ3frDnMftpb_WaDolUwcYvBFh5hAp6jY0dzQ=.21d6518e-7217-477e-954f-69fd52eb713e@github.com>

On Thu, 19 Sep 2024 11:42:04 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> > > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.
> > 
> > 
> > What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers?
> 
> Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work?

Done: https://bugs.openjdk.org/browse/JDK-8340453.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360945827

From stefank at openjdk.org  Thu Sep 19 13:12:50 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 19 Sep 2024 13:12:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <CNrcAK8SrGCzsreT3SdoxaoGaqKqGb989t-waCJUz64=.0662bccb-661f-4dc9-b8b6-24d615035261@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
 <spymZDxJXwseFGxL78zfn4V8qe9NpX3gifTgd_eQkiM=.d1af7cf3-b503-4c35-93bb-db00f3c4cfc8@github.com>
 <U1HQ2ijNoadMNlrW_Adxkju4fM8kfAB4hn-a_gkPrnA=.7e0fee38-8534-4dcc-8aac-77d6559687c8@github.com>
 <CNrcAK8SrGCzsreT3SdoxaoGaqKqGb989t-waCJUz64=.0662bccb-661f-4dc9-b8b6-24d615035261@github.com>
Message-ID: <YyPrYu-6p5wcUwfMQQamwoM1eGp6eUze_yC1lrc7mb0=.f1e5853b-3b6e-4460-8f57-a6f7588e513e@github.com>

On Thu, 19 Sep 2024 12:35:30 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10.
>
> Yes, please, not having this code would be really nice.  This is difficult code.

Do you see any effects of this in anything other than specially crafted micro benchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766804699

From stuefe at openjdk.org  Thu Sep 19 13:37:52 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 19 Sep 2024 13:37:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v21]
In-Reply-To: <YyPrYu-6p5wcUwfMQQamwoM1eGp6eUze_yC1lrc7mb0=.f1e5853b-3b6e-4460-8f57-a6f7588e513e@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <TCFshDmOrJjN0zTh1X9BGEa_rczyMqRbyDIiYsFr_5I=.ea051371-4c86-461e-8721-83b32caf808f@github.com>
 <qPL-XuWVkfNYFmfYatyQZwbHismNzW_jXdvkDvIiVNc=.d4fd26a8-f509-494a-bd3d-0c6f37ba5269@github.com>
 <spymZDxJXwseFGxL78zfn4V8qe9NpX3gifTgd_eQkiM=.d1af7cf3-b503-4c35-93bb-db00f3c4cfc8@github.com>
 <U1HQ2ijNoadMNlrW_Adxkju4fM8kfAB4hn-a_gkPrnA=.7e0fee38-8534-4dcc-8aac-77d6559687c8@github.com>
 <CNrcAK8SrGCzsreT3SdoxaoGaqKqGb989t-waCJUz64=.0662bccb-661f-4dc9-b8b6-24d615035261@github.com>
 <YyPrYu-6p5wcUwfMQQamwoM1eGp6eUze_yC1lrc7mb0=.f1e5853b-3b6e-4460-8f57-a6f7588e513e@github.com>
Message-ID: <Na5LWOz_h0ZGmLcETAxDA9wYiEopgWfvNTbYAELExR8=.e30fe7a0-c766-4bb4-aebb-fc6643c832c1@github.com>

On Thu, 19 Sep 2024 13:08:43 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Yes, please, not having this code would be really nice.  This is difficult code.
>
> Do you see any effects of this in anything other than specially crafted micro benchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput.

I will do some benchmarks

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766848371

From zgu at openjdk.org  Thu Sep 19 14:05:56 2024
From: zgu at openjdk.org (Zhengyu Gu)
Date: Thu, 19 Sep 2024 14:05:56 GMT
Subject: RFR: 8339668: Parallel: Adopt PartialArrayState to consolidate marking
 stack in compact GC
Message-ID: <SpLKo44sGBtJ8p0qZGh6NCl2ABe0ezh8l645vr6R3xM=.6615b830-f3aa-4c25-a0b4-1ab94a66b3d9@github.com>

Please review this patch that adopts `PartialArrayState`, introduced by [JDK-8337709](https://bugs.openjdk.org/browse/JDK-8337709), to consolidate `_oop_task_queues` and `_objarray_task_queues` into a single `_marking_stacks`.

The change mirrors Kim's [JDK-8311163](https://bugs.openjdk.org/browse/JDK-8311163) work; therefore, there are methods that can be consolidated and simplified, but I would like to defer that to a follow-up CR.

-------------

Commit messages:
 - v7
 - v6
 - v5
 - v4
 - v3
 - Correct marking stride
 - v2 - tq stats
 - v1
 - v0

Changes: https://git.openjdk.org/jdk/pull/21089/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21089&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339668
  Stats: 262 lines in 5 files changed: 152 ins; 44 del; 66 mod
  Patch: https://git.openjdk.org/jdk/pull/21089.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21089/head:pull/21089

PR: https://git.openjdk.org/jdk/pull/21089

From stefank at openjdk.org  Thu Sep 19 14:25:52 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 19 Sep 2024 14:25:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <ubcY5Q-aPRCLmhZV9MjBfzzbs84FegDvNkHx8nwPUd0=.46503fd8-d274-4558-bd0f-cc0c4f199667@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
 <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>
 <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>
 <ubcY5Q-aPRCLmhZV9MjBfzzbs84FegDvNkHx8nwPUd0=.46503fd8-d274-4558-bd0f-cc0c4f199667@github.com>
Message-ID: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com>

On Thu, 19 Sep 2024 11:54:50 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> We already have cpu-dependent code for both C1 and the interpreter. Adding cpu-dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, the interpreter, and C2 all call into shared functions in the macro assembler.
>
> We haven't decided whether we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful.

This is my current work-in-progress code:
https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2

I've made some large rewrites and am currently running it through functional testing.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766934571

From mli at openjdk.org  Thu Sep 19 15:03:53 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 19 Sep 2024 15:03:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v23]
In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
Message-ID: <cL1TRIR8QmiIxhVU8LmxcEjUcneCNyLOh7ik4fgmIcM=.abc5dea2-6d92-4b6e-9e3f-01137441d02e@github.com>

On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4
>  - review feedback

In both aarch64.ad and x86_64.ad, `MachUEPNode::format` might need to change accordingly?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2361266175

From rcastanedalo at openjdk.org  Thu Sep 19 17:23:50 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 19 Sep 2024 17:23:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <poutu1_cWKZJS_aUjdHdxXdoYBb0nS5Zohb1-_UwFiY=.403c84c9-a971-4b16-b436-e34cc9654321@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com>
 <poutu1_cWKZJS_aUjdHdxXdoYBb0nS5Zohb1-_UwFiY=.403c84c9-a971-4b16-b436-e34cc9654321@github.com>
Message-ID: <IvQ2g_HP080mVZMQYJM3SCnjeAwZnJHfDHGUzwkXpR4=.053ed2be-6e53-4b91-b048-8efb57645c9c@github.com>

On Wed, 18 Sep 2024 12:08:46 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576:
>> 
>>> 2574:   } else {
>>> 2575:     lea(dst, Address(obj, index, Address::lsl(scale)));
>>> 2576:     ldr(dst, Address(dst, offset));
>> 
>> Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well?
>
> AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like  r27[nklass]+offset, that's why we need to lea the r27[nklass] part first.
> Yes, this also happens on x86, but x86 supports  rX[nklass]+offset addressing.

Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but unfortunately could not reach this code. Could you provide a reproducer? I am particularly interested because I'd like to find out whether there could be any problematic interaction with C2's implicit null check optimization.
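
For readers following along, the addressing constraint described in the quoted reply can be sketched as plain address arithmetic. This is only an illustration: the function names are hypothetical, and the shift parameter stands for the scaling applied to the index (a scale of 8 would correspond to a shift of 3, which is an assumption about how the quoted "scale=8" maps onto `Address::lsl`).

```cpp
#include <cassert>
#include <cstdint>

// One-step form, as x86 addressing can express it: base + (index << shift) + offset.
inline uint64_t address_one_step(uint64_t base, uint64_t index, unsigned shift, uint64_t offset) {
  assert(shift < 64);
  return base + (index << shift) + offset;
}

// Two-step form, mirroring the lea + ldr pair quoted above: first compute
// base + (index << shift) into a temporary, then apply the offset on the load.
inline uint64_t address_two_step(uint64_t base, uint64_t index, unsigned shift, uint64_t offset) {
  assert(shift < 64);
  uint64_t tmp = base + (index << shift);  // corresponds to the lea
  return tmp + offset;                     // corresponds to the ldr with an immediate offset
}
```

Both forms compute the same address; the difference is only in how many instructions are needed to express it.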

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1767315114

From wkemper at openjdk.org  Thu Sep 19 17:57:38 2024
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 19 Sep 2024 17:57:38 GMT
Subject: RFR: 8340400: Shenandoah: Whitebox breakpoint GC requests may
 cause assertions
In-Reply-To: <eFIBip-WT-4JsLJIJ9F-L66700DZjmakU3P4ZRcWjL0=.712ae3a2-65c6-4f78-a2f5-ab8a92a8aa3a@github.com>
References: <eFIBip-WT-4JsLJIJ9F-L66700DZjmakU3P4ZRcWjL0=.712ae3a2-65c6-4f78-a2f5-ab8a92a8aa3a@github.com>
Message-ID: <yPlca617eBiAIUar11blTD8XYHNpOk9jbGFa2sJ2gLo=.349e1c34-8435-4af1-b5eb-aaa44426c4f9@github.com>

On Wed, 18 Sep 2024 21:02:23 GMT, William Kemper <wkemper at openjdk.org> wrote:

> When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the request just has the caller thread run in a busy loop without waiting. What's more, this loop resets the requested gc cause on every iteration, which may lead to gc cycles with a wb_breakpoint cause, but no breakpoint set - which violates assertions.

TestReferenceShortcutCycle and TestReferenceRefersToShenandoah would fail occasionally in the generational mode. I believe the generational mode was more susceptible to the issue because of differences in the generational mode controller. I don't recall seeing test failures in upstream, but as I read the code I believe the issue _could_ happen to other Shenandoah modes (or otherwise cause tests using whitebox breakpoints to behave in unexpected ways).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21074#issuecomment-2361831211

From wkemper at openjdk.org  Thu Sep 19 17:57:39 2024
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 19 Sep 2024 17:57:39 GMT
Subject: Integrated: 8340400: Shenandoah: Whitebox breakpoint GC requests may
 cause assertions
In-Reply-To: <eFIBip-WT-4JsLJIJ9F-L66700DZjmakU3P4ZRcWjL0=.712ae3a2-65c6-4f78-a2f5-ab8a92a8aa3a@github.com>
References: <eFIBip-WT-4JsLJIJ9F-L66700DZjmakU3P4ZRcWjL0=.712ae3a2-65c6-4f78-a2f5-ab8a92a8aa3a@github.com>
Message-ID: <IPTaYJc_qvzOiTde82xfhwxHbflXJloopTqQRIWZbQo=.7b7c9699-c997-4147-b283-f80645ac5c79@github.com>

On Wed, 18 Sep 2024 21:02:23 GMT, William Kemper <wkemper at openjdk.org> wrote:

> When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the request just has the caller thread run in a busy loop without waiting. What's more, this loop resets the requested gc cause on every iteration, which may lead to gc cycles with a wb_breakpoint cause, but no breakpoint set - which violates assertions.

This pull request has now been integrated.

Changeset: 75d5e117
Author:    William Kemper <wkemper at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/75d5e117770590d2432fcfe8d89734c7038d4e55
Stats:     13 lines in 1 file changed: 10 ins; 2 del; 1 mod

8340400: Shenandoah: Whitebox breakpoint GC requests may cause assertions

Reviewed-by: shade

-------------

PR: https://git.openjdk.org/jdk/pull/21074

From wkemper at openjdk.org  Thu Sep 19 21:50:35 2024
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 19 Sep 2024 21:50:35 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
Message-ID: <b8j5T3wNtPN9ysHoj8pDP3c3oWWnWG25E1gDGk1SpIs=.fc6b14f8-af4b-4dc1-a624-044957b4e61a@github.com>

On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
> 
> Adopt shared implementation.

Thanks for this! Can we use the labels as requested in the review comments?

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 228:

> 226:   finish_mark_work();
> 227:   assert(task_queues()->is_empty(), "Should be empty");
> 228:   TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats(""));

Could we pass `"Finish Mark"` for the label here?

src/hotspot/share/gc/shenandoah/shenandoahSTWMark.cpp line 136:

> 134: 
> 135:   assert(task_queues()->is_empty(), "Should be empty");
> 136:   TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats(""));

Could we pass `"Mark"` for the label here?

-------------

Changes requested by wkemper (Committer).

PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2316808410
PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1767638510
PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1767638220

From rkennke at openjdk.org  Fri Sep 20 12:33:50 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 20 Sep 2024 12:33:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <IvQ2g_HP080mVZMQYJM3SCnjeAwZnJHfDHGUzwkXpR4=.053ed2be-6e53-4b91-b048-8efb57645c9c@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com>
 <poutu1_cWKZJS_aUjdHdxXdoYBb0nS5Zohb1-_UwFiY=.403c84c9-a971-4b16-b436-e34cc9654321@github.com>
 <IvQ2g_HP080mVZMQYJM3SCnjeAwZnJHfDHGUzwkXpR4=.053ed2be-6e53-4b91-b048-8efb57645c9c@github.com>
Message-ID: <n5SsWdtVlbLQeGu2xxc39RFOkpcMUBUb16rd7IaZEAY=.a4d84480-b40f-4d74-9314-6b47c6551c20@github.com>

On Thu, 19 Sep 2024 17:20:36 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like  r27[nklass]+offset, that's why we need to lea the r27[nklass] part first.
>> Yes, this also happens on x86, but x86 supports  rX[nklass]+offset addressing.
>
> Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization.

I tried to reproduce for a few hours now using a custom testcase, with no success.
I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know.
I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further.

For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738
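
To make the addressing discussion concrete, here is a small Java model of the effective-address arithmetic. It is a sketch, not HotSpot code: the class and method names are hypothetical, the heap base, narrow value, and offset are made-up numbers, and it assumes the address has the form base + (index << scale) + offset. It only shows that the two-step `lea` + `ldr` sequence arrives at the same address as a single base+index+offset operand.

```java
// Illustrative model only; names are hypothetical and this is not HotSpot code.
final class AddressSplitSketch {
    // Single-operand form: base + (index << scale) + offset,
    // the shape that x86 can encode directly in one instruction.
    static long combined(long base, long index, int scale, long offset) {
        return base + (index << scale) + offset;
    }

    // Two-step form, mirroring the lea + ldr sequence in the aarch64 snippet:
    //   lea dst, [base, index, lsl #scale]
    //   ldr dst, [dst, #offset]
    static long split(long base, long index, int scale, long offset) {
        long dst = base + (index << scale); // lea: materialize base + scaled index
        return dst + offset;                // ldr: address used by the load
    }

    public static void main(String[] args) {
        long heapBase = 0x8_0000_0000L; // assumed heap base, as if held in r27
        long narrow = 0x1234L;          // assumed narrow (compressed) value
        int scale = 3;                  // shift by 3, i.e. multiply by 8
        long offset = 8;                // assumed offset in bytes
        System.out.println(Long.toHexString(combined(heapBase, narrow, scale, offset)));
        System.out.println(Long.toHexString(split(heapBase, narrow, scale, offset)));
    }
}
```

In the split form only the second step touches memory, which is the property behind the question about implicit null checks that expect the first instruction to fault.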

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768538965

From rkennke at openjdk.org  Fri Sep 20 15:29:52 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 20 Sep 2024 15:29:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <n5SsWdtVlbLQeGu2xxc39RFOkpcMUBUb16rd7IaZEAY=.a4d84480-b40f-4d74-9314-6b47c6551c20@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com>
 <poutu1_cWKZJS_aUjdHdxXdoYBb0nS5Zohb1-_UwFiY=.403c84c9-a971-4b16-b436-e34cc9654321@github.com>
 <IvQ2g_HP080mVZMQYJM3SCnjeAwZnJHfDHGUzwkXpR4=.053ed2be-6e53-4b91-b048-8efb57645c9c@github.com>
 <n5SsWdtVlbLQeGu2xxc39RFOkpcMUBUb16rd7IaZEAY=.a4d84480-b40f-4d74-9314-6b47c6551c20@github.com>
Message-ID: <kBBMPkYzFAS7ne_MN9yQpW1f60peFJsaxLxnSAD0reo=.f7abc0d7-dddd-4c13-a27f-a58772f7ba96@github.com>

On Fri, 20 Sep 2024 12:31:18 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization.
>
> I tried to reproduce for a few hours now using a custom testcase, with no success.
> I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know.
> I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further.
> 
> For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738

Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though.
https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768816377

From matsaave at openjdk.org  Fri Sep 20 17:21:51 2024
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Fri, 20 Sep 2024 17:21:51 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v23]
In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
Message-ID: <tqr9Mxi8tb3Ng8WCDF6_mrCick03DcwMV6IQJVfcF64=.956103a6-0d69-4dac-a819-39be957cf188@github.com>

On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4
>  - review feedback

CDS changes look good! Have two style comments but otherwise this makes sense

-------------

Marked as reviewed by matsaave (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318793061

From matsaave at openjdk.org  Fri Sep 20 17:21:53 2024
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Fri, 20 Sep 2024 17:21:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
Message-ID: <prghUAoX81SehIWD3kZW80POXP-2FuDwqTO2XaVZFDo=.8861307e-4f8d-4690-ac08-fb2e1923471e@github.com>

On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix bit counts in GCForwarding

src/hotspot/share/cds/archiveBuilder.cpp line 677:

> 675:     // Allocate space for the future InstanceKlass with proper alignment
> 676:     const size_t alignment =
> 677: #ifdef _LP64

I think the text alignment here is a bit confusing. Should 678 and 682 be at the same indentation?

src/hotspot/share/cds/archiveUtils.cpp line 348:

> 346:   old_tag = (int)(intptr_t)nextPtr();
> 347:   // do_int(&old_tag);
> 348:   assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag);

Is this assert message change a leftover from debugging or is it meant to be this way?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768946883
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768923643

From coleenp at openjdk.org  Fri Sep 20 18:19:51 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 20 Sep 2024 18:19:51 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v23]
In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
Message-ID: <VjSp2vvwfzZWKtreZyiwjxsSblAX5JHCpF3c0mcYUHs=.b154b3dd-a8f2-4fdf-8176-52709f906891@github.com>

On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4
>  - review feedback

I mostly reviewed the metaspace changes and suggest upstreaming the MetaBlock refactoring ahead of the rest of this patch.
Only one comment about the interpreter code (affecting 4 locations).

src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3636:

> 3634:     } else {
> 3635:       __ sub(r3, r3, sizeof(oopDesc));
> 3636:     }

This looks like something that could be buggy if we're not careful. We had a pass where we cleaned up sizeof(oopDesc) once. Can this be some function in oopDesc with the right name (is this not header_size() anymore)?

src/hotspot/cpu/x86/templateTable_x86.cpp line 4121:

> 4119:       __ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 1*oopSize), rcx);
> 4120:       NOT_LP64(__ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 2*oopSize), rcx));
> 4121:     }

For this and above, I'd rather oopDesc encapsulate the header_size for UseCompactObjectHeaders condition in C++ code, and never see sizeof(oopDesc).

src/hotspot/share/memory/metaspace.cpp line 799:

> 797: 
> 798:     // Set up compressed class pointer encoding.
> 799:     // In CDS=off mode, we give the JVM some leeway to choose a favorable base/shift combination.

I don't know why this comment is here.  Seems out of place.

src/hotspot/share/memory/metaspace/freeBlocks.cpp line 57:

> 55:     }
> 56:   }
> 57:   return p;

This answers my prior question.  The waste is added back to the block list for non-class-arenas as well.

src/hotspot/share/memory/metaspace/metablock.hpp line 74:

> 72: #define METABLOCKFORMATARGS(__block__)  p2i((__block__).base()), (__block__).word_size()
> 73: 
> 74: } // namespace metaspace

I am wondering if some of these metaspace changes, that is, the addition of MetaBlock could be upstreamed ahead of the CompactObjectHeaders.  Some is refactoring so that you can use the wastage to allocate into class-arena but a lot of this seems neutral to compact object headers, and would reduce this patch and allow different people to focus on just this.

src/hotspot/share/memory/metaspace/metaspaceArena.cpp line 470:

> 468: 
> 469: // Returns true if the given block is contained in this arena
> 470: // Returns true if the given block is contained in this arena

Here's the same comment twice.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318539468
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768775590
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768781956
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768979540
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769008437
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769012842
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769015008

From coleenp at openjdk.org  Fri Sep 20 18:19:52 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 20 Sep 2024 18:19:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <m9Xj-C2ZuCBJfaOrr8zH59Ny5LDERRs-Lw5oVzDGvII=.daaae9b4-b1a3-491c-a5d0-9e327443e3bd@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
 <m9Xj-C2ZuCBJfaOrr8zH59Ny5LDERRs-Lw5oVzDGvII=.daaae9b4-b1a3-491c-a5d0-9e327443e3bd@github.com>
Message-ID: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com>

On Wed, 18 Sep 2024 13:57:29 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/memory/classLoaderMetaspace.cpp line 87:
>> 
>>> 85:         klass_alignment_words,
>>> 86:         "class arena");
>>> 87:   }
>> 
>> As per my comment in the header file, change the code to this:
>> 
>> ```c++
>> if (class_context != nullptr) {
>>   // ... Same as in PR
>> } else {
>>   _class_space_arena = _non_class_space_arena;
>> }
>> ```
>
> Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432

Yes, I'd rather _class_space_arena be nullptr if not used.

>> src/hotspot/share/memory/classLoaderMetaspace.cpp line 115:
>> 
>>> 113:   if (wastage.is_nonempty()) {
>>> 114:     non_class_space_arena()->deallocate(wastage);
>>> 115:   }
>> 
>> This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example:
>> 
>> ```c++
>> // Any wasted memory is presumably too small for any class.
>> // Therefore, give it back to the non-class space arena's free list.
>> ```
>
> Yes. Some background:
> 
> - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert)
> - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, it's probably also too small.
> 
> Yes, I will write a better comment.

Yes, this definitely needs a comment explaining why, since this is how we allocate small chunks that are wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class-metaspace.

The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here.
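
As a rough illustration of where such wastage comes from, the following Java sketch models a bump allocator that rounds its top pointer up to a larger alignment and hands the skipped gap to a free list. It is not the metaspace code: all names are hypothetical, sizes are in abstract words, and the free list only stands in for the non-class arena's free list discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; names are hypothetical and this is not the metaspace code.
final class AlignedBumpSketch {
    /** A [base, base + size) range, measured in words. */
    static final class Block {
        final long base;
        final long size;
        Block(long base, long size) { this.base = base; this.size = size; }
        boolean isNonEmpty() { return size > 0; }
    }

    private long top;                               // next free word in the arena
    final List<Block> freeList = new ArrayList<>(); // stands in for the non-class free list

    AlignedBumpSketch(long start) { this.top = start; }

    // Allocates `size` words at an address aligned to `alignment` (a power of two).
    Block allocate(long size, long alignment) {
        long aligned = (top + alignment - 1) & -alignment; // round top up to the alignment
        Block wastage = new Block(top, aligned - top);     // the gap that was skipped over
        top = aligned + size;
        if (wastage.isNonEmpty()) {
            // The gap does not start at the required alignment, so it cannot hold
            // another aligned block; give it to the free list for other allocations.
            freeList.add(wastage);
        }
        return new Block(aligned, size);
    }
}
```

With an alignment of one word the gap is always empty, which matches the remark that wastage only occurs with the larger Klass alignment.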

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768897591
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768966812

From coleenp at openjdk.org  Fri Sep 20 18:19:53 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 20 Sep 2024 18:19:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <b8u6PaStO4nt0AGnbZEkmXkzQKzXaB2c4lSiC9JWdTY=.a5805ef2-84d8-44c0-9dc0-3e90da3460aa@github.com>
 <m9Xj-C2ZuCBJfaOrr8zH59Ny5LDERRs-Lw5oVzDGvII=.daaae9b4-b1a3-491c-a5d0-9e327443e3bd@github.com>
 <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com>
Message-ID: <BEtvFqI5Oh4yHALynmunhsWEwWLdkF2FuyBqJlHwO3k=.0bfe0664-3325-436d-a56e-e997d7a2d1f7@github.com>

On Fri, 20 Sep 2024 17:34:09 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Yes. Some background:
>> 
>> - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert)
>> - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, its probably also too small
>> 
>> Yes, I will write a better comment.
>
> Yes, this definitely needs a comment explaining why, since this is how we allocate small chunks that are wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class-metaspace.
> 
> The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here.

I think this should also assert or be conditionalized on UseCompactObjectHeaders.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768972448

From xpeng at openjdk.org  Fri Sep 20 18:31:55 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 20 Sep 2024 18:31:55 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer
Message-ID: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>

In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed significantly to long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).

The change in this PR makes ShenandoahPacer impact long-tail latency much less. Instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either: 1/ it has spent 10ms waiting in total; or 2/ it has successfully claimed the budget.
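
The retry scheme described above can be sketched roughly as follows. This is a simplified Java model, not the ShenandoahPacer code (which is C++): the class, method, and field names are hypothetical, and what the real pacer does once the 10ms cap is reached is not modeled.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only; names are hypothetical and this is not ShenandoahPacer.
final class PacerRetrySketch {
    static final long MAX_WAIT_MS = 10; // total wait cap, taken from the description
    static final long STEP_MS = 1;      // wait between attempts, taken from the description

    private final AtomicLong budget;

    PacerRetrySketch(long initialBudget) { budget = new AtomicLong(initialBudget); }

    // Adds budget, as a (hypothetical) GC thread would when it makes progress.
    void replenish(long words) { budget.addAndGet(words); }

    // Fast path: claim only if enough budget is currently available.
    boolean tryClaim(long words) {
        long cur;
        do {
            cur = budget.get();
            if (cur < words) return false;
        } while (!budget.compareAndSet(cur, cur - words));
        return true;
    }

    // Slow path: retry the claim, sleeping 1 ms between attempts, until the
    // claim succeeds or 10 ms of sleep have been requested in total.
    // Returns true if the budget was claimed. waitedMs counts requested sleep
    // time, not measured elapsed time.
    boolean paceForAlloc(long words) {
        long waitedMs = 0;
        while (true) {
            if (tryClaim(words)) return true;
            if (waitedMs >= MAX_WAIT_MS) return false;
            try {
                Thread.sleep(STEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
            waitedMs += STEP_MS;
        }
    }
}
```

The point of the design is that a thread stops waiting as soon as budget becomes available, rather than always paying the full delay after a forced claim.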

Here is the latency comparison for the optimization:
![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)

With the optimization, long-tail latency from the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:

    static final int threadCount = Runtime.getRuntime().availableProcessors();
    static final LongAdder totalCount = new LongAdder();
    static volatile byte[] sink;
    public static void main(String[] args) {
        runAllocationTest(100000);
    }
    static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
        long startTime = System.nanoTime();
        sink = new byte[dataSize];
        long endTime = System.nanoTime();
        histogram.recordValue(endTime - startTime);
    }

    static void runAllocationTest(final int dataSize) {
        final long endTime = System.currentTimeMillis() + 30_000;
        final CountDownLatch startSignal = new CountDownLatch(1);
        final CountDownLatch finished = new CountDownLatch(threadCount);
        final Thread[] threads = new Thread[threadCount];
        final Histogram[] histograms = new Histogram[threadCount];
        final Histogram totalHistogram = new Histogram(3600000000000L, 3);
        for (int i = 0; i < threadCount; i++) {
            final var histogram = new Histogram(3600000000000L, 3);
            histograms[i] = histogram;
            threads[i] = new Thread(() -> {
                wait(startSignal);
                do {
                    recordTimeToAllocate(dataSize, histogram);
                } while (System.currentTimeMillis() < endTime);
                finished.countDown();
            });
            threads[i].start();
        }

        startSignal.countDown(); //Start to test
        wait(finished);
        
        for (Histogram histogram : histograms) {
            totalHistogram.add(histogram);
        }

        totalHistogram.outputPercentileDistribution(System.out, 1000.0);

    }

    public static void wait(final CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }


### Additional test
- [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah

-------------

Commit messages:
 - use const
 - refactor
 - Clean code
 - try claim_for_alloc before calculating total_delay
 - try claim_for_alloc before calculating total_delay
 - clean up
 - 8340490: Shenandoah: Optimize ShenandoahPacer

Changes: https://git.openjdk.org/jdk/pull/21099/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21099&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340490
  Stats: 41 lines in 3 files changed: 8 ins; 16 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/21099.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21099/head:pull/21099

PR: https://git.openjdk.org/jdk/pull/21099

From shade at openjdk.org  Fri Sep 20 18:31:56 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 20 Sep 2024 18:31:56 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer
In-Reply-To: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
Message-ID: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>

On Thu, 19 Sep 2024 23:32:14 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed significantly to long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim it and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).
> 
> The change in this PR greatly reduces ShenandoahPacer's impact on long-tail latency. Instead of forcefully claiming the budget and then waiting, a thread now attempts to claim after waiting for 1ms, and keeps doing this until either: 1/ it has spent 10ms waiting in total; or 2/ it has successfully claimed the budget.
> 
> Here is the latency comparison for the optimization:
> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
> 
> With the optimization, the long-tail latency of the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:
> 
>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>     static final LongAdder totalCount = new LongAdder();
>     static volatile byte[] sink;
>     public static void main(String[] args) {
>         runAllocationTest(100000);
>     }
>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>         long startTime = System.nanoTime();
>         sink = new byte[dataSize];
>         long endTime = System.nanoTime();
>         histogram.recordValue(endTime - startTime);
>     }
> 
>     static void runAllocationTest(final int dataSize) {
>         final long endTime = System.currentTimeMillis() + 30_000;
>         final CountDownLatch startSignal = new CountDownLatch(1);
>         final CountDownLatch finished = new CountDownLatch(threadCount);
>         final Thread[] threads = new Thread[threadCount];
>         final Histogram[] histograms = new Histogram[threadCount];
>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>         for (int i = 0; i < threadCount; i++) {
>             final var histogram = new Histogram(3600000000000L, 3);
>             histograms[i] = histogram;
>             threads[i] = new Thread(() -> {
>                 wait(startSignal);
>                 do {
>                     recordTimeToAllocate(dataSize, histogram);
>                 } while (System.currentTimeMillis() < e...

I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent of that fix was to make sure that an unsuccessful pacing attempt still claims the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in.

Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, _it is_ silly to wait until the deadline before attempting to claim the pacing budget.

I am good with this, assuming performance runs show good results.

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 191:

> 189:   _need_notify_waiters.try_set();
> 190: }
> 191: template<bool FORCE>

Newline before `template`, please.

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 206:

> 204:     }
> 205:     new_val = cur - tax;
> 206:   } while (Atomic::load(&_budget) == cur &&

I don't think we need this load, since we have _just_ had another load nearby. This should be enough to resolve the contention issues TTAS pattern tries to avoid.
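
For context, here is a minimal sketch of the kind of claim loop being discussed, written with `std::atomic` rather than HotSpot's `Atomic` API. The names `Budget` and `claim` and the plain `intptr_t` units are illustrative assumptions, not the actual `ShenandoahPacer` code. The value loaded before the loop doubles as the expected value of the compare-and-swap, and a failed CAS refreshes it, so a separate pre-check load adds little:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pacer budget; not the HotSpot class.
struct Budget {
  std::atomic<intptr_t> value;
  explicit Budget(intptr_t initial) : value(initial) {}

  // Try to subtract `tax` from the budget.
  // With force == false, the claim fails when the budget is insufficient.
  // With force == true, the budget may go negative and the claim succeeds.
  template <bool force>
  bool claim(intptr_t tax) {
    assert(tax >= 0 && "tax is expected to be non-negative");
    intptr_t cur = value.load(std::memory_order_relaxed);
    intptr_t new_val;
    do {
      if (cur < tax && !force) {
        return false;  // Not enough budget, and overdrawing is not allowed.
      }
      new_val = cur - tax;
      // On failure, compare_exchange_weak stores the current value into
      // `cur`, so the next iteration re-checks without an extra load.
    } while (!value.compare_exchange_weak(cur, new_val,
                                          std::memory_order_relaxed));
    return true;
  }
};
```

In this sketch, `claim<false>` corresponds to the polite attempt and `claim<true>` to the forced claim that may overdraw the budget.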

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 256:

> 254:   double total_delay = 0;
> 255: 
> 256:   double start = os::elapsedTime();

While we are here, let's avoid some integer divisions and floating-point math. Try to rewrite this using `jlong os::elapsed_counter()`, which returns integer nanoseconds? Do the math in `jlong`-s.
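
A minimal sketch of the integer-time idea, using `std::chrono::steady_clock` as a stand-in for the VM counter (an assumption made so the example is self-contained; the real code would use `os::elapsed_counter()`). The helper names and constants are hypothetical. All deadline comparisons stay in 64-bit integers, and the conversion to floating-point seconds happens once, at the reporting point:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Illustrative constants, named here only for the sketch.
constexpr int64_t kNanosPerSec   = 1000000000LL;
constexpr int64_t kNanosPerMilli = 1000000LL;

// Monotonic timestamp in integer nanoseconds (stand-in for a VM counter).
inline int64_t now_nanos() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(
      steady_clock::now().time_since_epoch()).count();
}

// Deadline check done purely in integer arithmetic.
inline bool deadline_passed(int64_t start_nanos, int64_t now,
                            int64_t max_delay_ms) {
  assert(max_delay_ms >= 0);
  return (now - start_nanos) >= max_delay_ms * kNanosPerMilli;
}

// Single conversion to seconds. One cast is enough: the integer divisor
// is converted to double by the usual arithmetic conversions.
inline double nanos_to_seconds(int64_t nanos) {
  return static_cast<double>(nanos) / kNanosPerSec;
}
```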

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 257:

> 255: 
> 256:   double start = os::elapsedTime();
> 257:   while (!claimed) {

I suggest we common some exit paths by writing the loop like this:


    double start_time = os::elapsedTime();
    while (!claimed && (os::elapsedTime() - start_time) < max_delay) {
      // We could instead assist GC, but this would suffice for now.
      wait(1);
      claimed = claim_for_alloc<false>(words);
    }
    if (!claimed) {
      // Spent local time budget to wait for enough GC progress.
      // Force allocating anyway, which may mean we outpace GC,
      // and start Degenerated GC cycle.
      claimed = claim_for_alloc<true>(words);
      assert(claimed, "Should always succeed");
    }
    ShenandoahThreadLocalData::add_paced_time(JavaThread::current(), os::elapsedTime() - start_time);

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 265:

> 263:     // and start Degenerated GC cycle.
> 264:     claimed = claim_for_alloc<true>(words);
> 265:     assert(claimed, "Should always succeed");

Come to think of it, we don't need to check the return value here. We don't check it in the other place where we call `claim_for_alloc<true>(words);`.

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 267:

> 265:     assert(claimed, "Should always succeed");
> 266:   }
> 267:   ShenandoahThreadLocalData::add_paced_time(JavaThread::current(), (double)(os::elapsed_counter() - start_time) / (double) NANOSECS_PER_SEC);

We already have `current` (`JavaThread::current()`) in scope here, use that :)
I also think a second cast to `(double) NANOSECS_PER_SEC` is redundant.

-------------

PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2317722311
Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2318988155
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768272092
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768271671
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768281644
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768291970
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769025976
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769027052

From xpeng at openjdk.org  Fri Sep 20 18:31:56 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 20 Sep 2024 18:31:56 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer
In-Reply-To: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
Message-ID: <CBJLCwYKqrdTQ9T49jBynZGf7rNcDAyGipaOlUNRcmc=.3625732b-d75c-4ada-961b-ee2ee9122c21@github.com>

On Fri, 20 Sep 2024 09:46:50 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent of that fix was to make sure that an unsuccessful pacing attempt still claims the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in.
> 
> Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, _it is_ silly to wait until the deadline before attempting to claim the pacing budget.

> src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 206:
> 
>> 204:     }
>> 205:     new_val = cur - tax;
>> 206:   } while (Atomic::load(&_budget) == cur &&
> 
> I don't think we need this load, since we have _just_ had another load nearby. This should be enough to resolve the contention issues TTAS pattern tries to avoid.

Thanks, I reverted the TTAS pattern.

> src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 257:
> 
>> 255: 
>> 256:   double start = os::elapsedTime();
>> 257:   while (!claimed) {
> 
> I suggest we common some exit paths by writing the loop like this:
> 
> 
>     double start_time = os::elapsedTime();
>     while (!claimed && (os::elapsedTime() - start_time) < max_delay) {
>       // We could instead assist GC, but this would suffice for now.
>       wait(1);
>       claimed = claim_for_alloc<false>(words);
>     }
>     if (!claimed) {
>       // Spent local time budget to wait for enough GC progress.
>       // Force allocating anyway, which may mean we outpace GC,
>       // and start Degenerated GC cycle.
>       claimed = claim_for_alloc<true>(words);
>       assert(claimed, "Should always succeed");
>     }
>     ShenandoahThreadLocalData::add_paced_time(JavaThread::current(), os::elapsedTime() - start_time);

Thanks, I refactored the code together with the change to use os::elapsed_counter(); the nanoseconds-to-seconds conversion is now only needed at the end, when calling ShenandoahThreadLocalData::add_paced_time.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2364294440
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769018442
PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769021455

From xpeng at openjdk.org  Fri Sep 20 18:47:50 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 20 Sep 2024 18:47:50 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
Message-ID: <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>

> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed significantly to long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim it and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).
> 
> The change in this PR greatly reduces ShenandoahPacer's impact on long-tail latency. Instead of forcefully claiming the budget and then waiting, a thread now attempts to claim after waiting for 1ms, and keeps doing this until either: 1/ it has spent 10ms waiting in total; or 2/ it has successfully claimed the budget.
> 
> Here is the latency comparison for the optimization:
> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
> 
> With the optimization, the long-tail latency of the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:
> 
>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>     static final LongAdder totalCount = new LongAdder();
>     static volatile byte[] sink;
>     public static void main(String[] args) {
>         runAllocationTest(100000);
>     }
>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>         long startTime = System.nanoTime();
>         sink = new byte[dataSize];
>         long endTime = System.nanoTime();
>         histogram.recordValue(endTime - startTime);
>     }
> 
>     static void runAllocationTest(final int dataSize) {
>         final long endTime = System.currentTimeMillis() + 30_000;
>         final CountDownLatch startSignal = new CountDownLatch(1);
>         final CountDownLatch finished = new CountDownLatch(threadCount);
>         final Thread[] threads = new Thread[threadCount];
>         final Histogram[] histograms = new Histogram[threadCount];
>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>         for (int i = 0; i < threadCount; i++) {
>             final var histogram = new Histogram(3600000000000L, 3);
>             histograms[i] = histogram;
>             threads[i] = new Thread(() -> {
>                 wait(startSignal);
>                 do {
>                     recordTimeToAllocate(dataSize, histogram);
>                 } while (System.currentTimeMillis() < e...

Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:

  clean up

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21099/files
  - new: https://git.openjdk.org/jdk/pull/21099/files/1de70211..58196a4f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21099&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21099&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21099.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21099/head:pull/21099

PR: https://git.openjdk.org/jdk/pull/21099

From xpeng at openjdk.org  Fri Sep 20 18:47:50 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 20 Sep 2024 18:47:50 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <CBJLCwYKqrdTQ9T49jBynZGf7rNcDAyGipaOlUNRcmc=.3625732b-d75c-4ada-961b-ee2ee9122c21@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
 <CBJLCwYKqrdTQ9T49jBynZGf7rNcDAyGipaOlUNRcmc=.3625732b-d75c-4ada-961b-ee2ee9122c21@github.com>
Message-ID: <-6wa5ftQJ3WdXiX-SsMY-nXgnTWCl9ZzDTt89akghyM=.7e53fed3-2f4c-4c40-9465-15c97bf8e089@github.com>

On Fri, 20 Sep 2024 18:27:14 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent of that fix was to make sure that an unsuccessful pacing attempt still claims the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in.
> 
> Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, _it is_ silly to wait until the deadline before attempting to claim the pacing budget.

It comes primarily from the algorithm change to 1ms slices.

The behavior changes as follows. Suppose 10 threads see insufficient budget at the same time, and each of them needs to claim 100 units of budget. In the old algorithm, all 10 threads forcefully claim the budget, driving it to `-1000`; they then need other mutators to release at least `1000`, or they have to wait for up to 10ms, even though they may be woken up by the ShenandoahPeriodicPacerNotifyTask in the meantime. In the new algorithm, each thread tries to claim 100 every 1ms and does not need to wait for other mutators to release at least `1000`: as soon as enough budget (>100) is returned, some thread(s) will win the race against the others and proceed.
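
The arithmetic in this example can be checked with a small model. This is a deliberately simplified, single-threaded sketch, not the pacer itself: it assumes the budget starts at 0, ignores the 10ms deadline and the notifier task, and uses hypothetical names (`Outcome`, `old_scheme`, `new_scheme`). It only compares how many waiters can proceed after a given amount of budget is returned:

```cpp
#include <cassert>
#include <cstdint>

// Result of the model: remaining budget and number of waiters released.
struct Outcome {
  int64_t budget;
  int released;
};

// Old scheme (simplified): every waiter overdraws immediately, then all of
// them wait until the shared budget is non-negative again.
Outcome old_scheme(int waiters, int64_t tax, int64_t replenished) {
  assert(waiters >= 0 && tax >= 0);
  int64_t budget = 0;
  budget -= waiters * tax;  // All forced claims happen up front.
  budget += replenished;    // Budget returned while the waiters sleep.
  int released = (budget >= 0) ? waiters : 0;
  return {budget, released};
}

// New scheme (simplified): nobody overdraws; each waiter retries its own
// claim and proceeds as soon as the budget covers its own tax.
Outcome new_scheme(int waiters, int64_t tax, int64_t replenished) {
  assert(waiters >= 0 && tax >= 0);
  int64_t budget = replenished;
  int released = 0;
  for (int i = 0; i < waiters; i++) {
    if (budget >= tax) {
      budget -= tax;
      released++;
    }
  }
  return {budget, released};
}
```

With 10 waiters, a tax of 100 each, and 150 units returned, the old model leaves the budget at -850 with nobody released, while the new model releases one waiter and leaves 50.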

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2364322181

From xpeng at openjdk.org  Fri Sep 20 18:51:35 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 20 Sep 2024 18:51:35 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
Message-ID: <p1_dfGVCQ8Hswy1n4E2HPW22LVPjwPVba5cecSQxrPg=.22aeb487-488f-46e7-a86a-be1f4a6baedd@github.com>

On Fri, 20 Sep 2024 18:27:20 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I am good with this, assuming performance runs show good results.

Latency-wise, it is better than the old implementation most of the time.

In my specific test with an 8G heap on macOS, throughput is very close to that of the test with ShenandoahPacing disabled, and about 25%~30% better than with the old implementation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2364332905

From coleenp at openjdk.org  Fri Sep 20 19:02:49 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 20 Sep 2024 19:02:49 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
Message-ID: <UQKdVMOsFJuEegNqlRHnj7Cqyo_MCOKpCiL30W_KWjk=.26f653ea-4858-4748-b374-454610acd3bd@github.com>

On Wed, 18 Sep 2024 12:54:34 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/share/oops/markWord.inline.hpp line 90:
>> 
>>> 88:   ShouldNotReachHere();
>>> 89:   return markWord();
>>> 90: #endif
>> 
>> Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits?
>
> Kindof. The problem is that klass_shift is larger than 31, and shifting with it would thus be UB and generate a compiler warning. I opted to simply not compile any of that code in 32bit builds. We could also define klass_shift differently on 32bit.
> Long-term (maybe with Lilliput2/4-byte-headers?) it would be nice to consolidate the header layout between 32 and 64 bit builds and not make any distinction anywhere. E.g. define markWord (or objectHeader?) in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though.

Ok.
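
For readers following along, the shift problem described above can be illustrated with a small sketch. The shift value 42 and all names are assumptions chosen for illustration, not the actual `markWord` definitions. Shifting a value by at least its bit width is undefined behavior in C++, so the shifting code must not even be instantiated for a 32-bit word; here that is done with tag dispatch instead of `#ifdef _LP64`:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <type_traits>

// Illustrative shift larger than 31; not the real klass_shift definition.
constexpr unsigned kHypotheticalKlassShift = 42;

// Selected only when the shift is smaller than the word width.
template <typename Word>
Word extract_upper_bits_impl(Word header, std::true_type) {
  return header >> kHypotheticalKlassShift;
}

// Selected when the shift would be undefined for this word width.
template <typename Word>
Word extract_upper_bits_impl(Word, std::false_type) {
  return 0;
}

// Dispatches at compile time, so the shifting overload is never
// instantiated for a word type that is too narrow.
template <typename Word>
Word extract_upper_bits(Word header) {
  return extract_upper_bits_impl(
      header,
      std::integral_constant<
          bool, (kHypotheticalKlassShift < sizeof(Word) * CHAR_BIT)>());
}
```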

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769069007

From coleenp at openjdk.org  Fri Sep 20 19:09:50 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 20 Sep 2024 19:09:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
 <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>
 <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>
 <ubcY5Q-aPRCLmhZV9MjBfzzbs84FegDvNkHx8nwPUd0=.46503fd8-d274-4558-bd0f-cc0c4f199667@github.com>
 <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com>
Message-ID: <VLGyxnaSrbWp2xvih098AS__7lRE9tBq5hk8QhHSK94=.be7bb13d-7001-48ba-a1a9-8952b43873fc@github.com>

On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> We haven't decided whether or not we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful.
>
> This is my current work-in-progress code:
> https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2
> 
> I've made some large rewrites and I'm currently running it through functional testing.

The refactoring is better in this last version with encode_and_store_compact_object_header, although some comments around the C2 version would be good. I still don't know what the C2 version does. Someone else should review that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769075714

From shade at openjdk.org  Sat Sep 21 05:54:42 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Sat, 21 Sep 2024 05:54:42 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <p1_dfGVCQ8Hswy1n4E2HPW22LVPjwPVba5cecSQxrPg=.22aeb487-488f-46e7-a86a-be1f4a6baedd@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
 <p1_dfGVCQ8Hswy1n4E2HPW22LVPjwPVba5cecSQxrPg=.22aeb487-488f-46e7-a86a-be1f4a6baedd@github.com>
Message-ID: <Xf89-L8wLAMbngt5PxoN3B0sX94chZQz5zjqfuiBYZM=.6c6387e8-572a-4c47-8de2-d4f407924c88@github.com>

On Fri, 20 Sep 2024 18:48:45 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> > I am good with this, assuming performance runs show good results.
> 
> Latency-wise, it is better than the old implementation most of the time.

It is great that it improves targeted tests, and it makes sense from first principles. Run our usual performance pipeline to sanity check if this affects any other benchmarks in any meaningful way.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2365015599

From fyang at openjdk.org  Sat Sep 21 06:48:45 2024
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 21 Sep 2024 06:48:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v24]
In-Reply-To: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com>
Message-ID: <y4r0APx4JgMk680hZEpMvn1fWLeJ_3-NIQBDaykZLRg=.e83a8cca-0a87-4f8a-bfd8-814626c0a086@github.com>

On Wed, 18 Sep 2024 17:45:51 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - Remove redundant comment

src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257:

> 255:                       RegSet::of($res$$Register) /* no_preserve */);
> 256:     __ mov($tmp1$$Register, $oldval$$Register);
> 257:     __ mov($tmp2$$Register, $newval$$Register);

Hi, I don't quite understand these two register-to-register moves. It seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly, as `cmpxchg` won't modify them, which would save these two moves. Did I miss anything? Thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1769492955

From kbarrett at openjdk.org  Sat Sep 21 23:38:43 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 21 Sep 2024 23:38:43 GMT
Subject: RFR: 8340573: Remove unused
 G1ParScanThreadState::_partial_objarray_chunk_size
Message-ID: <amChyi-bb0oLTn7pokdGShJgl-Pq2t0r9SIeIU1tKLQ=.425e0309-da21-4b72-b4b0-ebfdd39555eb@github.com>

Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size.

Testing: local (linux-x64) clean build

-------------

Commit messages:
 - remove unused _partial_objarray_chunk_size

Changes: https://git.openjdk.org/jdk/pull/21117/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21117&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340573
  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21117.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21117/head:pull/21117

PR: https://git.openjdk.org/jdk/pull/21117

From stuefe at openjdk.org  Sun Sep 22 12:01:51 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 22 Sep 2024 12:01:51 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v6]
In-Reply-To: <prghUAoX81SehIWD3kZW80POXP-2FuDwqTO2XaVZFDo=.8861307e-4f8d-4690-ac08-fb2e1923471e@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <BGmZ04yibLn_KtPyQrA3sQv5OtMonrcDy3aN0vAj8NA=.5de6cae3-3a05-4a74-a15c-aade4176bdeb@github.com>
 <prghUAoX81SehIWD3kZW80POXP-2FuDwqTO2XaVZFDo=.8861307e-4f8d-4690-ac08-fb2e1923471e@github.com>
Message-ID: <VBo_MMmdcMSNwpXxmwt4i0qTN8WWvzguRIWGgMIeBNs=.b9131817-3f3e-446a-a5f4-021251136431@github.com>

On Fri, 20 Sep 2024 16:56:58 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bit counts in GCForwarding
>
> src/hotspot/share/cds/archiveUtils.cpp line 348:
> 
>> 346:   old_tag = (int)(intptr_t)nextPtr();
>> 347:   // do_int(&old_tag);
>> 348:   assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag);
> 
> Is this assert message change a leftover from debugging or is it meant to be this way?

It's a leftover, but OTOH it does not hurt. I found myself re-adding it several times to analyze CDS issues during development, so I decided to just leave it in.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1770536320

From tschatzl at openjdk.org  Mon Sep 23 07:15:37 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 23 Sep 2024 07:15:37 GMT
Subject: RFR: 8340573: Remove unused
 G1ParScanThreadState::_partial_objarray_chunk_size
In-Reply-To: <amChyi-bb0oLTn7pokdGShJgl-Pq2t0r9SIeIU1tKLQ=.425e0309-da21-4b72-b4b0-ebfdd39555eb@github.com>
References: <amChyi-bb0oLTn7pokdGShJgl-Pq2t0r9SIeIU1tKLQ=.425e0309-da21-4b72-b4b0-ebfdd39555eb@github.com>
Message-ID: <y4PvUeXFTqoZRzi9zVCjsiJCFenFC1Z04-S_X3Qb0ms=.fd8eb3a4-da55-4f1e-8409-4d85c4085b5b@github.com>

On Sat, 21 Sep 2024 23:34:24 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size.
> 
> Testing: local (linux-x64) clean build

Lgtm and trivial.

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21117#pullrequestreview-2321288400

From aboldtch at openjdk.org  Mon Sep 23 07:28:02 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 23 Sep 2024 07:28:02 GMT
Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with
 UseLargePages
Message-ID: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>

TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages), this would require the filesystem to be a HugeTLBFS.

I propose that we do not allow running these tests with persistent hugepages.

-------------

Commit messages:
 - 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages

Changes: https://git.openjdk.org/jdk/pull/21127/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21127&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340146
  Stats: 5 lines in 3 files changed: 3 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/21127.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21127/head:pull/21127

PR: https://git.openjdk.org/jdk/pull/21127

From aboldtch at openjdk.org  Mon Sep 23 07:32:47 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 23 Sep 2024 07:32:47 GMT
Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of
 TestAllocateHeapAt.java
Message-ID: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>

[JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file system.

I propose creating a version of this test which instead first checks whether there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.
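
The mount-point check described above can be sketched as follows. This is only an illustration under stated assumptions: the real test is written in Java, the helper name `find_hugetlbfs_mount` is hypothetical, and the input is assumed to be in the Linux `/proc/mounts` format (device, mount point, filesystem type, options, ...).

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the actual test.
// Returns the first mount point whose filesystem type is "hugetlbfs",
// reading lines in the /proc/mounts format:
//   <device> <mount point> <fs type> <options> <dump> <pass>
std::optional<std::string> find_hugetlbfs_mount(std::istream& mounts) {
  std::string line;
  while (std::getline(mounts, line)) {
    std::istringstream fields(line);
    std::string device, mount_point, fs_type;
    if ((fields >> device >> mount_point >> fs_type) && fs_type == "hugetlbfs") {
      return mount_point;
    }
  }
  return std::nullopt;  // no suitable mount point: the test would be skipped
}
```

A caller would open `/proc/mounts` with `std::ifstream`, pass it to the helper, and run the heap-file test only if a mount point is returned.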

-------------

Commit messages:
 - 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java

Changes: https://git.openjdk.org/jdk/pull/21128/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21128&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340419
  Stats: 91 lines in 1 file changed: 91 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21128.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21128/head:pull/21128

PR: https://git.openjdk.org/jdk/pull/21128

From tschatzl at openjdk.org  Mon Sep 23 07:33:36 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 23 Sep 2024 07:33:36 GMT
Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with
 UseLargePages
In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
Message-ID: <55EUODB7OmULTHMww9vJX5MrU3_tuMXrwrlm9EsxeiU=.6d6fbdcd-0f35-4381-b2d9-fe6da69c9884@github.com>

On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages), this would require the filesystem to be a HugeTLBFS.
> 
> I propose that we do not allow running these tests with persistent hugepages.

lgtm.

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21127#pullrequestreview-2321326839

From stefank at openjdk.org  Mon Sep 23 07:40:34 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 23 Sep 2024 07:40:34 GMT
Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of
 TestAllocateHeapAt.java
In-Reply-To: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
References: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
Message-ID: <lIN9mVOjMtKM-7VtHeR5M8TrQLxyYKS5cLF8Yrcxu-I=.8be2768f-0dda-422a-b326-921f5c33deed@github.com>

On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file system.
> 
> I propose creating a version of this test which instead first checks whether there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.

Marked as reviewed by stefank (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21128#pullrequestreview-2321340466

From stefank at openjdk.org  Mon Sep 23 07:41:34 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 23 Sep 2024 07:41:34 GMT
Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with
 UseLargePages
In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
Message-ID: <WWTqLfFwFyjN0QQ4CTdqAwG40pGnjcM5ALkP7_smG_k=.97de393e-ba43-4559-ae3b-8fc4c94e7068@github.com>

On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages), this would require the filesystem to be a HugeTLBFS.
> 
> I propose that we do not allow running these tests with persistent hugepages.

Marked as reviewed by stefank (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21127#pullrequestreview-2321341392

From rcastanedalo at openjdk.org  Mon Sep 23 07:48:16 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 23 Sep 2024 07:48:16 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v25]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <IepCEeBHNViuIu07mRYpRy-0dQq9QgROYtSrLip0Vb8=.31f9b573-ef3d-4ce5-a0ac-bbe7c5b78131@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 46 additional commits since the last revision:

 - Merge jdk-24+16
 - Ensure that detected encode-and-store patterns are matched
 - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
 - Remove redundant comment
 - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
 - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
 - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
 - Restore some asserts
 - Default values for tmp regs of G1PostBarrierStubC2
 -  8334060: [arm32] Implementation of Late Barrier Expansion for G1
 - ... and 36 more: https://git.openjdk.org/jdk/compare/bdb0e33c...47c982ba

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/d54d67f1..47c982ba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23-24

  Stats: 170497 lines in 1328 files changed: 155223 ins; 8073 del; 7201 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Mon Sep 23 07:57:52 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 23 Sep 2024 07:57:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v25]
In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID: <CDl6hsO3iqzIUZDlHRIab-AUNryWMQnpCeYxmsTIKWI=.b8571ef3-6a76-4964-9c88-e604f71d3f0e@github.com>

On Fri, 13 Sep 2024 22:51:59 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 46 additional commits since the last revision:
>> 
>>  - Merge jdk-24+16
>>  - Ensure that detected encode-and-store patterns are matched
>>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>>  - Remove redundant comment
>>  - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
>>  - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
>>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>>  - Restore some asserts
>>  - Default values for tmp regs of G1PostBarrierStubC2
>>  -  8334060: [arm32] Implementation of Late Barrier Expansion for G1
>>  - ... and 36 more: https://git.openjdk.org/jdk/compare/da906826...47c982ba
>
> src/hotspot/share/opto/matcher.cpp line 1821:
> 
>> 1819:   if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) {
>> 1820:     assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf),
>> 1821:            "duplicating node that's already been matched");
> 
> Why was it removed?

The assertion was failing because it is too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched.
There are other cases (all of them harmless as far as I can see) in which this assertion can fail. I am investigating whether they can be avoided so that the assertion can be restored, and what would be the impact on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1770925777

From kbarrett at openjdk.org  Mon Sep 23 08:05:40 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 23 Sep 2024 08:05:40 GMT
Subject: RFR: 8340573: Remove unused
 G1ParScanThreadState::_partial_objarray_chunk_size
In-Reply-To: <y4PvUeXFTqoZRzi9zVCjsiJCFenFC1Z04-S_X3Qb0ms=.fd8eb3a4-da55-4f1e-8409-4d85c4085b5b@github.com>
References: <amChyi-bb0oLTn7pokdGShJgl-Pq2t0r9SIeIU1tKLQ=.425e0309-da21-4b72-b4b0-ebfdd39555eb@github.com>
 <y4PvUeXFTqoZRzi9zVCjsiJCFenFC1Z04-S_X3Qb0ms=.fd8eb3a4-da55-4f1e-8409-4d85c4085b5b@github.com>
Message-ID: <O7z1P1kgLsijrZtmSwolBa1FpAJiGpRA8QiN-WCbTI8=.3748873f-90a1-4950-b43e-8e019b249fb5@github.com>

On Mon, 23 Sep 2024 07:12:49 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size.
>> 
>> Testing: local (linux-x64) clean build
>
> Lgtm and trivial.

Thanks for reviewing, @tschatzl

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21117#issuecomment-2367484521

From kbarrett at openjdk.org  Mon Sep 23 08:05:41 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 23 Sep 2024 08:05:41 GMT
Subject: Integrated: 8340573: Remove unused
 G1ParScanThreadState::_partial_objarray_chunk_size
In-Reply-To: <amChyi-bb0oLTn7pokdGShJgl-Pq2t0r9SIeIU1tKLQ=.425e0309-da21-4b72-b4b0-ebfdd39555eb@github.com>
References: <amChyi-bb0oLTn7pokdGShJgl-Pq2t0r9SIeIU1tKLQ=.425e0309-da21-4b72-b4b0-ebfdd39555eb@github.com>
Message-ID: <-JHhJl5HdvVcX0v4ZpflLd9Kog9RYBnr0mHAHJ8f-RI=.bee5f54e-dd64-417f-86b0-cbf0f88f272b@github.com>

On Sat, 21 Sep 2024 23:34:24 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size.
> 
> Testing: local (linux-x64) clean build

This pull request has now been integrated.

Changeset: a07052e8
Author:    Kim Barrett <kbarrett at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/a07052e83d20e107f21fd0d266ab638043531c8a
Stats:     2 lines in 1 file changed: 0 ins; 2 del; 0 mod

8340573: Remove unused G1ParScanThreadState::_partial_objarray_chunk_size

Reviewed-by: tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/21117

From duke at openjdk.org  Mon Sep 23 12:14:38 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Mon, 23 Sep 2024 12:14:38 GMT
Subject: RFR: 8339161: ZGC: Remove unused remembered sets
In-Reply-To: <Aj22qsQpIcj8Mk6dW9ykOHq374R517RV8OYcIiqrK-4=.6471b081-7c66-4743-86f7-d74180821076@github.com>
References: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
 <Aj22qsQpIcj8Mk6dW9ykOHq374R517RV8OYcIiqrK-4=.6471b081-7c66-4743-86f7-d74180821076@github.com>
Message-ID: <LsRIcdjQkQVjeHW4lkFxg5t4vKHZEEdaQCojADPwXQU=.6156ba51-b67f-4d06-bbc8-92c411f3686b@github.com>

On Wed, 18 Sep 2024 18:38:56 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> In ZGC, when a page becomes old it needs a remembered set (remset) which stores 2 bits per 8 bytes (64 bits) of page memory, a memory overhead of 3.125% (2/64) per page that stores an allocated remset.
>> 
>> When an old page is potentially freed and inserted in the page cache, it can later be re-used as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs where pages are recycled for a long enough period to have a remset allocated for close to all pages.
>> 
>> The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache-mechanism. As remsets for young pages are unused, it should be considered wasted memory.
>> 
>> ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33)
>> 
>> The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use or are young should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear in the plot above and the total memory usage would be dictated by the number of "live" old pages.
>> 
>> Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing, but also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are performed by GC threads and the rest by mutator threads.
>> 
>> |              | min (ms) | max (ms) | mean (ms)  |
>> | ------------ | -------- | -------- | ---------- |
>> | remset init  | 0.000292 | 0.706    | 0.00258083 |
>> | remset clear | 0.000082 | 0.015    | 0.00111340 |
>> 
>> Tested with tiers 1-7 and local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement.
>
> lgtm. Nicely done.

Thank you for the reviews! @xmas92 @stefank
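
As a rough worked check of the overhead figure in the quoted description, the sketch below assumes that the 2/64 ratio means two remset bits per 64-bit (8-byte) word of page memory. The constant and function names are hypothetical, and the 2 MiB page size is used purely as an example input.

```cpp
#include <cstddef>

// Assumption: 2 remset bits are kept per 64-bit (8-byte) word of page memory.
constexpr std::size_t kRemsetBitsPerWord = 2;
constexpr std::size_t kWordBits = 64;

// Bytes of remset needed for a page of the given size.
constexpr std::size_t remset_bytes(std::size_t page_bytes) {
  return page_bytes * kRemsetBitsPerWord / kWordBits;
}

// Remset overhead as a percentage of the page size.
constexpr double remset_overhead_percent(std::size_t page_bytes) {
  return 100.0 * static_cast<double>(remset_bytes(page_bytes)) /
         static_cast<double>(page_bytes);
}
```

Under these assumptions, a 2 MiB page needs 64 KiB of remset, which is 3.125% of the page.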

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20947#issuecomment-2368030643

From duke at openjdk.org  Mon Sep 23 12:14:39 2024
From: duke at openjdk.org (duke)
Date: Mon, 23 Sep 2024 12:14:39 GMT
Subject: RFR: 8339161: ZGC: Remove unused remembered sets
In-Reply-To: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
References: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
Message-ID: <oZUUDkxI4OLVhDxUUzrU4SKm9eNwYp-dbVezWAvMMsU=.5fea4b11-bbb5-4c90-9510-389fea25700b@github.com>

On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström <duke at openjdk.org> wrote:

> In ZGC, when a page becomes old it needs a remembered set (remset) which stores 2 bits per 8 bytes (64 bits) of page memory, a memory overhead of 3.125% (2/64) per page that stores an allocated remset.
> 
> When an old page is potentially freed and inserted in the page cache, it can later be re-used as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs where pages are recycled for a long enough period to have a remset allocated for close to all pages.
> 
> The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache-mechanism. As remsets for young pages are unused, it should be considered wasted memory.
> 
> ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33)
> 
> The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use or are young should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear in the plot above and the total memory usage would be dictated by the number of "live" old pages.
> 
> Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing, but also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are performed by GC threads and the rest by mutator threads.
> 
> |              | min (ms) | max (ms) | mean (ms)  |
> | ------------ | -------- | -------- | ---------- |
> | remset init  | 0.000292 | 0.706    | 0.00258083 |
> | remset clear | 0.000082 | 0.015    | 0.00111340 |
> 
> Tested with tiers 1-7 and local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement.

@jsikstro 
Your change (at version af01efcb9fb9567bf1aec73eca91c987626cbe8a) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20947#issuecomment-2368035079

From duke at openjdk.org  Mon Sep 23 12:31:41 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Mon, 23 Sep 2024 12:31:41 GMT
Subject: Integrated: 8339161: ZGC: Remove unused remembered sets
In-Reply-To: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
References: <yHFK07SWLlxEyRnbof_8OY9ZPszetezf6t83VEA2RVk=.7edd8ad8-52ea-4eb6-b037-699373804b47@github.com>
Message-ID: <TdK5M7JmLkRiRkR717MYZjrzi3rTYn4CBb-iBDLG760=.f0b10a84-5a85-47af-b9f3-9bb09b776e14@github.com>

On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström <duke at openjdk.org> wrote:

> In ZGC, when a page becomes old it needs a remembered set (remset) which stores 2 bits per 8 bytes (64 bits) of page memory, a memory overhead of 3.125% (2/64) per page that stores an allocated remset.
> 
> When an old page is potentially freed and inserted in the page cache, it can later be re-used as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs where pages are recycled for a long enough period to have a remset allocated for close to all pages.
> 
> The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache-mechanism. As remsets for young pages are unused, it should be considered wasted memory.
> 
> ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33)
> 
> The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use or are young should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear in the plot above and the total memory usage would be dictated by the number of "live" old pages.
> 
> Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing, but also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are performed by GC threads and the rest by mutator threads.
> 
> |              | min (ms) | max (ms) | mean (ms)  |
> | ------------ | -------- | -------- | ---------- |
> | remset init  | 0.000292 | 0.706    | 0.00258083 |
> | remset clear | 0.000082 | 0.015    | 0.00111340 |
> 
> Tested with tiers 1-7 and local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement.

This pull request has now been integrated.

Changeset: 37ec80df
Author:    Joel Sikström <joel.sikstrom at oracle.com>
Committer: Stefan Karlsson <stefank at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/37ec80df8d3b014292fc3d31a1b2aad4e8218ea5
Stats:     95 lines in 7 files changed: 1 ins; 67 del; 27 mod

8339161: ZGC: Remove unused remembered sets

Reviewed-by: aboldtch, stefank

-------------

PR: https://git.openjdk.org/jdk/pull/20947

From zgu at openjdk.org  Mon Sep 23 12:37:36 2024
From: zgu at openjdk.org (Zhengyu Gu)
Date: Mon, 23 Sep 2024 12:37:36 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <b8j5T3wNtPN9ysHoj8pDP3c3oWWnWG25E1gDGk1SpIs=.fc6b14f8-af4b-4dc1-a624-044957b4e61a@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
 <b8j5T3wNtPN9ysHoj8pDP3c3oWWnWG25E1gDGk1SpIs=.fc6b14f8-af4b-4dc1-a624-044957b4e61a@github.com>
Message-ID: <dn5pBKouiC38iMzyt1yR6YAakV_lMs0JdhdREVpqsZw=.e24b27d9-3bbe-439f-9d97-64dbced92444@github.com>

On Thu, 19 Sep 2024 21:46:43 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
>> 
>> Adopt shared implementation.
>
> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 228:
> 
>> 226:   finish_mark_work();
>> 227:   assert(task_queues()->is_empty(), "Should be empty");
>> 228:   TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats(""));
> 
> Could we pass `"Finish Mark"` for the label here?

The label is used for queue names in other GCs, rather than for phases. I passed an empty string to be consistent with the old label.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1771319732

From rkennke at openjdk.org  Mon Sep 23 14:30:41 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 23 Sep 2024 14:30:41 GMT
Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in
 is_gc_barrier_node
In-Reply-To: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
References: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
Message-ID: <56gDsD_PGk6_iCgfzeI2NIEC-FpUrjRyW8WUzKy5oXs=.5965cafb-3926-4134-88e6-6d9cf72fef1d@github.com>

On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The name of the call we emit is "shenandoah_clone":
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806
> 
> ...yet we test for "shenandoah_clone_barrier" here:
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688
> 
> I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC`

Looks good, thank you!
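
The mismatch described in the quoted text can be illustrated with a small stand-alone sketch. The types and functions below are hypothetical stand-ins, not HotSpot's node classes or runtime entry points; they only show why comparing the call target is more robust than comparing a name string.

```cpp
#include <cstring>

// Hypothetical stand-ins; these are not HotSpot types.
using EntryPoint = void (*)();

struct LeafCall {
  const char* name;   // debug name attached when the call is emitted
  EntryPoint entry;   // address of the runtime routine being called
};

// Placeholder runtime routines with distinct bodies.
static int clone_calls = 0;
static int other_calls = 0;
void clone_barrier_runtime() { ++clone_calls; }
void other_runtime() { ++other_calls; }

// Fragile: only works if this literal matches the name used at the emit site.
bool is_clone_barrier_by_name(const LeafCall& call) {
  return std::strcmp(call.name, "shenandoah_clone_barrier") == 0;
}

// Robust: compares the call target itself, so the name can change freely.
bool is_clone_barrier_by_target(const LeafCall& call) {
  return call.entry == &clone_barrier_runtime;
}
```

With a call emitted under the name "shenandoah_clone", the name-based check silently fails while the target-based check still recognizes the barrier.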

-------------

Marked as reviewed by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21014#pullrequestreview-2322461615

From shade at openjdk.org  Mon Sep 23 14:35:41 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 23 Sep 2024 14:35:41 GMT
Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in
 is_gc_barrier_node
In-Reply-To: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
References: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
Message-ID: <Lh5UMiuFh_ThUfKAXgVpnP8MS5TCdz03WqoYJD28CoI=.86bc7534-600a-413e-a471-4460826a1be4@github.com>

On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The name of the call we emit is "shenandoah_clone":
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806
> 
> ...yet we test for "shenandoah_clone_barrier" here:
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688
> 
> I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC`

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21014#issuecomment-2368467997

From shade at openjdk.org  Mon Sep 23 14:35:42 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 23 Sep 2024 14:35:42 GMT
Subject: Integrated: 8340183: Shenandoah: Incorrect match for clone barrier in
 is_gc_barrier_node
In-Reply-To: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
References: <lMJ7zz1QZt3FMIqVLhEpYe6aqMc_uqU_d0P2mWQI4IY=.136a8eb3-b7ae-43f4-b268-5f7a6d189e09@github.com>
Message-ID: <Y6P4jC25XkAQoNTV3KcizrE8jZtmZFPCdjkbbhpzXYo=.6386e4ad-8393-49b4-8e85-4d1f5464731f@github.com>

On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The name of the call we emit is "shenandoah_clone":
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806
> 
> ...yet we test for "shenandoah_clone_barrier" here:
> https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688
> 
> I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC`

This pull request has now been integrated.

Changeset: ea8f35b9
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/ea8f35b98e618bfa55371e45b3ef61fa5289dd94
Stats:     20 lines in 3 files changed: 6 ins; 10 del; 4 mod

8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node

Reviewed-by: roland, rkennke

-------------

PR: https://git.openjdk.org/jdk/pull/21014

From kirk at kodewerk.com  Mon Sep 23 15:48:22 2024
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Mon, 23 Sep 2024 08:48:22 -0700
Subject: Aligning the Serial collector with ZGC
Message-ID: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com>

Hi,

I wanted to surface to the mailing list that we've taken on the task of adding Automated Heap Sizing (AHS) as has been introduced into ZGC (and is currently being introduced into G1 https://github.com/openjdk/jdk/pull/20783 <https://github.com/openjdk/jdk/pull/20783>)) into the Serial collector. The goals of this effort are modeled after the goals for ZGC and we plan to borrow as much as possible (or as much as makes sense). For example, we would like to alter the default settings for -Xmx and -Xms. Instead of 1/4, the default MaxHeapSize would be set to available RAM. The collector will use of memory and CPU pressure, similar to what was introduced in ZGC, to control heap expansion and contraction. Current sizing ergonomics is based on the number of non-daemon threads. Altering this is expected to give the Serial collector a more dynamic ability to uncommit memory no longer in use (thus be more memory efficient when running in a container). The flags SoftMaxHeapSize and SerialPressure as well as the level of global memory pressure would be used to help guide ergonomic choices. This new ergonomic choice should work to minimize GC overhead while avoiding becoming an OOM victim. As part of this, the goal is to provide enough memory but not at all costs.

We see this work being broken down into several steps. Very roughly, the steps would be:

- Introduce an adaptive size policy that takes into account memory and CPU pressure along with global memory pressure.
   - Heap should be large enough to minimize GC overhead but not large enough to trigger OOM.
   - Introduce -XX:SerialPressure=[0-100] to support this work.
   - Introduce a smoothing algorithm to avoid excessive small resizes.

- Introduce manageable flag SoftMaxHeapSize to define a target heap size and set the default max heap size to 100% of available.
- Add in the ability to uncommit memory (to reduce global memory pressure).
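
The smoothing step mentioned above could look roughly like the following. This is only a sketch and not HotSpot code: the class name `SmoothedHeapTarget`, the exponential moving average, and the dead-band threshold are all assumptions chosen for illustration, and the real policy may combine the pressure inputs quite differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of HotSpot): damps heap resize requests so
// that small fluctuations in the desired size do not cause a resize.
class SmoothedHeapTarget {
 public:
  // alpha:            weight of the newest sample, in (0, 1].
  // min_step_percent: changes smaller than this percentage of the current
  //                   size are suppressed.
  SmoothedHeapTarget(double alpha, unsigned min_step_percent, size_t initial_bytes)
      : _alpha(alpha),
        _min_step_percent(min_step_percent),
        _average(static_cast<double>(initial_bytes)),
        _current(initial_bytes) {
    assert(alpha > 0.0 && alpha <= 1.0);
    assert(min_step_percent <= 100);
  }

  // Feed the size the policy would like to have after a collection.
  // Returns the size that should actually be used.
  size_t update(size_t desired_bytes) {
    // Exponential moving average of the desired size.
    _average = _alpha * static_cast<double>(desired_bytes) + (1.0 - _alpha) * _average;
    size_t candidate = static_cast<size_t>(_average);
    size_t delta = candidate > _current ? candidate - _current : _current - candidate;
    size_t threshold = _current / 100 * _min_step_percent;
    // Only resize when the smoothed target has moved far enough.
    if (delta >= threshold) {
      _current = candidate;
    }
    return _current;
  }

  size_t current() const { return _current; }

 private:
  double   _alpha;
  unsigned _min_step_percent;
  double   _average;
  size_t   _current;
};
```

With `alpha = 0.5` and a 10% dead band, a request that moves the smoothed target by only 2% leaves the heap size unchanged, while a sustained larger request moves it.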


While working through the details of this work I noted that there appear to be opportunities to offer new defaults for other settings. For example, when tuning GC I've found that it's best to set max heap size to the sum of the sizes of Eden, Survivor, and Tenured. The reasoning is that each of these spaces serves a specific purpose in managing object lifecycles and as such (with few exceptions) each makes use of a different metric to guide its sizing. Also, unlike ZGC, there is an overhead penalty for having an oversized tenured space. Consequently there is a sizing sweet spot where overhead will be minimized. If the heap is too small, overhead and GC cycle frequency will be very high. As the heap increases past the optimal size you will tend to see a gradual degradation in GC performance.

For Eden the guiding metric is allocation rate. For Survivor it's life cycle (age table). For Tenured it's live set size. Using these metrics to determine the size of the parts, and then using those sizes to calculate a max heap size, has almost always yielded lower GC overheads than setting a heap size and then letting ratios size everything. This may be a separate piece of work, but the intent would be to have ergonomics calculate optimal Eden, Survivor and Tenured sizes. Each young collection is an opportunity to resize Eden and Survivor, whereas a full collection would be used to resize Eden, Survivor and Tenured space. This may lead to the need to ignore NewRatio and (the soft target) MaxGCPauseMillis.
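
To make the bottom-up sizing concrete, here is a minimal sketch of sizing each space from its own metric and summing the parts. Everything in it is an assumption for illustration: the struct and function names, the target interval between young collections, and the headroom percentages are hypothetical and are not existing HotSpot ergonomics. Overflow handling is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs, one guiding metric per space.
struct HeapMetrics {
  uint64_t allocation_rate_bytes_per_sec;  // guides Eden
  uint64_t peak_survivor_bytes;            // from the age table, guides Survivor
  uint64_t live_set_bytes;                 // measured after a full collection, guides Tenured
};

struct HeapPlan {
  uint64_t eden_bytes;
  uint64_t survivor_bytes;
  uint64_t tenured_bytes;
  // Max heap size is the sum of the parts rather than an input.
  uint64_t total_bytes() const { return eden_bytes + survivor_bytes + tenured_bytes; }
};

// Adds pct percent of headroom on top of bytes.
inline uint64_t with_headroom(uint64_t bytes, unsigned pct) {
  return bytes + (bytes * pct) / 100;
}

inline HeapPlan plan_heap(const HeapMetrics& m,
                          uint64_t young_gc_interval_sec,
                          unsigned survivor_headroom_pct,
                          unsigned tenured_headroom_pct) {
  assert(young_gc_interval_sec > 0);
  HeapPlan plan;
  // Eden must absorb the allocation that happens between two young collections.
  plan.eden_bytes = m.allocation_rate_bytes_per_sec * young_gc_interval_sec;
  // Survivor must hold the objects that are still aging, plus some slack.
  plan.survivor_bytes = with_headroom(m.peak_survivor_bytes, survivor_headroom_pct);
  // Tenured must hold the live set, plus some slack for promotion.
  plan.tenured_bytes = with_headroom(m.live_set_bytes, tenured_headroom_pct);
  return plan;
}
```

For example, an allocation rate of 100 MiB/s with a 5 second target interval gives a 500 MiB Eden; a 20 MiB survivor peak with 50% headroom gives 30 MiB; and a 400 MiB live set with 25% headroom gives 500 MiB, for a total of 1030 MiB.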

As for testing, I'm currently looking at modifying HyperAlloc to add the ability to alter the shape of the load on the collector over time.

All of this is still in its infancy and we're open to guidance and input.

As for the work on G1, an initial patch has been submitted (URL above) and is open for comments.


Kind regards,
Kirk

From wkemper at openjdk.org  Mon Sep 23 17:17:34 2024
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 23 Sep 2024 17:17:34 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
Message-ID: <M0N8WW5VcJLovtxOJKwFMl0Mtpi9xvBhlgUg-ZYaEaM=.9c64f316-c138-4bef-a6ab-cac3260ef334@github.com>

On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
> 
> Adopt shared implementation.

Marked as reviewed by wkemper (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2322876022

From wkemper at openjdk.org  Mon Sep 23 17:17:35 2024
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 23 Sep 2024 17:17:35 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <dn5pBKouiC38iMzyt1yR6YAakV_lMs0JdhdREVpqsZw=.e24b27d9-3bbe-439f-9d97-64dbced92444@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
 <b8j5T3wNtPN9ysHoj8pDP3c3oWWnWG25E1gDGk1SpIs=.fc6b14f8-af4b-4dc1-a624-044957b4e61a@github.com>
 <dn5pBKouiC38iMzyt1yR6YAakV_lMs0JdhdREVpqsZw=.e24b27d9-3bbe-439f-9d97-64dbced92444@github.com>
Message-ID: <WKZQMA4H0GiO6DFwQPdvQ5fiM4gLukIRZW4rpLl1nRo=.218b9316-6495-4426-8dfc-8ee3d6d7fb98@github.com>

On Mon, 23 Sep 2024 12:35:16 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 228:
>> 
>>> 226:   finish_mark_work();
>>> 227:   assert(task_queues()->is_empty(), "Should be empty");
>>> 228:   TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats(""));
>> 
>> Could we pass `"Finish Mark"` for the label here.
>
> The label is used for queue names in other GCs, instead of phases. I passed empty string to be consistent with old label.

Okay.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1771815824

From shade at openjdk.org  Mon Sep 23 17:26:37 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 23 Sep 2024 17:26:37 GMT
Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
Message-ID: <nnnTNDbAJwVdzMij-79dhOPpMRu55AYW_S9Z0yEsmho=.a1cbc740-9182-4004-8cf9-9a8a112420a1@github.com>

On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
> 
> Adopt shared implementation.

Marked as reviewed by shade (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2322894077

From aboldtch at openjdk.org  Tue Sep 24 05:37:38 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Tue, 24 Sep 2024 05:37:38 GMT
Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with
 UseLargePages
In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
Message-ID: <K940OSXJRPrhQD-yU1fHj_4Pv006AGgvdoI06s-aeg8=.ba72648d-8a5e-4ee8-bd2e-7f776bc97cb2@github.com>

On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages) this would require the filesystem to be a HugeTLBFS.
> 
> I propose that we do not allow running these tests with persistent hugepages.

Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21127#issuecomment-2370228388

From aboldtch at openjdk.org  Tue Sep 24 05:37:38 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Tue, 24 Sep 2024 05:37:38 GMT
Subject: Integrated: 8340146: ZGC: TestAllocateHeapAt.java should not run with
 UseLargePages
In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com>
Message-ID: <YiEPxC5rL0FoZdxAifKbKvO3qx8sYx42h4pBJKX4xVA=.08b2c2b9-39fd-4e43-8ade-6f2da0a230b2@github.com>

On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages) this would require the filesystem to be a HugeTLBFS.
> 
> I propose that we do not allow running these tests with persistent hugepages.

This pull request has now been integrated.

Changeset: 4098acc2
Author:    Axel Boldt-Christmas <aboldtch at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4098acc200e608369ac1631dcc8513ea797bd59e
Stats:     5 lines in 3 files changed: 3 ins; 0 del; 2 mod

8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages

Reviewed-by: tschatzl, stefank

-------------

PR: https://git.openjdk.org/jdk/pull/21127

From rcastanedalo at openjdk.org  Tue Sep 24 09:01:53 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 24 Sep 2024 09:01:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v15]
In-Reply-To: <kBBMPkYzFAS7ne_MN9yQpW1f60peFJsaxLxnSAD0reo=.f7abc0d7-dddd-4c13-a27f-a58772f7ba96@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <CCz9B03dUxzua_YHk9-nyyHS_FTa6axwrwYjACcz4Y0=.6052f53c-9ddb-4868-a817-aff4276a2b6a@github.com>
 <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com>
 <poutu1_cWKZJS_aUjdHdxXdoYBb0nS5Zohb1-_UwFiY=.403c84c9-a971-4b16-b436-e34cc9654321@github.com>
 <IvQ2g_HP080mVZMQYJM3SCnjeAwZnJHfDHGUzwkXpR4=.053ed2be-6e53-4b91-b048-8efb57645c9c@github.com>
 <n5SsWdtVlbLQeGu2xxc39RFOkpcMUBUb16rd7IaZEAY=.a4d84480-b40f-4d74-9314-6b47c6551c20@github.com>
 <kBBMPkYzFAS7ne_MN9yQpW1f60peFJsaxLxnSAD0reo=.f7abc0d7-dddd-4c13-a27f-a58772f7ba96@github.com>
Message-ID: <Rp36TAlybOuvz86lsu_c81LwnhHMP91ljeMUtfmxGE0=.0344a557-48b1-4f37-94ab-7ca5ce38a9b1@github.com>

On Fri, 20 Sep 2024 15:26:36 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> I tried to reproduce for a few hours now using a custom testcase, with no success.
>> I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know.
>> I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further.
>> 
>> For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738
>
> Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though.
> https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656

Thanks for the update! If there is a path requiring an index register, I would agree on limiting the memory opclass to exclude indices as you suggest.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1772945253

From rkennke at openjdk.org  Tue Sep 24 11:42:30 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 24 Sep 2024 11:42:30 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v24]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <7N9vxRKxAK2GCBNlnU5E0Bj0sGV6_T-2QX9fKCCxlWg=.bdee038b-cee3-4c52-825c-d381d3616092@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
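
The 8TB figure in the Full GC forwarding item follows from simple arithmetic if the 40 bits hold a word-granular offset from the heap base, in the style of compressed oops. The sketch below illustrates that arithmetic only; it assumes 8-byte words, and the names `encode_forwardee` and `decode_forwardee` are hypothetical and do not reflect the actual HotSpot implementation.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions for illustration: 40 bits of forwarding payload, 8-byte words.
constexpr unsigned kForwardingBits  = 40;
constexpr unsigned kLogBytesPerWord = 3;

// 2^40 word offsets * 8 bytes per word = 2^43 bytes = 8 TiB.
constexpr uint64_t kMaxEncodableHeapBytes =
    (uint64_t{1} << kForwardingBits) << kLogBytesPerWord;
static_assert(kMaxEncodableHeapBytes == (uint64_t{8} << 40), "expected 8 TiB");

// Encodes a forwardee address as a word offset from the heap base.
inline uint64_t encode_forwardee(uint64_t heap_base, uint64_t forwardee_addr) {
  assert(forwardee_addr >= heap_base);
  uint64_t offset = forwardee_addr - heap_base;
  assert((offset & ((uint64_t{1} << kLogBytesPerWord) - 1)) == 0);  // word aligned
  assert(offset < kMaxEncodableHeapBytes);                          // fits in 40 bits
  return offset >> kLogBytesPerWord;
}

// Recovers the forwardee address from its encoded word offset.
inline uint64_t decode_forwardee(uint64_t heap_base, uint64_t encoded) {
  return heap_base + (encoded << kLogBytesPerWord);
}
```

Any heap larger than `kMaxEncodableHeapBytes` would contain addresses whose offsets no longer fit in the available bits, which is why a fallback is needed beyond that size.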

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Improve matching of loadNKlassCompactHeaders on aarch64

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/0d8a9236..2c4a7877

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22-23

  Stats: 17 lines in 3 files changed: 5 ins; 5 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From zgu at openjdk.org  Tue Sep 24 13:19:43 2024
From: zgu at openjdk.org (Zhengyu Gu)
Date: Tue, 24 Sep 2024 13:19:43 GMT
Subject: Integrated: 8340408: Shenandoah: Remove redundant task stats printing
 code in ShenandoahTaskQueue
In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com>
Message-ID: <v_Fid1_HS99nipTKX2neDPZKrTPeVlNnIeBGfS_SVFk=.850a7c0c-a6ca-4a5e-b289-18da9354e043@github.com>

On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
> 
> Adopt shared implementation.

This pull request has now been integrated.

Changeset: 279086d4
Author:    Zhengyu Gu <zgu at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/279086d4ce7e05972e099022e8045f39680dd4e8
Stats:     49 lines in 4 files changed: 0 ins; 47 del; 2 mod

8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue

Reviewed-by: shade, wkemper

-------------

PR: https://git.openjdk.org/jdk/pull/21077

From erik.osterlund at oracle.com  Tue Sep 24 13:28:42 2024
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Tue, 24 Sep 2024 13:28:42 +0000
Subject: Aligning the Serial collector with ZGC
In-Reply-To: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com>
References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com>
Message-ID: <CEAE5E60-7CA5-4851-A0D1-7B3D6496EDF6@oracle.com>

Hi Kirk,

I wonder: if we all end up having a -XX:{Z, G1, Shenandoah, Serial, Parallel?}GCPressure=[similar, range] flag to hint to the GC to be more or less aggressive, should we try to have just a single GCPressure flag for this instead? What do you think?

Kind regards,
/Erik

On 23 Sep 2024, at 17:48, Kirk Pepperdine <kirk at kodewerk.com> wrote:

Hi,

I wanted to surface to the mailing list that we've taken on the task of adding Automated Heap Sizing (AHS) as has been introduced into ZGC (and is currently being introduced into G1 https://github.com/openjdk/jdk/pull/20783)) into the Serial collector. The goals of this effort are modeled after the goals for ZGC and we plan to borrow as much as possible (or as much as makes sense). For example, we would like to alter the default settings for -Xmx and -Xms. Instead of 1/4, the default MaxHeapSize would be set to available RAM. The collector will use of memory and CPU pressure, similar to what was introduced in ZGC, to control heap expansion and contraction. Current sizing ergonomics is based on the number of non-daemon threads. Altering this is expected to give the Serial collector a more dynamic ability to uncommit memory no longer in use (thus be more memory efficient when running in a container). The flags SoftMaxHeapSize and SerialPressure as well as the level of global memory pressure would be used to help guide ergonomic choices. This new ergonomic choice should work to minimize GC overhead while avoiding becoming an OOM victim. As part of this, the goal is to provide enough memory but not at all costs.

We see this work being broken down into several steps. Very roughly the steps would be;

- Introduce an adaptive size policy that takes into account memory and CPU pressure along with global memory pressure.
   - Heap should be large enough to minimize GC overhead but not large enough to trigger OOM.
   - Introduce -XX:SerialPressure=[0-100] to support this work.
   - introduce a smoothing algorythm to avoid excessive small resizes.

- Introduce manageable flag SoftMaxHeapSize to define a target heap size and set the default max heap size to 100% of available.
- Add in the ability to uncommit memory (to reduce global memory pressure).


While working through the details of this work I noted that there appear to opportunities to offer new defaults for other settings. For example, when tuning GC I've found that it's best to set max heap size to the sum of the size of Eden, Survivor, and Tenured. The reasoning is that each of these spaces surves a specific purpose in managing object lifecycles and as such, (with few exceptions) each make use of a different metric to guide how to size. Also, unlike ZGC, there is an overhead penalty for having an oversized tenured space. Consequently there is a sizing sweetspot where overhead will be minimized . Too small and overheads and GC cycle freqency will be very high. As heap is increases past the optimal size you will tend to see a gradual degradation in GC performance.

For Eden the guiding metric is allocation rate. For Survivor it's life cycle (age table). For Tenured it's live set size. Using these metrics to determine size of the parts and use that to then calculate a max heap size has almost always yielded lower GC overheads than setting a heap size and then letting ratios size everything. This maybe a separate piece of work but the intent would be to have ergonomics calculate optimal eden, survivor and tenured sizes. Each young collection is an opportunity to resize Eden and Survivor whereas a full would be used to resize Eden, Survivor and Tenured space. This may lead to the need to ignore NewRatio and (the soft target) MaxGCPauseMillis.

As for testing. I'm currently looking at modifying HyperAlloc to add ability to alter the shape of the load on the collector over time.

All of this is still in it's infancy and we're open for guidance and input.

As for the work on G1, an initial patch as been submitted (URL above) and is open for comments.


Kind regards,
Kirk


From kirk at kodewerk.com  Tue Sep 24 15:27:43 2024
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 24 Sep 2024 08:27:43 -0700
Subject: Aligning the Serial collector with ZGC
In-Reply-To: <CEAE5E60-7CA5-4851-A0D1-7B3D6496EDF6@oracle.com>
References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com>
 <CEAE5E60-7CA5-4851-A0D1-7B3D6496EDF6@oracle.com>
Message-ID: <ED2B32F4-7FC4-4F0D-B33B-D2FA9105391A@kodewerk.com>

Hi Erik,

I wasn't sure how committed everyone was to the ZGCPressure especially as it's in the JEP. I also wasn't sure about how entangled one would want the flags to be. For example, I'm guessing that a good default value for GCPressure would be 1 or 2 for the Serial collector whereas for G1 I believe Google has settled on 20 IIRC for their version of this flag. I could see 1 or 2 for the Parallel collector should it be decided that the work be performed on that collector also. But other than that, my first thought was, maybe this could just be GCPressure.

Kind regards,
Kirk

> On Sep 24, 2024, at 6:28 AM, Erik Osterlund <erik.osterlund at oracle.com> wrote:
> 
> Hi Kirk,
> 
> I wonder if we all end up having a -XX:{Z, G1, Shenandoah, Serial, Parallel?}GCPressure=[similar, range] flag to hint to the GC to be more or less aggressive, if we should try to have just a single GCPressure flag for this instead. What do you think?
> 
> Kind regards,
> /Erik
> 
>> On 23 Sep 2024, at 17:48, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>> 
>> Hi,
>> 
>> I wanted to surface to the mailing list that we've taken on the task of adding Automated Heap Sizing (AHS) as has been introduced into ZGC (and is currently being introduced into G1 https://github.com/openjdk/jdk/pull/20783 <https://github.com/openjdk/jdk/pull/20783>)) into the Serial collector. The goals of this effort are modeled after the goals for ZGC and we plan to borrow as much as possible (or as much as makes sense). For example, we would like to alter the default settings for -Xmx and -Xms. Instead of 1/4, the default MaxHeapSize would be set to available RAM. The collector will use of memory and CPU pressure, similar to what was introduced in ZGC, to control heap expansion and contraction. Current sizing ergonomics is based on the number of non-daemon threads. Altering this is expected to give the Serial collector a more dynamic ability to uncommit memory no longer in use (thus be more memory efficient when running in a container). The flags SoftMaxHeapSize and SerialPressure as well as the level of global memory pressure would be used to help guide ergonomic choices. This new ergonomic choice should work to minimize GC overhead while avoiding becoming an OOM victim. As part of this, the goal is to provide enough memory but not at all costs.
>> 
>> We see this work being broken down into several steps. Very roughly the steps would be;
>> 
>> - Introduce an adaptive size policy that takes into account memory and CPU pressure along with global memory pressure.
>>    - Heap should be large enough to minimize GC overhead but not large enough to trigger OOM.
>>    - Introduce -XX:SerialPressure=[0-100] to support this work.
>>    - introduce a smoothing algorythm to avoid excessive small resizes.
>> 
>> - Introduce manageable flag SoftMaxHeapSize to define a target heap size and set the default max heap size to 100% of available.
>> - Add in the ability to uncommit memory (to reduce global memory pressure).
>> 
>> 
>> While working through the details of this work I noted that there appear to opportunities to offer new defaults for other settings. For example, when tuning GC I've found that it's best to set max heap size to the sum of the size of Eden, Survivor, and Tenured. The reasoning is that each of these spaces surves a specific purpose in managing object lifecycles and as such, (with few exceptions) each make use of a different metric to guide how to size. Also, unlike ZGC, there is an overhead penalty for having an oversized tenured space. Consequently there is a sizing sweetspot where overhead will be minimized . Too small and overheads and GC cycle freqency will be very high. As heap is increases past the optimal size you will tend to see a gradual degradation in GC performance.
>> 
>> For Eden the guiding metric is allocation rate. For Survivor it's life cycle (age table). For Tenured it's live set size. Using these metrics to determine size of the parts and use that to then calculate a max heap size has almost always yielded lower GC overheads than setting a heap size and then letting ratios size everything. This maybe a separate piece of work but the intent would be to have ergonomics calculate optimal eden, survivor and tenured sizes. Each young collection is an opportunity to resize Eden and Survivor whereas a full would be used to resize Eden, Survivor and Tenured space. This may lead to the need to ignore NewRatio and (the soft target) MaxGCPauseMillis.
>> 
>> As for testing. I'm currently looking at modifying HyperAlloc to add ability to alter the shape of the load on the collector over time.
>> 
>> All of this is still in it's infancy and we're open for guidance and input.
>> 
>> As for the work on G1, an initial patch as been submitted (URL above) and is open for comments.
>> 
>> 
>> Kind regards,
>> Kirk
> 


From coleenp at openjdk.org  Tue Sep 24 15:40:55 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 24 Sep 2024 15:40:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v23]
In-Reply-To: <VjSp2vvwfzZWKtreZyiwjxsSblAX5JHCpF3c0mcYUHs=.b154b3dd-a8f2-4fdf-8176-52709f906891@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
 <VjSp2vvwfzZWKtreZyiwjxsSblAX5JHCpF3c0mcYUHs=.b154b3dd-a8f2-4fdf-8176-52709f906891@github.com>
Message-ID: <X5Y3hEUvZJkXEwR0yijyK-ZABZgLf5NE-Glvc9qSUBk=.08666f11-c243-4d1b-a99e-d18c224e8063@github.com>

On Fri, 20 Sep 2024 18:11:43 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4
>>  - review feedback
>
> src/hotspot/share/memory/metaspace/metablock.hpp line 74:
> 
>> 72: #define METABLOCKFORMATARGS(__block__)  p2i((__block__).base()), (__block__).word_size()
>> 73: 
>> 74: } // namespace metaspace
> 
> I am wondering if some of these metaspace changes, that is, the addition of MetaBlock could be upstreamed ahead of the CompactObjectHeaders.  Some is refactoring so that you can use the wastage to allocate into class-arena but a lot of this seems neutral to compact object headers, and would reduce this patch and allow different people to focus on just this.

For the record, I am fine with these metaspace changes going in with this PR if the timing for that is better.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1773607587

From kvn at openjdk.org  Tue Sep 24 20:00:47 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 24 Sep 2024 20:00:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v25]
In-Reply-To: <CDl6hsO3iqzIUZDlHRIab-AUNryWMQnpCeYxmsTIKWI=.b8571ef3-6a76-4964-9c88-e604f71d3f0e@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
 <CDl6hsO3iqzIUZDlHRIab-AUNryWMQnpCeYxmsTIKWI=.b8571ef3-6a76-4964-9c88-e604f71d3f0e@github.com>
Message-ID: <GRpQpjCQayJ56W27H8M_E5fgDSsAXxu5fUeY4E8ZE1k=.0e1ecf80-a97b-47e9-a31e-dfbee84dcd1f@github.com>

On Mon, 23 Sep 2024 07:54:39 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/share/opto/matcher.cpp line 1821:
>> 
>>> 1819:   if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) {
>>> 1820:     assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf),
>>> 1821:            "duplicating node that's already been matched");
>> 
>> Why was it removed?
>
> The assertion was failing due to it being too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched.
> There are other cases (all of them harmless as far as I can see) in which this assertion can fail. I am investigating whether they can be avoided so that the assertion can be restored, and what would be the impact on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization.

I think you can do that as follow-up changes in a separate RFE. I am fine with the removal in this JEP code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1773999931

From rcastanedalo at openjdk.org  Wed Sep 25 04:22:25 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 25 Sep 2024 04:22:25 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v26]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <L5NFvxi35ASdQ-1Ap3VJFB_n0Od5-S-rmbi6_853Lp0=.fa70e94d-3b29-457f-b86c-1ba19d11d5b1@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision:

 - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
 - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
 - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/47c982ba..6fb36e50

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24-25

  Stats: 104 lines in 5 files changed: 4 ins; 30 del; 70 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org  Wed Sep 25 04:26:43 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 25 Sep 2024 04:26:43 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v24]
In-Reply-To: <y4r0APx4JgMk680hZEpMvn1fWLeJ_3-NIQBDaykZLRg=.e83a8cca-0a87-4f8a-bfd8-814626c0a086@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com>
 <y4r0APx4JgMk680hZEpMvn1fWLeJ_3-NIQBDaykZLRg=.e83a8cca-0a87-4f8a-bfd8-814626c0a086@github.com>
Message-ID: <jV4TJDq7qighlqMQ06R5cofGGzjfjlZe_y_NXHRxsNQ=.feaf3684-e7c5-4666-922d-7689828df684@github.com>

On Sat, 21 Sep 2024 06:44:21 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>>  - Remove redundant comment
>
> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257:
> 
>> 255:                       RegSet::of($res$$Register) /* no_preserve */);
>> 256:     __ mov($tmp1$$Register, $oldval$$Register);
>> 257:     __ mov($tmp2$$Register, $newval$$Register);
> 
> Hi, I don't quite understand these two register-register moves here. It seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly, as `cmpxchg` won't modify them, which would save these two moves. Did I miss anything? Thanks.

Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary. Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774393587

From rcastanedalo at openjdk.org  Wed Sep 25 04:58:45 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 25 Sep 2024 04:58:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v19]
In-Reply-To: <GRpQpjCQayJ56W27H8M_E5fgDSsAXxu5fUeY4E8ZE1k=.0e1ecf80-a97b-47e9-a31e-dfbee84dcd1f@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
 <CDl6hsO3iqzIUZDlHRIab-AUNryWMQnpCeYxmsTIKWI=.b8571ef3-6a76-4964-9c88-e604f71d3f0e@github.com>
 <GRpQpjCQayJ56W27H8M_E5fgDSsAXxu5fUeY4E8ZE1k=.0e1ecf80-a97b-47e9-a31e-dfbee84dcd1f@github.com>
Message-ID: <CoGzB-M98fZthWHUktGUQwJsc-xH3gsihk3yr98llrM=.751d2012-9ca7-44ce-a9b5-96682c7fcb08@github.com>

On Tue, 24 Sep 2024 19:57:29 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> I think you can do that in follow-up changes as a separate RFE. I am fine with removal in this JEP code.

Thanks, in the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exist at the same time an `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case:

![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45)

Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared. This means that a few encode-and-store patterns that were matched by a single `g1EncodePAndStoreN` before are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example:

![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5)

Luckily, this case is very infrequent so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved.
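
The condition described above can be illustrated with a minimal, self-contained sketch. This is a toy model, not the HotSpot matcher code: the `Node` struct, the simplified `is_encode_and_store_pattern`, and `may_clone_into` are hypothetical stand-ins, and the real pattern test may check more than the node kinds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a C2 ideal node (hypothetical, not HotSpot's Node class).
struct Node {
  std::string op;            // e.g. "EncodeP", "StoreN", "CmpN"
  std::vector<Node*> outs;   // users of this node's output
};

// Simplified stand-in for the pattern test: n is a StoreN fed by an EncodeP m.
bool is_encode_and_store_pattern(const Node* n, const Node* m) {
  return n->op == "StoreN" && m->op == "EncodeP";
}

// Assumed policy: clone m into the match tree of n only if the pattern holds
// and m has no other user, so that a shared m is never matched in two ways.
bool may_clone_into(const Node* n, const Node* m) {
  return is_encode_and_store_pattern(n, m) && m->outs.size() == 1;
}
```

In this sketch, an `EncodeP` with a single `StoreN` user may be folded into one combined instruction, whereas an `EncodeP` that also feeds another node is matched separately, which corresponds to the `encodeHeapOop` + `g1StoreN` outcome in the example.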

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467183

From rcastanedalo at openjdk.org  Wed Sep 25 04:58:45 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 25 Sep 2024 04:58:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v19]
In-Reply-To: <CoGzB-M98fZthWHUktGUQwJsc-xH3gsihk3yr98llrM=.751d2012-9ca7-44ce-a9b5-96682c7fcb08@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <fZlrdfsGJHUzjBFq0wEWY0QPZlhjC3KAK1obtgfMCTk=.7ccbe569-50d8-4a3e-8728-5aaca9678ab6@github.com>
 <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
 <CDl6hsO3iqzIUZDlHRIab-AUNryWMQnpCeYxmsTIKWI=.b8571ef3-6a76-4964-9c88-e604f71d3f0e@github.com>
 <GRpQpjCQayJ56W27H8M_E5fgDSsAXxu5fUeY4E8ZE1k=.0e1ecf80-a97b-47e9-a31e-dfbee84dcd1f@github.com>
 <CoGzB-M98fZthWHUktGUQwJsc-xH3gsihk3yr98llrM=.751d2012-9ca7-44ce-a9b5-96682c7fcb08@github.com>
Message-ID: <sGoT9QlCGofR207SMqNbuco6uR6a1JTy2aUtRgCjPfk=.e5521fdb-849d-40cf-a7f4-003990698bbe@github.com>

On Wed, 25 Sep 2024 04:55:35 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> I think you can do that in follow-up changes as a separate RFE. I am fine with removal in this JEP code.
> 
> Thanks, in the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exist at the same time an `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case:
> 
> ![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45)
> 
> Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared. This means that a few encode-and-store patterns that were matched by a single `g1EncodePAndStoreN` before are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example:
> 
> ![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5)
> 
> Luckily, this case is very infrequent so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved.

@TheRealMDoerr: since there are now a few corner cases where we match a StoreN node with g1StoreN even though it stores the output of an EncodeP node, I had to remove the assertions in the x64 and ppc g1StoreN definitions, see above.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467652

From fyang at openjdk.org  Wed Sep 25 07:36:47 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 25 Sep 2024 07:36:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v24]
In-Reply-To: <jV4TJDq7qighlqMQ06R5cofGGzjfjlZe_y_NXHRxsNQ=.feaf3684-e7c5-4666-922d-7689828df684@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com>
 <y4r0APx4JgMk680hZEpMvn1fWLeJ_3-NIQBDaykZLRg=.e83a8cca-0a87-4f8a-bfd8-814626c0a086@github.com>
 <jV4TJDq7qighlqMQ06R5cofGGzjfjlZe_y_NXHRxsNQ=.feaf3684-e7c5-4666-922d-7689828df684@github.com>
Message-ID: <drO5OTj2wztzWdYKhSVotFkJHxKJ78R4oKvg4B4MIzI=.aa8a9d68-7e75-4eb9-a985-a19f474146ea@github.com>

On Wed, 25 Sep 2024 04:22:49 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257:
>> 
>>> 255:                       RegSet::of($res$$Register) /* no_preserve */);
>>> 256:     __ mov($tmp1$$Register, $oldval$$Register);
>>> 257:     __ mov($tmp2$$Register, $newval$$Register);
>> 
>> Hi, I don't quite understand these two register-register moves here. It seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly, as `cmpxchg` won't modify them, which would save these two moves. Did I miss anything? Thanks.
>
> Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary. Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e).

Thanks for the update. It now looks cleaner and easier to understand. BTW: It seems that the RISC-V part has a similar issue. I will discuss with @feilongjiang and hopefully we will come up with a similar fix.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774695093

From sjohanss at openjdk.org  Wed Sep 25 08:05:43 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Wed, 25 Sep 2024 08:05:43 GMT
Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of
 TestAllocateHeapAt.java
In-Reply-To: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
References: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
Message-ID: <9ZwyHA5LJ4HJ7S_j9rKB7PqehVDuil-EzwJEOA72zIY=.f6ca79f6-86ec-4815-9fbf-10745ed034bd@github.com>

On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127  disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions on the underlying file systems.
> 
> I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.

Looks good.

test/hotspot/jtreg/gc/z/TestAllocateHeapAtWithHugeTLBFS.java line 80:

> 78:         ProcessTools.executeTestJava(
> 79:             "-XX:+UseZGC",
> 80:             "-XX:+ZGenerational",

Any reason to include `-XX:+ZGenerational` or should we just skip it?

-------------

Marked as reviewed by sjohanss (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21128#pullrequestreview-2327468399
PR Review Comment: https://git.openjdk.org/jdk/pull/21128#discussion_r1774745690

From aboldtch at openjdk.org  Wed Sep 25 09:25:36 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 25 Sep 2024 09:25:36 GMT
Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of
 TestAllocateHeapAt.java
In-Reply-To: <9ZwyHA5LJ4HJ7S_j9rKB7PqehVDuil-EzwJEOA72zIY=.f6ca79f6-86ec-4815-9fbf-10745ed034bd@github.com>
References: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
 <9ZwyHA5LJ4HJ7S_j9rKB7PqehVDuil-EzwJEOA72zIY=.f6ca79f6-86ec-4815-9fbf-10745ed034bd@github.com>
Message-ID: <-jmkh9ZNe6sDpJ6wTKeR7AB9JFBk5PSnY3ISsoT9ErM=.65995ba4-85e4-44b5-b3e4-594d8d1c8d75@github.com>

On Wed, 25 Sep 2024 08:02:39 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127  disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions on the underlying file systems.
>> 
>> I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.
>
> test/hotspot/jtreg/gc/z/TestAllocateHeapAtWithHugeTLBFS.java line 80:
> 
>> 78:         ProcessTools.executeTestJava(
>> 79:             "-XX:+UseZGC",
>> 80:             "-XX:+ZGenerational",
> 
> Any reason to include `-XX:+ZGenerational` or should we just skip it?

I try to keep this option explicit in tests until it is removed, to avoid assumptions about default values.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21128#discussion_r1774882896

From duke at openjdk.org  Wed Sep 25 11:57:40 2024
From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Wed, 25 Sep 2024 11:57:40 GMT
Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of
 TestAllocateHeapAt.java
In-Reply-To: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
References: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
Message-ID: <M2LnXwjry91w3O6KqTEoGJLau1-YxYucet1fmXhEHX8=.f2839406-96c9-4004-a86b-6fedb9da692d@github.com>

On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127  disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions on the underlying file systems.
> 
> I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.

I think this looks good.

-------------

Marked as reviewed by jsikstro at github.com (no known OpenJDK username).

PR Review: https://git.openjdk.org/jdk/pull/21128#pullrequestreview-2328027755

From rkennke at openjdk.org  Wed Sep 25 12:34:36 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 25 Sep 2024 12:34:36 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v25]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <EQf5fU43OLq4Oarrg-8e7E7DowLm77XO5seh3Lr61D8=.787cea42-e961-4fab-a222-92145c53d3c8@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
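
The "40 bits, up to 8TB" figure in the list above is consistent with storing aligned offsets rather than raw addresses. The following is a minimal sketch of that arithmetic only, assuming 8-byte object alignment and a heap-base-relative encoding; the names `kForwardingBits`, `kAlignShift`, `encode`, and `decode` are hypothetical and do not come from the PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kForwardingBits = 40;  // bit budget stated in the PR description
constexpr unsigned kAlignShift = 3;       // assumption: objects are 8-byte aligned

// Encode an address as an aligned offset from the heap base (hypothetical helper).
constexpr std::uint64_t encode(std::uint64_t heap_base, std::uint64_t addr) {
  return (addr - heap_base) >> kAlignShift;
}

// Recover the address from the encoded offset (hypothetical helper).
constexpr std::uint64_t decode(std::uint64_t heap_base, std::uint64_t enc) {
  return heap_base + (enc << kAlignShift);
}

// 2^40 aligned slots of 8 bytes each cover 2^43 bytes, i.e. 8 TiB.
constexpr std::uint64_t kMaxHeapBytes =
    std::uint64_t{1} << (kForwardingBits + kAlignShift);
static_assert(kMaxHeapBytes == 8ULL * 1024 * 1024 * 1024 * 1024,
              "40 bits of 8-byte-aligned offsets cover 8 TiB");
```

Under these assumptions, any 8-byte-aligned address within 8 TiB of the heap base round-trips through a 40-bit value.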

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Enforce lightweight locking on 32-bit platforms

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/2c4a7877..cd69da86

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23-24

  Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rkennke at openjdk.org  Wed Sep 25 12:53:17 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 25 Sep 2024 12:53:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Allow LM_MONITOR on 32-bit platforms

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/cd69da86..4904d433

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24-25

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rcastanedalo at openjdk.org  Wed Sep 25 13:54:59 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 25 Sep 2024 13:54:59 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <P6u4aaxhJr4zjXT4_J993iocS4F11e7lfHjnnGqIJKc=.1bf65c03-fe7f-4b1a-ba08-26c64dafa3f7@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <GQVVR0K-mDh4evKvrKG9eXwwF4j7zAg-8ai4__gWNGE=.3c502820-717b-44b7-b82d-a24dd7fdd9d5@github.com>
 <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
 <P6u4aaxhJr4zjXT4_J993iocS4F11e7lfHjnnGqIJKc=.1bf65c03-fe7f-4b1a-ba08-26c64dafa3f7@github.com>
Message-ID: <2adTLZAwTvFTVNGeR5e9Cef5uNqpsz2haeobLIDZiNI=.cb2bbf0d-5c1b-4583-b4bd-898e0c5cdbb7@github.com>

On Fri, 13 Sep 2024 06:43:34 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP.
>
> I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no?

I think it would be good to remove the explicit `UseCompressedClassPointers` test as argued above (i.e. revert this change), unless there is another reason to keep it that I am missing?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775277784

From rcastanedalo at openjdk.org  Wed Sep 25 14:19:54 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 25 Sep 2024 14:19:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
Message-ID: <McrslKGfzgS8urELrWmDFf9M9d2qfRxlMPMIlANo3qI=.c9eb879b-1949-4541-86c4-6aacba9075ae@github.com>

On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/share/opto/memnode.cpp line 2256:

> 2254:   if (!UseCompactObjectHeaders && alloc != nullptr) {
> 2255:     return TypeX::make(markWord::prototype().value());
> 2256:   }

Suggestion: make these four lines conditional on `!UseCompactObjectHeaders`, like so:

  if (!UseCompactObjectHeaders) {
    Node* alloc = is_new_object_mark_load();
    if (alloc != nullptr) {
      return TypeX::make(markWord::prototype().value());
    }
  }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775322670

From sjohanss at openjdk.org  Wed Sep 25 20:10:44 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Wed, 25 Sep 2024 20:10:44 GMT
Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path
Message-ID: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>

Please review this change to move defragmentation of small pages out of the allocation path.

**Summary**
In ZGC, small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses. Sometimes, when memory usage is high or the distribution of pages in the cache demands it, small pages can be split out from medium or large pages. When this happens, the small page is defragmented by remapping it to a lower address if it resides at too high an address. This is done to avoid fragmentation, but doing it in the allocation path comes with a cost since the remapping involves system calls.

This change moves the remapping away from the allocation path to when the pages are freed after being garbage collected. This can increase the fragmentation a bit, since small pages are allowed to reside at higher addresses for a period of time. The expectation is that since small pages are frequently collected the period should be short enough to not cause any problems.

I've done detailed experiments with temporary JFR events to look at the cost of doing the defrag/remapping of pages. One problem is that by default we don't get a lot of defrags (unless we run out of memory), and to better investigate this I also altered the allocation path to force more defragmentation. These tests showed that moving defrag out of the allocation path not only has the positive effect of reducing the time spent allocating, but also spaces out the defragmentation a bit more. The reason for this is that now we will defrag when a page is freed by the GC, instead of when the page is allocated, and the tests show that often many threads allocate at the same time (and then also defrag at the same time). This in turn leads to many concurrent calls down to the system to remap the memory, and this increases the cost of the actual mapping (seen by adding JFR events).
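
To make the shape of the change concrete, here is a minimal, self-contained sketch of defragmenting on the free path instead of the allocation path. It is a toy model under stated assumptions, not the ZGC code: `Page`, `ToyPageAllocator`, `alloc_small_from_split`, and the `remaps` counter are hypothetical, and only the name `free_page` is borrowed from the commit messages.

```cpp
#include <cassert>
#include <cstdint>

// Toy page descriptor (hypothetical, not ZGC's ZPage).
struct Page {
  std::uintptr_t addr;
  bool small;
};

struct ToyPageAllocator {
  std::uintptr_t small_limit;   // small pages should reside below this address
  std::uintptr_t next_low = 0;  // next free low address (simplified bump pointer)
  int remaps = 0;               // counts simulated remap system calls

  explicit ToyPageAllocator(std::uintptr_t limit) : small_limit(limit) {}

  // Allocation path: hand out the split-off small page as-is, with no remapping.
  Page alloc_small_from_split(std::uintptr_t high_addr) {
    return Page{high_addr, true};
  }

  // Free path: defragment here, after the GC has released the page.
  Page free_page(Page p) {
    if (p.small && p.addr >= small_limit) {
      p.addr = next_low;        // stands in for remapping to a lower address
      next_low += 1;
      ++remaps;
    }
    return p;
  }
};
```

In this sketch the allocating thread never pays for a remap; the cost is incurred once, when the page is freed, at which point a small page that sat at a high address is moved back below the limit.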

**Additional testing**

- Functional testing in mach5 tier1-7
- Sanity performance testing in aurora

-------------

Commit messages:
 - Move statistics to cover all cases
 - Enable defragment for ZRelocate calls to free_page
 - 8340426: ZGC: Move defragment out of the allocation path

Changes: https://git.openjdk.org/jdk/pull/21191/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21191&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340426
  Stats: 75 lines in 5 files changed: 47 ins; 17 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/21191.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21191/head:pull/21191

PR: https://git.openjdk.org/jdk/pull/21191

From thomas.schatzl at oracle.com  Thu Sep 26 08:20:11 2024
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 26 Sep 2024 10:20:11 +0200
Subject: Aligning the Serial collector with ZGC
In-Reply-To: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com>
References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com>
Message-ID: <44230e90-d8bc-491f-a5c4-2c483646fc3e@oracle.com>

Hi Kirk,

   somewhat random comments...

On 23.09.24 17:48, Kirk Pepperdine wrote:
 > Hi,
 >
 > I wanted to surface to the mailing list that we've taken on the task
 > of adding Automated Heap Sizing (AHS) as has been introduced into ZGC
 > (and is currently being introduced into G1
 > https://github.com/openjdk/jdk/pull/20783) into the Serial
 > collector.
 > The goals of this effort are modeled after the goals for ZGC and we
 > plan to borrow as much as possible (or as much as makes sense). For
 > example, we would like to alter the default settings for -Xmx and
 > -Xms. Instead of 1/4, the default MaxHeapSize would be set to
 > available RAM. The
 > collector will make use of memory and CPU pressure, similar to what was
 > introduced in ZGC, to control heap expansion and contraction. Current
 > sizing ergonomics is based on the number of non-daemon threads.
 > Altering this is expected to give the Serial collector a more dynamic
 > ability to uncommit memory no longer in use (thus be more memory
 > efficient when running in a container). The flags SoftMaxHeapSize and
 > SerialPressure as well as the level of global memory pressure would be
 > used to help guide ergonomic choices. This new ergonomic choice should
 > work to minimize GC overhead while avoiding becoming an OOM victim. As
 > part of this, the goal is to provide enough memory but not at all
 > costs.
 >
 > We see this work being broken down into several steps. Very roughly
 > the steps would be;
 >
 > - Introduce an adaptive size policy that takes into account memory and
 > CPU pressure along with global memory pressure.
 >     - Heap should be large enough to minimize GC overhead but not
 > large enough to trigger OOM.

(probably meant "small enough" the second time)

 >     - Introduce -XX:SerialPressure=[0-100] to support this work.

(Fwiw, with regard to the other discussion, I agree that if we have a flag 
with the same "meaning" across collectors it might be useful to use the 
same name).

 >     - introduce a smoothing algorithm to avoid excessive small
 > resizes.

One option is to split this further into parts:

* list what actions Serial GC could do in reaction to memory pressure on 
an abstract level, and which make sense; from that see what 
functionality is needed.

* provide functionality that tries to keep some kind of GC/mutator time 
ratio; I would start with looking at what G1 does because Serial GC's 
behaviour is probably closer to G1 than ZGC, but ymmv.
(Obviously improvements are welcome :))

(This may not need to be exposed externally like some 
GCTimeRatio/GCCPUPercentage/whatever flag name)

* add functionality to calculate memory pressure from the environment; 
maybe in a containerized environment from a manageable flag as it does 
not have a global "pressure" view. This could probably be taken from ZGC, 
at least partially.

* some transfer function that translates this external memory pressure, 
based on "GCPressure", (e.g. that "sigmoid" function plus lots of magic 
numbers) to reaction in the gc: e.g. change the gc/mutator pause time 
goal, start collections, uncommit memory...

* (probably) some background thread that continuously calculates and 
reacts on global pressure (uncommit memory, do a gc, resize heap, ...) 
because one probably does not want to wait for the next gc to react...

* do lots of testing to weed out corner cases
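
To illustrate the "transfer function" item above, here is a minimal sketch assuming a plain logistic (sigmoid) curve. The function name, the [0, 1] pressure scale and all constants are hypothetical; this is neither what ZGC does nor a proposal for Serial GC's actual tuning.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical transfer function: maps an external memory pressure in [0, 1]
// to a target GC overhead (fraction of CPU time spent in GC). Low pressure
// gives a value near min_overhead (large heap, little GC work); high pressure
// gives a value near max_overhead (smaller heap, more GC work).
double toy_target_gc_overhead(double pressure,
                              double min_overhead,
                              double max_overhead,
                              double steepness) {
  // Clamp the input so out-of-range readings cannot push the goal too far.
  const double p = std::min(1.0, std::max(0.0, pressure));
  // Logistic curve centered at p = 0.5; steepness is one of the
  // "magic numbers" that would need tuning.
  const double s = 1.0 / (1.0 + std::exp(-steepness * (p - 0.5)));
  return min_overhead + (max_overhead - min_overhead) * s;
}
```

A background thread could re-evaluate such a function periodically and adjust the gc/mutator time goal, start a collection, or uncommit memory based on the result.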

 > - Introduce manageable flag SoftMaxHeapSize to define a target heap
 > size and set the default max heap size to 100% of available.

I am a bit torn about SoftMaxHeapSize in Serial GC. What do you envision 
that Serial GC would do when the SoftMaxHeapSize has been reached, and 
what if old gen occupancy permanently stays above that value?

The usefulness of SoftMaxHeapSize kind of relies on having a minimally 
invasive old gen collection that tries to get old gen usage back below 
that value.

Serial GC has no "minimally invasive" way to collect old generation. It 
is either Full GC or nothing. This is the only option for Serial, but 
always doing Full collections after reaching that threshold seems very 
heavy handed, expensive and undesirable to me (ymmv).

That reaction would follow the spirit of the flag though.

Maybe at the small heaps Serial GC targets, this makes sense, and full 
gc is not that costly anyway.

It might be useful to enumerate what actions could be performed on 
global pressure.

 > - Add in the ability to uncommit memory (to reduce global memory
 > pressure).
 >

The following imo outlines a completely separate idea, and should be 
discussed separately:

 >
 > While working through the details of this work I noted that there
 > appear to be opportunities to offer new defaults for other settings. For
 > example, [...]

That seems to be some more elaborate way of finding "optimal" generation 
size for a given heap size (which may follow from what the gc/mutator 
time ratio algorithm gives you).

 >
 > For Eden the guiding metric is allocation rate. For Survivor it's life
 > cycle (age table). For Tenured it's live set size. Using these metrics
 > to determine size of the parts and use that to then calculate a max
 > heap size has almost always yielded lower GC overheads than setting a
 > heap size and then letting ratios size everything. This may be a
 > separate piece of work

+1

 > but the intent would be to have ergonomics calculate
 > optimal eden, survivor and tenured sizes. Each young collection is an
 > opportunity to resize Eden and Survivor whereas a full would be used
 > to resize Eden, Survivor and Tenured space. This may lead to the need
 > to ignore NewRatio and (the soft target) MaxGCPauseMillis.

Fwiw, the only collector that observes MaxGCPauseMillis is G1; in the 
context of Serial GC discussed further above I am confused.

Not sure if MaxGCPauseMillis would make sense in Serial GC given that 
you can't control Full GC pause length.

Also, in the context of G1 some of the statements above are hard to 
understand: e.g. the text seems to imply that there is a fixed ratio 
between eden and survivor which isn't really the case, at least not in 
the sense of Serial GC.

Could you elaborate?

Even then, with Serial GC's fixed generation sizes fine-grained 
on-the-fly adaptation as somewhat suggested might be harder than usual.

Not against doing all that, but it really sounds like separate work.

 >
 > As for testing. I'm currently looking at modifying HyperAlloc to add
 > ability to alter the shape of the load on the collector over time.
 >
 > All of this is still in its infancy and we're open for guidance and
 > input.
 >
 > As for the work on G1, an initial patch has been submitted (URL above)
 > and is open for comments.
 >

The patch does not seem to implement AHS. It implements 
CurrentMaxHeapSize which might be what AHS uses to set max heap size.

To implement AHS for G1 roughly at least the following items need to be 
added/implemented/changed:

* remove the use of Min/MaxHeapFreeRatio for heap sizing. These flags 
completely disregard cpu and heap pressure based heap sizing (should 
also be removed from Serial GC - this means deprecating/obsoleting this 
flag as soon as the last user is gone).

* implement CurrentMaxHeapSize which is a (configurable) hard limit on 
how much the Java application may allocate (JDK-8204088) in support of 
AHS. As mentioned, that patch might be an initial discussion base.
I do not think we need a JEP for that, but it gives you more publicity.

* implement SoftMaxHeapSize in the sense of ZGC where it uses it to 
guide IHOP (or ZGC's equivalent). Note that I am not sure that 
SoftMaxHeapSize is something absolutely necessary in the context of AHS, 
but may be a tool.

* the same background functionality as for serial: implement some 
mechanism to control the heap size based on the decisions of AHS; i.e. 
start collections to get to heap target, uncommit stuff/enqueue for 
uncommit etc.

Currently G1 only resizes the heap during Remark and Full GC which is 
too limiting to follow current "memory pressure". Maybe use/update 
Soft/CurrentMaxHeapSize as needed so that GC compacts the heap first; 
this may either be in the form of JDK-8238687 which uncommits at every 
gc, which is probably still too limiting for an AHS system.

Probably other issues will crop up along the way.

* do lots of testing to weed out corner cases and hopefully not regress 
too much from current performance

Hth,
   Thomas


From rcastanedalo at openjdk.org  Thu Sep 26 09:07:56 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 26 Sep 2024 09:07:56 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
Message-ID: <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>

On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5692:

> 5690: 
> 5691: void MacroAssembler::load_klass(Register dst, Register src, Register tmp) {
> 5692:   BLOCK_COMMENT("load_klass");

I am not sure that the complexity of `MacroAssembler::load_klass` and the two `MacroAssembler::cmp_klass` functions warrant adding block comments, but if you prefer to leave them in, could you use opening and closing comments, as in the other functions in this file (e.g. `MacroAssembler::_verify_oop`)? In that case, please update the comment in the two `MacroAssembler::cmp_klass` functions with a more descriptive name than `cmp_klass 1` and `cmp_klass 2`.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5726:

> 5724: #ifdef _LP64
> 5725:   if (UseCompactObjectHeaders) {
> 5726:     load_nklass_compact(tmp, obj);

Suggestion: assert here that `tmp != noreg`, just like in `MacroAssembler::cmp_klass(Register src, Register dst, Register tmp1, Register tmp2)` below. Perhaps also assert that the input registers are different.
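
As a rough illustration of the checks I have in mind (toy types only; `ToyRegister`, `toy_noreg` and `cmp_klass_args_ok` are stand-ins, not HotSpot's `Register`, `noreg` or any existing function):

```cpp
// Stand-ins for HotSpot's Register and noreg; not the real definitions.
struct ToyRegister {
  int encoding;
};
constexpr ToyRegister toy_noreg{-1};

constexpr bool same(ToyRegister a, ToyRegister b) {
  return a.encoding == b.encoding;
}

// True when tmp is a real register and all three registers are distinct,
// i.e. the precondition the assertions would enforce.
constexpr bool cmp_klass_args_ok(ToyRegister klass, ToyRegister obj, ToyRegister tmp) {
  return !same(tmp, toy_noreg) &&
         !same(klass, obj) && !same(klass, tmp) && !same(obj, tmp);
}
```

In the real code the distinctness part would presumably be a call to `assert_different_registers` rather than hand-written comparisons.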

src/hotspot/cpu/x86/macroAssembler_x86.hpp line 379:

> 377:   // Uses tmp1 and tmp2 as temporary registers.
> 378:   void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2);
> 379: 

The naming of these two functions could be made clearer and more consistent with their documentation. Please consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar. The notion of "source" and "destination" in the parameter names is unclear; I suggest just calling them `obj`, `obj1`, `obj2` etc.
Please also make sure that the parameter names are consistent in the declaration and definition (e.g. `dst` vs `obj`).

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:

> 4006: #ifdef COMPILER2
> 4007:   if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) {
> 4008:     generate_string_indexof(StubRoutines::_string_indexof_array);

This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there an RFE for this task?

src/hotspot/share/opto/memnode.cpp line 1976:

> 1974:       // The field is Klass::_prototype_header.  Return its (constant) value.
> 1975:       assert(this->Opcode() == Op_LoadX, "must load a proper type from _prototype_header");
> 1976:       return TypeX::make(klass->prototype_header());

This code is dead, because by the time we call `load_array_final_field` from `LoadNode::Value` (its only caller) we know that if `UseCompactObjectHeaders`, then `tkls->offset() != in_bytes(Klass::prototype_header_offset())` (or else we would have returned from line 2161). Please remove it, or replace it with an assertion if you prefer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776676785
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776628929
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776644021
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776663594
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776621766

From stefank at openjdk.org  Thu Sep 26 09:14:34 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 26 Sep 2024 09:14:34 GMT
Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path
In-Reply-To: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>
References: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>
Message-ID: <pKwxwabFlmJQuP9Z_UIf55oinyfLbYVhZfSnCVud5Xw=.a8669679-dafe-4576-a7db-7c4ba013448b@github.com>

On Wed, 25 Sep 2024 20:05:17 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change to move defragmentation of small pages out of the allocation path.
> 
> **Summary**
> In ZGC small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses. Sometimes when memory usage is high or the distribution of pages in the cache demand it, small pages can be split out from medium or large pages. When this happens the small page is defragmented by remapping it to a lower address if it resides at a too high address. This is done to avoid fragmentation, but doing it in the allocation path comes with a cost since the remapping involves system calls.
> 
> This change moves the remapping away from the allocation path to when the pages are freed after being garbage collected. This can increase the fragmentation a bit, since small pages are allowed to reside at higher addresses for a period of time. The expectation is that since small pages are frequently collected the period should be short enough to not cause any problems.
> 
> I've done detailed experiments with temporary JFR events to look at the cost of doing the defrag/remapping of pages. One problem is that by default we don't get a lot of defrags (unless we run out of memory), and to better investigate this I also altered the allocation path to force more defragmentation. These tests showed that moving defrag out of the allocation path not only has the positive effect of reducing the time spent allocating, but also spaces out the defragmentation a bit more. The reason for this is that now we will defrag when a page is freed by the GC, instead of when the page is allocated, and the tests show that often many threads allocate at the same time (and then also defrag at the same time). This in turn leads to many concurrent calls down to the system to remap the memory, and this increases the cost of the actual mapping (seen by adding JFR events).
> 
> **Additional testing**
> 
> - Functional testing in mach5 tier1-7
> - Sanity performance testing in aurora

Thanks for fixing this!

I would like to suggest the following style changes:
https://github.com/openjdk/jdk/commit/996688ae541d9fc9f88268f1d090af409c5ee65a
https://github.com/openjdk/jdk/compare/master...stefank:jdk:pull/21191

My main motivation for the suggestions is to get rid of the added if / else block in the `free_page[s]` functions. Adding it leads to code duplication, non-const initialization of the local variable, and a disproportionate number of lines compared to the rest of the code, all of which hurts readability, IMHO.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21191#issuecomment-2376393758

From rcastanedalo at openjdk.org  Thu Sep 26 09:54:57 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 26 Sep 2024 09:54:57 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
Message-ID: <4sBfv1qLQjGZnrCuHBPuWp1PNkIDFLBjxMo3z_RR0Mo=.38e699ce-30bc-42fe-86b6-988df6700c82@github.com>

On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/cpu/x86/x86_64.ad line 4388:

> 4386:   effect(KILL cr);
> 4387:   ins_cost(125); // XXX
> 4388:   format %{ "movl    $dst, $mem\t# compressed klass ptr" %}

For consistency with the aarch64 back-end:
Suggestion:

  format %{ "load_nklass_compact    $dst, $mem\t# compressed klass ptr" %}

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776747538

From rkennke at openjdk.org  Thu Sep 26 11:41:53 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 26 Sep 2024 11:41:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
Message-ID: <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>

On Thu, 26 Sep 2024 08:55:44 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Allow LM_MONITOR on 32-bit platforms
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:
> 
>> 4006: #ifdef COMPILER2
>> 4007:   if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) {
>> 4008:     generate_string_indexof(StubRoutines::_string_indexof_array);
> 
> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there an RFE for this task?

This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0

If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776888460

From shade at openjdk.org  Thu Sep 26 11:43:06 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 26 Sep 2024 11:43:06 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage
Message-ID: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>

When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it.

Additional testing:
 - [x] OopStorageSetTest still passing
 - [x] Verified the check is now passing in similar debugging session

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/21204/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21204&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341015
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21204.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21204/head:pull/21204

PR: https://git.openjdk.org/jdk/pull/21204

From rcastanedalo at openjdk.org  Thu Sep 26 12:16:52 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 26 Sep 2024 12:16:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
Message-ID: <W4Z02hPO5V8J_XtvI4eXZDw3P3Kt5i4ULGHOSVpC5xk=.4031d913-80b2-4ca2-951e-e8d0e4fc03d4@github.com>

On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2570:

> 2568:   // we get the heapBase in obj, and the narrowOop+klass_offset_in_bytes/sizeof(narrowOop) in index.
> 2569:   // When that happens, we need to lea the address into a single register, and subtract the
> 2570:   // klass_offset_in_bytes, to get the address of the mark-word.

Parts of this comment are obsolete after commit 2c4a7877, please update the comment.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 882:

> 880:   void store_klass(Register dst, Register src);
> 881:   void cmp_klass(Register oop, Register trial_klass, Register tmp);
> 882:   void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2);

Same suggestion as for the analogous x86 functions: consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar, and the `src` and `dst` parameters to `oop1` and `oop2` or similar if there is no notion of "source" and "destination".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776927247
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776942226

From kbarrett at openjdk.org  Thu Sep 26 12:19:35 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 26 Sep 2024 12:19:35 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage
In-Reply-To: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
Message-ID: <vqpXbgEcN9cy_9u4aQ5DsrutKp0JHqb3omkQDPNJbQo=.0cca6748-802f-4681-80c8-c410d564cac2@github.com>

On Thu, 26 Sep 2024 11:36:26 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it.
> 
> Additional testing:
>  - [x] OopStorageSetTest still passing
>  - [x] Verified the check is now passing in similar debugging session

Looks good.

src/hotspot/share/gc/shared/oopStorageSet.cpp line 89:

> 87:     const void* aligned_addr = align_down(addr, alignof(oop));
> 88:     for (OopStorage* storage : Range<Id>()) {
> 89:       if (storage != nullptr && storage->print_containing((oop*) aligned_addr, st)) {

Add a comment?  Something like "Might get here while handling error before storage initialization."

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21204#pullrequestreview-2331047091
PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1776951002

From shade at openjdk.org  Thu Sep 26 12:47:47 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 26 Sep 2024 12:47:47 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
Message-ID: <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>

> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it.
> 
> Additional testing:
>  - [x] OopStorageSetTest still passing
>  - [x] Verified the check is now passing in similar debugging session

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Touchups

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21204/files
  - new: https://git.openjdk.org/jdk/pull/21204/files/c2e276c6..73b21b46

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21204&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21204&range=00-01

  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21204.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21204/head:pull/21204

PR: https://git.openjdk.org/jdk/pull/21204

From shade at openjdk.org  Thu Sep 26 12:47:48 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 26 Sep 2024 12:47:48 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <vqpXbgEcN9cy_9u4aQ5DsrutKp0JHqb3omkQDPNJbQo=.0cca6748-802f-4681-80c8-c410d564cac2@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
 <vqpXbgEcN9cy_9u4aQ5DsrutKp0JHqb3omkQDPNJbQo=.0cca6748-802f-4681-80c8-c410d564cac2@github.com>
Message-ID: <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com>

On Thu, 26 Sep 2024 12:16:27 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Touchups
>
> src/hotspot/share/gc/shared/oopStorageSet.cpp line 89:
> 
>> 87:     const void* aligned_addr = align_down(addr, alignof(oop));
>> 88:     for (OopStorage* storage : Range<Id>()) {
>> 89:       if (storage != nullptr && storage->print_containing((oop*) aligned_addr, st)) {
> 
> Add a comment?  Something like "Might get here while handling error before storage initialization."

Sure thing, see new commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1776997613

From tschatzl at openjdk.org  Thu Sep 26 12:55:35 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 26 Sep 2024 12:55:35 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
 <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>
Message-ID: <i4GkmmofJcPE9-iu357UXbg9pyGdOtAfZoMQTTHqEsE=.36147f52-b5eb-4627-8aa1-4d86947612de@github.com>

On Thu, 26 Sep 2024 12:47:47 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it.
>> 
>> Additional testing:
>>  - [x] OopStorageSetTest still passing
>>  - [x] Verified the check is now passing in similar debugging session
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Touchups

lgtm, maybe it's worth explicitly printing an "uninitialized" message

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21204#pullrequestreview-2331150552

From tschatzl at openjdk.org  Thu Sep 26 12:55:35 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 26 Sep 2024 12:55:35 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
 <vqpXbgEcN9cy_9u4aQ5DsrutKp0JHqb3omkQDPNJbQo=.0cca6748-802f-4681-80c8-c410d564cac2@github.com>
 <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com>
Message-ID: <3suLikiPJ0bF8rh7_2xpcdrlHcQlDw0iuaN-gS1rxWs=.4c850872-1a6f-490b-b340-808a8521bf4a@github.com>

On Thu, 26 Sep 2024 12:43:40 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> src/hotspot/share/gc/shared/oopStorageSet.cpp line 89:
>> 
>>> 87:     const void* aligned_addr = align_down(addr, alignof(oop));
>>> 88:     for (OopStorage* storage : Range<Id>()) {
>>> 89:       if (storage != nullptr && storage->print_containing((oop*) aligned_addr, st)) {
>> 
>> Add a comment?  Something like "Might get here while handling error before storage initialization."
>
> Sure thing, see new commit.

Another maybe preferable option could be printing "uninitialized" or something.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1777011233

From rcastanedalo at openjdk.org  Thu Sep 26 13:07:55 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 26 Sep 2024 13:07:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
Message-ID: <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>

On Thu, 26 Sep 2024 11:39:02 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:
>> 
>>> 4006: #ifdef COMPILER2
>>> 4007:   if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) {
>>> 4008:     generate_string_indexof(StubRoutines::_string_indexof_array);
>> 
>> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task?
>
> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0
> 
> If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections.

I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777033220

From kbarrett at openjdk.org  Thu Sep 26 13:13:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 26 Sep 2024 13:13:37 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
 <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>
Message-ID: <asWkJ6UiJs7Z3kN91m_ZB-fdgc9tQ7xJcR7gHiwzabQ=.720073e3-3a57-4cb6-8ca4-d5ba6988ad4c@github.com>

On Thu, 26 Sep 2024 12:47:47 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it.
>> 
>> Additional testing:
>>  - [x] OopStorageSetTest still passing
>>  - [x] Verified the check is now passing in similar debugging session
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Touchups

Still looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21204#pullrequestreview-2331218638

From kbarrett at openjdk.org  Thu Sep 26 13:13:38 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 26 Sep 2024 13:13:38 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <3suLikiPJ0bF8rh7_2xpcdrlHcQlDw0iuaN-gS1rxWs=.4c850872-1a6f-490b-b340-808a8521bf4a@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
 <vqpXbgEcN9cy_9u4aQ5DsrutKp0JHqb3omkQDPNJbQo=.0cca6748-802f-4681-80c8-c410d564cac2@github.com>
 <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com>
 <3suLikiPJ0bF8rh7_2xpcdrlHcQlDw0iuaN-gS1rxWs=.4c850872-1a6f-490b-b340-808a8521bf4a@github.com>
Message-ID: <K6DuwxXWlSxPYA-fvVoT2UcEzOZq8r60wjcxebFi5WY=.6e77e35d-f373-40a4-b4e6-c30c0813f62a@github.com>

On Thu, 26 Sep 2024 12:52:51 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Sure thing, see new commit.
>
> Another maybe preferable option could be printing "uninitialized" or something.

@tschatzl I don't think printing "uninitialized" or anything like that is really appropriate here.  What this code
is doing is printing something if-and-only-if the pointer is found to be in an oopstorage block.  There aren't
any of those if there's no oopstorage yet.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1777052593

From rkennke at openjdk.org  Thu Sep 26 14:00:58 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 26 Sep 2024 14:00:58 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
Message-ID: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>

On Thu, 26 Sep 2024 13:04:57 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0
>> 
>> If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections.
>
> I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634.

Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1

Does this look correct to you? Or better to do it as a follow-up?
(It passes a couple of indexOf tests, will run tier1-4 on it).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777134871

From rkennke at openjdk.org  Thu Sep 26 14:04:43 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 26 Sep 2024 14:04:43 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v27]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <vXY57OOQjLAnLf9faNtb5Zr_geVFfS2x_BURnooT9iU=.e7c87d03-e6c3-42b6-87ff-78f116bb5dcf@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - @robcasloz review comments
 - Improve CollectedHeap::is_oop()

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/4904d433..d48f55d6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25-26

  Stats: 86 lines in 10 files changed: 20 ins; 21 del; 45 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From shade at openjdk.org  Thu Sep 26 14:37:35 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 26 Sep 2024 14:37:35 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <i4GkmmofJcPE9-iu357UXbg9pyGdOtAfZoMQTTHqEsE=.36147f52-b5eb-4627-8aa1-4d86947612de@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
 <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>
 <i4GkmmofJcPE9-iu357UXbg9pyGdOtAfZoMQTTHqEsE=.36147f52-b5eb-4627-8aa1-4d86947612de@github.com>
Message-ID: <htIPwdArq7UlCPRcq2j8-2uT0rL222YijL_3tPyKnA8=.2b3a4a82-47ea-4b4b-84a8-1cc93fdfa626@github.com>

On Thu, 26 Sep 2024 12:53:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> lgtm, maybe it's worth explicitly printing an "uninitialized" message

A normal thing to do in these printers is to silently return, letting other printers handle the location. If OopStorage does not recognize the pointer, the downstream NMT and generic SafeFetch code would try to look it up.
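
The "silently return and let the next printer try" pattern can be sketched roughly as follows. This is a minimal, self-contained illustration and not the HotSpot code: `FakeStorage`, `print_storage_location`, and `describe_location` are hypothetical names, and addresses are modelled as plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an OopStorage; not the HotSpot class.
struct FakeStorage {
  std::string name;
  std::uintptr_t begin;  // inclusive
  std::uintptr_t end;    // exclusive

  // Prints and returns true only if addr lies inside this storage.
  bool print_containing(std::uintptr_t addr, std::ostream& st) const {
    if (addr < begin || addr >= end) return false;
    st << "in storage " << name;
    return true;
  }
};

// Same shape as the reviewed check: slots that are still null (storage not
// yet initialized) are skipped, and nothing is printed for them.
bool print_storage_location(const std::vector<const FakeStorage*>& storages,
                            std::uintptr_t addr, std::ostream& st) {
  for (const FakeStorage* storage : storages) {
    if (storage != nullptr && storage->print_containing(addr, st)) {
      return true;
    }
  }
  return false;  // silent: the caller moves on to the next printer
}

// Hypothetical top-level decoder with a single fallback printer.
std::string describe_location(const std::vector<const FakeStorage*>& storages,
                              std::uintptr_t addr) {
  std::ostringstream st;
  if (!print_storage_location(storages, addr, st)) {
    st << "unknown location";
  }
  return st.str();
}
```

With every slot still null, the storage printer emits nothing and the fallback handles the address, which is the behaviour argued for above.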

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21204#issuecomment-2377156417

From rcastanedalo at openjdk.org  Thu Sep 26 16:02:55 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 26 Sep 2024 16:02:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
 <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
Message-ID: <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>

On Thu, 26 Sep 2024 13:58:02 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> Does this look correct to you? Or better to do it as a follow-up?

I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777370316

From rkennke at openjdk.org  Thu Sep 26 16:18:58 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 26 Sep 2024 16:18:58 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
 <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
 <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>
Message-ID: <WOlTJMNHCr_Kk69sEIUwCVI3ZbiRlckY4TSBRfL7zoA=.1a4c59f1-a000-4d9f-abbc-36b24ed5105e@github.com>

On Thu, 26 Sep 2024 15:59:50 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1
>> 
>> Does this look correct to you? Or better to do it as a follow-up?
>> (It passes a couple of indexOf tests, will run tier1-4 on it).
>
>> Does this look correct to you? Or better to do it as a follow-up?
> 
> I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement.

@sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers.
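
To make the header-size assumption concrete, here is a rough arithmetic sketch. It assumes (this is not stated in the thread) that the stub may issue a fixed-width load that ends at the end of a short array payload, so the load can begin inside, or even before, the array header. The function name and the sizes in the usage note are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, not HotSpot code: a load of chunk_bytes that ends at
// the end of the payload starts (chunk_bytes - payload_bytes) bytes before
// the payload. Returns how many bytes before the start of the object the
// load begins, or 0 if the load stays inside the object.
std::size_t bytes_read_before_object(std::size_t header_bytes,
                                     std::size_t payload_bytes,
                                     std::size_t chunk_bytes) {
  assert(chunk_bytes > 0);
  if (payload_bytes >= chunk_bytes) return 0;  // load fits in the payload
  std::size_t into_header = chunk_bytes - payload_bytes;
  return into_header > header_bytes ? into_header - header_bytes : 0;
}
```

Under this model, a 16-byte load on a 1-byte payload stays inside the object when the header is 16 bytes, but starts 3 bytes before it when the header is 12 bytes (an example value for a compact array header). Shrinking the load to 8 bytes keeps it inside the object in both cases, which would be consistent with the proposed fix of copying only 8 bytes.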

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777396409

From sgibbons at openjdk.org  Thu Sep 26 17:27:50 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Thu, 26 Sep 2024 17:27:50 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <WOlTJMNHCr_Kk69sEIUwCVI3ZbiRlckY4TSBRfL7zoA=.1a4c59f1-a000-4d9f-abbc-36b24ed5105e@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
 <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
 <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>
 <WOlTJMNHCr_Kk69sEIUwCVI3ZbiRlckY4TSBRfL7zoA=.1a4c59f1-a000-4d9f-abbc-36b24ed5105e@github.com>
Message-ID: <UjwpWz9Y3W8WRzYHZmDk9EvoiLSCZG1aai5oASGtDKA=.68cab6dd-e67d-4651-a9c9-75dccf42d85f@github.com>

On Thu, 26 Sep 2024 16:15:39 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>>> Does this look correct to you? Or better to do it as a follow-up?
>> 
>> I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement.
>
> @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers.

@rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me.  I would prefer this approach instead of not generating the IndexOf intrinsic.

Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`?  I can see benefits to either - which provides more clarity?  I like the assert as it makes the intention clear (thanks!).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777485078

From xpeng at openjdk.org  Thu Sep 26 17:42:35 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 26 Sep 2024 17:42:35 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <Xf89-L8wLAMbngt5PxoN3B0sX94chZQz5zjqfuiBYZM=.6c6387e8-572a-4c47-8de2-d4f407924c88@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
 <p1_dfGVCQ8Hswy1n4E2HPW22LVPjwPVba5cecSQxrPg=.22aeb487-488f-46e7-a86a-be1f4a6baedd@github.com>
 <Xf89-L8wLAMbngt5PxoN3B0sX94chZQz5zjqfuiBYZM=.6c6387e8-572a-4c47-8de2-d4f407924c88@github.com>
Message-ID: <dBqq5vX5rTQzA5tfApAmif_B_rXqVs2kuzq5_Q1hpIk=.a664e2b4-0476-44ae-a7db-34eb1b65f5ec@github.com>

On Sat, 21 Sep 2024 05:52:10 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> > > I am good with this, assuming performance runs show good results.
> > 
> > 
> > Latency wise, in most time it is better than old impl.
> 
> It is great it improves targeted tests, and it makes sense from the first principles. Run our usual performance pipeline to sanity check if this affects any other benchmarks in any meaningful way.

The performance pipeline showed improvements in most DaCapo benchmarks. We did find a very small regression in DaCapo Spring max latency (<1%?). We tried to reproduce it on a bare-metal instance but could not reproduce it reliably: results were sometimes better and sometimes worse, so it could just be noise.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2377567597

From kdnilsen at openjdk.org  Thu Sep 26 17:57:36 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 26 Sep 2024 17:57:36 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
Message-ID: <S0JZpd5zEqYPmQBxIgdAcKqqqHID9Bgm8TP_38lEUWQ=.0bff936d-6b39-4b54-bc6a-c33193c184b8@github.com>

On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to the long tail latency (> 10ms). When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim the budget and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).
>> 
>> The change in this PR makes ShenandoahPacer impact long tail latency much less. Instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
>> 
>> Here is the latency comparison for the optimization:
>> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
>> 
>> With the optimization, long tail latency from the test code below has been much improved from over 20ms to ~10ms on MacOS with M3 chip:
>> 
>>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>>     static final LongAdder totalCount = new LongAdder();
>>     static volatile byte[] sink;
>>     public static void main(String[] args) {
>>         runAllocationTest(100000);
>>     }
>>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>>         long startTime = System.nanoTime();
>>         sink = new byte[dataSize];
>>         long endTime = System.nanoTime();
>>         histogram.recordValue(endTime - startTime);
>>     }
>> 
>>     static void runAllocationTest(final int dataSize) {
>>         final long endTime = System.currentTimeMillis() + 30_000;
>>         final CountDownLatch startSignal = new CountDownLatch(1);
>>         final CountDownLatch finished = new CountDownLatch(threadCount);
>>         final Thread[] threads = new Thread[threadCount];
>>         final Histogram[] histograms = new Histogram[threadCount];
>>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>>         for (int i = 0; i < threadCount; i++) {
>>             final var histogram = new Histogram(3600000000000L, 3);
>>             histograms[i] = histogram;
>>             threads[i] = new Thread(() -> {
>>                 wait(startSignal);
>>                 do {
>>                     recordTimeToAllocate(dataS...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   clean up
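
The wait-then-retry scheme described in the quoted PR text (1ms steps, 10ms total) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the ShenandoahPacer implementation: `claim_with_bounded_wait`, `ClaimResult`, and both callbacks are hypothetical, and what the real pacer does once the time limit is reached is outside the sketch.

```cpp
#include <cassert>
#include <functional>

// Outcome of a paced claim attempt (hypothetical type).
struct ClaimResult {
  bool claimed;   // true if the budget was claimed
  int waited_ms;  // total time spent waiting before returning
};

// try_claim: returns true when the budget was claimed (hypothetical callback).
// wait_ms:   blocks for the given number of milliseconds (hypothetical callback).
ClaimResult claim_with_bounded_wait(const std::function<bool()>& try_claim,
                                    const std::function<void(int)>& wait_ms,
                                    int step_ms = 1, int max_total_ms = 10) {
  assert(step_ms > 0 && max_total_ms >= 0);
  if (try_claim()) return {true, 0};  // fast path: no waiting needed
  int waited = 0;
  while (waited < max_total_ms) {
    wait_ms(step_ms);                 // wait one short step ...
    waited += step_ms;
    if (try_claim()) return {true, waited};  // ... then retry the claim
  }
  return {false, waited};             // time limit reached without a claim
}
```

Passing the wait as a callback keeps the sketch testable without real sleeping; a caller could supply a function that sleeps on the current thread instead.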

Marked as reviewed by kdnilsen (Author).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2332001130

From xpeng at openjdk.org  Thu Sep 26 18:57:35 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 26 Sep 2024 18:57:35 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <Xf89-L8wLAMbngt5PxoN3B0sX94chZQz5zjqfuiBYZM=.6c6387e8-572a-4c47-8de2-d4f407924c88@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com>
 <p1_dfGVCQ8Hswy1n4E2HPW22LVPjwPVba5cecSQxrPg=.22aeb487-488f-46e7-a86a-be1f4a6baedd@github.com>
 <Xf89-L8wLAMbngt5PxoN3B0sX94chZQz5zjqfuiBYZM=.6c6387e8-572a-4c47-8de2-d4f407924c88@github.com>
Message-ID: <VhfOzl_EgVdGxX46fFI_-aQnfHkV8NUL85j29o_VlH0=.e6f9a5a3-e457-4fc6-8f34-e7c96e2e8166@github.com>

On Sat, 21 Sep 2024 05:52:10 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>>> I am good with this, assuming performance runs show good results.
>> 
>> Latency-wise, it is better than the old implementation most of the time.
>> 
>> In my specific test with an 8G heap on macOS, throughput is very close to that of the test with ShenandoahPacing disabled, and about 25%~30% better than with the old implementation.
>
>> > I am good with this, assuming performance runs show good results.
>> 
>> Latency-wise, it is better than the old implementation most of the time.
> 
> It is great it improves targeted tests, and it makes sense from the first principles. Run our usual performance pipeline to sanity check if this affects any other benchmarks in any meaningful way.

@shipilev Could you please review this again? I pushed a minor refactoring and formatting change as per your comments.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2377706480

From shade at openjdk.org  Fri Sep 27 07:43:42 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 27 Sep 2024 07:43:42 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
Message-ID: <MQtbCrTMWUyadYeOA5725L_r1i5cGLZSeq4l71KgTeQ=.da2515ef-250f-4960-8b1c-f635d66ceab9@github.com>

On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed significantly to long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim the budget and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).
>> 
>> The change in this PR makes ShenandoahPacer impact long-tail latency much less. Instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
>> 
>> Here is the latency comparison for the optimization:
>> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
>> 
>> With the optimization, the long-tail latency of the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:
>> 
>>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>>     static final LongAdder totalCount = new LongAdder();
>>     static volatile byte[] sink;
>>     public static void main(String[] args) {
>>         runAllocationTest(100000);
>>     }
>>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>>         long startTime = System.nanoTime();
>>         sink = new byte[dataSize];
>>         long endTime = System.nanoTime();
>>         histogram.recordValue(endTime - startTime);
>>     }
>> 
>>     static void runAllocationTest(final int dataSize) {
>>         final long endTime = System.currentTimeMillis() + 30_000;
>>         final CountDownLatch startSignal = new CountDownLatch(1);
>>         final CountDownLatch finished = new CountDownLatch(threadCount);
>>         final Thread[] threads = new Thread[threadCount];
>>         final Histogram[] histograms = new Histogram[threadCount];
>>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>>         for (int i = 0; i < threadCount; i++) {
>>             final var histogram = new Histogram(3600000000000L, 3);
>>             histograms[i] = histogram;
>>             threads[i] = new Thread(() -> {
>>                 wait(startSignal);
>>                 do {
>>                     recordTimeToAllocate(dataS...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   clean up

Marked as reviewed by shade (Reviewer).
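
For readers following the thread, the retry strategy described in the quoted PR summary (wait 1ms, try to claim, repeat until success or 10ms in total) can be sketched roughly as below. This is a toy Java model, not the HotSpot C++ code: the class `PacerSketch` and the methods `tryClaim`, `replenish`, and `paceForAlloc` are hypothetical names, and what the real pacer does once the 10ms limit is reached is not shown in the text, so the sketch simply reports failure.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of a pacing budget. Not HotSpot code; all names are hypothetical. */
public class PacerSketch {
    private final AtomicLong budget;

    public PacerSketch(long initialBudget) {
        this.budget = new AtomicLong(initialBudget);
    }

    /** Fast path: claim only if enough budget remains (CAS retry loop). */
    public boolean tryClaim(long words) {
        long cur;
        do {
            cur = budget.get();
            if (cur < words) {
                return false; // not enough budget, caller must take the slow path
            }
        } while (!budget.compareAndSet(cur, cur - words));
        return true;
    }

    /** Stand-in for the GC side handing budget back to mutators. */
    public void replenish(long words) {
        budget.addAndGet(words);
    }

    public long remaining() {
        return budget.get();
    }

    /**
     * Slow path as described in the PR summary: sleep in 1ms steps, retry the
     * claim after each step, and stop on success or once maxWaitMs has elapsed.
     * Returns true if the claim succeeded within the limit.
     */
    public boolean paceForAlloc(long words, long maxWaitMs) {
        if (tryClaim(words)) {
            return true;
        }
        final long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
            if (tryClaim(words)) {
                return true;
            }
        }
        return false; // assumption: the sketch gives up here
    }
}
```

The point of the shape is that a thread which fails the fast path no longer commits to a full wait up front; it can leave the loop as soon as budget becomes available.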

-------------

PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2333007889

From axel.boldt-christmas at oracle.com  Fri Sep 27 08:02:33 2024
From: axel.boldt-christmas at oracle.com (Axel Boldt-Christmas)
Date: Fri, 27 Sep 2024 08:02:33 +0000
Subject: RFC: ZGC: Remove Non-Generational Mode
Message-ID: <9010A225-7333-48D1-A17F-A21085175D7A@oracle.com>

Hi,

I have written a draft JEP for removing the non-generational mode of ZGC. The JEP description is available in JBS:

https://bugs.openjdk.org/browse/JDK-8335850

Comments and feedback are welcome.

// Axel Boldt-Christmas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20240927/a3d69334/attachment.htm>

From rkennke at openjdk.org  Fri Sep 27 08:27:57 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 27 Sep 2024 08:27:57 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <UjwpWz9Y3W8WRzYHZmDk9EvoiLSCZG1aai5oASGtDKA=.68cab6dd-e67d-4651-a9c9-75dccf42d85f@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
 <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
 <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>
 <WOlTJMNHCr_Kk69sEIUwCVI3ZbiRlckY4TSBRfL7zoA=.1a4c59f1-a000-4d9f-abbc-36b24ed5105e@github.com>
 <UjwpWz9Y3W8WRzYHZmDk9EvoiLSCZG1aai5oASGtDKA=.68cab6dd-e67d-4651-a9c9-75dccf42d85f@github.com>
Message-ID: <l_7f8gGpIdG9H3Ini5vHWruIa0oDi7fkn2Lz5thDI5c=.969b804f-bae2-48e1-9c44-e631d90037e7@github.com>

On Thu, 26 Sep 2024 17:25:06 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers.
>
> @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me.  I would prefer this approach instead of not generating the IndexOf intrinsic.
> 
> Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`?  I can see benefits to either - which provides more clarity?  I like the assert as it makes the intention clear (thanks!).

I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders)` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point.

I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778230714

From sjohanss at openjdk.org  Fri Sep 27 08:34:19 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Fri, 27 Sep 2024 08:34:19 GMT
Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path [v2]
In-Reply-To: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>
References: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>
Message-ID: <RkDIWQFzEdyHWQ1xpR07f6DA9_SxxkSAAfvk9Ch30TM=.e89d0c5d-833f-405c-9303-e50a380e7cb7@github.com>

> Please review this change to move defragmentation of small pages out of the allocation path.
> 
> **Summary**
> In ZGC, small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses. Sometimes, when memory usage is high or the distribution of pages in the cache demands it, small pages can be split out from medium or large pages. When this happens, the small page is defragmented by remapping it to a lower address if it resides at too high an address. This is done to avoid fragmentation, but doing it in the allocation path comes with a cost since the remapping involves system calls.
> 
> This change moves the remapping away from the allocation path to when the pages are freed after being garbage collected. This can increase the fragmentation a bit, since small pages are allowed to reside at higher addresses for a period of time. The expectation is that, since small pages are frequently collected, the period should be short enough to not cause any problems.
> 
> I've done detailed experiments with temporary JFR events to look at the cost of doing the defrag/remapping of pages. One problem is that by default we don't get a lot of defrags (unless we run out of memory), so to better investigate this I also altered the allocation path to force more defragmentation. These tests showed that moving defrag out of the allocation path not only has the positive effect of reducing the time spent allocating, but also spaces out the defragmentation a bit more. The reason for this is that we now defrag when a page is freed by the GC instead of when the page is allocated, and the tests show that many threads often allocate at the same time (and then also defrag at the same time). This in turn leads to many concurrent calls down to the system to remap the memory, which increases the cost of the actual mapping (seen by adding JFR events).
> 
> **Additional testing**
> 
> - Functional testing in mach5 tier1-7
> - Sanity performance testing in aurora

Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:

 - Additional changes
 - StefanK review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21191/files
  - new: https://git.openjdk.org/jdk/pull/21191/files/1e64e361..1fe872d4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21191&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21191&range=00-01

  Stats: 36 lines in 2 files changed: 19 ins; 5 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/21191.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21191/head:pull/21191

PR: https://git.openjdk.org/jdk/pull/21191

From sjohanss at openjdk.org  Fri Sep 27 08:34:19 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Fri, 27 Sep 2024 08:34:19 GMT
Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path
In-Reply-To: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>
References: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>
Message-ID: <kLsbBtOaroF1Jipycp3ToRcdafUcSzMFnpQA4c4ZJ2A=.f5e36ac7-4a70-452f-a71b-b35ed101336d@github.com>

On Wed, 25 Sep 2024 20:05:17 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change to move defragmentation of small pages out of the allocation path.
> 
> **Summary**
> In ZGC, small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses. Sometimes, when memory usage is high or the distribution of pages in the cache demands it, small pages can be split out from medium or large pages. When this happens, the small page is defragmented by remapping it to a lower address if it resides at too high an address. This is done to avoid fragmentation, but doing it in the allocation path comes with a cost since the remapping involves system calls.
> 
> This change moves the remapping away from the allocation path to when the pages are freed after being garbage collected. This can increase the fragmentation a bit, since small pages are allowed to reside at higher addresses for a period of time. The expectation is that, since small pages are frequently collected, the period should be short enough to not cause any problems.
> 
> I've done detailed experiments with temporary JFR events to look at the cost of doing the defrag/remapping of pages. One problem is that by default we don't get a lot of defrags (unless we run out of memory), so to better investigate this I also altered the allocation path to force more defragmentation. These tests showed that moving defrag out of the allocation path not only has the positive effect of reducing the time spent allocating, but also spaces out the defragmentation a bit more. The reason for this is that we now defrag when a page is freed by the GC instead of when the page is allocated, and the tests show that many threads often allocate at the same time (and then also defrag at the same time). This in turn leads to many concurrent calls down to the system to remap the memory, which increases the cost of the actual mapping (seen by adding JFR events).
> 
> **Additional testing**
> 
> - Functional testing in mach5 tier1-7
> - Sanity performance testing in aurora

StefanK and I discussed his proposal and made some additional changes to naming and structure.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21191#issuecomment-2378741094

From aboldtch at openjdk.org  Fri Sep 27 09:24:35 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 27 Sep 2024 09:24:35 GMT
Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path [v2]
In-Reply-To: <RkDIWQFzEdyHWQ1xpR07f6DA9_SxxkSAAfvk9Ch30TM=.e89d0c5d-833f-405c-9303-e50a380e7cb7@github.com>
References: <iOWgd4xauN7tO8BWxLOov7sev7p1xi6_XCneWtFU-Ew=.b5afdc71-aae4-419f-8fd4-86eed815daf4@github.com>
 <RkDIWQFzEdyHWQ1xpR07f6DA9_SxxkSAAfvk9Ch30TM=.e89d0c5d-833f-405c-9303-e50a380e7cb7@github.com>
Message-ID: <L1Ws4z9lcj9xbMiZA0l__J-a8KeLD-W7iIC1jVr-7cU=.4e5b3f3c-239d-4397-8018-8efe87443ef1@github.com>

On Fri, 27 Sep 2024 08:34:19 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> Please review this change to move defragmentation of small pages out of the allocation path.
>> 
>> **Summary**
>> In ZGC, small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses. Sometimes, when memory usage is high or the distribution of pages in the cache demands it, small pages can be split out from medium or large pages. When this happens, the small page is defragmented by remapping it to a lower address if it resides at too high an address. This is done to avoid fragmentation, but doing it in the allocation path comes with a cost since the remapping involves system calls.
>> 
>> This change moves the remapping away from the allocation path to when the pages are freed after being garbage collected. This can increase the fragmentation a bit, since small pages are allowed to reside at higher addresses for a period of time. The expectation is that, since small pages are frequently collected, the period should be short enough to not cause any problems.
>> 
>> I've done detailed experiments with temporary JFR events to look at the cost of doing the defrag/remapping of pages. One problem is that by default we don't get a lot of defrags (unless we run out of memory), so to better investigate this I also altered the allocation path to force more defragmentation. These tests showed that moving defrag out of the allocation path not only has the positive effect of reducing the time spent allocating, but also spaces out the defragmentation a bit more. The reason for this is that we now defrag when a page is freed by the GC instead of when the page is allocated, and the tests show that many threads often allocate at the same time (and then also defrag at the same time). This in turn leads to many concurrent calls down to the system to remap the memory, which increases the cost of the actual mapping (seen by adding JFR events).
>> 
>> **Additional testing**
>> 
>> - Functional testing in mach5 tier1-7
>> - Sanity performance testing in aurora
>
> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Additional changes
>  - StefanK review

lgtm.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21191#pullrequestreview-2333222238

From rkennke at openjdk.org  Fri Sep 27 09:41:17 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 27 Sep 2024 09:41:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v28]
In-Reply-To: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
Message-ID: <s4Shjqv28fYx8U5UNboZcK2F0dJrwKjJYxRUDn7lFNk=.6dd86afe-2a80-4393-8508-1478aa3de4b4@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Disable TestSplitPacks::test4a, failing on aarch64
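
The bit counts quoted in the PR summary above (compressed Klass* in the upper 22 bits of the mark-word, 40 bits for full-GC forwarding, up to 8TB of heap) can be checked with a little arithmetic. The Java sketch below is illustrative only: it assumes a 64-bit mark-word and assumes that each forwarding value addresses one 8-byte word, which is consistent with the stated 8TB limit but is not spelled out in the text. `CompactHeaderBitsSketch` and its members are hypothetical names, not HotSpot identifiers.

```java
/** Worked arithmetic for the bit counts in the PR summary. Names are hypothetical. */
public class CompactHeaderBitsSketch {
    static final int MARK_WORD_BITS = 64;        // assumption: 64-bit mark-word
    static final int KLASS_BITS = 22;            // "upper 22 bits" per the PR text
    static final int KLASS_SHIFT = MARK_WORD_BITS - KLASS_BITS; // 42
    static final int FORWARDING_BITS = 40;       // per the PR text
    static final int ASSUMED_GRANULE_BYTES = 8;  // assumption: one 8-byte word per unit

    /** Reads the narrow Klass value from the upper 22 bits. */
    static int narrowKlassOf(long markWord) {
        return (int) (markWord >>> KLASS_SHIFT);
    }

    /** Writes a narrow Klass value into the upper 22 bits, keeping the low 42 bits. */
    static long withNarrowKlass(long markWord, int narrowKlass) {
        long lowMask = (1L << KLASS_SHIFT) - 1;
        return (markWord & lowMask) | ((long) narrowKlass << KLASS_SHIFT);
    }

    /** 2^40 encodable units of 8 bytes each gives 2^43 bytes. */
    static long maxEncodableHeapBytes() {
        return (1L << FORWARDING_BITS) * ASSUMED_GRANULE_BYTES;
    }
}
```

Under these assumptions `maxEncodableHeapBytes()` is 2^43 bytes, that is 8 TiB, matching the limit given in the summary.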

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/d48f55d6..059b1573

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=27
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26-27

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From shade at openjdk.org  Fri Sep 27 09:46:41 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 27 Sep 2024 09:46:41 GMT
Subject: RFR: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage [v2]
In-Reply-To: <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
 <gDzWhB5KUiTUP1gUKXN2QSnTh30HJKQGD8p_Kn1Sc4k=.5dc44843-d833-4ee8-8c2f-272e8a014f22@github.com>
Message-ID: <nDdFC3zUzwyiaLjO9_8IKFCS3Kqd3Rg2NuHWF9H_Z2A=.3feccede-bb8f-40fa-aa85-bac40c72c764@github.com>

On Thu, 26 Sep 2024 12:47:47 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, which is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it.
>> 
>> Additional testing:
>>  - [x] OopStorageSetTest still passing
>>  - [x] Verified the check is now passing in similar debugging session
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Touchups

Thanks for reviews, I think this is simple enough to push on Friday.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21204#issuecomment-2378876903

From shade at openjdk.org  Fri Sep 27 09:46:42 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 27 Sep 2024 09:46:42 GMT
Subject: Integrated: 8341015: OopStorage location decoder crashes accessing
 non-initalized OopStorage
In-Reply-To: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
References: <VWg-M-ojhmYz04GJXC3n_m025xdf0Da9Fbdo1d7pBUA=.c186aff0-5c92-44f8-95c0-fa3a41614313@github.com>
Message-ID: <0Bm-oLEAC72_g1jyOfPU8qOYWDk49HD79ZqmFKVGlaQ=.b2038275-9777-49d5-af2e-aeff6696f88e@github.com>

On Thu, 26 Sep 2024 11:36:26 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, which is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it.
> 
> Additional testing:
>  - [x] OopStorageSetTest still passing
>  - [x] Verified the check is now passing in similar debugging session

This pull request has now been integrated.

Changeset: 6587909c
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/6587909c7db6482bda92d314096a2a1795900ffd
Stats:     3 lines in 1 file changed: 2 ins; 0 del; 1 mod

8341015: OopStorage location decoder crashes accessing non-initalized OopStorage

Reviewed-by: kbarrett, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/21204

From rcastanedalo at openjdk.org  Fri Sep 27 14:35:54 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 27 Sep 2024 14:35:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v28]
In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <GQVVR0K-mDh4evKvrKG9eXwwF4j7zAg-8ai4__gWNGE=.3c502820-717b-44b7-b82d-a24dd7fdd9d5@github.com>
 <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
Message-ID: <K3AopS6Hcuvlj38a4botYqOe9BsSbP-SqSBgJOSm8xg=.b9186913-ed75-4c63-a8d0-93df079d3d03@github.com>

On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/opto/machnode.cpp line 390:
>> 
>>> 388:     t = t->make_ptr();
>>> 389:   }
>>> 390:   if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) {
>> 
>> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`.
>
> I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP.

@tstuefe @rkennke what do you think about this suggestion? If there is a known case where `t->isa_narrowklass() && !UseCompressedClassPointers` holds, it should be investigated because it might be a symptom of a larger problem. If there is no such case, I think the explicit `UseCompressedClassPointers` test should be removed to avoid confusion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778724120

From sgibbons at openjdk.org  Fri Sep 27 14:47:51 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 27 Sep 2024 14:47:51 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <l_7f8gGpIdG9H3Ini5vHWruIa0oDi7fkn2Lz5thDI5c=.969b804f-bae2-48e1-9c44-e631d90037e7@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
 <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
 <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>
 <WOlTJMNHCr_Kk69sEIUwCVI3ZbiRlckY4TSBRfL7zoA=.1a4c59f1-a000-4d9f-abbc-36b24ed5105e@github.com>
 <UjwpWz9Y3W8WRzYHZmDk9EvoiLSCZG1aai5oASGtDKA=.68cab6dd-e67d-4651-a9c9-75dccf42d85f@github.com>
 <l_7f8gGpIdG9H3Ini5vHWruIa0oDi7fkn2Lz5thDI5c=.969b804f-bae2-48e1-9c44-e631d90037e7@github.com>
Message-ID: <BtvwMUG5tcy_zU4tRG3Q5VVw08qE_3gSE35wCxlDnj4=.954353e5-cab2-48f4-b0f2-5852e586ba20@github.com>

On Fri, 27 Sep 2024 08:24:50 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me.  I would prefer this approach instead of not generating the IndexOf intrinsic.
>> 
>> Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`?  I can see benefits to either - which provides more clarity?  I like the assert as it makes the intention clear (thanks!).
>
> I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders)` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point.
> 
> I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement.

I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away.  The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block.  So there will not be an additional branch for the code when it is executed.

I'm good with a comment tying `UseCompactObjectHeaders` to the condition.  The comment can be removed when the flag is removed.  "Ship it" :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778739517

From xpeng at openjdk.org  Fri Sep 27 15:07:39 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 27 Sep 2024 15:07:39 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
Message-ID: <u78zqLFbHxXy4xzEn55L21-pQYZ2aMkwYbVH--Kta2E=.eb298ecd-a0f8-4ad9-a72a-e00c448ac70b@github.com>

On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed significantly to long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim the budget and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).
>> 
>> The change in this PR makes ShenandoahPacer impact long-tail latency much less. Instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
>> 
>> Here is the latency comparison for the optimization:
>> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
>> 
>> With the optimization, the long-tail latency of the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:
>> 
>>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>>     static final LongAdder totalCount = new LongAdder();
>>     static volatile byte[] sink;
>>     public static void main(String[] args) {
>>         runAllocationTest(100000);
>>     }
>>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>>         long startTime = System.nanoTime();
>>         sink = new byte[dataSize];
>>         long endTime = System.nanoTime();
>>         histogram.recordValue(endTime - startTime);
>>     }
>> 
>>     static void runAllocationTest(final int dataSize) {
>>         final long endTime = System.currentTimeMillis() + 30_000;
>>         final CountDownLatch startSignal = new CountDownLatch(1);
>>         final CountDownLatch finished = new CountDownLatch(threadCount);
>>         final Thread[] threads = new Thread[threadCount];
>>         final Histogram[] histograms = new Histogram[threadCount];
>>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>>         for (int i = 0; i < threadCount; i++) {
>>             final var histogram = new Histogram(3600000000000L, 3);
>>             histograms[i] = histogram;
>>             threads[i] = new Thread(() -> {
>>                 wait(startSignal);
>>                 do {
>>                     recordTimeToAllocate(dataS...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   clean up

Thanks all for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2379487017

From duke at openjdk.org  Fri Sep 27 15:07:39 2024
From: duke at openjdk.org (duke)
Date: Fri, 27 Sep 2024 15:07:39 GMT
Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]
In-Reply-To: <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
 <U4-6cg1j9inHQdKJDt0mYEl9DFc6nQ9sbdiao4hSH_4=.9e421feb-a9c7-4c77-b745-e307749b052f@github.com>
Message-ID: <u0tOhQutfah_3PKyXlNjU3KaF0rZmgTuitQSSdNJ61E=.00635052-61d8-4798-ae95-ea9a09788ead@github.com>

On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   clean up

@pengxiaolong 
Your change (at version 58196a4f6f9f509525667dba1bd1fb2c2afa3e8e) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2379489972

From rkennke at openjdk.org  Fri Sep 27 16:25:54 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 27 Sep 2024 16:25:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <BtvwMUG5tcy_zU4tRG3Q5VVw08qE_3gSE35wCxlDnj4=.954353e5-cab2-48f4-b0f2-5852e586ba20@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
 <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
 <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>
 <WOlTJMNHCr_Kk69sEIUwCVI3ZbiRlckY4TSBRfL7zoA=.1a4c59f1-a000-4d9f-abbc-36b24ed5105e@github.com>
 <UjwpWz9Y3W8WRzYHZmDk9EvoiLSCZG1aai5oASGtDKA=.68cab6dd-e67d-4651-a9c9-75dccf42d85f@github.com>
 <l_7f8gGpIdG9H3Ini5vHWruIa0oDi7fkn2Lz5thDI5c=.969b804f-bae2-48e1-9c44-e631d90037e7@github.com>
 <BtvwMUG5tcy_zU4tRG3Q5VVw08qE_3gSE35wCxlDnj4=.954353e5-cab2-48f4-b0f2-5852e586ba20@github.com>
Message-ID: <zi8MM6qUWhvxPSyb3ewa3baXWhmDIDlut2uYUQmWvcw=.7b32fa42-7a3f-43bb-8f15-fe6b8e66c49b@github.com>

On Fri, 27 Sep 2024 14:44:35 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point.
>> 
>> I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement.
>
> I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away.  The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block.  So there will not be an additional branch for the code when it is executed.
> 
> I'm good with a comment tying `UseCompactObjectHeaders` to the condition.  The comment can be removed when the flag is removed.  "Ship it" :-)

Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like:


if (haystack_len <= 8) {
  // Copy 8 bytes onto stack
} else if (haystack_len <= 16) {
  // Copy 16 bytes onto stack
} else {
  // Copy 32 bytes onto stack
}


So that is 2 branches in this prologue code instead of originally 1.

However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault.

I think I need to mull over it some more to come up with a correct fix.
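
To make the arithmetic above concrete, here is a small Java model of the tiered copy sizes. It is only a sketch: `HEADER_BYTES = 8` is an assumption taken from the "17+8=25" figure, the class and method names are made up for illustration, and the real intrinsic is generated assembly, not Java.

```java
// Illustrative model only; not HotSpot code.
final class HaystackCopyModel {
    // Assumed number of readable bytes in front of the array payload
    // (taken from the "17+8=25" figure in the message).
    static final int HEADER_BYTES = 8;

    // Bytes the proposed three-way prologue would copy onto the stack.
    static int bytesCopied(int haystackLen) {
        if (haystackLen <= 8) {
            return 8;
        }
        if (haystackLen <= 16) {
            return 16;
        }
        return 32;
    }

    // Bytes that are guaranteed to be readable: payload plus header.
    static int bytesGuaranteed(int haystackLen) {
        return haystackLen + HEADER_BYTES;
    }

    // True when the copy could touch memory outside the guaranteed range.
    static boolean mayOverRead(int haystackLen) {
        return bytesCopied(haystackLen) > bytesGuaranteed(haystackLen);
    }

    public static void main(String[] args) {
        for (int len = 1; len <= 32; len++) {
            if (mayOverRead(len)) {
                System.out.println("haystack_len=" + len
                        + " copies " + bytesCopied(len)
                        + " bytes, only " + bytesGuaranteed(len) + " guaranteed");
            }
        }
    }
}
```

Under these assumptions, the first two tiers never copy more than is guaranteed, and the problem is confined to lengths just above 16, which matches the haystack_len = 17 case described above.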

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778874906

From yzheng at openjdk.org  Fri Sep 27 16:34:55 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Fri, 27 Sep 2024 16:34:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v9]
In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
 <NR6CKOtdUaVarB7hiAUIAsIMe03qiTQr5VcEOWGt8mA=.164efc10-f0ff-4ddf-80b8-c56599c0ba7d@github.com>
 <BNQiyA0NCBO-oFZU4Nz3yvaMxDzICFxfqDKWt4wCL_U=.9405414d-f961-4168-8d26-ef5f58069161@github.com>
 <igBchd3diNsHdnQwZS2sj4tjaKejOq4f4EZovOmpmRQ=.abf79fee-9592-4555-ac01-de0170665ae4@github.com>
 <BkuKSFHleaF8I2tWCDIC-rGlWnfGhbK1zQ6CLlB-rVE=.9c7976e3-b0e9-4356-8d4b-d880aa30bffa@github.com>
 <ubcY5Q-aPRCLmhZV9MjBfzzbs84FegDvNkHx8nwPUd0=.46503fd8-d274-4558-bd0f-cc0c4f199667@github.com>
 <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com>
Message-ID: <CKaxMvoxrt7YLGw6Zee_SW8uBb8tg3j94mw-Mcf5u0I=.78240440-cee1-413b-ae49-1b85592d5ba7@github.com>

On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> We haven't decided whether or not we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful.
>
> This is my current work-in-progress code:
> https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2
> 
> I've made some large rewrites and I'm currently running it through functional testing.

If @stefank 's patch does not go in this PR, could you please export `Klass::_prototype_header` to JVMCI? Thanks!

diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
index 9d1b8a1cb9f..e462025074f 100644
--- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
+++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
@@ -278,6 +278,7 @@
   nonstatic_field(Klass,                       _bitmap,                                       uintx)                                 \
   nonstatic_field(Klass,                       _hash_slot,                                    uint8_t)                               \
   nonstatic_field(Klass,                       _misc_flags._flags,                            u1)                                    \
+  nonstatic_field(Klass,                       _prototype_header,                             markWord)                              \
                                                                                                                                      \
   nonstatic_field(LocalVariableTableElement,   start_bci,                                     u2)                                    \
   nonstatic_field(LocalVariableTableElement,   length,                                        u2)                                    \

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778884055

From xpeng at openjdk.org  Fri Sep 27 17:08:45 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 27 Sep 2024 17:08:45 GMT
Subject: Integrated: 8340490: Shenandoah: Optimize ShenandoahPacer
In-Reply-To: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
References: <mh4-1fuv1_9n0e_BCeIX-WHnXZN-1A7woAkyJinXkIU=.c03edc09-82fc-44c2-b09b-22d543fb4ec4@github.com>
Message-ID: <aTtxcYcmm-03N8vsOsXZqy-F_0MeWpKggs1OcBvSdfA=.7cbc75c3-bd9a-40af-80dc-7bdca203f62b@github.com>

On Thu, 19 Sep 2024 23:32:14 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to the long tail latency (> 10ms). When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).
> 
> The change in this PR makes ShenandoahPacer impact long tail latency much less: instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
> 
> Here is the latency comparison for the optimization:
> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
> 
> With the optimization, long tail latency from the test code below has improved from over 20ms to ~10ms on macOS with an M3 chip:
> 
>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>     static final LongAdder totalCount = new LongAdder();
>     static volatile byte[] sink;
>     public static void main(String[] args) {
>         runAllocationTest(100000);
>     }
>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>         long startTime = System.nanoTime();
>         sink = new byte[dataSize];
>         long endTime = System.nanoTime();
>         histogram.recordValue(endTime - startTime);
>     }
> 
>     static void runAllocationTest(final int dataSize) {
>         final long endTime = System.currentTimeMillis() + 30_000;
>         final CountDownLatch startSignal = new CountDownLatch(1);
>         final CountDownLatch finished = new CountDownLatch(threadCount);
>         final Thread[] threads = new Thread[threadCount];
>         final Histogram[] histograms = new Histogram[threadCount];
>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>         for (int i = 0; i < threadCount; i++) {
>             final var histogram = new Histogram(3600000000000L, 3);
>             histograms[i] = histogram;
>             threads[i] = new Thread(() -> {
>                 wait(startSignal);
>                 do {
>                     recordTimeToAllocate(dataSize, histogram);
>                 } while (System.currentTimeMillis() < e...

This pull request has now been integrated.

Changeset: 65200a95
Author:    Xiaolong Peng <xpeng at openjdk.org>
Committer: Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/65200a9589e46956a2194b20c4c90d003351a539
Stats:     41 lines in 3 files changed: 8 ins; 16 del; 17 mod

8340490: Shenandoah: Optimize ShenandoahPacer

Reviewed-by: shade, kdnilsen

-------------

PR: https://git.openjdk.org/jdk/pull/21099
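
The retry scheme from the PR description (wait 1 ms, try to claim again, stop on success or after 10 ms of waiting in total) can be sketched in Java roughly as follows. This is not the HotSpot C++ code: `BudgetPacer`, `tryClaim`, `replenish`, and the `AtomicLong` budget are hypothetical stand-ins, and `Thread.sleep` stands in for the pacer's timed wait. What the caller does after a timeout is outside this sketch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; names are hypothetical, not the HotSpot implementation.
final class BudgetPacer {
    static final long MAX_WAIT_NANOS = 10_000_000L; // 10 ms of waiting in total
    static final long STEP_MILLIS = 1;              // wait 1 ms between attempts

    private final AtomicLong budget;

    BudgetPacer(long initialBudget) {
        this.budget = new AtomicLong(initialBudget);
    }

    // Hands out more budget (in the real pacer this is driven by GC progress).
    void replenish(long words) {
        budget.addAndGet(words);
    }

    // Fast path: claim only if enough budget remains; never goes negative.
    boolean tryClaim(long words) {
        long current;
        do {
            current = budget.get();
            if (current < words) {
                return false;
            }
        } while (!budget.compareAndSet(current, current - words));
        return true;
    }

    // Returns true if the budget was claimed, false if the wait limit was hit.
    boolean paceForAlloc(long words) {
        if (tryClaim(words)) {
            return true;
        }
        final long start = System.nanoTime();
        while (System.nanoTime() - start < MAX_WAIT_NANOS) {
            try {
                Thread.sleep(STEP_MILLIS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (tryClaim(words)) {
                return true;
            }
        }
        return false;
    }
}
```

The point of the short steps is that a thread stops waiting as soon as budget becomes available, rather than always sitting out the full delay.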

From wkemper at openjdk.org  Fri Sep 27 21:35:05 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 27 Sep 2024 21:35:05 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime
 error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int'
Message-ID: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>

Use an unsigned version of `right_n_bits`.
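
For context, here is the overflow from the bug title restated in Java terms. The assumption (not stated in this message) is that the mask is computed as `(1 << n) - 1` on a signed 64-bit value, so that `n = 63` produces the minimum value before the subtraction. Java wraps silently where C++ has undefined behaviour, so `Math.subtractExact` is used to make the overflow visible. `RightBits` and its methods are illustrative names, not HotSpot APIs, and the second method only shows one formulation that avoids the overflow; it is not the change made in the PR.

```java
// Illustrative sketch only; not the HotSpot right_n_bits implementation.
final class RightBits {
    // Signed formulation: (1 << n) - 1. For n == 63 the shift yields
    // Long.MIN_VALUE and the checked subtraction throws ArithmeticException.
    static long signedMask(int n) {
        return Math.subtractExact(1L << n, 1L);
    }

    // Overflow-free formulation: logically shift an all-ones word right.
    // Valid for 1 <= n <= 64.
    static long shiftedMask(int n) {
        return -1L >>> (64 - n);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(shiftedMask(63)));
        try {
            signedMask(63);
        } catch (ArithmeticException e) {
            System.out.println("signed formulation overflows for n = 63");
        }
    }
}
```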

-------------

Commit messages:
 - Use an unsigned variant of right_n_bits

Changes: https://git.openjdk.org/jdk/pull/21236/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332697
  Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/21236.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21236/head:pull/21236

PR: https://git.openjdk.org/jdk/pull/21236

From kirk at kodewerk.com  Fri Sep 27 22:55:43 2024
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Fri, 27 Sep 2024 15:55:43 -0700
Subject: Aligning the Serial collector with ZGC
In-Reply-To: <44230e90-d8bc-491f-a5c4-2c483646fc3e@oracle.com>
References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com>
 <44230e90-d8bc-491f-a5c4-2c483646fc3e@oracle.com>
Message-ID: <C14C1F49-71BB-49DB-8A9D-E8993E324958@kodewerk.com>

Hi Thomas,

I wanted to respond to all of your comments but I thought better of it given one response deserves its own email. The focus is mostly on that one question.


> >
> > - Introduce an adaptive size policy that takes into account memory and
> > CPU pressure along with global memory pressure.
> >     - Heap should be large enough to minimize GC overhead but not
> > large enough to trigger OOM.
> 
> (probably meant "small enough" the second time)

I actually did mean large, but in the context of the OOM killer. But to your point, keeping the heap smaller while avoiding an OOME is also a concern.

> 
> >     - Introduce -XX:SerialPressure=[0-100] to support this work.
> 
> (Fwiw, regards to the other discussion, I agree that if we have a flag with the same "meaning" across collectors it might be useful to use the same name).

I think we have deadly agreement on this one.

> 
> >     - introduce a smoothing algorithm to avoid excessive small
> > resizes.
> 
> One option is to split this further into parts:
> 
> * list what actions Serial GC could do in reaction to memory pressure on an abstract level, and which make sense; from that see what functionality is needed.

I built a chart some time ago and this is an expanded version of it.

GC Overhead            Allocation  Global Memory  Action  Action (Tenured)
(Pause:mutator time)   Pressure    Pressure       (Eden)  (full collection only)
---------------------  ----------  -------------  ------  ----------------------
< target               Low         Low            shrink  shrink
< target               Low         Medium         shrink  shrink
< target               Low         High           shrink  shrink
< target               Medium      Low            hold    shrink
< target               Medium      Medium         shrink  shrink
< target               Medium      High           shrink  shrink
< target               High        Low            shrink  shrink
< target               High        Medium         shrink  shrink
< target               High        High           shrink  shrink
~= target              Low         Low            hold    hold
~= target              Low         Medium         hold    hold
~= target              Low         High           shrink  shrink
~= target              Medium      Low            hold    hold
~= target              Medium      Medium         hold    hold
~= target              Medium      High           shrink  shrink
~= target              High        Low            hold    hold
~= target              High        Medium         hold    hold
~= target              High        High           shrink  shrink
> target               Low         Low            expand  expand
> target               Low         Medium         expand  expand
> target               Low         High           hold    hold
> target               Medium      Low            expand  expand
> target               Medium      Medium         expand  expand
> target               Medium      High           hold    hold
> target               High        Low            expand  expand
> target               High        Medium         expand  expand
> target               High        High           hold    hold

Some of the thoughts I used to construct the table:

- GC overhead tells us if the heap is under-, appropriately, or over-sized.
- Allocation pressure combined with the size of Eden drives the frequency of young generation collections.
- Global memory pressure is a measure of the availability of memory.
 
The goal of resizing is to hit a target GC Overhead threshold without risking either OutOfMemoryError or the OOM killer. Reducing Full GC activity requires one to provide enough tenured space to hold the Live Data Set (LDS) as well as minimizing the promotion of transients. Partial GC frequency is a function of the size of Eden and the allocation pressure. Controlling GC frequency is key to controlling the rate at which transients are promoted.
 
On heap sizing:
Tenured may be resized at the end of a tenured (full) collection. Eden and Survivor may be resized at the end of either a tenured or partial collection. The sizes of Eden, Survivor and Tenured will be decided separately. The overall logic is that the heap should have as much memory as it needs for the GC to run within overhead targets.
 
The live set size is used to determine the size of tenured. The heuristic is that tenured should be 1.5x to 2x the LDS. Tenured should be expanded or shrunk to meet this ratio. Expansion should only happen when there is memory to support it.
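
A rough Java sketch of that tenured heuristic, under stated assumptions: the current size is left alone while it sits inside the 1.5x to 2x band, it is shrunk to the upper bound when above it, and it is grown towards the lower bound only as far as available memory allows. `TenuredSizer`, its parameters, and the choice of which bound to move to are illustrative guesses, not part of any existing collector.

```java
// Illustrative sketch only; names and clamping policy are assumptions.
final class TenuredSizer {
    // Returns the tenured size (bytes) to aim for after a full collection.
    static long targetTenuredBytes(long currentBytes, long liveBytes, long availableBytes) {
        long lower = liveBytes + liveBytes / 2; // 1.5 x live data set
        long upper = 2 * liveBytes;             // 2 x live data set

        if (currentBytes > upper) {
            return upper; // oversized: shrink to the upper bound
        }
        if (currentBytes < lower) {
            // Undersized: expand, but only with memory that is actually there.
            long wanted = lower - currentBytes;
            long granted = Math.min(wanted, Math.max(0L, availableBytes));
            return currentBytes + granted;
        }
        return currentBytes; // already within the band: hold
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(targetTenuredBytes(100 * mb, 100 * mb, 1024 * mb) / mb + " MB");
    }
}
```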
 
The decision to resize young is based on:
- whether the GC overhead target is being met
- the strength of the allocation pressure
- the availability of global memory
 
Meeting the GC overhead target indicates that the heap is appropriately sized. Under this condition there is no pressure to resize unless there is a shortage of global memory. If this is the case, there should be a balance made between being a good neighbour by releasing memory and the risk/costs of higher GC overhead.
 
Having GC overhead being under target is an indication that the heap is oversized. In this case it should be safe to reduce the heap size and release memory back to OS.
 
Having GC overhead be higher than the target indicates that heap is undersized. In this case heap (and likely Eden in particular) should be expanded assuming there is enough global memory to support the expansion without risking an OOM killer event.
 
 
Allocation pressure combined with the size of Eden sets GC frequency. High GC frequency tends to drive up GC overhead. If allocation pressure is high and GC overhead is high then increasing the size of Eden should reduce GC overhead. Having both allocation pressure and GC overhead be low provides an opportunity to reduce heap size and return memory.
 
All of the resizing decisions need to be moderated by the availability of (global) memory. If global memory is scarce, then the decision should favour releasing (uncommitting) memory. This may come at the expense of higher GC overhead. Resizing to smaller pool sizes is not without risk and in the case of young, both high global memory pressure and high allocation pressure add to the risk.
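
The action table earlier in this message can be written down as a small lookup. The Java sketch below encodes the rows exactly as listed, including the single "hold" for Eden when overhead is under target, allocation pressure is medium and global pressure is low. The enum and method names are invented for illustration, and how the three inputs would be measured and bucketed is left open.

```java
// Illustrative encoding of the action table; names are hypothetical.
final class ResizePolicy {
    enum Overhead { BELOW_TARGET, AT_TARGET, ABOVE_TARGET }
    enum Level { LOW, MEDIUM, HIGH }
    enum Action { SHRINK, HOLD, EXPAND }

    // Action for Eden (may be applied after any collection).
    static Action eden(Overhead overhead, Level allocation, Level global) {
        switch (overhead) {
            case BELOW_TARGET:
                // Every row shrinks except (Medium allocation, Low global).
                return (allocation == Level.MEDIUM && global == Level.LOW)
                        ? Action.HOLD : Action.SHRINK;
            case AT_TARGET:
                // Hold unless global memory is scarce.
                return global == Level.HIGH ? Action.SHRINK : Action.HOLD;
            default:
                // Above target: expand unless global memory is scarce.
                return global == Level.HIGH ? Action.HOLD : Action.EXPAND;
        }
    }

    // Action for Tenured (full collections only). In the table this does
    // not depend on allocation pressure.
    static Action tenured(Overhead overhead, Level global) {
        switch (overhead) {
            case BELOW_TARGET:
                return Action.SHRINK;
            case AT_TARGET:
                return global == Level.HIGH ? Action.SHRINK : Action.HOLD;
            default:
                return global == Level.HIGH ? Action.HOLD : Action.EXPAND;
        }
    }

    public static void main(String[] args) {
        System.out.println(eden(Overhead.ABOVE_TARGET, Level.HIGH, Level.LOW));
        System.out.println(tenured(Overhead.AT_TARGET, Level.HIGH));
    }
}
```

Written this way, it is easy to see that global memory pressure acts as the override in the last two overhead bands, while allocation pressure only matters in one row.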


> 
> * provide functionality that tries to keep some kind of GC/mutator time ratio; I would start with looking at what G1 does because Serial GC's behaviour is probably closer to G1 than ZGC, but ymmv.
> (Obviously improvements are welcome :))

I would agree.
> 
> (This may not need to be exposed externally like some GCTimeRatio/GCCPUPercentage/whatever flag name)
> 
> * add functionality to calculate memory pressure from the environment; maybe in a containerized environment from a manageable flag as it does not have a global "pressure" view. This could probably be taken from ZGC, at least partially

This is but one area where we are looking to "borrow" from.

> 
> * some transfer function that translates this external memory pressure, based on "GCPressure", (e.g. that "sigmoid" function plus lots of magic numbers) to reaction in the gc: e.g. change the gc/mutator pause time goal, start collections, uncommit memory...

We prototyped our own smoothing function but I'd defer to the sigmoid function as I'd prefer to share wherever possible.

> 
> * (probably) some background thread that continuously calculates and reacts on global pressure (uncommit memory, do a gc, resize heap, ...) because one probably does not want to wait for the next gc to react...

I've been trying to avoid an extra background thread and to backload the work onto the GC thread, but I also recognize that an extra thread may be necessary.

> 
> * do lots of testing to weed out corner cases
> 
> > - Introduce manageable flag SoftMaxHeapSize to define a target heap
> > size and set the default max heap size to 100% of available.
> 
> I am a bit torn about SoftMaxHeapSize in Serial GC. What do you envision that Serial GC would do when the SoftMaxHeapSize has been reached, and what if old gen occupancy permanently stays above that value?

At the moment, SoftMaxHeapSize is an implementation in Z. I'd first like to pull a (rough) spec out of the implementation and then try to answer your question. It's currently not clear to me how this should work with any collector.
> 
> The usefulness of SoftMaxHeapSize kind of relies on having a minimally invasive old gen collection that tries to get old gen usage back below that value.

Well, the LDS is what it is, and running a speculative collection would likely clean up (prematurely) promoted transients... but that's about it. Whereas it would clean both transients and floating garbage for the concurrent collectors. I'm not a fan of speculative collections given all of the time I've spent getting rid of them :-) IMO, DGC-triggered full collections were rarely necessary (all overhead with very little return). This also applied to the G1 patch that speculatively ran to counter to-space overflows, and it also applied to running a young gen collection prior to remark with the CMS collector. Long story short: loads of extra overhead with very little to no payback.

> 
> Serial GC has no "minimally invasive" way to collect old generation. It is either Full GC or nothing. This is the only option for Serial, but always doing Full collections after reaching that threshold seems very heavy handed, expensive and undesirable to me (ymmv).
> 
> That reaction would follow the spirit of the flag though.
> 
> Maybe at the small heaps Serial GC targets, this makes sense, and full gc is not that costly anyway.

Yeah, for small heaps this shouldn't be a big deal. But this is one of the reasons why I believe we should treat young and old separately. We can cheaply and safely return memory from young gen and leave the sizing of tenured to when a full collection is really needed. I grant you that this may not be very timely but I'm not sure that we need this to happen on demand... I think we can wait for natural cycles to take their course. But maybe I'm wrong on this point. We plan to experiment with this.
> 
> It might be useful to enumerate what actions could be performed on global pressure.

That's in the table...

> 
> > - Add in the ability to uncommit memory (to reduce global memory
> > pressure).
> >
> 
> The following imo outlines a completely separate idea, and should be discussed separately:
> 
> >
> > While working through the details of this work I noted that there
> > appear to be opportunities to offer new defaults for other settings. For
> > example, [...]
> 
> That seems to be some more elaborate way of finding "optimal" generation size for a given heap size (which may follow from what the gc/mutator time ratio algorithm gives you).

I'm trying to apply my years of experience tuning 100s of collectors across 100s of applications.

> 
> >
> > For Eden the guiding metric is allocation rate. For Survivor it's life
> > cycle (age table). For Tenured it's live set size. Using these metrics
> > to determine size of the parts and use that to then calculate a max
> > heap size has almost always yielded lower GC overheads than setting a
> > heap size and then letting ratios size everything. This maybe a
> > separate piece of work
> 
> +1
> 
> > but the intent would be to have ergonomics calculate
> > optimal eden, survivor and tenured sizes. Each young collection is an
> > opportunity to resize Eden and Survivor whereas a full would be used
> > to resize Eden, Survivor and Tenured space. This may lead to the need
> > to ignore NewRatio and (the soft target) MaxGCPauseMillis.
> 
> Fwiw, the only collector that observes MaxGCPauseMillis is G1; in the context of Serial GC discussed further above I am confused.
> 
> Not sure if MaxGCPauseMillis would make sense in Serial GC given that you can't control Full GC pause length.

Agreed. Sizing in Serial is currently controlled by the number of non-daemon threads and that rarely changes. This implies that pause times are loosely a function of load and LDS size.

> 
> Also, in the context of G1 some of the statements above are hard to understand: e.g. the text seems to imply that there is a fixed ratio between eden and survivor which isn't really the case, at least not in the sense of Serial  GC.

Sorry for the confusion, I wasn't trying to imply that the ratio is fixed. What I was trying to do was introduce better default start settings. When I'm tuning I tend to set the young to tenured ratio to 1 and then set the survivor ratio to 2. This allows me to collect as clean a signal from the collector as possible. I would then make adjustments from that starting point. If we want to resize then I believe that this starting point would give ergonomics a better chance to stabilize at a more optimal place.
> 
> Could you elaborate?
> 
> Even then, with Serial GC's fixed generation sizes fine-grained on-the-fly adaptation as somewhat suggested might be harder than usual.
> 
> Not against doing all that, but it really sounds like separate work.

I believe it might be, as it feels like it falls into the category of auto-tuning.

> 
> >
> > As for testing, I'm currently looking at modifying HyperAlloc to add
> > the ability to alter the shape of the load on the collector over time.
> >
> > All of this is still in its infancy and we're open for guidance and
> > input.
> >
> > As for the work on G1, an initial patch has been submitted (URL above)
> > and is open for comments.
> >
> 
> The patch does not seem to implement AHS. It implements CurrentMaxHeapSize which might be what AHS uses to set max heap size.
> 
> To implement AHS for G1 roughly at least the following items need to be added/implemented/changed:
> 
> * remove the use of Min/MaxHeapFreeRatio for heap sizing. These flags completely disregard cpu and heap pressure based heap sizing (should also be removed from Serial GC - this means deprecating/obsoleting this flag as soon as the last user is gone).
> 
> * implement CurrentMaxHeapSize which is a (configurable) hard limit on how much the Java application may allocate (JDK-8204088) in support of AHS. As mentioned, that patch might be an initial discussion base.
> I do not think we need a JEP for that, but it gives you more publicity.
> 
> * implement SoftMaxHeapSize in the sense of ZGC where it uses it to guide IHOP (or ZGC's equivalent). Note that I am not sure that SoftMaxHeapSize is something absolutely necessary in the context of AHS, but may be a tool.
> 
> * the same background functionality as for serial: implement some mechanism to control the heap size based on the decisions of AHS; i.e. start collections to get to heap target, uncommit stuff/enqueue for uncommit etc.
> 
> Currently G1 only resizes the heap during Remark and Full GC which is too limiting to follow current "memory pressure". Maybe use/update Soft/CurrentMaxHeapSize as needed so that GC compacts the heap first; this may either be in the form of JDK-8238687 which uncommits at every gc, which is probably still too limiting for an AHS system.

Got it. I think we're aiming to get all of this done; it's just not written here. But I appreciate the list as it's helpful.

> 
> Probably other issues will crop up along the way.
> 
> * do lots of testing to weed out corner cases and hopefully not regress too much from current performance

I'm hoping that instead of regressing, we reduce GC interference. And I'm happy to avoid a JEP but also happy to write one if it's really needed... and I neither need nor want more publicity, but thanks for the warning. ;-)

Kind regards,
Kirk

From wkemper at openjdk.org  Fri Sep 27 23:26:33 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 27 Sep 2024 23:26:33 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v2]
In-Reply-To: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>
References: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>
Message-ID: <w6J3PVBQ9SVVEp2MaMORMCyMd0lXyksDMYiBkSTzUSk=.8cf70b3c-8e93-4f30-b7dc-7adaf64486cb@github.com>

> Use an unsigned version of `right_n_bits`.

William Kemper has updated the pull request incrementally with one additional commit since the last revision:

  Use template to match type of subtrahend and minuend
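
A minimal sketch of the idea, not the code from this pull request: the ubsan report in the subject line shows `-9223372036854775808 - 1` being evaluated in `long int`, that is, 1 being subtracted from the smallest signed 64-bit value. Keeping the minuend and the subtrahend in the same unsigned type makes the subtraction well defined. The name `low_bits_mask` and the restriction to unsigned types are assumptions made for illustration; the actual `right_n_bits` change may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not the HotSpot definition): returns a value of
// unsigned type T whose lowest n bits are set.
template <typename T>
T low_bits_mask(unsigned n) {
  static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
  const unsigned bits = std::numeric_limits<T>::digits;
  // Shifting by the full bit width or more is undefined, so that case is
  // handled separately and yields a mask with every bit set.
  if (n >= bits) {
    return std::numeric_limits<T>::max();
  }
  // Both operands of the subtraction have type T (or its promoted type),
  // so no signed 64-bit overflow can occur.
  return static_cast<T>((T{1} << n) - T{1});
}
```

Under these assumptions, `low_bits_mask<std::uint64_t>(63)` yields `0x7FFFFFFFFFFFFFFF` without any signed arithmetic.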

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21236/files
  - new: https://git.openjdk.org/jdk/pull/21236/files/4e33d52f..a3fb5858

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=00-01

  Stats: 29 lines in 3 files changed: 4 ins; 3 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/21236.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21236/head:pull/21236

PR: https://git.openjdk.org/jdk/pull/21236

From wkemper at openjdk.org  Fri Sep 27 23:39:15 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 27 Sep 2024 23:39:15 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v3]
In-Reply-To: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>
References: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>
Message-ID: <G1ZPGXEoLjBF6PKKptpfHZfitfTsLQZfuTAHvNRSuz8=.caed0816-0227-47f5-9d4f-1e95e18f5e39@github.com>

> Use a template version of `right_n_bits` to use the same type for minuend and subtrahend.

William Kemper has updated the pull request incrementally with one additional commit since the last revision:

  Fix comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21236/files
  - new: https://git.openjdk.org/jdk/pull/21236/files/a3fb5858..97d1272b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/21236.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21236/head:pull/21236

PR: https://git.openjdk.org/jdk/pull/21236

From kbarrett at openjdk.org  Sat Sep 28 05:24:42 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 28 Sep 2024 05:24:42 GMT
Subject: RFR: 8340945: Ubsan: oopStorage.cpp:374:8: runtime error: applying
 non-zero offset 18446744073709551168 to null pointer
Message-ID: <1uTdfa0c-Nrk1-6t0IPbYRmwy9URAHyz7-V4YHx5hDo=.377970e5-1b31-4658-94fc-e232bfbdec3b@github.com>

Please review this change to the OopStorage handling of storage block lookup,
now being more careful about pointer arithmetic to avoid UB.

As an initial cleanup, renamed OopStorage::find_block_or_null to
block_for_ptr, for consistency with the Block function that implements it.
Also moved the precondition assert that the argument is non-null into the
Block function, where the requirement is located.

Changed OopStorage::Block::block_for_ptr to avoid pointer arithmetic that
might invoke UB, instead converting the pointer argument to uintptr_t and
performing arithmetic on it. Also fixed its description in the header file.
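
A minimal sketch of the technique described above, not the OopStorage code itself: the pointer is converted to `uintptr_t`, and the alignment and offset computations are done on the integer, where wrap-around is well defined. The helper names `to_addr`, `align_down_addr` and `offset_back` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: view a pointer as an integer address.
inline std::uintptr_t to_addr(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

// Hypothetical helper: round an address down to a power-of-two alignment
// using integer arithmetic only.
inline std::uintptr_t align_down_addr(std::uintptr_t addr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return addr & ~(static_cast<std::uintptr_t>(alignment) - 1);
}

// Hypothetical helper: step back by `offset` bytes. Unsigned arithmetic
// wraps modulo 2^N, so this is well defined even when addr < offset,
// whereas the same step on a raw pointer could be undefined behaviour.
inline std::uintptr_t offset_back(std::uintptr_t addr, std::size_t offset) {
  return addr - static_cast<std::uintptr_t>(offset);
}
```

With a 64-bit `uintptr_t`, `offset_back(0, 448)` evaluates to 18446744073709551168, the offset shown in the ubsan report. In a design like this, the integer result would be converted back to a pointer only after it has been validated.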

Similarly changed OopStorage::Block::active_index_safe to avoid pointer
arithmetic, instead converting to uintptr_t for arithmetic.  This avoids
potential problems when the Block argument is a "false positive" from
block_for_ptr. 

Changed OopStorage::allocation_status to check up front for a null argument,
immediately returning INVALID_ENTRY in that case. This avoids violating
block_for_ptr's precondition that the argument is non-null. Added a gtest for
this.  Also added a gtest for the potential false-positive case.

While updating gtests, removed #ifndef DISABLE_GARBAGE_ALLOCATION_STATUS_TESTS.
That macro was included when these tests were first added, because some tests
needed to be disabled on Windows, due to SafeFetchN in gtest context not working
on that platform. That was later fixed by JDK-8185734. The conditional #define
of that macro in test_oopStorage.cpp was removed, but the no longer needed
#ifndef was inadvertently not removed.

Testing: mach5 tier1-5
Locally (linux-x64) reproduced the reported ubsan failure, and verified it no
longer reproduces with these changes.

While working on this change I noticed a related issue.  The recently added
OopStorage::print_containing doesn't verify the block is not a false positive
before using it as a block.  I'll file a JBS issue for this.

-------------

Commit messages:
 - be more careful

Changes: https://git.openjdk.org/jdk/pull/21240/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21240&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340945
  Stats: 69 lines in 4 files changed: 37 ins; 4 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/21240.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21240/head:pull/21240

PR: https://git.openjdk.org/jdk/pull/21240

From fjiang at openjdk.org  Sat Sep 28 11:55:45 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Sat, 28 Sep 2024 11:55:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v23]
In-Reply-To: <tJsw5FSPMXWKAZ3WXlzGlcJShrMdjNJbvGqQiANonsE=.b93d6ecf-dbd5-41ad-9dfb-359db1389a16@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <tmnNZFXsAOWiXareBZUTR0NSDWDyPJOm3-MfoOfwsK8=.8ebb7ccc-ef9b-478e-bbeb-28becd9c1c85@github.com>
 <tJsw5FSPMXWKAZ3WXlzGlcJShrMdjNJbvGqQiANonsE=.b93d6ecf-dbd5-41ad-9dfb-359db1389a16@github.com>
Message-ID: <I1S80XwpG4xc8fyAf3rijX6NEjnZZYcQB44tqeSBrRA=.e79bf065-5f72-4338-9da6-0f129178564f@github.com>

On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:
>> 
>>  - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
>>  - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
>>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>>  - Restore some asserts
>>  - Default values for tmp regs of G1PostBarrierStubC2
>>  -  8334060: [arm32] Implementation of Late Barrier Expansion for G1
>>  - 8330685: [arm32] share barrier spilling logic
>
> Thanks for the 32-bit arm port, @snazarkin! Merged in commit 3957c03f.
> Besides the 32-bit arm port, @snazarkin's changeset adds the option of using a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.

Hi @robcasloz, the riscv port cleanup is available at https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7. Could you please apply it?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2380614984

From rsunderbabu at openjdk.org  Sun Sep 29 08:39:05 2024
From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu)
Date: Sun, 29 Sep 2024 08:39:05 GMT
Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value
Message-ID: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>

The current formula is incorrect, since an array does not use a reference for each element.

Tested with test groups,
vmTestbase_vm_gc_ref
vmTestbase_vm_gc_juggle
vmTestbase_vm_gc_misc

-------------

Commit messages:
 - 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value

Changes: https://git.openjdk.org/jdk/pull/21247/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21247&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8211400
  Stats: 7 lines in 1 file changed: 0 ins; 3 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/21247.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21247/head:pull/21247

PR: https://git.openjdk.org/jdk/pull/21247

From phh at openjdk.org  Sun Sep 29 21:01:34 2024
From: phh at openjdk.org (Paul Hohensee)
Date: Sun, 29 Sep 2024 21:01:34 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v3]
In-Reply-To: <G1ZPGXEoLjBF6PKKptpfHZfitfTsLQZfuTAHvNRSuz8=.caed0816-0227-47f5-9d4f-1e95e18f5e39@github.com>
References: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>
 <G1ZPGXEoLjBF6PKKptpfHZfitfTsLQZfuTAHvNRSuz8=.caed0816-0227-47f5-9d4f-1e95e18f5e39@github.com>
Message-ID: <tUFC0HZBXgjR-L3CeomY9kgZYSFndJpQBbqqLZVSsNo=.1630cd38-f726-46ca-9e4e-53469cd60bc3@github.com>

On Fri, 27 Sep 2024 23:39:15 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> Use a template version of `right_n_bits` to use the same type for minuend and subtrahend.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments

Rather than define a new method get_right_n_bits(), why not just replace the definition of right_n_bits() in globalDefinitions.hpp? The C++ compiler will inline and optimize both.

-------------

PR Review: https://git.openjdk.org/jdk/pull/21236#pullrequestreview-2335997131

From kbarrett at openjdk.org  Mon Sep 30 01:42:45 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 30 Sep 2024 01:42:45 GMT
Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong
 value
In-Reply-To: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
References: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
Message-ID: <qT2xtZJ26_V8LIVq0ldY3kjifb0ATcy-qsPj2Vt0kyc=.1db003b8-6ec4-418f-99ca-96171e474f0b@github.com>

On Sun, 29 Sep 2024 08:33:31 GMT, Ramkumar Sunderbabu <rsunderbabu at openjdk.org> wrote:

> The current formula is incorrect, since an array does not use a reference for each element.
> 
> Tested with test groups,
> vmTestbase_vm_gc_ref
> vmTestbase_vm_gc_juggle
> vmTestbase_vm_gc_misc

I've never looked at this file before.  Wow!  Several problems spotted on just brief
skimming!  But out of scope for this specific issue.

test/hotspot/jtreg/vmTestbase/nsk/share/gc/Memory.java line 166:

> 164:          */
> 165:         public static long getArraySize(int length, long objectSize) {
> 166:                 return getObjectExtraSize() + length * objectSize;

pre-existing: Shouldn't that be getArrayExtraSize()?

-------------

Changes requested by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21247#pullrequestreview-2336196756
PR Review Comment: https://git.openjdk.org/jdk/pull/21247#discussion_r1780306207

From rcastanedalo at openjdk.org  Mon Sep 30 05:02:12 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 30 Sep 2024 05:02:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v27]
In-Reply-To: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
Message-ID: <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> 
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
> 
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> 
> ## Summary of the Changes
> 
> ### Platform-Independent Changes (`src/hotspot/share`)
> 
> These consist mainly of:
> 
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
> 
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> 
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
> 
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> 
> #### ADL Changes
> 
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> 
> #### `G1BarrierSetAssembler` Changes
> 
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:

 - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
 - riscv port refactor
 - Remove temporary support code
 - Merge jdk-24+17
 - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
 - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
 - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
 - Merge jdk-24+16
 - Ensure that detected encode-and-store patterns are matched
 - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
 - ... and 43 more: https://git.openjdk.org/jdk/compare/8ee5f762...14483b83

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/6fb36e50..14483b83

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=26
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25-26

  Stats: 19042 lines in 408 files changed: 13042 ins; 3680 del; 2320 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From aboldtch at openjdk.org  Mon Sep 30 06:22:46 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 30 Sep 2024 06:22:46 GMT
Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of
 TestAllocateHeapAt.java
In-Reply-To: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
References: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
Message-ID: <oK-CchJ3DHD5Gbowc6Df3ERmEQOkcbKjhmGAeoGyww4=.8831aa6a-8424-4a4a-8d54-13ae675f218c@github.com>

On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127  disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file systems.
> 
> I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.

Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21128#issuecomment-2382205091

From aboldtch at openjdk.org  Mon Sep 30 06:22:47 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 30 Sep 2024 06:22:47 GMT
Subject: Integrated: 8340419: ZGC: Create an UseLargePages adaptation of
 TestAllocateHeapAt.java
In-Reply-To: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
References: <ynQJWSbQMh3j1ryENUE6EMdku-NwJsDEVxlfbP8wRIU=.f0aa81d7-cda3-40b6-a478-43285305da22@github.com>
Message-ID: <jr9CdbDHpzAbExREpyQOuCSBRbf8u01pL62cNIIvzAc=.6c6a0c3f-c9ad-480b-bea8-4d61cc3a093e@github.com>

On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127  disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file systems.
> 
> I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.

This pull request has now been integrated.

Changeset: 6514aef8
Author:    Axel Boldt-Christmas <aboldtch at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/6514aef8403fa5fc09e5c064a783ff0f1fccd0cf
Stats:     91 lines in 1 file changed: 91 ins; 0 del; 0 mod

8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java

Reviewed-by: stefank, sjohanss, jsikstro

-------------

PR: https://git.openjdk.org/jdk/pull/21128

From rcastanedalo at openjdk.org  Mon Sep 30 07:59:45 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 30 Sep 2024 07:59:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v23]
In-Reply-To: <tJsw5FSPMXWKAZ3WXlzGlcJShrMdjNJbvGqQiANonsE=.b93d6ecf-dbd5-41ad-9dfb-359db1389a16@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <tmnNZFXsAOWiXareBZUTR0NSDWDyPJOm3-MfoOfwsK8=.8ebb7ccc-ef9b-478e-bbeb-28becd9c1c85@github.com>
 <tJsw5FSPMXWKAZ3WXlzGlcJShrMdjNJbvGqQiANonsE=.b93d6ecf-dbd5-41ad-9dfb-359db1389a16@github.com>
Message-ID: <wDXbebDXKaS1BLFr5XjyorgSTKdMR4VyWurQxpnq8qI=.1f273382-8da9-4c76-9e89-d2ad380768c8@github.com>

On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:
>> 
>>  - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms
>>  - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency
>>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>>  - Restore some asserts
>>  - Default values for tmp regs of G1PostBarrierStubC2
>>  -  8334060: [arm32] Implementation of Late Barrier Expansion for G1
>>  - 8330685: [arm32] share barrier spilling logic
>
> Thanks for the 32-bit arm port, @snazarkin! Merged in commit 3957c03f.
> Besides the 32-bit arm port, @snazarkin's changeset adds the option of using a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.

> Hi @robcasloz, the riscv port cleanup is available at [feilongjiang at 1297f60](https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7). Could you please apply it?

Done (commit 14483b83), thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382377364

From rcastanedalo at openjdk.org  Mon Sep 30 08:24:52 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 30 Sep 2024 08:24:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v27]
In-Reply-To: <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
Message-ID: <kn6wmU2JWQIgsUOV5Ps7mCSKKWM1MZjUL3wxplLznpA=.bbbba473-c0a4-45ac-bfbb-185b7df15328@github.com>

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port refactor
>  - Remove temporary support code
>  - Merge jdk-24+17
>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>  - Merge jdk-24+16
>  - Ensure that detected encode-and-store patterns are matched
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - ... and 43 more: https://git.openjdk.org/jdk/compare/60c13deb...14483b83

I just updated to jdk-24+17 (commit bda4ab21) and removed the temporary support code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` (commit 55a1f621). The current changeset passes all tests specified in the pull request [description](https://github.com/openjdk/jdk/pull/19746#issue-2356905813) and yields benchmark results similar to those of the original submission.
@albertnetymk @vnkozlov @tschatzl @kimbarrett could you please re-review? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382431347

From tschatzl at openjdk.org  Mon Sep 30 08:31:37 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 30 Sep 2024 08:31:37 GMT
Subject: RFR: 8340945: Ubsan: oopStorage.cpp:374:8: runtime error: applying
 non-zero offset 18446744073709551168 to null pointer
In-Reply-To: <1uTdfa0c-Nrk1-6t0IPbYRmwy9URAHyz7-V4YHx5hDo=.377970e5-1b31-4658-94fc-e232bfbdec3b@github.com>
References: <1uTdfa0c-Nrk1-6t0IPbYRmwy9URAHyz7-V4YHx5hDo=.377970e5-1b31-4658-94fc-e232bfbdec3b@github.com>
Message-ID: <GXtisYQsJzRmKg4XjtDl6uPHjAYSltJkd-3JfmmdBK0=.490a65c5-5957-4821-96d9-2c6fdbc059d8@github.com>

On Sat, 28 Sep 2024 05:20:23 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change to the OopStorage handling of storage block lookup,
> now being more careful about pointer arithmetic to avoid UB.
> 
> As an initial cleanup, renamed OopStorage::find_block_or_null to
> block_for_ptr, for consistency with the Block function that implements it.
> Also moved the precondition assert that the argument is non-null into the
> Block function, where the requirement is located.
> 
> Changed OopStorage::Block::block_for_ptr to avoid pointer arithmetic that
> might invoke UB, instead converting the pointer argument to uintptr_t and
> performing arithmetic on it. Also fixed its description in the header file.
> 
> Similarly changed OopStorage::Block::active_index_safe to avoid pointer
> arithmetic, instead converting to uintptr_t for arithmetic.  This avoids
> potential problems when the Block argument is a "false positive" from
> block_for_ptr. 
> 
> Changed OopStorage::allocation_status to check up front for a null argument,
> immediately returning INVALID_ENTRY in that case. This avoids violating
> block_for_ptr's precondition that the argument is non-null. Added a gtest for
> this.  Also added a gtest for the potential false-positive case.
> 
> While updating gtests, removed #ifndef DISABLE_GARBAGE_ALLOCATION_STATUS_TESTS.
> That macro was included when these tests were first added, because some tests
> needed to be disabled on Windows, due to SafeFetchN in gtest context not working
> on that platform. That was later fixed by JDK-8185734. The conditional #define
> of that macro in test_oopStorage.cpp was removed, but the no longer needed
> #ifndef was inadvertently not removed.
> 
> Testing: mach5 tier1-5
> Locally (linux-x64) reproduced the reported ubsan failure, and verified it no
> longer reproduces with these changes.
> 
> While working on this change I noticed a related issue.  The recently added
> OopStorage::print_containing doesn't verify the block is not a false positive
> before using it as a block.  I'll file a JBS issue for this.

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21240#pullrequestreview-2336748455

From rsunderbabu at openjdk.org  Mon Sep 30 08:52:11 2024
From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu)
Date: Mon, 30 Sep 2024 08:52:11 GMT
Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong
 value [v2]
In-Reply-To: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
References: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
Message-ID: <sAIyMLk3kDaqXRP53Lfj3zgm1dVEuXDsnNrXDajwXIc=.2bba3617-f411-4a7e-a689-85184f6a1def@github.com>

> The current formula is incorrect, since an array does not use a reference for each element.
> 
> Tested with test groups,
> vmTestbase_vm_gc_ref
> vmTestbase_vm_gc_juggle
> vmTestbase_vm_gc_misc

Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision:

  review comment fix on getArraySize method

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21247/files
  - new: https://git.openjdk.org/jdk/pull/21247/files/eb3dcde5..d26624f5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21247&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21247&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21247.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21247/head:pull/21247

PR: https://git.openjdk.org/jdk/pull/21247

From kbarrett at openjdk.org  Mon Sep 30 09:42:35 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 30 Sep 2024 09:42:35 GMT
Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong
 value [v2]
In-Reply-To: <sAIyMLk3kDaqXRP53Lfj3zgm1dVEuXDsnNrXDajwXIc=.2bba3617-f411-4a7e-a689-85184f6a1def@github.com>
References: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
 <sAIyMLk3kDaqXRP53Lfj3zgm1dVEuXDsnNrXDajwXIc=.2bba3617-f411-4a7e-a689-85184f6a1def@github.com>
Message-ID: <cFh-EDxLmWCx9VVZ_FrzyZv7m5Dsa8U8EqOEdTq8mIw=.e4633eb2-7a5e-4b0b-ad24-884a0e41bd28@github.com>

On Mon, 30 Sep 2024 08:52:11 GMT, Ramkumar Sunderbabu <rsunderbabu at openjdk.org> wrote:

>> The current formula is incorrect, since an array does not use a reference for each element.
>> 
>> Tested with test groups,
>> vmTestbase_vm_gc_ref
>> vmTestbase_vm_gc_juggle
>> vmTestbase_vm_gc_misc
>
> Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review comment fix on getArraySize method

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21247#pullrequestreview-2336922963

From tschatzl at openjdk.org  Mon Sep 30 09:50:35 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 30 Sep 2024 09:50:35 GMT
Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong
 value [v2]
In-Reply-To: <sAIyMLk3kDaqXRP53Lfj3zgm1dVEuXDsnNrXDajwXIc=.2bba3617-f411-4a7e-a689-85184f6a1def@github.com>
References: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
 <sAIyMLk3kDaqXRP53Lfj3zgm1dVEuXDsnNrXDajwXIc=.2bba3617-f411-4a7e-a689-85184f6a1def@github.com>
Message-ID: <HG7iVOQz3bALqZcK3UTFaKRTvOMQvezW0w3OyKcM2rk=.ab8ca304-9494-45bd-88e3-a5cbac7802eb@github.com>

On Mon, 30 Sep 2024 08:52:11 GMT, Ramkumar Sunderbabu <rsunderbabu at openjdk.org> wrote:

>> The current formula is incorrect since an array doesn't use a reference for each element.
>> 
>> Tested with test groups,
>> vmTestbase_vm_gc_ref
>> vmTestbase_vm_gc_juggle
>> vmTestbase_vm_gc_misc
>
> Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review comment fix on getArraySize method

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21247#pullrequestreview-2336937758

From duke at openjdk.org  Mon Sep 30 09:50:35 2024
From: duke at openjdk.org (duke)
Date: Mon, 30 Sep 2024 09:50:35 GMT
Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong
 value [v2]
In-Reply-To: <sAIyMLk3kDaqXRP53Lfj3zgm1dVEuXDsnNrXDajwXIc=.2bba3617-f411-4a7e-a689-85184f6a1def@github.com>
References: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
 <sAIyMLk3kDaqXRP53Lfj3zgm1dVEuXDsnNrXDajwXIc=.2bba3617-f411-4a7e-a689-85184f6a1def@github.com>
Message-ID: <X3L8ybnkWiZVxLIcAbaiIPeQmR0XMuA6SjJlFHI_0u0=.aa44fa5b-39ae-4a52-bcc3-c172b6087f1d@github.com>

On Mon, 30 Sep 2024 08:52:11 GMT, Ramkumar Sunderbabu <rsunderbabu at openjdk.org> wrote:

>> The current formula is incorrect since an array doesn't use a reference for each element.
>> 
>> Tested with test groups,
>> vmTestbase_vm_gc_ref
>> vmTestbase_vm_gc_juggle
>> vmTestbase_vm_gc_misc
>
> Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review comment fix on getArraySize method

@rsunderbabu 
Your change (at version d26624f56ca8817a4f0de5eb105a3d0e1442c7aa) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21247#issuecomment-2382646344

From tschatzl at openjdk.org  Mon Sep 30 10:04:51 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 30 Sep 2024 10:04:51 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v27]
In-Reply-To: <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
Message-ID: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com>

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port refactor
>  - Remove temporary support code
>  - Merge jdk-24+17
>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>  - Merge jdk-24+16
>  - Ensure that detected encode-and-store patterns are matched
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - ... and 43 more: https://git.openjdk.org/jdk/compare/55c0ecf8...14483b83

Still seems good.

I mostly looked only at the changes in the GC directory and the barrier code itself, as I do not feel qualified to comment much on the other (compiler) changes.

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2336972915

From rcastanedalo at openjdk.org  Mon Sep 30 11:33:47 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 30 Sep 2024 11:33:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v27]
In-Reply-To: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
 <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com>
Message-ID: <k5L6RA13sr-DPBjDHrQvOHHvuyVp-QS-1K0hvmGKcAI=.40e3f989-1bff-451c-8ea3-0d8131faeabf@github.com>

On Mon, 30 Sep 2024 10:02:17 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Still seems good.
> 
> I mostly looked only at the changes in the GC directory and the barrier code itself, as I do not feel qualified to comment much on the other (compiler) changes.

Thanks for re-reviewing, Thomas!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382930857

From fyang at openjdk.org  Mon Sep 30 11:53:52 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 30 Sep 2024 11:53:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v27]
In-Reply-To: <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
Message-ID: <CxZFW6wYMPcWcYD5D_5MJA-8o_crcP-9Hj14kHydcPw=.fed64934-c7a9-4272-8bea-808b60449743@github.com>

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port refactor
>  - Remove temporary support code
>  - Merge jdk-24+17
>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>  - Merge jdk-24+16
>  - Ensure that detected encode-and-store patterns are matched
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - ... and 43 more: https://git.openjdk.org/jdk/compare/dede1992...14483b83

Updated RISC-V part of the change looks good to me.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2337279856

From rcastanedalo at openjdk.org  Mon Sep 30 12:06:48 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 30 Sep 2024 12:06:48 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v27]
In-Reply-To: <CxZFW6wYMPcWcYD5D_5MJA-8o_crcP-9Hj14kHydcPw=.fed64934-c7a9-4272-8bea-808b60449743@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
 <CxZFW6wYMPcWcYD5D_5MJA-8o_crcP-9Hj14kHydcPw=.fed64934-c7a9-4272-8bea-808b60449743@github.com>
Message-ID: <su1eKQ6fTSj1LtGrc9hxcFZwGXqT716OMwhgyWSuggs=.232a40cc-19d1-4d96-aa82-c8fccb7172b9@github.com>

On Mon, 30 Sep 2024 11:51:02 GMT, Fei Yang <fyang at openjdk.org> wrote:

> Updated RISC-V part of the change looks good to me.

Thanks, Fei!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382997964

From rcastanedalo at openjdk.org  Mon Sep 30 12:40:54 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Mon, 30 Sep 2024 12:40:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v11]
In-Reply-To: <shoDPnL4nQyaYX4xy-WyzQ7CwsoCbAeCE3c755CkE1o=.caf940aa-5fbb-4cb5-a4a2-68a2452ffe1b@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <DUgk6HT_Zk5KnbmAd_HzrJi4B8D1J-uuoAGzReWyI8s=.6c04f90a-bf1d-49af-a31d-5224973952d4@github.com>
 <YTte6oY9EeAPHxzzOndL8M202wY91SbeaYC4hPva0W0=.caf00270-89c7-44b0-ab88-a970fb8e840d@github.com>
 <kzodWwxj5qdAqPYcP2s4k2QTXqjju2H6Iftu7dojezk=.1f72fc31-94ba-4017-b330-c8688a8d39a4@github.com>
 <hkGaBWSugvpV0PgoR9rnutHCgQzkeXv1ctH7QwV3zK0=.204737d5-fadb-410a-8891-7db59d36830f@github.com>
 <shoDPnL4nQyaYX4xy-WyzQ7CwsoCbAeCE3c755CkE1o=.caf940aa-5fbb-4cb5-a4a2-68a2452ffe1b@github.com>
Message-ID: <TBZCkcFNNnEvat7TBnRkqUPo4Y5TEe1L0iUZ9loD7DI=.39c0249c-f6b2-45c8-b7c8-9e2c9c1b7a99@github.com>

On Thu, 12 Sep 2024 13:20:14 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Indeed, I could re-enable all tests in:
> 
> ```
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> ```
> 
> but unfortunately not those others:
> 
> ```
> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> ```
> 
> I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at an 8-byte-aligned offset.
> 
> I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.
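
For readers following along, the alignment condition quoted above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and the two `byte[]` base offsets (16 bytes without compact headers, 12 bytes with them) are assumptions made for the example, not values stated in this thread.

```java
public class AlignmentSketch {
    // Assumed byte[] base offsets, for illustration only (not taken from this thread).
    public static final int BASE_OFFSET_DEFAULT = 16;
    public static final int BASE_OFFSET_COMPACT = 12;

    // Returns true if the element at 'index' starts at an offset from the
    // beginning of the array object that is a multiple of 8 bytes.
    public static boolean isEightByteAligned(int baseOffset, int index, int elementSize) {
        long offset = baseOffset + (long) index * elementSize;
        return offset % 8 == 0;
    }

    public static void main(String[] args) {
        // Element 0 of a byte[] under each assumed layout.
        System.out.println(isEightByteAligned(BASE_OFFSET_DEFAULT, 0, 1));
        System.out.println(isEightByteAligned(BASE_OFFSET_COMPACT, 0, 1));
        // Under the assumed compact layout, index 4 is the first aligned element.
        System.out.println(isEightByteAligned(BASE_OFFSET_COMPACT, 4, 1));
    }
}
```

Under these assumptions, a loop starting at index 0 begins at an aligned offset with the first layout but not with the second, which is the kind of shift that could stop such a loop from vectorizing.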

@rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N` with `N <= 3` on an Intel Xeon Platinum 8358 machine:

- test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
- test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
- test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java

Here are the failure details:


test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:

1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!


test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java:

1) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte1(byte[],byte[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

2) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte2(byte[],byte[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

3) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong1(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

4) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong2(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

5) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong3(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

6) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong5(byte[],long[],int,int)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!


test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java:

1) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndComplexExpression()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

2) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndInvariant()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
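
To make the "Failed comparison: [found] 0 > 0 [given]" lines easier to read, here is a rough Java sketch of what such a `counts` constraint amounts to: count the lines of the ideal-graph dump that match the regular expression and compare the count with the bound. The class name, method name, and sample dump lines are hypothetical, and matching line by line is a simplification; this is not the IR framework's actual code.

```java
import java.util.regex.Pattern;

public class IrCountSketch {
    // Constraint regex as printed in the failure report above (StoreVector rule).
    static final Pattern STORE_VECTOR =
        Pattern.compile("(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)");

    // Counts the lines that contain a match; simplified, one match per line at most.
    public static long countMatches(String idealOutput) {
        return idealOutput.lines()
                          .filter(line -> STORE_VECTOR.matcher(line).find())
                          .count();
    }

    public static void main(String[] args) {
        // Hypothetical dump lines, made up for this example.
        String scalarOnly = " 101  StoreB  === 90 7 100 55\n";
        String vectorized = scalarOnly + " 202  StoreVector  === 90 7 200 66\n";

        // A rule such as counts={"_#STORE_VECTOR#_", ">0"} passes only if the count is positive.
        System.out.println(countMatches(scalarOnly) > 0);
        System.out.println(countMatches(vectorized) > 0);
    }
}
```

In the failures listed above, the count is 0 for every constraint, meaning no vector load or store nodes were found in the compiled graph.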

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2383072505

From rsunderbabu at openjdk.org  Mon Sep 30 13:46:39 2024
From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu)
Date: Mon, 30 Sep 2024 13:46:39 GMT
Subject: Integrated: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong
 value
In-Reply-To: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
References: <GQaOlCJVR2a3MmdcVNu62xZvHuxk4uHOL0Sjr3lAba0=.c37cb8d2-3b9a-47a6-b45f-72f83cdab21e@github.com>
Message-ID: <U6hbf1yPZBbZ_YrTbUfePDMcHRUJ4Y20ANh3ZT51OxA=.33fd011f-1077-49eb-9989-8f58a6508e3d@github.com>

On Sun, 29 Sep 2024 08:33:31 GMT, Ramkumar Sunderbabu <rsunderbabu at openjdk.org> wrote:

> The current formula is incorrect since an array doesn't use a reference for each element.
> 
> Tested with test groups,
> vmTestbase_vm_gc_ref
> vmTestbase_vm_gc_juggle
> vmTestbase_vm_gc_misc
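
As a rough illustration of the description above, the following Java sketch contrasts a length estimate that charges one reference per element with one that does not. All names and the size constants are hypothetical, chosen for the example; the actual `nsk.share.gc.Memory` code is not shown in this thread and may differ.

```java
public class ArrayLengthSketch {
    // Assumed sizes in bytes, for illustration only.
    static final long ARRAY_HEADER_SIZE = 16;
    static final long REFERENCE_SIZE = 4;

    // Estimate that (incorrectly, for arrays storing elements inline) charges
    // an extra reference for every element.
    public static int lengthChargingReference(long memory, long elementSize) {
        long n = (memory - ARRAY_HEADER_SIZE) / (elementSize + REFERENCE_SIZE);
        return (int) Math.min(n, Integer.MAX_VALUE);
    }

    // Estimate that only accounts for the array header and the elements themselves.
    public static int lengthWithoutReference(long memory, long elementSize) {
        long n = (memory - ARRAY_HEADER_SIZE) / elementSize;
        return (int) Math.min(n, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long memory = 1040;   // bytes available for the array
        long elementSize = 4; // e.g. an int element
        System.out.println(lengthChargingReference(memory, elementSize));
        System.out.println(lengthWithoutReference(memory, elementSize));
    }
}
```

With these assumed sizes, the first estimate yields half as many elements as the second, so the array would occupy far less memory than the test intended.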

This pull request has now been integrated.

Changeset: 860d49db
Author:    Ramkumar Sunderbabu <rsunderbabu at openjdk.org>
Committer: Kim Barrett <kbarrett at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/860d49db22cf352eaf1b3b20fff43d090f0eebc8
Stats:     7 lines in 1 file changed: 0 ins; 3 del; 4 mod

8211400: nsk.share.gc.Memory::getArrayLength returns wrong value

Reviewed-by: kbarrett, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/21247

From shade at openjdk.org  Mon Sep 30 15:00:10 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 30 Sep 2024 15:00:10 GMT
Subject: RFR: 8341242: Shenandoah: LRB node is not matched as GC barrier
 after JDK-8340183
Message-ID: <QIPZMJIfweyc5taFSjRc3R1zPjHd6_xAvtyli0jW1iU=.c8dbf4eb-ca88-4f46-8703-55358599ab3c@github.com>

[JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node.

Additional testing:
 - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
 - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/21266/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21266&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340183
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21266.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21266/head:pull/21266

PR: https://git.openjdk.org/jdk/pull/21266

From rkennke at openjdk.org  Mon Sep 30 15:18:35 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 30 Sep 2024 15:18:35 GMT
Subject: RFR: 8341242: Shenandoah: LRB node is not matched as GC barrier
 after JDK-8340183
In-Reply-To: <QIPZMJIfweyc5taFSjRc3R1zPjHd6_xAvtyli0jW1iU=.c8dbf4eb-ca88-4f46-8703-55358599ab3c@github.com>
References: <QIPZMJIfweyc5taFSjRc3R1zPjHd6_xAvtyli0jW1iU=.c8dbf4eb-ca88-4f46-8703-55358599ab3c@github.com>
Message-ID: <_pdzl3TfvgJVVZUL9VKDAsUIaulvTgYg7FcKzuAGATg=.4e7ce808-9ff9-4815-a016-08c81dfc1272@github.com>

On Mon, 30 Sep 2024 14:54:30 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node.
> 
> Additional testing:
>  - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not

Looks good!

-------------

Marked as reviewed by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21266#pullrequestreview-2337883746

From phh at openjdk.org  Mon Sep 30 16:40:35 2024
From: phh at openjdk.org (Paul Hohensee)
Date: Mon, 30 Sep 2024 16:40:35 GMT
Subject: RFR: 8341242: Shenandoah: LRB node is not matched as GC barrier
 after JDK-8340183
In-Reply-To: <QIPZMJIfweyc5taFSjRc3R1zPjHd6_xAvtyli0jW1iU=.c8dbf4eb-ca88-4f46-8703-55358599ab3c@github.com>
References: <QIPZMJIfweyc5taFSjRc3R1zPjHd6_xAvtyli0jW1iU=.c8dbf4eb-ca88-4f46-8703-55358599ab3c@github.com>
Message-ID: <N0Are4lg11cF62QxtMp-CbuVsojSZo63o_qvlvlVryw=.bb3d835e-85d3-4e8c-b251-d911a0fc29af@github.com>

On Mon, 30 Sep 2024 14:54:30 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node.
> 
> Additional testing:
>  - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not

Marked as reviewed by phh (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21266#pullrequestreview-2338076905

From shade at openjdk.org  Mon Sep 30 16:50:46 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 30 Sep 2024 16:50:46 GMT
Subject: RFR: 8341242: Shenandoah: LRB node is not matched as GC barrier
 after JDK-8340183
In-Reply-To: <QIPZMJIfweyc5taFSjRc3R1zPjHd6_xAvtyli0jW1iU=.c8dbf4eb-ca88-4f46-8703-55358599ab3c@github.com>
References: <QIPZMJIfweyc5taFSjRc3R1zPjHd6_xAvtyli0jW1iU=.c8dbf4eb-ca88-4f46-8703-55358599ab3c@github.com>
Message-ID: <atj4_-uUV39uleSX62sMEBlBA7lIODWkBGEtvmn4nOI=.733032b0-0eb9-47d8-bb47-b669fe3771e4@github.com>

On Mon, 30 Sep 2024 14:54:30 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node.
> 
> Additional testing:
>  - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>  - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not

Thanks! Trivial, right? It restores the code to its previous shape.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21266#issuecomment-2383696926

From kvn at openjdk.org  Mon Sep 30 16:59:59 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 30 Sep 2024 16:59:59 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
 [v27]
In-Reply-To: <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <YYVkTHSwqRigZMpp3wjVjULgwfhawfXCszR2iKEPazA=.a7b37dee-07cd-45e9-aeed-0f90cba1268a@github.com>
Message-ID: <vChwINsCng-QfOUe_LpgAimWHG0xWx0Fcx-HsGssGrg=.fbdffb81-171e-4098-9289-8b2e6a2ac5ec@github.com>

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port refactor
>  - Remove temporary support code
>  - Merge jdk-24+17
>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>  - Merge jdk-24+16
>  - Ensure that detected encode-and-store patterns are matched
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - ... and 43 more: https://git.openjdk.org/jdk/compare/ae84aa47...14483b83

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2338111198

From wkemper at openjdk.org  Mon Sep 30 17:01:36 2024
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 30 Sep 2024 17:01:36 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v3]
In-Reply-To: <G1ZPGXEoLjBF6PKKptpfHZfitfTsLQZfuTAHvNRSuz8=.caed0816-0227-47f5-9d4f-1e95e18f5e39@github.com>
References: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>
 <G1ZPGXEoLjBF6PKKptpfHZfitfTsLQZfuTAHvNRSuz8=.caed0816-0227-47f5-9d4f-1e95e18f5e39@github.com>
Message-ID: <BhAHctSmhJrk8PXpvvIaXwfrEdqhTBhTfQj2XgfzREA=.18dac0d1-1364-4f26-949a-77017388806d@github.com>

On Fri, 27 Sep 2024 23:39:15 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> Use a template version of `right_n_bits` to use the same type for minuend and subtrahend.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments

I tried that, but there is a warning at the macro declaration:

// (note: #define used only so that they can be used in enum constant definitions)
#define nth_bit(n)        (((n) >= BitsPerWord) ? 0 : (OneBit << (n)))
#define right_n_bits(n)   (nth_bit(n) - 1)

There are many usages of `right_n_bits` that use an unnamed enum constant, which will not accept a cast from a numeric type.
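
For illustration, a minimal sketch of what a template version could look like. This is not the code from the pull request: the name `right_n_bits_of` is hypothetical, and restricting it to unsigned types is an assumption made here so that both the shift and the `- 1` are well defined for every bit count.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper for illustration; not the code from the pull request.
// Minuend and subtrahend share the same unsigned type T, so there is no
// signed overflow when n is the index of the top bit.
template <typename T>
constexpr T right_n_bits_of(unsigned n) {
  static_assert(std::is_unsigned<T>::value,
                "unsigned T keeps the subtraction well defined");
  // Mirror the macro: a bit count >= the width of T yields 0, and 0 - 1
  // then wraps around to all ones.
  return static_cast<T>(
      (n >= static_cast<unsigned>(std::numeric_limits<T>::digits)
           ? T(0)
           : static_cast<T>(T(1) << n)) -
      T(1));
}

static_assert(right_n_bits_of<std::uint64_t>(3) == 0x7u,
              "low three bits set");
static_assert(right_n_bits_of<std::uint64_t>(63) == 0x7FFFFFFFFFFFFFFFull,
              "no overflow at bit 63");
static_assert(right_n_bits_of<std::uint64_t>(64) == ~std::uint64_t(0),
              "same result as the macro for n >= width");
```

Because the function is `constexpr`, it can be evaluated at compile time, but the result has type `T` rather than the type of an unnamed enum constant, which is the limitation described above.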

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21236#issuecomment-2383717052

From rkennke at openjdk.org  Mon Sep 30 17:50:54 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 30 Sep 2024 17:50:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers
 (Experimental) [v26]
In-Reply-To: <zi8MM6qUWhvxPSyb3ewa3baXWhmDIDlut2uYUQmWvcw=.7b32fa42-7a3f-43bb-8f15-fe6b8e66c49b@github.com>
References: <TMFZyyalInj2UamEkFjV62vkXfP3zRF_fZq1NR4Msw4=.abb047d2-8eaa-447c-854e-ff5fe40a7169@github.com>
 <fEpfi4CzTFr1Pl-IyOAVyHNXWlVz33lHKQyzVY0kBc8=.cc39355b-ee7a-4118-952a-28e4184eec84@github.com>
 <DKuidQyyiiJOsrerLCeziselVyG9Ztpzsyc64ZNjlI4=.722fc4f7-d7cc-49df-9d82-5c1094508f74@github.com>
 <hdttC4LhHE6y6Nbq0jjJwxxF4W0hTfJLK_siODDCBkM=.1667f495-fcd9-4b7f-8d58-24e27665eeb3@github.com>
 <fWSdSqQfhOrGpFoExiaO4L-UDtR90_t0gRxp9YBO3_c=.0173618e-f392-4921-8fae-39d44b24de08@github.com>
 <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com>
 <C5LwWrsa-VO87kdm_sYuisTGMsHoxv4Eoe0QikdkusE=.63b0993c-86c8-4ef6-a3cf-bc8a67e10815@github.com>
 <WOlTJMNHCr_Kk69sEIUwCVI3ZbiRlckY4TSBRfL7zoA=.1a4c59f1-a000-4d9f-abbc-36b24ed5105e@github.com>
 <UjwpWz9Y3W8WRzYHZmDk9EvoiLSCZG1aai5oASGtDKA=.68cab6dd-e67d-4651-a9c9-75dccf42d85f@github.com>
 <l_7f8gGpIdG9H3Ini5vHWruIa0oDi7fkn2Lz5thDI5c=.969b804f-bae2-48e1-9c44-e631d90037e7@github.com>
 <BtvwMUG5tcy_zU4tRG3Q5VVw08qE_3gSE35wCxlDnj4=.954353e5-cab2-48f4-b0f2-5852e586ba20@github.com>
 <zi8MM6qUWhvxPSyb3ewa3baXWhmDIDlut2uYUQmWvcw=.7b32fa42-7a3f-43bb-8f15-fe6b8e66c49b@github.com>
Message-ID: <TSxYcKKfAdx7SsVcq6iwiySCLlkoVbXQO7a25vsiG4k=.29e59a6c-b8e0-42e9-8bdd-cc8ca23df146@github.com>

On Fri, 27 Sep 2024 16:23:15 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away.  The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block.  So there will not be an additional branch for the code when it is executed.
>> 
>> I'm good with a comment tying `UseCompactObjectHeaders` to the condition.  The comment can be removed when the flag is removed.  "Ship it" :-)
>
> Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like:
> 
> 
> if (haystack_len <= 8) {
>   // Copy 8 bytes onto stack
> } else if (haystack_len <= 16) {
>   // Copy 16 bytes onto stack
> } else {
>   // Copy 32 bytes onto stack
> }
> 
> 
> So that is 2 branches in this prologue code instead of originally 1.
> 
> However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault.
> 
> I think I need to mull over it some more to come up with a correct fix.

I changed the header<16 version to be a small loop: https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5

The idea is the same as before, except that it is now a small loop with a maximum of 4 iterations (backward branches) that copies 8 bytes at a time, such that (1) it may copy up to 7 bytes that precede the array and (2) it doesn't run over the end of the array (which would potentially crash).
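
A rough C++ model of that loop, for illustration only. It is not the stub code from the commit: the function name, the right-aligned 32-byte stack buffer, and the assumption that at least 8 readable header bytes precede the array data are all choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model: copy `len` bytes (1..32) of array data into a 32-byte
// stack buffer, 8 bytes at a time, working backwards from the end of the data.
// Assumes at least 8 readable bytes (the object header) precede `data`.
const std::uint8_t* copy_to_stack(const std::uint8_t* data, std::size_t len,
                                  std::uint8_t (&stack)[32]) {
  assert(len >= 1 && len <= 32);
  std::size_t copied = 0;
  while (copied < len) {  // at most 4 iterations for len <= 32
    copied += 8;
    // The source chunk always ends at or before data + len, so the copy never
    // reads past the end of the array. The last chunk may start up to 7 bytes
    // before `data`, inside the header.
    std::memcpy(stack + 32 - copied, data + len - copied, 8);
  }
  // The data is right-aligned in the stack buffer.
  return stack + 32 - len;
}
```

With `len` = 17 this takes three iterations and reads the 24 bytes from `data - 7` to `data + 17`, which stays within the 17+8=25 bytes that are guaranteed to be available.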

I am not sure if using XMM_TMP1 and XMM_TMP2 there is OK, or if it would encode better to use one of the regular registers.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1781535745

From duke at openjdk.org  Mon Sep 30 20:46:08 2024
From: duke at openjdk.org (joejackson1993)
Date: Mon, 30 Sep 2024 20:46:08 GMT
Subject: RFR: 8337389: Parallel: Remove unnecessary forward declarations in
 psScavenge.hpp
Message-ID: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com>

trivial cleanup

-------------

Commit messages:
 - 8337389: Remove unnecessary forward declarations

Changes: https://git.openjdk.org/jdk/pull/20393/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20393&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337389
  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20393.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20393/head:pull/20393

PR: https://git.openjdk.org/jdk/pull/20393

From zgu at openjdk.org  Mon Sep 30 20:46:08 2024
From: zgu at openjdk.org (Zhengyu Gu)
Date: Mon, 30 Sep 2024 20:46:08 GMT
Subject: RFR: 8337389: Parallel: Remove unnecessary forward declarations in
 psScavenge.hpp
In-Reply-To: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com>
References: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com>
Message-ID: <AoCImlPql4-Y5HN7TOR2I0L73KX0OIAuJkmtH4ee_0U=.0c7ff191-ed90-405d-8859-41659974201f@github.com>

On Tue, 30 Jul 2024 18:14:36 GMT, joejackson1993 <duke at openjdk.org> wrote:

> trivial cleanup

> Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

I can confirm that Joseph Jackson <joseph.jackson at servicenow.com> is an employee of ServiceNow and is covered by the ServiceNow OCA.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20393#issuecomment-2258945487

From duke at openjdk.org  Mon Sep 30 20:46:08 2024
From: duke at openjdk.org (joejackson1993)
Date: Mon, 30 Sep 2024 20:46:08 GMT
Subject: RFR: 8337389: Parallel: Remove unnecessary forward declarations in
 psScavenge.hpp
In-Reply-To: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com>
References: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com>
Message-ID: <AilTUIB6G1R49c4IlFWDxg4e4kO2bQLqz1x3QtNMeeo=.5aa6d569-1eda-443b-8008-5ff625130714@github.com>

On Tue, 30 Jul 2024 18:14:36 GMT, joejackson1993 <duke at openjdk.org> wrote:

> trivial cleanup

Still waiting on OCA confirmation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20393#issuecomment-2318817050

From phh at openjdk.org  Mon Sep 30 21:47:39 2024
From: phh at openjdk.org (Paul Hohensee)
Date: Mon, 30 Sep 2024 21:47:39 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v3]
In-Reply-To: <G1ZPGXEoLjBF6PKKptpfHZfitfTsLQZfuTAHvNRSuz8=.caed0816-0227-47f5-9d4f-1e95e18f5e39@github.com>
References: <I8sOa0g-PbjrI3R9yuuosgg83kNah0m0wiLbsNN8IFM=.495943c2-9206-44c5-abb1-4f68cd98b0c5@github.com>
 <G1ZPGXEoLjBF6PKKptpfHZfitfTsLQZfuTAHvNRSuz8=.caed0816-0227-47f5-9d4f-1e95e18f5e39@github.com>
Message-ID: <vZQV9QBDMMXU1YnC50Uy8_rzzj7x_fmKomArTdAQ7S8=.1bb41d79-3e8a-47f2-8dd8-88eca476d4c1@github.com>

On Fri, 27 Sep 2024 23:39:15 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> Use a template version of `right_n_bits` to use the same type for minuend and subtrahend.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments

In that case, I'd put get_right_n_bits() in globalDefinitions.hpp because it's generally useful, and a comment on why, of course. :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21236#issuecomment-2384203382