From dholmes at openjdk.java.net  Mon Nov  1 02:12:12 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 1 Nov 2021 02:12:12 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence
In-Reply-To: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
Message-ID: <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>

On Thu, 28 Oct 2021 08:47:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call into the runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when the `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do.
> 
> This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054:
> 
> 
> Benchmark          Mode  Cnt  Score   Error  Units
> 
> # Default
> Single.acquire     avgt    3   0.407 ± 0.060  ns/op
> Single.full        avgt    3   4.693 ± 0.005  ns/op
> Single.loadLoad    avgt    3   0.415 ± 0.095  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op
> Single.release     avgt    3   0.408 ± 0.047  ns/op
> Single.storeStore  avgt    3   0.408 ± 0.043  ns/op
> 
> # -XX:DisableIntrinsic=_storeFence
> Single.acquire     avgt    3   0.408 ± 0.016  ns/op
> Single.full        avgt    3   4.694 ± 0.002  ns/op
> Single.loadLoad    avgt    3   0.406 ± 0.002  ns/op
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   4.694 ± 0.003  ns/op <--- upgraded to full
> Single.storeStore  avgt    3   4.690 ± 0.005  ns/op <--- upgraded to full
> 
> # -XX:DisableIntrinsic=_loadFence
> Single.acquire     avgt    3   4.691 ± 0.001  ns/op <--- upgraded to full
> Single.full        avgt    3   4.693 ± 0.009  ns/op
> Single.loadLoad    avgt    3   4.693 ± 0.013  ns/op <--- upgraded to full
> Single.plain       avgt    3   0.408 ± 0.072  ns/op
> Single.release     avgt    3   0.415 ± 0.016  ns/op
> Single.storeStore  avgt    3   0.416 ± 0.041  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence
> Single.acquire     avgt    3   0.406 ± 0.014  ns/op
> Single.full        avgt    3  15.836 ± 0.151  ns/op <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.426 ± 0.361  ns/op
> Single.release     avgt    3   0.407 ± 0.021  ns/op
> Single.storeStore  avgt    3   0.410 ± 0.061  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence
> Single.acquire     avgt    3  15.822 ± 0.282  ns/op <--- upgraded, calls runtime
> Single.full        avgt    3  15.851 ± 0.127  ns/op <--- calls runtime
> Single.loadLoad    avgt    3  15.829 ± 0.045  ns/op <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   0.414 ± 0.156  ns/op
> Single.storeStore  avgt    3   0.422 ± 0.452  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_storeFence
> Single.acquire     avgt    3   0.407 ± 0.016  ns/op
> Single.full        avgt    3  15.347 ± 6.783  ns/op <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op
> Single.release     avgt    3  15.828 ± 0.019  ns/op <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.834 ± 0.045  ns/op <--- upgraded, calls runtime
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
> Single.acquire     avgt    3  15.838 ± 0.030  ns/op <--- upgraded, calls runtime
> Single.full        avgt    3  15.854 ± 0.277  ns/op <--- calls runtime
> Single.loadLoad    avgt    3  15.826 ± 0.160  ns/op <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.003  ns/op
> Single.release     avgt    3  15.838 ± 0.019  ns/op <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.844 ± 0.104  ns/op <--- upgraded, calls runtime
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`

src/hotspot/share/classfile/vmIntrinsics.hpp line 526:

> 524:    do_name(     storeFence_name,                                        "storeFence")                                            \
> 525:    do_alias(    storeFence_signature,                                   void_method_signature)                                   \
> 526:   do_intrinsic(_fullFence,                jdk_internal_misc_Unsafe,     fullFence_name, fullFence_signature,           F_R)      \

Why did you drop the N from F_RN? AFAICS the fullFence method is still native.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From dholmes at openjdk.java.net  Mon Nov  1 02:18:13 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 1 Nov 2021 02:18:13 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence
In-Reply-To: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
Message-ID: <2qd34U1LfATnTafN9vzz8PJx-AjhPv_o_GiKkpWN9MM=.ea3d89b4-a7ec-4993-98f7-6a42139c1796@github.com>

I'm not quite seeing the motivation here. Your claim is that the non-intrinsic implementations involve a native call and so are too expensive; yet the new code still relies on fullFence being intrinsified, else it is still a native call and a heavier barrier. If these fences were intrinsified piecemeal then perhaps this is an issue on some platform, but is that really the case? If you intrinsified one, wouldn't you intrinsify all?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From ngasson at openjdk.java.net  Mon Nov  1 04:12:12 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Mon, 1 Nov 2021 04:12:12 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates [v4]
In-Reply-To: <6RXxK49iDwBKpqVZar9-4B1AO5z6lLagcjLVBHT5sKo=.a274ceff-4c76-4cad-a9e7-f5f7f148ea83@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
 <6RXxK49iDwBKpqVZar9-4B1AO5z6lLagcjLVBHT5sKo=.a274ceff-4c76-4cad-a9e7-f5f7f148ea83@github.com>
Message-ID: <nJArlhBGd_w8sLM80cVZSgjtHTRumCXffp1xETlV47w=.08d6e509-47c0-4e59-87c3-fef693216833@github.com>

On Fri, 29 Oct 2021 09:24:47 GMT, Fei Gao <duke at openjdk.java.net> wrote:

>> for(int i = 0; i < LENGTH; i++) {
>>       c[i] = a[i] + 2;
>>     }
>> 
>> For the case shown above, after superword optimization with SVE,
>> without the patch, the vector add operation always has 2 z-reg inputs,
>> like:
>> mov     z16.s, #2
>> add	z17.s, z17.s, z16.s
>> 
>> Considering sve has supported basic binary operations with immediate,
>> this pattern could be further optimized to:
>> add     z16.s, z16.s, #2
>> 
>> To implement it, we added some new match rules and assembler rules in
>> the aarch64 backend. We also made some extensions on immediate types
>> and functions to keep backward compatibility.
>> 
>> With the patch, only these binary integer vector operations, +(add),
>> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>> the optimization. Other vector operations are not supported currently.
>> 
>> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>> CPU, no new failure.
>> 
>> There is no obvious performance uplift but it can help remove one
>> redundant mov instruction.
>
> Fei Gao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add some assertion lines for help functions
>   
>   Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c

Marked as reviewed by ngasson (Reviewer).
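
For reference, a minimal, self-contained Java version of the loop from the PR description might look like the sketch below. The class and method names (`AddImmediateSketch`, `addTwo`) are made up for illustration. Whether C2 actually vectorizes the loop and emits the immediate form of the SVE `add` depends on the hardware and JVM flags, which this sketch does not check.

```java
// Illustrative only: class and method names are hypothetical, not from the patch.
final class AddImmediateSketch {
    // Same shape as the loop in the PR description: an add with a constant operand.
    static void addTwo(int[] a, int[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + 2;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        // Call repeatedly so the JIT compiler has a chance to compile the loop.
        for (int iter = 0; iter < 20_000; iter++) {
            addTwo(a, c);
        }
        System.out.println(c[10]); // prints 12
    }
}
```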

-------------

PR: https://git.openjdk.java.net/jdk/pull/6115

From shade at openjdk.java.net  Mon Nov  1 07:36:57 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 07:36:57 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence [v2]
In-Reply-To: <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
 <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>
Message-ID: <R-0qeP6lPDN3LQ8js6rHXGelUe_2OcEC7Hu-VFYco7w=.7d126728-8273-459e-a407-e4880b4b60a8@github.com>

On Mon, 1 Nov 2021 02:09:19 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Restore RN for fullFence
>
> src/hotspot/share/classfile/vmIntrinsics.hpp line 526:
> 
>> 524:    do_name(     storeFence_name,                                        "storeFence")                                            \
>> 525:    do_alias(    storeFence_signature,                                   void_method_signature)                                   \
>> 526:   do_intrinsic(_fullFence,                jdk_internal_misc_Unsafe,     fullFence_name, fullFence_signature,           F_R)      \
> 
> Why did you drop the N from F_RN? AFAICS the fullFence method is still native.

Good spot! That's indeed incorrect, fixed in the new commit. I am surprised `CheckIntrinsics` did not find this discrepancy. I believe "native" flags are not checked at all? For example, the existing `_hashCode` intrinsic is also `F_R`, while it covers the native `java.lang.Object::hashCode`. I will try to beef up those checks separately.
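
As a side illustration of the "native" property in question, the sketch below uses plain Java reflection to ask whether a method is declared `native`. This is only an analogy: `CheckIntrinsics` operates on VM-internal method metadata, not on reflection, and the class and method names here (`NativeFlagSketch`, `isDeclaredNative`) are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Illustrative only: HotSpot's CheckIntrinsics does not work this way.
final class NativeFlagSketch {
    // Returns true if the named public no-argument method is declared native.
    static boolean isDeclaredNative(Class<?> holder, String name) {
        try {
            Method m = holder.getMethod(name);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Object.hashCode is declared native; Object.toString is not.
        System.out.println(isDeclaredNative(Object.class, "hashCode"));
        System.out.println(isDeclaredNative(Object.class, "toString"));
    }
}
```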

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net  Mon Nov  1 07:36:53 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 07:36:53 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence [v2]
In-Reply-To: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
Message-ID: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com>

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Restore RN for fullFence

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6149/files
  - new: https://git.openjdk.java.net/jdk/pull/6149/files/e2c623be..a0fd03ee

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6149&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6149&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6149.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6149/head:pull/6149

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net  Mon Nov  1 08:18:17 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 08:18:17 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence
In-Reply-To: <2qd34U1LfATnTafN9vzz8PJx-AjhPv_o_GiKkpWN9MM=.ea3d89b4-a7ec-4993-98f7-6a42139c1796@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
 <2qd34U1LfATnTafN9vzz8PJx-AjhPv_o_GiKkpWN9MM=.ea3d89b4-a7ec-4993-98f7-6a42139c1796@github.com>
Message-ID: <766OQW0EKB1-XFSKGDvYLBFPP_I0Kxwj_dI84d1RoeE=.b2da8f04-9fc5-401c-afe5-9f763d130f65@github.com>

On Mon, 1 Nov 2021 02:15:04 GMT, David Holmes <dholmes at openjdk.org> wrote:

> I'm not quite seeing the motivation here. Your claim is that the non-intrinsic implementations involve a native call and so are too expensive; yet the new code still relies on fullFence being intrinsified, else it is still a native call and a heavier barrier. If these fences were intrinsified piecemeal then perhaps this is an issue on some platform, but is that really the case? If you intrinsified one, wouldn't you intrinsify all?

Yes, that was not clear, sorry. For current platforms, it is mostly a maintenance cleanup to shrink the unnecessary Unsafe interfaces: if we disable the `acquireFence` intrinsic, we don't need to call into the native fallback (which would be excessive); instead we can just go to the Java-level fallback (which would also be faster).

I am looking at the cases where we would like to only intrinsify `fullFence`, for example for the Zero interpreter. Instead of handling all three flavors of fences, we can get the majority of the performance win by only drilling the interpreter-entry-intrinsic hole for `fullFence`, and letting everything else be handled at the Java level.
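
The shape of such a Java-level fallback can be sketched roughly as follows. This is not the actual `jdk.internal.misc.Unsafe` source: the class name `FenceFallbackSketch` and the call counter are hypothetical, and `VarHandle.fullFence()` stands in for the real `fullFence` so that the example compiles outside the JDK.

```java
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for the Unsafe fence methods; not the actual JDK source.
final class FenceFallbackSketch {
    // Counts how often the strongest fence is reached (for illustration only).
    static int fullFenceCalls = 0;

    // Stand-in for the one fence that would still need a native or intrinsic path.
    static void fullFence() {
        fullFenceCalls++;
        VarHandle.fullFence();
    }

    // Weaker fences fall back to the stronger one when they are not intrinsified.
    static void loadFence()       { fullFence(); }
    static void storeFence()      { fullFence(); }
    static void loadLoadFence()   { loadFence(); }
    static void storeStoreFence() { storeFence(); }

    public static void main(String[] args) {
        loadFence();
        storeStoreFence();
        System.out.println(fullFenceCalls); // prints 2
    }
}
```

Falling back to a stronger fence is always safe for correctness; the benchmark numbers in the PR description show what it costs when the weaker intrinsic is disabled.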

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net  Mon Nov  1 09:16:09 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 09:16:09 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
Message-ID: <gl_aD-XD3JMqev_LazMh-ZL0JYhVrOaS_ii_xKHSpAc=.a9cbb293-f1f6-4528-a5eb-ebfaa476f167@github.com>

On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> This is another instance of counter updates that only need atomic guarantee.

(I am not arguing in favor or against this particular change, but I think we can talk a bit about generic stuff here...)

> I don't know where this guarantee is coming from. Two r-m-w atomic ops must have some guarantee via coherence for the atomic op to actually work. And an implementation could make any atomic r-m-w implementation ensure global immediate visibility. But you cannot assume this is guaranteed for all hardware. Even for a given platform this would need to be a specified guarantee in the architecture manual, not just something deduced/inferred by reasoning.

Hotspot's `memory_order_relaxed` is [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45) with C++11 atomics semantics. C++11 atomic semantics for relaxed atomic ops requires [single modification order consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering), which implies [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order).

All known hardware platforms provide coherence out of the box (they are, indeed, cache-coherent platforms), that's why it is easy to implement in C++ (`mo_relaxed`) and in Java (`VarHandle.(get|set)Opaque`).

I am always confused by "immediate global visibility". The problem with statements that include "immediate", "before", "after" is that they leak in the notion of time, which is ill-defined for a single memory location without any reference to other variables. Maybe you can expand on your concern with an example?
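
To make the "only needs atomicity" case concrete, here is a small Java sketch of a counter whose increments are atomic but impose no ordering on surrounding accesses. It is an analogy to the C++ `memory_order_relaxed` update under discussion, not the HotSpot code; the class name `RelaxedCounterSketch` and its methods are hypothetical. It uses `AtomicLong.getOpaque` and `AtomicLong.weakCompareAndSetPlain`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: atomic increments without ordering guarantees.
final class RelaxedCounterSketch {
    private final AtomicLong count = new AtomicLong();

    // CAS loop: the read-modify-write is atomic, but no fences are implied.
    void increment() {
        long cur;
        do {
            cur = count.getOpaque();
        } while (!count.weakCompareAndSetPlain(cur, cur + 1));
    }

    long get() {
        return count.getOpaque();
    }

    // Runs the given number of threads, each incrementing perThread times.
    static long run(int threads, int perThread) {
        RelaxedCounterSketch c = new RelaxedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // join() makes the workers' updates visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

No increment is lost because all modifications of the single variable are atomic; nothing here says anything about the order of the increments relative to other memory accesses.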

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From stefank at openjdk.java.net  Mon Nov  1 09:31:14 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Mon, 1 Nov 2021 09:31:14 GMT
Subject: RFR: 8275527: Refactor forward pointer access [v4]
In-Reply-To: <tu8-LJVLLI-0yU9Fsvdnqkx9sTkfarZu3XNIElJ0kak=.6c3578d7-0a9a-447e-a167-27c412ac200f@github.com>
References: <lLd1nmhXCgBhAySmq81KrMplWUSWMCJS4OybGZuMjco=.4012548b-6f1d-44ee-a59c-21f1077cba01@github.com>
 <tu8-LJVLLI-0yU9Fsvdnqkx9sTkfarZu3XNIElJ0kak=.6c3578d7-0a9a-447e-a167-27c412ac200f@github.com>
Message-ID: <L12GPa4-qxtJlMxtEOEpiVc4v1TJ_-DwFqC5DWOarI4=.35af6235-3555-41e6-b0c6-67e011a83829@github.com>

On Thu, 28 Oct 2021 12:35:37 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Accessing the forward pointer is currently a little inconsistent. Some code paths call oopDesc::forwardee() / oopDesc::is_forwarded(), some code paths call forwardee() and check it for ==/!= NULL, some code paths even call markWord::decode_pointer() and markWord::is_marked() instead.
>> 
>> This change attempts to make the situation more consistent. For simple cases it preserves oopDesc::forwardee() / is_forwarded(), some cases need to use the markWord for consistency in concurrent GC, they now use markWord::forwardee() and markWord::is_forwarded(). Also, checking whether or not an object is forwarded is now consistently done using is_forwarded() and not by checking forwardee ==/!= NULL. This also resolves the mess in G1 full GC that changes not-forwarded objects to have a NULL (fake-) pointer. This is not necessary, because we can just as well use the lock bits to determine whether or not the object is forwarded.
>> 
>> Testing:
>>  - [x] tier
>>  - [x] tier2
>>  - [x] hotspot_gc
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move forward impl into markWord and add assert

Thanks for doing this change. This looks good to me. I've added a comment below that I think would be nice to get resolved somehow, though I don't need to re-review if you update with any of the suggestions.

src/hotspot/share/oops/markWord.hpp line 253:

> 251:     return cast_to_oop(decode_pointer());
> 252:   }
> 253: };

This brings the forwarded/forwardee terminology into the markWord. The markWord was previously decoupled from those two concepts. I would personally let those function names stay in oopDesc and not leak down into the markWord. If you do want to keep it here, could you update the comments at the top that describe the bits?

//    [ptr             | 11]  marked             used to mark an object
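
The `[ptr | 11]` encoding in that comment can be illustrated with a small sketch. This is a simplified Java model, not the HotSpot `markWord` class: the names `MarkWordSketch`, `LOCK_MASK` and `MARKED_VALUE` are chosen for this example, and it assumes that the two low-order lock bits set to `11` mean "marked" and that the remaining bits hold the forwardee address.

```java
// Simplified model of the [ptr | 11] mark word layout; names are hypothetical.
final class MarkWordSketch {
    static final long LOCK_MASK = 0b11L;    // two low-order lock bits
    static final long MARKED_VALUE = 0b11L; // lock bits == 11 means "marked"

    // Builds a mark word that carries a forwarding pointer.
    static long encodeForwardee(long address) {
        if ((address & LOCK_MASK) != 0) {
            throw new IllegalArgumentException("address must leave the low bits free");
        }
        return address | MARKED_VALUE;
    }

    // Forwarded-ness is decided from the lock bits, not by comparing to null.
    static boolean isForwarded(long mark) {
        return (mark & LOCK_MASK) == MARKED_VALUE;
    }

    // Strips the lock bits to recover the forwarding pointer.
    static long forwardee(long mark) {
        return mark & ~LOCK_MASK;
    }

    public static void main(String[] args) {
        long mark = encodeForwardee(0x1000L);
        System.out.println(isForwarded(mark));                 // prints true
        System.out.println(Long.toHexString(forwardee(mark))); // prints 1000
    }
}
```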

-------------

Marked as reviewed by stefank (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5955

From aph at openjdk.java.net  Mon Nov  1 10:44:10 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 1 Nov 2021 10:44:10 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <ACmXkouJG2jF_Ms1_pfG4HEKdJh6oc6preXZ6dwiGnU=.9267edd2-a52e-4765-9018-355d1619fea3@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
 <kB0r9yWGd9rsQhPxVhi3MHpPnjU53PG9nteNdC4Vg8E=.ccf14c29-f1b0-4085-bf90-418f6ac582aa@github.com>
 <oLxDFwe2TZYTGD5pp0hgV9xD0ujrONkQ5SXXFr5_f_4=.e014f919-3ef0-434e-8dbe-171625276f57@github.com>
 <ACmXkouJG2jF_Ms1_pfG4HEKdJh6oc6preXZ6dwiGnU=.9267edd2-a52e-4765-9018-355d1619fea3@github.com>
Message-ID: <raJN0HxMyMx4VKwIcS7-wAkUaw6QHL0joYyjEtdBWx8=.59f9a360-70ab-4678-b58d-7090e29acec1@github.com>

On Sun, 31 Oct 2021 11:53:36 GMT, Andrew Haley <aph at openjdk.org> wrote:

> > We had internal discussion on this topic, Aleksey pointed out: "All modifications to any particular atomic variable occur in a total order that is specific to this one atomic variable". This guarantee holds even for relaxed atomic load/stores. This is a very basic guarantee.
> 
> I think that's true for most processors as a consequence of multi-copy atomicity, but we support Power which is not multi-copy atomic, where stores can become visible to one group of threads before they become visible to all threads.

Sorry, this was something of a red herring.

My main point: imposing ordering with respect to other memory accesses around a counter increment does nothing useful unless you care about the ordering of the increment with respect to those other accesses, which AFAICS you don't in this case.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From shade at openjdk.java.net  Mon Nov  1 11:34:26 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 11:34:26 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling
Message-ID: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>

This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.

For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
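
The delegation pattern for `min`/`max` can be sketched as below. This is an illustration rather than the JDK source: the class name `StrictMathDelegationSketch` is hypothetical, and only the `double` overloads are shown.

```java
// Hypothetical sketch of a "strict" class whose min/max simply delegate to Math.
final class StrictMathDelegationSketch {
    // No separate intrinsic is needed: any intrinsic for Math.min applies after inlining.
    static double min(double a, double b) {
        return Math.min(a, b);
    }

    static double max(double a, double b) {
        return Math.max(a, b);
    }

    public static void main(String[] args) {
        System.out.println(min(1.5, 2.5)); // prints 1.5
        System.out.println(max(1.5, 2.5)); // prints 2.5
    }
}
```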

There seem to be no performance regressions with this patch at least on Linux x86_64:


$ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 

Benchmark                   Mode  Cnt       Score     Error   Units

### Before

StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms

StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms

StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms

### After

StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms

StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms

StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms


Additional testing:
 - [x] `StrictMath` benchmarks
 - [x] Linux x86_64 fastdebug `tier1`

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.java.net/jdk/pull/6184/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276217
  Stats: 66 lines in 16 files changed: 27 ins; 26 del; 13 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6184.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6184/head:pull/6184

PR: https://git.openjdk.java.net/jdk/pull/6184

From mcimadamore at openjdk.java.net  Mon Nov  1 12:05:32 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 1 Nov 2021 12:05:32 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v10]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:

 - Add cache for memory address var handles
 - Merge branch 'master' into JEP-419
 - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
 - Merge branch 'master' into JEP-419
 - Fix copyright header in TestArrayCopy
 - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
 - * use `invokeWithArguments` to simplify new test
 - Add test for liveness check with high-arity downcalls
   (make sure that if an exception occurs in a downcall because of liveness,
   ref count of other resources are left intact).
 - * Fix javadoc issue in VaList
   * Fix bug in concurrent logic for shared scope acquire
 - Address review comments
 - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5907/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=09
  Stats: 14497 lines in 189 files changed: 6773 ins; 5149 del; 2575 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From shade at openjdk.java.net  Mon Nov  1 12:27:13 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 12:27:13 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence [v2]
In-Reply-To: <R-0qeP6lPDN3LQ8js6rHXGelUe_2OcEC7Hu-VFYco7w=.7d126728-8273-459e-a407-e4880b4b60a8@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
 <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>
 <R-0qeP6lPDN3LQ8js6rHXGelUe_2OcEC7Hu-VFYco7w=.7d126728-8273-459e-a407-e4880b4b60a8@github.com>
Message-ID: <JzVb0KDNU6qP_G9tVajrv36JHuYnfu_7WAWylHI8ok4=.bd9594bb-5143-4e4e-9ede-da49d5b98184@github.com>

On Mon, 1 Nov 2021 07:32:18 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> src/hotspot/share/classfile/vmIntrinsics.hpp line 526:
>> 
>>> 524:    do_name(     storeFence_name,                                        "storeFence")                                            \
>>> 525:    do_alias(    storeFence_signature,                                   void_method_signature)                                   \
>>> 526:   do_intrinsic(_fullFence,                jdk_internal_misc_Unsafe,     fullFence_name, fullFence_signature,           F_R)      \
>> 
>> Why did you drop the N from F_RN? AFAICS the fullFence method is still native.
>
> Good spot! That's indeed incorrect, fixed in new commit. I am surprised `CheckIntrinsics` did not find this discrepancy. I believe "native" flags are not checked at all? For example, the existing `_hashCode` intrinsic is also `F_R`, while it covers the native `java.lang.Object::hashCode`. I will try to beef up those checks separately.

This `CheckIntrinsics` oddity is handled by #6187.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From duke at openjdk.java.net  Mon Nov  1 12:46:12 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Mon, 1 Nov 2021 12:46:12 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v10]
In-Reply-To: <XJp4oiJWEXZBs7uJoJuI6X1EFVDEdPcRqdp0iojsqyc=.161d77f1-7288-4edc-9a30-f60b0f6f4289@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <8P-tWT-7UC9TMLz8zo5liDy2rOONBU864RUlRhthLeY=.05ef40f6-38ca-4e84-a54b-eb8dbae2b97f@github.com>
 <PswJD64Mx5cD9Q5uWSfvIiyw24tUw6SfUvPNEvuGVSs=.786f3266-6d0b-48c0-be80-7dc5a4e79de5@github.com>
 <XJp4oiJWEXZBs7uJoJuI6X1EFVDEdPcRqdp0iojsqyc=.161d77f1-7288-4edc-9a30-f60b0f6f4289@github.com>
Message-ID: <DEYSo_-jNIvioXIGAuq6lwkZjv--Ak1UnbpNoI6s8qE=.469b7fa7-973f-4357-aa8f-49f337e159bc@github.com>

On Fri, 15 Oct 2021 13:09:27 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Can we have a simple (as simple as possible) JMH benchmark, please? It should be something like a couple of threads racing to count up to a million.
>
>> @theRealAph, any comments on the microbenchmark I wrote?
> 
> Something like this works well:
> 
> 
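> import java.util.concurrent.atomic.AtomicInteger;
> import org.openjdk.jmh.annotations.*;
> 
> @State(Scope.Benchmark)
> public class ThreadOnSpinWait {
>     @Param({"1000000"})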
>     public int maxNum;
> 
>     @Param({"4"})
>     public int threadCount;
> 
>     AtomicInteger theCounter;
> 
>     Thread threads[];
> 
>     void work() {
>         for (;;) {
>             int prev = theCounter.get();
>             if (prev >= maxNum) {
>                 break;
>             }
>             if (theCounter.compareAndExchange(prev, prev + 1) != prev) {
>                 Thread.onSpinWait();
>             }
>         }
>     }
> 
>     @Setup(Level.Trial)
>     public void foo() {
>         theCounter = new AtomicInteger();
>     }
> 
>     @Setup(Level.Invocation)
>     public void setup() {
>         theCounter.set(0);
>         threads = new Thread[threadCount];
> 
>         for (int i = 0; i< threads.length; i++) {
>             threads[i] = new Thread(this::work);
>         }
> 
>     }
> 
>     @Benchmark
>     public void trial() throws Exception {
>         for (int i = 0; i< threads.length; i++) {
>             threads[i].start();
>         }
>         for (int i = 0; i< threads.length; i++) {
>             threads[i].join();
>         }
>     }
> }
> 
> Before:
> 
> Benchmark               (maxNum)  (threadCount)  Mode  Cnt   Score    Error  Units
> ThreadOnSpinWait.trial   1000000              2  avgt    3  43.830 ± 32.543  ms/op
> 
> With `-XX:OnSpinWaitInst=isb -XX:OnSpinWaitInstCount=4`
> 
> Benchmark               (maxNum)  (threadCount)  Mode  Cnt   Score    Error  Units
> ThreadOnSpinWait.trial   1000000              2  avgt    3  22.181 ± 11.592  ms/op
> 
> With `-XX:OnSpinWaitInst=isb -XX:OnSpinWaitInstCount=1`
> 
> Benchmark               (maxNum)  (threadCount)  Mode  Cnt   Score    Error  Units
> ThreadOnSpinWait.trial   1000000              2  avgt    3  36.281 ± 31.700  ms/op
> 
> 
> This is Apple M1, where you have to be very careful because there's some processor
> frequency scaling going on.

Hi @theRealAph,
I see there are no other comments.
Can I proceed to integrate?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From aph at openjdk.java.net  Mon Nov  1 13:11:08 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 1 Nov 2021 13:11:08 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling
In-Reply-To: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
Message-ID: <MwXf1SkRGmfHmCvncZuvfuIDgeucSc-bgWsnd398wzw=.2897d2b9-e9bb-4e8b-a7e3-0e3024de3820@github.com>

On Mon, 1 Nov 2021 11:23:16 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
> 
> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
> 
> There seem to be no performance regressions with this patch at least on Linux x86_64:
> 
> 
> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
> 
> Benchmark                   Mode  Cnt       Score     Error   Units
> 
> ### Before
> 
> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
> 
> 
> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
> 
> 
> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
> 
> ### After
> 
> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
> 
> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
> 
> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
> 
> 
> Additional testing:
>  - [x] `StrictMath` benchmarks
>  - [x] Linux x86_64 fastdebug `tier1`

So we have _dsqrt and _dsqrt_strict, which must be functionally identical, but we provide both names because they're part of a public API. I think this deserves an explanatory comment in the code.
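
To illustrate the "functionally identical" point from the Java side, here is a minimal sketch, not part of the patch. It assumes only the documented behavior that both `Math.sqrt` and `StrictMath.sqrt` return the correctly rounded square root; the class and method names are hypothetical.

```java
// Illustrative check only: SqrtAgreement and sameResult are hypothetical names.
final class SqrtAgreement {
    /** Returns true if Math.sqrt and StrictMath.sqrt give the same result for x. */
    static boolean sameResult(double x) {
        // doubleToLongBits collapses all NaN encodings into one canonical value,
        // so two NaN results compare as equal here.
        return Double.doubleToLongBits(Math.sqrt(x))
            == Double.doubleToLongBits(StrictMath.sqrt(x));
    }

    public static void main(String[] args) {
        double[] samples = {
            0.0, -0.0, 1.0, 2.0, 1e-300, 1e300,
            Double.MIN_VALUE, Double.MAX_VALUE,
            Double.POSITIVE_INFINITY, -1.0, Double.NaN
        };
        for (double x : samples) {
            if (!sameResult(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("Math.sqrt and StrictMath.sqrt agree on all samples");
    }
}
```

A spot check like this says nothing about which intrinsic the VM selects; it only shows that callers should not be able to tell the two entry points apart by their results.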

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From aph at openjdk.java.net  Mon Nov  1 13:15:14 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 1 Nov 2021 13:15:14 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13]
In-Reply-To: <QgjNkBwREASPzII84F3dZx40HABtBTfpNwyBm9jU-eg=.7e340843-ec86-4122-8085-9411e9db3216@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <G5Vy0H_xF5ugFVFp275IngvLejfHBoCpx8EcoAudnHw=.41e17cb9-417e-49ca-98c1-e3c4656a37f5@github.com>
 <mPTXDZyRIGV_0_sfp4Geh6ng2NhN6pNRLgqfMEo6FAw=.d0cb7552-673e-4f5f-8e6e-1339823bbedb@github.com>
 <pX7x19mPaCbDloBmj8WSPZRedZ3pGOShgSZwDKUQhXs=.b043f6bc-5cc5-4cb2-9d49-75dc69d3b0d7@github.com>
 <K_GrJFzVK9UZDyvt06am1ZaNtwFt4wOj9gholDojhhU=.97714c75-53ca-4e4c-a0c1-3bd192650f4e@github.com>
 <QgjNkBwREASPzII84F3dZx40HABtBTfpNwyBm9jU-eg=.7e340843-ec86-4122-8085-9411e9db3216@github.com>
Message-ID: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com>

On Thu, 21 Oct 2021 15:19:47 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> Looks good. I'm not entirely sure whether this test is truly representative of the real-world cases that people have seen, but if we find out more we can always add another JMH test.
>
> This test is too artificial. Going through my records, I found a microbenchmark for `java.util.concurrent.SynchronousQueue` which shows good improvements on jdk11, where `SynchronousQueue` uses `onSpinWait`. Since jdk17, `SynchronousQueue` no longer uses `onSpinWait` (see https://bugs.openjdk.java.net/browse/JDK-8267502). Maybe I can come up with a microbenchmark based on the jdk11 `SynchronousQueue` [code](https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java#L412):
> 
>         SNode awaitFulfill(SNode s, boolean timed, long nanos) {
>             /*
>              * When a node/thread is about to block, it sets its waiter
>              * field and then rechecks state at least one more time
>              * before actually parking, thus covering race vs
>              * fulfiller noticing that waiter is non-null so should be
>              * woken.
>              *
>              * When invoked by nodes that appear at the point of call
>              * to be at the head of the stack, calls to park are
>              * preceded by spins to avoid blocking when producers and
>              * consumers are arriving very close in time.  This can
>              * happen enough to bother only on multiprocessors.
>              *
>              * The order of checks for returning out of main loop
>              * reflects fact that interrupts have precedence over
>              * normal returns, which have precedence over
>              * timeouts. (So, on timeout, one last check for match is
>              * done before giving up.) Except that calls from untimed
>              * SynchronousQueue.{poll/offer} don't check interrupts
>              * and don't wait at all, so are trapped in transfer
>              * method rather than calling awaitFulfill.
>              */
>             final long deadline = timed ? System.nanoTime() + nanos : 0L;
>             Thread w = Thread.currentThread();
>             int spins = shouldSpin(s)
>                 ? (timed ? MAX_TIMED_SPINS : MAX_UNTIMED_SPINS)
>                 : 0;
>             for (;;) {
>                 if (w.isInterrupted())
>                     s.tryCancel();
>                 SNode m = s.match;
>                 if (m != null)
>                     return m;
>                 if (timed) {
>                     nanos = deadline - System.nanoTime();
>                     if (nanos <= 0L) {
>                         s.tryCancel();
>                         continue;
>                     }
>                 }
>                 if (spins > 0) {
>                     Thread.onSpinWait();
>                     spins = shouldSpin(s) ? (spins - 1) : 0;
>                 }
>                 else if (s.waiter == null)
>                     s.waiter = w; // establish waiter so can park next iter
>                 else if (!timed)
>                     LockSupport.park(this);
>                 else if (nanos > SPIN_FOR_TIMEOUT_THRESHOLD)
>                     LockSupport.parkNanos(this, nanos);
>             }
>         }
> 
> 
> I've created https://bugs.openjdk.java.net/browse/JDK-8275728 to write such a microbenchmark.

I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it?
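
For reference, the kind of short spin-wait handoff such a benchmark would exercise could be sketched as below. This is only an illustration with hypothetical names (`SpinHandoff`, `handOff`), not the JDK-8275728 benchmark and not a measurement harness.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a consumer spinning until a producer hands over a value.
final class SpinHandoff {
    private static final int EMPTY = -1; // sentinel: payload must be non-negative

    /** Starts a producer thread and spins until its payload becomes visible. */
    static int handOff(int payload) {
        if (payload < 0) {
            throw new IllegalArgumentException("payload must be non-negative");
        }
        AtomicInteger slot = new AtomicInteger(EMPTY);
        Thread producer = new Thread(() -> slot.set(payload));
        producer.start();

        int seen;
        while ((seen = slot.get()) == EMPTY) {
            // Hint to the runtime that this thread is busy-waiting.
            Thread.onSpinWait();
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining producer", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(handOff(7));
    }
}
```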

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From shade at openjdk.java.net  Mon Nov  1 15:35:36 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 15:35:36 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2]
In-Reply-To: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
Message-ID: <kHdzI1o2YbuQVtEbBBMerRSdJddeAjDeAqleZ9tHAIs=.2be30613-bb11-41a3-9269-29647465c9ea@github.com>

> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
> 
> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
> 
> There seem to be no performance regressions with this patch at least on Linux x86_64:
> 
> 
> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
> 
> Benchmark                   Mode  Cnt       Score     Error   Units
> 
> ### Before
> 
> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
> 
> 
> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
> 
> 
> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
> 
> ### After
> 
> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
> 
> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
> 
> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
> 
> 
> Additional testing:
>  - [x] `StrictMath` benchmarks
>  - [x] Linux x86_64 fastdebug `tier1`

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Touchups

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6184/files
  - new: https://git.openjdk.java.net/jdk/pull/6184/files/4cd966dc..27202fa4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=00-01

  Stats: 8 lines in 3 files changed: 4 ins; 2 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6184.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6184/head:pull/6184

PR: https://git.openjdk.java.net/jdk/pull/6184

From shade at openjdk.java.net  Mon Nov  1 15:35:36 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 15:35:36 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2]
In-Reply-To: <MwXf1SkRGmfHmCvncZuvfuIDgeucSc-bgWsnd398wzw=.2897d2b9-e9bb-4e8b-a7e3-0e3024de3820@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <MwXf1SkRGmfHmCvncZuvfuIDgeucSc-bgWsnd398wzw=.2897d2b9-e9bb-4e8b-a7e3-0e3024de3820@github.com>
Message-ID: <WBT0X0rM4mLnFTTt6fn2PF5n-1UJDVQww1JIHp_LyFU=.6a4fb8f8-048e-4c2f-bd37-3a6265012c9a@github.com>

On Mon, 1 Nov 2021 13:08:05 GMT, Andrew Haley <aph at openjdk.org> wrote:

> So we have _dsqrt and _dsqrt_strict, which must be functionally identical, but we provide both names because they're part of a public API. I think this deserves an explanatory comment in the code.

Yes, no problem, added comment near intrinsic definition.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From mcimadamore at openjdk.java.net  Mon Nov  1 17:15:55 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 1 Nov 2021 17:15:55 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v11]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <8DLqVOZo6ZXYqntQe91nI4wIKu0_gn0DY-l8MA2rznM=.fdab6f3c-119e-492d-b61c-6314d51cdd58@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Fix liveness issue with loader lookups

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/9b519343..17f45861

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=09-10

  Stats: 191 lines in 6 files changed: 187 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From shade at openjdk.java.net  Mon Nov  1 17:54:09 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 17:54:09 GMT
Subject: RFR: 8252990: Intrinsify Unsafe.storeStoreFence [v2]
In-Reply-To: <B-PzmmTo_XpAh8NticuAL1a0KidAJvUbEjT2hiYqLhk=.408dc04e-5082-42df-a591-241816725e4a@github.com>
References: <ZWT707BgFCyrHdx6AEgAmCAEeOystOlYAEVD6WT7fSg=.6ddbc7c3-89cb-4bfe-90e2-dcf3e16624e3@github.com>
 <B-PzmmTo_XpAh8NticuAL1a0KidAJvUbEjT2hiYqLhk=.408dc04e-5082-42df-a591-241816725e4a@github.com>
Message-ID: <TUfmTpeaGo5CtJXeTgfAqpdlbakd5dXk2IUnOgpuIoU=.3ce7c08c-ec2d-4cf7-9262-8f5dadf3e8c3@github.com>

On Thu, 28 Oct 2021 08:58:48 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> `Unsafe.storeStoreFence` currently delegates to stronger `Unsafe.storeFence`. We can teach compilers to map this directly to already existing rules that handle `MemBarStoreStore`. Like explicit `LoadFence`/`StoreFence`, we introduce the special node to differentiate explicit fence and implicit store-store barriers. `storeStoreFence` is usually used to simulate safe `final`-field like constructions in special JDK classes, like `ConstantCallSite` and friends.
>> 
>> Motivational performance difference on benchmarks from JDK-8276054 on ARM32 (Raspberry Pi 4):
>> 
>> 
>> Benchmark                      Mode  Cnt   Score    Error  Units
>> Multiple.plain                 avgt    3   2.669 ±  0.004  ns/op
>> Multiple.release               avgt    3  16.688 ±  0.057  ns/op
>> Multiple.storeStore            avgt    3  14.021 ±  0.144  ns/op // Better
>> 
>> MultipleWithLoads.plain        avgt    3   4.672 ±  0.053  ns/op
>> MultipleWithLoads.release      avgt    3  16.689 ±  0.044  ns/op
>> MultipleWithLoads.storeStore   avgt    3  14.012 ±  0.010  ns/op // Better
>> 
>> MultipleWithStores.plain       avgt    3  14.687 ±  0.009  ns/op
>> MultipleWithStores.release     avgt    3  45.393 ±  0.192  ns/op
>> MultipleWithStores.storeStore  avgt    3  38.048 ±  0.033  ns/op // Better
>> 
>> Publishing.plain               avgt    3  27.079 ±  0.201  ns/op
>> Publishing.release             avgt    3  27.088 ±  0.241  ns/op
>> Publishing.storeStore          avgt    3  27.009 ±  0.259  ns/op // Within error, hidden by allocation
>> 
>> Single.plain                   avgt    3   2.670 ± 0.002  ns/op
>> Single.releaseFence            avgt    3   6.675 ± 0.001  ns/op
>> Single.storeStoreFence         avgt    3   8.012 ± 0.027  ns/op  // Worse, seems to be ARM32 implementation artifact
>> 
>> 
>> The same thing on AArch64 (Raspberry Pi 3):
>> 
>> 
>> Benchmark                      Mode  Cnt   Score   Error  Units
>> 
>> Multiple.plain                 avgt    3   5.914 ± 0.115  ns/op
>> Multiple.release               avgt    3  10.149 ± 0.059  ns/op
>> Multiple.storeStore            avgt    3   6.757 ± 0.138  ns/op // Better
>> 
>> MultipleWithLoads.plain        avgt    3  11.849 ± 0.331  ns/op
>> MultipleWithLoads.release      avgt    3  35.565 ± 1.144  ns/op
>> MultipleWithLoads.storeStore   avgt    3  19.441 ± 0.471  ns/op // Better
>> 
>> MultipleWithStores.plain       avgt    3   5.920 ± 0.213  ns/op
>> MultipleWithStores.release     avgt    3  20.286 ± 0.347  ns/op
>> MultipleWithStores.storeStore  avgt    3  12.686 ± 0.230  ns/op // Better
>> 
>> Publishing.plain               avgt    3  22.261 ± 1.630  ns/op
>> Publishing.release             avgt    3  22.269 ± 0.576  ns/op
>> Publishing.storeStore          avgt    3  17.464 ± 0.397  ns/op // Better
>> 
>> Single.plain                   avgt    3   5.916 ± 0.063  ns/op
>> Single.release                 avgt    3  10.148 ± 0.401  ns/op
>> Single.storeStore              avgt    3   6.767 ± 0.164  ns/op // Better
>> 
>> 
>> As expected, this does not affect x86_64 at all, because both `release` and `storeStore` are effectively no-ops, only affecting compiler optimizations:
>> 
>> 
>> Benchmark                      Mode  Cnt  Score   Error  Units
>> 
>> Multiple.plain                 avgt    3  0.406 ± 0.002  ns/op
>> Multiple.release               avgt    3  0.409 ± 0.018  ns/op
>> Multiple.storeStore            avgt    3  0.406 ± 0.001  ns/op
>> 
>> MultipleWithLoads.plain        avgt    3  4.328 ± 0.006  ns/op
>> MultipleWithLoads.release      avgt    3  4.600 ± 0.014  ns/op
>> MultipleWithLoads.storeStore   avgt    3  4.602 ± 0.006  ns/op
>> 
>> MultipleWithStores.plain       avgt    3  0.812 ± 0.001  ns/op
>> MultipleWithStores.release     avgt    3  0.812 ± 0.002  ns/op
>> MultipleWithStores.storeStore  avgt    3  0.812 ± 0.002  ns/op
>> 
>> Publishing.plain               avgt    3  6.370 ± 0.059  ns/op
>> Publishing.release             avgt    3  6.358 ± 0.436  ns/op
>> Publishing.storeStore          avgt    3  6.367 ± 0.054  ns/op
>> 
>> Single.plain                   avgt    3  0.407 ± 0.039  ns/op
>> Single.releaseFence            avgt    3  0.406 ± 0.001  ns/op
>> Single.storeStoreFence         avgt    3  0.406 ± 0.001  ns/op
>> 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix the comment to match JDK-8276096

Finally revived my quiet AArch64 dev board, added AArch64 results, which are even better than ARM32. Updated PR with perf results.
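
For readers unfamiliar with the `final`-field-like idiom mentioned in the PR description, a minimal sketch follows. The PR itself concerns the internal `jdk.internal.misc.Unsafe.storeStoreFence`, which user code cannot call, so the public `VarHandle.storeStoreFence()` is used here as a stand-in. The class and field names are hypothetical, and the sketch shows the writer-side idiom only; it makes no claim about formal Java Memory Model guarantees for racing readers.

```java
import java.lang.invoke.VarHandle;

// Hypothetical holder class, for illustration only.
final class PublishedBox {
    private int value; // deliberately not final

    PublishedBox(int value) {
        this.value = value;
        // Orders the store above before any later store, such as the store
        // that publishes the reference to this object.
        VarHandle.storeStoreFence();
    }

    int value() {
        return value;
    }

    // Publication point: a plain, non-volatile static field.
    static PublishedBox shared;

    public static void main(String[] args) {
        shared = new PublishedBox(42);
        System.out.println(shared.value());
    }
}
```

A `release` fence in the same position would also work, but it additionally orders earlier loads, which is the extra cost the ARM32 and AArch64 numbers above are measuring.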

-------------

PR: https://git.openjdk.java.net/jdk/pull/6136

From harold.seigel at oracle.com  Mon Nov  1 17:57:55 2021
From: harold.seigel at oracle.com (Harold Seigel)
Date: Mon, 1 Nov 2021 13:57:55 -0400
Subject: Incorrect hehavior on the class name (UTF8) in the constant pool
 of bytecode
In-Reply-To: <OFE23357DD.C2648D74-ON0025877C.00838A7A-8525877D.00025795@ibm.com>
References: <OFE23357DD.C2648D74-ON0025877C.00838A7A-8525877D.00025795@ibm.com>
Message-ID: <975eac75-2454-4c90-507d-290384f3a5f5@oracle.com>

Hi Cheng,

Thank you for reporting this problem and providing a sample program.
I've created JBS bug https://bugs.openjdk.java.net/browse/JDK-8276241
for this problem. You can follow progress of the issue by using that link.

Thanks, Harold

On 10/28/2021 8:25 PM, Cheng Jin wrote:
> One or more of the following files ( dumped.class ) violates IBM policy and all attachment(s) have been removed from the message.
>
> **********************************************************************
>
>
> Hi There,
>
> I created a simple test that loads a class file as follows to see whether a
> package name in the constant pool is rejected as invalid for a class name.
> However, it surprised me that it just passed without any exception on
> Hotspot (e.g. OpenJDK11).
>
> (See attached file: dumped.class)
>
> constant_pool (in dumped.class)
> ...
> 3. Utf8
> tag: 1
> length: 16
> bytes: die/verwandlung/ <----- a package name rather than a valid class
> name
> 4. Class
> tag: 7
> name_index: 3 <------
>
>
> import java.io.*;
> public class CustomClassLoader extends ClassLoader {
>
>      @Override
>      public Class findClass(String fileName) throws ClassNotFoundException {
>          byte[] b = loadClassBytes(fileName);
>          return defineClass(null, b, 0, b.length);
>      }
>
>      private byte[] loadClassBytes(String fileName)  {
>          InputStream inputStream =
> getClass().getClassLoader().getResourceAsStream(fileName + ".class");
>          ByteArrayOutputStream byteOutStream = new ByteArrayOutputStream();
>          try {
>              int nextByte = 0;
>              while ((nextByte = inputStream.read()) != -1) {
>                  byteOutStream.write(nextByte);
>              }
>          } catch (IOException e) {
>              e.printStackTrace();
>          }
>          return byteOutStream.toByteArray();
>      }
>
>      public static void main(String args[]) {
>      try {
>         CustomClassLoader  cl = new CustomClassLoader();
>         cl.findClass("dumped");
>         System.out.println("DONE.....");
>        } catch (Exception e) {
>           e.printStackTrace();
>        }
>      }
> }
>
> $ jdk11_hotspot/bin/java  CustomClassLoader
> DONE.....
>
>
> According to the VM Spec at 4.2.1 Binary Class and Interface Names
>
> Class and interface names that appear in class file structures are always
> represented in a fully qualified form known as binary names (JLS §13.1).
> ...In this internal form, the ASCII periods (.) that normally separate the
> identifiers which
> make up the binary name are replaced by ASCII forward slashes (/). The
> identifiers
> themselves must be unqualified names (§4.2.2).
>
> For example, the normal binary name of class Thread is java.lang.Thread. In
> the
> internal form used in descriptors in the class file format, a reference to
> the name of class
> Thread is implemented using a CONSTANT_Utf8_info structure representing the
> string
> java/lang/Thread.
>
> It means a valid class name should be something like "xxx" or
> "xxx/yyy/zzz" (where "/" only serves as the separator in between, and "/"
> shouldn't occur at the end),
> in which case "xxx/yyy/" is treated as invalid for a class name.
>
>
> So I am wondering why Hotspot doesn't follow the VM Spec to check the
> invalid package name in the constant pool.
>
>
> Thanks and Best Regards
> Cheng Jin
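
The rule quoted above can be sketched as a small standalone check. This is only an illustration of the spec text, not HotSpot's actual class-file parser logic: the class and method names are hypothetical, it assumes the JVMS §4.2.2 rule that an unqualified name is non-empty and contains none of `.`, `;`, `[`, `/`, and it deliberately ignores array class names.

```java
// Hypothetical helper illustrating JVMS 4.2.1/4.2.2; not HotSpot code.
final class InternalNameCheck {
    /** Returns true if name looks like a valid internal-form (slash-separated) class name. */
    static boolean isValidInternalClassName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // A limit of -1 keeps trailing empty strings, so "a/b/" splits into
        // ["a", "b", ""] and the trailing empty identifier is detected.
        for (String part : name.split("/", -1)) {
            if (part.isEmpty()) {
                return false; // leading, trailing, or doubled '/'
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                if (c == '.' || c == ';' || c == '[') {
                    return false; // characters forbidden in unqualified names
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidInternalClassName("java/lang/Thread"));
        System.out.println(isValidInternalClassName("die/verwandlung/"));
    }
}
```

Under these assumptions `java/lang/Thread` is accepted and `die/verwandlung/` is rejected, which is the behavior the report expected from the VM.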

From kvn at openjdk.java.net  Mon Nov  1 18:48:10 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 1 Nov 2021 18:48:10 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2]
In-Reply-To: <kHdzI1o2YbuQVtEbBBMerRSdJddeAjDeAqleZ9tHAIs=.2be30613-bb11-41a3-9269-29647465c9ea@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <kHdzI1o2YbuQVtEbBBMerRSdJddeAjDeAqleZ9tHAIs=.2be30613-bb11-41a3-9269-29647465c9ea@github.com>
Message-ID: <fZ03otugaFJG-_AuKyVFnv7kZo5r8i4v6huX9kQWudQ=.1cfcf38a-e291-47e7-a51b-f12529ff0588@github.com>

On Mon, 1 Nov 2021 15:35:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>> 
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>> 
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>> 
>> 
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
>> 
>> Benchmark                   Mode  Cnt       Score     Error   Units
>> 
>> ### Before
>> 
>> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
>> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
>> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
>> 
>> 
>> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
>> 
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>> 
>> ### After
>> 
>> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
>> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
>> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
>> 
>> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
>> 
>> 
>> Additional testing:
>>  - [x] `StrictMath` benchmarks
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Touchups

Removing intrinsics for StrictMath `min/max` methods may prevent them from inlining if they are not hot when the caller is compiled.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From shade at openjdk.java.net  Mon Nov  1 19:00:13 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 19:00:13 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2]
In-Reply-To: <fZ03otugaFJG-_AuKyVFnv7kZo5r8i4v6huX9kQWudQ=.1cfcf38a-e291-47e7-a51b-f12529ff0588@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <kHdzI1o2YbuQVtEbBBMerRSdJddeAjDeAqleZ9tHAIs=.2be30613-bb11-41a3-9269-29647465c9ea@github.com>
 <fZ03otugaFJG-_AuKyVFnv7kZo5r8i4v6huX9kQWudQ=.1cfcf38a-e291-47e7-a51b-f12529ff0588@github.com>
Message-ID: <l_qRsEdA6eNLlkcdiV9y_M7_R2iYiBRIuMZzqYxMwR4=.0f825d04-bbf8-4ddc-8979-98e38fee6c83@github.com>

On Mon, 1 Nov 2021 18:44:53 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Removing intrinsics for StrictMath `min/max` methods may prevent them from being inlined if they are not hot when the caller is compiled.

Would you like me to leave them instead? That would mean we introduce these new intrinsic definitions:


  /* StrictMath intrinsics, similar to what we have in Math. */                                                         \
  do_intrinsic(_min_strict,               java_lang_StrictMath,   min_name,           int2_int_signature,        F_S)   \
  do_intrinsic(_max_strict,               java_lang_StrictMath,   max_name,           int2_int_signature,        F_S)   \
  do_intrinsic(_minF_strict,              java_lang_StrictMath,   min_name,           float2_float_signature,    F_S)   \
  do_intrinsic(_maxF_strict,              java_lang_StrictMath,   max_name,           float2_float_signature,    F_S)   \
  do_intrinsic(_minD_strict,              java_lang_StrictMath,   min_name,           double2_double_signature,  F_S)   \
  do_intrinsic(_maxD_strict,              java_lang_StrictMath,   max_name,           double2_double_signature,  F_S)   \
  /* Special flavor of dsqrt intrinsic to handle the "native" method in StrictMath. Otherwise the same as in Math. */   \
  do_intrinsic(_dsqrt_strict,             java_lang_StrictMath,   sqrt_name,          double_double_signature,   F_SN)  \
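
Seen from the Java side, these entries cover the `StrictMath.min`/`max` overloads and `StrictMath.sqrt`. The sketch below only checks that those calls produce the same results as their `Math` counterparts for a handful of inputs; it says nothing about which intrinsic the JIT actually selects. The class name `StrictMathAgreement` and the sample values are made up for illustration.

```java
final class StrictMathAgreement {
    // True when both doubles have the same bit pattern after NaN canonicalization,
    // so -0.0 and 0.0 are told apart while any two NaNs compare equal.
    static boolean sameBits(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double[] samples = {-0.0, 0.0, 1.5, -2.25, Double.NaN, Double.POSITIVE_INFINITY};
        for (double a : samples) {
            // sqrt is specified as correctly rounded in both classes.
            if (!sameBits(StrictMath.sqrt(a), Math.sqrt(a))) {
                throw new AssertionError("sqrt mismatch for " + a);
            }
            for (double b : samples) {
                if (!sameBits(StrictMath.min(a, b), Math.min(a, b))
                        || !sameBits(StrictMath.max(a, b), Math.max(a, b))) {
                    throw new AssertionError("min/max mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("StrictMath and Math agree on all samples");
    }
}
```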

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From mcimadamore at openjdk.java.net  Mon Nov  1 22:36:40 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 1 Nov 2021 22:36:40 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Tweak javadoc of loaderLookup

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/17f45861..7cf4fcd9

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=11
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=10-11

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From kvn at openjdk.java.net  Mon Nov  1 23:02:10 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 1 Nov 2021 23:02:10 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2]
In-Reply-To: <kHdzI1o2YbuQVtEbBBMerRSdJddeAjDeAqleZ9tHAIs=.2be30613-bb11-41a3-9269-29647465c9ea@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <kHdzI1o2YbuQVtEbBBMerRSdJddeAjDeAqleZ9tHAIs=.2be30613-bb11-41a3-9269-29647465c9ea@github.com>
Message-ID: <WdqlhQhVbxe2NIqSwyOmc8yn5rqq6EinwBBW3zcGtoA=.6187110e-46d5-4e05-adb4-106f01858af2@github.com>

On Mon, 1 Nov 2021 15:35:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>> 
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>> 
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>> 
>> 
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
>> 
>> Benchmark                   Mode  Cnt       Score     Error   Units
>> 
>> ### Before
>> 
>> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
>> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
>> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
>> 
>> 
>> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
>> 
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>> 
>> ### After
>> 
>> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
>> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
>> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
>> 
>> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
>> 
>> 
>> Additional testing:
>>  - [x] `StrictMath` benchmarks
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Touchups

Yes, I am fine with new intrinsics for them.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From psandoz at openjdk.java.net  Tue Nov  2 00:27:23 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Tue, 2 Nov 2021 00:27:23 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v10]
In-Reply-To: <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>
Message-ID: <qLFAt0lFdftR7B2zt_HX1F7hhhqR0_rc8xjJI5i45fY=.f501e843-ed35-400d-ba0e-0e8711505006@github.com>

On Mon, 1 Nov 2021 12:05:32 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
> 
>  - Add cache for memory address var handles
>  - Merge branch 'master' into JEP-419
>  - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
>  - Merge branch 'master' into JEP-419
>  - Fix copyright header in TestArrayCopy
>  - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
>  - * use `invokeWithArguments` to simplify new test
>  - Add test for liveness check with high-arity downcalls
>    (make sure that if an exception occurs in a downcall because of liveness,
>    ref count of other resources are left intact).
>  - * Fix javadoc issue in VaList
>    * Fix bug in concurrent logic for shared scope acquire
>  - Address review comments
>  - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/Utils.java line 111:

> 109:         class VarHandleCache {
> 110:             private static final Map<ValueLayout, VarHandle> handleMap = new ConcurrentHashMap<>();
> 111:             private static final Map<ValueLayout, VarHandle> handleMapNoAlignCheck = new ConcurrentHashMap<>();

Something to consider later if this is an issue. Since the number of `ValueLayout` instances is fixed, carrier x order = 18, we can use stable arrays with ordinals on the instances.
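
A rough sketch of the array-indexed alternative, under some loud assumptions: `Carrier` is a hypothetical stand-in enum rather than the real `ValueLayout` hierarchy, it lists only the six carriers that `MethodHandles.byteArrayViewVarHandle` supports (so 12 slots, not 18), and a plain array filled during class initialization takes the place of a JDK-internal stable array.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical stand-in for the fixed set of layouts; not the JDK's ValueLayout.
enum Carrier {
    SHORT(short[].class), CHAR(char[].class), INT(int[].class),
    LONG(long[].class), FLOAT(float[].class), DOUBLE(double[].class);

    final Class<?> arrayType;

    Carrier(Class<?> arrayType) {
        this.arrayType = arrayType;
    }
}

final class OrdinalHandleCache {
    private static final int ORDERS = 2; // big-endian and little-endian

    // One slot per (carrier, order) pair, filled once during class initialization,
    // so lookups are a bounds-checked array load instead of a hash lookup.
    private static final VarHandle[] HANDLES = new VarHandle[Carrier.values().length * ORDERS];

    static {
        for (Carrier carrier : Carrier.values()) {
            for (ByteOrder order : new ByteOrder[] {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN}) {
                HANDLES[index(carrier, order)] =
                        MethodHandles.byteArrayViewVarHandle(carrier.arrayType, order);
            }
        }
    }

    // Maps a (carrier, order) pair to its slot using the enum ordinal.
    static int index(Carrier carrier, ByteOrder order) {
        return carrier.ordinal() * ORDERS + (order == ByteOrder.BIG_ENDIAN ? 0 : 1);
    }

    static VarHandle handle(Carrier carrier, ByteOrder order) {
        return HANDLES[index(carrier, order)];
    }
}
```

The trade-off is that every handle is created up front, which is only reasonable because the set of instances is small and fixed.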

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From duke at openjdk.java.net  Tue Nov  2 02:31:15 2021
From: duke at openjdk.java.net (Vamsi Parasa)
Date: Tue, 2 Nov 2021 02:31:15 GMT
Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
In-Reply-To: <JaCrAre7A7edrch3X9n-403kw2vDPv3GUN0cmGlWoZk=.62f14d8d-671b-4dc3-a67f-0ace55a2d45a@github.com>
References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com>
 <JaCrAre7A7edrch3X9n-403kw2vDPv3GUN0cmGlWoZk=.62f14d8d-671b-4dc3-a67f-0ace55a2d45a@github.com>
Message-ID: <XYdx7JDFI58BVyndyZLZslPoyZO0ZgVF5BquT8PSPoQ=.88e47796-bea2-48dd-9d37-2451af36a971@github.com>

On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa <duke at openjdk.java.net> wrote:

>> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87x improvement on a microbenchmark.
>
> Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode

Thank you for spotting the stale comment. It will be removed in another related commit that will be pushed soon.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5933

From denghui.ddh at alibaba-inc.com  Tue Nov  2 03:09:56 2021
From: denghui.ddh at alibaba-inc.com (Denghui Dong)
Date: Tue, 02 Nov 2021 11:09:56 +0800
Subject: Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd
In-Reply-To: <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com>
References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com>,
 <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com>
Message-ID: <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>

Hi Chris,

Thank you for the comments.

Yes, we have no good way to restrict user-registered commands to diagnosis-related operations only, but in my opinion this is not a problem that must be solved perfectly.

The following are my thoughts.

This extension is an entry point that triggers the operation the user wants to perform (similar to the signal handler mechanism, but with a name and parameters). Even without this extension, the user has other ways to achieve the same goal.

On the one hand, we could standardize the usage scenarios of the API in the documentation. (Admittedly, users can still write programs that violate a specification: for example, a `hashCode` implementation can return different values across calls on the same object, and an object can make itself reachable again while its `finalize` method is executing.)

On the other hand, we can add some restrictions to help users make better use of this extension.
For example, we can add a new VM option, such as EnableUserLevelDCmd, so that the application can register custom commands only when this option is enabled.
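
A minimal sketch of that kind of gate, assuming the option is modelled as a plain boolean handed to the registry. `GatedRegistry` and its methods are hypothetical names; how a real VM flag would reach Java code is not specified in this thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class GatedRegistry {
    // Stands in for the suggested EnableUserLevelDCmd option (an assumption of this sketch).
    private final boolean userLevelEnabled;
    private final Set<String> names = ConcurrentHashMap.newKeySet();

    GatedRegistry(boolean userLevelEnabled) {
        this.userLevelEnabled = userLevelEnabled;
    }

    // Refuses every registration unless the option was switched on at startup.
    void register(String name) {
        if (!userLevelEnabled) {
            throw new UnsupportedOperationException("user-level commands are disabled");
        }
        if (!names.add(name)) {
            throw new IllegalStateException("already registered: " + name);
        }
    }

    boolean isRegistered(String name) {
        return names.contains(name);
    }
}
```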

Or from another perspective, can we allow users to do some non-diagnostic-related operations in custom commands?

Best,
Denghui
------------------------------------------------------------------
From:Chris Plummer <chris.plummer at oracle.com>
Send Time:2021-11-02 (Tue) 03:35
To:Denghui Dong <denghui.ddh at alibaba-inc.com>; serviceability-dev <serviceability-dev at openjdk.java.net>; hotspot-dev <hotspot-dev at openjdk.java.net>
Subject:Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd

I have similar concerns to those others have expressed, so I'll try to add something new to the discussion and not just repeat.

 DCMDs have historically been very VM centric. That's not to say they aren't useful for debugging applications, but they do so by providing VM related info like stack traces, heap dumps, and class histograms. Also hotspot has been the gatekeeper for new DCMDs, meaning that new ones do not get added without going through the hotspot review process.

 Allowing any application or framework to add a DCMD changes this VM centric view in a way that concerns me. This approach allows a DCMD to pretty much do anything (Java security notwithstanding). App writers could even use them to provide a user-facing interface. For example, if an app has some sort of internal database, it could allow users to query it via a DCMD, and maybe even suggest that users write simple shell scripts that use jcmd to do these queries. Allowing this type of non-diagnostic usage seems like a path we don't want to go down, yet I don't see how it can be prevented once you allow applications to add DCMDs.

 Chris

 On 10/25/21 1:37 AM, Denghui Dong wrote:
Hi there!

 We'd like to discuss a proposal for extending the current DCmd framework to support Java level DCmd.

 At present, DCmd only allows the VM to register commands, which can be called through jcmd or JMX. It would be beneficial if the user could create their own commands.

 The idea of this extension originally came from our internal Java agent that detects misuse of the Unsafe API.

 This agent can collect the call sites that allocate or free direct memory in the application (NMT could not do this, IMO) to detect direct memory leaks.

 In the beginning, it just printed all call sites without any aggregation, which made it hard to use.

 So we plan to use a way similar to jeprof (from jemalloc) to generate a report file that aggregates all useful information.

 During the implementation process, we found that we need a mechanism to notify the agent to generate reports.

 The common practice is:
 a) Register a service port, triggered by an HTTP request
 b) Triggered by signal
 c) Generate reports periodically, or when the process exits

 But these three ways have certain problems.
 For a) we need to introduce a network component, which increases the complexity of the implementation
 For b) we cannot pass parameters
 For c) some files that may never be used will be generated

 Essentially, this question is how to notify the application to do a certain task, or in other words, how do we issue a command to the application. We believe that other Java developers will also encounter similar problems. 
 (And sometimes there may be multiple unrelated dependent components in a Java application that require such a mechanism.)

 Since jcmd can already issue commands registered in the VM to the application, why can't we extend this to the Java level?

 This feature will be very useful for some lightweight tools, just like the scenario we encountered, to notify the tools to perform certain operations.

 In addition, this feature will also bring benefits to Java beginners.

 For example, in the beginning, beginners may not use advanced log components, but they will also encounter the need to output debug logs. They may write code like this:

 ```
     if (debug) {
       System.out.println("...");
     }
 ```

 It would be attractive if developers could easily control the value of `debug`.

 Like this:

 ```
     Factory.register("MyApp.flipDebug", out -> debug = !debug);

     jcmd <pid> MyApp.flipDebug
 ```
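
 To make the idea concrete, a minimal sketch of what such a registry could look like on the Java side follows. `CommandRegistry`, `register`, and `invoke` are hypothetical names chosen for illustration (the proposal's actual API is not spelled out in this thread), and the jcmd transport is not modelled: `invoke` simply stands in for the point where a request would arrive.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical registry; not an existing JDK API.
final class CommandRegistry {
    private static final Map<String, Consumer<PrintWriter>> COMMANDS = new ConcurrentHashMap<>();

    // Associates a command name with a handler that writes its reply to the given writer.
    static void register(String name, Consumer<PrintWriter> handler) {
        if (COMMANDS.putIfAbsent(name, handler) != null) {
            throw new IllegalStateException("already registered: " + name);
        }
    }

    // Runs the named command and returns whatever the handler wrote.
    static String invoke(String name) {
        Consumer<PrintWriter> handler = COMMANDS.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("unknown command: " + name);
        }
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        handler.accept(out);
        out.flush();
        return buffer.toString();
    }
}

final class MyApp {
    static volatile boolean debug = false;

    public static void main(String[] args) {
        CommandRegistry.register("MyApp.flipDebug", out -> {
            debug = !debug;
            out.print("debug=" + debug);
        });
        // In the proposal this call would be triggered by: jcmd <pid> MyApp.flipDebug
        System.out.println(CommandRegistry.invoke("MyApp.flipDebug"));
    }
}
```

 The flag is `volatile` because the handler would run on a different thread from the code that reads `debug`.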

 For mainstream frameworks, we could apply this feature to trigger some common activities, such as health checks, graceful shutdown, and dynamic configuration updates. But to be honest, these frameworks are very mature and stable, and for compatibility reasons it would be hard to get them to use this extension.

 Comments welcome!

 Thanks,
 Denghui 

From denghui.ddh at alibaba-inc.com  Tue Nov  2 03:20:47 2021
From: denghui.ddh at alibaba-inc.com (Denghui Dong)
Date: Tue, 02 Nov 2021 11:20:47 +0800
Subject: Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd
In-Reply-To: <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>
References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com>,
 <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com>,
 <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>
Message-ID: <967efbed-b345-462a-943c-c171b410cc21.denghui.ddh@alibaba-inc.com>

By the way, Erik mentioned that the DCmd command in JFR is unlikely to use this extension.
But there are some other VM commands I think can be easily replaced with this extension,
such as RunFinalizationDCmd, FinalizerInfoDCmd, PrintSystemPropertiesDCmd, JMX-related DCmds, etc.

Denghui

From shade at openjdk.java.net  Tue Nov  2 06:25:33 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 06:25:33 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
Message-ID: <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>

> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
> 
> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
> 
> There seem to be no performance regressions with this patch at least on Linux x86_64:
> 
> 
> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
> 
> Benchmark                   Mode  Cnt       Score     Error   Units
> 
> ### Before
> 
> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
> 
> 
> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
> 
> 
> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
> 
> ### After
> 
> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
> 
> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
> 
> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
> 
> 
> Additional testing:
>  - [x] `StrictMath` benchmarks
>  - [x] Linux x86_64 fastdebug `tier1`

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Keep intrinsics on StrictMath

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6184/files
  - new: https://git.openjdk.java.net/jdk/pull/6184/files/27202fa4..005cace6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=01-02

  Stats: 67 lines in 5 files changed: 55 ins; 5 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6184.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6184/head:pull/6184

PR: https://git.openjdk.java.net/jdk/pull/6184

From shade at openjdk.java.net  Tue Nov  2 06:25:33 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 06:25:33 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2]
In-Reply-To: <WdqlhQhVbxe2NIqSwyOmc8yn5rqq6EinwBBW3zcGtoA=.6187110e-46d5-4e05-adb4-106f01858af2@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <kHdzI1o2YbuQVtEbBBMerRSdJddeAjDeAqleZ9tHAIs=.2be30613-bb11-41a3-9269-29647465c9ea@github.com>
 <WdqlhQhVbxe2NIqSwyOmc8yn5rqq6EinwBBW3zcGtoA=.6187110e-46d5-4e05-adb4-106f01858af2@github.com>
Message-ID: <68cTNOLxxPPW5cJFydBuPv56t_UUdGQi-F0yTT9x2zE=.55f3ab68-ab5c-41f6-8b0d-0c29e4c680b1@github.com>

On Mon, 1 Nov 2021 22:59:10 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Yes, I am fine with new intrinsics for them.

All right, see new commit then.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From shade at openjdk.java.net  Tue Nov  2 10:29:16 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 10:29:16 GMT
Subject: RFR: 8252990: Intrinsify Unsafe.storeStoreFence [v2]
In-Reply-To: <B-PzmmTo_XpAh8NticuAL1a0KidAJvUbEjT2hiYqLhk=.408dc04e-5082-42df-a591-241816725e4a@github.com>
References: <ZWT707BgFCyrHdx6AEgAmCAEeOystOlYAEVD6WT7fSg=.6ddbc7c3-89cb-4bfe-90e2-dcf3e16624e3@github.com>
 <B-PzmmTo_XpAh8NticuAL1a0KidAJvUbEjT2hiYqLhk=.408dc04e-5082-42df-a591-241816725e4a@github.com>
Message-ID: <syxTy3E8pLp4W1mj4M-Flyd3rBRe3lTihZUsLGQZEc0=.037afdfa-2a7b-4d18-ab81-f20f781574a7@github.com>

On Thu, 28 Oct 2021 08:58:48 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> `Unsafe.storeStoreFence` currently delegates to stronger `Unsafe.storeFence`. We can teach compilers to map this directly to already existing rules that handle `MemBarStoreStore`. Like explicit `LoadFence`/`StoreFence`, we introduce the special node to differentiate explicit fence and implicit store-store barriers. `storeStoreFence` is usually used to simulate safe `final`-field like constructions in special JDK classes, like `ConstantCallSite` and friends.
>> 
>> Motivational performance difference on benchmarks from JDK-8276054 on ARM32 (Raspberry Pi 4):
>> 
>> 
>> Benchmark                      Mode  Cnt   Score    Error  Units
>> Multiple.plain                 avgt    3   2.669 ±  0.004  ns/op
>> Multiple.release               avgt    3  16.688 ±  0.057  ns/op
>> Multiple.storeStore            avgt    3  14.021 ±  0.144  ns/op // Better
>> 
>> MultipleWithLoads.plain        avgt    3   4.672 ±  0.053  ns/op
>> MultipleWithLoads.release      avgt    3  16.689 ±  0.044  ns/op
>> MultipleWithLoads.storeStore   avgt    3  14.012 ±  0.010  ns/op // Better
>> 
>> MultipleWithStores.plain       avgt    3  14.687 ±  0.009  ns/op
>> MultipleWithStores.release     avgt    3  45.393 ±  0.192  ns/op
>> MultipleWithStores.storeStore  avgt    3  38.048 ±  0.033  ns/op // Better
>> 
>> Publishing.plain               avgt    3  27.079 ±  0.201  ns/op
>> Publishing.release             avgt    3  27.088 ±  0.241  ns/op
>> Publishing.storeStore          avgt    3  27.009 ±  0.259  ns/op // Within error, hidden by allocation
>> 
>> Single.plain                   avgt    3   2.670 ± 0.002  ns/op
>> Single.releaseFence            avgt    3   6.675 ± 0.001  ns/op
>> Single.storeStoreFence         avgt    3   8.012 ± 0.027  ns/op  // Worse, seems to be ARM32 implementation artifact
>> 
>> 
>> The same thing on AArch64 (Raspberry Pi 3):
>> 
>> 
>> Benchmark                      Mode  Cnt   Score   Error  Units
>> 
>> Multiple.plain                 avgt    3   5.914 ± 0.115  ns/op
>> Multiple.release               avgt    3  10.149 ± 0.059  ns/op
>> Multiple.storeStore            avgt    3   6.757 ± 0.138  ns/op // Better
>> 
>> MultipleWithLoads.plain        avgt    3  11.849 ± 0.331  ns/op
>> MultipleWithLoads.release      avgt    3  35.565 ± 1.144  ns/op
>> MultipleWithLoads.storeStore   avgt    3  19.441 ± 0.471  ns/op // Better
>> 
>> MultipleWithStores.plain       avgt    3   5.920 ± 0.213  ns/op
>> MultipleWithStores.release     avgt    3  20.286 ± 0.347  ns/op
>> MultipleWithStores.storeStore  avgt    3  12.686 ± 0.230  ns/op // Better
>> 
>> Publishing.plain               avgt    3  22.261 ± 1.630  ns/op
>> Publishing.release             avgt    3  22.269 ± 0.576  ns/op
>> Publishing.storeStore          avgt    3  17.464 ± 0.397  ns/op // Better
>> 
>> Single.plain                   avgt    3   5.916 ± 0.063  ns/op
>> Single.release                 avgt    3  10.148 ± 0.401  ns/op
>> Single.storeStore              avgt    3   6.767 ± 0.164  ns/op // Better
>> 
>> 
>> As expected, this does not affect x86_64 at all, because both `release` and `storeStore` are effectively no-ops, only affecting compiler optimizations:
>> 
>> 
>> Benchmark                      Mode  Cnt  Score   Error  Units
>> 
>> Multiple.plain                 avgt    3  0.406 ± 0.002  ns/op
>> Multiple.release               avgt    3  0.409 ± 0.018  ns/op
>> Multiple.storeStore            avgt    3  0.406 ± 0.001  ns/op
>> 
>> MultipleWithLoads.plain        avgt    3  4.328 ± 0.006  ns/op
>> MultipleWithLoads.release      avgt    3  4.600 ± 0.014  ns/op
>> MultipleWithLoads.storeStore   avgt    3  4.602 ± 0.006  ns/op
>> 
>> MultipleWithStores.plain       avgt    3  0.812 ± 0.001  ns/op
>> MultipleWithStores.release     avgt    3  0.812 ± 0.002  ns/op
>> MultipleWithStores.storeStore  avgt    3  0.812 ± 0.002  ns/op
>> 
>> Publishing.plain               avgt    3  6.370 ± 0.059  ns/op
>> Publishing.release             avgt    3  6.358 ± 0.436  ns/op
>> Publishing.storeStore          avgt    3  6.367 ± 0.054  ns/op
>> 
>> Single.plain                   avgt    3  0.407 ± 0.039  ns/op
>> Single.releaseFence            avgt    3  0.406 ± 0.001  ns/op
>> Single.storeStoreFence         avgt    3  0.406 ± 0.001  ns/op
>> 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1`
>>  - [x] Linux AArch64 fastdebug `tier1`
>>  - [x] Linux x86_64 Fences benchmark
>>  - [x] Linux AArch64 Fences benchmark
>>  - [x] Linux ARM32 Fences benchmark
>>  - [x] Linux AArch64 jcstress `quick` run
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix the comment to match JDK-8276096

jcstress and tier1 pass on AArch64. Seems like we are good to go.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6136

From shade at openjdk.java.net  Tue Nov  2 10:29:17 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 10:29:17 GMT
Subject: Integrated: 8252990: Intrinsify Unsafe.storeStoreFence
In-Reply-To: <ZWT707BgFCyrHdx6AEgAmCAEeOystOlYAEVD6WT7fSg=.6ddbc7c3-89cb-4bfe-90e2-dcf3e16624e3@github.com>
References: <ZWT707BgFCyrHdx6AEgAmCAEeOystOlYAEVD6WT7fSg=.6ddbc7c3-89cb-4bfe-90e2-dcf3e16624e3@github.com>
Message-ID: <V-6zsH7d_0lw8j77L6cfhbqt5cva1Cyzuds5XcY5BWo=.d0372e21-7fd2-466b-9e71-8ae53c936371@github.com>

On Wed, 27 Oct 2021 11:53:47 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> `Unsafe.storeStoreFence` currently delegates to stronger `Unsafe.storeFence`. We can teach compilers to map this directly to already existing rules that handle `MemBarStoreStore`. Like explicit `LoadFence`/`StoreFence`, we introduce the special node to differentiate explicit fence and implicit store-store barriers. `storeStoreFence` is usually used to simulate safe `final`-field like constructions in special JDK classes, like `ConstantCallSite` and friends.
> 
> Motivational performance difference on benchmarks from JDK-8276054 on ARM32 (Raspberry Pi 4):
> 
> 
> Benchmark                      Mode  Cnt   Score    Error  Units
> Multiple.plain                 avgt    3   2.669 ±  0.004  ns/op
> Multiple.release               avgt    3  16.688 ±  0.057  ns/op
> Multiple.storeStore            avgt    3  14.021 ±  0.144  ns/op // Better
> 
> MultipleWithLoads.plain        avgt    3   4.672 ±  0.053  ns/op
> MultipleWithLoads.release      avgt    3  16.689 ±  0.044  ns/op
> MultipleWithLoads.storeStore   avgt    3  14.012 ±  0.010  ns/op // Better
> 
> MultipleWithStores.plain       avgt    3  14.687 ±  0.009  ns/op
> MultipleWithStores.release     avgt    3  45.393 ±  0.192  ns/op
> MultipleWithStores.storeStore  avgt    3  38.048 ±  0.033  ns/op // Better
> 
> Publishing.plain               avgt    3  27.079 ±  0.201  ns/op
> Publishing.release             avgt    3  27.088 ±  0.241  ns/op
> Publishing.storeStore          avgt    3  27.009 ±  0.259  ns/op // Within error, hidden by allocation
> 
> Single.plain                   avgt    3   2.670 ±  0.002  ns/op
> Single.releaseFence            avgt    3   6.675 ±  0.001  ns/op
> Single.storeStoreFence         avgt    3   8.012 ±  0.027  ns/op // Worse, seems to be ARM32 implementation artifact
> 
> 
> The same thing on AArch64 (Raspberry Pi 3):
> 
> 
> Benchmark                      Mode  Cnt   Score   Error  Units
> 
> Multiple.plain                 avgt    3   5.914 ± 0.115  ns/op
> Multiple.release               avgt    3  10.149 ± 0.059  ns/op
> Multiple.storeStore            avgt    3   6.757 ± 0.138  ns/op // Better
> 
> MultipleWithLoads.plain        avgt    3  11.849 ± 0.331  ns/op
> MultipleWithLoads.release      avgt    3  35.565 ± 1.144  ns/op
> MultipleWithLoads.storeStore   avgt    3  19.441 ± 0.471  ns/op // Better
> 
> MultipleWithStores.plain       avgt    3   5.920 ± 0.213  ns/op
> MultipleWithStores.release     avgt    3  20.286 ± 0.347  ns/op
> MultipleWithStores.storeStore  avgt    3  12.686 ± 0.230  ns/op // Better
> 
> Publishing.plain               avgt    3  22.261 ± 1.630  ns/op
> Publishing.release             avgt    3  22.269 ± 0.576  ns/op
> Publishing.storeStore          avgt    3  17.464 ± 0.397  ns/op // Better
> 
> Single.plain                   avgt    3   5.916 ± 0.063  ns/op
> Single.release                 avgt    3  10.148 ± 0.401  ns/op
> Single.storeStore              avgt    3   6.767 ± 0.164  ns/op // Better
> 
> 
> As expected, this does not affect x86_64 at all, because both `release` and `storeStore` are effectively no-ops, only affecting compiler optimizations:
> 
> 
> Benchmark                      Mode  Cnt  Score   Error  Units
> 
> Multiple.plain                 avgt    3  0.406 ± 0.002  ns/op
> Multiple.release               avgt    3  0.409 ± 0.018  ns/op
> Multiple.storeStore            avgt    3  0.406 ± 0.001  ns/op
> 
> MultipleWithLoads.plain        avgt    3  4.328 ± 0.006  ns/op
> MultipleWithLoads.release      avgt    3  4.600 ± 0.014  ns/op
> MultipleWithLoads.storeStore   avgt    3  4.602 ± 0.006  ns/op
> 
> MultipleWithStores.plain       avgt    3  0.812 ± 0.001  ns/op
> MultipleWithStores.release     avgt    3  0.812 ± 0.002  ns/op
> MultipleWithStores.storeStore  avgt    3  0.812 ± 0.002  ns/op
> 
> Publishing.plain               avgt    3  6.370 ± 0.059  ns/op
> Publishing.release             avgt    3  6.358 ± 0.436  ns/op
> Publishing.storeStore          avgt    3  6.367 ± 0.054  ns/op
> 
> Single.plain                   avgt    3  0.407 ± 0.039  ns/op
> Single.releaseFence            avgt    3  0.406 ± 0.001  ns/op
> Single.storeStoreFence         avgt    3  0.406 ± 0.001  ns/op
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`
>  - [x] Linux AArch64 fastdebug `tier1`
>  - [x] Linux x86_64 Fences benchmark
>  - [x] Linux AArch64 Fences benchmark
>  - [x] Linux ARM32 Fences benchmark
>  - [x] Linux AArch64 jcstress `quick` run

This pull request has now been integrated.

Changeset: b7a06be9
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/b7a06be98d3057dac4adbb7f4071ac62cf88fe52
Stats:     38 lines in 16 files changed: 32 ins; 5 del; 1 mod

8252990: Intrinsify Unsafe.storeStoreFence

Reviewed-by: dholmes, thartmann, whuang

-------------

PR: https://git.openjdk.java.net/jdk/pull/6136
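
For readers unfamiliar with the `final`-field-like pattern mentioned in the PR description, here is a minimal sketch of it. It uses the public `VarHandle.storeStoreFence()` as a stand-in for the internal `Unsafe.storeStoreFence`, and the `FrozenBox` class is hypothetical; it is not taken from `ConstantCallSite` or any other JDK class.

```java
import java.lang.invoke.VarHandle;

// Hypothetical example class, for illustration only.
final class FrozenBox {
    // Deliberately not final: imagine it has to be written after some
    // other initialization step, so `final` cannot be used.
    private int value;

    FrozenBox(int value) {
        this.value = value;
        // Keep the store above from being reordered with stores that
        // follow the constructor, such as the store that publishes
        // the reference to this object.
        VarHandle.storeStoreFence();
    }

    int value() {
        return value;
    }
}
```

A weaker store-store barrier is enough here because only the ordering of the constructor's stores against the later publishing store matters; that is why mapping it to a cheaper instruction than a full release fence can pay off on ARM.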

From mcimadamore at openjdk.java.net  Tue Nov  2 10:34:18 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 2 Nov 2021 10:34:18 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v10]
In-Reply-To: <qLFAt0lFdftR7B2zt_HX1F7hhhqR0_rc8xjJI5i45fY=.f501e843-ed35-400d-ba0e-0e8711505006@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>
 <qLFAt0lFdftR7B2zt_HX1F7hhhqR0_rc8xjJI5i45fY=.f501e843-ed35-400d-ba0e-0e8711505006@github.com>
Message-ID: <ESPjgPDRHUCiWLeJiiePwBlXPrF-Qoxl5ytexABrJVE=.980a26b6-7690-4b0d-8beb-56fdd943ee1b@github.com>

On Tue, 2 Nov 2021 00:24:12 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>> 
>>  - Add cache for memory address var handles
>>  - Merge branch 'master' into JEP-419
>>  - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
>>  - Merge branch 'master' into JEP-419
>>  - Fix copyright header in TestArrayCopy
>>  - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
>>  - * use `invokeWithArguments` to simplify new test
>>  - Add test for liveness check with high-aririty downcalls
>>    (make sure that if an exception occurs in a downcall because of liveness,
>>    ref count of other resources are left intact).
>>  - * Fix javadoc issue in VaList
>>    * Fix bug in concurrent logic for shared scope acquire
>>  - Address review comments
>>  - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343
>
> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/Utils.java line 111:
> 
>> 109:         class VarHandleCache {
>> 110:             private static final Map<ValueLayout, VarHandle> handleMap = new ConcurrentHashMap<>();
>> 111:             private static final Map<ValueLayout, VarHandle> handleMapNoAlignCheck = new ConcurrentHashMap<>();
> 
> Something to consider later if this is an issue. Since the number of `ValueLayout` instances is fixed, carrier x order = 18, we can use stable arrays with ordinals on the instances.

What about alignment?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From psandoz at openjdk.java.net  Tue Nov  2 15:44:12 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Tue, 2 Nov 2021 15:44:12 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
Message-ID: <UX8V13GkspdQyaQurhAql9BaZiAQH5klYW8GyZiRJ-g=.17702d38-a07e-4596-af99-2fcb5243862c@github.com>

On Mon, 1 Nov 2021 22:36:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Tweak javadoc of loaderLookup

Marked as reviewed by psandoz (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From psandoz at openjdk.java.net  Tue Nov  2 15:44:13 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Tue, 2 Nov 2021 15:44:13 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v10]
In-Reply-To: <ESPjgPDRHUCiWLeJiiePwBlXPrF-Qoxl5ytexABrJVE=.980a26b6-7690-4b0d-8beb-56fdd943ee1b@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>
 <qLFAt0lFdftR7B2zt_HX1F7hhhqR0_rc8xjJI5i45fY=.f501e843-ed35-400d-ba0e-0e8711505006@github.com>
 <ESPjgPDRHUCiWLeJiiePwBlXPrF-Qoxl5ytexABrJVE=.980a26b6-7690-4b0d-8beb-56fdd943ee1b@github.com>
Message-ID: <5onID0SnzIoPH5_Le4f71eC5ll_zGn0DQecQVpL1jDM=.43d7b2af-185a-4251-828f-058da6a69115@github.com>

On Tue, 2 Nov 2021 10:30:42 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/Utils.java line 111:
>> 
>>> 109:         class VarHandleCache {
>>> 110:             private static final Map<ValueLayout, VarHandle> handleMap = new ConcurrentHashMap<>();
>>> 111:             private static final Map<ValueLayout, VarHandle> handleMapNoAlignCheck = new ConcurrentHashMap<>();
>> 
>> Something to consider later if this is an issue. Since the number of `ValueLayout` instances is fixed, carrier x order = 18, we can use stable arrays with ordinals on the instances.
>
> What about alignment?

Drat, `skipAlignmentCheck` misled me, but perhaps there is still a benefit for the common constants with 8-bit and size alignment, with a fallback otherwise.
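
A rough sketch of the ordinal-indexed idea discussed above, for illustration only. It is not the JDK code: `Carrier` and `HandleTable` are hypothetical names, `MethodHandles.byteBufferViewVarHandle` stands in for the memory access var handles, and a plain `final` array stands in for a `@Stable` one (that annotation is internal to the JDK). Alignment is ignored entirely, which is exactly the case that would still need the map-based fallback.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical stand-in for the fixed set of carriers.
enum Carrier {
    CHAR(char[].class), SHORT(short[].class), INT(int[].class),
    LONG(long[].class), FLOAT(float[].class), DOUBLE(double[].class);

    final Class<?> arrayType;

    Carrier(Class<?> arrayType) {
        this.arrayType = arrayType;
    }
}

final class HandleTable {
    private static final int ORDERS = 2; // big-endian and little-endian

    // One slot per (carrier, order) pair, filled once during class initialization.
    private static final VarHandle[] TABLE =
            new VarHandle[Carrier.values().length * ORDERS];

    static {
        for (Carrier c : Carrier.values()) {
            for (ByteOrder order : new ByteOrder[] {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN}) {
                TABLE[index(c, order)] =
                        MethodHandles.byteBufferViewVarHandle(c.arrayType, order);
            }
        }
    }

    private static int index(Carrier c, ByteOrder order) {
        return c.ordinal() * ORDERS + (order == ByteOrder.BIG_ENDIAN ? 0 : 1);
    }

    // Array lookup by ordinal instead of a ConcurrentHashMap lookup.
    static VarHandle get(Carrier c, ByteOrder order) {
        return TABLE[index(c, order)];
    }
}
```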

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From kvn at openjdk.java.net  Tue Nov  2 17:11:09 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 2 Nov 2021 17:11:09 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To: <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
Message-ID: <mwi7QCmri06l0t38hwv6QR7sZAMD5EO0NmCXqH9Q5Kg=.d5336aa8-dcef-4367-ad51-3c273d47e3a0@github.com>

On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>> 
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>> 
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>> 
>> 
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
>> 
>> Benchmark                    Mode  Cnt       Score      Error   Units
>> 
>> ### Before
>> 
>> StrictMathBench.minDouble   thrpt    4  230921.558 ±  234.238  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230932.303 ±  126.721  ops/ms
>> StrictMathBench.minInt      thrpt    4  230917.256 ±   73.008  ops/ms
>> StrictMathBench.minLong     thrpt    4  194460.828 ±  178.079  ops/ms
>> 
>> StrictMathBench.maxDouble   thrpt    4  230983.180 ±  161.211  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230969.290 ±  277.500  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231033.581 ±  200.015  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194590.744 ±  114.295  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>> 
>> ### After
>> 
>> StrictMathBench.minDouble   thrpt    4  230976.625 ±   67.338  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230896.021 ±  270.434  ops/ms
>> StrictMathBench.minInt      thrpt    4  230859.741 ±  403.147  ops/ms
>> StrictMathBench.minLong     thrpt    4  194456.673 ±  111.557  ops/ms
>> 
>> StrictMathBench.maxDouble   thrpt    4  230890.776 ±   89.924  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230918.334 ±   63.160  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231059.128 ±   51.224  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194488.210 ±  495.224  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ±  247.330  ops/ms
>> 
>> 
>> Additional testing:
>>  - [x] `StrictMath` benchmarks
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Keep intrinsics on StrictMath

Good. Thank you for fixing it.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6184

From jvernee at openjdk.java.net  Tue Nov  2 17:32:24 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Tue, 2 Nov 2021 17:32:24 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v10]
In-Reply-To: <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>
Message-ID: <QcXpMpgQuuA-3772AYtOml3cWcnnR5vEBZ1WtfqBM4M=.eabbe80f-22df-4001-a7c9-d2ccb31c3186@github.com>

On Mon, 1 Nov 2021 12:05:32 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
> 
>  - Add cache for memory address var handles
>  - Merge branch 'master' into JEP-419
>  - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
>  - Merge branch 'master' into JEP-419
>  - Fix copyright header in TestArrayCopy
>  - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
>  - * use `invokeWithArguments` to simplify new test
>  - Add test for liveness check with high-aririty downcalls
>    (make sure that if an exception occurs in a downcall because of liveness,
>    ref count of other resources are left intact).
>  - * Fix javadoc issue in VaList
>    * Fix bug in concurrent logic for shared scope acquire
>  - Address review comments
>  - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1586:

> 1584:             public void ensureCustomized(MethodHandle mh) {
> 1585:                 mh.customize();
> 1586:             }

This is no longer needed, but it probably got picked up in the merge.

src/java.base/share/classes/jdk/internal/access/JavaLangInvokeAccess.java line 144:

> 142:      * @param mh the method handle
> 143:      */
> 144:     void ensureCustomized(MethodHandle mh);

Same here, no longer needed. (It was used by the now-removed upcall handler code; see https://github.com/openjdk/panama-foreign/pull/553)

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemoryAddress.java line 107:

> 105:      *
> 106:      * @param offset offset in bytes (relative to this address). The final address of this read operation can be expressed as {@code toRowLongValue() + offset}.
> 107:      * @return a Java UTF-8 string containing all the bytes read from the given starting address ({@code toRowLongValue() + offset})

(see also comment on MemorySegment.getUtf8String)
Suggestion:

     * @return a Java string constructed from the bytes read from the given starting address ({@code toRowLongValue() + offset})

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 387:

> 385: 
> 386:     /**
> 387:      * Performs an element-wise bulk copy from given source segment to this segment. More specifically, the bytes at

Suggestion:

     * Performs a byte-wise bulk copy from given source segment to this segment. More specifically, the bytes at

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 400:

> 398:      * a multiple of the source element layout size, if the source segment is incompatible with the alignment constraints
> 399:      * in the source element layout, or if this segment is incompatible with the alignment constraints
> 400:      * in the destination element layout.

This speaks about element layouts, but I don't see any element layouts in the method implementation.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 633:

> 631:      * java.nio.charset.CharsetDecoder} class should be used when more control
> 632:      * over the decoding process is required.
> 633:      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,

Suggestion:

     * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 636:

> 634:      *               the final address of this read operation can be expressed as {@code address().toRowLongValue() + offset}.
> 635:      * @return a Java UTF-8 string containing all the bytes read from the given starting address up to (but not including)
> 636:      * the first {@code '\0'} terminator character (assuming one is found).

The phrase "a Java UTF-8 string" sounds strange to me, as Java Strings are not encoded in UTF-8. The string that is read is UTF-8 encoded, but then it is converted from UTF-8 to Java internal String encoding (UTF-16 or Latin1).

I'd suggest just dropping the 'UTF-8', and changing 'containing all' to 'constructed from'.
Suggestion:

     * @return a Java string constructed from the bytes read from the given starting address up to (but not including)
     * the first {@code '\0'} terminator character (assuming one is found).
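
To illustrate the distinction: the bytes in memory are UTF-8 encoded, but the resulting `String` is an ordinary Java string. The sketch below uses a plain `byte[]` in place of a memory segment, and `CStrings.readUtf8` is a hypothetical helper, not the `getUtf8String` implementation.

```java
import java.nio.charset.StandardCharsets;

final class CStrings {
    // Decodes UTF-8 bytes starting at `offset`, up to (but not including)
    // the first 0 byte, or the end of the array if no terminator is found.
    static String readUtf8(byte[] memory, int offset) {
        int end = offset;
        while (end < memory.length && memory[end] != 0) {
            end++;
        }
        // The String constructor converts from UTF-8 to the String's
        // internal representation; the result is not "a UTF-8 string".
        return new String(memory, offset, end - offset, StandardCharsets.UTF_8);
    }
}
```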

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 652:

> 650:      * java.nio.charset.CharsetDecoder} class should be used when more control
> 651:      * over the decoding process is required.
> 652:      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,

Suggestion:

     * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 762:

> 760: 
> 761:     /**
> 762:      * Creates a new native memory segment with given size and resource scope, and whose base address is this address.

Suggestion:

     * Creates a new native memory segment with given size and resource scope, and whose base address is the given address.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 769:

> 767:      * provided resource scope.
> 768:      * <p>
> 769:      * Clients should ensure that the address and bounds refers to a valid region of memory that is accessible for reading and,

Suggestion:

     * Clients should ensure that the address and bounds refer to a valid region of memory that is accessible for reading and,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1035:

> 1033:      *
> 1034:      * @param layout the layout of the memory region to be read.
> 1035:      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,

Suggestion:

     * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1549:

> 1547:      * @param index index (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,
> 1548:      *               the final address of this write operation can be expressed as {@code address().toRowLongValue() + (index * layout.byteSize())}.
> 1549:      * @param value the byte value to be written.

Suggestion:

     * @param value the address value to be written.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1563:

> 1561:      * Copies a number of elements from a source segment to a destination array,
> 1562:      * starting at a given segment offset (expressed in bytes), and a given array index, using the given source element layout.
> 1563:      * Supported array types are {@code byte[]}, {@code char[]},{@code short[]},{@code int[]},{@code float[]},{@code long[]} and {@code double[]}.

Suggestion:

     * Supported array types are {@code byte[]}, {@code char[]}, {@code short[]}, {@code int[]}, {@code float[]}, {@code long[]} and {@code double[]}.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1604:

> 1602:      * Copies a number of elements from a source array to a destination segment,
> 1603:      * starting at a given array index, and a given segment offset (expressed in bytes), using the given destination element layout.
> 1604:      * Supported array types are {@code byte[]}, {@code char[]},{@code short[]},{@code int[]},{@code float[]},{@code long[]} and {@code double[]}.

Suggestion:

     * Supported array types are {@code byte[]}, {@code char[]}, {@code short[]}, {@code int[]}, {@code float[]}, {@code long[]} and {@code double[]}.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/ResourceScope.java line 208:

> 206:      */
> 207:     static ResourceScope newConfinedScope() {
> 208:         return ResourceScopeImpl.createConfined( Thread.currentThread(), null);

Suggestion:

        return ResourceScopeImpl.createConfined(Thread.currentThread(), null);

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/VaList.java line 132:

> 130:     /**
> 131:      * Copies this variable argument list at its current position into a new variable argument list associated
> 132:      * with the same scope as this variable argument list. using the segment provided allocator. Copying is useful to

I think ". using the segment provided allocator" can be removed. Seems like a leftover from when we had an overload that took an allocator.
Suggestion:

     * with the same scope as this variable argument list. Copying is useful to

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From jvernee at openjdk.java.net  Tue Nov  2 17:32:17 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Tue, 2 Nov 2021 17:32:17 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
Message-ID: <wCzCmT6ysO0EAQgn5C6ZbDTTZxGk8Bf0JnuKxjOfUQQ=.e6be3695-4027-4845-a2a2-d16b39290f75@github.com>

On Mon, 1 Nov 2021 22:36:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Tweak javadoc of loaderLookup

Mostly some minor javadoc comments.

src/java.base/share/classes/java/lang/Module.java line 32:

> 30: import java.lang.annotation.Annotation;
> 31: import java.lang.invoke.MethodHandle;
> 32: import java.lang.invoke.VarHandle;

These imports seem spurious now.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/ValueLayout.java line 177:

> 175:         }
> 176:         if (carrier.isPrimitive() && Wrapper.forPrimitiveType(carrier).bitWidth() != size &&
> 177:                 carrier != boolean.class && size != 8) {

I find this condition hard to parse, I'd suggest re-writing it as:

if (carrier.isPrimitive()) {
    long expectedSize = carrier == boolean.class ? 8 : Wrapper.forPrimitiveType(carrier).bitWidth();
    if (size != expectedSize) {
        throw ...
    }
}

(Maybe even change the `if` to an `else` and combine it with the above if).
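
A self-contained version of the shape I have in mind, purely as a sketch: `primitiveBits` is a hypothetical stand-in for `Wrapper.forPrimitiveType(carrier).bitWidth()`, and the exception type and message are assumptions, not what the patch throws.

```java
final class CarrierCheck {
    // Hypothetical stand-in for Wrapper.forPrimitiveType(carrier).bitWidth().
    static long primitiveBits(Class<?> carrier) {
        if (carrier == byte.class) return 8;
        if (carrier == char.class || carrier == short.class) return 16;
        if (carrier == int.class || carrier == float.class) return 32;
        if (carrier == long.class || carrier == double.class) return 64;
        throw new IllegalArgumentException("Unsupported carrier: " + carrier);
    }

    static void checkSize(Class<?> carrier, long size) {
        if (carrier.isPrimitive()) {
            // boolean is stored in a full byte, so it is special-cased.
            long expectedSize = carrier == boolean.class ? 8 : primitiveBits(carrier);
            if (size != expectedSize) {
                // Assumed exception type and message, for illustration.
                throw new IllegalArgumentException(
                        "Carrier " + carrier + " expects " + expectedSize + " bits, got " + size);
            }
        }
    }
}
```

Note that this form rejects an `int` carrier with size 8, whereas the condition as currently written lets it through because of the trailing `size != 8`.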

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/ValueLayout.java line 484:

> 482:     public static final class OfAddress extends ValueLayout {
> 483:         OfAddress(ByteOrder order) {
> 484:             super(MemoryAddress.class, order, Unsafe.ADDRESS_SIZE * 8);

I see `Unsafe.ADDRESS_SIZE` used in several places; I'd suggest adding an `ADDRESS_SIZE_BITS` constant somewhere (it's a bit more readable).

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 42:

> 40:     final long blockSize;
> 41:     final long arenaSize;
> 42:     final ResourceScope scope;

Could these fields be made private?

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 88:

> 86:                     if (size > arenaSize) {
> 87:                         throw new OutOfMemoryError();
> 88:                     }

Isn't this already covered by the `finally` block? Also, this seems to be checking the unaltered `size`, which I think should have been already done at the end of the previous `allocate` call right?

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java line 122:

> 120:         ResourceScopeImpl targetImpl = (ResourceScopeImpl)target;
> 121:         targetImpl.acquire0();
> 122:         addCloseAction(targetImpl::release0);

Maybe this should explicitly check whether `target` is `null`. The call to `acquire0` would also produce an NPE, but a stack trace with `Objects::requireNonNull` in it would make the error more obvious, I think.
Suggestion:

    public void keepAlive(ResourceScope target) {
        Objects.requireNonNull(target);
        if (target == this) {
            throw new IllegalArgumentException("Invalid target scope.");
        }
        ResourceScopeImpl targetImpl = (ResourceScopeImpl)target;
        targetImpl.acquire0();
        addCloseAction(targetImpl::release0);

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/SharedScope.java line 101:

> 99:         int value;
> 100:         do {
> 101:             value = (int) STATE.getVolatile(jdk.internal.foreign.SharedScope.this);

Doesn't need to be fully qualified I think?
Suggestion:

            value = (int) STATE.getVolatile(this);

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/SharedScope.java line 106:

> 104:                 throw new IllegalStateException("Already closed");
> 105:             }
> 106:         } while (!STATE.compareAndSet(jdk.internal.foreign.SharedScope.this, value, value - 1));

Same here
Suggestion:

        } while (!STATE.compareAndSet(this, value, value - 1));
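
For context, the loop in question is the usual read/check/compare-and-set retry pattern. A standalone sketch of it follows; `RefCountedState` and its state encoding (0 means alive with no holders, negative means closed) are hypothetical and only loosely modelled on `SharedScope`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class RefCountedState {
    private static final VarHandle STATE;

    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(RefCountedState.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hypothetical encoding: 0 = alive with no holders, negative = closed.
    private static final int ALIVE = 0;

    private int state = ALIVE;

    void acquire() {
        int value;
        do {
            value = (int) STATE.getVolatile(this);
            if (value < ALIVE) {
                throw new IllegalStateException("Already closed");
            }
            // Retry if another thread changed the state in between.
        } while (!STATE.compareAndSet(this, value, value + 1));
    }

    void release() {
        int value;
        do {
            value = (int) STATE.getVolatile(this);
            if (value <= ALIVE) {
                throw new IllegalStateException("Already closed");
            }
        } while (!STATE.compareAndSet(this, value, value - 1));
    }

    int holders() {
        return (int) STATE.getVolatile(this);
    }
}
```

Inside an instance method a plain `this` is all the var handle needs as its receiver, which is why the fully qualified `SharedScope.this` is unnecessary.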

-------------

Marked as reviewed by jvernee (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5907

From jvernee at openjdk.java.net  Tue Nov  2 17:32:25 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Tue, 2 Nov 2021 17:32:25 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v10]
In-Reply-To: <QcXpMpgQuuA-3772AYtOml3cWcnnR5vEBZ1WtfqBM4M=.eabbe80f-22df-4001-a7c9-d2ccb31c3186@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <wUAuOEYgLOLXwm-WgtLLMj3GKVJ2nFIZazQSi16zaXc=.4637910d-a678-4d1a-89d9-82f439ade88a@github.com>
 <QcXpMpgQuuA-3772AYtOml3cWcnnR5vEBZ1WtfqBM4M=.eabbe80f-22df-4001-a7c9-d2ccb31c3186@github.com>
Message-ID: <aZeyuXAwjCwreT-4v7vdv2vAvBPwbQB-3HFuls7Wumc=.fe0fb911-9a1e-48bd-99e1-45b3c3f14c13@github.com>

On Mon, 1 Nov 2021 15:38:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>> 
>>  - Add cache for memory address var handles
>>  - Merge branch 'master' into JEP-419
>>  - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
>>  - Merge branch 'master' into JEP-419
>>  - Fix copyright header in TestArrayCopy
>>  - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
>>  - * use `invokeWithArguments` to simplify new test
>>  - Add test for liveness check with high-aririty downcalls
>>    (make sure that if an exception occurs in a downcall because of liveness,
>>    ref count of other resources are left intact).
>>  - * Fix javadoc issue in VaList
>>    * Fix bug in concurrent logic for shared scope acquire
>>  - Address review comments
>>  - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343
>
> src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1035:
> 
>> 1033:      *
>> 1034:      * @param layout the layout of the memory region to be read.
>> 1035:      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,
> 
> Suggestion:
> 
>      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment,

The same suggestion applies to all the other getters/setters below (I assume you wanted to add text to the link here?)

> src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1549:
> 
>> 1547:      * @param index index (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,
>> 1548:      *               the final address of this write operation can be expressed as {@code address().toRowLongValue() + (index * layout.byteSize())}.
>> 1549:      * @param value the byte value to be written.
> 
> Suggestion:
> 
>      * @param value the address value to be written.

I think all the setters have this problem.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Tue Nov  2 18:52:21 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 2 Nov 2021 18:52:21 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <wCzCmT6ysO0EAQgn5C6ZbDTTZxGk8Bf0JnuKxjOfUQQ=.e6be3695-4027-4845-a2a2-d16b39290f75@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
 <wCzCmT6ysO0EAQgn5C6ZbDTTZxGk8Bf0JnuKxjOfUQQ=.e6be3695-4027-4845-a2a2-d16b39290f75@github.com>
Message-ID: <LmumNYBTtfpHdsaUSOYaYA65QtVIiewuGNF2Jxk7-kI=.e6199a44-78c7-492b-89ee-700a406aa642@github.com>

On Tue, 2 Nov 2021 16:51:06 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Tweak javadoc of loaderLookup
>
> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 88:
> 
>> 86:                     if (size > arenaSize) {
>> 87:                         throw new OutOfMemoryError();
>> 88:                     }
> 
> Isn't this already covered by the `finally` block? Also, this seems to be checking the unaltered `size`, which I think should have been already done at the end of the previous `allocate` call right?

I'll have to think some more about this. I don't think this is covered inside the block - that is, the block tries to allocate, and then in the finally we throw if we realized we've allocated too much.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From alanb at openjdk.java.net  Tue Nov  2 19:49:15 2021
From: alanb at openjdk.java.net (Alan Bateman)
Date: Tue, 2 Nov 2021 19:49:15 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v13]
In-Reply-To: <1DhHETKpULKzqGU-0EU7qcdSWDngTBO1UMQ39E8qzBw=.ad279b49-57fb-4026-9049-862b4aef2ada@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <1DhHETKpULKzqGU-0EU7qcdSWDngTBO1UMQ39E8qzBw=.ad279b49-57fb-4026-9049-862b4aef2ada@github.com>
Message-ID: <IcqL5_hBf_lFdJ_2ImJUzvk8G9vNTzfG6TtOqD4gJtY=.f557e13d-d56a-4d7e-9ff2-7ac79228b390@github.com>

On Tue, 2 Nov 2021 19:35:29 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Address impl review comments
>  - Address API review comments

src/java.base/share/classes/java/lang/Module.java line 114:

> 112: 
> 113:     // true, if this module allows restricted native access; @Stable makes sure that modules that allow native
> 114:     // access capture this property as a constant.

Do you mind fixing this comment to avoid the really long line? It sticks out compared to everything else around it.

src/java.base/share/classes/sun/nio/ch/IOUtil.java line 478:

> 476:     private static final JavaNioAccess NIO_ACCESS = SharedSecrets.getJavaNioAccess();
> 477: 
> 478:     static Runnable acquireScope(ByteBuffer bb, boolean async) {

At some point (not this PR) we should move the "async" out of this file; IOUtil was meant for synchronous I/O.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Tue Nov  2 19:49:14 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 2 Nov 2021 19:49:14 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v13]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <1DhHETKpULKzqGU-0EU7qcdSWDngTBO1UMQ39E8qzBw=.ad279b49-57fb-4026-9049-862b4aef2ada@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision:

 - Address impl review comments
 - Address API review comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/7cf4fcd9..1126133a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=12
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=11-12

  Stats: 103 lines in 11 files changed: 8 ins; 23 del; 72 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Tue Nov  2 19:49:16 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 2 Nov 2021 19:49:16 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <LmumNYBTtfpHdsaUSOYaYA65QtVIiewuGNF2Jxk7-kI=.e6199a44-78c7-492b-89ee-700a406aa642@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
 <wCzCmT6ysO0EAQgn5C6ZbDTTZxGk8Bf0JnuKxjOfUQQ=.e6be3695-4027-4845-a2a2-d16b39290f75@github.com>
 <LmumNYBTtfpHdsaUSOYaYA65QtVIiewuGNF2Jxk7-kI=.e6199a44-78c7-492b-89ee-700a406aa642@github.com>
Message-ID: <ekptcSWurL7TCI9UQg9oUm261suMGQedFTB-s86XC_w=.2eb23e6e-c51e-45d9-ac4f-305fc97c9be1@github.com>

On Tue, 2 Nov 2021 18:48:57 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 88:
>> 
>>> 86:                     if (size > arenaSize) {
>>> 87:                         throw new OutOfMemoryError();
>>> 88:                     }
>> 
>> Isn't this already covered by the `finally` block? Also, this seems to be checking the unaltered `size`, which I think should have been already done at the end of the previous `allocate` call right?
>
> I'll have to think some more about this. I don't think this is covered inside the block - that is, the block tries to allocate, and then in the finally we throw if we realized we've allocated too much.

What is missing, I think, is a check (size > arenaSize) at the beginning of the method (we only check this in one of the paths). But we need to check before and after, I think, as it is possible to allocate a segment and then realize that we ended up overflowing the arena size.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Tue Nov  2 19:49:16 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 2 Nov 2021 19:49:16 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <ekptcSWurL7TCI9UQg9oUm261suMGQedFTB-s86XC_w=.2eb23e6e-c51e-45d9-ac4f-305fc97c9be1@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
 <wCzCmT6ysO0EAQgn5C6ZbDTTZxGk8Bf0JnuKxjOfUQQ=.e6be3695-4027-4845-a2a2-d16b39290f75@github.com>
 <LmumNYBTtfpHdsaUSOYaYA65QtVIiewuGNF2Jxk7-kI=.e6199a44-78c7-492b-89ee-700a406aa642@github.com>
 <ekptcSWurL7TCI9UQg9oUm261suMGQedFTB-s86XC_w=.2eb23e6e-c51e-45d9-ac4f-305fc97c9be1@github.com>
Message-ID: <smw2ow_lpXCXpH6gI8RI_bgfEEWfOUL8l1Th75EkqDE=.7316e2f5-1434-486a-a492-d48c4352877f@github.com>

On Tue, 2 Nov 2021 18:55:47 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> I'll have to think some more about this. I don't think this is covered inside the block - that is, the block tries to allocate, and then in the finally we throw if we realized we've allocated too much.
>
> What is missing, I think, is a check (size > arenaSize) at the beginning of the method (we only check this in one of the paths). But we need to check before and after, I think, as it is possible to allocate a segment and then realize that we ended up overflowing the arena size.

While what I said above correctly reflects what the implementation does, I think a broader issue is that the arena allocator implementation sometimes allocates more native memory than its contract specifies. While in some cases we can prevent that, I think in the general case (e.g. where we allocate a new block) we cannot, unless we add extra API guarantees - e.g. that the arena size should be a multiple of the block size (but then we'd have to special-case `Long.MAX_VALUE`, or maybe pick a "big enough" power of two instead)
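
To make the arithmetic concrete, here is a rough sketch of how whole-block allocation can exceed the arena bound. The class and method names are made up for illustration, and the model assumes requests pack perfectly into blocks, which is simpler than what `ArenaAllocator` really does.

```java
// Hypothetical model of block-granular reservation; not the ArenaAllocator code.
final class ArenaOvershootSketch {
    /** Smallest number of whole blocks needed to hold `requested` bytes (ceiling division). */
    static long blocksNeeded(long blockSize, long requested) {
        // Note: this sum would overflow for values near Long.MAX_VALUE,
        // which is one reason an "unbounded" arena needs special-casing.
        return (requested + blockSize - 1) / blockSize;
    }

    /** Native bytes actually reserved when memory is only obtained in whole blocks. */
    static long reservedBytes(long blockSize, long requested) {
        return Math.multiplyExact(blockSize, blocksNeeded(blockSize, requested));
    }

    /** True if a request that fits the arena bound still reserves more than the bound. */
    static boolean overshoots(long arenaSize, long blockSize, long requested) {
        return requested <= arenaSize && reservedBytes(blockSize, requested) > arenaSize;
    }

    public static void main(String[] args) {
        // 100 bytes fit the bound, but two 64-byte blocks reserve 128 bytes.
        System.out.println(overshoots(100, 64, 100));
        // With an arena size that is a multiple of the block size there is no overshoot.
        System.out.println(overshoots(128, 64, 128));
    }
}
```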

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From jvernee at openjdk.java.net  Tue Nov  2 19:49:16 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Tue, 2 Nov 2021 19:49:16 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <smw2ow_lpXCXpH6gI8RI_bgfEEWfOUL8l1Th75EkqDE=.7316e2f5-1434-486a-a492-d48c4352877f@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
 <wCzCmT6ysO0EAQgn5C6ZbDTTZxGk8Bf0JnuKxjOfUQQ=.e6be3695-4027-4845-a2a2-d16b39290f75@github.com>
 <LmumNYBTtfpHdsaUSOYaYA65QtVIiewuGNF2Jxk7-kI=.e6199a44-78c7-492b-89ee-700a406aa642@github.com>
 <ekptcSWurL7TCI9UQg9oUm261suMGQedFTB-s86XC_w=.2eb23e6e-c51e-45d9-ac4f-305fc97c9be1@github.com>
 <smw2ow_lpXCXpH6gI8RI_bgfEEWfOUL8l1Th75EkqDE=.7316e2f5-1434-486a-a492-d48c4352877f@github.com>
Message-ID: <P_Y3_iIoH99Qj1lvgehRTP_ibi9QU-fENOzmkK6gDxU=.de280620-0f33-48a9-93e7-f861ff3eb0e8@github.com>

On Tue, 2 Nov 2021 19:02:51 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> What is missing, I think, is a check (size > arenaSize) at the beginning of the method (we only check this in one of the paths). But we need to check before and after, I think, as it is possible to allocate a segment and then realize that we ended up overflowing the arena size.
>
> While what I said above correctly reflects what the implementation does, I think a broader issue is that the arena allocator implementation sometimes allocates more native memory than its contract specifies. While in some cases we can prevent that, I think in the general case (e.g. where we allocate a new block) we cannot, unless we add extra API guarantees - e.g. that the arena size should be a multiple of the block size (but then we'd have to special-case `Long.MAX_VALUE`, or maybe pick a "big enough" power of two instead)

Maybe we should not support a block size in the case of a bounded arena, i.e. just allocate the whole thing upfront, and have 3 APIs:

1. arena with no bounds and default block size.
2. arena with no bounds and custom block size.
3. arena with bounds, that has no block size but allocates the whole thing in one go (could be modeled as block size = arena size).

Right now we have 1. and 2., but instead of 3. we have a variant that allows setting both the arena size and block size.

If we want to keep what we currently have, I'd suggest changing the arena size to a block count for the variant that takes both the arena size and the block size (I think in that case `Long.MAX_VALUE` should still work?).

Anyway, that seems like something that could be addressed in 19 as well.
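
A rough sketch of how these variants could be modeled with a block count as the bound. Everything here (`ArenaConfigSketch`, the factory names, the 4096-byte default) is hypothetical and only meant to show that a block-count bound never needs rounding, and that `Long.MAX_VALUE` can keep meaning "unbounded".

```java
// Hypothetical configuration model; none of these names exist in jdk.incubator.foreign.
final class ArenaConfigSketch {
    static final long DEFAULT_BLOCK_SIZE = 4096; // assumed default, for illustration only

    final long blockSize;
    final long maxBlocks; // Long.MAX_VALUE stands for "no bound"

    private ArenaConfigSketch(long blockSize, long maxBlocks) {
        this.blockSize = blockSize;
        this.maxBlocks = maxBlocks;
    }

    // 1. no bounds, default block size
    static ArenaConfigSketch unbounded() {
        return new ArenaConfigSketch(DEFAULT_BLOCK_SIZE, Long.MAX_VALUE);
    }

    // 2. no bounds, custom block size
    static ArenaConfigSketch unbounded(long blockSize) {
        return new ArenaConfigSketch(blockSize, Long.MAX_VALUE);
    }

    // 3. bounded: one block that is the whole arena (block size = arena size)
    static ArenaConfigSketch bounded(long arenaSize) {
        return new ArenaConfigSketch(arenaSize, 1);
    }

    // Alternative: keep both parameters, but express the bound as a block count
    static ArenaConfigSketch boundedByBlocks(long blockSize, long blockCount) {
        return new ArenaConfigSketch(blockSize, blockCount);
    }

    /** A new block may be reserved only while the block budget is not exhausted. */
    boolean canReserveBlock(long blocksAlreadyReserved) {
        return blocksAlreadyReserved < maxBlocks;
    }

    public static void main(String[] args) {
        ArenaConfigSketch config = boundedByBlocks(64, 2);
        System.out.println(config.canReserveBlock(1)); // second block still allowed
        System.out.println(config.canReserveBlock(2)); // third block would exceed the bound
    }
}
```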

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Tue Nov  2 21:33:46 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 2 Nov 2021 21:33:46 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v14]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <supho3PloghhSdg03onlnilrfi3lZ2JowsQ70DkJPd8=.d4fc30ce-243f-4d35-95c9-ba48cd6afc26@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Fix long comment line in Module.java

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/1126133a..c219ae12

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=13
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=12-13

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From duke at openjdk.java.net  Tue Nov  2 23:47:26 2021
From: duke at openjdk.java.net (Joshua Cao)
Date: Tue, 2 Nov 2021 23:47:26 GMT
Subject: RFR: 8274860: gcc 10.2.1 produces an uninitialized warning in
 sharedRuntimeTrig.cpp
Message-ID: <wmAuCknOmuK3H_2Fw8kJ95QKPEJBWI-Hlr19IDZIZaA=.0497fe7b-cb66-4d22-8c76-024b8d1b3288@github.com>

Initialize `fq` to an array of zeroes.

-------------

Commit messages:
 - 8274860: gcc 10.2.1 produces an uninitialized warning in sharedRuntimeTrig.cpp

Changes: https://git.openjdk.java.net/jdk/pull/6220/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6220&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8274860
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6220.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6220/head:pull/6220

PR: https://git.openjdk.java.net/jdk/pull/6220

From manc at openjdk.java.net  Wed Nov  3 01:07:28 2021
From: manc at openjdk.java.net (Man Cao)
Date: Wed, 3 Nov 2021 01:07:28 GMT
Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in
 fastdebug build
Message-ID: <xaWM4bbpTGc-F4Rj5OEo63U2uiIcV2TtaE5qVoMhm0Y=.4e24408f-1819-403b-9be6-49549d476947@github.com>

Hi all,

Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details.
If the direction of this change looks good, we can proceed with removing the "UGLY HACK" in c1_LIR.hpp and refactoring occurrences of "opr->fn()" to "opr.fn()".

-------------

Commit messages:
 - Add _value field and rename LIR_OprDesc to LIR_Opr

Changes: https://git.openjdk.java.net/jdk/pull/6221/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276453
  Stats: 287 lines in 25 files changed: 23 ins; 16 del; 248 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6221.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221

PR: https://git.openjdk.java.net/jdk/pull/6221

From dholmes at openjdk.java.net  Wed Nov  3 01:48:14 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 3 Nov 2021 01:48:14 GMT
Subject: RFR: 8274860: gcc 10.2.1 produces an uninitialized warning in
 sharedRuntimeTrig.cpp
In-Reply-To: <wmAuCknOmuK3H_2Fw8kJ95QKPEJBWI-Hlr19IDZIZaA=.0497fe7b-cb66-4d22-8c76-024b8d1b3288@github.com>
References: <wmAuCknOmuK3H_2Fw8kJ95QKPEJBWI-Hlr19IDZIZaA=.0497fe7b-cb66-4d22-8c76-024b8d1b3288@github.com>
Message-ID: <YFoUZLyq6k9SopnPkS0idK_1OIa8Sbn-POJA40VzOh8=.b03d249c-2028-4849-b8ae-4a28b0f1fc6f@github.com>

On Tue, 2 Nov 2021 23:39:48 GMT, Joshua Cao <duke at openjdk.java.net> wrote:

> Initialize `fq` to an array of zeroes.

Hi Joshua,

This warning looks like a false positive to me. I'd prefer to see the warning disabled rather than make a change to highly optimised math code.

Cheers,
David

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6220

From njian at openjdk.java.net  Wed Nov  3 03:15:15 2021
From: njian at openjdk.java.net (Ningsheng Jian)
Date: Wed, 3 Nov 2021 03:15:15 GMT
Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator)
 [v7]
In-Reply-To: <OpWaNZuhL36S1Co2ItS-TiqRn7IIzEe_GF6MEAwZzwk=.9107d865-8b13-4a23-b381-424f7b017e95@github.com>
References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
 <OpWaNZuhL36S1Co2ItS-TiqRn7IIzEe_GF6MEAwZzwk=.9107d865-8b13-4a23-b381-424f7b017e95@github.com>
Message-ID: <4RJyhhtKPTjcJ894CoYqMYX0RdAsjRj0wwDcug9x4I8=.12d8e963-dc36-4cce-ad1b-241188dadd7b@github.com>

On Wed, 27 Oct 2021 21:42:29 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE.
>> 
>> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend.
>> 
>> Masked loads/stores are a special form of masked operation that require additional care to ensure out-of-bounds access throw exceptions. The range checking has not been fully optimized and will require further work.
>> 
>> No API enhancements were required and only a few additional tests were needed.
>
> Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
> 
>  - Merge branch 'master' into JDK-8271515-vector-api
>  - Merge pull request #1 from nsjian/JDK-8271515
>    
>    Address AArch64 review comments from Nick.
>  - Address review comments from Nick.
>  - Merge branch 'master' into JDK-8271515-vector-api
>  - Resolve review comments.
>  - Merge branch 'master' into JDK-8271515-vector-api
>  - Apply patch from https://github.com/openjdk/panama-vector/pull/152
>  - Apply patch from https://github.com/openjdk/panama-vector/pull/142
>  - Apply patch from https://github.com/openjdk/panama-vector/pull/139
>  - Apply patch from https://github.com/openjdk/panama-vector/pull/151
>  - ... and 2 more: https://git.openjdk.java.net/jdk/compare/9a3e9542...c9a77225

src/hotspot/cpu/aarch64/aarch64_sve_ad.m4 line 2349:

> 2347:     BasicType to_bt = Matcher::vector_element_basic_type(this);
> 2348:     Assembler::SIMD_RegVariant to_size = __ elemType_to_regVariant(to_bt);
> 2349:     __ sve_fcvtzs(as_FloatRegister($dst$$reg), __ D, ptrue, as_FloatRegister($src$$reg), __ D);

Converting from double to long and then narrowing to the target type does not follow the JLS. I will fix it. Thanks to @fg1417 for helping to find this issue.
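
For reference, a small scalar Java sketch of why the two-step conversion differs from a direct cast. The class and method names are only illustrative: a direct `double` to `int` cast saturates at the `int` range (JLS 5.1.3), while going through `long` first and then narrowing keeps only the low 32 bits.

```java
// Illustrative scalar model of the two conversion orders; not the SVE code itself.
final class NarrowingSketch {
    /** Direct narrowing: out-of-range values saturate to Integer.MIN_VALUE/MAX_VALUE. */
    static int direct(double d) {
        return (int) d;
    }

    /** Two-step narrowing: double -> long, then the long is truncated to its low 32 bits. */
    static int viaLong(double d) {
        return (int) (long) d;
    }

    public static void main(String[] args) {
        double d = 1e10; // larger than Integer.MAX_VALUE, but representable as a long
        System.out.println(direct(d));  // 2147483647
        System.out.println(viaLong(d)); // 1410065408
    }
}
```

For values that already fit in an `int`, both orders give the same result, so the difference only shows up for out-of-range inputs.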

-------------

PR: https://git.openjdk.java.net/jdk/pull/5873

From shade at openjdk.java.net  Wed Nov  3 09:30:13 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 3 Nov 2021 09:30:13 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To: <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
Message-ID: <NDPkyE6ezHY0-KA97DnparSocaSCliKJtIMh84Uoluk=.b4fdf5ee-2c84-4f27-ae70-f4a048410100@github.com>

On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>> 
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>> 
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>> 
>> 
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
>> 
>> Benchmark                   Mode  Cnt       Score     Error   Units
>> 
>> ### Before
>> 
>> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
>> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
>> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
>> 
>> 
>> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
>> 
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>> 
>> ### After
>> 
>> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
>> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
>> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
>> 
>> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
>> 
>> 
>> Additional testing:
>>  - [x] `StrictMath` benchmarks
>>  - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math`
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Keep intrinsics on StrictMath

Thanks! I re-ran the tests, they seem to be fine. I need a second (R)eviewer for this.
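
As a quick sanity check of the premise in the description, a minimal sketch comparing `StrictMath` and `Math` results bit for bit. The class and helper names are made up for illustration; the check relies on `min`/`max` having identical specifications in both classes and on `sqrt` being specified as correctly rounded in both.

```java
// Illustrative comparison helper; not part of the patch or its tests.
final class StrictMathAgreementSketch {
    /** Bitwise comparison, so that -0.0 vs 0.0 and NaN are handled explicitly. */
    static boolean sameBits(double x, double y) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
    }

    static boolean minMaxAgree(double a, double b) {
        return sameBits(StrictMath.min(a, b), Math.min(a, b))
            && sameBits(StrictMath.max(a, b), Math.max(a, b));
    }

    static boolean sqrtAgrees(double a) {
        return sameBits(StrictMath.sqrt(a), Math.sqrt(a));
    }

    public static void main(String[] args) {
        System.out.println(minMaxAgree(-0.0, 0.0));
        System.out.println(minMaxAgree(Double.NaN, 1.0));
        System.out.println(sqrtAgrees(2.0));
    }
}
```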

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From aph at openjdk.java.net  Wed Nov  3 09:37:16 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 3 Nov 2021 09:37:16 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To: <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
Message-ID: <O68RUcvNXC9M6IZW4Jxqn64xvy1QrfWItLypiuip7r4=.24566bef-15f2-4ff6-9be8-391a9d4f49c8@github.com>

On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>> 
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>> 
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>> 
>> 
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
>> 
>> Benchmark                   Mode  Cnt       Score     Error   Units
>> 
>> ### Before
>> 
>> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
>> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
>> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
>> 
>> 
>> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
>> 
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>> 
>> ### After
>> 
>> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
>> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
>> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
>> 
>> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
>> 
>> 
>> Additional testing:
>>  - [x] `StrictMath` benchmarks
>>  - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math`
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Keep intrinsics on StrictMath

Marked as reviewed by aph (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From duke at openjdk.java.net  Wed Nov  3 10:03:20 2021
From: duke at openjdk.java.net (duke)
Date: Wed, 3 Nov 2021 10:03:20 GMT
Subject: Withdrawn: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI
In-Reply-To: <NWmWajlQeuKETqpR6QIpXNLZcZMkLxfRCCEXyGFAqGU=.aeab6312-c18a-47e6-9842-f0f79f0bc9d1@github.com>
References: <NWmWajlQeuKETqpR6QIpXNLZcZMkLxfRCCEXyGFAqGU=.aeab6312-c18a-47e6-9842-f0f79f0bc9d1@github.com>
Message-ID: <isG4hB9mrdfGQBJr_igU_hP_vH4UFebxAU116pbTPjg=.2731ff7a-d56f-4426-9a54-67f9c2083d7b@github.com>

On Wed, 1 Sep 2021 18:03:11 GMT, Tom Rodriguez <never at openjdk.org> wrote:

> This evacuates all JVMCI related methods and fields into a separately declared struct.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5339

From aph at openjdk.java.net  Wed Nov  3 10:17:32 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 3 Nov 2021 10:17:32 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID: <LNGvasi-FTT0PVUabQ1H-Ci9EPKxiz7RP4LSwimNGvw=.4ee2ee91-288e-4883-b1a7-33735fab6781@github.com>

On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
> 
> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
> 
> 
>     // Call the interpreter
>     if (JvmtiExport::can_post_interpreter_events()) {
>       BytecodeInterpreter::run<true>(istate);
>     } else {
>       BytecodeInterpreter::run<false>(istate);
>     } 
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `make bootcycle-images`

Marked as reviewed by aph (Reviewer).

src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 417:

> 415: #define THREAD istate->thread()
> 416: #endif
> 417: 

This is a weirdly-hacky optimization, and is perhaps obsolete on modern compilers. While simplifying, I'd take it out.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6029

From shade at openjdk.java.net  Wed Nov  3 10:17:33 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 3 Nov 2021 10:17:33 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <LNGvasi-FTT0PVUabQ1H-Ci9EPKxiz7RP4LSwimNGvw=.4ee2ee91-288e-4883-b1a7-33735fab6781@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
 <LNGvasi-FTT0PVUabQ1H-Ci9EPKxiz7RP4LSwimNGvw=.4ee2ee91-288e-4883-b1a7-33735fab6781@github.com>
Message-ID: <z0-U4N4affmXYuOV-rwS9aza_9ypahliRd0kRZS6AfY=.09920197-4844-415b-8ab6-afb2eae984a1@github.com>

On Wed, 3 Nov 2021 10:12:19 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
>> 
>> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
>> 
>> 
>>     // Call the interpreter
>>     if (JvmtiExport::can_post_interpreter_events()) {
>>       BytecodeInterpreter::run<true>(istate);
>>     } else {
>>       BytecodeInterpreter::run<false>(istate);
>>     } 
>> 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `make bootcycle-images`
>
> src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 417:
> 
>> 415: #define THREAD istate->thread()
>> 416: #endif
>> 417: 
> 
> This is a weirdly-hacky optimization, and is perhaps obsolete on modern compilers. While simplifying, I'd take it out.

I remember following up on this whole `LOTS_OF_REGS` mess, and it seems still profitable. I can take a look in a separate RFE, OK?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6029

From aph at openjdk.java.net  Wed Nov  3 11:02:18 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 3 Nov 2021 11:02:18 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <z0-U4N4affmXYuOV-rwS9aza_9ypahliRd0kRZS6AfY=.09920197-4844-415b-8ab6-afb2eae984a1@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
 <LNGvasi-FTT0PVUabQ1H-Ci9EPKxiz7RP4LSwimNGvw=.4ee2ee91-288e-4883-b1a7-33735fab6781@github.com>
 <z0-U4N4affmXYuOV-rwS9aza_9ypahliRd0kRZS6AfY=.09920197-4844-415b-8ab6-afb2eae984a1@github.com>
Message-ID: <ybBq0Lu3sCWElmTwJpKGYAgszB7X3byDlTKrUvcUHaM=.7afeea9a-4585-4fdb-bea3-267e5cccf0ba@github.com>

On Wed, 3 Nov 2021 10:14:24 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 417:
>> 
>>> 415: #define THREAD istate->thread()
>>> 416: #endif
>>> 417: 
>> 
>> This is a weirdly-hacky optimization, and is perhaps obsolete on modern compilers. While simplifying, I'd take it out.
>
> I remember following up on this whole `LOTS_OF_REGS` mess, and it seems still profitable. I can take a look in a separate RFE, OK?

OK.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6029

From mcimadamore at openjdk.java.net  Wed Nov  3 11:32:50 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Wed, 3 Nov 2021 11:32:50 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v15]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <3l5SgC7qqzs4wj1leQ3TKp4gqDMXozx6W6bUxO1wlTA=.5db656f9-98d2-474a-918a-f076e63be127@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Simplify ArenaAllocator impl.
  The arena should respect its boundaries and never allocate more memory than its size specifies.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/c219ae12..7f847271

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=14
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=13-14

  Stats: 40 lines in 1 file changed: 8 ins; 15 del; 17 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From david.holmes at oracle.com  Wed Nov  3 12:09:52 2021
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 3 Nov 2021 22:09:52 +1000
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <gl_aD-XD3JMqev_LazMh-ZL0JYhVrOaS_ii_xKHSpAc=.a9cbb293-f1f6-4528-a5eb-ebfaa476f167@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
 <gl_aD-XD3JMqev_LazMh-ZL0JYhVrOaS_ii_xKHSpAc=.a9cbb293-f1f6-4528-a5eb-ebfaa476f167@github.com>
Message-ID: <90446117-abb2-d26b-0396-be21a6387252@oracle.com>

On 1/11/2021 7:16 pm, Aleksey Shipilev wrote:
> On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
> 
>> This is another instance of counter updates that only need atomic guarantee.
> 
> (I am not arguing in favor or against this particular change, but I think we can talk a bit about generic stuff here...)
> 
>> I don't know where this guarantee is coming from. Two r-m-w atomic ops must have some guarantee via coherence for the atomic op to actually work. And an implementation could make any atomic r-m-w implementation ensure global immediate visibility. But you cannot assume this is guaranteed for all hardware. Even for a given platform this would need to be a specified guarantee in the architecture manual, not just something deduced/inferred by reasoning.
> 
> Hotspot's `memory_order_relaxed` is [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45) with C++11 atomics semantics. C++11 atomic semantics for relaxed atomic ops requires [single modification order consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering), which implies [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order).
> 
> All known hardware platforms provide coherence out of the box (they are, indeed, cache-coherent platforms), which is why it is easy to implement in C++ (`mo_relaxed`) and in Java (`VarHandle.(get|set)Opaque`).
> 
> I am always confused by "immediate global visibility". The problem with statements that include "immediate", "before", "after" is that they leak in the notion of time, which is ill-defined for a single memory location without any reference to other variables. Maybe you can expand your concern with the example?

Let me back up to be clear. I stated that memory-order-conservative 
might lower the chances (in a general platform-agnostic way) of seeing a 
stale value, compared to memory-order-relaxed, due to the stronger 
memory fence/barrier operation it implies. The response to that was:

"value updated via atomic r-m-w operation should be visible to other 
threads guaranteed by coherence protocol"

claiming that visibility guarantees were inherently present due to 
coherence regardless of what kind of memory fence/barrier were 
associated with the r-m-w atomic operation. I'm not sure if that is 
actually true. If it is true then we would not need any memory-order 
parameter on the r-m-w atomic operations because they would all be 
the same due to this underlying coherence property.

When I said "immediate global visibility" I was referring to a situation 
where once the write in the r-m-w atomic op had occurred then all 
subsequent reads would see the value of that write. It is true that such 
a thing may not require "immediacy" in a temporal sense, but the net 
effect is the same.

David
-----

> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/6065
> 

From david.holmes at oracle.com  Wed Nov  3 12:23:47 2021
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 3 Nov 2021 22:23:47 +1000
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <90446117-abb2-d26b-0396-be21a6387252@oracle.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
 <gl_aD-XD3JMqev_LazMh-ZL0JYhVrOaS_ii_xKHSpAc=.a9cbb293-f1f6-4528-a5eb-ebfaa476f167@github.com>
 <90446117-abb2-d26b-0396-be21a6387252@oracle.com>
Message-ID: <e4f6ad65-d75f-6495-33de-cdf7e79da229@oracle.com>

Correction ...

On 3/11/2021 10:09 pm, David Holmes wrote:
> On 1/11/2021 7:16 pm, Aleksey Shipilev wrote:
>> On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
>>
>>> This is another instance of counter updates that only need atomic 
>>> guarantee.
>>
>> (I am not arguing in favor or against this particular change, but I 
>> think we can talk a bit about generic stuff here...)
>>
>>> I don't know where this guarantee is coming from. Two r-m-w atomic 
>>> ops must have some guarantee via coherence for the atomic op to 
>>> actually work. And an implementation could make any atomic r-m-w 
>>> implementation ensure global immediate visibility. But you cannot 
>>> assume this is guaranteed for all hardware. Even for a given platform 
>>> this would need to be a specified guarantee in the architecture 
>>> manual, not just something deduced/inferred by reasoning.
>>
>> Hotspot's `memory_order_relaxed` is 
>> [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45) 
>> with C++11 atomics semantics. C++11 atomic semantics for relaxed 
>> atomic ops requires [single modification order 
>> consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering), 
>> which implies 
>> [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order). 
>>
>>
>> All known hardware platforms provide coherence out of the box (they 
>> are, indeed, cache-coherent platforms), which is why it is easy to 
>> implement in C++ (`mo_relaxed`) and in Java 
>> (`VarHandle.(get|set)Opaque`).
>>
>> I am always confused by "immediate global visibility". The problem 
>> with statements that include "immediate", "before", "after" is that 
>> they leak in the notion of time, which is ill-defined for a single 
>> memory location without any reference to other variables. Maybe you 
>> can expand your concern with the example?
> 
> Let me back up to be clear. I stated that memory-order-conservative 
> might lower the chances (in a general platform-agnostic way) of seeing a 
> stale value, compared to memory-order-relaxed, due to the stronger 
> memory fence/barrier operation it implies. The response to that was:
> 
> "value updated via atomic r-m-w operation should be visible to other 
> threads guaranteed by coherence protocol"
> 
> claiming that visibility guarantees were inherently present due to 
> coherence regardless of what kind of memory fence/barrier were 
> associated with the r-m-w atomic operation. I'm not sure if that is 
> actually true. If it is true then we would not need any memory-order 
> parameter on the r-m-w atomic operations because they would all be 
> the same due to this underlying coherence property.

No that isn't true. I see now that the C++ "Modification Order" 
definition requires the write to the counter to be (for want of a better 
term) "immediately visible" to any subsequent read - so no stale value 
could be read. That is a far stronger guarantee than I expected from 
mo_relaxed. The use of other mo values on the r-m-w atomic operation 
impacts the ordering between that variable and other atomic variables.
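
As an illustration of that last point, here is a minimal sketch in standard C++ (not HotSpot's `Atomic` API) of a case where the memory order on the atomic does matter, because a second memory location is involved. The function name `publish_and_read` and the value 42 are made up for the example; the payload here is a plain variable rather than another atomic, which is the more common form of the pattern.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: hand a plain value from one thread to another
// through an atomic flag.
inline int publish_and_read() {
  int payload = 0;                   // plain, non-atomic data
  std::atomic<bool> ready{false};

  std::thread producer([&] {
    payload = 42;                                   // write the data first
    ready.store(true, std::memory_order_release);   // then publish it
  });

  // The acquire load pairs with the release store, so once the flag is
  // seen as true, the earlier write to payload is visible as well.
  while (!ready.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  int seen = payload;

  producer.join();
  return seen;
}
```

With `memory_order_relaxed` on both sides the flag itself would still become visible, but the read of `payload` would be a data race. That is the kind of cross-variable ordering the stronger modes pay for, and a standalone counter does not need it.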

David
-----

> When I said "immediate global visibility" I was referring to a situation 
> where once the write in the r-m-w atomic op had occurred then all 
> subsequent reads would see the value of that write. It is true that such 
> a thing may not require "immediacy" in a temporal sense, but the net 
> effect is the same.
> 
> David
> -----
> 
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/6065
>>

From mcimadamore at openjdk.java.net  Wed Nov  3 13:08:55 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Wed, 3 Nov 2021 13:08:55 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v16]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <rOalCbBDAEN_vlQC0lNqzPsbsSDwbM_yfuVd5WPwrLU=.04da3f41-1fc9-4cfa-bc75-2483e37e2bdc@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Make ArenaAllocator impl more flexible in the face of OOME
  An ArenaAllocator should remain open for business, even if OOME is thrown in case other allocations can fit the arena size.
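
The behaviour described in this and the previous commit message can be sketched as a bounded bump allocator. This is only an analogy written in C++: `ToyArena` is a hypothetical class, it returns `nullptr` where the Java implementation would throw `OutOfMemoryError`, and it ignores alignment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bounded arena: never hands out more than `size` bytes in
// total, and a rejected request leaves it usable for smaller ones.
class ToyArena {
 public:
  explicit ToyArena(std::size_t size) : storage_(size), used_(0) {}

  // Returns nullptr if the request does not fit in the remaining space.
  // The arena state is unchanged in that case.
  void* allocate(std::size_t bytes) {
    if (bytes > storage_.size() - used_) {
      return nullptr;
    }
    void* p = storage_.data() + used_;
    used_ += bytes;
    return p;
  }

  std::size_t remaining() const { return storage_.size() - used_; }

 private:
  std::vector<unsigned char> storage_;  // backing memory, fixed at construction
  std::size_t used_;                    // bytes handed out so far
};
```

For a 16-byte arena, an 8-byte request succeeds, a following 16-byte request is rejected, and a further 8-byte request still succeeds.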

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/7f847271..9fafb2a6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=15
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=14-15

  Stats: 13 lines in 2 files changed: 3 ins; 6 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From jvernee at openjdk.java.net  Wed Nov  3 13:40:16 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Wed, 3 Nov 2021 13:40:16 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v16]
In-Reply-To: <rOalCbBDAEN_vlQC0lNqzPsbsSDwbM_yfuVd5WPwrLU=.04da3f41-1fc9-4cfa-bc75-2483e37e2bdc@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <rOalCbBDAEN_vlQC0lNqzPsbsSDwbM_yfuVd5WPwrLU=.04da3f41-1fc9-4cfa-bc75-2483e37e2bdc@github.com>
Message-ID: <hJuLXwbM4GqJmdmYNuRXLnah0y4nowhk68uLdoKCFcw=.d90d275b-41f2-4708-b561-620bbc3f31d5@github.com>

On Wed, 3 Nov 2021 13:08:55 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make ArenaAllocator impl more flexible in the face of OOME
>   An ArenaAllocator should remain open for business, even if OOME is thrown in case other allocations can fit the arena size.

Marked as reviewed by jvernee (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From zgu at openjdk.java.net  Wed Nov  3 16:54:26 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Wed, 3 Nov 2021 16:54:26 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
Message-ID: <XGrFsvU2SFatQ8E0elOJUq2IG83jvTTA9yq98TD842k=.5ae4c3ea-ded7-43a6-9982-f97007687f0f@github.com>

On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> This is another instance of counter updates that only need atomic guarantee.

> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_
> 
> Correction ...
> 
> On 3/11/2021 10:09 pm, David Holmes wrote:
> 
> > On 1/11/2021 7:16 pm, Aleksey Shipilev wrote:
> > > On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
> > > > This is another instance of counter updates that only need atomic
> > > > guarantee.
> > > 
> > > 
> > > (I am not arguing in favor or against this particular change, but I
> > > think we can talk a bit about generic stuff here...)
> > > > I don't know where this guarantee is coming from. Two r-m-w atomic
> > > > ops must have some guarantee via coherence for the atomic op to
> > > > actually work. And an implementation could make any atomic r-m-w
> > > > implementation ensure global immediate visibility. But you cannot
> > > > assume this is guaranteed for all hardware. Even for a given platform
> > > > this would need to be a specified guarantee in the architecture
> > > > manual, not just something deduced/inferred by reasoning.
> > > 
> > > 
> > > Hotspot's `memory_order_relaxed` is
> > > [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45)
> > > with C++11 atomics semantics. C++11 atomic semantics for relaxed
> > > atomic ops requires [single modification order
> > > consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering),
> > > which implies
> > > [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order).
> > > All known hardware platforms provide coherence out of the box (they
> > > are, indeed, cache-coherent platforms), which is why it is easy to
> > > implement in C++ (`mo_relaxed`) and in Java
> > > (`VarHandle.(get|set)Opaque`).
> > > I am always confused by "immediate global visibility". The problem
> > > with statements that include "immediate", "before", "after" is that
> > > they leak in the notion of time, which is ill-defined for a single
> > > memory location without any reference to other variables. Maybe you
> > > can expand your concern with the example?
> > 
> > 
> > Let me back up to be clear. I stated that memory-order-conservative
> > might lower the chances (in a general platform-agnostic way) of seeing a
> > stale value, compared to memory-order-relaxed, due to the stronger
> > memory fence/barrier operation it implies. The response to that was:
> > "value updated via atomic r-m-w operation should be visible to other
> > threads guaranteed by coherence protocol"
> > claiming that visibility guarantees were inherently present due to
> > coherence regardless of what kind of memory fence/barrier were
> > associated with the r-m-w atomic operation. I'm not sure if that is
> > actually true. If it is true then we would not need any memory-order
> > parameter on the r-m-w atomic operations because they would all be
> > the same due to this underlying coherence property.
> 
> No that isn't true. I see now that the C++ "Modification Order" definition requires the write to the counter to be (for want of a better term) "immediately visible" to any subsequent read - so no stale value could be read. That is a far stronger guarantee than I expected from mo_relaxed. The use of other mo values on the r-m-w atomic operation impacts the ordering between that variable and other atomic variables.
> 
> David -----
>
Yes, for this single-location atomic counter, there is no ordering involved. The counters are not hot, but stricter memory ordering constraints do not add any value either.
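
A minimal sketch of the single-location case, using standard C++ atomics rather than HotSpot's `Atomic::inc` (the helper name `count_relaxed` is hypothetical): several threads bump one counter with relaxed ordering, and because every read-modify-write takes its place in that variable's modification order, no increment is lost.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: increment one shared counter from several threads
// using relaxed read-modify-write operations and return the final value.
inline long count_relaxed(int num_threads, int increments_per_thread) {
  std::atomic<long> counter{0};
  std::vector<std::thread> workers;

  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, increments_per_thread] {
      for (int i = 0; i < increments_per_thread; ++i) {
        // Atomicity is all that is needed; no other data is published.
        counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& w : workers) {
    w.join();  // join() orders the final read after all increments
  }
  return counter.load(std::memory_order_relaxed);
}
```

Calling `count_relaxed(4, 10000)` yields 40000; a stronger memory order would not change the count.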

Are you okay with this change?

Thanks,

-Zhengyu
 
> > When I said "immediate global visibility" I was referring to a situation
> > where once the write in the r-m-w atomic op had occurred then all
> > subsequent reads would see the value of that write. It is true that such
> > a thing may not require "immediacy" in a temporal sense, but the net
> > effect is the same.
> > David
> > -----

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From mcimadamore at openjdk.java.net  Wed Nov  3 17:40:56 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Wed, 3 Nov 2021 17:40:56 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v17]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <QQd23tEy3_2A9iKOH825nfYiNU63aRp2gdPu2EyfwFM=.bfffd2de-bf56-4ce7-8aca-87be09f1058c@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Fix TestUpcall
  * reverse() has a bug, as it doesn't tweak parameter types
  * reverse() is applied to the wrong MH

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/9fafb2a6..b9432473

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=16
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=15-16

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From shade at openjdk.java.net  Wed Nov  3 17:42:19 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 3 Nov 2021 17:42:19 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To: <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
 <x0dEIeACmCL8uU35Mbr_ej-ZMRgI9Ye-dE_APsoZYXs=.d90877ae-d6a1-48d1-b8fa-e235509a492b@github.com>
Message-ID: <8z4CwkkYxAh283DZApwKTUKeqHgrohjezFmCX49g1dU=.f347492f-f369-40da-bacf-573f9ec9a997@github.com>

On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>> 
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>> 
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>> 
>> 
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
>> 
>> Benchmark                   Mode  Cnt       Score     Error   Units
>> 
>> ### Before
>> 
>> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
>> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
>> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
>> 
>> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>> 
>> ### After
>> 
>> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
>> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
>> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
>> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
>> 
>> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
>> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
>> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
>> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
>> 
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
>> 
>> 
>> Additional testing:
>>  - [x] `StrictMath` benchmarks
>>  - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math`
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Keep intrinsics on StrictMath

Thanks! I am going to push this tomorrow morning, if no other comments show up.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From kvn at openjdk.java.net  Wed Nov  3 19:00:23 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 3 Nov 2021 19:00:23 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure
Message-ID: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>

Currently we pass several compilation options as separate arguments to `Compile`: 

Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 

Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more. 

I suggest adding a new `Options` class to pass these values into `Compile`.
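
A rough sketch of the idea, to show the shape rather than the actual patch: the field names below are taken from the argument list above, while the `ToyCompile` class, the default values, and the accessors are assumptions made for illustration.

```cpp
// Hypothetical options bundle. Field names follow the arguments listed in
// the message; the default values are assumed.
struct Options {
  bool subsume_loads = true;
  bool do_escape_analysis = true;
  bool eliminate_boxing = true;
  bool do_locks_coarsening = true;
  bool install_code = true;
};

// Stand-in for a consumer such as Compile: one parameter replaces five,
// and adding a new option no longer changes the constructor signature.
class ToyCompile {
 public:
  ToyCompile(int entry_bci, Options options)
      : entry_bci_(entry_bci), options_(options) {}

  int entry_bci() const { return entry_bci_; }
  bool subsume_loads() const { return options_.subsume_loads; }
  bool do_escape_analysis() const { return options_.do_escape_analysis; }
  bool install_code() const { return options_.install_code; }

 private:
  int entry_bci_;
  Options options_;  // stored by value; immutable for this compilation
};
```

A caller would fill in an `Options` value, change only the fields that differ from the defaults, and pass it along, for example `Options o; o.do_escape_analysis = false; ToyCompile c(0, o);`.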

-------------

Commit messages:
 - 8276571: C2: pass compilation options as structure

Changes: https://git.openjdk.java.net/jdk/pull/6237/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276571
  Stats: 66 lines in 4 files changed: 30 ins; 15 del; 21 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6237.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6237/head:pull/6237

PR: https://git.openjdk.java.net/jdk/pull/6237

From darcy at openjdk.java.net  Wed Nov  3 21:06:31 2021
From: darcy at openjdk.java.net (Joe Darcy)
Date: Wed, 3 Nov 2021 21:06:31 GMT
Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
Message-ID: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>

I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances.

-------------

Commit messages:
 - JDK-8276588: Change "ccc" to "CSR" in HotSpot sources

Changes: https://git.openjdk.java.net/jdk/pull/6240/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6240&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276588
  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6240.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6240/head:pull/6240

PR: https://git.openjdk.java.net/jdk/pull/6240

From dcubed at openjdk.java.net  Wed Nov  3 21:11:14 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Wed, 3 Nov 2021 21:11:14 GMT
Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
In-Reply-To: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
References: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
Message-ID: <Y99w8IjPSvU8wbU7md-93k5A9SpPyPq-3vX7eMm77Z8=.a1ffea25-8489-4631-82c4-6663c13d4b53@github.com>

On Wed, 3 Nov 2021 20:58:23 GMT, Joe Darcy <darcy at openjdk.org> wrote:

> I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances.

Thumbs up.

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6240

From kbarrett at openjdk.java.net  Wed Nov  3 21:19:12 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Wed, 3 Nov 2021 21:19:12 GMT
Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
In-Reply-To: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
References: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
Message-ID: <Ijl2_UPhqBgx-BZ9m98RnbJZHKsxqKbUlmBrjtjs0sM=.bbc59704-900b-43c2-a851-2646cb72f87c@github.com>

On Wed, 3 Nov 2021 20:58:23 GMT, Joe Darcy <darcy at openjdk.org> wrote:

> I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances.

Marked as reviewed by kbarrett (Reviewer).

src/hotspot/share/oops/instanceKlass.cpp line 731:

> 729: }
> 730: 
> 731: // To remove these from requires an incompatible change and CSR review.

I don't know what this comment is trying to say; I think there might be missing words or something.  But the change for CCC -> CSR is fine.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6240

From darcy at openjdk.java.net  Wed Nov  3 21:23:15 2021
From: darcy at openjdk.java.net (Joe Darcy)
Date: Wed, 3 Nov 2021 21:23:15 GMT
Subject: Integrated: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
In-Reply-To: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
References: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
Message-ID: <yPGkMRuGG4kl2XiKdEREgMYdkAfRlePUBbi0AIhgbv8=.4d83e167-8601-446d-9d6f-cc50f4f4597d@github.com>

On Wed, 3 Nov 2021 20:58:23 GMT, Joe Darcy <darcy at openjdk.org> wrote:

> I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances.

This pull request has now been integrated.

Changeset: f3320d2f
Author:    Joe Darcy <darcy at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/f3320d2fbd28349fa5eab3ea0da0ff0a3ef54c62
Stats:     3 lines in 2 files changed: 0 ins; 0 del; 3 mod

8276588: Change "ccc" to "CSR" in HotSpot sources

Reviewed-by: dcubed, kbarrett

-------------

PR: https://git.openjdk.java.net/jdk/pull/6240

From dholmes at openjdk.java.net  Thu Nov  4 01:32:08 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 4 Nov 2021 01:32:08 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
Message-ID: <qgQMT1D05siyxMw3TTZFyaA37gfJNk9Vmg_QiMLilcI=.be6f87ea-9e8b-4586-ab20-9c500cc3b2a8@github.com>

On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> This is another instance of counter updates that only need atomic guarantee.

I'm not sure there is any actual benefit to this change, but I also do not see any harm. So okay.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6065

From dholmes at openjdk.java.net  Thu Nov  4 01:45:19 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 4 Nov 2021 01:45:19 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence [v2]
In-Reply-To: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
 <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com>
Message-ID: <q0qd4ZvSiC-rdV0oKp4CPkTeewJlr_iz-haeHUApnAE=.8ac5cf5e-d774-4245-a848-511b557fc260@github.com>

On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do. 
>> 
>> This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054:
>> 
>> 
>> Benchmark          Mode  Cnt  Score   Error  Units
>> 
>> # Default
>> Single.acquire     avgt    3   0.407 ± 0.060  ns/op
>> Single.full        avgt    3   4.693 ± 0.005  ns/op
>> Single.loadLoad    avgt    3   0.415 ± 0.095  ns/op
>> Single.plain       avgt    3   0.406 ± 0.002  ns/op
>> Single.release     avgt    3   0.408 ± 0.047  ns/op
>> Single.storeStore  avgt    3   0.408 ± 0.043  ns/op
>> 
>> # -XX:DisableIntrinsic=_storeFence
>> Single.acquire     avgt    3   0.408 ± 0.016  ns/op
>> Single.full        avgt    3   4.694 ± 0.002  ns/op
>> Single.loadLoad    avgt    3   0.406 ± 0.002  ns/op
>> Single.plain       avgt    3   0.406 ± 0.001  ns/op
>> Single.release     avgt    3   4.694 ± 0.003  ns/op <--- upgraded to full
>> Single.storeStore  avgt    3   4.690 ± 0.005  ns/op <--- upgraded to full
>> 
>> # -XX:DisableIntrinsic=_loadFence
>> Single.acquire     avgt    3   4.691 ± 0.001  ns/op <--- upgraded to full
>> Single.full        avgt    3   4.693 ± 0.009  ns/op
>> Single.loadLoad    avgt    3   4.693 ± 0.013  ns/op <--- upgraded to full
>> Single.plain       avgt    3   0.408 ± 0.072  ns/op
>> Single.release     avgt    3   0.415 ± 0.016  ns/op
>> Single.storeStore  avgt    3   0.416 ± 0.041  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence
>> Single.acquire     avgt    3   0.406 ± 0.014  ns/op
>> Single.full        avgt    3  15.836 ± 0.151  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
>> Single.plain       avgt    3   0.426 ± 0.361  ns/op
>> Single.release     avgt    3   0.407 ± 0.021  ns/op
>> Single.storeStore  avgt    3   0.410 ± 0.061  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence
>> Single.acquire     avgt    3  15.822 ± 0.282  ns/op <--- upgraded, calls runtime
>> Single.full        avgt    3  15.851 ± 0.127  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3  15.829 ± 0.045  ns/op <--- upgraded, calls runtime
>> Single.plain       avgt    3   0.406 ± 0.001  ns/op
>> Single.release     avgt    3   0.414 ± 0.156  ns/op
>> Single.storeStore  avgt    3   0.422 ± 0.452  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_storeFence
>> Single.acquire     avgt    3   0.407 ± 0.016  ns/op
>> Single.full        avgt    3  15.347 ± 6.783  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
>> Single.plain       avgt    3   0.406 ± 0.002  ns/op
>> Single.release     avgt    3  15.828 ± 0.019  ns/op <--- upgraded, calls runtime
>> Single.storeStore  avgt    3  15.834 ± 0.045  ns/op <--- upgraded, calls runtime
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
>> Single.acquire     avgt    3  15.838 ± 0.030  ns/op <--- upgraded, calls runtime
>> Single.full        avgt    3  15.854 ± 0.277  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3  15.826 ± 0.160  ns/op <--- upgraded, calls runtime
>> Single.plain       avgt    3   0.406 ± 0.003  ns/op
>> Single.release     avgt    3  15.838 ± 0.019  ns/op <--- upgraded, calls runtime
>> Single.storeStore  avgt    3  15.844 ± 0.104  ns/op <--- upgraded, calls runtime
>> 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Restore RN for fullFence

Marked as reviewed by dholmes (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From dholmes at openjdk.java.net  Thu Nov  4 02:13:19 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 4 Nov 2021 02:13:19 GMT
Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
In-Reply-To: <Ijl2_UPhqBgx-BZ9m98RnbJZHKsxqKbUlmBrjtjs0sM=.bbc59704-900b-43c2-a851-2646cb72f87c@github.com>
References: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
 <Ijl2_UPhqBgx-BZ9m98RnbJZHKsxqKbUlmBrjtjs0sM=.bbc59704-900b-43c2-a851-2646cb72f87c@github.com>
Message-ID: <NOqlt_187bGblvVAOKmY-OG_OLwxrM05J6mTyJxyQUE=.e1cf86ab-8c5c-45c3-b6d4-75ecd23b901e@github.com>

On Wed, 3 Nov 2021 21:15:29 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances.
>
> src/hotspot/share/oops/instanceKlass.cpp line 731:
> 
>> 729: }
>> 730: 
>> 731: // To remove these from requires an incompatible change and CSR review.
> 
> I don't know what this comment is trying to say; I think there might be missing words or something.  But the change for CCC -> CSR is fine.

Given the 'R' in CSR already stands for Review this should have said "CSR request".

But I also have no idea what the comment is actually trying to say - what is "these" referring to???

-------------

PR: https://git.openjdk.java.net/jdk/pull/6240

From duke at openjdk.java.net  Thu Nov  4 02:39:09 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Thu, 4 Nov 2021 02:39:09 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates
In-Reply-To: <XlVFZDTwwYvLPlMddZCPyMUlG-a_ryk-pYxniQBSQu8=.9f72283d-7d54-4839-b45a-3962138ff261@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
 <XlVFZDTwwYvLPlMddZCPyMUlG-a_ryk-pYxniQBSQu8=.9f72283d-7d54-4839-b45a-3962138ff261@github.com>
Message-ID: <wV-8SVh1lrTLWJuJpo9AOR1dvRJAySYtrFzR_wXl_gE=.eb17f752-6f6e-4b0a-9d72-bb601b263ce4@github.com>

On Tue, 26 Oct 2021 11:37:23 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> for(int i = 0; i < LENGTH; i++) {
>>       c[i] = a[i] + 2;
>>     }
>> 
>> For the case shown above, after superword optimization with SVE,
>> without the patch, the vector add operation always has 2 z-reg inputs,
>> like:
>> mov     z16.s, #2
>> add     z17.s, z17.s, z16.s
>> 
>> Considering that SVE supports basic binary operations with an immediate,
>> this pattern could be further optimized to:
>> add     z16.s, z16.s, #2
>> 
>> To implement it, we added some new match rules and assembler rules in
>> the aarch64 backend. We also made some extensions to the immediate types
>> and functions to remain backward compatible.
>> 
>> With the patch, only these binary integer vector operations, +(add),
>> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>> the optimization. Other vector operations are not supported currently.
>> 
>> Tested tier1 and test/hotspot/jtreg/compiler on an SVE-capable AArch64
>> CPU; no new failures.
>> 
>> There is no obvious performance uplift but it can help remove one
>> redundant mov instruction.
>
> I'd like you to split this patch into two parts, please.
> First, please use the new functions such as `Assembler::operand_valid_for_logical_immediate(bool is32, uint64_t imm)` only for SVE, leaving the existing logic in `Assembler` entirely untouched. This will cause some duplication, but that's OK. We can review changes to merge functionality in a separate patch. This will be much easier.

@theRealAph, could you please help approve it? Thanks for your time :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6115

From mli at openjdk.java.net  Thu Nov  4 05:16:33 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Thu, 4 Nov 2021 05:16:33 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
Message-ID: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>

Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.

The initial SPECjbb2015 test shows about a 10.5% improvement in critical and a 0.7% improvement in max. SPECjbb arguments:
  GROUP_COUNT=4
  TI_JVM_COUNT=1
  JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
  MODE_ARGS="-ikv"
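
For readers unfamiliar with the technique, here is a minimal standalone sketch of cache-line padding. It is not the change in this PR: `PaddedCounter`, `ThreadLike`, and the 64-byte `kCacheLineSize` are hypothetical names and values chosen for illustration, and HotSpot has its own padding helpers and per-platform cache line constant.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Real code should use the platform's value.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical stand-in for the counter. alignas raises the alignment to one
// cache line, and the struct size is rounded up to a multiple of that
// alignment, so the counter occupies a line of its own.
struct alignas(kCacheLineSize) PaddedCounter {
  std::atomic<std::uintptr_t> value{0};
};

// Hypothetical stand-in for Thread: the fields declared before and after the
// counter end up on different cache lines than the counter itself.
struct ThreadLike {
  int field_before;
  PaddedCounter rcu_counter;
  int field_after;
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "counter should fill exactly one cache line");
static_assert(alignof(PaddedCounter) == kCacheLineSize,
              "counter should start on a cache line boundary");
```

The trade-off is that the enclosing object grows and its other fields shift, which can change how those fields share cache lines.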

-------------

Commit messages:
 - Initial commit

Changes: https://git.openjdk.java.net/jdk/pull/6246/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6246&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276618
  Stats: 8 lines in 3 files changed: 2 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6246.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6246/head:pull/6246

PR: https://git.openjdk.java.net/jdk/pull/6246

From dholmes at openjdk.java.net  Thu Nov  4 06:19:09 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 4 Nov 2021 06:19:09 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID: <HvHvpajrdVqkR5KkKf39n4o-XsWV72s6iedYDQu_NVg=.1a1b2778-e0de-4179-af5a-bdf12cead26b@github.com>

On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
> 
> The initial SPECjbb2015 test shows about a 10.5% improvement in critical and a 0.7% improvement in max. SPECjbb arguments:
>   GROUP_COUNT=4
>   TI_JVM_COUNT=1
>   JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
>   MODE_ARGS="-ikv"

Hi Hamlin,

This seems reasonable to me, however whenever we add padding to optimise the placement of one field, I always wonder if that same padding has de-optimised the placement of other fields? I think we need to see a broader run of benchmarks here and across more than just x86_64.

I will see if I can assist on the benchmark front.

Thanks,
David

src/hotspot/share/runtime/thread.hpp line 253:

> 251: 
> 252:   // Support for GlobalCounter
> 253:  private:

pre-existing nit: this private is not needed; nor is the public at line 260.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From alanb at openjdk.java.net  Thu Nov  4 07:29:19 2021
From: alanb at openjdk.java.net (Alan Bateman)
Date: Thu, 4 Nov 2021 07:29:19 GMT
Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
In-Reply-To: <NOqlt_187bGblvVAOKmY-OG_OLwxrM05J6mTyJxyQUE=.e1cf86ab-8c5c-45c3-b6d4-75ecd23b901e@github.com>
References: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
 <Ijl2_UPhqBgx-BZ9m98RnbJZHKsxqKbUlmBrjtjs0sM=.bbc59704-900b-43c2-a851-2646cb72f87c@github.com>
 <NOqlt_187bGblvVAOKmY-OG_OLwxrM05J6mTyJxyQUE=.e1cf86ab-8c5c-45c3-b6d4-75ecd23b901e@github.com>
Message-ID: <GaM740gmpgFUGPZ_ntVamyU4bZ0jLqjq-Fh0LhjOs6w=.c8a935c2-93b5-4f64-9f56-f1297d73a5f4@github.com>

On Thu, 4 Nov 2021 02:10:37 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/oops/instanceKlass.cpp line 731:
>> 
>>> 729: }
>>> 730: 
>>> 731: // To remove these from requires an incompatible change and CSR review.
>> 
>> I don't know what this comment is trying to say; I think there might be missing words or something.  But the change for CCC -> CSR is fine.
>
> Given the 'R' in CSR already stands for Review this should have said "CSR request".
> 
> But I also have no idea what the comment is actually trying to say - what is "these" referring to???

I don't know why that comment is there. The API is Class::getSigners and any changes to its behavior would require a CSR, but we are free to change the implementation. So maybe the comment should be removed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6240

From mli at openjdk.java.net  Thu Nov  4 07:30:09 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Thu, 4 Nov 2021 07:30:09 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID: <Sd7NmZNhMf-2k9Wk5zCsfAbeAnC41_g-UCAsRcrPJDM=.ff93b7b9-10d0-4976-8f1a-aac52266cc3c@github.com>

On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
> 
> The initial SPECjbb2015 test shows about a 10.5% improvement in critical and a 0.7% improvement in max. SPECjbb arguments:
>   GROUP_COUNT=4
>   TI_JVM_COUNT=1
>   JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
>   MODE_ARGS="-ikv"

Thanks a lot David, it will be very helpful.
BTW, I will make the change you suggested later, together with any changes from other reviewers' comments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From shade at openjdk.java.net  Thu Nov  4 08:03:09 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 08:03:09 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID: <Dj7IQIs7RCu2qAIndeRq1iWD-0pnKtPfjlcggbewnNo=.880adec8-6ee5-4cc1-92b8-1a25e25f488d@github.com>

On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
> 
> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
> 
> 
>     // Call the interpreter
>     if (JvmtiExport::can_post_interpreter_events()) {
>       BytecodeInterpreter::run<true>(istate);
>     } else {
>       BytecodeInterpreter::run<false>(istate);
>     } 
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `make bootcycle-images`

I think I need a second (R)eviewer for this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6029

From tschatzl at openjdk.java.net  Thu Nov  4 08:04:10 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Thu, 4 Nov 2021 08:04:10 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID: <q1dzITw2hj4IrHHw27tnX_i8b7K2RMMKv0ry8LcUQS0=.034351a8-a500-4753-9ea4-8ce38c81bb5b@github.com>

On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
> 
> The initial SPECjbb2015 test shows about a 10.5% improvement in critical and a 0.7% improvement in max. SPECjbb arguments:
>   GROUP_COUNT=4
>   TI_JVM_COUNT=1
>   JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
>   MODE_ARGS="-ikv"

I'll push it through our perf testing.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From shade at openjdk.java.net  Thu Nov  4 08:08:17 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 08:08:17 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence [v2]
In-Reply-To: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
 <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com>
Message-ID: <rU8YjuBzxJshiroi9AnWY9YsHfubM412NBrT1fhEAPE=.22e0a91a-0766-4f04-b7ba-a4d546fd8b99@github.com>

On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do. 
>> 
>> This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054:
>> 
>> 
>> Benchmark          Mode  Cnt  Score   Error  Units
>> 
>> # Default
>> Single.acquire     avgt    3   0.407 ± 0.060  ns/op
>> Single.full        avgt    3   4.693 ± 0.005  ns/op
>> Single.loadLoad    avgt    3   0.415 ± 0.095  ns/op
>> Single.plain       avgt    3   0.406 ± 0.002  ns/op
>> Single.release     avgt    3   0.408 ± 0.047  ns/op
>> Single.storeStore  avgt    3   0.408 ± 0.043  ns/op
>> 
>> # -XX:DisableIntrinsic=_storeFence
>> Single.acquire     avgt    3   0.408 ± 0.016  ns/op
>> Single.full        avgt    3   4.694 ± 0.002  ns/op
>> Single.loadLoad    avgt    3   0.406 ± 0.002  ns/op
>> Single.plain       avgt    3   0.406 ± 0.001  ns/op
>> Single.release     avgt    3   4.694 ± 0.003  ns/op <--- upgraded to full
>> Single.storeStore  avgt    3   4.690 ± 0.005  ns/op <--- upgraded to full
>> 
>> # -XX:DisableIntrinsic=_loadFence
>> Single.acquire     avgt    3   4.691 ± 0.001  ns/op <--- upgraded to full
>> Single.full        avgt    3   4.693 ± 0.009  ns/op
>> Single.loadLoad    avgt    3   4.693 ± 0.013  ns/op <--- upgraded to full
>> Single.plain       avgt    3   0.408 ± 0.072  ns/op
>> Single.release     avgt    3   0.415 ± 0.016  ns/op
>> Single.storeStore  avgt    3   0.416 ± 0.041  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence
>> Single.acquire     avgt    3   0.406 ± 0.014  ns/op
>> Single.full        avgt    3  15.836 ± 0.151  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
>> Single.plain       avgt    3   0.426 ± 0.361  ns/op
>> Single.release     avgt    3   0.407 ± 0.021  ns/op
>> Single.storeStore  avgt    3   0.410 ± 0.061  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence
>> Single.acquire     avgt    3  15.822 ± 0.282  ns/op <--- upgraded, calls runtime
>> Single.full        avgt    3  15.851 ± 0.127  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3  15.829 ± 0.045  ns/op <--- upgraded, calls runtime
>> Single.plain       avgt    3   0.406 ± 0.001  ns/op
>> Single.release     avgt    3   0.414 ± 0.156  ns/op
>> Single.storeStore  avgt    3   0.422 ± 0.452  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_storeFence
>> Single.acquire     avgt    3   0.407 ± 0.016  ns/op
>> Single.full        avgt    3  15.347 ± 6.783  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
>> Single.plain       avgt    3   0.406 ± 0.002  ns/op
>> Single.release     avgt    3  15.828 ± 0.019  ns/op <--- upgraded, calls runtime
>> Single.storeStore  avgt    3  15.834 ± 0.045  ns/op <--- upgraded, calls runtime
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
>> Single.acquire     avgt    3  15.838 ± 0.030  ns/op <--- upgraded, calls runtime
>> Single.full        avgt    3  15.854 ± 0.277  ns/op <--- calls runtime
>> Single.loadLoad    avgt    3  15.826 ± 0.160  ns/op <--- upgraded, calls runtime
>> Single.plain       avgt    3   0.406 ± 0.003  ns/op
>> Single.release     avgt    3  15.838 ± 0.019  ns/op <--- upgraded, calls runtime
>> Single.storeStore  avgt    3  15.844 ± 0.104  ns/op <--- upgraded, calls runtime
>> 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Restore RN for fullFence

Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net  Thu Nov  4 08:08:18 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 08:08:18 GMT
Subject: Integrated: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by
 delegating to fullFence
In-Reply-To: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
References: <VgjinVfDn81_9ofZ8O9EZvKQjTekvEA5wQByn0T3i_U=.b5a7a4bc-24ac-4db3-bda0-da86ba0c0312@github.com>
Message-ID: <YbSBwQV85k74gbTMUz6ucwXzroF6zQkOVBEXvzNXixk=.440f871d-4eed-4d4e-80f0-41060a39e318@github.com>

On Thu, 28 Oct 2021 08:47:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do. 
> 
> This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054:
> 
> 
> Benchmark          Mode  Cnt  Score   Error  Units
> 
> # Default
> Single.acquire     avgt    3   0.407 ± 0.060  ns/op
> Single.full        avgt    3   4.693 ± 0.005  ns/op
> Single.loadLoad    avgt    3   0.415 ± 0.095  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op
> Single.release     avgt    3   0.408 ± 0.047  ns/op
> Single.storeStore  avgt    3   0.408 ± 0.043  ns/op
> 
> # -XX:DisableIntrinsic=_storeFence
> Single.acquire     avgt    3   0.408 ± 0.016  ns/op
> Single.full        avgt    3   4.694 ± 0.002  ns/op
> Single.loadLoad    avgt    3   0.406 ± 0.002  ns/op
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   4.694 ± 0.003  ns/op <--- upgraded to full
> Single.storeStore  avgt    3   4.690 ± 0.005  ns/op <--- upgraded to full
> 
> # -XX:DisableIntrinsic=_loadFence
> Single.acquire     avgt    3   4.691 ± 0.001  ns/op <--- upgraded to full
> Single.full        avgt    3   4.693 ± 0.009  ns/op
> Single.loadLoad    avgt    3   4.693 ± 0.013  ns/op <--- upgraded to full
> Single.plain       avgt    3   0.408 ± 0.072  ns/op
> Single.release     avgt    3   0.415 ± 0.016  ns/op
> Single.storeStore  avgt    3   0.416 ± 0.041  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence
> Single.acquire     avgt    3   0.406 ± 0.014  ns/op
> Single.full        avgt    3  15.836 ± 0.151  ns/op <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.426 ± 0.361  ns/op
> Single.release     avgt    3   0.407 ± 0.021  ns/op
> Single.storeStore  avgt    3   0.410 ± 0.061  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence
> Single.acquire     avgt    3  15.822 ± 0.282  ns/op <--- upgraded, calls runtime
> Single.full        avgt    3  15.851 ± 0.127  ns/op <--- calls runtime
> Single.loadLoad    avgt    3  15.829 ± 0.045  ns/op <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   0.414 ± 0.156  ns/op
> Single.storeStore  avgt    3   0.422 ± 0.452  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_storeFence
> Single.acquire     avgt    3   0.407 ± 0.016  ns/op
> Single.full        avgt    3  15.347 ± 6.783  ns/op <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op
> Single.release     avgt    3  15.828 ± 0.019  ns/op <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.834 ± 0.045  ns/op <--- upgraded, calls runtime
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
> Single.acquire     avgt    3  15.838 ± 0.030  ns/op <--- upgraded, calls runtime
> Single.full        avgt    3  15.854 ± 0.277  ns/op <--- calls runtime
> Single.loadLoad    avgt    3  15.826 ± 0.160  ns/op <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.003  ns/op
> Single.release     avgt    3  15.838 ± 0.019  ns/op <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.844 ± 0.104  ns/op <--- upgraded, calls runtime
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`

This pull request has now been integrated.

Changeset: fb0be81f
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/fb0be81f0148d9aea73321a0c2bd83b2e477d952
Stats:     21 lines in 3 files changed: 6 ins; 11 del; 4 mod

8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence

Reviewed-by: psandoz, aph, dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net  Thu Nov  4 08:11:18 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 08:11:18 GMT
Subject: Integrated: 8276217: Harmonize StrictMath intrinsics handling
In-Reply-To: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
References: <v9bs8_XIuFrN450gme8g0FcjBEOepcUmr3MStF-B7pg=.5ee1a282-da4b-4ec0-a71c-f5321ff702c8@github.com>
Message-ID: <kW3DqandiA9cmbIsNnasfvTstif2CaQddxRBZPdZFvU=.c5025278-57ca-423b-8ceb-b7f354e92f7f@github.com>

On Mon, 1 Nov 2021 11:23:16 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
> 
> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
> 
> There seem to be no performance regressions with this patch at least on Linux x86_64:
> 
> 
> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" 
> 
> Benchmark                   Mode  Cnt       Score     Error   Units
> 
> ### Before
> 
> StrictMathBench.minDouble  thrpt    4  230921.558 ± 234.238  ops/ms
> StrictMathBench.minFloat   thrpt    4  230932.303 ± 126.721  ops/ms
> StrictMathBench.minInt     thrpt    4  230917.256 ±  73.008  ops/ms
> StrictMathBench.minLong    thrpt    4  194460.828 ± 178.079  ops/ms
> 
> StrictMathBench.maxDouble  thrpt    4  230983.180 ± 161.211  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230969.290 ± 277.500  ops/ms
> StrictMathBench.maxInt     thrpt    4  231033.581 ± 200.015  ops/ms
> StrictMathBench.maxLong    thrpt    4  194590.744 ± 114.295  ops/ms
> 
> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
> 
> ### After
> 
> StrictMathBench.minDouble  thrpt    4  230976.625 ±  67.338  ops/ms
> StrictMathBench.minFloat   thrpt    4  230896.021 ± 270.434  ops/ms
> StrictMathBench.minInt     thrpt    4  230859.741 ± 403.147  ops/ms
> StrictMathBench.minLong    thrpt    4  194456.673 ± 111.557  ops/ms
> 
> StrictMathBench.maxDouble  thrpt    4  230890.776 ±  89.924  ops/ms
> StrictMathBench.maxFloat   thrpt    4  230918.334 ±  63.160  ops/ms
> StrictMathBench.maxInt     thrpt    4  231059.128 ±  51.224  ops/ms
> StrictMathBench.maxLong    thrpt    4  194488.210 ± 495.224  ops/ms
> 
> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ± 247.330  ops/ms
> 
> 
> Additional testing:
>  - [x] `StrictMath` benchmarks
>  - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math`
>  - [x] Linux x86_64 fastdebug `tier1`

This pull request has now been integrated.

Changeset: 9eadcbb4
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/9eadcbb47e902f42d933ba68e24f2bfb0ee20915
Stats:     125 lines in 15 files changed: 80 ins; 27 del; 18 mod

8276217: Harmonize StrictMath intrinsics handling

Reviewed-by: aph, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From mli at openjdk.java.net  Thu Nov  4 08:38:19 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Thu, 4 Nov 2021 08:38:19 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID: <rijAs5XL51ssillS9ou1JY8NVYCHVeKNYwNiqIul7p0=.ecee3616-5647-4924-b86c-455519dce192@github.com>

On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
> 
> The initial SPECjbb2015 test shows about a 10.5% improvement in critical and a 0.7% improvement in max. SPECjbb arguments:
>   GROUP_COUNT=4
>   TI_JVM_COUNT=1
>   JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
>   MODE_ARGS="-ikv"

Thanks a lot Thomas. :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From shade at openjdk.java.net  Thu Nov  4 09:40:11 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 09:40:11 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure
In-Reply-To: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
Message-ID: <gaxcPfvMQhPLFi0KBKJcXOPdFp-K_mjkMTipcXQV4F4=.62a5feaf-6e51-4f83-a79a-dcedfff94730@github.com>

On Wed, 3 Nov 2021 18:49:47 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Currently we pass several compilation options as separate arguments to `Compile`: 
> 
> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
> 
> Originally we had only the `subsume_loads` option, but we have added a few since then and may add more.
> 
> I suggest adding a new `Options` class to pass these values into `Compile`.

I like the way it is going, but unfortunately I find the list of unnamed boolean arguments as confusing and error-prone as before... Could we use "named parameters idiom" here, or some other way to name these parameters?

Something like:


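class Options {
  bool _subsume_loads;
  bool _do_escape_analysis;
 public:
  Options() : _subsume_loads(false), _do_escape_analysis(false) {}
  // Each setter names the option it enables and returns *this for chaining.
  Options& subsume_loads() { _subsume_loads = true; return *this; }
  Options& do_escape_analysis() { _do_escape_analysis = true; return *this; }
};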
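
To show how this might read at a call site, here is a self-contained sketch. The getters `has_subsume_loads()` and `has_do_escape_analysis()` and the helper `count_enabled()` are hypothetical names added only so the example can be exercised; they are not part of the PR.

```cpp
// Sketch of the named parameter idiom: every option defaults to false and is
// switched on through a setter that carries the option's name.
class Options {
  bool _subsume_loads = false;
  bool _do_escape_analysis = false;
 public:
  Options& subsume_loads()      { _subsume_loads = true; return *this; }
  Options& do_escape_analysis() { _do_escape_analysis = true; return *this; }

  // Hypothetical read accessors, added for illustration only.
  bool has_subsume_loads() const      { return _subsume_loads; }
  bool has_do_escape_analysis() const { return _do_escape_analysis; }
};

// Hypothetical consumer: counts how many options the caller switched on.
inline int count_enabled(const Options& o) {
  return (o.has_subsume_loads() ? 1 : 0) + (o.has_do_escape_analysis() ? 1 : 0);
}
```

A caller would then write `Options().subsume_loads().do_escape_analysis()` instead of passing `true, true`, so a swapped or missing flag is visible at the call site.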

src/hotspot/share/opto/compile.cpp line 490:

> 488: #ifndef PRODUCT
> 489:   // Check if recompiling
> 490:   if ((subsume_loads() == false) && PrintOpto) {

Suggestion:

  if (!subsume_loads() && PrintOpto) {

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From adinn at openjdk.java.net  Thu Nov  4 09:53:16 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 4 Nov 2021 09:53:16 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID: <kZB2dtAJcWrIORN6oeM26L9Sn68PLVxnEo3zdjgwQTs=.d31805ee-87be-4ce3-89bc-a84e3f987cad@github.com>

On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
> 
> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
> 
> 
>     // Call the interpreter
>     if (JvmtiExport::can_post_interpreter_events()) {
>       BytecodeInterpreter::run<true>(istate);
>     } else {
>       BytecodeInterpreter::run<false>(istate);
>     } 
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `make bootcycle-images`

Yes this looks good.

-------------

Marked as reviewed by adinn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6029

From shade at openjdk.java.net  Thu Nov  4 10:26:17 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 10:26:17 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID: <_JLmXDetzSNbsPPQedgscbv9a-WSJ8N0i5xW1w7t9eI=.f328c164-5056-4a8d-b2f4-da51eace5d9e@github.com>

On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
> 
> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
> 
> 
>     // Call the interpreter
>     if (JvmtiExport::can_post_interpreter_events()) {
>       BytecodeInterpreter::run<true>(istate);
>     } else {
>       BytecodeInterpreter::run<false>(istate);
>     } 
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `make bootcycle-images`

Cool, thank you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6029

From shade at openjdk.java.net  Thu Nov  4 10:26:18 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 10:26:18 GMT
Subject: Integrated: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID: <MwGdKj3S4R3PoLRVE8aYKuk-J3vT7DQvI-mWL1GcTBg=.13bbe715-c45c-4521-92b7-290af70e6df2@github.com>

On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
> 
> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
> 
> 
>     // Call the interpreter
>     if (JvmtiExport::can_post_interpreter_events()) {
>       BytecodeInterpreter::run<true>(istate);
>     } else {
>       BytecodeInterpreter::run<false>(istate);
>     } 
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `make bootcycle-images`

This pull request has now been integrated.

Changeset: 3613ce7c
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/3613ce7c7d5bc8b7d603e1cf6a123588339aed3f
Stats:     70 lines in 3 files changed: 7 ins; 48 del; 15 mod

8275586: Zero: Simplify interpreter initialization

Reviewed-by: aph, adinn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6029

From simonis at openjdk.java.net  Thu Nov  4 12:18:46 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 12:18:46 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v4]
In-Reply-To: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
Message-ID: <aS-K9P9P2eRR8nOMDsgru4UEv1D7KgLRjHpC-4RfmBw=.a45fec02-f9d1-41e2-a892-f6dfcfcc3bf2@github.com>

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
> 
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
> 
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
> 
> 
> ### Solution
> 
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
> 
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
> 
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
> 
> 
> ### Implementation details
> 
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.

Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
 - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
 - Minor updates as requested by @TheRealMDoerr
 - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=03
  Stats: 747 lines in 15 files changed: 739 ins; 0 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From zgu at openjdk.java.net  Thu Nov  4 12:28:16 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 4 Nov 2021 12:28:16 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <qgQMT1D05siyxMw3TTZFyaA37gfJNk9Vmg_QiMLilcI=.be6f87ea-9e8b-4586-ab20-9c500cc3b2a8@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
 <qgQMT1D05siyxMw3TTZFyaA37gfJNk9Vmg_QiMLilcI=.be6f87ea-9e8b-4586-ab20-9c500cc3b2a8@github.com>
Message-ID: <6ngB1es9Q-dgch7Z4qRzxSksn01hTlpVLmWHdYHUn98=.bef754fb-710d-4607-9a81-09135e70eb77@github.com>

On Thu, 4 Nov 2021 01:29:03 GMT, David Holmes <dholmes at openjdk.org> wrote:

> I'm not sure there is any actual benefit to this change, but I also do not see any harm. So okay.
> 
> Thanks, David

Thanks, @dholmes-ora 

I don't believe it has a measurable impact either. In theory, mo_conservative is much more expensive ...
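
For readers unfamiliar with the trade-off, here is a minimal sketch of a statistics counter updated with relaxed versus stronger ordering. It uses standard C++ `std::atomic` rather than HotSpot's own `Atomic` API; `std::memory_order_seq_cst` stands in only approximately for the conservative default, and the counter and function names are hypothetical.

```cpp
#include <atomic>

// Hypothetical statistics counter; not the actual HotSpot field.
static std::atomic<int> exception_count{0};

// Relaxed increment: the update is still atomic (no lost increments),
// but it imposes no ordering on surrounding loads and stores.
inline void record_exception_relaxed() {
  exception_count.fetch_add(1, std::memory_order_relaxed);
}

// Strongly ordered increment: stand-in (an assumption) for the more
// conservative default, which also orders neighbouring memory accesses.
inline void record_exception_ordered() {
  exception_count.fetch_add(1, std::memory_order_seq_cst);
}

// Reads the counter; relaxed is enough when the value is only reported.
inline int exceptions_seen() {
  return exception_count.load(std::memory_order_relaxed);
}
```

Both variants count correctly. The difference is only in the ordering guarantees, which a pure statistics counter does not need.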

-Zhengyu

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From simonis at openjdk.java.net  Thu Nov  4 12:35:11 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 12:35:11 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v4]
In-Reply-To: <aS-K9P9P2eRR8nOMDsgru4UEv1D7KgLRjHpC-4RfmBw=.a45fec02-f9d1-41e2-a892-f6dfcfcc3bf2@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <aS-K9P9P2eRR8nOMDsgru4UEv1D7KgLRjHpC-4RfmBw=.a45fec02-f9d1-41e2-a892-f6dfcfcc3bf2@github.com>
Message-ID: <eI0jNcCXoMIkTB-JGCeUEUp4KIO8Ra4tVpb4zRGv67c=.5a712fca-9e0d-4782-8f71-efbf5265708d@github.com>

On Thu, 4 Nov 2021 12:18:46 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>> 
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>> 
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>> 
>> 
>> ### Solution
>> 
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
>> 
>> 
>> ### Implementation details
>> 
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
>  - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
>  - Minor updates as requested by @TheRealMDoerr
>  - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

Hi,

Sorry for the delay. I've had a look at the IR Test Framework, but I didn't find it to be the best fit for this change. I also wanted to have a test which works in both product and debug builds.

So I have instead extended the Whitebox API to expose the decompile, deopt and trap counters. I think (and hope) this functionality will be helpful for others in the future.

The test itself got quite elaborate, partly because different built-in exceptions are currently profiled and compiled differently (see [JDK-8275908: Record null_check traps for calls and array_check traps in the interpreter](https://bugs.openjdk.java.net/browse/JDK-8275908)). The current jtreg test can also serve as a test for JDK-8275908 once it is fixed (one just has to set the `JDK8275908_fixed` field to `true`).

As I've mentioned before, I did run a full set of jtreg and JCK tests together with some benchmark suites with a special build with `-XX:-OmitStackTraceInFastThrow` disabled by default and couldn't find any issues.

Please take a look,
Volker

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From stuefe at openjdk.java.net  Thu Nov  4 13:33:28 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 4 Nov 2021 13:33:28 GMT
Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection
 implementation which will be changed after JEP 416
Message-ID: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>

`VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore.

This patch removes the display of reflection targets from these commands as well as associated helper code and tests.

I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal.

The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there.

Tests: GHAs, manually testing the commands.

-------------

Commit messages:
 - Remove reflection invocation target printing from VM.metaspace, VM.classloaders, VM.class_hierarchy

Changes: https://git.openjdk.java.net/jdk/pull/6257/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6257&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8272065
  Stats: 368 lines in 8 files changed: 0 ins; 367 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6257.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6257/head:pull/6257

PR: https://git.openjdk.java.net/jdk/pull/6257

From psandoz at openjdk.java.net  Thu Nov  4 15:56:46 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 4 Nov 2021 15:56:46 GMT
Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator)
 [v8]
In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
Message-ID: <K4wUsSStO_un_Cz_hHNQfxDo4uEZQRE9OI-df56qiJs=.4d12c6d9-1e6b-454c-ac14-91931d3ec3ad@github.com>

> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE.
> 
> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend.
> 
> Masked loads/stores are a special form of masked operation that requires additional care to ensure out-of-bounds accesses throw exceptions. The range checking has not been fully optimized and will require further work.
> 
> No API enhancements were required and only a few additional tests were needed.

Paul Sandoz has updated the pull request incrementally with two additional commits since the last revision:

 - Merge pull request #2 from nsjian/vector-conversion-fix
   
   AArch64: Incorrect SVE double to int and float to long vector conversion
 - Incorrect double to int and float to long vector conversion
   
   Like JDK-8276151, SVE vector double to int and float to long
   conversions have a similar issue. According to the Java language
   specification [1], we should convert double/float to
   integer/long directly, instead of converting to long/int and then
   narrowing/extending to target types. Test cases will be updated in
   JDK-8276151.
   
   [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3
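
To see why the two-step conversion is wrong, here is a scalar sketch of the JLS 5.1.3 rules written in C++. It is illustrative only and unrelated to the actual SVE code; the helper names `java_d2i`, `java_d2l` and `java_l2i` are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Java-style double -> int: NaN becomes 0, out-of-range values saturate,
// everything else is truncated toward zero (JLS 5.1.3).
inline int32_t java_d2i(double d) {
  if (std::isnan(d)) return 0;
  if (d >= 2147483647.0) return INT32_MAX;
  if (d <= -2147483648.0) return INT32_MIN;
  return static_cast<int32_t>(d);
}

// Java-style double -> long with the same NaN and saturation rules.
inline int64_t java_d2l(double d) {
  if (std::isnan(d)) return 0;
  if (d >= 9223372036854775808.0) return INT64_MAX;   // 2^63
  if (d <= -9223372036854775808.0) return INT64_MIN;  // -2^63
  return static_cast<int64_t>(d);
}

// Java-style long -> int narrowing: keep only the low 32 bits.
inline int32_t java_l2i(int64_t l) {
  uint32_t low = static_cast<uint32_t>(static_cast<uint64_t>(l));
  int32_t result;
  std::memcpy(&result, &low, sizeof result);  // reinterpret bits portably
  return result;
}
```

For an input of `1e10`, the direct conversion `java_d2i(1e10)` saturates to `INT32_MAX`, while going through long first, `java_l2i(java_d2l(1e10))`, keeps the low 32 bits of 10000000000 and yields 1410065408.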

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5873/files
  - new: https://git.openjdk.java.net/jdk/pull/5873/files/c9a77225..571e6f39

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=06-07

  Stats: 40 lines in 2 files changed: 22 ins; 4 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5873.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5873/head:pull/5873

PR: https://git.openjdk.java.net/jdk/pull/5873

From simonis at openjdk.java.net  Thu Nov  4 16:02:48 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 16:02:48 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v5]
In-Reply-To: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
Message-ID: <wqEL17qjlzQ2g8UoXS41eNUrumNRdKgLl4pFHNJ7Tbc=.38c54808-bdd8-4ef5-9c61-2074bbaa1a3d@github.com>

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
> 
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
> 
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
> 
> 
> ### Solution
> 
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
> 
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
> 
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
> 
> 
> ### Implementation details
> 
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Fix build issue for minimal/zero build

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5488/files
  - new: https://git.openjdk.java.net/jdk/pull/5488/files/8043f8d0..bdf37bf2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=03-04

  Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From kvn at openjdk.java.net  Thu Nov  4 16:16:36 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 4 Nov 2021 16:16:36 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v2]
In-Reply-To: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
Message-ID: <jefRE3Fbf7ymWVeq1aL-MEmi_apLOl-mRfvbZ5aAy_Y=.c65eeef6-293f-40bd-8490-6932f04fea5e@github.com>

> Currently we pass several compilation options as separate arguments to `Compile`: 
> 
> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
> 
> Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more.
> 
> I suggest adding a new `Options` class to pass these values into `Compile`.

Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6237/files
  - new: https://git.openjdk.java.net/jdk/pull/6237/files/34f29c8d..d2490eb4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=00-01

  Stats: 12 lines in 2 files changed: 8 ins; 2 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6237.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6237/head:pull/6237

PR: https://git.openjdk.java.net/jdk/pull/6237

From kvn at openjdk.java.net  Thu Nov  4 16:16:38 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 4 Nov 2021 16:16:38 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v2]
In-Reply-To: <gaxcPfvMQhPLFi0KBKJcXOPdFp-K_mjkMTipcXQV4F4=.62a5feaf-6e51-4f83-a79a-dcedfff94730@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
 <gaxcPfvMQhPLFi0KBKJcXOPdFp-K_mjkMTipcXQV4F4=.62a5feaf-6e51-4f83-a79a-dcedfff94730@github.com>
Message-ID: <a9HMjl-J4gJTijELK8kdg5WAcQwCWTCg27LYvJzsMX8=.a87f538d-921a-4053-b677-a9498c59e98c@github.com>

On Thu, 4 Nov 2021 09:29:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Address review comments
>
> src/hotspot/share/opto/compile.cpp line 490:
> 
>> 488: #ifndef PRODUCT
>> 489:   // Check if recompiling
>> 490:   if ((subsume_loads() == false) && PrintOpto) {
> 
> Suggestion:
> 
>   if (!subsume_loads() && PrintOpto) {

done

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From shade at openjdk.java.net  Thu Nov  4 16:21:10 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 4 Nov 2021 16:21:10 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v2]
In-Reply-To: <jefRE3Fbf7ymWVeq1aL-MEmi_apLOl-mRfvbZ5aAy_Y=.c65eeef6-293f-40bd-8490-6932f04fea5e@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
 <jefRE3Fbf7ymWVeq1aL-MEmi_apLOl-mRfvbZ5aAy_Y=.c65eeef6-293f-40bd-8490-6932f04fea5e@github.com>
Message-ID: <YRgeLrEr_EUyALZRZbj6IPffXWsobTIFOqbl9uefz_I=.e5ce5c5a-33f7-4b31-9099-3e24e0c03556@github.com>

On Thu, 4 Nov 2021 16:16:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Currently we pass several compilation options as separate arguments to `Compile`: 
>> 
>> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
>> 
>> Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more.
>> 
>> I suggest adding a new `Options` class to pass these values into `Compile`.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address review comments

All right, this works too. The next user of `Options` would probably have to introduce per-use factory methods to disambiguate constructors, so maybe we could do this early on.


class Options {
  // A static member function cannot be const-qualified
  static Options for_runtime_stub_gen() {
    return Options(
       /* subsume_loads = */ true,
       /* do_escape_analysis = */ false,
       /* eliminate_boxing = */ false,
       /* do_locks_coarsening = */ false,
       /* install_code = */ true
    );
  }
};
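
For illustration only, a self-contained version of the factory-method idea might look like the sketch below. The field names follow the arguments listed in the PR description. The standalone `Options` class here and the second factory `all_enabled()` are hypothetical and are not the code in the PR.

```cpp
// Hypothetical stand-in for the options holder discussed in this thread.
class Options {
 public:
  const bool subsume_loads;
  const bool do_escape_analysis;
  const bool eliminate_boxing;
  const bool do_locks_coarsening;
  const bool install_code;

  // Named factory: the call site states its intent, and the reader does
  // not have to decode a row of positional booleans.
  static Options for_runtime_stub_gen() {
    return Options(/* subsume_loads = */ true,
                   /* do_escape_analysis = */ false,
                   /* eliminate_boxing = */ false,
                   /* do_locks_coarsening = */ false,
                   /* install_code = */ true);
  }

  // Hypothetical second factory, only to show how several use cases
  // could coexist without overloading the constructor.
  static Options all_enabled() {
    return Options(true, true, true, true, true);
  }

 private:
  // Private constructor: instances are created only through factories.
  Options(bool subsume_loads, bool do_escape_analysis, bool eliminate_boxing,
          bool do_locks_coarsening, bool install_code)
      : subsume_loads(subsume_loads),
        do_escape_analysis(do_escape_analysis),
        eliminate_boxing(eliminate_boxing),
        do_locks_coarsening(do_locks_coarsening),
        install_code(install_code) {}
};
```

A caller would then write `Options opts = Options::for_runtime_stub_gen();` and pass `opts` along, instead of spelling out five booleans.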

-------------

Marked as reviewed by shade (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6237

From kvn at openjdk.java.net  Thu Nov  4 16:25:11 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 4 Nov 2021 16:25:11 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v2]
In-Reply-To: <jefRE3Fbf7ymWVeq1aL-MEmi_apLOl-mRfvbZ5aAy_Y=.c65eeef6-293f-40bd-8490-6932f04fea5e@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
 <jefRE3Fbf7ymWVeq1aL-MEmi_apLOl-mRfvbZ5aAy_Y=.c65eeef6-293f-40bd-8490-6932f04fea5e@github.com>
Message-ID: <-H2T_dh5-4rkYwmEZRdw9dIkyyBsLIVdeqbbGLgoq_s=.1eda85b6-8dbf-4373-bd15-f07b767c0665@github.com>

On Thu, 4 Nov 2021 16:16:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Currently we pass several compilation options as separate arguments to `Compile`: 
>> 
>> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
>> 
>> Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more.
>> 
>> I suggest adding a new `Options` class to pass these values into `Compile`.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address review comments

Thank you, Aleksey, for review.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From simonis at openjdk.java.net  Thu Nov  4 16:28:52 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 16:28:52 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
Message-ID: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
> 
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
> 
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
> 
> 
> ### Solution
> 
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
> 
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
> 
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
> 
> 
> ### Implementation details
> 
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is is not true in genral for bytecode which can cause an implicit exception. 
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
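For readers who want to try the pattern quoted above, here is a minimal self-contained sketch of a table lookup that relies on an implicit `ArrayIndexOutOfBoundsException`. The class name `AlphaLookup`, the table size of 128, and the way the table is filled are assumptions made for illustration; they are not taken from the PR or its benchmark.

```java
// Sketch of a lookup that handles out-of-range input through an implicit
// ArrayIndexOutOfBoundsException instead of an explicit range check.
// AlphaLookup, TABLE_SIZE and the static initializer are illustrative assumptions.
final class AlphaLookup {
    private static final int TABLE_SIZE = 128; // assumed table size
    private static final boolean[] IS_ALPHA = new boolean[TABLE_SIZE];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Same shape as the quoted method: the array access itself performs the
    // bounds check and throws for c < 0 or c >= TABLE_SIZE.
    static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAlpha('q'));    // in range, alphabetic
        System.out.println(isAlpha('7'));    // in range, not alphabetic
        System.out.println(isAlpha(0x4E2D)); // out of range, takes the exception path
    }
}
```

The share of calls that take the exception path in a method like this is what the `exceptionProbability` parameter in the tables above varies.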

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5488/files
  - new: https://git.openjdk.java.net/jdk/pull/5488/files/bdf37bf2..99db7e54

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=04-05

  Stats: 30 lines in 1 file changed: 30 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From kvn at openjdk.java.net  Thu Nov  4 16:39:47 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 4 Nov 2021 16:39:47 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v2]
In-Reply-To: <YRgeLrEr_EUyALZRZbj6IPffXWsobTIFOqbl9uefz_I=.e5ce5c5a-33f7-4b31-9099-3e24e0c03556@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
 <jefRE3Fbf7ymWVeq1aL-MEmi_apLOl-mRfvbZ5aAy_Y=.c65eeef6-293f-40bd-8490-6932f04fea5e@github.com>
 <YRgeLrEr_EUyALZRZbj6IPffXWsobTIFOqbl9uefz_I=.e5ce5c5a-33f7-4b31-9099-3e24e0c03556@github.com>
Message-ID: <nD48uY0mjjbXn06rsjV8vfSeUHER_9wTVeUrmgoeJ8g=.8649407d-1211-46cc-96fd-efbc31eb4149@github.com>

On Thu, 4 Nov 2021 16:17:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> All right, this works too. Next user of `Options` would probably have to introduce per-use factory methods to disambiguate constructors, so maybe we could do this early on.

I agree.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From kvn at openjdk.java.net  Thu Nov  4 16:39:45 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 4 Nov 2021 16:39:45 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v3]
In-Reply-To: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
Message-ID: <zNDdDM6pSDDO6MCIYGYPGBhNmymwd11Arg2ErUxZzK4=.4c04644d-e9fc-4195-8fbd-51ef106bdf77@github.com>

> Currently we pass several compilation options as separate arguments to `Compile`: 
> 
> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
> 
> Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more. 
> 
> I suggest adding a new `Options` class to pass these values into `Compile`.

Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:

  Per-use Options factory method
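
The idea of bundling the flags into one object and constructing it through named factory methods can be sketched as follows. This is only an illustration in Java; the actual change is in C2's C++ sources, which are not shown in this message. The field names mirror the arguments listed above, while the class name `CompileOptions` and both factory methods are hypothetical.

```java
// Illustrative parameter object: one immutable bundle instead of five booleans.
// CompileOptions, allEnabled() and withoutEscapeAnalysis() are hypothetical names.
final class CompileOptions {
    final boolean subsumeLoads;
    final boolean doEscapeAnalysis;
    final boolean eliminateBoxing;
    final boolean doLocksCoarsening;
    final boolean installCode;

    // Private constructor: callers must go through a named factory, so a call
    // site never shows an unreadable run of positional true/false arguments.
    private CompileOptions(boolean subsumeLoads, boolean doEscapeAnalysis,
                           boolean eliminateBoxing, boolean doLocksCoarsening,
                           boolean installCode) {
        this.subsumeLoads = subsumeLoads;
        this.doEscapeAnalysis = doEscapeAnalysis;
        this.eliminateBoxing = eliminateBoxing;
        this.doLocksCoarsening = doLocksCoarsening;
        this.installCode = installCode;
    }

    // Hypothetical per-use factory: every optimization enabled.
    static CompileOptions allEnabled() {
        return new CompileOptions(true, true, true, true, true);
    }

    // Hypothetical per-use factory: same as above but with escape analysis off.
    static CompileOptions withoutEscapeAnalysis() {
        return new CompileOptions(true, false, true, true, true);
    }

    public static void main(String[] args) {
        CompileOptions opts = CompileOptions.withoutEscapeAnalysis();
        System.out.println("escape analysis: " + opts.doEscapeAnalysis);
    }
}
```

Adding another option then means touching the class and its factories rather than every call site that constructs a compilation.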

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6237/files
  - new: https://git.openjdk.java.net/jdk/pull/6237/files/d2490eb4..4565547e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=01-02

  Stats: 9 lines in 2 files changed: 2 ins; 0 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6237.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6237/head:pull/6237

PR: https://git.openjdk.java.net/jdk/pull/6237

From mchung at openjdk.java.net  Thu Nov  4 17:05:22 2021
From: mchung at openjdk.java.net (Mandy Chung)
Date: Thu, 4 Nov 2021 17:05:22 GMT
Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection
 implementation which will be changed after JEP 416
In-Reply-To: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
References: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
Message-ID: <KjwvC6CQy34mPsRSa6nTw27h2OjEBiGFbsmoRUIgx6I=.3f5c2ec8-1ac0-4b7c-b63f-1aefba753cb4@github.com>

On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore.
> 
> This patch removes the display of reflection targets from these commands as well as associated helper code and tests.
> 
> I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal.
> 
> The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there.
> 
> Tests: GHAs, manually testing the commands.

Looks good to me.   Thanks for following this up.    The new implementation does not spin any new class loader and so I don't think jcmd needs to extend its support for the new implementation using method handles.

-------------

Marked as reviewed by mchung (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6257

From stuefe at openjdk.java.net  Thu Nov  4 18:33:12 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 4 Nov 2021 18:33:12 GMT
Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection
 implementation which will be changed after JEP 416
In-Reply-To: <KjwvC6CQy34mPsRSa6nTw27h2OjEBiGFbsmoRUIgx6I=.3f5c2ec8-1ac0-4b7c-b63f-1aefba753cb4@github.com>
References: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
 <KjwvC6CQy34mPsRSa6nTw27h2OjEBiGFbsmoRUIgx6I=.3f5c2ec8-1ac0-4b7c-b63f-1aefba753cb4@github.com>
Message-ID: <wSbAXB0EzWsouvRalyfV8wHWZGX4__FN1v2UeRXQb-Y=.ecc4a666-759e-4b70-9de7-4a9546876659@github.com>

On Thu, 4 Nov 2021 17:02:05 GMT, Mandy Chung <mchung at openjdk.org> wrote:

> Looks good to me. Thanks for following this up. The new implementation does not spin any new class loader and so I don't think jcmd needs to extend its support for the new implementation using method handles.

Thank you Mandy!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6257

From coleenp at openjdk.java.net  Thu Nov  4 19:16:09 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 4 Nov 2021 19:16:09 GMT
Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection
 implementation which will be changed after JEP 416
In-Reply-To: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
References: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
Message-ID: <ubFuH-1RX-e9mIR-t94MAvoO9vymggJU-DW9fZIdN50=.023ec773-7967-44f6-8fb9-1fcdc8782630@github.com>

On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore.
> 
> This patch removes the display of reflection targets from these commands as well as associated helper code and tests.
> 
> I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal.
> 
> The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there.
> 
> Tests: GHAs, manually testing the commands.

Yes, looks good to me also.

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6257

From minqi at openjdk.java.net  Thu Nov  4 19:26:14 2021
From: minqi at openjdk.java.net (Yumin Qi)
Date: Thu, 4 Nov 2021 19:26:14 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
Message-ID: <3QIVN0gOFurHxsfnBAqIwJGC25AbniAUb4jizq2ffyw=.4dcd7164-72d3-45fd-b202-0b540a0d6263@github.com>

On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> This is another instance of counter updates that only need atomic guarantee.

LGTM.

-------------

Marked as reviewed by minqi (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6065

From zgu at openjdk.java.net  Thu Nov  4 19:44:21 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 4 Nov 2021 19:44:21 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To: <3QIVN0gOFurHxsfnBAqIwJGC25AbniAUb4jizq2ffyw=.4dcd7164-72d3-45fd-b202-0b540a0d6263@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
 <3QIVN0gOFurHxsfnBAqIwJGC25AbniAUb4jizq2ffyw=.4dcd7164-72d3-45fd-b202-0b540a0d6263@github.com>
Message-ID: <0VzHUKSj8F8J8OpE0HSgXigHmXjrKy6h5hhdsFcFZhI=.a8ed0b94-2123-46d2-8d4f-755cfb6e5f3a@github.com>

On Thu, 4 Nov 2021 19:22:49 GMT, Yumin Qi <minqi at openjdk.org> wrote:

>> This is another instance of counter updates that only need atomic guarantee.
>
> LGTM.

Thanks, @yminqi

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From zgu at openjdk.java.net  Thu Nov  4 19:44:21 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 4 Nov 2021 19:44:21 GMT
Subject: Integrated: 8275718: Relax memory constraint on exception counter
 updates
In-Reply-To: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
References: <c2Zx315h8Vh5o3lI8wa9Pvt6LU3jYx0xg0RicwpjBk0=.058e2612-2d0b-4c6c-94d8-1cd30a3853f4@github.com>
Message-ID: <rp4zZalzQw6BIC8-GpJc696jb2qh5uCQf6cshEPMX3k=.7056ab61-da41-4620-ba40-ba47a52008c9@github.com>

On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> This is another instance of counter updates that only need atomic guarantee.

This pull request has now been integrated.

Changeset: 2b5a32c7
Author:    Zhengyu Gu <zgu at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/2b5a32c73f22c69d7ccedac761af1dbb4a7f297d
Stats:     5 lines in 1 file changed: 0 ins; 0 del; 5 mod

8275718: Relax memory constraint on exception counter updates

Reviewed-by: dholmes, minqi

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From ngasson at openjdk.java.net  Fri Nov  5 02:43:12 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Fri, 5 Nov 2021 02:43:12 GMT
Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator)
 [v8]
In-Reply-To: <K4wUsSStO_un_Cz_hHNQfxDo4uEZQRE9OI-df56qiJs=.4d12c6d9-1e6b-454c-ac14-91931d3ec3ad@github.com>
References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
 <K4wUsSStO_un_Cz_hHNQfxDo4uEZQRE9OI-df56qiJs=.4d12c6d9-1e6b-454c-ac14-91931d3ec3ad@github.com>
Message-ID: <RZfdUsx06FfmyVgtdF7Z0Z5U3rwgoSZEDn-4hs_sG0Q=.65a1b4dc-fb54-461d-8560-c0cda48c4794@github.com>

On Thu, 4 Nov 2021 15:56:46 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE.
>> 
>> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend.
>> 
>> Masked loads/stores are a special form of masked operation that requires additional care to ensure out-of-bounds accesses throw exceptions. The range checking has not been fully optimized and will require further work.
>> 
>> No API enhancements were required and only a few additional tests were needed.
>
> Paul Sandoz has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge pull request #2 from nsjian/vector-conversion-fix
>    
>    AArch64: Incorrect SVE double to int and float to long vector conversion
>  - Incorrect double to int and float to long vector conversion
>    
>    Like JDK-8276151, SVE vector double to int and float to long
>    conversions have similar issue. According to Java language
>    specification [1], we should convert double/float to
>    integer/long directly, instead of converting to long/int and then
>    narrowing/extending to target types. Test cases will be updated in
>    JDK-8276151.
>    
>    [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3

Marked as reviewed by ngasson (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5873

From njian at openjdk.java.net  Fri Nov  5 02:43:13 2021
From: njian at openjdk.java.net (Ningsheng Jian)
Date: Fri, 5 Nov 2021 02:43:13 GMT
Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator)
 [v7]
In-Reply-To: <4RJyhhtKPTjcJ894CoYqMYX0RdAsjRj0wwDcug9x4I8=.12d8e963-dc36-4cce-ad1b-241188dadd7b@github.com>
References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
 <OpWaNZuhL36S1Co2ItS-TiqRn7IIzEe_GF6MEAwZzwk=.9107d865-8b13-4a23-b381-424f7b017e95@github.com>
 <4RJyhhtKPTjcJ894CoYqMYX0RdAsjRj0wwDcug9x4I8=.12d8e963-dc36-4cce-ad1b-241188dadd7b@github.com>
Message-ID: <k7ND3YAntjRWP53UvllPA6QQbV9cggXAcXRULugXX4M=.715b4a27-5d4a-4403-b927-4496c08b7f4b@github.com>

On Wed, 3 Nov 2021 03:10:16 GMT, Ningsheng Jian <njian at openjdk.org> wrote:

> Converting from double to long and then narrowing to the target types does not follow the JLS. I will fix it. Thanks to @fg1417 for helping to find this issue.

Fixed in the new commit. Thanks to @PaulSandoz for integrating the fix!

Hi Nick @nick-arm ,

Could you please help to review the new commit, which fixes the same issue as JDK-8276151 for SVE? Thanks!
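
The difference between a direct conversion and a two-step conversion, as discussed above, can be seen in scalar Java. This is a small sketch of the JLS 5.1.3 semantics only, not of the SVE code in the PR; the class and method names are made up for the example.

```java
// Scalar illustration of JLS 5.1.3: converting directly saturates at the
// target type's range, while going through an intermediate integral type
// either wraps around or saturates too early.
final class NarrowingDemo {
    static int direct(double d)     { return (int) d; }         // double -> int
    static int viaLong(double d)    { return (int) (long) d; }  // double -> long -> int
    static long directLong(float f) { return (long) f; }        // float -> long
    static long viaInt(float f)     { return (long) (int) f; }  // float -> int -> long

    public static void main(String[] args) {
        System.out.println(direct(1e10));      // 2147483647 (saturates to Integer.MAX_VALUE)
        System.out.println(viaLong(1e10));     // 1410065408 (low 32 bits of 10000000000)
        System.out.println(directLong(1e10f)); // 10000000000
        System.out.println(viaInt(1e10f));     // 2147483647 (saturated too early)
    }
}
```

For values that fit in the target type both paths agree, so the discrepancy only shows up for large magnitudes (and, for the two-step paths, NaN and infinities follow the intermediate type's rules first).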

-------------

PR: https://git.openjdk.java.net/jdk/pull/5873

From dholmes at openjdk.java.net  Fri Nov  5 05:03:09 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 5 Nov 2021 05:03:09 GMT
Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection
 implementation which will be changed after JEP 416
In-Reply-To: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
References: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
Message-ID: <IoZDYRyGPiUt9SjWWxE_SSPn1xzxMl3cjUsAqxkPLAM=.1ac59c16-b3e4-4fa0-9e69-6c7961b73e24@github.com>

On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore.
> 
> This patch removes the display of reflection targets from these commands as well as associated helper code and tests.
> 
> I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal.
> 
> The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there.
> 
> Tests: GHAs, manually testing the commands.

I never realized we needed special handling for these classloaders so I'm glad to see this gone too.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6257

From stuefe at openjdk.java.net  Fri Nov  5 05:19:15 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 5 Nov 2021 05:19:15 GMT
Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection
 implementation which will be changed after JEP 416
In-Reply-To: <KjwvC6CQy34mPsRSa6nTw27h2OjEBiGFbsmoRUIgx6I=.3f5c2ec8-1ac0-4b7c-b63f-1aefba753cb4@github.com>
References: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
 <KjwvC6CQy34mPsRSa6nTw27h2OjEBiGFbsmoRUIgx6I=.3f5c2ec8-1ac0-4b7c-b63f-1aefba753cb4@github.com>
Message-ID: <WGn5718rMsKOs4ODx0POSObQH6Dg4jr_EIXA7e_tFCM=.752638d1-62e0-461a-81ee-d981de0d7b1d@github.com>

On Thu, 4 Nov 2021 17:02:05 GMT, Mandy Chung <mchung at openjdk.org> wrote:

>> `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore.
>> 
>> This patch removes the display of reflection targets from these commands as well as associated helper code and tests.
>> 
>> I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal.
>> 
>> The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there.
>> 
>> Tests: GHAs, manually testing the commands.
>
> Looks good to me.   Thanks for following this up.    The new implementation does not spin any new class loader and so I don't think jcmd needs to extend its support for the new implementation using method handles.

Thanks @mlchung, @coleenp and @dholmes-ora.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6257

From stuefe at openjdk.java.net  Fri Nov  5 05:19:15 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 5 Nov 2021 05:19:15 GMT
Subject: Integrated: JDK-8272065: jcmd cannot rely on the old core reflection
 implementation which will be changed after JEP 416
In-Reply-To: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
References: <rFA3D6A4-VutPJqmsw3VJwQv7ppfv-Pj9PVB_piIU8M=.9af77966-baa7-415e-8b3d-95a8cf47ea3e@github.com>
Message-ID: <7mVhoO4DH6O8bRsa1YBfnlRFAkY3UXeR1v_wPZzgsF0=.7d26c042-2d77-428a-bfb5-dd9b25e4aaa6@github.com>

On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore.
> 
> This patch removes the display of reflection targets from these commands as well as associated helper code and tests.
> 
> I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal.
> 
> The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there.
> 
> Tests: GHAs, manually testing the commands.

This pull request has now been integrated.

Changeset: 7281861e
Author:    Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/7281861e0662e6c51507066a1f12673a236c7491
Stats:     368 lines in 8 files changed: 0 ins; 367 del; 1 mod

8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416

Reviewed-by: mchung, coleenp, dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/6257

From qingfeng.yy at alibaba-inc.com  Fri Nov  5 08:34:25 2021
From: qingfeng.yy at alibaba-inc.com (Yi Yang)
Date: Fri, 05 Nov 2021 16:34:25 +0800
Subject: Re: [External] : Re: RFC: Extend DCmd(Diagnostic-Command) framework
 to support Java level DCmd
In-Reply-To: <b320493e-225f-c1ce-1daf-68b7673668f7@oracle.com>
References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com>
 <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com>
 <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>,
 <b320493e-225f-c1ce-1daf-68b7673668f7@oracle.com>
Message-ID: <d1884e62-1ebe-45c9-9b05-c17c98e2953e.qingfeng.yy@alibaba-inc.com>

Hi all,

I had an offline discussion about this with Denghui. When I first heard the idea, I felt it was useful: it lets users do things that would otherwise require a lot of effort in a simple way. I'm also following the discussion on the mailing list, and I've seen many folks come up with very constructive comments, questions, and concerns. To keep the follow-up discussion simple, I want to try to summarize and give some answers of my own. Each headline is a question or concern that folks have raised, followed by my personal opinion on it. I'd appreciate it if you could add any missing content.

=== What is it?
It provides the ability for users to trigger predefined callbacks while the application is running.

=== Could it be misused?
It is provided through jcmd, so this ability should ideally be used for debugging, development, and diagnosis purposes. It may be misused, but that is beyond our control, just as users can use a signal handler to download an app and play a song.

=== Maintainability?
It extends the current jcmd implementation rather than significantly modifying it, so maintainability should be OK IMHO.

=== Safety?
Undeniably, it may raise some potential security issues.

=== Alternatives?
Socket: Compared to this proposal, it is inconvenient for simple tasks, since users have to write a lot of boilerplate socket code.
Signal: Not open to users, limited number of signals, more likely to be misused.

=== Purpose?
1. I have a web application that can analyze Java heap dumps. I hope to provide a simple way to report runtime app metrics, such as disk usage and online worker load, instead of writing a complete web page and providing an admin page to access it. This information can also be gathered on other monitoring platforms.
2. Trigger the DEBUG functionality while running, to output some debug logs.

Best regards.


------------------------------------------------------------------
From:Chris Plummer <chris.plummer at oracle.com>
Send Time:2021 Nov. 4 (Thu.) 14:10
To:dong denghui <denghui.ddh at alibaba-inc.com>; serviceability-dev <serviceability-dev at openjdk.java.net>; hotspot-dev <hotspot-dev at openjdk.java.net>
Subject:Re: [External] : Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd

 
Hi Denghui,

 Yes, there are other ways the same thing could be accomplished like sockets or signals, but all of this is outside of the purview of the JDK, and therefore we don't become responsible for its design, maintenance, and potential security concerns. EnableUserLevelDCmd doesn't really fix any of these concerns, because an app can just always launch with this flag enabled. It really should be reserved for launching a JVM for the specific purpose of gathering some extra diagnostic data, but there is no way to enforce that.

 Anyway, I'm not the gatekeeper on this. Just expressing some of my concerns. Others have done the same. I think we've seen a lack of enthusiasm in favor of doing this except from you. It would be good to see input from others that would like this feature in place.

 cheers,

 Chris

 On 11/1/21 8:09 PM, Denghui Dong wrote:

Hi Chris,

 Thank you for the comments.

 Yes, we have no good way to restrict user-registered commands to diagnosis-related operations only, but in my opinion, this does not seem to be a problem that must be solved perfectly.

 The following are my thoughts.

 This extension is an entry that triggers the operation that the user wants to perform (similar to the Signal Handler mechanism but with a name and parameters). Even without this extension, the user can have other ways to achieve the same goal.

 On the one hand, we could standardize the usage scenarios of the API in the documentation. (Indeed, users can still write programs that do not follow the specification; for example, a user can implement an object's hashCode method so that multiple calls return different values, or make an object reachable again while its finalize method is executing.)

 On the other hand, we can add some restrictions to help users make better use of this extension.
 E.g., we can add a new VM option, such as EnableUserLevelDCmd, so that the application can only register custom commands when this option is enabled.

 Or from another perspective, can we allow users to do some non-diagnostic-related operations in custom commands?

 Best,
 Denghui
------------------------------------------------------------------
From:Chris Plummer <chris.plummer at oracle.com>
Send Time:2021 Nov. 2 (Tue.) 03:35
To:dong denghui <denghui.ddh at alibaba-inc.com>; serviceability-dev <serviceability-dev at openjdk.java.net>; hotspot-dev <hotspot-dev at openjdk.java.net>
Subject:Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd

I have similar concerns to those others have expressed, so I'll try to add something new to the discussion and not just repeat.

 DCMDs have historically been very VM centric. That's not to say they aren't useful for debugging applications, but they do so by providing VM related info like stack traces, heap dumps, and class histograms. Also hotspot has been the gatekeeper for new DCMDs, meaning that new ones do not get added without going through the hotspot review process.

 Allowing any application or framework to add a DCMD changes this VM centric view in a way that concerns me. This approach allows a DCMD to pretty much do anything (Java security notwithstanding). App writers could even use them to provide a user facing interface. For example, if an app has some sort of internal database, it could allow users to query it via a DCMD, and maybe even suggest that users write simple shell scripts that use jcmd to do these queries. Allowing this type of non-diagnostic usage seems like a path we don't want to go down, yet I don't see how it can be prevented once you allow applications to add DCMDs.

 Chris

 On 10/25/21 1:37 AM, Denghui Dong wrote:
Hi there!

 We'd like to discuss a proposal for extending the current DCmd framework to support Java level DCmd.

 At present, DCmd only allows the VM to register commands, which can be called through jcmd or JMX. It would be beneficial if the user could create their own commands.

 The idea of this extension originally came from our internal Java agent that detects misuse of the Unsafe API.

 This agent can collect the call sites that allocate or free direct memory in the application(NMT could not do it IMO) to detect direct memory leaks.

 In the beginning, it just printed all call sites without any aggregation, which made it hard to use.

 So we plan to use a way similar to jeprof (from jemalloc) to generate a report file that aggregates all useful information.

 During the implementation process, we found that we need a mechanism to notify the agent to generate reports.

 The common practice is:
 a) Register a service port, triggered by an HTTP request
 b) Triggered by signal
 c) Generate reports periodically, or when the process exits

 But these three ways have certain problems.
 For a) we need to introduce a network component, which will increase the complexity of the implementation
 For b) we cannot pass parameters
 For c) some files that may never be used will be generated

 Essentially, this question is how to notify the application to do a certain task, or in other words, how do we issue a command to the application. We believe that other Java developers will also encounter similar problems. 
 (And sometimes there may be multiple unrelated dependent components in a Java application that require such a mechanism.)

 Naturally, we think that jcmd can already issue some commands registered in VM to the application, why can't we extend to the java level?

 This feature will be very useful for some lightweight tools, just like the scenario we encountered, to notify the tools to perform certain operations.

 In addition, this feature will also bring benefits to Java beginners.

 For example, in the beginning, beginners may not use advanced log components, but they will also encounter the need to output debug logs. They may write code like this:

 ```
     if (debug) {
       System.out.println("...");
     }
 ```

 It would be attractive if developers could easily control the value of debug.

 Like this:

 ```
     Factory.register("MyApp.flipDebug", out -> debug = !debug);

     jcmd <pid> MyApp.flipDebug
 ```
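
 To make the registration idea concrete, here is a minimal in-process sketch of a name-to-callback registry. It is not the proposed API: the class `CommandRegistry` and its `register` and `invoke` methods are hypothetical, and the sketch leaves out the part that matters most in the proposal, namely how jcmd would reach the registry in a running JVM.

```java
import java.io.PrintStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical registry that maps command names to callbacks writing to an
// output stream. It only models registration and dispatch inside one process.
final class CommandRegistry {
    private static final Map<String, Consumer<PrintStream>> COMMANDS =
            new ConcurrentHashMap<>();

    // Example application state toggled by a command.
    static volatile boolean debug = false;

    // Registers (or replaces) the handler for the given command name.
    static void register(String name, Consumer<PrintStream> handler) {
        COMMANDS.put(name, handler);
    }

    // Runs the named command; returns false if no such command is registered.
    static boolean invoke(String name, PrintStream out) {
        Consumer<PrintStream> handler = COMMANDS.get(name);
        if (handler == null) {
            out.println("Unknown command: " + name);
            return false;
        }
        handler.accept(out);
        return true;
    }

    public static void main(String[] args) {
        register("MyApp.flipDebug", out -> {
            debug = !debug;
            out.println("debug=" + debug);
        });
        invoke("MyApp.flipDebug", System.out); // stands in for: jcmd <pid> MyApp.flipDebug
    }
}
```

 Passing the output stream to the callback mirrors the `out` parameter in the snippet above, so a command can report its result back to whoever issued it.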

 For mainstream frameworks, we could apply this feature to trigger some common activities, such as health checks, graceful shutdown, and dynamic configuration updates. But to be honest, these frameworks are very mature and stable, and for compatibility reasons it would be hard to get them to use this extension.

 Comments welcome!

 Thanks,
 Denghui 


From chagedorn at openjdk.java.net  Fri Nov  5 09:16:12 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 5 Nov 2021 09:16:12 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v3]
In-Reply-To: <zNDdDM6pSDDO6MCIYGYPGBhNmymwd11Arg2ErUxZzK4=.4c04644d-e9fc-4195-8fbd-51ef106bdf77@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
 <zNDdDM6pSDDO6MCIYGYPGBhNmymwd11Arg2ErUxZzK4=.4c04644d-e9fc-4195-8fbd-51ef106bdf77@github.com>
Message-ID: <mVWU9eREbcz_0G_t6OiBBIrnDmPWR8P6_Nbr9kTOKas=.a925d522-3624-4da8-b249-b482faaed6bc@github.com>

On Thu, 4 Nov 2021 16:39:45 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Currently we pass several compilation options as separate arguments to `Compile`: 
>> 
>> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
>> 
>> Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more. 
>> 
>> I suggest adding a new `Options` class to pass these values into `Compile`.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Per-use Options factory method

Looks good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6237

From lkorinth at openjdk.java.net  Fri Nov  5 09:30:15 2021
From: lkorinth at openjdk.java.net (Leo Korinth)
Date: Fri, 5 Nov 2021 09:30:15 GMT
Subject: RFR: 8275506: Rename allocated_on_stack to
 allocated_on_stack_or_embedded [v2]
In-Reply-To: <A-GeqVgpuacN0p2LE3h2pvcd8ksnmo9Q9P9wKnIcpTE=.36a284a4-dfd7-4864-ba2b-79bb4c815272@github.com>
References: <lcZrszzuQgF2EdE8jF3Rtp9F03i5Yth8FO_32TGHaH8=.66a43c46-5996-490c-930e-285ab03bca61@github.com>
 <A-GeqVgpuacN0p2LE3h2pvcd8ksnmo9Q9P9wKnIcpTE=.36a284a4-dfd7-4864-ba2b-79bb4c815272@github.com>
Message-ID: <AUqdwE7JJKlTUR8Kcxr8PRBvUQCprEcWuEjzyjBql-w=.abc0435f-a3cf-41a1-953f-3defd9cdc8f0@github.com>

On Fri, 29 Oct 2021 13:29:40 GMT, Leo Korinth <lkorinth at openjdk.org> wrote:

>> In allocation.hpp, the name allocated_on_stack can be misleading, better rename the function to allocated_on_stack_or_embedded and it will match the name of the enum as a bonus.
>
> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision:
> 
>   restart failed github tests

Thanks Thomas.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6004

From lkorinth at openjdk.java.net  Fri Nov  5 09:30:15 2021
From: lkorinth at openjdk.java.net (Leo Korinth)
Date: Fri, 5 Nov 2021 09:30:15 GMT
Subject: Integrated: 8275506: Rename allocated_on_stack to
 allocated_on_stack_or_embedded
In-Reply-To: <lcZrszzuQgF2EdE8jF3Rtp9F03i5Yth8FO_32TGHaH8=.66a43c46-5996-490c-930e-285ab03bca61@github.com>
References: <lcZrszzuQgF2EdE8jF3Rtp9F03i5Yth8FO_32TGHaH8=.66a43c46-5996-490c-930e-285ab03bca61@github.com>
Message-ID: <tHU-PcLIHTRcPIY7gEqvSJ8k8UsDDRE7l6l6paxhyEA=.fc247cd8-6068-4419-8466-e2cf350c5900@github.com>

On Tue, 19 Oct 2021 12:18:30 GMT, Leo Korinth <lkorinth at openjdk.org> wrote:

> In allocation.hpp, the name allocated_on_stack can be misleading. It is better to rename the function to allocated_on_stack_or_embedded; as a bonus, it will then match the name of the enum.

This pull request has now been integrated.

Changeset: 323d2017
Author:    Leo Korinth <lkorinth at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/323d2017959dc96d25eaa1aad6404586099c237e
Stats:     16 lines in 6 files changed: 0 ins; 0 del; 16 mod

8275506: Rename allocated_on_stack to allocated_on_stack_or_embedded

Reviewed-by: stuefe

-------------

PR: https://git.openjdk.java.net/jdk/pull/6004

From shade at openjdk.java.net  Fri Nov  5 10:13:08 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 5 Nov 2021 10:13:08 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace
In-Reply-To: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
Message-ID: <ZUcIQfInAewW4bA4z_-W93P26sAbNmdihcdgm2qns1Q=.16eb2b31-a863-481e-b5fc-17bdc17dbe21@github.com>

On Thu, 7 Oct 2021 12:42:48 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> 
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of the VM code. The Java stack is reported through the usual "frame" mechanism the rest of the VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
> 
> Additional testing:
>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass
>  - [x] Linux x86_64 Zero works with `async-profiler`

Does anyone have opinions about this patch? :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From mcimadamore at openjdk.java.net  Fri Nov  5 11:06:53 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 5 Nov 2021 11:06:53 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v18]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  * Add two new CLinker static methods to compute upcall/downcall method types
  * Clarify section on CLinker downcall type
  * Add section on CLinker safety guarantees

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/b9432473..ce561e1f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=17
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=16-17

  Stats: 79 lines in 3 files changed: 47 ins; 17 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Fri Nov  5 11:30:59 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 5 Nov 2021 11:30:59 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v19]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <wvROJ-xKOsnrsOl2uIL98pouvLignJGoNuiZ9RhnT-E=.b2800aa4-fd0c-4e32-a3d6-88c1247fcf8b@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Streamline javadoc for package-info

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/ce561e1f..350f1f07

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=18
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=17-18

  Stats: 37 lines in 1 file changed: 9 ins; 3 del; 25 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Fri Nov  5 11:37:57 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 5 Nov 2021 11:37:57 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v20]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <1EDavlhSqnzIbpu1uQArxPknmjIMaeQEoPV8W1T3UjE=.9a5dcd88-0cc7-4965-b2b7-3cccaf70b50e@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress
  (which is consistent with other restricted factories in VaList and NativeSymbol)

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/350f1f07..663e72a8

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=19
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=18-19

  Stats: 51 lines in 23 files changed: 0 ins; 3 del; 48 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Fri Nov  5 11:54:14 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 5 Nov 2021 11:54:14 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v12]
In-Reply-To: <UX8V13GkspdQyaQurhAql9BaZiAQH5klYW8GyZiRJ-g=.17702d38-a07e-4596-af99-2fcb5243862c@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <LZdOj0F56w1jvphEqwIPKjBGVqK9bK18hoyBRS7LvH8=.28940500-42f1-4550-94ab-4dc090785924@github.com>
 <UX8V13GkspdQyaQurhAql9BaZiAQH5klYW8GyZiRJ-g=.17702d38-a07e-4596-af99-2fcb5243862c@github.com>
Message-ID: <QZeHdQOpqqH2mQBQ11F1wKLV6i68XxxXAfn02eqfvZA=.c882ffe7-bab2-4664-b6ea-81fef94054aa@github.com>

On Tue, 2 Nov 2021 15:40:45 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Tweak javadoc of loaderLookup
>
> Marked as reviewed by psandoz (Reviewer).

I have made some minor API changes: I added two methods to `CLinker` that return the upcall and downcall method types, as suggested offline by @PaulSandoz. I've also cleaned up the `CLinker` javadoc, added a section on safety considerations, streamlined the links in the package-level javadoc, and renamed `MemorySegment::ofAddressNative` to simply `MemorySegment::ofAddress` (which is consistent with the restricted factories in `NativeSymbol` and `VaList`).

javadoc: http://cr.openjdk.java.net/~mcimadamore/JEP-419/v2/javadoc/jdk/incubator/foreign/package-summary.html
specdiff: http://cr.openjdk.java.net/~mcimadamore/JEP-419/v2/specdiff_out/overview-summary.html

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From jvernee at openjdk.java.net  Fri Nov  5 14:29:19 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Fri, 5 Nov 2021 14:29:19 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v18]
In-Reply-To: <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com>
Message-ID: <VaFgr9j0KCMW5_zyWLdUizp_oVTQ2wQDw80reGsrncc=.cc8aeca8-f65d-4655-a8c3-da1198e7b602@github.com>

On Fri, 5 Nov 2021 11:06:53 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   * Add two new CLinker static methods to compute upcall/downcall method types
>   * Clarify section on CLinker downcall type
>   * Add section on CLinker safety guarantees

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 65:

> 63:  * <li>if {@code L} is a {@link ValueLayout} with carrier {@code E} then there are two cases:
> 64:  *     <ul>
> 65:  *         <li>if {@code L} occurs in a parameter position and {@code E} is {@code NativeAddress.class},

This looks spurious

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 134:

> 132:  * <p>
> 133:  * Upcall stubs are generally safer to work with, as the linker runtime can validate the type of the target method
> 134:  * handle against the provided function descriptor and report an error if any mismatch is detected. If the target method

But, in the case of upcalls, errors can still occur if the native code casts the pointer to the upcall stub to an incorrect type, e.g. `FunctionDescriptor.ofVoid(ADDRESS, ADDRESS)`, but on the native side cast it to `void (*)(void*)`, meaning the second argument would be garbage on the Java side. i.e. there is still room for a mismatch the same as with downcalls.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 267:

> 265:     static MethodType upcallType(FunctionDescriptor functionDescriptor) {
> 266:         return SharedUtils.inferMethodType(functionDescriptor, true);
> 267:     }

Nice! :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Fri Nov  5 14:37:23 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 5 Nov 2021 14:37:23 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v18]
In-Reply-To: <VaFgr9j0KCMW5_zyWLdUizp_oVTQ2wQDw80reGsrncc=.cc8aeca8-f65d-4655-a8c3-da1198e7b602@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com>
 <VaFgr9j0KCMW5_zyWLdUizp_oVTQ2wQDw80reGsrncc=.cc8aeca8-f65d-4655-a8c3-da1198e7b602@github.com>
Message-ID: <z_is0y7ChExhNkR5zBq_fkBFCcCrqKkQGJg94ANBdD4=.0a8212a2-2f54-4b56-83e2-67ab3c53c66d@github.com>

On Fri, 5 Nov 2021 14:25:35 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   * Add two new CLinker static methods to compute upcall/downcall method types
>>   * Clarify section on CLinker downcall type
>>   * Add section on CLinker safety guarantees
>
> src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 134:
> 
>> 132:  * <p>
>> 133:  * Upcall stubs are generally safer to work with, as the linker runtime can validate the type of the target method
>> 134:  * handle against the provided function descriptor and report an error if any mismatch is detected. If the target method
> 
> But, in the case of upcalls, errors can still occur if the native code casts the pointer to the upcall stub to an incorrect type, e.g. `FunctionDescriptor.ofVoid(ADDRESS, ADDRESS)`, but on the native side cast it to `void (*)(void*)`, meaning the second argument would be garbage on the Java side. i.e. there is still room for a mismatch the same as with downcalls.

Yes and no. In a downcall, you just don't know what signature the downcall will feature in the native lib. So you pass a function descriptor and you hope it's ok. In the upcall case you _do_ know the signature of the Java upcall code you want to call, so you can validate the descriptor against that. Of course the native code can still cast things around in ways that blow things up, but the two problems seem somewhat different, at least to me. But I can tweak the text a bit.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Fri Nov  5 15:28:45 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 5 Nov 2021 15:28:45 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v21]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <_YLlQk23TfRkCzouXvgHH3Zxktw1sxo1uvae5KsjlFw=.3c4f3aeb-e24f-424b-94f4-04b19f0e834b@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Clarify safety considerations for upcalls

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/663e72a8..2aa126a9

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=20
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=19-20

  Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From jvernee at openjdk.java.net  Fri Nov  5 15:52:16 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Fri, 5 Nov 2021 15:52:16 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v18]
In-Reply-To: <z_is0y7ChExhNkR5zBq_fkBFCcCrqKkQGJg94ANBdD4=.0a8212a2-2f54-4b56-83e2-67ab3c53c66d@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com>
 <VaFgr9j0KCMW5_zyWLdUizp_oVTQ2wQDw80reGsrncc=.cc8aeca8-f65d-4655-a8c3-da1198e7b602@github.com>
 <z_is0y7ChExhNkR5zBq_fkBFCcCrqKkQGJg94ANBdD4=.0a8212a2-2f54-4b56-83e2-67ab3c53c66d@github.com>
Message-ID: <B_3QI_9_jBQ9_5pfkQksCrCAA3NM_-loFl7R9m4-FNY=.3eb09f05-f6eb-4b14-8a29-d3e050b42ac9@github.com>

On Fri, 5 Nov 2021 14:33:44 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 134:
>> 
>>> 132:  * <p>
>>> 133:  * Upcall stubs are generally safer to work with, as the linker runtime can validate the type of the target method
>>> 134:  * handle against the provided function descriptor and report an error if any mismatch is detected. If the target method
>> 
>> But, in the case of upcalls, errors can still occur if the native code casts the pointer to the upcall stub to an incorrect type, e.g. `FunctionDescriptor.ofVoid(ADDRESS, ADDRESS)`, but on the native side cast it to `void (*)(void*)`, meaning the second argument would be garbage on the Java side. i.e. there is still room for a mismatch the same as with downcalls.
>
> Yes and no. In a downcall, you just don't know what signature the downcall will feature in the native lib. So you pass a function descriptor and you hope it's ok. In the upcall case you _do_ know the signature of the Java upcall code you want to call, so you can validate the descriptor against that. Of course the native code can still cast things around in ways that blow things up, but the two problems seem somewhat different, at least to me. But I can tweak the text a bit.

Ok, thanks.

I think of it more like this: in both cases we specify a native type as well as a Java type, both in the form of a FunctionDescriptor, from which we then derive the Java type in the form of a MethodType. If there is a mismatch between this and what the native code does, we are in trouble; this seems the same for downcalls and upcalls. In both cases we know the Java side for sure; it's the native side we can't validate (they are just flipped around for upcalls).

But, for upcalls there is an additional thing that can go wrong: the type of the target MethodHandle we pass could have a mismatch with the type we inferred from the FunctionDescriptor, so there we need to do an extra check. i.e. in a way this seems _less_ safe (though a different kind of safety), than downcalls, since there is an additional way to mess up with the linkage request, although we can catch that case.
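One way to picture the extra upcall check described above is the toy model below. It is not the FFM API: `Carrier`, `Signature` and `upcall_linkage_ok` are hypothetical names, and a `Signature` stands in both for the `MethodType` inferred from a `FunctionDescriptor` and for the type of the target `MethodHandle`. The sketch only covers the Java-side comparison; what the native code does with the resulting pointer remains unchecked, as discussed.

```cpp
#include <cassert>
#include <vector>

// Toy model only; none of these names exist in jdk.incubator.foreign.
enum class Carrier { Int, Long, Double, Address };

// Stands in for a Java method type: just an ordered list of parameter carriers.
struct Signature {
  std::vector<Carrier> params;
  bool operator==(const Signature& other) const { return params == other.params; }
};

// Downcall: only the descriptor is supplied, so there is no second
// Java-side type to compare against and nothing to check here.
//
// Upcall: the caller supplies a descriptor AND a target handle, so the type
// inferred from the descriptor can be compared with the target's type.
bool upcall_linkage_ok(const Signature& inferred_from_descriptor,
                       const Signature& target_handle_type) {
  return inferred_from_descriptor == target_handle_type;
}
```

With a descriptor modelled as two `Address` parameters, a target that takes a single `Address` is rejected by this check, while a native caller that casts the stub to the wrong function pointer type would still go unnoticed.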

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Fri Nov  5 16:02:43 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 5 Nov 2021 16:02:43 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v22]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <C26kQ22eRpl5GkWQTPML0L7I5QeQDavEtZA3Y9wXz74=.5585b09d-d131-43d8-a1cf-04b089679fb4@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Further tweak upcall safety considerations

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/2aa126a9..4e3af9f1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=21
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=20-21

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From kvn at openjdk.java.net  Fri Nov  5 16:11:18 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 5 Nov 2021 16:11:18 GMT
Subject: RFR: 8276571: C2: pass compilation options as structure [v3]
In-Reply-To: <zNDdDM6pSDDO6MCIYGYPGBhNmymwd11Arg2ErUxZzK4=.4c04644d-e9fc-4195-8fbd-51ef106bdf77@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
 <zNDdDM6pSDDO6MCIYGYPGBhNmymwd11Arg2ErUxZzK4=.4c04644d-e9fc-4195-8fbd-51ef106bdf77@github.com>
Message-ID: <9Vo2Sucm_eCSVFaAB_G7_KRKdoq4z3y2BmLhxT8rREk=.14cf2e9a-b74c-4f7d-9d7d-6075f4e6cb96@github.com>

On Thu, 4 Nov 2021 16:39:45 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Currently we pass several compilation options as separate arguments to `Compile`: 
>> 
>> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
>> 
>> Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more.
>> 
>> I suggest adding a new `Options` class to pass these values into `Compile`.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Per-use Options factory method

Thank you, Aleksey and Christian.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From kvn at openjdk.java.net  Fri Nov  5 16:11:18 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 5 Nov 2021 16:11:18 GMT
Subject: Integrated: 8276571: C2: pass compilation options as structure
In-Reply-To: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
References: <X0svBmkpcS4RQn4h4C4sLw-XpUALo6_uXUOiHc7HxP0=.63aa5d9b-fd29-4351-bac4-e63c24fbd6dc@github.com>
Message-ID: <LukABxX04o_E7GpdK12pfOc39N_jJx_tu4F2AuJvc1Y=.e50eda1b-3222-4bb5-b645-2e339dd97924@github.com>

On Wed, 3 Nov 2021 18:49:47 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Currently we pass several compilation options as separate arguments to `Compile`: 
> 
> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); 
> 
> Originally we had only the `subsume_loads` option, but we have added a few since then and we may add more.
> 
> I suggest adding a new `Options` class to pass these values into `Compile`.
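A minimal sketch of the idea, assuming a simplified `Compile` that takes only an entry BCI and the options object. The field names follow the arguments listed in the description; the factory names `for_default_compile` and `without_escape_analysis` and the cut-down `Compile` struct are hypothetical and are not taken from the actual changeset.

```cpp
#include <cassert>

// Hypothetical stand-in for the proposed Options class: it bundles the
// boolean flags that were previously passed to Compile one by one.
class Options {
 private:
  const bool _subsume_loads;
  const bool _do_escape_analysis;
  const bool _eliminate_boxing;
  const bool _do_locks_coarsening;
  const bool _install_code;

 public:
  Options(bool subsume_loads, bool do_escape_analysis, bool eliminate_boxing,
          bool do_locks_coarsening, bool install_code)
      : _subsume_loads(subsume_loads),
        _do_escape_analysis(do_escape_analysis),
        _eliminate_boxing(eliminate_boxing),
        _do_locks_coarsening(do_locks_coarsening),
        _install_code(install_code) {}

  // Per-use factory methods (names are illustrative only).
  static Options for_default_compile() {
    return Options(true, true, true, true, true);
  }
  static Options without_escape_analysis() {
    return Options(true, false, true, true, true);
  }

  bool subsume_loads() const { return _subsume_loads; }
  bool do_escape_analysis() const { return _do_escape_analysis; }
  bool eliminate_boxing() const { return _eliminate_boxing; }
  bool do_locks_coarsening() const { return _do_locks_coarsening; }
  bool install_code() const { return _install_code; }
};

// Cut-down Compile: one Options argument replaces five separate booleans,
// so adding a flag later does not change this signature.
struct Compile {
  const int entry_bci;
  const Options options;
  Compile(int bci, Options opts) : entry_bci(bci), options(opts) {}
};
```

A call site would then read something like `Compile C(entry_bci, Options::for_default_compile());`, and a new flag only touches `Options` and its factories.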

This pull request has now been integrated.

Changeset: a74a839a
Author:    Vladimir Kozlov <kvn at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/a74a839af02446d322d77c6e546e652ec6ad5d73
Stats:     76 lines in 4 files changed: 40 ins; 17 del; 19 mod

8276571: C2: pass compilation options as structure

Reviewed-by: shade, chagedorn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From stuefe at openjdk.java.net  Sat Nov  6 05:49:45 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Sat, 6 Nov 2021 05:49:45 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks
In-Reply-To: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
Message-ID: <mwMP1_opIKk3THF6Ydf_4GzGQYI3WiadyYX76lP59Xg=.3de94f35-910c-4327-b295-15b90d836311@github.com>

On Thu, 14 Oct 2021 15:49:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> This is one of a number of RFEs I am planning in order to improve and simplify C-heap overflow checking in hotspot. For the whole story please refer to https://bugs.openjdk.java.net/browse/JDK-8275301.
> 
> This proposal adds NMT buffer overflow checking. As laid out in JDK-8275301:
> 
> - it would give us C-heap overflow checking in release builds
> - the additional costs are negligible
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. The error reports would also be confusing.
> - it is a preparation for future code removal (the memory guarding done in debug only in os::malloc() and friends, and possibly the guarding done with CheckJNICalls)
> 
> Patch notes:
> 
> 1) The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. 
> 
> On 64-bit, we don't even need to enlarge the malloc header: we carve some bits out by decreasing the size of the bucket index bit field to 16 bits. The bucket index field is used to store the bucket slot of the malloc site table in NMT detail mode. The malloc site table width is 512 at the moment, so 65k gives plenty of room for growing the malloc site table should we ever want to.
> 
> On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes. That is because there were not enough bits to spare for a canary. On the upside, 8 bytes were not enough anyway, strictly speaking, to guarantee proper alignment e.g. for 128bit data types on all 32-bit platforms. See e.g. the malloc alignment that glibc uses.
> 
> I also took the liberty of re-arranging the malloc header fields a bit to minimize the difference between 32-bit and 64-bit platforms, and to align each field optimally according to its size. I also switched from bitfields to real types in order to be able to do a sizeof() on them.
> 
> For more details, see the comment in mallocTracker.hpp.
> 
> 2) I added a footer canary trailing the user allocation to catch tail buffer overruns. For simplicity (alignment) and to save some cycles I made it a byte only. That is enough to catch most overrun scenarios. If you think this is too small, I'm open to changing it.
> 
> 3) I put a bit of work into error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
> 
> 4) I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
> 
> (Note that these gtests, to test anything, need to run with NMT switched on. We do this as part of our NMT jtreg-controlled gtests in tier1).
> 
> Even though the patch adds more code than it removes, it prepares possible code removal (if we can agree to do that) and the net result will be less complexity, not more. Again, see JDK-8275301 for details.
> 
> --------------
> 
> Example output a buffer overrun would provide:
> 
> 
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
> 
> -------
> 
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 14 days in a row without problems
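The header and footer canaries from the patch notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the NMT code: `BlockHeader`, its fields, the canary values and the `guarded_*` helpers are all hypothetical; only the general layout (a 16-bit canary directly before the payload, a one-byte canary directly after it) follows the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical 16-byte header; the real layout lives in mallocTracker.hpp.
struct BlockHeader {
  std::uint64_t size;    // payload size in bytes
  std::uint32_t flags;   // stand-in for other bookkeeping
  std::uint16_t bucket;  // stand-in for the 16-bit bucket index
  std::uint16_t canary;  // last field, so it directly precedes the payload
};
static_assert(sizeof(BlockHeader) == 16, "sketch assumes a 16-byte header");

constexpr std::uint16_t kHeaderCanary = 0xE99E;  // arbitrary illustrative value
constexpr std::uint8_t kFooterCanary = 0xE8;     // arbitrary illustrative value

enum class BlockState { Intact, HeaderCorrupt, FooterCorrupt };

// Allocates header + payload + 1 footer byte and returns the payload pointer.
void* guarded_malloc(std::size_t size) {
  auto* raw = static_cast<unsigned char*>(
      std::malloc(sizeof(BlockHeader) + size + 1));
  if (raw == nullptr) return nullptr;
  BlockHeader header{static_cast<std::uint64_t>(size), 0, 0, kHeaderCanary};
  std::memcpy(raw, &header, sizeof(header));
  raw[sizeof(BlockHeader) + size] = kFooterCanary;
  return raw + sizeof(BlockHeader);
}

// Checks both canaries; a real implementation would dump memory and abort.
BlockState check_block(const void* payload) {
  const auto* p = static_cast<const unsigned char*>(payload);
  BlockHeader header;
  std::memcpy(&header, p - sizeof(BlockHeader), sizeof(header));
  if (header.canary != kHeaderCanary) return BlockState::HeaderCorrupt;
  if (p[header.size] != kFooterCanary) return BlockState::FooterCorrupt;
  return BlockState::Intact;
}

void guarded_free(void* payload) {
  if (payload == nullptr) return;
  std::free(static_cast<unsigned char*>(payload) - sizeof(BlockHeader));
}
```

Writing a single byte past the end of the payload lands on the footer byte, so `check_block` reports `FooterCorrupt`; a write just before the payload hits the header canary instead.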

No takers?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From iklam at openjdk.java.net  Sun Nov  7 21:23:52 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Sun, 7 Nov 2021 21:23:52 GMT
Subject: RFR: 8269986: Remove +3 from Symbol::identity_hash()
Message-ID: <sZinjAxTK8hjhtSVd0OktG6h44GR7vX2hudlz5sfCC0=.707fd475-c9a9-4da0-9964-d971e10a1176@github.com>

Please review this change that removes the `+3` from here:


  unsigned Symbol::identity_hash() const {
    unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3));
                                                                                  ^^^
    return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) |
           ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16);
  }


The `+3` was intended to avoid getting the same value for these bits:


((((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07)


However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive.
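For readers who want to reproduce that kind of measurement, a minimal sketch of tallying those three bits over a set of addresses is shown here. The helper name `low_bit_histogram` and the constant value 3 for `kLogMinObjAlignmentInBytes` are assumptions made for illustration; this is not HotSpot code and it does not reproduce the data in the bug report.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed value for illustration (8-byte minimum object alignment).
constexpr int kLogMinObjAlignmentInBytes = 3;

// Counts how often each value of the three bits just above the alignment
// bits occurs, i.e. ((addr >> LogMinObjAlignmentInBytes) & 0x07).
std::array<std::size_t, 8> low_bit_histogram(
    const std::vector<std::uintptr_t>& addresses) {
  std::array<std::size_t, 8> histogram{};
  for (std::uintptr_t addr : addresses) {
    histogram[(addr >> kLogMinObjAlignmentInBytes) & 0x07]++;
  }
  return histogram;
}
```

If the buckets come out roughly equal for real `Symbol` addresses, those bits already carry useful entropy, which is the argument for not shifting them away.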

Testing: Oracle CI tiers 1-4

-------------

Commit messages:
 - 8269986: Remove +3 from Symbol::identity_hash()

Changes: https://git.openjdk.java.net/jdk/pull/6287/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6287&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8269986
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6287.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6287/head:pull/6287

PR: https://git.openjdk.java.net/jdk/pull/6287

From duke at openjdk.java.net  Mon Nov  8 00:30:39 2021
From: duke at openjdk.java.net (Joshua Cao)
Date: Mon, 8 Nov 2021 00:30:39 GMT
Subject: RFR: 8274860: gcc 10.2.1 produces an uninitialized warning in
 sharedRuntimeTrig.cpp
In-Reply-To: <wmAuCknOmuK3H_2Fw8kJ95QKPEJBWI-Hlr19IDZIZaA=.0497fe7b-cb66-4d22-8c76-024b8d1b3288@github.com>
References: <wmAuCknOmuK3H_2Fw8kJ95QKPEJBWI-Hlr19IDZIZaA=.0497fe7b-cb66-4d22-8c76-024b8d1b3288@github.com>
Message-ID: <ArfsHwSXpBq-GewQ1qAd8rCcy10Ul_EaRYgxComJOvk=.cfe66e78-2926-4eca-8b02-979f31a698b8@github.com>

On Tue, 2 Nov 2021 23:39:48 GMT, Joshua Cao <duke at openjdk.java.net> wrote:

> Initialize `fq` to an array of zeroes.

I've taken a look at the discussion on the JBS issue again. I'm not sure why it was determined that this change should be applied to tip. I've tried to build locally for JDK tip and JDK15, and there is no uninitialized warning. There is a link in the JBS description explaining where the warning is disabled.

I think this issue should be closed, and I'll update https://github.com/openjdk/jdk11u-dev/pull/489.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6220

From duke at openjdk.java.net  Mon Nov  8 00:30:39 2021
From: duke at openjdk.java.net (Joshua Cao)
Date: Mon, 8 Nov 2021 00:30:39 GMT
Subject: Withdrawn: 8274860: gcc 10.2.1 produces an uninitialized warning in
 sharedRuntimeTrig.cpp
In-Reply-To: <wmAuCknOmuK3H_2Fw8kJ95QKPEJBWI-Hlr19IDZIZaA=.0497fe7b-cb66-4d22-8c76-024b8d1b3288@github.com>
References: <wmAuCknOmuK3H_2Fw8kJ95QKPEJBWI-Hlr19IDZIZaA=.0497fe7b-cb66-4d22-8c76-024b8d1b3288@github.com>
Message-ID: <iR7vPNNqY36XoBEilkS5ATKHkO9wzEDE4v0K4IRWzvk=.204354d6-447d-49b7-9f94-f2c3c6cb8b7f@github.com>

On Tue, 2 Nov 2021 23:39:48 GMT, Joshua Cao <duke at openjdk.java.net> wrote:

> Initialize `fq` to an array of zeroes.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6220

From ddong at openjdk.java.net  Mon Nov  8 01:26:37 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 8 Nov 2021 01:26:37 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v2]
In-Reply-To: <QMapNYMdQg_awIZ5RvxX3vN85dXICxb89n4Wk0HagQE=.7f8f7b39-6165-4fb4-9880-f64715e506f0@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <QMapNYMdQg_awIZ5RvxX3vN85dXICxb89n4Wk0HagQE=.7f8f7b39-6165-4fb4-9880-f64715e506f0@github.com>
Message-ID: <oIncH_DRR1zXny0bBOFNqLXY35WDFiHBPcSwdWW0y9g=.2427bc64-6863-49d1-9245-35ebae82a814@github.com>

On Sun, 31 Oct 2021 22:56:44 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> Hi,
>> 
>> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
>> 
>> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
>> 
>> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
>> 
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix build problem

Gentle ping?
This problem seems to have existed for a long time. I think it has gone unreported because the probe has few users.
As far as I know, one of the BCC tools relies on this probe:
https://github.com/iovisor/bcc/blob/master/tools/lib/uobjnew.py#L110

-------------

PR: https://git.openjdk.java.net/jdk/pull/6181

From dholmes at openjdk.java.net  Mon Nov  8 01:54:34 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 8 Nov 2021 01:54:34 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v2]
In-Reply-To: <QMapNYMdQg_awIZ5RvxX3vN85dXICxb89n4Wk0HagQE=.7f8f7b39-6165-4fb4-9880-f64715e506f0@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <QMapNYMdQg_awIZ5RvxX3vN85dXICxb89n4Wk0HagQE=.7f8f7b39-6165-4fb4-9880-f64715e506f0@github.com>
Message-ID: <cX8tuomGCoxKU2HDw3Xxce9kZuRpVoWzdeqKKCQ6IvM=.99f2c59f-434f-45df-be74-849567b09eed@github.com>

On Sun, 31 Oct 2021 22:56:44 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> Hi,
>> 
>> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
>> 
>> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
>> 
>> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
>> 
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix build problem

To me, something like `dtrace_object_alloc_base` should not be called directly (like a foo_impl function) but only as the implementation of the real API entry points. If that isn't the case here, then let's drop the "base" part and just have a set of overloaded `dtrace_object_alloc` functions.

Thanks,
David

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6181

From ddong at openjdk.java.net  Mon Nov  8 02:42:58 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 8 Nov 2021 02:42:58 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v3]
In-Reply-To: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
Message-ID: <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>

> Hi,
> 
> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
> 
> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
> 
> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
> 
> Thanks,
> Denghui

Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:

  update according to comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6181/files
  - new: https://git.openjdk.java.net/jdk/pull/6181/files/8d597ebc..0527097e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6181&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6181&range=01-02

  Stats: 17 lines in 13 files changed: 0 ins; 0 del; 17 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6181.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6181/head:pull/6181

PR: https://git.openjdk.java.net/jdk/pull/6181

From ddong at openjdk.java.net  Mon Nov  8 02:42:58 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 8 Nov 2021 02:42:58 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v2]
In-Reply-To: <cX8tuomGCoxKU2HDw3Xxce9kZuRpVoWzdeqKKCQ6IvM=.99f2c59f-434f-45df-be74-849567b09eed@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <QMapNYMdQg_awIZ5RvxX3vN85dXICxb89n4Wk0HagQE=.7f8f7b39-6165-4fb4-9880-f64715e506f0@github.com>
 <cX8tuomGCoxKU2HDw3Xxce9kZuRpVoWzdeqKKCQ6IvM=.99f2c59f-434f-45df-be74-849567b09eed@github.com>
Message-ID: <SFYvmnTNfQ7cJx8zxxeKw4c1ZcklOM-p9J9z7eMKQ_U=.e3a269d7-40d6-4518-a4cd-6642dc35f228@github.com>

On Mon, 8 Nov 2021 01:51:30 GMT, David Holmes <dholmes at openjdk.org> wrote:

> To me, something like `dtrace_object_alloc_base` should not be called directly (like a foo_impl function) but only as the implementation of the real API entry points. If that isn't the case here, then let's drop the "base" part and just have a set of overloaded `dtrace_object_alloc` functions.
> 
> Thanks, David

Changed.

Thanks,
Denghui

-------------

PR: https://git.openjdk.java.net/jdk/pull/6181

From dholmes at openjdk.java.net  Mon Nov  8 05:23:38 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 8 Nov 2021 05:23:38 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v3]
In-Reply-To: <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
Message-ID: <2selTWcClSy4aHruUOuUlErZUVny8-VgrladwcRJVm4=.830e17bf-d507-452e-a37f-f6c819666955@github.com>

On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> Hi,
>> 
>> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
>> 
>> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
>> 
>> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
>> 
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update according to comments

Did your change actually work? I just realized that you can't use this:

```__ call(RuntimeAddress(CAST_FROM_FN_PTR(address, static_cast<int (*)(oopDesc*)>(SharedRuntime::dtrace_object_alloc))));```

because it has no idea which overload of `dtrace_object_alloc` needs to be invoked.

David

-------------

PR: https://git.openjdk.java.net/jdk/pull/6181

From dholmes at openjdk.java.net  Mon Nov  8 05:36:33 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 8 Nov 2021 05:36:33 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v3]
In-Reply-To: <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
Message-ID: <pGhm_JF9LJ-UbTs63bpN5606XQRlMHOjudR59qge2vk=.dd99f8dc-07f8-460c-be06-bdf23fa7dbba@github.com>

On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> Hi,
>> 
>> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
>> 
>> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
>> 
>> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
>> 
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update according to comments

Sorry, ignore that. I see that this is what the `static_cast` is intended to do. I got confused by the need to change the additional call sites that already used `dtrace_object_alloc`, because they were the ones previously calling the two-arg function but only passing one arg!
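
For anyone else reading along, here is a small standalone sketch of the idiom: taking the address of an overloaded function is ambiguous by itself, and a `static_cast` to a concrete function-pointer type picks one overload. The names `Obj`, `object_alloc`, `one_arg_entry` and `two_arg_entry` are hypothetical stand-ins, not the HotSpot declarations.

```cpp
#include <cassert>

// Hypothetical stand-ins for an overloaded runtime entry point.
struct Obj { int words; };

// One-arg overload: derives the size from the object itself.
int object_alloc(Obj* o) {
    assert(o != nullptr);
    return o->words;
}

// Two-arg overload: the caller passes the size explicitly.
int object_alloc(Obj* o, int size) {
    assert(o != nullptr);
    return size;
}

using OneArgFn = int (*)(Obj*);
using TwoArgFn = int (*)(Obj*, int);

// The target type of the static_cast selects the overload whose
// signature matches; a cast straight to an untyped address could not.
inline OneArgFn one_arg_entry() { return static_cast<OneArgFn>(object_alloc); }
inline TwoArgFn two_arg_entry() { return static_cast<TwoArgFn>(object_alloc); }
```

The resulting typed pointer can then be converted to a raw address, which is presumably why the cast appears inside `CAST_FROM_FN_PTR` in the line quoted earlier.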

-------------

PR: https://git.openjdk.java.net/jdk/pull/6181

From dholmes at openjdk.java.net  Mon Nov  8 05:52:34 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 8 Nov 2021 05:52:34 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v3]
In-Reply-To: <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
Message-ID: <ZVXfwHxk0LOodJ6JcFswnOXsJyfsGOvMcyiMNviagrI=.2d431faa-3788-4fbb-b969-2d695b4f1dc6@github.com>

On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> Hi,
>> 
>> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
>> 
>> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
>> 
>> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
>> 
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update according to comments

I understand now what you meant by the additional overload complicating the fix - I hadn't appreciated that may have been the reason for using different names for the functions originally.

I'm still unclear why the lack of the size argument did not cause problems. I guess whatever random value was next on the stack got read as the size, but reading it caused no harm; it was just incorrect.

These changes look good to me now.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6181

From tschatzl at openjdk.java.net  Mon Nov  8 09:51:36 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 8 Nov 2021 09:51:36 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID: <KIMR7OXS5ynzzBmyIPXIgB2nBx7FNvgVk7Rug2DMOBQ=.a831d649-f2fa-442f-83f9-879ee0a4dc81@github.com>

On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
> 
> The initial specjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015. specjbb arguments:
>   GROUP_COUNT=4
>   TI_JVM_COUNT=1
>   JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
>   MODE_ARGS="-ikv"

Hi,

  we tried to reproduce your numbers internally, but failed to do so. Differences seem to be within noise.
  
  We tried with specjbb2015 multi-jvm on a fairly large machine (152 threads; it was what we had on hand), with multiple runs of our internal benchmarks, and with specjbb2015 composite runs on various not-so-large but still fairly big machines.

Could you post or send more details about your configuration?

The other concern that has been brought up internally is that this increases the size of Thread by ~40%, from 624 to 872 bytes; do you think there is a way to save some memory by reorganizing the fields so that the counter is on a separate cache line "naturally"?
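
To make the trade-off concrete, here is a small sketch of how per-field cache-line alignment inflates a struct, together with the arithmetic behind the ~40% figure. The 64-byte line size and the struct and function names are assumptions for illustration only; this is not the layout of Thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;   // assumed cache line size

// Two 8-byte fields packed next to each other share one cache line.
struct Unpadded {
    std::uint64_t hot_counter;
    std::uint64_t neighbour;
};

// Aligning each field to a cache line separates them, at a cost in size.
struct Padded {
    alignas(kCacheLine) std::uint64_t hot_counter;
    alignas(kCacheLine) std::uint64_t neighbour;
};

static_assert(sizeof(Unpadded) == 16, "two packed 8-byte fields");
static_assert(sizeof(Padded) == 2 * kCacheLine, "one line per field");
static_assert(offsetof(Padded, neighbour) == kCacheLine, "neighbour starts a new line");

// Relative size growth in percent, e.g. for the 624 -> 872 byte figures.
inline double growth_percent(std::size_t before, std::size_t after) {
    assert(before > 0 && after >= before);
    return 100.0 * static_cast<double>(after - before) / static_cast<double>(before);
}
```

With the numbers above, `growth_percent(624, 872)` is 248/624, or just under 40%. Reordering fields so that rarely written data fills the rest of the counter's line would avoid most of that growth.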

Thanks,
  Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From ddong at openjdk.java.net  Mon Nov  8 12:11:33 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 8 Nov 2021 12:11:33 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v3]
In-Reply-To: <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
Message-ID: <Hw3Qr6T9tKtsuzlrxziKciYFZ9Ogpd3tE6OHmDuKB-c=.20765def-c24c-45f5-8f4e-6a5a24fdbdb8@github.com>

On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> Hi,
>> 
>> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
>> 
>> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
>> 
>> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
>> 
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update according to comments

Thank you, David.

Could I have another review?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6181

From mli at openjdk.java.net  Mon Nov  8 12:36:36 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Mon, 8 Nov 2021 12:36:36 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID: <89OJo1V3vrvkTQ-dU-97B54AVaFV7eBlBe_vp6oXRmU=.f3918f6a-290c-4c85-b25b-3eef082a82fc@github.com>

On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
> 
> The initial specjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015.
> 
> 
> 
> ========= test result (1st round) ==========
> rcu		base
> 45096		38980
> 41741		41468
> 42349		41053
> 44485		42030
> 47103		39915
> 43864		36004
> 
> ==== average ====
> 44106.33333		39908.33333
> 
> ==== improvement ====
> 10.5%
> 
> ========= test result (2nd round) ==========
> The second round of runs covers 3 configurations:
> 1. pad gc data & pad rcu
> 2. pad rcu only
> 3. base
> 
> Although the improvement is smaller than in the previous round (10%), there is still about a 3~4% improvement.
> 
> gc data + rcu	rcu	base
> 41284	41860	37099
> 42296	42166	44692
> 42810	43423	41801
> 43492	45603	40274
> 43808	40641	39627
> 43029	40242	39793
> 42543	41662	41544
> 43420	42702	37991
> 44212	43354	40319
> 42692	43442	45264
> 44773	44577	44213
> 40835	41870	42008
> 44282	44167	42527
> 
> ==== average ====
> 43036.61538	42746.84615	41319.38462
> 
> ==== improvement ====
> gc data + rcu / base: 4.156%
> rcu / base: 3.45%
> 
> 
> 
> 
> ========= configuration and environment ==========
> specjbb arguments:
>   GROUP_COUNT=4
>   TI_JVM_COUNT=1
> 
>   SPEC_OPTS_C="-Dspecjbb.group.count=$GROUP_COUNT -Dspecjbb.txi.pergroup.count=$TI_JVM_COUNT"
>   SPEC_OPTS_TI=""
>   SPEC_OPTS_BE=""
> 
>   JAVA_OPTS_C="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
>   JAVA_OPTS_TI="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
>   JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
> 
>   MODE_ARGS_C="-ikv"
>   MODE_ARGS_TI="-ikv"
>   MODE_ARGS_BE="-ikv"
> 
>   NUM_OF_RUNS=1
> 
> HW:
>   Architecture:        x86_64
>   CPU op-mode(s):      32-bit, 64-bit
>   Byte Order:          Little Endian
>   CPU(s):              224
>   On-line CPU(s) list: 0-223
>   Thread(s) per core:  2
>   Core(s) per socket:  28
>   Socket(s):           4
>   NUMA node(s):        4
>   Vendor ID:           GenuineIntel
>   CPU family:          6
>   Model:               85
>   Model name:          Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz
>   Stepping:            4
>   CPU MHz:             1001.925
>   CPU max MHz:         2101.0000
>   CPU min MHz:         1000.0000
>   BogoMIPS:            4200.00
>   Virtualization:      VT-x
>   L1d cache:           32K
>   L1i cache:           32K
>   L2 cache:            1024K
>   L3 cache:            39424K
>   NUMA node0 CPU(s):   0-27,112-139
>   NUMA node1 CPU(s):   28-55,140-167
>   NUMA node2 CPU(s):   56-83,168-195
>   NUMA node3 CPU(s):   84-111,196-223
> 
>               total        used        free      shared  buff/cache   available
> Mem:           3.0T        3.8G        2.9T         18M         25G        2.9T
> Swap:           99G          0B         99G

Thanks Thomas for the feedback.

I have updated the summary of this PR with more configuration and environment info, and I have also added a 2nd round of runs. Although the improvement is smaller than in the previous round (10%), it is still about 3~4%, and the data seems more stable than in the 1st round.
(JBS is not available currently; I will update JBS later too.)
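
For transparency, the improvement percentages are simply the ratio of the column averages. A minimal sketch of that calculation, using the 1st-round scores from the summary (the function names are mine, only the numbers come from the runs):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of scores.
inline double mean(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Improvement of `test` over `base`, in percent of the base mean.
inline double improvement_percent(const std::vector<double>& test,
                                  const std::vector<double>& base) {
    return 100.0 * (mean(test) / mean(base) - 1.0);
}

// 1st-round scores copied from the PR summary.
inline std::vector<double> round1_rcu() {
    return {45096, 41741, 42349, 44485, 47103, 43864};
}
inline std::vector<double> round1_base() {
    return {38980, 41468, 41053, 42030, 39915, 36004};
}
```

`improvement_percent(round1_rcu(), round1_base())` reproduces the roughly 10.5% figure. The same calculation on the 2nd-round columns gives the 4.156% and 3.45% values.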

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From mli at openjdk.java.net  Mon Nov  8 12:41:34 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Mon, 8 Nov 2021 12:41:34 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <KIMR7OXS5ynzzBmyIPXIgB2nBx7FNvgVk7Rug2DMOBQ=.a831d649-f2fa-442f-83f9-879ee0a4dc81@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
 <KIMR7OXS5ynzzBmyIPXIgB2nBx7FNvgVk7Rug2DMOBQ=.a831d649-f2fa-442f-83f9-879ee0a4dc81@github.com>
Message-ID: <DOcHtmUiWEiVVFK9wZxb38Qd8Cx0Qz-tqtI4lqmhk4A=.dc24908d-3388-4a5f-add4-88e778663dfc@github.com>

On Mon, 8 Nov 2021 09:48:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> The other concern that has been brought up internally is that this increases the size of Thread by ~40%, from 624 to 872 bytes; do you think there is a way to save some memory by reorganizing the fields so that the counter is on a separate cache line "naturally"?

Sure, if the current change is proven to bring a performance benefit, I will do some more research in this direction.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From coleenp at openjdk.java.net  Mon Nov  8 13:48:42 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Mon, 8 Nov 2021 13:48:42 GMT
Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to
 SharedRuntime::dtrace_object_alloc(_base) [v3]
In-Reply-To: <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
 <LKSrIrGgK-Om81TQDqVH1FLQRK2Iq-ysrGe9yp0wbFc=.6c0aef61-db46-4b5d-94cb-53096828da5d@github.com>
Message-ID: <5ddSIvC7u6q93eA32fdwCpNuOKpfE0oOpI7qcL1yZ9I=.c2a2077d-9b82-4330-a5c9-2a893996d374@github.com>

On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> Hi,
>> 
>> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
>> 
>> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
>> 
>> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
>> 
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update according to comments

These casts are hard to look at but it seems fine.

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6181

From ddong at openjdk.java.net  Mon Nov  8 14:34:40 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 8 Nov 2021 14:34:40 GMT
Subject: Integrated: 8276209: Some call sites doesn't pass the parameter 'size'
 to SharedRuntime::dtrace_object_alloc(_base)
In-Reply-To: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
References: <oXp8apU_PFOujeuPDlfIdH1py41IfpOdAFxomx-JEww=.cd942605-1b73-4b5e-988a-899da8240a3d@github.com>
Message-ID: <BdpCSYBACIZErcEPqBKNOCaHydFfAsuUxeqVWdnBXNU=.d177755f-5b39-4393-bf28-adf6639a2a08@github.com>

On Sun, 31 Oct 2021 15:08:11 GMT, Denghui Dong <ddong at openjdk.org> wrote:

> Hi,
> 
> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base).
> 
> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2).
> 
> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc.
> 
> Thanks,
> Denghui

This pull request has now been integrated.

Changeset: c815c5cb
Author:    Denghui Dong <ddong at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/c815c5cbbb0b6a2aebd0a38cb930c74bd665d082
Stats:     22 lines in 13 files changed: 6 ins; 0 del; 16 mod

8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base)

Reviewed-by: dholmes, coleenp

-------------

PR: https://git.openjdk.java.net/jdk/pull/6181

From duke at openjdk.java.net  Tue Nov  9 10:06:42 2021
From: duke at openjdk.java.net (duke)
Date: Tue, 9 Nov 2021 10:06:42 GMT
Subject: Withdrawn: 8261492: Shenandoah: reconsider forwardee accesses memory
 ordering
In-Reply-To: <eP3bMKbgFS1MJ1MO2DUcTom0eNJT8zszs4A-PJXIgsI=.3cead947-d093-466a-9cfd-0c312eec592a@github.com>
References: <eP3bMKbgFS1MJ1MO2DUcTom0eNJT8zszs4A-PJXIgsI=.3cead947-d093-466a-9cfd-0c312eec592a@github.com>
Message-ID: <fRokf_yhB5g276oiNbLz9gGYTGYn7kygJj5SyGGcFTM=.a09e8aec-64a6-4111-8edd-ce5d00730150@github.com>

On Wed, 10 Feb 2021 08:55:39 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
> 
> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
> 
> The reader side is much more interesting, because we generally want "consume", but it is not available. We can do "acquire", but it regresses performance too much. Close inspection of the code reveals we need "acquire" on many paths, but not on the most critical one: heap updates. This must explain why the current weaker reader side was never seen to fail, and this also opens a way to get `acquire`-in-lieu-of-`consume` without the observable performance penalty.
> 
> The relaxation in forwardee installation improves concurrent evacuation quite visibly. See for example GC cycle times with SPECjvm2008, Compiler.sunflow on AArch64:
> 
> Before:
> 
> 
> [info][gc,stats] Concurrent Evacuation          =    3.421 s (a =    21247 us) (n =   161)
> [info][gc,stats] Concurrent Evacuation          =    3.584 s (a =    21080 us) (n =   170)
> [info][gc,stats] Concurrent Evacuation          =    3.226 s (a =    21088 us) (n =   153)
> [info][gc,stats] Concurrent Evacuation          =    3.270 s (a =    20827 us) (n =   157)
> [info][gc,stats] Concurrent Evacuation          =    3.339 s (a =    20742 us) (n =   161)
> 
> 
> After:
> 
> [info][gc,stats] Concurrent Evacuation          =    3.109 s (a =    18617 us) (n =   167)
> [info][gc,stats] Concurrent Evacuation          =    3.027 s (a =    18918 us) (n =   160) 
> [info][gc,stats] Concurrent Evacuation          =    2.862 s (a =    17669 us) (n =   162) 
> [info][gc,stats] Concurrent Evacuation          =    2.858 s (a =    17425 us) (n =   164) 
> [info][gc,stats] Concurrent Evacuation          =    2.883 s (a =    17685 us) (n =   163) 
> 
> 
> Additional testing:
>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>  - [x] Linux x86_64 `tier1` with Shenandoah
>  - [x] Linux AArch64 `tier1` with Shenandoah

This pull request has been closed without being integrated.
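
As background to the description quoted above, a minimal sketch of the release/acquire pairing it refers to, written with `std::atomic` rather than HotSpot's own atomics. `Obj`, `install_forwardee` and `read_forwardee` are hypothetical names; this is not the Shenandoah code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with a forwarding pointer, for illustration only.
struct Obj {
    int payload = 0;
    std::atomic<Obj*> forwardee{nullptr};
};

// Try to publish `copy` as the forwardee of `from`. The CAS uses release
// ordering on success, so writes made to `*copy` beforehand become visible
// to any thread that later acquires the forwardee. Returns the winner.
inline Obj* install_forwardee(Obj& from, Obj* copy) {
    assert(copy != nullptr);
    Obj* expected = nullptr;
    if (from.forwardee.compare_exchange_strong(expected, copy,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
        return copy;
    }
    // Another thread won the race. A loser that goes on to read the
    // winner's copy would need acquire ordering before doing so.
    return expected;
}

// Reader side: acquire pairs with the release in install_forwardee.
inline Obj* read_forwardee(const Obj& from) {
    return from.forwardee.load(std::memory_order_acquire);
}
```

The default `std::memory_order_seq_cst` would be the rough analogue of the conservative two-way fencing the description calls excessive for this update.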

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496

From darcy at openjdk.java.net  Tue Nov  9 17:36:42 2021
From: darcy at openjdk.java.net (Joe Darcy)
Date: Tue, 9 Nov 2021 17:36:42 GMT
Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
In-Reply-To: <GaM740gmpgFUGPZ_ntVamyU4bZ0jLqjq-Fh0LhjOs6w=.c8a935c2-93b5-4f64-9f56-f1297d73a5f4@github.com>
References: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
 <Ijl2_UPhqBgx-BZ9m98RnbJZHKsxqKbUlmBrjtjs0sM=.bbc59704-900b-43c2-a851-2646cb72f87c@github.com>
 <NOqlt_187bGblvVAOKmY-OG_OLwxrM05J6mTyJxyQUE=.e1cf86ab-8c5c-45c3-b6d4-75ecd23b901e@github.com>
 <GaM740gmpgFUGPZ_ntVamyU4bZ0jLqjq-Fh0LhjOs6w=.c8a935c2-93b5-4f64-9f56-f1297d73a5f4@github.com>
Message-ID: <8tp4-qz24AVj-fehq34bqnlQaredE_VuoYG9-_Mq68c=.716e3172-a465-43af-a566-58c78ac72839@github.com>

On Thu, 4 Nov 2021 07:26:34 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Given the 'R' in CSR already stands for Review this should have said "CSR request".
>> 
>> But I also have no idea what the comment is actually trying to say - what is "these" referring to???
>
> I don't know why that comment is there. The API is Class::getSigners and any changes to its behavior would require a CSR, but we are free to change the implementation. So maybe the comment should be removed.

Filed JDK-8276889 in case further cleanup of the wording in instanceKlass.cpp is desired.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6240

From coleenp at openjdk.java.net  Tue Nov  9 18:17:42 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 9 Nov 2021 18:17:42 GMT
Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources
In-Reply-To: <8tp4-qz24AVj-fehq34bqnlQaredE_VuoYG9-_Mq68c=.716e3172-a465-43af-a566-58c78ac72839@github.com>
References: <It2_AUSe6_4CClR6PCcsjV55_e4U6RKvfyI7ElCFOcM=.93b3eb7d-3035-46f1-a8a6-7c3991b8cbc5@github.com>
 <Ijl2_UPhqBgx-BZ9m98RnbJZHKsxqKbUlmBrjtjs0sM=.bbc59704-900b-43c2-a851-2646cb72f87c@github.com>
 <NOqlt_187bGblvVAOKmY-OG_OLwxrM05J6mTyJxyQUE=.e1cf86ab-8c5c-45c3-b6d4-75ecd23b901e@github.com>
 <GaM740gmpgFUGPZ_ntVamyU4bZ0jLqjq-Fh0LhjOs6w=.c8a935c2-93b5-4f64-9f56-f1297d73a5f4@github.com>
 <8tp4-qz24AVj-fehq34bqnlQaredE_VuoYG9-_Mq68c=.716e3172-a465-43af-a566-58c78ac72839@github.com>
Message-ID: <3zjndpnkWASjczngWhkP4X3a5dVGeICaKIite9uicOg=.48267e6d-293d-4b3c-8fbe-75fdc6cded58@github.com>

On Tue, 9 Nov 2021 17:33:07 GMT, Joe Darcy <darcy at openjdk.org> wrote:

>> I don't know why that comment is there. The API is Class::getSigners and any changes to its behavior would require a CSR, but we are free to change the implementation. So maybe the comment should be removed.
>
> Filed JDK-8276889 in case further cleanup of the wording in instanceKlass.cpp is desired.

Oh, at one point we were trying to figure out a different way of implementing signers so that it didn't have to store a field per InstanceKlass, when we were working on density.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6240

From coleenp at openjdk.java.net  Tue Nov  9 20:04:37 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 9 Nov 2021 20:04:37 GMT
Subject: RFR: 8269986: Remove +3 from Symbol::identity_hash()
In-Reply-To: <sZinjAxTK8hjhtSVd0OktG6h44GR7vX2hudlz5sfCC0=.707fd475-c9a9-4da0-9964-d971e10a1176@github.com>
References: <sZinjAxTK8hjhtSVd0OktG6h44GR7vX2hudlz5sfCC0=.707fd475-c9a9-4da0-9964-d971e10a1176@github.com>
Message-ID: <3QfGGc4vIbwBz-k8URuVmp2bVWOID4UQmEwKBSQo7Ls=.61a1aca5-7b04-4017-a37a-3f82a6327e9c@github.com>

On Sun, 7 Nov 2021 21:10:35 GMT, Ioi Lam <iklam at openjdk.org> wrote:

> Please review this change that removes the `+3` from here:
> 
> 
>   unsigned Symbol::identity_hash() const {
>     unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3));
>                                                                                   ^^^
>     return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) |
>            ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16);
>   }
> 
> 
> The `+3` was intended to avoid getting the same value for these bits:
> 
> 
> ((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07)
> 
> 
> However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive.
> 
> Testing: Oracle CI tiers 1-4

Looks good! Thanks for doing the performance analysis.
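
For the record, a tiny sketch of what the extra shift does to the address bits that feed the hash. The alignment value of 3 (8-byte alignment) and the helper name `addr_bits` are assumptions for illustration; the real code is the `Symbol::identity_hash()` quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration: 8-byte minimum object alignment.
constexpr unsigned kLogMinObjAlignmentInBytes = 3;

// Address bits contributed to the hash. `extra_shift` is 3 for the old
// code (the "+3") and 0 for the new code.
inline unsigned addr_bits(std::uintptr_t addr, unsigned extra_shift) {
    assert(extra_shift < 8);
    return static_cast<unsigned>(addr >> (kLogMinObjAlignmentInBytes + extra_shift));
}
```

With the extra shift, two symbols 8 bytes apart (say at 0x1000 and 0x1008) contribute identical address bits; without it they differ, so the low bits that the bug report found to be evenly distributed are no longer thrown away.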

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6287

From duke at openjdk.java.net  Wed Nov 10 01:10:45 2021
From: duke at openjdk.java.net (duke)
Date: Wed, 10 Nov 2021 01:10:45 GMT
Subject: Withdrawn: 8273239: Standardize Ticks APIs return type
In-Reply-To: <V-rEEDyaurdujp_BRqkWsHmG9OZmeA2bqlcsdmBnISY=.d99bade4-9ad3-4360-8d6f-40d5b8474514@github.com>
References: <V-rEEDyaurdujp_BRqkWsHmG9OZmeA2bqlcsdmBnISY=.d99bade4-9ad3-4360-8d6f-40d5b8474514@github.com>
Message-ID: <dqNY6f0FTcUxIVisqLwU6Ty9A2EfqqlmzLacq_jGcyQ=.cc0bd293-9666-47d1-87ec-a5c52ce1d401@github.com>

On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Simple change on return types of Ticks API.
> 
> The call to `milliseconds()` in `spinYield.cpp` seems like a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`.
> 
> Test: tier1

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5332

From chris.plummer at oracle.com  Wed Nov 10 05:50:38 2021
From: chris.plummer at oracle.com (Chris Plummer)
Date: Tue, 9 Nov 2021 21:50:38 -0800
Subject: [External] : Re: RFC: Extend DCmd(Diagnostic-Command) framework
 to support Java level DCmd
In-Reply-To: <d1884e62-1ebe-45c9-9b05-c17c98e2953e.qingfeng.yy@alibaba-inc.com>
References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com>
 <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com>
 <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>
 <b320493e-225f-c1ce-1daf-68b7673668f7@oracle.com>
 <d1884e62-1ebe-45c9-9b05-c17c98e2953e.qingfeng.yy@alibaba-inc.com>
Message-ID: <0d05daaa-82cb-1537-5292-03d8e1d1d625@oracle.com>

Hi Denghui,

Following up here with something that was discussed in the other email 
thread, Ioi asked if an MBean could be used to provide similar app 
diagnostics. It seems it can be.

Erik also mentioned to me that the REST API can be used for something 
like this. The example he gave is a query something like the following:

    curl "http://localhost:8080/?command=foo&param1=bar"

Also it has been pointed out that sockets could be used. I know a jcmd 
might be easier to use/access than any of these other 3 approaches, but 
we have to question if it is worth adding given all the concerns that 
have been pointed out so far.
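
A minimal sketch of the MBean route mentioned above, assuming a standard
MBean registered with the platform MBeanServer. The class names, the
object name "MyApp:type=DebugToggle" and the flipDebug operation are
hypothetical and chosen only for illustration; they do not come from any
proposal in this thread.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class AppDiagnosticsSketch {
    // Standard MBean convention: the management interface is public and
    // is named <ImplementationClass>MBean.
    public interface DebugToggleMBean {
        boolean isDebug();     // exposed as the read-only attribute "Debug"
        boolean flipDebug();   // exposed as an operation
    }

    public static class DebugToggle implements DebugToggleMBean {
        private volatile boolean debug;

        @Override public boolean isDebug() { return debug; }

        @Override public synchronized boolean flipDebug() {
            debug = !debug;
            return debug;
        }
    }

    // Registers the toggle under a hypothetical object name.
    static ObjectName register(DebugToggle toggle) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("MyApp:type=DebugToggle");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(toggle, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes a no-argument operation, as a JMX client would.
    static Object invoke(ObjectName name, String operation) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .invoke(name, operation, new Object[0], new String[0]);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once registered, the operation can be invoked from a JMX client such as
JConsole, so no new VM-level command is needed for this kind of toggle.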

thanks,

Chris

On 11/5/21 1:34 AM, Yi Yang wrote:
> Hi all,
>
> I had an offline discussion about this with Denghui. When I first 
> heard this idea, I felt it was useful. It allows users to do some stuff 
> that requires a lot of effort in a simple way. I'm also tracking the 
> discussion on the mailing list, and I've seen many folks come up with 
> very constructive comments and questions/concerns. In order to keep the 
> follow-up discussion simple, I want to try to summarize and give some 
> answers on my own behalf. Each headline is a question/concern that 
> folks have raised, followed by my personal opinion on it. I'd 
> appreciate it if you could add any missing content.
>
> === What is it?
> It provides the ability for users to trigger predefined callbacks 
> while the application is running.
>
> === Possible misuse?
> It is provided through jcmd, so this ability should ideally be used for 
> debugging/development/diagnosis purposes. It may be misused, but this 
> is beyond our control, just as users can use a signal handler to 
> download an app and play a song.
>
> === Maintainability?
> It expands current jcmd implementation rather than a significant 
> modification, so maintainability should be ok IMHO.
>
> === Safety?
> Undeniably, it may raise some potential security issues.
>
> === Alternatives?
> Socket: Less convenient than this for doing the same thing; users 
> have to write a lot of boilerplate socket code.
> Signal: Not open to users, a limited number of signals, more likely 
> to be misused.
>
> === Purpose?
> 1. I have a web application that can analyze Java heap dumps. I hope to 
> provide a simple way to report runtime app metrics, such as disk usage 
> and online worker load, instead of writing a complete web page and 
> providing an admin page to access it. This information can also be 
> gathered on other monitoring platforms.
> 2. Trigger the DEBUG functionality while running, output some debug logs
>
> Best regards.
>
>     ------------------------------------------------------------------
>     From:Chris Plummer <chris.plummer at oracle.com>
>     Send Time:2021 Nov. 4 (Thu.) 14:10
>     To:dong denghui <denghui.ddh at alibaba-inc.com>; serviceability-dev
>     <serviceability-dev at openjdk.java.net>; hotspot-dev
>     <hotspot-dev at openjdk.java.net>
>     Subject:Re: [External] : Re: RFC: Extend DCmd(Diagnostic-Command)
>     framework to support Java level DCmd
>
>     Hi Denghui,
>
>     Yes, there are other ways the same thing could be accomplished
>     like sockets or signals, but all of this is outside of the purview
>     of the JDK, and therefore we don't become responsible for its
>     design, maintenance, and potential security concerns.
>     EnableUserLevelDCmd doesn't really fix any of these concerns,
>     because an app can just always launch with this flag enabled. It
>     really should be reserved for launching a JVM for the specific
>     purpose of gathering some extra diagnostic data, but there is no
>     way to enforce that.
>
>     Anyway, I'm not the gatekeeper on this. Just expressing some of my
>     concerns. Others have done the same. I think we've seen a lack of
>     enthusiasm in favor of doing this except from you. It would be good
>     to see input from others that would like this feature in place.
>
>     cheers,
>
>     Chris
>
>     On 11/1/21 8:09 PM, Denghui Dong wrote:
>     Hi Chris,
>
>     Thank you for the comments.
>
>     Yes, we have no good way to restrict the user registration commands to only include diagnosis-related operations, but in my opinion, this does not seem to be a problem that must be solved perfectly.
>
>     The following are my thoughts.
>
>     This extension is an entry that triggers the operation that the user wants to perform (similar to the Signal Handler mechanism but with a name and parameters). Even without this extension, the user can have other ways to achieve the same goal.
>
>     On the one hand, we could standardize the usage scenarios of the API in the documentation (indeed, users can still write programs that do not follow the specifications; for example, users can implement an object's hashCode method so that multiple calls return different values, or make an object alive again while its finalize method is executing).
>
>     On the other hand, we can add some restrictions to help users make better use of this extension.
>     e.g. we can add a new VM option, such as EnableUserLevelDCmd, so that the application can only register custom commands when this option is enabled.
>
>     Or from another perspective, can we allow users to do some non-diagnostic-related operations in custom commands?
>
>     Best,
>     Denghui
>     ------------------------------------------------------------------
>     From:Chris Plummer <chris.plummer at oracle.com>
>     Send Time:2021 Nov. 2 (Tue.) 03:35
>     To:dong denghui <denghui.ddh at alibaba-inc.com>; serviceability-dev
>     <serviceability-dev at openjdk.java.net>; hotspot-dev
>     <hotspot-dev at openjdk.java.net>
>     Subject:Re: RFC: Extend DCmd(Diagnostic-Command) framework to
>     support Java level DCmd
>
>     I have similar concerns to those others have expressed, so I'll
>     try to add something new to the discussion and not just repeat.
>
>     DCMDs have historically been very VM centric. That's not to say
>     they aren't useful for debugging applications, but they do so by
>     providing VM related info like stack traces, heap dumps, and class
>     histograms. Also hotspot has been the gatekeeper for new DCMDs,
>     meaning that new ones do not get added without going through the
>     hotspot review process.
>
>     Allowing any application or framework to add a DCMD changes this
>     VM centric view in a way that concerns me. This approach allows a
>     DCMD to pretty much do anything (Java security notwithstanding).
>     App writers could even use them to provide a user facing
>     interface. For example, if an app has some sort of internal database,
>     it could allow users to query it via a DCMD, and maybe even
>     suggest that users write simple shell scripts that use jcmd to do
>     these queries. Allowing this type of non-diagnostic usage seems
>     like a path we don't want to go down, yet I don't see how it can
>     be prevented once you allow applications to add DCMDs.
>
>     Chris
>
>     On 10/25/21 1:37 AM, Denghui Dong wrote:
>     Hi there!
>
>     We'd like to discuss a proposal for extending the current DCmd framework to support Java level DCmd.
>
>     At present, DCmd only allows the VM to register commands, which can be called through jcmd or JMX. It would be beneficial if the user could create their own commands.
>
>     The idea of this extension originally came from our internal Java agent that detects the misuse of the Unsafe API.
>
>     This agent can collect the call sites that allocate or free direct memory in the application (NMT could not do it IMO) to detect direct memory leaks.
>
>     In the beginning, it just printed all call sites, without any statistical function, so it was hard to use.
>
>     So we plan to use a way similar to jeprof (from jemalloc) to generate a report file that aggregates all useful information.
>
>     During the implementation process, we found that we need a mechanism to notify the agent to generate reports.
>
>     The common practice is:
>     a) Register a service port, triggered by an HTTP request
>     b) Triggered by signal
>     c) Generate reports periodically, or when the process exits
>
>     But these three ways have certain problems.
>     For a) we need to introduce a network component, which will increase the complexity of the implementation
>     For b) we cannot pass parameters
>     For c) some files that may never be used will be generated
>
>     Essentially, this question is how to notify the application to do a certain task, or in other words, how do we issue a command to the application. We believe that other Java developers will also encounter similar problems.
>
>     (And sometimes there may be multiple unrelated dependent components in a Java application that require such a mechanism.)
>
>     Naturally, we think that jcmd can already issue some commands registered in the VM to the application, so why can't we extend this to the Java level?
>
>     This feature will be very useful for some lightweight tools, just like the scenario we encountered, to notify the tools to perform certain operations.
>
>     In addition, this feature will also bring benefits to Java beginners.
>
>     For example, in the beginning, beginners may not use advanced log components, but they will also encounter the need to output debug logs. They may write code like this:
>
>     ```
>         if (debug) {
>           System.out.println("...");
>         }
>     ```
>
>     If developers can easily control the value of debug, that is attractive.
>
>     Like this:
>
>     ```
>         Factory.register("MyApp.flipDebug", out -> debug = !debug);
>
>         jcmd <pid> MyApp.flipDebug
>     ```
>
>     For mainstream frameworks, we can apply this feature to trigger some common activities, such as health checks, graceful shutdown, and dynamic configuration updates. But to be honest, these frameworks are very mature and stable, and for compatibility purposes, it's hard to let them use this extension.
>
>     Comments welcome!
>
>     Thanks,
>     Denghui
>
>
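
The registration idea in the quoted proposal (a named command with
parameters and an output sink) can be sketched in plain Java without
any VM support. This is only an illustration under stated assumptions:
CommandRegistrySketch and its methods are hypothetical, they are not
the Factory API from the proposal, and nothing here is wired to jcmd.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

class CommandRegistrySketch {
    // Hypothetical handler shape: receives the arguments and an output sink.
    private final Map<String, BiConsumer<String[], PrintWriter>> commands =
            new ConcurrentHashMap<>();

    // Registers a handler under a unique name; duplicate names are rejected.
    void register(String name, BiConsumer<String[], PrintWriter> handler) {
        if (commands.putIfAbsent(name, handler) != null) {
            throw new IllegalArgumentException("already registered: " + name);
        }
    }

    // Runs the named command and returns whatever it wrote to its output.
    String execute(String name, String... args) {
        BiConsumer<String[], PrintWriter> handler = commands.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("unknown command: " + name);
        }
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        handler.accept(args, out);
        out.flush();
        return buffer.toString();
    }
}
```

Unlike a signal handler, each entry has a name and accepts parameters,
which is the property the proposal is after; the transport that delivers
the command (jcmd, JMX, HTTP, a socket) is a separate choice.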


From duke at openjdk.java.net  Wed Nov 10 12:39:59 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Wed, 10 Nov 2021 12:39:59 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
Message-ID: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>

PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
of its uses is to protect against ROP based attacks. This is done by
signing the Link Register whenever it is stored on the stack, and
authenticating the value when it is loaded back from the stack. If an
attacker were to try to change control flow by editing the stack then
the authentication check of the Link Register will fail, causing a
segfault when the function returns.
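
A toy model of the sign/authenticate idea described above, written in
Java purely for illustration. It assumes a 48-bit virtual address with
the signature kept in the unused upper 16 bits, and uses the stack
pointer as the modifier. The mixing function is NOT cryptographic and
only stands in for the hardware's keyed algorithm; the class and method
names are hypothetical and are not part of the patch.

```java
class PacRetToyModel {
    // Assumption of this sketch: 48-bit virtual addresses, 16 spare upper bits.
    static final long ADDRESS_MASK = (1L << 48) - 1;

    // Stand-in for the hardware MAC over (address, modifier, key).
    // Not cryptographic; it only produces a 16-bit tag for the demo.
    static long tag(long address, long modifier, long key) {
        long h = (address ^ Long.rotateLeft(modifier, 17) ^ key) * 0x9E3779B97F4A7C15L;
        h ^= h >>> 29;
        return h >>> 48;
    }

    // Models signing the return address before it is stored on the stack.
    static long sign(long address, long modifier, long key) {
        long plain = address & ADDRESS_MASK;
        return plain | (tag(plain, modifier, key) << 48);
    }

    // Models authenticating the value loaded back from the stack.
    // The toy throws on mismatch; real hardware faults instead.
    static long authenticate(long signedAddress, long modifier, long key) {
        long plain = signedAddress & ADDRESS_MASK;
        if ((signedAddress >>> 48) != tag(plain, modifier, key)) {
            throw new IllegalStateException("return address failed authentication");
        }
        return plain;
    }
}
```

An attacker who overwrites the saved value without knowing the key
cannot produce a matching tag except by chance, which is why the
corrupted return is caught.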

On a system with PAC enabled, it is expected that all applications will
be compiled with ROP protection. Fedora 33 and upwards already provide
this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
PAC instructions that exist in the NOP space - on hardware without PAC,
these instructions act as NOPs, allowing backward compatibility for
negligible performance cost (2 NOPs per non-leaf function).

Hardware is currently limited to the Apple M1 MacBooks. All testing has
been done within a Fedora Docker image. A run of SpecJVM showed no
difference to that of noise - which was surprising.

The most important part of this patch is simply compiling using branch
protection provided by GCC/LLVM. This protects all C++ code from being
used in ROP attacks, removing all static ROP gadgets from use.

The remainder of the patch adds ROP protection to runtime generated
code, in both stubs and compiled Java code. Attacks here are much harder
as ROP gadgets must be found dynamically at runtime. If/when AOT
compilation is added to JDK, then all stubs and compiled Java will be
susceptible to ROP gadgets being found by static analysis and therefore
potentially as vulnerable as C++ code.

There are a number of places where the VM changes control flow by
rewriting the stack or otherwise. I've done some analysis as to how
these could also be used for attacks (which I didn't want to post here).
These areas can be protected by ensuring the pointers to various stubs and
entry points are stored in memory as signed pointers. These changes are
simple to make (they can be reduced to a type change in common code and
a few additional sign/auth calls in the backend), but there are a lot of
them and the total code change is fairly large. I'm happy to provide a few
work in progress patches.

In order to match the security benefits of the Apple Arm64e ABI across
the whole of JDK, then all the changes mentioned above would be
required.

-------------

Commit messages:
 - 8264130: PAC-RET protection for Linux/AArch64
 - Add PAC assembly instructions
 - Add AArch64 ROP protection runtime flag
 - Build with branch protection

Changes: https://git.openjdk.java.net/jdk/pull/6334/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264130
  Stats: 1273 lines in 25 files changed: 457 ins; 20 del; 796 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From sspitsyn at openjdk.java.net  Wed Nov 10 12:44:41 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 10 Nov 2021 12:44:41 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace
In-Reply-To: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
Message-ID: <7gYL85rBe8eKvM0anhb3qhZ5Y7xaUFsWwD9JeO1AioI=.b818affe-36fd-404a-8d6b-45ab93c8fab3@github.com>

On Thu, 7 Oct 2021 12:42:48 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> 
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. 
> 
> Additional testing:
>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes
>  - [x] Linux x86_64 Zero works with `async-profiler`

src/hotspot/cpu/zero/frame_zero.cpp line 174:

> 172: 
> 173:   // validate locals
> 174:   address locals =  (address) *interpreter_frame_locals_addr();

Unneeded spaces around '(address)'.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From sspitsyn at openjdk.java.net  Wed Nov 10 12:50:38 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 10 Nov 2021 12:50:38 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace
In-Reply-To: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
Message-ID: <68Lgv_Hwls0iUcUZwRMANWQi7TEYT4K1XFPRZyB071o=.33d613b4-207f-41b8-b976-6edb4ba9eb48@github.com>

On Thu, 7 Oct 2021 12:42:48 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> 
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. 
> 
> Additional testing:
>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes
>  - [x] Linux x86_64 Zero works with `async-profiler`

Hi Aleksey,
Thank you for the update. It looks pretty good to me.
I've inlined a couple of minor comments.
Also, I hope, you will update the copyright years.
Thanks,
Serguei

src/hotspot/share/prims/forte.cpp line 348:

> 346:     return false;
> 347:   }
> 348: #endif

Could you please add some simple comments explaining each case at lines 325, 329 and 336?

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5848

From rkennke at openjdk.java.net  Wed Nov 10 12:52:08 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Nov 2021 12:52:08 GMT
Subject: RFR: 8275527: Refactor forward pointer access [v5]
In-Reply-To: <lLd1nmhXCgBhAySmq81KrMplWUSWMCJS4OybGZuMjco=.4012548b-6f1d-44ee-a59c-21f1077cba01@github.com>
References: <lLd1nmhXCgBhAySmq81KrMplWUSWMCJS4OybGZuMjco=.4012548b-6f1d-44ee-a59c-21f1077cba01@github.com>
Message-ID: <Go6Jy3r74trkRi0_l3mXStP7uky_iN4C6Q3SF5fmkvE=.d754aa25-b4a8-4da0-a9b3-978616814577@github.com>

> Accessing the forward pointer is currently a little inconsistent. Some code paths call oopDesc::forwardee() / oopDesc::is_forwarded(), some code paths call forwardee() and check it for ==/!= NULL, some code paths even call markWord::decode_pointer() and markWord::is_marked() instead.
> 
> This change attempts to make the situation more consistent. For simple cases it preserves oopDesc::forwardee() / is_forwarded(), some cases need to use the markWord for consistency in concurrent GC, they now use markWord::forwardee() and markWord::is_forwarded(). Also, checking whether or not an object is forwarded is now consistently done using is_forwarded() and not by checking forwardee ==/!= NULL. This also resolves the mess in G1 full GC that changes not-forwarded objects to have a NULL (fake-) pointer. This is not necessary, because we can just as well use the lock bits to determine whether or not the object is forwarded.
> 
> Testing:
>  - [x] tier1
>  - [x] tier2
>  - [x] hotspot_gc

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:

 - Merge branch 'master' into optimize-fwdptr
 - Don't use forwarded terminology in markWord
 - Move forward impl into markWord and add assert
 - Fix Parallel GC mistake
 - Revert unnecessary changes
 - Update some copyright headers
 - Add missing includes
 - Merge branch 'master' into optimize-fwdptr
 - Add missing includes
 - Rename mwd -> fwd
 - ... and 4 more: https://git.openjdk.java.net/jdk/compare/a0b84453...d63962a3

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5955/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5955&range=04
  Stats: 46 lines in 9 files changed: 4 ins; 26 del; 16 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5955.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5955/head:pull/5955

PR: https://git.openjdk.java.net/jdk/pull/5955

From rkennke at openjdk.java.net  Wed Nov 10 12:52:13 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Nov 2021 12:52:13 GMT
Subject: RFR: 8275527: Refactor forward pointer access [v4]
In-Reply-To: <L12GPa4-qxtJlMxtEOEpiVc4v1TJ_-DwFqC5DWOarI4=.35af6235-3555-41e6-b0c6-67e011a83829@github.com>
References: <lLd1nmhXCgBhAySmq81KrMplWUSWMCJS4OybGZuMjco=.4012548b-6f1d-44ee-a59c-21f1077cba01@github.com>
 <tu8-LJVLLI-0yU9Fsvdnqkx9sTkfarZu3XNIElJ0kak=.6c3578d7-0a9a-447e-a167-27c412ac200f@github.com>
 <L12GPa4-qxtJlMxtEOEpiVc4v1TJ_-DwFqC5DWOarI4=.35af6235-3555-41e6-b0c6-67e011a83829@github.com>
Message-ID: <SGF7uMA7SPhrRgfwADj4Zzge8NjfF5Gc2V4gzYt8JFo=.fd35c039-67b3-4539-824e-281671e2ffef@github.com>

On Mon, 1 Nov 2021 09:25:52 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Move forward impl into markWord and add assert
>
> src/hotspot/share/oops/markWord.hpp line 253:
> 
>> 251:     return cast_to_oop(decode_pointer());
>> 252:   }
>> 253: };
> 
> This brings the forwarded/forwardee terminology into the markWord. The markWord was previously decoupled from those two concepts. I would personally let those function names stay in oopDesc and not leak down into the markWord. If you do want to keep it here, could you update the comments at the top that describe the bits?
> 
> //    [ptr             | 11]  marked             used to mark an object

Yeah, I am not quite sure about this. We have a couple of places where we need to use the markWord directly, and they read m.is_marked() (when it really means is_forwarded, even though it's the same in the implementation), and then go on to cast_to_oop(m.decode_pointer()), which reads uglier than simply m.forwardee(), which also comes with an assert and the cast.

I reverted the markWord change and related call-sites now. Maybe this warrants more thinking/discussion.
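
To make the encoding under discussion concrete, here is a minimal
sketch in Java (the real code is C++ in HotSpot). It assumes the layout
from the quoted header comment, where the two low lock bits set to 11
mean "marked" and the remaining bits hold the pointer. The class and
method names are hypothetical; they mirror is_forwarded()/forwardee()
only by analogy.

```java
class MarkWordSketch {
    static final long LOCK_MASK = 0b11L;     // the two low "lock" bits
    static final long MARKED_VALUE = 0b11L;  // [ptr | 11] means marked/forwarded

    // Installs a forwarding pointer into a header word.
    static long encodeForwardee(long forwardeeAddress) {
        if ((forwardeeAddress & LOCK_MASK) != 0) {
            throw new IllegalArgumentException("address must leave the lock bits free");
        }
        return forwardeeAddress | MARKED_VALUE;
    }

    // Forwarded-ness is decided from the lock bits alone,
    // not by comparing the decoded pointer against null.
    static boolean isForwarded(long mark) {
        return (mark & LOCK_MASK) == MARKED_VALUE;
    }

    // Decodes the pointer; like the assert mentioned above,
    // it refuses to decode a header that is not forwarded.
    static long forwardee(long mark) {
        if (!isForwarded(mark)) {
            throw new IllegalStateException("header is not forwarded");
        }
        return mark & ~LOCK_MASK;
    }
}
```

Because the check looks only at the lock bits, a not-forwarded object
never needs a fake NULL forwarding pointer to be told apart.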

-------------

PR: https://git.openjdk.java.net/jdk/pull/5955

From aph at openjdk.java.net  Wed Nov 10 13:14:43 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 10 Nov 2021 13:14:43 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <S8-a1M4C5zZEj6OaCMj3aJkD92Ld8p6nECmq-fTYGkE=.e84e2cff-182f-4ac2-8213-7091917c2d89@github.com>

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of
> them and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

Gosh. This is going to take some time to review, and will need at least two reviewers.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Wed Nov 10 13:25:40 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 10 Nov 2021 13:25:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <SBwk5kV3fu-jhOxb6OMecdWueqzVp_1_lDGjv7gr4ME=.12b13d5e-d020-4d2f-951d-a4934fccba2f@github.com>

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of
> them and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5185:

> 5183: // ROP Protection
> 5184: 
> 5185: void MacroAssembler::protect_return_address() {

We need proper, full, detailed comments about what these functions do, with reference to primary AArch64 documentation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From erikj at openjdk.java.net  Wed Nov 10 13:34:38 2021
From: erikj at openjdk.java.net (Erik Joelsson)
Date: Wed, 10 Nov 2021 13:34:38 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <9inSsWwjEQnZT_x-9GirjL-Avmycfnyj6yZoqCJ8M4g=.770fc4bb-6731-4eb2-83a7-cf018438ed7e@github.com>

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of
> them and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

Build change looks good, but I can't comment on the code changes.

-------------

Marked as reviewed by erikj (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Wed Nov 10 13:34:39 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Wed, 10 Nov 2021 13:34:39 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <S8-a1M4C5zZEj6OaCMj3aJkD92Ld8p6nECmq-fTYGkE=.e84e2cff-182f-4ac2-8213-7091917c2d89@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <S8-a1M4C5zZEj6OaCMj3aJkD92Ld8p6nECmq-fTYGkE=.e84e2cff-182f-4ac2-8213-7091917c2d89@github.com>
Message-ID: <4yyBp8jgXpmK0ZyywX4mrjo5vfTWfy5CHV97fnX4-EE=.2bd02511-67cd-40a4-8fb0-a79b095f4bcd@github.com>

On Wed, 10 Nov 2021 13:11:21 GMT, Andrew Haley <aph at openjdk.org> wrote:

> Gosh. This is going to take some time to review, and will need at least two reviewers.

Sure. And thanks in advance.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Wed Nov 10 13:37:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 10 Nov 2021 13:37:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <qyoqdCskYNR6Q1WG3fZP-XMWMdM1Uwg8k7nJhFQzoN0=.f41ee0da-eda2-40dd-99c5-9931964b6953@github.com>

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.
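
For readers following the thread, the sign-on-store / authenticate-on-load scheme described in the quoted text can be modelled in a few lines. This is only a toy sketch in Java, not the patch and not how the hardware computes a PAC: the class name `PacToyModel`, the 48-bit address split, the 16-bit code and the mixing function are all assumptions made for illustration. The stack pointer is used as the modifier, and a failed check throws here where real code would fault on return.

```java
// Toy model of return-address signing. All names and constants are hypothetical.
class PacToyModel {
    // Assumption: 48-bit virtual addresses, leaving the top 16 bits for the code.
    private static final long ADDRESS_MASK = (1L << 48) - 1;

    private final long key; // stands in for the per-process key held by the CPU

    PacToyModel(long key) {
        this.key = key;
    }

    // Keyed mix of address and modifier, truncated to 16 bits.
    // This is NOT the real PAC algorithm, just a stand-in.
    private long pac(long address, long modifier) {
        long h = (address ^ key) * 0x9E3779B97F4A7C15L;
        h ^= (modifier + key) * 0xC2B2AE3D27D4EB4FL;
        h ^= h >>> 29;
        return h >>> 48;
    }

    // Models signing the link register before it is stored on the stack.
    long sign(long address, long stackPointer) {
        long plain = address & ADDRESS_MASK;
        return plain | (pac(plain, stackPointer) << 48);
    }

    // Models authenticating the value loaded back from the stack.
    long authenticate(long signedAddress, long stackPointer) {
        long plain = signedAddress & ADDRESS_MASK;
        if ((signedAddress >>> 48) != pac(plain, stackPointer)) {
            // Stand-in for the fault taken when the function returns.
            throw new IllegalStateException("authentication failed");
        }
        return plain;
    }
}
```

In this model an attacker who overwrites the saved value without knowing the key has to guess the upper bits, which is what makes editing the stack unreliable as a way to redirect control flow.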

src/hotspot/os_cpu/bsd_aarch64/pauth_bsd_aarch64.inline.hpp line 25:

> 23:  */
> 24: 
> 25: #ifndef OS_CPU_BSD_AARCH64_PAUTH_BSD_AARCH64_INLINE_HPP

Are these two files different enough to separate them for BSD and Linux?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From ihse at openjdk.java.net  Wed Nov 10 14:37:40 2021
From: ihse at openjdk.java.net (Magnus Ihse Bursie)
Date: Wed, 10 Nov 2021 14:37:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <eIf-bTGnq-D8wblqRD7Ldwfv5pWpFneBzaia3TeV1D0=.5852688c-7bb3-4561-8854-7c9aef24a9c4@github.com>

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

Changes requested by ihse (Reviewer).

make/autoconf/flags-cflags.m4 line 899:

> 897:   elif test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then
> 898:     # Check that the compiler actually supports branch protection.
> 899:     FLAGS_COMPILER_CHECK_ARGUMENTS(ARGUMENT: [${BRANCH_PROTECTION_FLAG}],

This branch misses an AC_MSG_RESULT, which prints the newline. The resulting output will look messy.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Wed Nov 10 15:04:39 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Wed, 10 Nov 2021 15:04:39 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <qyoqdCskYNR6Q1WG3fZP-XMWMdM1Uwg8k7nJhFQzoN0=.f41ee0da-eda2-40dd-99c5-9931964b6953@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <qyoqdCskYNR6Q1WG3fZP-XMWMdM1Uwg8k7nJhFQzoN0=.f41ee0da-eda2-40dd-99c5-9931964b6953@github.com>
Message-ID: <AslxqdYQYOyLnckgXKHb9yB5_UyxQbliLC2DeVIHpG8=.14fb94d9-2cde-4343-b4f1-f7c7c7eeb44f@github.com>

On Wed, 10 Nov 2021 13:34:38 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> src/hotspot/os_cpu/bsd_aarch64/pauth_bsd_aarch64.inline.hpp line 25:
> 
>> 23:  */
>> 24: 
>> 25: #ifndef OS_CPU_BSD_AARCH64_PAUTH_BSD_AARCH64_INLINE_HPP
> 
> Are these two files different enough to separate them for BSD and Linux?

My motivation was to avoid having any ifdefs - but we need one anyway for the Apple ifdef.

If I merged the two we would end up with just the contents of the BSD version of the file.

There is also the Windows version of the file, which for now has empty functions. If PAC on Windows is added, that'll either use the same code or maybe Windows will provide an API (like the Apple one). Merging everything would mean Windows gains the UseROPProtection check.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Wed Nov 10 15:27:41 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Wed, 10 Nov 2021 15:27:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <a9MvJGV8QzQ5WDHb06t4NYL9eqfJ4Tgix6L-Ur5Igl8=.fd6072ce-7075-4131-8031-afba82a46a89@github.com>

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

I am also reviewing this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Wed Nov 10 16:03:34 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Wed, 10 Nov 2021 16:03:34 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To: <eIf-bTGnq-D8wblqRD7Ldwfv5pWpFneBzaia3TeV1D0=.5852688c-7bb3-4561-8854-7c9aef24a9c4@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <eIf-bTGnq-D8wblqRD7Ldwfv5pWpFneBzaia3TeV1D0=.5852688c-7bb3-4561-8854-7c9aef24a9c4@github.com>
Message-ID: <Q4JOlS_gKByw3pu0aQK0K-vKmAgKbnmNT3b45UBVIF8=.43fc2450-e8d5-4dd3-960a-4526c313002a@github.com>

On Wed, 10 Nov 2021 14:34:18 GMT, Magnus Ihse Bursie <ihse at openjdk.org> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> make/autoconf/flags-cflags.m4 line 899:
> 
>> 897:   elif test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then
>> 898:     # Check that the compiler actually supports branch protection.
>> 899:     FLAGS_COMPILER_CHECK_ARGUMENTS(ARGUMENT: [${BRANCH_PROTECTION_FLAG}],
> 
> This branch misses an AC_MSG_RESULT, which prints the newline. The resulting output will look messy.

Looking at this block of code again, I've got far too many outputted lines compared to other features. Removing some means I can simplify the code too, so I'll do that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From shade at openjdk.java.net  Wed Nov 10 16:26:10 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 10 Nov 2021 16:26:10 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v2]
In-Reply-To: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
Message-ID: <tQT5apamgECK6I32D6VC6IIHslMfw6Zw6_AoLowXo8c=.ad1a382d-8e41-4ddb-9fc5-0ea59d03c3c0@github.com>

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> 
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. 
> 
> Additional testing:
>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass
>  - [x] Linux x86_64 Zero works with `async-profiler`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Review feedback
 - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
 - Initial work: runs async-profiler successfully

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5848/files
  - new: https://git.openjdk.java.net/jdk/pull/5848/files/5575516c..8e25258d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=00-01

  Stats: 888778 lines in 1818 files changed: 455790 ins; 426281 del; 6707 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5848.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848

PR: https://git.openjdk.java.net/jdk/pull/5848

From shade at openjdk.java.net  Wed Nov 10 16:26:16 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 10 Nov 2021 16:26:16 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v2]
In-Reply-To: <7gYL85rBe8eKvM0anhb3qhZ5Y7xaUFsWwD9JeO1AioI=.b818affe-36fd-404a-8d6b-45ab93c8fab3@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <7gYL85rBe8eKvM0anhb3qhZ5Y7xaUFsWwD9JeO1AioI=.b818affe-36fd-404a-8d6b-45ab93c8fab3@github.com>
Message-ID: <5Z7ibS9XSydk2okYR911xl6Q0GSz7gEzZbp0MW7_Edo=.0a86e320-8dd3-4216-ae79-818df7cd6b38@github.com>

On Wed, 10 Nov 2021 12:41:44 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Review feedback
>>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>>  - Initial work: runs async-profiler successfully
>
> src/hotspot/cpu/zero/frame_zero.cpp line 174:
> 
>> 172: 
>> 173:   // validate locals
>> 174:   address locals =  (address) *interpreter_frame_locals_addr();
> 
> Unneeded spaces around '(address)'.

Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From shade at openjdk.java.net  Wed Nov 10 16:38:57 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 10 Nov 2021 16:38:57 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3]
In-Reply-To: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
Message-ID: <t6nZoibe2zyKRfM8pbTgdwA5yRBEfQomcoJMV3h8g4g=.cbe7cc85-d3a8-4d23-80df-957d15cdc989@github.com>

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> 
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. 
> 
> Additional testing:
>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass
>  - [x] Linux x86_64 Zero works with `async-profiler`

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  More reviews

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5848/files
  - new: https://git.openjdk.java.net/jdk/pull/5848/files/8e25258d..68ef4b63

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=01-02

  Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5848.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848

PR: https://git.openjdk.java.net/jdk/pull/5848

From shade at openjdk.java.net  Wed Nov 10 16:38:58 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 10 Nov 2021 16:38:58 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3]
In-Reply-To: <68Lgv_Hwls0iUcUZwRMANWQi7TEYT4K1XFPRZyB071o=.33d613b4-207f-41b8-b976-6edb4ba9eb48@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <68Lgv_Hwls0iUcUZwRMANWQi7TEYT4K1XFPRZyB071o=.33d613b4-207f-41b8-b976-6edb4ba9eb48@github.com>
Message-ID: <EUtlBLtPjZJUZ8iVIy1i0XJP40tYDthFiTBVz6h2qDs=.850ea68e-db33-4773-baf2-5eb01720b145@github.com>

On Wed, 10 Nov 2021 12:44:16 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   More reviews
>
> src/hotspot/share/prims/forte.cpp line 348:
> 
>> 346:     return false;
>> 347:   }
>> 348: #endif
> 
> Could you, please, add some simple comments explaining each case at lines:  325, 329 and 336?

See new commits!

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From rkennke at openjdk.java.net  Wed Nov 10 16:55:54 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Nov 2021 16:55:54 GMT
Subject: RFR: 8276901: Implement UseHeavyMonitors consistently
Message-ID: <CqBVDMLRhPl93f8gcv5VX0Un-fFF5BQtGEwrczIJdp4=.6a50ad8f-3bbe-4075-9f8c-ee4da37e9f8b@github.com>

The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all.
I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.

The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, neither currently without this change nor with it). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
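
For anyone who wants to try the flag, a minimal Java sketch of a workload that exercises monitor enter/exit is shown here. It is an illustration only and not part of the change; the class name `MonitorSketch` and the thread and iteration counts are made up. It could be run once with default settings and once with `-XX:+UseHeavyMonitors`, assuming the build in use accepts that flag (its kind is exactly what is under discussion above).

```java
// Hypothetical workload: several threads increment a counter under one lock.
class MonitorSketch {
    private static final Object LOCK = new Object();
    private static long counter;

    // Runs `threads` workers, each taking the monitor `iterations` times,
    // and returns the final counter value.
    static long run(int threads, int iterations) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    synchronized (LOCK) { // monitor enter/exit on every iteration
                        counter++;
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the workers' updates visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = run(4, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("count=" + total + " elapsed=" + elapsedMs + " ms");
    }
}
```

The result must be the same with either locking scheme; only the elapsed time is expected to differ, and a one-shot timing like this is no substitute for a proper benchmark harness.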

Testing:
 - [x] tier1
 - [x] tier2
 - [ ] tier3
 - [ ] tier4

-------------

Commit messages:
 - Change VerifyHeavyMonitors flag to diagnostic
 - 8276901: Implement UseHeavyMonitors consistently

Changes: https://git.openjdk.java.net/jdk/pull/6320/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276901
  Stats: 190 lines in 12 files changed: 54 ins; 18 del; 118 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6320.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320

PR: https://git.openjdk.java.net/jdk/pull/6320

From mdoerr at openjdk.java.net  Wed Nov 10 17:12:44 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Wed, 10 Nov 2021 17:12:44 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
Message-ID: <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>

On Thu, 4 Nov 2021 16:28:52 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>> 
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>> 
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>> 
>> 
>> ### Solution
>> 
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
>> 
>> 
>> ### Implementation details
>> 
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. 
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
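
For context, the pattern the quoted description targets can be put next to an explicit bounds check. The sketch below is illustrative only: the class name `IsAlphaSketch` and the 128-entry table are assumptions, and the table contents are not Tomcat's. Both methods return the same answers; the first relies on an implicit `ArrayIndexOutOfBoundsException` for out-of-range input, which is the case the PR aims to make cheaper in compiled code.

```java
// Hypothetical stand-in for a lookup table guarded by an implicit exception.
class IsAlphaSketch {
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 0; c < IS_ALPHA.length; c++) {
            IS_ALPHA[c] = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        }
    }

    // Same shape as the quoted method: out-of-range input is handled by
    // catching the implicit exception raised by the array access.
    static boolean isAlphaImplicit(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Equivalent result with an explicit range check and no exception.
    static boolean isAlphaExplicit(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }
}
```

Application code cannot always be rewritten into the second form, which is why the cost of the first form matters when it sits on a hot path.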

Thanks for adding a test. Your new additions look basically good, but I have a few remarks and questions.

src/hotspot/share/prims/whitebox.cpp line 987:

> 985:     bool overflow = false;
> 986:     for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
> 987:       if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {

Maybe the code would be more readable if it checked `reason_str != NULL` first and then used 2 loops? Just a minor suggestion; it should only be done if readability improves.
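
To illustrate the suggestion, here is a minimal standalone sketch of hoisting the NULL check out of the loop. It is not the whitebox.cpp code: `trap_names`, `trap_counts` and `count_traps` are made-up stand-ins for `Deoptimization::trap_reason_name()` and the MDO counters, and summing all counters when no reason is given is an assumption about the intended behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the trap reason names and per-reason counters.
static const char* const trap_names[] = { "null_check", "range_check", "class_check" };
static const unsigned    trap_counts[] = { 4, 0, 7 };
static const size_t trap_reason_limit = sizeof(trap_names) / sizeof(trap_names[0]);

// Decide once whether a filter is present, then run one of two simple loops.
unsigned count_traps(const char* reason_str) {
  unsigned total = 0;
  if (reason_str == nullptr) {
    // No filter: sum the counters of all reasons.
    for (size_t reason = 0; reason < trap_reason_limit; reason++) {
      total += trap_counts[reason];
    }
  } else {
    // Filter given: only the matching reason contributes.
    for (size_t reason = 0; reason < trap_reason_limit; reason++) {
      assert(trap_names[reason] != nullptr);
      if (strcmp(reason_str, trap_names[reason]) == 0) {
        total = trap_counts[reason];
        break;
      }
    }
  }
  return total;
}
```

With this shape the loop bodies no longer mix the "is there a filter" test with the per-reason work; whether that is actually easier to read than the single loop is a matter of taste.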

src/hotspot/share/prims/whitebox.cpp line 1016:

> 1014:   }
> 1015:   ResourceMark rm(THREAD);
> 1016:   char *reason_str = (reason_obj == NULL) ?

I think we should use `const char*` as far as possible.

src/hotspot/share/runtime/deoptimization.cpp line 2695:

> 2693:   return 0;
> 2694: }
> 2695: 

Why do we need this? Is it a placeholder for a future enhancement? If so, a comment would at least be helpful.

test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 78:

> 76:     private static final WhiteBox WB = WhiteBox.getWhiteBox();
> 77:     // Until JDK-8275908 is not fixed, null-pointer traps for invokes and array-store traps are not profiled in the interpreter.
> 78:     private static final boolean JDK8275908_fixed = false;

I don't know if that one should get fixed first, but I'm ok with your workaround. Would it make sense to add that bug id to this test's header?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From coleenp at openjdk.java.net  Wed Nov 10 17:24:47 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 10 Nov 2021 17:24:47 GMT
Subject: RFR: 8276658: Clean up JNI local handles code
Message-ID: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>

JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
Move the fields to JavaThread and add a JavaThread* argument.
Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed.
Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
The commits in this change have been performance tested individually and together with no meaningful differences from mainline.
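
For readers unfamiliar with the pattern, the push/pop of a handle block via a mark object can be sketched roughly as below. This is only an illustration of the RAII idea: `HandleBlock`, `FakeJavaThread`, `HandleMark` and `with_temporary_handles` are hypothetical, heavily simplified names and do not reflect the actual JNIHandleBlock or JNIHandleMark code in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-ins; not the HotSpot classes.
struct HandleBlock {
  std::vector<const void*> handles;  // local handles allocated in this block
  HandleBlock* prev = nullptr;       // block that was active before this one
};

struct FakeJavaThread {
  HandleBlock* active_handles = nullptr;
};

// Pushes a fresh block on construction and pops it again on destruction.
class HandleMark {
  FakeJavaThread* _thread;
  HandleBlock _block;
 public:
  explicit HandleMark(FakeJavaThread* thread) : _thread(thread) {
    _block.prev = _thread->active_handles;
    _thread->active_handles = &_block;
  }
  ~HandleMark() {
    assert(_thread->active_handles == &_block && "marks must nest");
    _thread->active_handles = _block.prev;
  }
  HandleMark(const HandleMark&) = delete;
  HandleMark& operator=(const HandleMark&) = delete;
};

// Example caller: temporary handles live only for the scope of the mark.
// Returns the number of handles that were live inside the scope.
size_t with_temporary_handles(FakeJavaThread* thread) {
  HandleMark mark(thread);
  int dummy = 0;
  thread->active_handles->handles.push_back(&dummy);
  return thread->active_handles->handles.size();
}
```

The point of the pattern is that callers cannot forget the pop: the previous block is restored when the mark goes out of scope, including on early returns.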

-------------

Commit messages:
 - The VM Thread creates handles on the caller thread, unless it runs out then it allocates a block on its own thread, which it never cleans up.  Pass the caller thread to allocate_handle so that allocate_block will add to the right thread, which is a JavaThread.
 - Refactor pushing and popping JNIHandleBlocks.
 - Remove JNIHandleBlock global freelists and Mutex
 - Move active_handles to JavaThread.

Changes: https://git.openjdk.java.net/jdk/pull/6336/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276658
  Stats: 426 lines in 25 files changed: 77 ins; 302 del; 47 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6336.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6336/head:pull/6336

PR: https://git.openjdk.java.net/jdk/pull/6336

From sspitsyn at openjdk.java.net  Wed Nov 10 18:05:35 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 10 Nov 2021 18:05:35 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3]
In-Reply-To: <t6nZoibe2zyKRfM8pbTgdwA5yRBEfQomcoJMV3h8g4g=.cbe7cc85-d3a8-4d23-80df-957d15cdc989@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <t6nZoibe2zyKRfM8pbTgdwA5yRBEfQomcoJMV3h8g4g=.cbe7cc85-d3a8-4d23-80df-957d15cdc989@github.com>
Message-ID: <RqLUwBSarIPHSo_iL10zgPJGVg8fSe7KPQZbL4ruCaU=.c901b5e4-8e3e-4e66-a1b8-55c7f2edab81@github.com>

On Wed, 10 Nov 2021 16:38:57 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
>> 
>> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass
>>  - [x] Linux x86_64 Zero works with `async-profiler`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   More reviews

Marked as reviewed by sspitsyn (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From duke at openjdk.java.net  Wed Nov 10 18:13:38 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 10 Nov 2021 18:13:38 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13]
In-Reply-To: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <G5Vy0H_xF5ugFVFp275IngvLejfHBoCpx8EcoAudnHw=.41e17cb9-417e-49ca-98c1-e3c4656a37f5@github.com>
 <mPTXDZyRIGV_0_sfp4Geh6ng2NhN6pNRLgqfMEo6FAw=.d0cb7552-673e-4f5f-8e6e-1339823bbedb@github.com>
 <pX7x19mPaCbDloBmj8WSPZRedZ3pGOShgSZwDKUQhXs=.b043f6bc-5cc5-4cb2-9d49-75dc69d3b0d7@github.com>
 <K_GrJFzVK9UZDyvt06am1ZaNtwFt4wOj9gholDojhhU=.97714c75-53ca-4e4c-a0c1-3bd192650f4e@github.com>
 <QgjNkBwREASPzII84F3dZx40HABtBTfpNwyBm9jU-eg=.7e340843-ec86-4122-8085-9411e9db3216@github.com>
 <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com>
Message-ID: <y38vnQDbQwJr42XSWxKfoZ36M_IXDcuD1X2e75RNSgc=.159b3e10-4716-43f7-855f-2988774166f6@github.com>

On Mon, 1 Nov 2021 13:11:40 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This test is too artificial. Going through my records I've found I have a microbenchmark for `java.util.concurrent.SynchronousQueue` which shows good improvements on jdk11. `SynchronousQueue` uses `onSpinWait`. Since jdk17 `SynchronousQueue` has not been using `onSpinWait` any more (See https://bugs.openjdk.java.net/browse/JDK-8267502). Maybe I can come up with a microbenchmark based on `SynchronousQueue` [code](https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java#L412):
>> 
>>         SNode awaitFulfill(SNode s, boolean timed, long nanos) {
>>             /*
>>              * When a node/thread is about to block, it sets its waiter
>>              * field and then rechecks state at least one more time
>>              * before actually parking, thus covering race vs
>>              * fulfiller noticing that waiter is non-null so should be
>>              * woken.
>>              *
>>              * When invoked by nodes that appear at the point of call
>>              * to be at the head of the stack, calls to park are
>>              * preceded by spins to avoid blocking when producers and
>>              * consumers are arriving very close in time.  This can
>>              * happen enough to bother only on multiprocessors.
>>              *
>>              * The order of checks for returning out of main loop
>>              * reflects fact that interrupts have precedence over
>>              * normal returns, which have precedence over
>>              * timeouts. (So, on timeout, one last check for match is
>>              * done before giving up.) Except that calls from untimed
>>              * SynchronousQueue.{poll/offer} don't check interrupts
>>              * and don't wait at all, so are trapped in transfer
>>              * method rather than calling awaitFulfill.
>>              */
>>             final long deadline = timed ? System.nanoTime() + nanos : 0L;
>>             Thread w = Thread.currentThread();
>>             int spins = shouldSpin(s)
>>                 ? (timed ? MAX_TIMED_SPINS : MAX_UNTIMED_SPINS)
>>                 : 0;
>>             for (;;) {
>>                 if (w.isInterrupted())
>>                     s.tryCancel();
>>                 SNode m = s.match;
>>                 if (m != null)
>>                     return m;
>>                 if (timed) {
>>                     nanos = deadline - System.nanoTime();
>>                     if (nanos <= 0L) {
>>                         s.tryCancel();
>>                         continue;
>>                     }
>>                 }
>>                 if (spins > 0) {
>>                     Thread.onSpinWait();
>>                     spins = shouldSpin(s) ? (spins - 1) : 0;
>>                 }
>>                 else if (s.waiter == null)
>>                     s.waiter = w; // establish waiter so can park next iter
>>                 else if (!timed)
>>                     LockSupport.park(this);
>>                 else if (nanos > SPIN_FOR_TIMEOUT_THRESHOLD)
>>                     LockSupport.parkNanos(this, nanos);
>>             }
>>         }
>> 
>> 
>> I've created https://bugs.openjdk.java.net/browse/JDK-8275728 to write such a microbenchmark.
>
> I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it?

Hi Andrew (@theRealAph),
I've created a PR: https://github.com/openjdk/jdk/pull/6338 with a microbenchmark.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From rkennke at openjdk.java.net  Wed Nov 10 19:19:13 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Nov 2021 19:19:13 GMT
Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2]
In-Reply-To: <CqBVDMLRhPl93f8gcv5VX0Un-fFF5BQtGEwrczIJdp4=.6a50ad8f-3bbe-4075-9f8c-ee4da37e9f8b@github.com>
References: <CqBVDMLRhPl93f8gcv5VX0Un-fFF5BQtGEwrczIJdp4=.6a50ad8f-3bbe-4075-9f8c-ee4da37e9f8b@github.com>
Message-ID: <PBwwZr_o3rG5s6S67oJlHbij7KAv2jrVZjRY0FFLPpo=.2e50a2d7-6988-48a2-8307-84f6aff56e59@github.com>

> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all.
> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.
> 
> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
> 
> Testing:
>  - [x] tier1
>  - [x] tier2
>  - [x] tier3
>  - [ ] tier4

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Verify monitors even in non-debug builds

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6320/files
  - new: https://git.openjdk.java.net/jdk/pull/6320/files/f7b4c179..49dbc146

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=00-01

  Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6320.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320

PR: https://git.openjdk.java.net/jdk/pull/6320

From coleenp at openjdk.java.net  Wed Nov 10 19:20:52 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 10 Nov 2021 19:20:52 GMT
Subject: RFR: 8276889: Improve compatibility discussion in instanceKlass.cpp
Message-ID: <jnL7O3CAVhozErVsdb0DML_oMk9wcXagQF4F7hDMmAQ=.518cc883-dd20-4200-89c1-01efa9847c31@github.com>

I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to.  At one point in time, I really wanted to remove this code (still do but not as much now).  With JVMTI Heap functions deprecated JDK-8268242, maybe soon.
Please review this trivial change.

-------------

Commit messages:
 - 8276889: Improve compatibility discussion in instanceKlass.cpp

Changes: https://git.openjdk.java.net/jdk/pull/6340/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6340&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276889
  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6340.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6340/head:pull/6340

PR: https://git.openjdk.java.net/jdk/pull/6340

From hseigel at openjdk.java.net  Wed Nov 10 19:31:35 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Wed, 10 Nov 2021 19:31:35 GMT
Subject: RFR: 8276889: Improve compatibility discussion in
 instanceKlass.cpp
In-Reply-To: <jnL7O3CAVhozErVsdb0DML_oMk9wcXagQF4F7hDMmAQ=.518cc883-dd20-4200-89c1-01efa9847c31@github.com>
References: <jnL7O3CAVhozErVsdb0DML_oMk9wcXagQF4F7hDMmAQ=.518cc883-dd20-4200-89c1-01efa9847c31@github.com>
Message-ID: <pqGIdwRilZ5SAaX69e9qU2B4I91f3fANyDfK4u9NQjQ=.6489d6e0-1977-4339-8d80-1526a375a3cb@github.com>

On Wed, 10 Nov 2021 19:13:22 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to.  At one point in time, I really wanted to remove this code (still do but not as much now).  With JVMTI Heap functions deprecated JDK-8268242, maybe soon.
> Please review this trivial change.

Looks good and trivial.
Thanks, Harold

-------------

Marked as reviewed by hseigel (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6340

From coleenp at openjdk.java.net  Wed Nov 10 19:48:38 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 10 Nov 2021 19:48:38 GMT
Subject: RFR: 8276889: Improve compatibility discussion in
 instanceKlass.cpp
In-Reply-To: <jnL7O3CAVhozErVsdb0DML_oMk9wcXagQF4F7hDMmAQ=.518cc883-dd20-4200-89c1-01efa9847c31@github.com>
References: <jnL7O3CAVhozErVsdb0DML_oMk9wcXagQF4F7hDMmAQ=.518cc883-dd20-4200-89c1-01efa9847c31@github.com>
Message-ID: <8XvzTfDW6ig-MGL8sVTvZ2dUzfeMeFELDW4tPoBPX8E=.59b33576-c242-4d61-94b4-2a3151a37c4a@github.com>

On Wed, 10 Nov 2021 19:13:22 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to.  At one point in time, I really wanted to remove this code (still do but not as much now).  With JVMTI Heap functions deprecated JDK-8268242, maybe soon.
> Please review this trivial change.

Thanks Harold!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6340

From coleenp at openjdk.java.net  Wed Nov 10 19:48:38 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 10 Nov 2021 19:48:38 GMT
Subject: Integrated: 8276889: Improve compatibility discussion in
 instanceKlass.cpp
In-Reply-To: <jnL7O3CAVhozErVsdb0DML_oMk9wcXagQF4F7hDMmAQ=.518cc883-dd20-4200-89c1-01efa9847c31@github.com>
References: <jnL7O3CAVhozErVsdb0DML_oMk9wcXagQF4F7hDMmAQ=.518cc883-dd20-4200-89c1-01efa9847c31@github.com>
Message-ID: <yYA7nAEl-Kg1fzQFTmQVGQAuNkRDrkWYoS2Au3D_BNQ=.420b4bc9-cd59-48fd-a1c2-245c37381b0c@github.com>

On Wed, 10 Nov 2021 19:13:22 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to.  At one point in time, I really wanted to remove this code (still do but not as much now).  With JVMTI Heap functions deprecated JDK-8268242, maybe soon.
> Please review this trivial change.

This pull request has now been integrated.

Changeset: 67c2714b
Author:    Coleen Phillimore <coleenp at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/67c2714ba2c9658e07153a6f50391c896e4caebc
Stats:     1 line in 1 file changed: 0 ins; 1 del; 0 mod

8276889: Improve compatibility discussion in instanceKlass.cpp

Reviewed-by: hseigel

-------------

PR: https://git.openjdk.java.net/jdk/pull/6340

From iklam at openjdk.java.net  Wed Nov 10 20:26:38 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Wed, 10 Nov 2021 20:26:38 GMT
Subject: RFR: 8269986: Remove +3 from Symbol::identity_hash()
In-Reply-To: <3QfGGc4vIbwBz-k8URuVmp2bVWOID4UQmEwKBSQo7Ls=.61a1aca5-7b04-4017-a37a-3f82a6327e9c@github.com>
References: <sZinjAxTK8hjhtSVd0OktG6h44GR7vX2hudlz5sfCC0=.707fd475-c9a9-4da0-9964-d971e10a1176@github.com>
 <3QfGGc4vIbwBz-k8URuVmp2bVWOID4UQmEwKBSQo7Ls=.61a1aca5-7b04-4017-a37a-3f82a6327e9c@github.com>
Message-ID: <DNf-zBl1fxSWTuJ7w6J3RuQgVcyLclFIU1D1eGFD0Ek=.329ccd98-3437-412f-ad33-567023b07610@github.com>

On Tue, 9 Nov 2021 20:01:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Please review this trivial change that removes the `+3` from here:
>> 
>> 
>>   unsigned Symbol::identity_hash() const {
>>     unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3));
>>                                                                                   ^^^
>>     return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) |
>>            ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16);
>>   }
>> 
>> 
>> The `+3` was intended to avoid getting the same value for these bits:
>> 
>> 
>> (((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07
>> 
>> 
>> However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive.
>> 
>> Testing: Oracle CI tiers 1-4
>
> Looks good! Thanks for doing the performance analysis.

Thanks @coleenp for the review. @cl4es also reviewed it off-line. Since the change is trivial, I am pushing it now.
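
As a small illustration of why the extra shift can hurt, the sketch below compares the address bits with and without the `+3`. It assumes an 8-byte minimum object alignment (i.e. a log value of 3); the constant and function names are made up for this example and only mirror the first line of `Symbol::identity_hash()`.

```cpp
#include <cstdint>

// Assumption for this sketch: 8-byte minimum object alignment.
constexpr unsigned kLogMinObjAlignmentInBytes = 3;

// Address bits as computed before the change (extra "+ 3" in the shift).
constexpr unsigned addr_bits_old(uintptr_t addr) {
  return (unsigned)(addr >> (kLogMinObjAlignmentInBytes + 3));
}

// Address bits as computed after the change.
constexpr unsigned addr_bits_new(uintptr_t addr) {
  return (unsigned)(addr >> kLogMinObjAlignmentInBytes);
}
```

For three neighbouring 8-byte-aligned addresses 0x1000, 0x1008 and 0x1010, `addr_bits_old` yields 0x40 each time, while `addr_bits_new` yields 0x200, 0x201 and 0x202. With the `+3`, symbols allocated close together contribute identical address bits to the hash; without it they differ.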

-------------

PR: https://git.openjdk.java.net/jdk/pull/6287

From iklam at openjdk.java.net  Wed Nov 10 20:26:39 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Wed, 10 Nov 2021 20:26:39 GMT
Subject: Integrated: 8269986: Remove +3 from Symbol::identity_hash()
In-Reply-To: <sZinjAxTK8hjhtSVd0OktG6h44GR7vX2hudlz5sfCC0=.707fd475-c9a9-4da0-9964-d971e10a1176@github.com>
References: <sZinjAxTK8hjhtSVd0OktG6h44GR7vX2hudlz5sfCC0=.707fd475-c9a9-4da0-9964-d971e10a1176@github.com>
Message-ID: <WyfwPVCmfE_GeWaTvpXHHjIq_812CNhXQFyX5I2Cj6g=.cd40f33a-c09c-459b-a2d3-3da13889b9b8@github.com>

On Sun, 7 Nov 2021 21:10:35 GMT, Ioi Lam <iklam at openjdk.org> wrote:

> Please review this trivial change that removes the `+3` from here:
> 
> 
>   unsigned Symbol::identity_hash() const {
>     unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3));
>                                                                                   ^^^
>     return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) |
>            ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16);
>   }
> 
> 
> The `+3` was intended to avoid getting the same value for these bits:
> 
> 
> (((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07
> 
> 
> However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive.
> 
> Testing: Oracle CI tiers 1-4

This pull request has now been integrated.

Changeset: df02daa6
Author:    Ioi Lam <iklam at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/df02daa6f9df801a7e0b6203fd6411d8a62bb277
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8269986: Remove +3 from Symbol::identity_hash()

Reviewed-by: coleenp

-------------

PR: https://git.openjdk.java.net/jdk/pull/6287

From coleenp at openjdk.java.net  Wed Nov 10 22:12:44 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 10 Nov 2021 22:12:44 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
Message-ID: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>

This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

-------------

Commit messages:
 - 8258192: Obsolete the CriticalJNINatives flag

Changes: https://git.openjdk.java.net/jdk/pull/6343/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8258192
  Stats: 1790 lines in 24 files changed: 0 ins; 1616 del; 174 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6343.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6343/head:pull/6343

PR: https://git.openjdk.java.net/jdk/pull/6343

From dlong at openjdk.java.net  Thu Nov 11 03:36:51 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Thu, 11 Nov 2021 03:36:51 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData
 information
Message-ID: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>

The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
1. added a version number to the replay file
2. removed unused ci fields
3. corrected comment in TestLambdas.java

-------------

Commit messages:
 - replay failure due to incomplete ciMethodData information

Changes: https://git.openjdk.java.net/jdk/pull/6344/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276095
  Stats: 59 lines in 7 files changed: 27 ins; 24 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6344.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344

PR: https://git.openjdk.java.net/jdk/pull/6344

From stuefe at openjdk.java.net  Thu Nov 11 06:30:15 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 11 Nov 2021 06:30:15 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
Message-ID: <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>

> This is part of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot. For the whole story please refer to https://bugs.openjdk.java.net/browse/JDK-8275301.
> 
> This proposal adds NMT buffer overflow checking. As laid out in JDK-8275301:
> 
> - it would give us C-heap overflow checking in release builds
> - the additional costs are negligible
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. The error reports would also be confusing.
> - it is a preparation for future code removal (the memory guarding done in debug only in os::malloc() and friends, and possibly the guarding done with CheckJNICalls)
> 
> Patch notes:
> 
> 1) The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. 
> 
> On 64-bit, we don't even need to enlarge the malloc header: we carve some bits out by decreasing the size of the bucket index bit field to 16 bits. The bucket index field is used to store the bucket slot of the malloc site table in NMT detail mode. The malloc site table width is 512 atm, so 65k gives plenty of room for growing the malloc site table should we ever want to.
> 
> On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes. That is because there were not enough bits to spare for a canary. On the upside, 8 bytes were not enough anyway, strictly speaking, to guarantee proper alignment e.g. for 128bit data types on all 32-bit platforms. See e.g. the malloc alignment the glibc uses.
> 
> I also took the freedom of re-arranging the malloc header fields a bit to minimize the difference between 32-bit and 64-bit platforms, and to align each field optimally according to its size. I also switched from bitfields to real types in order to be able to do a sizeof() on them.
> 
> For more details, see the comment in mallocTracker.hpp.
> 
> 2) I added a footer canary trailing the user allocation to catch tail buffer overruns. For simplicity reasons (alignment) and to save some cycles I made it a byte only. That is enough to catch most overrun scenarios. If you think this is too small, I'm open to change it.
> 
> 3) I put a bit of work into error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
> 
> 4) I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
> 
> (Note that these gtests, to test anything, need to run with NMT switched on. We do this as part of our NMT jtreg-controlled gtests in tier1).
> 
> Even though the patch adds more code than it removes, it prepares possible code removal (if we can agree to do that) and the net result will be less complexity, not more. Again, see JDK-8275301 for details.
> 
> --------------
> 
> Example output a buffer overrun would provide:
> 
> 
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
> 
> -------
> 
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 14 days in a row without problems

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge
 - Let NMT do overflow detection
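
The canary scheme described in the quoted patch notes can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the NMT code: the `SketchHeader` layout, the canary values and the `sketch_*` function names are all invented here, and the real header carries more information (see mallocTracker.hpp in the PR).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical layout, loosely following the description:
// [ header ... 16-bit canary ][ user payload ][ 1-byte footer canary ]
struct SketchHeader {
  uint64_t size;       // payload size in bytes
  uint16_t unused[3];  // stands in for the other header fields
  uint16_t canary;     // directly precedes the user payload
};
static_assert(sizeof(SketchHeader) == 16, "sketch assumes a 16-byte header");

constexpr uint16_t kHeaderCanary = 0xC0DE;  // arbitrary value for this sketch
constexpr uint8_t  kFooterCanary = 0xFB;    // arbitrary value for this sketch

// Allocates header + payload + one footer byte and returns the payload.
void* sketch_malloc(size_t size) {
  unsigned char* raw =
      static_cast<unsigned char*>(std::malloc(sizeof(SketchHeader) + size + 1));
  if (raw == nullptr) return nullptr;
  SketchHeader h{};
  h.size = size;
  h.canary = kHeaderCanary;
  std::memcpy(raw, &h, sizeof(h));
  raw[sizeof(SketchHeader) + size] = kFooterCanary;
  return raw + sizeof(SketchHeader);
}

// Returns true if both canaries around the payload are intact.
bool sketch_check(const void* payload) {
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  SketchHeader h;
  std::memcpy(&h, p - sizeof(SketchHeader), sizeof(h));
  // Only trust h.size (to locate the footer) if the header canary is intact.
  return h.canary == kHeaderCanary && p[h.size] == kFooterCanary;
}

// Verifies the canaries before releasing the underlying allocation.
void sketch_free(void* payload) {
  assert(sketch_check(payload) && "canary broken (buffer overflow?)");
  std::free(static_cast<unsigned char*>(payload) - sizeof(SketchHeader));
}
```

In this sketch, writing a single byte past the end of the payload clobbers the footer canary, so `sketch_check()` returns false and `sketch_free()` would assert. A one-byte footer cannot catch an overrun that happens to write the canary value itself, which is the trade-off mentioned in note 2).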

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5952/files
  - new: https://git.openjdk.java.net/jdk/pull/5952/files/f4a92cf5..e04a105d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=00-01

  Stats: 886552 lines in 1706 files changed: 455452 ins; 424812 del; 6288 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5952.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952

PR: https://git.openjdk.java.net/jdk/pull/5952

From dholmes at openjdk.java.net  Thu Nov 11 07:10:35 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 11 Nov 2021 07:10:35 GMT
Subject: RFR: 8276658: Clean up JNI local handles code
In-Reply-To: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
Message-ID: <GP4f8enTs4vqQU27VCXHUjcsNVr58UkShEQ5_y01oPI=.084d7f86-531c-48eb-b740-e58842451118@github.com>

On Wed, 10 Nov 2021 17:16:29 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
> Move the fields to JavaThread and add a JavaThread* argument.
> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed.
> Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.

Hi Coleen,

Nice cleanup and refactoring! I'm not familiar with all the details but the reshuffling looks good to me.

One query and one minor issue below.

Thanks,
David

src/hotspot/share/compiler/compileBroker.cpp line 2324:

> 2322:   // Remove the JNI handle block after the ciEnv destructor has run in
> 2323:   // the previous block.
> 2324:   pop_jni_handle_block();

Does the fact the JNIHandleMark destructor won't get executed until much later, at the end of this method, make any difference?

src/hotspot/share/runtime/vmThread.hpp line 63:

> 61: class VMThread: public NamedThread {
> 62:  private:
> 63:   volatile bool _is_running;

I don't see this being initialized to false.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6336

From shade at openjdk.java.net  Thu Nov 11 07:27:39 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 11 Nov 2021 07:27:39 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
Message-ID: <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>

On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 1746:

> 1744:   // NW     [ABI_REG_ARGS]             <-- 1) R1_SP
> 1745:   //        [outgoing arguments]       <-- 2) R1_SP + out_arg_slot_offset
> 1746:   //        [oopHandle area]           <-- 3) R1_SP + oop_handle_offset (save area for critical natives) ?

`?`. The comment `(save area for critical natives)` must be redundant now.

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1551:

> 1549:   int total_c_args = total_in_args+1;
> 1550:   if (method->is_static()) {
> 1551:     total_c_args++;

In this patch, sometimes we keep the if structure, like here, but in other places, we replace this with:

  int total_c_args = total_in_args + (method->is_static() ? 2 : 1);

Should probably stick with a single style.

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1793:

> 1791:     int c_arg = arg_order.at(ai + 1);
> 1792:     __ block_comment(err_msg("move %d -> %d", i, c_arg));
> 1793:     assert (c_arg != -1, "wrong direction");

`assert (c_arg != -1 && i != -1, "wrong direction");`?

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1905:

> 1903:   } else {
> 1904:     // Compute a valid move order, using tmp_vmreg to break any cycles
> 1905:     ComputeMoveOrder cmo(total_in_args, in_regs, total_c_args, out_regs, in_sig_bt, arg_order, tmp_vmreg);

Is `ComputeMoveOrder` still used somewhere?

src/hotspot/share/runtime/sharedRuntime.cpp line 3019:

> 3017:   if (CriticalJNINatives && !method->is_method_handle_intrinsic()) {
> 3018:     // We perform the I/O with transition to native before acquiring AdapterHandlerLibrary_lock.
> 3019:     critical_entry = NativeLookup::lookup_critical_entry(method);

`critical_entry` variable is now redundant?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From shade at openjdk.java.net  Thu Nov 11 07:30:37 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 11 Nov 2021 07:30:37 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3]
In-Reply-To: <RqLUwBSarIPHSo_iL10zgPJGVg8fSe7KPQZbL4ruCaU=.c901b5e4-8e3e-4e66-a1b8-55c7f2edab81@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <t6nZoibe2zyKRfM8pbTgdwA5yRBEfQomcoJMV3h8g4g=.cbe7cc85-d3a8-4d23-80df-957d15cdc989@github.com>
 <RqLUwBSarIPHSo_iL10zgPJGVg8fSe7KPQZbL4ruCaU=.c901b5e4-8e3e-4e66-a1b8-55c7f2edab81@github.com>
Message-ID: <81d18oGgsBvytKRcjrO6lkygVY2G6wY-UHntuB47Fso=.084c545c-b9f1-4a07-a264-677cdbb9a2d2@github.com>

On Wed, 10 Nov 2021 18:03:00 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   More reviews
>
> Marked as reviewed by sspitsyn (Reviewer).

Thank you, @sspitsyn! Any more reviews, anyone?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From chagedorn at openjdk.java.net  Thu Nov 11 07:36:33 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 11 Nov 2021 07:36:33 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <WhC8vPRClSXwnWF3aElZqD2oubFZtcNpKQJlLiGbAm8=.56ec3811-08ad-420c-96d8-dd705ba8e11a@github.com>

On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long <dlong at openjdk.org> wrote:

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

Looks good to me! Thanks for also adapting the changes from 8275868 to use the new version number.

src/hotspot/share/ci/ciReplay.cpp line 900:

> 898: 
> 899:       // Only initialize the protection domain handle with the protection domain of the very first entry.
> 900:       // This also ensures that older replay files work.

Second sentence can now be removed with version numbers.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6344

From duke at openjdk.java.net  Thu Nov 11 08:48:07 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Thu, 11 Nov 2021 08:48:07 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.
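
To make the sign/authenticate idea in the description above concrete, here is a toy model in Java. It is only a sketch under loud assumptions: the class `PacToyModel`, the 48-bit address width, and the mixing function are all invented for illustration and are not what the hardware or this patch does. Real PAC uses a hardware cipher and faults on a bad pointer; the exception here merely stands in for that fault.

```java
final class PacToyModel {
    private static final int ADDRESS_BITS = 48;   // assumed virtual address width
    private static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    private final long key;                       // stands in for the secret signing key

    PacToyModel(long key) {
        this.key = key;
    }

    // Toy keyed mixer, NOT the cipher used by real hardware.
    // The result only has bits set above ADDRESS_BITS.
    private long code(long address, long modifier) {
        long h = (address ^ key) * 0x9E3779B97F4A7C15L;
        h ^= modifier + Long.rotateLeft(h, 31);
        h *= 0xC2B2AE3D27D4EB4FL;
        return h & ~ADDRESS_MASK;
    }

    // "Sign": place the code in the unused upper bits of the address.
    long sign(long address, long modifier) {
        long plain = address & ADDRESS_MASK;
        return plain | code(plain, modifier);
    }

    // "Authenticate": recompute the code and compare before use.
    long authenticate(long signedAddress, long modifier) {
        long plain = signedAddress & ADDRESS_MASK;
        if ((plain | code(plain, modifier)) != signedAddress) {
            throw new IllegalStateException("authentication failed");
        }
        return plain;
    }

    public static void main(String[] args) {
        PacToyModel pac = new PacToyModel(0x0123456789ABCDEFL);
        long stackPointer = 0x00007FFFFFFFE000L;   // used as the modifier
        long returnAddress = 0x00007F1234567890L;

        long saved = pac.sign(returnAddress, stackPointer);   // on function entry
        System.out.println(pac.authenticate(saved, stackPointer) == returnAddress);

        long forged = 0x00007F1234560000L;         // attacker writes a raw address
        try {
            pac.authenticate(forged, stackPointer);
            System.out.println("forged address accepted");
        } catch (IllegalStateException e) {
            System.out.println("forged address rejected");
        }
    }
}
```

With only 16 code bits in this model, a forged value still has a small chance of passing, which is why the sketch prints the outcome instead of asserting it.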

Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:

  Simplify branch protection configure check

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6334/files
  - new: https://git.openjdk.java.net/jdk/pull/6334/files/e0e3f666..29471d30

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=00-01

  Stats: 12 lines in 1 file changed: 0 ins; 6 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Thu Nov 11 08:49:36 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Nov 2021 08:49:36 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13]
In-Reply-To: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <G5Vy0H_xF5ugFVFp275IngvLejfHBoCpx8EcoAudnHw=.41e17cb9-417e-49ca-98c1-e3c4656a37f5@github.com>
 <mPTXDZyRIGV_0_sfp4Geh6ng2NhN6pNRLgqfMEo6FAw=.d0cb7552-673e-4f5f-8e6e-1339823bbedb@github.com>
 <pX7x19mPaCbDloBmj8WSPZRedZ3pGOShgSZwDKUQhXs=.b043f6bc-5cc5-4cb2-9d49-75dc69d3b0d7@github.com>
 <K_GrJFzVK9UZDyvt06am1ZaNtwFt4wOj9gholDojhhU=.97714c75-53ca-4e4c-a0c1-3bd192650f4e@github.com>
 <QgjNkBwREASPzII84F3dZx40HABtBTfpNwyBm9jU-eg=.7e340843-ec86-4122-8085-9411e9db3216@github.com>
 <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com>
Message-ID: <6AGSxH9l4ABhczTNYkGSUQncGgENSWhYtOdnLAbsicY=.435c44c2-ec21-42f9-ab53-7f0891d0b42b@github.com>

On Mon, 1 Nov 2021 13:11:40 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This test is too artificial. Going through my records I've found I have a microbenchmark for `java.util.concurrent.SynchronousQueue` which shows good improvements on jdk11. `SynchronousQueue` uses `onSpinWait`. Since jdk17 `SynchronousQueue` has not been using `onSpinWait` any more (See https://bugs.openjdk.java.net/browse/JDK-8267502). Maybe I can come up with a microbenchmark based on `SynchronousQueue` [code](https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java#L412):
>> 
>>         SNode awaitFulfill(SNode s, boolean timed, long nanos) {
>>             /*
>>              * When a node/thread is about to block, it sets its waiter
>>              * field and then rechecks state at least one more time
>>              * before actually parking, thus covering race vs
>>              * fulfiller noticing that waiter is non-null so should be
>>              * woken.
>>              *
>>              * When invoked by nodes that appear at the point of call
>>              * to be at the head of the stack, calls to park are
>>              * preceded by spins to avoid blocking when producers and
>>              * consumers are arriving very close in time.  This can
>>              * happen enough to bother only on multiprocessors.
>>              *
>>              * The order of checks for returning out of main loop
>>              * reflects fact that interrupts have precedence over
>>              * normal returns, which have precedence over
>>              * timeouts. (So, on timeout, one last check for match is
>>              * done before giving up.) Except that calls from untimed
>>              * SynchronousQueue.{poll/offer} don't check interrupts
>>              * and don't wait at all, so are trapped in transfer
>>              * method rather than calling awaitFulfill.
>>              */
>>             final long deadline = timed ? System.nanoTime() + nanos : 0L;
>>             Thread w = Thread.currentThread();
>>             int spins = shouldSpin(s)
>>                 ? (timed ? MAX_TIMED_SPINS : MAX_UNTIMED_SPINS)
>>                 : 0;
>>             for (;;) {
>>                 if (w.isInterrupted())
>>                     s.tryCancel();
>>                 SNode m = s.match;
>>                 if (m != null)
>>                     return m;
>>                 if (timed) {
>>                     nanos = deadline - System.nanoTime();
>>                     if (nanos <= 0L) {
>>                         s.tryCancel();
>>                         continue;
>>                     }
>>                 }
>>                 if (spins > 0) {
>>                     Thread.onSpinWait();
>>                     spins = shouldSpin(s) ? (spins - 1) : 0;
>>                 }
>>                 else if (s.waiter == null)
>>                     s.waiter = w; // establish waiter so can park next iter
>>                 else if (!timed)
>>                     LockSupport.park(this);
>>                 else if (nanos > SPIN_FOR_TIMEOUT_THRESHOLD)
>>                     LockSupport.parkNanos(this, nanos);
>>             }
>>         }
>> 
>> 
>> I've created https://bugs.openjdk.java.net/browse/JDK-8275728 to write such a microbenchmark.
>
> I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it?

> Hi Andrew (@theRealAph), I've created a PR: #6338 with a microbenchmark.

That's really weird. Why is the benchmark not here?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From duke at openjdk.java.net  Thu Nov 11 09:36:36 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 11 Nov 2021 09:36:36 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15]
In-Reply-To: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
Message-ID: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>

> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
> 
> It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be:
> 
> - `none`: no implementation for spin pauses. This is the default value.
> - `nop`: use `nop` instruction for spin pauses.
> - `isb`: use `isb` instruction for spin pauses.
> - `yield`: use `yield` instruction for spin pauses.
> 
> And `OnSpinWaitInstCount=count`, where `count` specifies the number of `OnSpinWaitInst` instructions and must be in the range `1..99`. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`.
> 
> The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`.
> 
> Testing:
> 
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
> 
> CSR: https://bugs.openjdk.java.net/browse/JDK-8274564

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  8275728: Add simple Producer/Consumer microbenchmark for Thread.onSpinWait

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5562/files
  - new: https://git.openjdk.java.net/jdk/pull/5562/files/a06b4821..0d6fc3f0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=14
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=13-14

  Stats: 204 lines in 1 file changed: 204 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5562.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562

PR: https://git.openjdk.java.net/jdk/pull/5562

From duke at openjdk.java.net  Thu Nov 11 09:42:38 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 11 Nov 2021 09:42:38 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15]
In-Reply-To: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>
Message-ID: <ECyvVTsFp3GxzbYB3jdbwCYz7y5bC_ZFnbt_u5ADPrI=.f0b20e3b-7888-4990-aea5-2178fc464cc2@github.com>

On Thu, 11 Nov 2021 09:36:36 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>> 
>> It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be:
>> 
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `nop`: use `nop` instruction for spin pauses.
>> - `isb`: use `isb` instruction for spin pauses.
>> - `yield`: use `yield` instruction for spin pauses.
>> 
>> And `OnSpinWaitInstCount=count`, where `count` specifies the number of `OnSpinWaitInst` instructions and must be in the range `1..99`. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`.
>> 
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`.
>> 
>> Testing:
>> 
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>> 
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8274564
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8275728: Add simple Producer/Consumer microbenchmark for Thread.onSpinWait

`ThreadOnSpinWaitProducerConsumer` demonstrates that `Thread.onSpinWait` can be used to avoid heavy locks.
The microbenchmark differs from [Gil's original benchmark](https://github.com/giltene/GilExamples/tree/master/SpinWaitTest) and [Dmitry's variations](http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html). Those benchmarks produce/consume data by incrementing a volatile counter. The latency of such operations is almost zero. They also don't use heavy locks. According to [Gil's SpinWaitTest.java](https://github.com/giltene/GilExamples/blob/master/SpinWaitTest/src/main/java/SpinWaitTest.java):
> This test can be used to measure and document the impact of Runtime.onSpinWait() behavior
>  on thread-to-thread communication latencies. E.g. when the two threads are pinned to
> the two hardware threads of a shared x86 core (with a shared L1), this test will
> demonstrate an estimate the best case thread-to-thread latencies possible on the
> platform

Gil's microbenchmark targets SMT cases (x86 hyperthreading). As not all CPUs support SMT, these microbenchmarks cannot demonstrate the benefits of `Thread.onSpinWait` there. Actually, it is the opposite: they show that `Thread.onSpinWait` has a negative impact on performance.

The microbenchmark from this PR uses `BigInteger` to get 100 - 200 ns latencies for producing/consuming data. These latencies can cause either the producer or the consumer to wait for the other. Waiting is implemented with `Object.wait`/`Object.notify`, which are heavy. `Thread.onSpinWait` can be used in a spin loop to avoid them.
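
As a rough illustration of the spin-then-block idea described above (not the benchmark code from the PR), here is a minimal Java sketch. The class `SpinThenWaitSketch` and its `Mailbox` are hypothetical; only the parameter names `maxNum` and `spinNum` are borrowed from the benchmark tables. Each side first spins with `Thread.onSpinWait` for up to `spinNum` iterations and falls back to `Object.wait` only if the slot is still not ready.

```java
import java.math.BigInteger;

final class SpinThenWaitSketch {
    // Hypothetical single-slot mailbox shared by one producer and one consumer.
    static final class Mailbox {
        private final int spinNum;           // spins to attempt before blocking
        private volatile BigInteger slot;    // null means "empty"

        Mailbox(int spinNum) {
            this.spinNum = spinNum;
        }

        void put(BigInteger value) throws InterruptedException {
            // Cheap path: spin briefly, hoping the consumer empties the slot.
            for (int i = 0; i < spinNum && slot != null; i++) {
                Thread.onSpinWait();
            }
            synchronized (this) {
                while (slot != null) {
                    wait();                  // heavy path, taken only if spinning failed
                }
                slot = value;
                notifyAll();
            }
        }

        BigInteger take() throws InterruptedException {
            // Cheap path: spin briefly, hoping the producer fills the slot.
            for (int i = 0; i < spinNum && slot == null; i++) {
                Thread.onSpinWait();
            }
            synchronized (this) {
                while (slot == null) {
                    wait();
                }
                BigInteger value = slot;
                slot = null;
                notifyAll();
                return value;
            }
        }
    }

    // Sends 1..maxNum through the mailbox and returns the sum seen by the consumer.
    static BigInteger run(int maxNum, int spinNum) {
        Mailbox box = new Mailbox(spinNum);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= maxNum; i++) {
                    box.put(BigInteger.valueOf(i));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        BigInteger sum = BigInteger.ZERO;
        try {
            for (int i = 0; i < maxNum; i++) {
                sum = sum.add(box.take());
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(100, 125));   // prints 5050
    }
}
```

Setting `spinNum` to 0 gives the blocking-only behaviour, which corresponds to the "No spin loop" rows below.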

**ARM64 results**:
- No spin loop

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score    Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100          0  avgt   75  1520.448 ± 40.507  us/op

- No `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score    Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  1580.756 ± 47.501  us/op

- `ISB`-based `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt    Score     Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  617.454 ± 174.431  us/op


**X86_64 results**:
- No spin loop

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score   Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  1417.944 ± 1.691  us/op

- No `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score   Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  1410.987 ± 2.093  us/op

- `PAUSE`-based `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt    Score   Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  217.054 ± 1.283  us/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From duke at openjdk.java.net  Thu Nov 11 09:42:39 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 11 Nov 2021 09:42:39 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13]
In-Reply-To: <6AGSxH9l4ABhczTNYkGSUQncGgENSWhYtOdnLAbsicY=.435c44c2-ec21-42f9-ab53-7f0891d0b42b@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <G5Vy0H_xF5ugFVFp275IngvLejfHBoCpx8EcoAudnHw=.41e17cb9-417e-49ca-98c1-e3c4656a37f5@github.com>
 <mPTXDZyRIGV_0_sfp4Geh6ng2NhN6pNRLgqfMEo6FAw=.d0cb7552-673e-4f5f-8e6e-1339823bbedb@github.com>
 <pX7x19mPaCbDloBmj8WSPZRedZ3pGOShgSZwDKUQhXs=.b043f6bc-5cc5-4cb2-9d49-75dc69d3b0d7@github.com>
 <K_GrJFzVK9UZDyvt06am1ZaNtwFt4wOj9gholDojhhU=.97714c75-53ca-4e4c-a0c1-3bd192650f4e@github.com>
 <QgjNkBwREASPzII84F3dZx40HABtBTfpNwyBm9jU-eg=.7e340843-ec86-4122-8085-9411e9db3216@github.com>
 <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com>
 <6AGSxH9l4ABhczTNYkGSUQncGgENSWhYtOdnLAbsicY=.435c44c2-ec21-42f9-ab53-7f0891d0b42b@github.com>
Message-ID: <1GRLIikoCIaOxbXAx7d5DhHz7ne8pjiKTB9t7mRNPIk=.1832e508-3f36-4893-94bc-02c0304bda23@github.com>

On Thu, 11 Nov 2021 08:46:23 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it?
>
>> Hi Andrew (@theRealAph), I've created a PR: #6338 with a microbenchmark.
> 
> That's really weird. Why is the benchmark not here?

I thought a separate PR would simplify the discussion. Sorry if I was wrong.
I added it here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From simonis at openjdk.java.net  Thu Nov 11 09:48:35 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 09:48:35 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To: <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
 <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>
Message-ID: <DX1s7J2WjnbZndN24LTTyxnUG64-AiKPMFNssYuPMws=.18c0cead-3488-4bef-b275-b2683e06c5e3@github.com>

On Wed, 10 Nov 2021 16:56:07 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> src/hotspot/share/prims/whitebox.cpp line 987:
> 
>> 985:     bool overflow = false;
>> 986:     for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
>> 987:       if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {
> 
> Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better.

I've tried it but the resulting version is slightly longer and in my opinion not really more readable:

WB_ENTRY(jint, WB_GetMethodTrapCount(JNIEnv* env, jobject o, jobject method, jstring reason_obj))
  jmethodID jmid = reflected_method_to_jmid(thread, env, method);
  CHECK_JNI_EXCEPTION_(env, 0);
  methodHandle mh(THREAD, Method::checked_resolve_jmethod_id(jmid));
  uint cnt = 0;
  MethodData* mdo = mh->method_data();
  if (mdo != NULL) {
    ResourceMark rm(THREAD);
    if (reason_obj != NULL) {
      char* reason_str = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(reason_obj));
      for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
        if (!strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {
          cnt = mdo->trap_count(reason);
          // Count in the overflow trap count on overflow
          if (cnt == (uint)-1) {
            cnt = mdo->trap_count_limit() + mdo->overflow_trap_count();
          }
          break;
        }
      }
    } else {
      bool overflow = false;
      for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
        uint c = mdo->trap_count(reason);
        if (c == (uint)-1) {
          c = mdo->trap_count_limit();
          if (!overflow) {
            // Count overflow trap count just once
            overflow = true;
            c += mdo->overflow_trap_count();
          }
        }
        cnt += c;
      }
    }
  }
  return cnt;
WB_END


It makes no real difference to me, though. Please just let me know if you'd still prefer the alternative version.

PS: I've updated the documentation of the method which was inaccurate for `reason==NULL`.

> src/hotspot/share/prims/whitebox.cpp line 1016:
> 
>> 1014:   }
>> 1015:   ResourceMark rm(THREAD);
>> 1016:   char *reason_str = (reason_obj == NULL) ?
> 
> I think we should use `const char*` as far as possible.

Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net  Thu Nov 11 09:54:38 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 09:54:38 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To: <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
 <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>
Message-ID: <xhcardcBXFRKnq7ILk6SrwqVSuY7t__baiDAttK4dJk=.c4d4c428-1691-4ec2-9c70-eb3dbbec32e8@github.com>

On Wed, 10 Nov 2021 16:57:14 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> src/hotspot/share/runtime/deoptimization.cpp line 2695:
> 
>> 2693:   return 0;
>> 2694: }
>> 2695: 
> 
> Why do we need this? Is it a placeholder for a future enhancement? If so, a comment would at least be helpful.

That's a tricky one :)

It's needed to fix the Minimal/Zero builds. It's inside the `#else` branch of an `#ifdef COMPILER2_OR_JVMCI` condition, together with a bunch of other methods which have an empty body when we have no C2 or JVMCI.
It could certainly be implemented more elegantly, but I decided to adhere to the current coding style in `deoptimization.cpp` :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net  Thu Nov 11 10:00:39 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 10:00:39 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To: <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
 <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>
Message-ID: <blM4v-2fMUrBzqFDh6GhkmG_TSCTbWKP8O5xcha3o0g=.f70a11d1-78d2-4d94-a1bb-7817f8f487eb@github.com>

On Wed, 10 Nov 2021 17:06:06 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 78:
> 
>> 76:     private static final WhiteBox WB = WhiteBox.getWhiteBox();
>> 77:     // Until JDK-8275908 is not fixed, null-pointer traps for invokes and array-store traps are not profiled in the interpreter.
>> 78:     private static final boolean JDK8275908_fixed = false;
> 
> I don't know if that one should get fixed first, but I'm ok with your workaround. Would it make sense to add that bug id to this test's header?

This PR has been open for such a long time now that I'd like to complete it without depending on another fix. But adding the bug id to the test is a good idea. Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From nradomski at openjdk.java.net  Thu Nov 11 10:16:58 2021
From: nradomski at openjdk.java.net (Niklas Radomski)
Date: Thu, 11 Nov 2021 10:16:58 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
Message-ID: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>

Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

-------------

Commit messages:
 - Port shenandoahgc to linux on ppc64le

Changes: https://git.openjdk.java.net/jdk/pull/6325/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6325&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276927
  Stats: 1526 lines in 8 files changed: 1524 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6325.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6325/head:pull/6325

PR: https://git.openjdk.java.net/jdk/pull/6325

From simonis at openjdk.java.net  Thu Nov 11 10:28:04 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 10:28:04 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v7]
In-Reply-To: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
Message-ID: <yfSgmQFp40nSZz0p1aj6W_-QpEXzIKVBOoh9guqQXHI=.a38c0656-174b-4cca-aae6-c1f7b81e2880@github.com>

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
> 
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
> 
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
> 
> 
> ### Solution
> 
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
> 
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
> 
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
> 
> 
> ### Implementation details
> 
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
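
For readers who want to see the pattern in isolation, here is a minimal Java sketch of the `isAlpha()` idiom quoted above next to a variant that avoids the implicit exception with an explicit bounds check. The class name, the 128-entry table and its contents are assumptions made for illustration only; they are not taken from Tomcat or from the PR's micro-benchmark.

```java
final class IsAlphaSketch {
    // Hypothetical lookup table; size and contents are assumptions for illustration.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Pattern from the PR description: out-of-range input is handled
    // through an implicit ArrayIndexOutOfBoundsException.
    static boolean isAlphaWithException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Same result without relying on an implicit exception for control flow.
    static boolean isAlphaWithCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '1', -1, 1000};
        for (int c : samples) {
            System.out.println(c + ": " + isAlphaWithException(c) + " / " + isAlphaWithCheck(c));
        }
    }
}
```

Both methods return the same answers; only the first one depends on how cheaply the JIT-compiled code can raise the exception, which is the cost the benchmark tables above measure.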

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Minor enhancements and fixes requested by Martin

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5488/files
  - new: https://git.openjdk.java.net/jdk/pull/5488/files/99db7e54..625da2f9

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=05-06

  Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net  Thu Nov 11 10:28:08 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 10:28:08 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
Message-ID: <soSg_sGJoEer2uaXbS3h6K98oj5vvW5p-SyrT0Y29B8=.64653aa7-e19f-447c-9b44-cb734d33b6b5@github.com>

On Thu, 4 Nov 2021 16:28:52 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 cannot create implicit exceptions such as AIOOBE, NullPointerException, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such exceptions occur.
>> 
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>> 
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>> 
>> 
>> ### Solution
>> 
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
>> 
>> 
>> ### Implementation details
>> 
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.

Hi Martin,

thanks a lot for looking at my PR one more time. I've just pushed an updated version which should address all your points.

Still anything missing?

Best regards,
Volker

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From mdoerr at openjdk.java.net  Thu Nov 11 10:54:38 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 10:54:38 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v7]
In-Reply-To: <yfSgmQFp40nSZz0p1aj6W_-QpEXzIKVBOoh9guqQXHI=.a38c0656-174b-4cca-aae6-c1f7b81e2880@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <yfSgmQFp40nSZz0p1aj6W_-QpEXzIKVBOoh9guqQXHI=.a38c0656-174b-4cca-aae6-c1f7b81e2880@github.com>
Message-ID: <vC0qNUy_ZPWzxnwR1EqYsNZQX3pYKNL6xhnavykrdfg=.305219b4-5f39-448e-aa2c-8701e4f781fc@github.com>

On Thu, 11 Nov 2021 10:28:04 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 cannot create implicit exceptions such as AIOOBE, NullPointerException, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such exceptions occur.
>> 
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>> 
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>> 
>> 
>> ### Solution
>> 
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
>> 
>> 
>> ### Implementation details
>> 
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor enhancements and fixes requested by Martin

Thanks for the updates. LGTM.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5488

From mdoerr at openjdk.java.net  Thu Nov 11 10:54:38 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 10:54:38 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To: <DX1s7J2WjnbZndN24LTTyxnUG64-AiKPMFNssYuPMws=.18c0cead-3488-4bef-b275-b2683e06c5e3@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
 <ErVlQs6sHdvx2AK0XhJQ8XBtlZognM5BLEQ3eisZi9U=.39e6778c-b618-493c-bbbe-0d6d1131c8f4@github.com>
 <DX1s7J2WjnbZndN24LTTyxnUG64-AiKPMFNssYuPMws=.18c0cead-3488-4bef-b275-b2683e06c5e3@github.com>
Message-ID: <1EDY97O7mQZB96nsPoxILTsIaRRoiVmKWkOzq-2ANd8=.2498976c-b5bf-4391-b055-2acb46d99a79@github.com>

On Thu, 11 Nov 2021 09:40:44 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> src/hotspot/share/prims/whitebox.cpp line 987:
>> 
>>> 985:     bool overflow = false;
>>> 986:     for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
>>> 987:       if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {
>> 
>> Maybe the code would be more readable if it checked `reason_str != NULL` first and then used two loops? Just a minor suggestion. It should only be done if readability is better.
>
> I've tried it but the resulting version is slightly longer and in my opinion not really more readable:
> 
> WB_ENTRY(jint, WB_GetMethodTrapCount(JNIEnv* env, jobject o, jobject method, jstring reason_obj))
>   jmethodID jmid = reflected_method_to_jmid(thread, env, method);
>   CHECK_JNI_EXCEPTION_(env, 0);
>   methodHandle mh(THREAD, Method::checked_resolve_jmethod_id(jmid));
>   uint cnt = 0;
>   MethodData* mdo = mh->method_data();
>   if (mdo != NULL) {
>     ResourceMark rm(THREAD);
>     if (reason_obj != NULL) {
>       char* reason_str = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(reason_obj));
>       for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
>         if (!strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {
>           cnt = mdo->trap_count(reason);
>           // Count in the overflow trap count on overflow
>           if (cnt == (uint)-1) {
>             cnt = mdo->trap_count_limit() + mdo->overflow_trap_count();
>           }
>           break;
>         }
>       }
>     } else {
>       bool overflow = false;
>       for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
>         uint c = mdo->trap_count(reason);
>         if (c == (uint)-1) {
>           c = mdo->trap_count_limit();
>           if (!overflow) {
>             // Count overflow trap count just once
>             overflow = true;
>             c += mdo->overflow_trap_count();
>           }
>         }
>         cnt += c;
>       }
>     }
>   }
>   return cnt;
> WB_END
> 
> 
> But to me it makes no real difference. Please just let me know if you'd still prefer the alternative version.
> 
> PS: I've updated the documentation of the method which was inaccurate for `reason==NULL`.

Your two-loop version looks a bit easier to read to me, but that may be a matter of taste. I'll leave it to you to decide.
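
The counting rules of the two-loop version quoted above can be modeled outside HotSpot. The following is a minimal Java sketch under stated assumptions: the arrays `reasonNames` and `trapCounts` and the parameters `trapCountLimit` and `overflowTrapCount` are hypothetical stand-ins for the `MethodData` accessors, `-1` stands in for `(uint)-1`, and the sample values are made up.

```java
final class TrapCountSketch {
    // Stand-in for the (uint)-1 sentinel that marks an overflowed per-reason counter.
    static final int OVERFLOWED = -1;

    static int trapCount(String reason, String[] reasonNames, int[] trapCounts,
                         int trapCountLimit, int overflowTrapCount) {
        int cnt = 0;
        if (reason != null) {
            // First loop: look up a single reason by name.
            for (int i = 0; i < reasonNames.length; i++) {
                if (reason.equals(reasonNames[i])) {
                    cnt = trapCounts[i];
                    // Count in the overflow trap count on overflow.
                    if (cnt == OVERFLOWED) {
                        cnt = trapCountLimit + overflowTrapCount;
                    }
                    break;
                }
            }
        } else {
            // Second loop: sum over all reasons.
            boolean overflow = false;
            for (int c : trapCounts) {
                if (c == OVERFLOWED) {
                    c = trapCountLimit;
                    if (!overflow) {
                        // Count the overflow trap count just once.
                        overflow = true;
                        c += overflowTrapCount;
                    }
                }
                cnt += c;
            }
        }
        return cnt;
    }

    public static void main(String[] args) {
        String[] names = {"null_check", "range_check", "class_check"};
        int[] counts = {3, OVERFLOWED, OVERFLOWED};
        System.out.println(trapCount("range_check", names, counts, 255, 10)); // 255 + 10 = 265
        System.out.println(trapCount(null, names, counts, 255, 10));          // 3 + 265 + 255 = 523
    }
}
```

The second call shows why the `overflow` flag matters: with two overflowed counters, the shared overflow count is added once rather than twice.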

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From adinn at openjdk.java.net  Thu Nov 11 11:21:37 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 11 Nov 2021 11:21:37 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <SBwk5kV3fu-jhOxb6OMecdWueqzVp_1_lDGjv7gr4ME=.12b13d5e-d020-4d2f-951d-a4934fccba2f@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <SBwk5kV3fu-jhOxb6OMecdWueqzVp_1_lDGjv7gr4ME=.12b13d5e-d020-4d2f-951d-a4934fccba2f@github.com>
Message-ID: <Zelddr8jRi5nOvPRw__-38aJXkhOGXawEF_RtpzeQkU=.2dac76a1-117b-4626-aae7-1d959d01d718@github.com>

On Wed, 10 Nov 2021 13:22:37 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Simplify branch protection configure check
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5185:
> 
>> 5183: // ROP Protection
>> 5184: 
>> 5185: void MacroAssembler::protect_return_address() {
> 
> We need proper, full, detailed comments about what these functions do, with reference to primary AArch64 documentation.

As far as the AArch64 docs are concerned the relevant details are provided in ARM-ARM D

- The PAC functionality is described in ARM-ARM Section D5.1.5
- Overview of the PAC instructions is provided in section C3.1.9
- Detailed PAC instruction descriptions are provided in C6.2.195 - C6.2.199

n.b. I am specifically referring to my (possibly out of date) copy of ARM-DDI 0487D.a (ID103018), which is the initial v8.4 EAC release from 2018.

That said, I agree that a description of how these functions use the underlying PAC support and what, effectively, they achieve via that usage would be necessary. A reference to the relevant sections of the ARM doc in the code would be helpful.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Thu Nov 11 11:37:40 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 11 Nov 2021 11:37:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <Zelddr8jRi5nOvPRw__-38aJXkhOGXawEF_RtpzeQkU=.2dac76a1-117b-4626-aae7-1d959d01d718@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <SBwk5kV3fu-jhOxb6OMecdWueqzVp_1_lDGjv7gr4ME=.12b13d5e-d020-4d2f-951d-a4934fccba2f@github.com>
 <Zelddr8jRi5nOvPRw__-38aJXkhOGXawEF_RtpzeQkU=.2dac76a1-117b-4626-aae7-1d959d01d718@github.com>
Message-ID: <fJR-7PBs3P9J6PVQkRFvsnDsS-ETA0uOv9jGAC0jOtg=.6afd810c-8b6c-4cd2-baec-a7c22af6e107@github.com>

On Thu, 11 Nov 2021 11:19:03 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5185:
>> 
>>> 5183: // ROP Protection
>>> 5184: 
>>> 5185: void MacroAssembler::protect_return_address() {
>> 
>> We need proper, full, detailed comments about what these functions do, with reference to primary AArch64 documentation.
>
> As far as the AArch64 docs are concerned the relevant details are provided in ARM-ARM D
> 
> - The PAC functionality is described in ARM-ARM Section D5.1.5
> - Overview of the PAC instructions is provided in section C3.1.9
> - Detailed PAC instruction descriptions are provided in C6.2.195 - C6.2.199
> 
> n.b. I am specifically referring to my (possibly out of date) copy of ARM-DDI 0487D.a (ID103018), which is the initial v8.4 EAC release from 2018.
> 
> That said, I agree that a description of how these functions use the underlying PAC support and what, effectively, they achieve via that usage would be necessary. A reference to the relevant sections of the ARM doc in the code would be helpful.

Correction:
Using the most up to date ARM ARM G  [ARM DDI 0487G.a (ID011921)]

- The PAC functionality is described in ARM-ARM Section D5.1.5
- Overview of the PAC instructions is provided in section C3.1.10
- Detailed PAC instruction descriptions are provided in C6.2.208 - C6.2.212

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Thu Nov 11 11:39:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Nov 2021 11:39:41 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15]
In-Reply-To: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>
Message-ID: <bwov6xPHOBK9f_fNBxA19JH249dWh6NCHR8z71_EyIU=.d6c99da5-c162-4f44-9e9c-b8836a3daa74@github.com>

On Thu, 11 Nov 2021 09:36:36 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>> 
>> It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be:
>> 
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `nop`: use `nop` instruction for spin pauses.
>> - `isb`: use `isb` instruction for spin pauses.
>> - `yield`: use `yield` instruction for spin pauses.
>> 
>> And `OnSpinWaitInstCount=count`, where `count` specifies the number of `OnSpinWaitInst` instructions to emit and must be in the range `1..99`. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`.
>> 
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`.
>> 
>> Testing:
>> 
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>> 
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8274564
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8275728: Add simple Producer/Consumer microbenchmark for Thread.onSpinWait

Marked as reviewed by aph (Reviewer).

I'm getting this for `-XX:OnSpinWaitInst=yield` on Apple M1:


Benchmark                                (maxNum)  (spinNum)    Score   Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125   355.686 ± 1.263  us/op


This for  `-XX:OnSpinWaitInst=none`:


ThreadOnSpinWaitProducerConsumer.trial       100        125   359.635 ± 0.912  us/op


This for `-XX:OnSpinWaitInst=isb`:


ThreadOnSpinWaitProducerConsumer.trial       100        125   169.353 ± 3.932  us/op


Which looks pretty convincing, at least for this benchmark. 

I'm a bit concerned that it took so much effort to find a convincing benchmark, but I note that OnSpinWaitInst=isb doesn't seem to make anything worse, so OK.
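
To make the kind of code affected by the intrinsic concrete, here is a minimal Java sketch of a producer/consumer hand-off in which both sides spin with `Thread.onSpinWait()`. It is not the `ThreadOnSpinWaitProducerConsumer` benchmark from the PR; the class name, the single-slot hand-off and the `maxNum` parameter are assumptions for illustration, and it measures nothing.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SpinWaitSketch {
    // Hands the values 0..maxNum-1 from a producer thread to the calling
    // (consumer) thread through a single slot and returns their sum.
    // Both sides call Thread.onSpinWait() while they wait for the slot.
    static long transfer(int maxNum) {
        final int EMPTY = -1;
        AtomicInteger slot = new AtomicInteger(EMPTY);

        Thread producer = new Thread(() -> {
            for (int i = 0; i < maxNum; i++) {
                // Spin until the consumer has emptied the slot.
                while (!slot.compareAndSet(EMPTY, i)) {
                    Thread.onSpinWait();
                }
            }
        });
        producer.start();

        long sum = 0;
        for (int received = 0; received < maxNum; received++) {
            int v;
            // Spin until the producer has filled the slot.
            while ((v = slot.getAndSet(EMPTY)) == EMPTY) {
                Thread.onSpinWait();
            }
            sum += v;
        }

        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(transfer(100)); // 0 + 1 + ... + 99 = 4950
    }
}
```

Since the PR describes `OnSpinWaitInst` as a diagnostic option, trying the different settings on AArch64 would presumably also require `-XX:+UnlockDiagnosticVMOptions`.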

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From duke at openjdk.java.net  Thu Nov 11 11:47:36 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Thu, 11 Nov 2021 11:47:36 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <fJR-7PBs3P9J6PVQkRFvsnDsS-ETA0uOv9jGAC0jOtg=.6afd810c-8b6c-4cd2-baec-a7c22af6e107@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <SBwk5kV3fu-jhOxb6OMecdWueqzVp_1_lDGjv7gr4ME=.12b13d5e-d020-4d2f-951d-a4934fccba2f@github.com>
 <Zelddr8jRi5nOvPRw__-38aJXkhOGXawEF_RtpzeQkU=.2dac76a1-117b-4626-aae7-1d959d01d718@github.com>
 <fJR-7PBs3P9J6PVQkRFvsnDsS-ETA0uOv9jGAC0jOtg=.6afd810c-8b6c-4cd2-baec-a7c22af6e107@github.com>
Message-ID: <R1Pe5357t6oHxMO6NSzM9tecQc3uGqVNwuicBl6GOFM=.c793ae6c-e3b9-4345-a140-cb1603065617@github.com>

On Thu, 11 Nov 2021 11:34:09 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> As far as the AArch64 docs are concerned the relevant details are provided in ARM-ARM D
>> 
>> - The PAC functionality is described in ARM-ARM Section D5.1.5
>> - Overview of the PAC instructions is provided in section C3.1.9
>> - Detailed PAC instruction descriptions are provided in C6.2.195 - C6.2.199
>> 
>> n.b. I am specifically referring to my (possibly out of date) copy of ARM-DDI 0487D.a (ID103018), which is the initial v8.4 EAC release from 2018.
>> 
>> That said, I agree that a description of how these functions use the underlying PAC support and what, effectively, they achieve via that usage would be necessary. A reference to the relevant sections of the ARM doc in the code would be helpful.
>
> Correction:
> Using the most up to date ARM ARM G  [ARM DDI 0487G.a (ID011921)]
> 
> - The PAC functionality is described in ARM-ARM Section D5.1.5
> - Overview of the PAC instructions is provided in section C3.1.10
> - Detailed PAC instruction descriptions are provided in C6.2.208 - C6.2.212

I'm thinking that references to the Arm ARM should use section titles instead of section numbers, as the titles should be more stable.

The code in pauth_aarch64.hpp probably needs some description too. But I want to make sure I'm not duplicating comments - maybe the macroassembler comments should point to the pauth_aarch64 comments.

It didn't seem common in the code to describe instruction functionality, which is why I didn't add any. Agreed it needs something added though.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Thu Nov 11 11:55:34 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Nov 2021 11:55:34 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <R1Pe5357t6oHxMO6NSzM9tecQc3uGqVNwuicBl6GOFM=.c793ae6c-e3b9-4345-a140-cb1603065617@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <SBwk5kV3fu-jhOxb6OMecdWueqzVp_1_lDGjv7gr4ME=.12b13d5e-d020-4d2f-951d-a4934fccba2f@github.com>
 <Zelddr8jRi5nOvPRw__-38aJXkhOGXawEF_RtpzeQkU=.2dac76a1-117b-4626-aae7-1d959d01d718@github.com>
 <fJR-7PBs3P9J6PVQkRFvsnDsS-ETA0uOv9jGAC0jOtg=.6afd810c-8b6c-4cd2-baec-a7c22af6e107@github.com>
 <R1Pe5357t6oHxMO6NSzM9tecQc3uGqVNwuicBl6GOFM=.c793ae6c-e3b9-4345-a140-cb1603065617@github.com>
Message-ID: <_9P-UvEbKu8NkBMq_pPr-_-muZxxHfwa2vV7h9nq6ZQ=.11e2de58-4e9e-4364-bfc9-c17791a19933@github.com>

On Thu, 11 Nov 2021 11:44:09 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> Correction:
>> Using the most up to date ARM ARM G  [ARM DDI 0487G.a (ID011921)]
>> 
>> - The PAC functionality is described in ARM-ARM Section D5.1.5
>> - Overview of the PAC instructions is provided in section C3.1.10
>> - Detailed PAC instruction descriptions are provided in C6.2.208 - C6.2.212
>
> I'm thinking that references to the Arm ARM should use section titles instead of section numbers, as the titles should be more stable.
> 
> The code in pauth_aarch64.hpp probably needs some description too. But I want to make sure I'm not duplicating comments - maybe the macroassembler comments should point to the pauth_aarch64 comments.
> 
> It didn't seem common in the code to describe instruction functionality, which is why I didn't add any. Agreed it needs something added though.

Yeah. At the definitions of `authenticate_return_address()` et al you can say what you expect in the normal case and what you expect when you've been hacked, along with an overview. I realize that it was a bit tricky to make this work with HotSpot because we're synthesizing return addresses just like hackers do, so a comment where we're patching return addresses would be nice.

As long as the instructions are easily findable in the docs that's good.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Thu Nov 11 11:59:35 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Nov 2021 11:59:35 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <_9P-UvEbKu8NkBMq_pPr-_-muZxxHfwa2vV7h9nq6ZQ=.11e2de58-4e9e-4364-bfc9-c17791a19933@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <SBwk5kV3fu-jhOxb6OMecdWueqzVp_1_lDGjv7gr4ME=.12b13d5e-d020-4d2f-951d-a4934fccba2f@github.com>
 <Zelddr8jRi5nOvPRw__-38aJXkhOGXawEF_RtpzeQkU=.2dac76a1-117b-4626-aae7-1d959d01d718@github.com>
 <fJR-7PBs3P9J6PVQkRFvsnDsS-ETA0uOv9jGAC0jOtg=.6afd810c-8b6c-4cd2-baec-a7c22af6e107@github.com>
 <R1Pe5357t6oHxMO6NSzM9tecQc3uGqVNwuicBl6GOFM=.c793ae6c-e3b9-4345-a140-cb1603065617@github.com>
 <_9P-UvEbKu8NkBMq_pPr-_-muZxxHfwa2vV7h9nq6ZQ=.11e2de58-4e9e-4364-bfc9-c17791a19933@github.com>
Message-ID: <OA-WXOz6GVoGgaxVNuxpEX49nBnx7Do4ow2Uv3Xk5w4=.8e28a479-19e0-4ada-ae4a-3e8b8350235f@github.com>

On Thu, 11 Nov 2021 11:52:46 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> I'm thinking that references to the Arm ARM should use section titles instead of section numbers, as the titles should be more stable.
>> 
>> The code in pauth_aarch64.hpp probably needs some description too. But I want to make sure I'm not duplicating comments - maybe the macroassembler comments should point to the pauth_aarch64 comments.
>> 
>> It didn't seem common in the code to describe instruction functionality, which is why I didn't add any. Agreed it needs something added though.
>
> Yeah. At the definitions of `authenticate_return_address()` et al you can say what you expect in the normal case and what you expect when you've been hacked, along with an overview. I realize that it was a bit tricky to make this work with HotSpot because we're synthesizing return addresses just like hackers do, so a comment where we're patching return addresses would be nice.
> 
> As long as the instructions are easily findable in the docs that's good.

Just to be clear: no, don't describe instructions. Describe what the macros do, and when to use them. Imagine that you, the reader, can't see the contents of the macro at all, just the name and the comments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Thu Nov 11 12:02:41 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 11 Nov 2021 12:02:41 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15]
In-Reply-To: <bwov6xPHOBK9f_fNBxA19JH249dWh6NCHR8z71_EyIU=.d6c99da5-c162-4f44-9e9c-b8836a3daa74@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
 <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>
 <bwov6xPHOBK9f_fNBxA19JH249dWh6NCHR8z71_EyIU=.d6c99da5-c162-4f44-9e9c-b8836a3daa74@github.com>
Message-ID: <YMhRws8oYhEoYLGCQ95Z63FCD__mmZ4NtnDcHPgHLJs=.b660c994-a445-4642-9900-c3db48029812@github.com>

On Thu, 11 Nov 2021 11:35:17 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I'm a bit concerned that it took so much effort to find a convincing benchmark, but I note that OnSpinWaitInst=isb doesn't seem to make anything worse, so OK.

Thank you, Andrew.
It took time to study the current use cases of `Thread.onSpinWait` and to understand why their performance did or did not improve. As usual, when you have written something simple you need to check that it is correct. All of this took most of the time.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From ihse at openjdk.java.net  Thu Nov 11 12:05:43 2021
From: ihse at openjdk.java.net (Magnus Ihse Bursie)
Date: Thu, 11 Nov 2021 12:05:43 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
Message-ID: <N_iYPZWs9M7vK-yj6bFwo9BeLyqauuBOYs5sjYPb4Xw=.40e54aad-1bf6-46ef-ad7a-ae28bdff0113@github.com>

On Thu, 11 Nov 2021 08:48:07 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify branch protection configure check

Build changes look much better now, thanks!

Build part approved; the actual code changes need approval from others.

-------------

Marked as reviewed by ihse (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6334

From mdoerr at openjdk.java.net  Thu Nov 11 12:06:34 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 12:06:34 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
Message-ID: <WYKuvlsythW_WS-V8mEuJibbLOHip2Gm66WFXmVrULg=.156bcb2e-3009-4fa3-9651-797f4acb879a@github.com>

On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

Thanks for taking care of all platforms. `move_ptr(MacroAssembler*, VMRegPair, VMRegPair, int)` needs to get removed to avoid build warnings on PPC64 and s390.

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6343

From rkennke at openjdk.java.net  Thu Nov 11 12:10:41 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Thu, 11 Nov 2021 12:10:41 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
In-Reply-To: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
Message-ID: <CYnzc3NijRkUPtdp1X1qypAbyejCK3jNLo2xMKLX9kI=.a7642bbe-d06f-4228-a5de-12abc229ea0e@github.com>

On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski <nradomski at openjdk.org> wrote:

> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

Hi Niklas,
thanks for this awesome work!
I can't really comment on the actual PPC code, so this needs to be reviewed by somebody else. Structurally the change looks correct. I have one comment about the C1 CAS barrier code, but it's minor.

Thanks & cheers,
Roman

src/hotspot/cpu/ppc/gc/shenandoah/c1/shenandoahBarrierSetC1_ppc.cpp line 83:

> 81:     LIRGenerator* gen = access.gen();
> 82: 
> 83:     if (ShenandoahCASBarrier) {

I am not sure, but I almost think we should not even end up in the method with -ShenandoahCASBarrier. If anything, -ShenandoahCASBarrier should result in only calling super to emit regular CAS without any barriers.

-------------

Marked as reviewed by rkennke (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6325

From ihse at openjdk.java.net  Thu Nov 11 12:21:32 2021
From: ihse at openjdk.java.net (Magnus Ihse Bursie)
Date: Thu, 11 Nov 2021 12:21:32 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
In-Reply-To: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
Message-ID: <p5_Yq4EdKsNbeyRzUB6qLuaW0vi5K5lyRz08dd2Facg=.c38b5711-f84b-4692-863c-80628c425002@github.com>

On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski <nradomski at openjdk.org> wrote:

> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

Build changes look good. The actual code changes need to be reviewed by someone more knowledgeable about this area.

-------------

Marked as reviewed by ihse (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6325

From mcimadamore at openjdk.java.net  Thu Nov 11 13:03:56 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Thu, 11 Nov 2021 13:03:56 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v23]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <wy9gnwC78B2cCE1F0kxHLn85x4uXOuSq_f3_X9BBaQE=.2780f332-6902-4373-8cf6-71149eefca32@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits:

 - Merge branch 'master' into JEP-419
 - Revert removal of upcall MH customization
   (This change caused spurious VM crashes, so reverting to baseline)
 - Further tweak upcall safety considerations
 - Clarify safety considerations for upcalls
 - Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress
   (which is consistent with other restricted factories in VaList and NativeSymbol)
 - Streamline javadoc for package-info
 - * Add two new CLinker static methods to compute upcall/downcall method types
   * Clarify section on CLinker downcall type
   * Add section on CLinker safety guarantees
 - Fix TestUpcall
   * reverse() has a bug, as it doesn't tweak parameter types
   * reverse() is applied to the wrong MH
 - Make ArenaAllocator impl more flexible in the face of OOME
   An ArenaAllocator should remain open for business, even if OOME is thrown in case other allocations can fit the arena size.
 - Simplify ArenaAllocator impl.
   The arena should respect its boundaries and never allocate more memory than its size specifies.
 - ... and 22 more: https://git.openjdk.java.net/jdk/compare/aea09677...8c3860f8

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5907/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=22
  Stats: 14686 lines in 193 files changed: 6956 ins; 5120 del; 2610 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From coleenp at openjdk.java.net  Thu Nov 11 13:35:34 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 13:35:34 GMT
Subject: RFR: 8276658: Clean up JNI local handles code
In-Reply-To: <GP4f8enTs4vqQU27VCXHUjcsNVr58UkShEQ5_y01oPI=.084d7f86-531c-48eb-b740-e58842451118@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
 <GP4f8enTs4vqQU27VCXHUjcsNVr58UkShEQ5_y01oPI=.084d7f86-531c-48eb-b740-e58842451118@github.com>
Message-ID: <xX8gztc0A-YBDJEmK-lkGInuZNVKHUL9fzkLQ6jW3KU=.6a4a3f65-a954-40a7-ac8e-750b1882e887@github.com>

On Thu, 11 Nov 2021 06:35:59 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
>> Move the fields to JavaThread and add a JavaThread* argument.
>> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
>> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
>> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
>> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.
>
> src/hotspot/share/runtime/vmThread.hpp line 63:
> 
>> 61: class VMThread: public NamedThread {
>> 62:  private:
>> 63:   volatile bool _is_running;
> 
> I don't see this being initialized to false.

Good catch!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6336

From coleenp at openjdk.java.net  Thu Nov 11 13:39:35 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 13:39:35 GMT
Subject: RFR: 8276658: Clean up JNI local handles code
In-Reply-To: <GP4f8enTs4vqQU27VCXHUjcsNVr58UkShEQ5_y01oPI=.084d7f86-531c-48eb-b740-e58842451118@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
 <GP4f8enTs4vqQU27VCXHUjcsNVr58UkShEQ5_y01oPI=.084d7f86-531c-48eb-b740-e58842451118@github.com>
Message-ID: <B0tVtRqJvRZdJxuHPY4SY6ZyHXyaCjIWbKrnrcnUI7Q=.5952897b-73f9-4507-ab31-c004ea54483a@github.com>

On Thu, 11 Nov 2021 06:52:45 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
>> Move the fields to JavaThread and add a JavaThread* argument.
>> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
>> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
>> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
>> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.
>
> src/hotspot/share/compiler/compileBroker.cpp line 2324:
> 
>> 2322:   // Remove the JNI handle block after the ciEnv destructor has run in
>> 2323:   // the previous block.
>> 2324:   pop_jni_handle_block();
> 
> Does the fact the JNIHandleMark destructor won't get executed until much later, at the end of this method, make any difference?

I don't think so because most of the rest of the function is logging and it doesn't seem to do anything with JNIHandles afterwards, so there are no handles created that shouldn't be removed in that code range.
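
To make the scoping question concrete, here is a small C++ sketch of how the lifetime of an RAII mark determines when the handle block is popped. `ThreadModel` and `HandleMarkModel` are hypothetical stand-ins written for illustration, not the real `JavaThread` or `JNIHandleMark` classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a thread that tracks its pushed handle blocks.
struct ThreadModel {
  int active_blocks = 0;
  std::vector<std::string> log;
};

// RAII mark: pushes a block on construction and pops it on destruction.
class HandleMarkModel {
  ThreadModel& _thread;
 public:
  explicit HandleMarkModel(ThreadModel& t) : _thread(t) {
    ++_thread.active_blocks;
    _thread.log.push_back("push");
  }
  ~HandleMarkModel() {
    assert(_thread.active_blocks > 0);
    --_thread.active_blocks;
    _thread.log.push_back("pop");
  }
  HandleMarkModel(const HandleMarkModel&) = delete;
  HandleMarkModel& operator=(const HandleMarkModel&) = delete;
};

// The mark lives until the function returns, so the pop follows the logging.
inline void compile_with_function_scope_mark(ThreadModel& t) {
  HandleMarkModel mark(t);
  t.log.push_back("compile");
  t.log.push_back("logging");
}

// The mark lives in an inner block, so the pop precedes the logging.
inline void compile_with_inner_scope_mark(ThreadModel& t) {
  {
    HandleMarkModel mark(t);
    t.log.push_back("compile");
  }
  t.log.push_back("logging");
}
```

In this model the two variants differ only in the order of "pop" and "logging". That matches the argument above: if the trailing code creates no handles, the later pop should be harmless.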

-------------

PR: https://git.openjdk.java.net/jdk/pull/6336

From chagedorn at openjdk.java.net  Thu Nov 11 13:52:37 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 11 Nov 2021 13:52:37 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <k-dnWvGTeWvZ6-oQ1K49JQgX2PEpiFEfzo3qHuNRdOA=.1e28650b-9aaf-478f-8d1b-9d4f12e428cd@github.com>

On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long <dlong at openjdk.org> wrote:

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

src/hotspot/share/ci/ciReplay.cpp line 118:

> 116:   bool    _protection_domain_initialized;
> 117:   Handle  _loader;
> 118:   int     _version;

You forgot to initialize `_version` to 0. Otherwise, it could contain garbage for old replay files without version number (possibly `> REPLAY_VERSION`).
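
A minimal sketch of the concern, under stated assumptions: `ReplayStateModel`, `parse_replay`, the `version N` line format and the value of `REPLAY_VERSION` used here are all hypothetical and only illustrate why a default of 0 matters for files that carry no version line.

```cpp
#include <cassert>
#include <sstream>
#include <string>

constexpr int REPLAY_VERSION = 2;  // hypothetical value, for illustration only

struct ReplayStateModel {
  int version = 0;  // default member initializer: old files without a version read as 0
};

// Scans the text for an optional "version N" line and skips every other line.
// Returns true if the file's version is not newer than REPLAY_VERSION.
inline bool parse_replay(const std::string& text, ReplayStateModel& state) {
  assert(state.version == 0 && "expects a freshly constructed state");
  std::istringstream in(text);
  std::string keyword;
  while (in >> keyword) {
    if (keyword == "version") {
      if (!(in >> state.version)) return false;  // malformed version line
    } else {
      std::string rest_of_line;
      std::getline(in, rest_of_line);            // ignore unrelated records
    }
  }
  return state.version <= REPLAY_VERSION;
}
```

Without the initializer, a file that has no version line would leave the field indeterminate, and the final comparison could reject a valid old file.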

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From coleenp at openjdk.java.net  Thu Nov 11 13:58:06 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 13:58:06 GMT
Subject: RFR: 8276658: Clean up JNI local handles code [v2]
In-Reply-To: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
Message-ID: <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>

> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
> Move the fields to JavaThread and add a JavaThread* argument.
> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Add _is_running initialization.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6336/files
  - new: https://git.openjdk.java.net/jdk/pull/6336/files/239e9246..f31dfeee

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6336.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6336/head:pull/6336

PR: https://git.openjdk.java.net/jdk/pull/6336

From coleenp at openjdk.java.net  Thu Nov 11 13:58:07 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 13:58:07 GMT
Subject: RFR: 8276658: Clean up JNI local handles code
In-Reply-To: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
Message-ID: <J58hsCvjnhhUFHUQb6hADCDE-v7F1snIvrdnvVG2x6Q=.3f8802fd-40c7-4aca-99c8-71dc8d1f7bfe@github.com>

On Wed, 10 Nov 2021 17:16:29 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
> Move the fields to JavaThread and add a JavaThread* argument.
> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.

Thank you for the code review, David.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6336

From coleenp at openjdk.java.net  Thu Nov 11 14:12:41 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 14:12:41 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
Message-ID: <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com>

On Thu, 11 Nov 2021 07:16:27 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
>> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.
>
> src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 1746:
> 
>> 1744:   // NW     [ABI_REG_ARGS]             <-- 1) R1_SP
>> 1745:   //        [outgoing arguments]       <-- 2) R1_SP + out_arg_slot_offset
>> 1746:   //        [oopHandle area]           <-- 3) R1_SP + oop_handle_offset (save area for critical natives) ?
> 
> `?`. The comment `(save area for critical natives)` must be redundant now.

I didn't know whether the save area is still needed for something else, which is why I left the `?`. I can remove the comment, but I haven't made any substantial changes here. I'm not sure whether such changes are needed, and I couldn't test them if I made them.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From coleenp at openjdk.java.net  Thu Nov 11 14:22:41 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 14:22:41 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
Message-ID: <ffFytP84whd3zwjzlINgtpCJISklPafpBTDHb04y25M=.94d9f345-ad59-4074-9ed0-999858452516@github.com>

On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

Thanks for reviewing, Aleksey.  I made the changes and will retest.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From shade at openjdk.java.net  Thu Nov 11 14:22:42 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 11 Nov 2021 14:22:42 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
 <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com>
Message-ID: <i0WnOdaRq-viKl6mvyF_1fbiJkt7VXixS8aipRvu-KY=.60d20d60-383c-4f27-bdec-a40105e22b52@github.com>

On Thu, 11 Nov 2021 14:09:12 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 1746:
>> 
>>> 1744:   // NW     [ABI_REG_ARGS]             <-- 1) R1_SP
>>> 1745:   //        [outgoing arguments]       <-- 2) R1_SP + out_arg_slot_offset
>>> 1746:   //        [oopHandle area]           <-- 3) R1_SP + oop_handle_offset (save area for critical natives) ?
>> 
>> `?`. The comment `(save area for critical natives)` must be redundant now.
>
> I didn't know whether the save area is still needed for something else, which is why I left the `?`. I can remove the comment, but I haven't made any substantial changes here. I'm not sure whether such changes are needed, and I couldn't test them if I made them.

I mean, you did the same here: https://github.com/openjdk/jdk/pull/6343/files#diff-060e534de775616a893aa969f3639e53666cda9e93bed7c3a3c14b9cdc4cdba0L1553-L1554 -- and that change is understandable.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From coleenp at openjdk.java.net  Thu Nov 11 14:22:45 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 14:22:45 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
Message-ID: <CZOsJHuS4ikizyScerSAM9OancK4IcgYi5pmMa50SBE=.c2d36800-d4b7-413f-b943-310d353c7bb9@github.com>

On Thu, 11 Nov 2021 07:19:57 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
>> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.
>
> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1551:
> 
>> 1549:   int total_c_args = total_in_args+1;
>> 1550:   if (method->is_static()) {
>> 1551:     total_c_args++;
> 
> In this patch, sometimes we keep the if structure, like here, but in other places, we replace this with:
> 
>   int total_c_args = total_in_args + (method->is_static() ? 2 : 1)
> 
> Should probably stick with a single style.

Ok, that's a good suggestion.  Some platforms have a method_is_static boolean and some don't, so I didn't clean up the platforms that do that later in a different way (or inconsistently).
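
For reference, the two styles under discussion compute the same value. The free functions below are hypothetical wrappers written only to show the equivalence, and the comments about what each extra slot holds are assumptions based on the usual JNI calling convention.

```cpp
#include <cassert>

// Style 1: keep the if structure.
inline int total_c_args_if(int total_in_args, bool is_static) {
  assert(total_in_args >= 0);
  int total_c_args = total_in_args + 1;  // one extra slot (presumably JNIEnv*)
  if (is_static) {
    total_c_args++;                      // one more slot (presumably the class mirror)
  }
  return total_c_args;
}

// Style 2: fold the condition into a single expression.
inline int total_c_args_ternary(int total_in_args, bool is_static) {
  assert(total_in_args >= 0);
  return total_in_args + (is_static ? 2 : 1);
}
```

Either form is fine; the review point is only that a patch should pick one and use it on every platform.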

> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1793:
> 
>> 1791:     int c_arg = arg_order.at(ai + 1);
>> 1792:     __ block_comment(err_msg("move %d -> %d", i, c_arg));
>> 1793:     assert (c_arg != -1, "wrong direction");
> 
> `assert (c_arg != -1 && i != -1, "wrong direction");`?

removed.

> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1905:
> 
>> 1903:   } else {
>> 1904:     // Compute a valid move order, using tmp_vmreg to break any cycles
>> 1905:     ComputeMoveOrder cmo(total_in_args, in_regs, total_c_args, out_regs, in_sig_bt, arg_order, tmp_vmreg);
> 
> `ComputeMoveOrder` is still used somewhere, or?

Yes, it's used in
cpu/x86/universalUpcallHandler_x86_64.cpp:  SharedRuntime::compute_move_order(in_sig_bt,
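
As background on what "compute a valid move order, using a temporary to break cycles" means, here is a generic C++ sketch of the technique. It is not the HotSpot `ComputeMoveOrder` implementation: registers are modeled as plain integers, `order_moves` and `apply_moves` are hypothetical names, and each destination is assumed to be written by at most one move.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Move = std::pair<int, int>;  // {source register, destination register}

// Orders a set of conceptually parallel moves so that running them one by one
// gives the same result. `tmp` must not appear in any move.
inline std::vector<Move> order_moves(std::vector<Move> pending, int tmp) {
  // Moves from a register onto itself are no-ops.
  pending.erase(std::remove_if(pending.begin(), pending.end(),
                               [](const Move& m) { return m.first == m.second; }),
                pending.end());
  for (const Move& m : pending) {
    assert(m.first != tmp && m.second != tmp);
  }
  std::vector<Move> ordered;
  while (!pending.empty()) {
    bool progressed = false;
    for (std::size_t i = 0; i < pending.size(); ++i) {
      const int dst = pending[i].second;
      // A move is safe once no pending move still needs to read its destination.
      const bool dst_still_read =
          std::any_of(pending.begin(), pending.end(),
                      [dst](const Move& m) { return m.first == dst; });
      if (!dst_still_read) {
        ordered.push_back(pending[i]);
        pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
        progressed = true;
        break;
      }
    }
    if (!progressed) {
      // Only cycles remain: save one source in tmp and redirect its readers.
      const int src = pending.front().first;
      ordered.push_back({src, tmp});
      for (Move& m : pending) {
        if (m.first == src) m.first = tmp;
      }
    }
  }
  return ordered;
}

// Executes moves sequentially on a register file, for checking the ordering.
inline std::vector<int> apply_moves(std::vector<int> regs, const std::vector<Move>& moves) {
  for (const Move& m : moves) {
    regs[static_cast<std::size_t>(m.second)] = regs[static_cast<std::size_t>(m.first)];
  }
  return regs;
}
```

For a three-register rotation, the sketch emits one save into the temporary followed by the three moves in an order that never overwrites a value before it has been read.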

> src/hotspot/share/runtime/sharedRuntime.cpp line 3019:
> 
>> 3017:   if (CriticalJNINatives && !method->is_method_handle_intrinsic()) {
>> 3018:     // We perform the I/O with transition to native before acquiring AdapterHandlerLibrary_lock.
>> 3019:     critical_entry = NativeLookup::lookup_critical_entry(method);
> 
> `critical_entry` variable is now redundant?

removed, thanks for spotting that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From fweimer at openjdk.java.net  Thu Nov 11 14:23:43 2021
From: fweimer at openjdk.java.net (Florian Weimer)
Date: Thu, 11 Nov 2021 14:23:43 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
Message-ID: <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>

On Thu, 11 Nov 2021 08:48:07 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify branch protection configure check

Is the code still mapped read-write all the time?

src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115:

> 113:           range(-1, 4096)                                               \
> 114:   product(bool, UseROPProtection, false,                                \
> 115:           "Use ROP based branch protection")                            \

The description is not correct. It's protection against certain ROP-based attack techniques.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From coleenp at openjdk.java.net  Thu Nov 11 14:25:45 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 14:25:45 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <WYKuvlsythW_WS-V8mEuJibbLOHip2Gm66WFXmVrULg=.156bcb2e-3009-4fa3-9651-797f4acb879a@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <WYKuvlsythW_WS-V8mEuJibbLOHip2Gm66WFXmVrULg=.156bcb2e-3009-4fa3-9651-797f4acb879a@github.com>
Message-ID: <q-Lo6y73ApT9p0iWYbOlDnXYrMayKX_03b9ESkM8XcI=.68f28eaf-bc22-44f0-b331-e85086b3433d@github.com>

On Thu, 11 Nov 2021 12:03:55 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
>> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.
>
> Thanks for taking care of all platforms. `move_ptr(MacroAssembler*, VMRegPair, VMRegPair, int)` needs to get removed to avoid build warnings on PPC64 and s390.

Thanks for finding move_ptr @TheRealMDoerr.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From mdoerr at openjdk.java.net  Thu Nov 11 14:34:39 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 14:34:39 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
In-Reply-To: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
Message-ID: <bN4qbfy9l9Ok6oGyT6nFbChiJ1KWcmUdGXnvRzKrKfQ=.89568c7c-64d8-4d5f-b191-64b7f4eea03e@github.com>

On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski <nradomski at openjdk.org> wrote:

> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

Nice work! Looks correct.
For others: Note that this change already contains feedback from my offline review.

src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 74:

> 72:   // IU barriers are also employed to avoid resurrection of weak references,
> 73:   // even if Shenandoah does not operate in incremental update mode.
> 74:   if (ShenandoahIUBarrier || ShenandoahSATBBarrier) {

Sharing the code for IU and SATB sounds like a good idea, but one needs to be careful. `ShenandoahBarrierSetC1::iu_barrier` only works with `ShenandoahIUBarrier`, so this trick can't be used in C1.
It's a bit confusing, but I'm ok with this version. At least, I don't have any better suggestion at the moment.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6325

From mdoerr at openjdk.java.net  Thu Nov 11 14:34:40 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 14:34:40 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
In-Reply-To: <CYnzc3NijRkUPtdp1X1qypAbyejCK3jNLo2xMKLX9kI=.a7642bbe-d06f-4228-a5de-12abc229ea0e@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
 <CYnzc3NijRkUPtdp1X1qypAbyejCK3jNLo2xMKLX9kI=.a7642bbe-d06f-4228-a5de-12abc229ea0e@github.com>
Message-ID: <DTQUurFfPYzfWEKpAn3HoYBP-Prc3fMuqatSgR90jXo=.1de8de99-f815-4f55-ab78-e02979669bbe@github.com>

On Thu, 11 Nov 2021 11:32:49 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.
>
> src/hotspot/cpu/ppc/gc/shenandoah/c1/shenandoahBarrierSetC1_ppc.cpp line 83:
> 
>> 81:     LIRGenerator* gen = access.gen();
>> 82: 
>> 83:     if (ShenandoahCASBarrier) {
> 
> I am not sure, but I almost think we should not even end up in the method with -ShenandoahCASBarrier. If anything, -ShenandoahCASBarrier should result in only calling super to emit regular CAS without any barriers.

We hit this case when running `jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive -version`. x86 and aarch64 check for ShenandoahCASBarrier, too. So, looks like these checks are needed and correct.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6325

From ihse at openjdk.java.net  Thu Nov 11 14:38:49 2021
From: ihse at openjdk.java.net (Magnus Ihse Bursie)
Date: Thu, 11 Nov 2021 14:38:49 GMT
Subject: RFR: 8277012: Use blessed modifier order in src/utils
Message-ID: <Di7gReO-iIxY8RVWB9O_pc1uE3Adw0eVjbJjNKSEY9U=.136760ca-bc3f-4a16-ab6b-90da95e42ad2@github.com>

I ran bin/blessed-modifier-order.sh on source code in src/utils. This script verifies that modifiers are in the "blessed" order, and fixes the order otherwise. I have manually checked the changes made by the script to make sure they are sound.

There is no clear ownership of this code, but I believe it's kind of hotspot-related.

-------------

Commit messages:
 - 8277012: Use blessed modifier order in src/utils

Changes: https://git.openjdk.java.net/jdk/pull/6354/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6354&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277012
  Stats: 25 lines in 10 files changed: 0 ins; 0 del; 25 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6354.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6354/head:pull/6354

PR: https://git.openjdk.java.net/jdk/pull/6354

From adinn at openjdk.java.net  Thu Nov 11 14:46:43 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 11 Nov 2021 14:46:43 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
Message-ID: <gDV8BbQGVCEZObTvbFcx69ZT6POJ7m3P6NlI5z7sl4U=.2ec54add-d532-4812-a7e5-a30949ffae7e@github.com>

On Thu, 11 Nov 2021 14:20:20 GMT, Florian Weimer <fweimer at openjdk.org> wrote:

>> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Simplify branch protection configure check
>
> src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115:
> 
>> 113:           range(-1, 4096)                                               \
>> 114:   product(bool, UseROPProtection, false,                                \
>> 115:           "Use ROP based branch protection")                            \
> 
> The description is not correct. It's protection against certain ROP-based attack techniques.

I don't agree that this is incorrect, at least not for the stated reason. The flag switches on a protection mechanism that guards against ROP attacks. To my reading that does not imply it guards against all such attacks, merely that this is the nature of the protection it offers.

The description might still be considered incorrect for an unrelated reason. Its use of the adjectival phrase ROP based constitutes a transferred epithet, conflating the symptom with the medicine. In other words, the protection offered is not ROP based i.e. does not rely on an ROP technique. What it does is protect against ROP attacks. So, I'd suggest rewording to

    "Enable protection of branches against ROP attacks".

Florian, if you want to argue for rewording that to "Enable protection of branches against some categories of ROP attacks" or some other equivalently qualified variant please feel free to make a case. However, I don't see any need to add that rider, nor any precedent in any of the other short descriptions provided in globals.hpp.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Thu Nov 11 14:56:46 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 11 Nov 2021 14:56:46 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
Message-ID: <sRvw2wgcxZAyOO8U59-fponA1ztGPWmVStKnHoNc6ng=.e0640791-f153-48b5-a1bd-dbb469a1059d@github.com>

On Thu, 11 Nov 2021 14:20:33 GMT, Florian Weimer <fweimer at openjdk.org> wrote:

> Is the code still mapped read-write all the time?

That depends on what code you mean. The JVM code compiled from C++ sources is mapped RO(X) in the text section like any compiled C/C++ code. Protection of that code is covered by the changes to the build system.

The runtime-generated runtime stubs and Java method code into which this patch may insert the required PAC instructions are written into a code cache in a section which is mapped RW(X) all the time. It would be hard to map even a subset of this code cache RO because generated code includes call and data sites that need to be patched during execution.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From fweimer at openjdk.java.net  Thu Nov 11 14:56:46 2021
From: fweimer at openjdk.java.net (Florian Weimer)
Date: Thu, 11 Nov 2021 14:56:46 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <gDV8BbQGVCEZObTvbFcx69ZT6POJ7m3P6NlI5z7sl4U=.2ec54add-d532-4812-a7e5-a30949ffae7e@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
 <gDV8BbQGVCEZObTvbFcx69ZT6POJ7m3P6NlI5z7sl4U=.2ec54add-d532-4812-a7e5-a30949ffae7e@github.com>
Message-ID: <IxZX1B-2b7nzfai37c8LjfkbxZ_Lda0E3mK03-hvU9w=.e51394be-2088-47f1-a36d-6839cfa6748d@github.com>

On Thu, 11 Nov 2021 14:43:59 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115:
>> 
>>> 113:           range(-1, 4096)                                               \
>>> 114:   product(bool, UseROPProtection, false,                                \
>>> 115:           "Use ROP based branch protection")                            \
>> 
>> The description is not correct. It's protection against certain ROP-based attack techniques.
>
> I don't agree that this is incorrect, at least not for the stated reason. The flag switches on a protection mechanism that guards against ROP attacks. To my reading that does not imply it guards against all such attacks, merely that this is the nature of the protection it offers.
> 
> The description might still be considered incorrect for an unrelated reason. Its use of the adjectival phrase ROP based constitutes a transferred epithet, conflating the symptom with the medicine. In other words, the protection offered is not ROP based i.e. does not rely on an ROP technique. What it does is protect against ROP attacks. So, I'd suggest rewording to
> 
>     "Enable protection of branches against ROP attacks".
> 
> Florian, if you want to argue for rewording that to "Enable protection of branches against some categories of ROP attacks" or some other equivalently qualified variant please feel free to make a case. However, I don't see any need to add that rider, nor any precedent in any of the other short descriptions provided in globals.hpp.

I did mean the description, not the flag name.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Thu Nov 11 15:02:39 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 11 Nov 2021 15:02:39 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <IxZX1B-2b7nzfai37c8LjfkbxZ_Lda0E3mK03-hvU9w=.e51394be-2088-47f1-a36d-6839cfa6748d@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
 <gDV8BbQGVCEZObTvbFcx69ZT6POJ7m3P6NlI5z7sl4U=.2ec54add-d532-4812-a7e5-a30949ffae7e@github.com>
 <IxZX1B-2b7nzfai37c8LjfkbxZ_Lda0E3mK03-hvU9w=.e51394be-2088-47f1-a36d-6839cfa6748d@github.com>
Message-ID: <BT0o0gOHIZ1Ds7KYcjOnrhI4lylWHq5Yj1unU8G3pZk=.151e6d85-9446-42b8-9356-050f72e1e03b@github.com>

On Thu, 11 Nov 2021 14:53:54 GMT, Florian Weimer <fweimer at openjdk.org> wrote:

>> I don't agree that this is incorrect, at least not for the stated reason. The flag switches on a protection mechanism that guards against ROP attacks. To my reading that does not imply it guards against all such attacks, merely that this is the nature of the protection it offers.
>> 
>> The description might still be considered incorrect for an unrelated reason. Its use of the adjectival phrase ROP based constitutes a transferred epithet, conflating the symptom with the medicine. In other words, the protection offered is not ROP based i.e. does not rely on an ROP technique. What it does is protect against ROP attacks. So, I'd suggest rewording to
>> 
>>     "Enable protection of branches against ROP attacks".
>> 
>> Florian, if you want to argue for rewording that to "Enable protection of branches against some categories of ROP attacks" or some other equivalently qualified variant please feel free to make a case. However, I don't see any need to add that rider, nor any precedent in any of the other short descriptions provided in globals.hpp.
>
> I did mean the description, not the flag name.

Yes, understood. I too was talking about the description even though I introduced my comment by talking about what the flag does.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From rkennke at openjdk.java.net  Thu Nov 11 15:04:36 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Thu, 11 Nov 2021 15:04:36 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
In-Reply-To: <DTQUurFfPYzfWEKpAn3HoYBP-Prc3fMuqatSgR90jXo=.1de8de99-f815-4f55-ab78-e02979669bbe@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
 <CYnzc3NijRkUPtdp1X1qypAbyejCK3jNLo2xMKLX9kI=.a7642bbe-d06f-4228-a5de-12abc229ea0e@github.com>
 <DTQUurFfPYzfWEKpAn3HoYBP-Prc3fMuqatSgR90jXo=.1de8de99-f815-4f55-ab78-e02979669bbe@github.com>
Message-ID: <SGw7obFEhdoysPmULhEO9gbLKMqLpd_eFVwnGgjaH1E=.edf084e8-644b-4c7b-9b1b-c8bb04261291@github.com>

On Thu, 11 Nov 2021 14:30:05 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/gc/shenandoah/c1/shenandoahBarrierSetC1_ppc.cpp line 83:
>> 
>>> 81:     LIRGenerator* gen = access.gen();
>>> 82: 
>>> 83:     if (ShenandoahCASBarrier) {
>> 
>> I am not sure, but I almost think we should not even end up in the method with -ShenandoahCASBarrier. If anything, -ShenandoahCASBarrier should result in only calling super to emit regular CAS without any barriers.
>
> We hit this case when running `jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive -version`. x86 and aarch64 check for ShenandoahCASBarrier, too. So, looks like these checks are needed and correct.

Ok then.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6325

From duke at openjdk.java.net  Thu Nov 11 15:33:33 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Thu, 11 Nov 2021 15:33:33 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <sRvw2wgcxZAyOO8U59-fponA1ztGPWmVStKnHoNc6ng=.e0640791-f153-48b5-a1bd-dbb469a1059d@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
 <sRvw2wgcxZAyOO8U59-fponA1ztGPWmVStKnHoNc6ng=.e0640791-f153-48b5-a1bd-dbb469a1059d@github.com>
Message-ID: <CxZhDtER_g7uvzljbUJt5nwnJkfPiN9uXuyVC82i9Zk=.ed48ead2-53c9-4de2-ac12-2a609902772d@github.com>

On Thu, 11 Nov 2021 14:52:54 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

> The runtime generated runtime stubs and Java method code into which this patch may insert the required PAC instructions are written into a code cache in a section which is mapped RW(X) all the time. It would be hard to map even a subset of this code cache RO because generated code includes call and data sites that need to be patched during execution.

Am I right in saying that for macOS, all generated code is remapped RO before execution?

An additional concern I have is that if the globals data was attacked then the UseROPProtection flag could be flipped, and all code after that point would be generated without ROP protection. Marking all the globals data as RO would fix that. Alternatively remove UseROPProtection and then in the macroassembler always generate PAC code, using just the subset of instructions that are NOPs on non-PAC hardware. Or alternatively only generate PAC code based on a #define set at build time. Each option has its own downsides.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From coleenp at openjdk.java.net  Thu Nov 11 15:56:33 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 15:56:33 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <i0WnOdaRq-viKl6mvyF_1fbiJkt7VXixS8aipRvu-KY=.60d20d60-383c-4f27-bdec-a40105e22b52@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
 <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com>
 <i0WnOdaRq-viKl6mvyF_1fbiJkt7VXixS8aipRvu-KY=.60d20d60-383c-4f27-bdec-a40105e22b52@github.com>
Message-ID: <dfI-Qp-EwCyq22azpvZXaYs_7dFUvQfr-tH-8Aq34F8=.bf860360-71d2-4ede-84b5-1f761a9b1373@github.com>

On Thu, 11 Nov 2021 14:17:15 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I didn't know if the save area is still needed for something else, which is why I left the ?.  I can remove the comment but haven't made any substantial changes here.  I'm not sure if they're needed or not, but I can't test them if I made them.
>
> I mean, you did the same here: https://github.com/openjdk/jdk/pull/6343/files#diff-060e534de775616a893aa969f3639e53666cda9e93bed7c3a3c14b9cdc4cdba0L1553-L1554 -- and that change is understandable.

Looking further, the area is needed (stores oops there), but not the comment.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From coleenp at openjdk.java.net  Thu Nov 11 16:19:01 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 11 Nov 2021 16:19:01 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2]
In-Reply-To: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
Message-ID: <bm2UOEqMEVtYUY0JKGfF0mPV8UZ265iAfu7xsp57iA4=.a8fbe580-f5f7-44e6-9956-41bfa08a63c8@github.com>

> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Some platform adjustments.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6343/files
  - new: https://git.openjdk.java.net/jdk/pull/6343/files/7e9c641d..9b8ff9ae

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=00-01

  Stats: 67 lines in 6 files changed: 0 ins; 57 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6343.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6343/head:pull/6343

PR: https://git.openjdk.java.net/jdk/pull/6343

From adinn at openjdk.java.net  Thu Nov 11 16:34:33 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 11 Nov 2021 16:34:33 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <CxZhDtER_g7uvzljbUJt5nwnJkfPiN9uXuyVC82i9Zk=.ed48ead2-53c9-4de2-ac12-2a609902772d@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
 <sRvw2wgcxZAyOO8U59-fponA1ztGPWmVStKnHoNc6ng=.e0640791-f153-48b5-a1bd-dbb469a1059d@github.com>
 <CxZhDtER_g7uvzljbUJt5nwnJkfPiN9uXuyVC82i9Zk=.ed48ead2-53c9-4de2-ac12-2a609902772d@github.com>
Message-ID: <Jv0eierFbiABcz7HxS5t3KjAsTYPRmqb65PDzQd9lzo=.69f4fe6f-ece5-44a9-a7e6-63021f8f9265@github.com>

On Thu, 11 Nov 2021 15:30:29 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

> Am I right in saying that for macOS, all generated code is remapped RO before execution?

Ah, no, it seems the code cache is not RWX all the time as far as Java threads are concerned. The macOS/AArch64 code is strategically calling pthread_jit_write_protect_np at Java <-> JVM transition points.

That ensures that executable regions are executable but not writable (RX) from a Java thread when running JITted Java code and are writable but not executable (RW) when it calls into JVM code.

> An additional concern I have is that if the globals data was attacked then the UseROPProtection flag could be flipped, and all code after that point would be generated without ROP protection. Marking all the globals data as RO would fix that. Alternatively remove UseROPProtection and then in the macroassembler always generate PAC code, using just the subset of instructions that are NOPs on non-PAC hardware. Or alternatively only generate PAC code based on a #define set at build time. Each option has its own downsides.

Globals data can legitimately be written during JVM startup (perhaps in some cases also during execution?). So, they cannot simply be marked as RO.

I am not sure this concern is really warranted. If an attacker is already able to overwrite UseROPProtection then a concern over the resulting omission of JITted ROP protection seems like attending to the loud banging of the stable door while Shergar has already been diced into stew meat.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From mdoerr at openjdk.java.net  Thu Nov 11 16:35:32 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 16:35:32 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
In-Reply-To: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
Message-ID: <R53ktrivQCqxHDKeS6rHMWRb9qjV0fxZi9-4npAOOFE=.b20d0f43-5eaf-4825-a876-74cfecbf1e12@github.com>

On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski <nradomski at openjdk.org> wrote:

> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 536:

> 534:   if (!preserve_gp_registers) { __ clobber_volatile_gprs(dst); }
> 535:   if (!needs_frame) { __ clobber_carg_stack_slots(tmp1); }
> 536: #endif

This clobber code was certainly good during development and early testing. But is it worth keeping it? Other GCs and other places don't have it any more. So, I'd slightly prefer removal. Feel free to do so if you agree.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6325

From simonis at openjdk.java.net  Thu Nov 11 16:43:14 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 16:43:14 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v8]
In-Reply-To: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
Message-ID: <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
> 
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
> 
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
> 
> 
> ### Solution
> 
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
> 
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
> 
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
> 
> 
> ### Implementation details
> 
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
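
For readers unfamiliar with the pattern being benchmarked, here is a minimal Java sketch of it. It assumes a hypothetical 128-entry `IS_ALPHA` table (Tomcat's real table is not reproduced here), and the class and method names are made up for illustration. The first method relies on the implicit `ArrayIndexOutOfBoundsException` for out-of-range input, which is the case this PR speeds up; the second returns the same results with an explicit range check and never throws.

```java
public class IsAlphaSketch {
    // Hypothetical lookup table: true only for ASCII letters.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Pattern from the PR description: out-of-range input is handled
    // by catching the implicit ArrayIndexOutOfBoundsException.
    public static boolean isAlphaViaException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Equivalent result with an explicit range check, so no exception
    // is ever created for out-of-range input.
    public static boolean isAlphaViaRangeCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '0', -1, 128, 1000};
        for (int c : samples) {
            System.out.println(c + " -> " + isAlphaViaException(c)
                    + " / " + isAlphaViaRangeCheck(c));
        }
    }
}
```

Both variants agree on every input; they differ only in whether out-of-range input goes through exception allocation and dispatch.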

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Fix build issue for minimal/zero build one more time

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5488/files
  - new: https://git.openjdk.java.net/jdk/pull/5488/files/625da2f9..b3c130c8

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=06-07

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From aph at openjdk.java.net  Thu Nov 11 16:51:39 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Nov 2021 16:51:39 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <BT0o0gOHIZ1Ds7KYcjOnrhI4lylWHq5Yj1unU8G3pZk=.151e6d85-9446-42b8-9356-050f72e1e03b@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
 <gDV8BbQGVCEZObTvbFcx69ZT6POJ7m3P6NlI5z7sl4U=.2ec54add-d532-4812-a7e5-a30949ffae7e@github.com>
 <IxZX1B-2b7nzfai37c8LjfkbxZ_Lda0E3mK03-hvU9w=.e51394be-2088-47f1-a36d-6839cfa6748d@github.com>
 <BT0o0gOHIZ1Ds7KYcjOnrhI4lylWHq5Yj1unU8G3pZk=.151e6d85-9446-42b8-9356-050f72e1e03b@github.com>
Message-ID: <SBkFA5i37ibznq27DNlIcJlzyCPjo_uUDDLEyzeaDsM=.eb2eb4c5-7207-4a8c-a6cb-531ea0818198@github.com>

On Thu, 11 Nov 2021 14:59:32 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> I did mean the description, not the flag name.
>
> Yes, understood. I too was talking about the description even though I introduced my comment by talking about what the flag does.

`"Protect branches against ROP attacks".`

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From kvn at openjdk.java.net  Thu Nov 11 16:58:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 11 Nov 2021 16:58:39 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <L6nxoPoxbbnjCBHztg_rqU3mVSTqN2y6uey_opFwQrY=.41f9c40b-b796-44d8-a13a-34a267575773@github.com>

On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long <dlong at openjdk.org> wrote:

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

src/hotspot/share/ci/ciReplay.cpp line 645:

> 643:       _version = parse_int("version");
> 644:       if (_version > REPLAY_VERSION) {
> 645:         report_error("unrecognized version");

It would be nice to print both version numbers in the error message.
Also, I would like to be able to ignore such an error and process the file anyway. Does `report_error` allow that?
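
To make the suggestion concrete, here is a rough sketch of the idea, written in Java for illustration only rather than as HotSpot C++. The class, the `checkVersion` method, the `ignoreMismatch` switch, and the value chosen for `REPLAY_VERSION` are all assumptions, not code from the PR.

```java
public class ReplayVersionCheck {
    // Assumed value; stands in for REPLAY_VERSION in ciReplay.cpp.
    static final int REPLAY_VERSION = 1;

    // Returns null when the file may be processed, otherwise an error
    // message that names both the file version and the supported version.
    static String checkVersion(int fileVersion, boolean ignoreMismatch) {
        if (fileVersion > REPLAY_VERSION && !ignoreMismatch) {
            return "unrecognized version " + fileVersion
                    + ", expected <= " + REPLAY_VERSION;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkVersion(2, false)); // error message with both versions
        System.out.println(checkVersion(2, true));  // mismatch ignored
        System.out.println(checkVersion(1, false)); // supported version
    }
}
```

Whether an override like `ignoreMismatch` is safe depends on how much the replay file format changed between versions, so it would probably have to be opt-in.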

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From kvn at openjdk.java.net  Thu Nov 11 17:02:38 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 11 Nov 2021 17:02:38 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <KNFKcqeGpZcbG7U7Ia2J_LzGkltwqQ7Trb6Z5zqLNHI=.6bcc1baf-1c72-4d88-ab87-7244383d12b1@github.com>

On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long <dlong at openjdk.org> wrote:

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

src/hotspot/share/ci/ciReplay.cpp line 837:

> 835:     rec->_state = parse_int("state");
> 836:     if (_version < 1) {
> 837:       parse_int("current_mileage");

Why is it not assigned to `rec->_current_mileage` here?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From psandoz at openjdk.java.net  Thu Nov 11 17:10:04 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 11 Nov 2021 17:10:04 GMT
Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator)
 [v9]
In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
Message-ID: <YwW4l-qm4vJTAkZaeRDnKPA5mJ4pez3oMSClj6YsZjA=.76f4e82e-a0d6-46a1-911e-acc6bcb6d39b@github.com>

> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE.
> 
> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend.
> 
> Masked loads/stores are a special form of masked operation that requires additional care to ensure out-of-bounds accesses throw exceptions. The range checking has not been fully optimized and will require further work.
> 
> No API enhancements were required and only a few additional tests were needed.

Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision:

  Add missing null check post mask unboxing.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5873/files
  - new: https://git.openjdk.java.net/jdk/pull/5873/files/571e6f39..11906870

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=07-08

  Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5873.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5873/head:pull/5873

PR: https://git.openjdk.java.net/jdk/pull/5873

From kvn at openjdk.java.net  Thu Nov 11 17:19:37 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 11 Nov 2021 17:19:37 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v8]
In-Reply-To: <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
Message-ID: <5Q-g54nyWmdaykYA01MSSN5yGS2qoAqnXPtxhyD12fU=.103ffc2e-dc83-4da4-876a-f05735feda1b@github.com>

On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 cannot create implicit exceptions such as `ArrayIndexOutOfBoundsException` (AIOOBE) or `NullPointerException` in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such an exception occurs.
>> 
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>> 
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>> 
>> 
>> ### Solution
>> 
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>> 
>> 
>> ### Implementation details
>> 
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes that can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix build issue for minimal/zero build one more time

I suggest not rushing this and waiting for JDK 19, because JDK 18 is almost done.
I also wanted to look at this, but I am on vacation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488
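
To make the control-flow pattern from the quoted description concrete, here is a minimal, self-contained sketch of the `isAlpha` idiom next to a variant with an explicit range check. The table contents are an assumption (a 128-entry ASCII letter table; the real Tomcat table may differ), and the class name and the range-check variant are hypothetical additions for contrast, not something proposed in this thread.

```java
class ImplicitExceptionSketch {
    // Assumption: 128 entries, true only for ASCII letters.
    static final boolean[] IS_ALPHA = new boolean[128];
    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Idiom from the description: an out-of-range index raises an implicit
    // ArrayIndexOutOfBoundsException, which is caught and used as control flow.
    static boolean isAlphaViaException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Same result with an explicit range check; no exception is ever created.
    static boolean isAlphaViaRangeCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {-1, 'a', 'Z', '5', 128, 1000};
        for (int c : samples) {
            System.out.println(c + " -> " + isAlphaViaException(c)
                    + " / " + isAlphaViaRangeCheck(c));
        }
    }
}
```

Both methods return the same answers; the difference is only in how the out-of-range case is handled, which is the path the benchmark's `exceptionProbability` parameter exercises.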

From simonis at openjdk.java.net  Thu Nov 11 17:35:47 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 17:35:47 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v8]
In-Reply-To: <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
Message-ID: <xxVbfU1cLKiUgWNxC7UGzCYsDwQFfmV_GMVIQGiSKNc=.10481281-7851-4978-a0d6-b20f8f362043@github.com>

Hi Vladimir,
I'd be really happy if you could take a look at this PR. On the other hand, I did intend to bring this to JDK 18. There's still a month until RDP 1 starts and this PR has already been discussed for two months. If you say "don't hurry", does that mean that you won't have time to review it within the next month?
Best regards and a pleasant vacation,
Volker

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From kvn at openjdk.java.net  Thu Nov 11 17:50:36 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 11 Nov 2021 17:50:36 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v8]
In-Reply-To: <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
Message-ID: <-qMqxY4R-0pRP7L0xxagLIrk-wW0XeLo7g13c3GQ8uk=.1d4c438a-e0dc-4a5d-a447-d4d2130bc9fc@github.com>

My vacation has just started, and I will have only a week before RDP 1 to do the review.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net  Thu Nov 11 17:50:37 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 17:50:37 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v8]
In-Reply-To: <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <d7f7PgBInrY_2i7fnVyJwBFUqKe4CBqj1aPv-LLFpOE=.b79b4aa4-479d-48ec-8e93-55158583cb17@github.com>
Message-ID: <Hlf8W1wtmcYAT0iCcCXmElqG5p0EWzm4CWRRcaAu9Pc=.63f4195b-12c5-4f94-9e9c-bcdeffa8d74c@github.com>

OK, enjoy your vacation then...

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From aph at openjdk.java.net  Thu Nov 11 18:10:35 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Nov 2021 18:10:35 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <Jv0eierFbiABcz7HxS5t3KjAsTYPRmqb65PDzQd9lzo=.69f4fe6f-ece5-44a9-a7e6-63021f8f9265@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
 <sRvw2wgcxZAyOO8U59-fponA1ztGPWmVStKnHoNc6ng=.e0640791-f153-48b5-a1bd-dbb469a1059d@github.com>
 <CxZhDtER_g7uvzljbUJt5nwnJkfPiN9uXuyVC82i9Zk=.ed48ead2-53c9-4de2-ac12-2a609902772d@github.com>
 <Jv0eierFbiABcz7HxS5t3KjAsTYPRmqb65PDzQd9lzo=.69f4fe6f-ece5-44a9-a7e6-63021f8f9265@github.com>
Message-ID: <Zr1odUW1UNBN6g7pk6wcg7Ui6B35zCnDkus-DOx9IlY=.de40bd32-422e-4b5e-a7b5-4fc18d4d1c95@github.com>

On Thu, 11 Nov 2021 16:31:41 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

> > Am I right in saying that on macOS, all generated code is remapped RO before execution?
> 
> Ah, no, it seems the code cache is not RWX all the time as far as Java threads are concerned. The macOS/AArch64 code strategically calls `pthread_jit_write_protect_np` at Java <-> JVM transition points.

And this requires magic kernel support. I did mention it to a kernel engineer who wasn't very impressed, but I think it's pretty cool.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From fweimer at openjdk.java.net  Thu Nov 11 18:18:41 2021
From: fweimer at openjdk.java.net (Florian Weimer)
Date: Thu, 11 Nov 2021 18:18:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <Zr1odUW1UNBN6g7pk6wcg7Ui6B35zCnDkus-DOx9IlY=.de40bd32-422e-4b5e-a7b5-4fc18d4d1c95@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
 <I9BcKG5hATOeB9VB4UgUlFE69lrwh7V1kYc1-aM1RZE=.7958349d-a0ce-4e64-ae2f-cb7099db5704@github.com>
 <sRvw2wgcxZAyOO8U59-fponA1ztGPWmVStKnHoNc6ng=.e0640791-f153-48b5-a1bd-dbb469a1059d@github.com>
 <CxZhDtER_g7uvzljbUJt5nwnJkfPiN9uXuyVC82i9Zk=.ed48ead2-53c9-4de2-ac12-2a609902772d@github.com>
 <Jv0eierFbiABcz7HxS5t3KjAsTYPRmqb65PDzQd9lzo=.69f4fe6f-ece5-44a9-a7e6-63021f8f9265@github.com>
 <Zr1odUW1UNBN6g7pk6wcg7Ui6B35zCnDkus-DOx9IlY=.de40bd32-422e-4b5e-a7b5-4fc18d4d1c95@github.com>
Message-ID: <3ViGybkSVRbuD_wN398vEFGxNJfiuS1wA_SdLkGtM18=.86e45177-8525-42dc-b27f-c22a67489108@github.com>

On Thu, 11 Nov 2021 18:07:37 GMT, Andrew Haley <aph at openjdk.org> wrote:

> > > Am I right in saying that on macOS, all generated code is remapped RO before execution?
> > 
> > 
> > Ah, no, it seems the code cache is not RWX all the time as far as Java threads are concerned. The macOS/AArch64 code strategically calls `pthread_jit_write_protect_np` at Java <-> JVM transition points.
> 
> And this requires magic kernel support. I did mention it to a kernel engineer who wasn't very impressed, but I think it's pretty cool.

It's possible to emulate this to some extent with memory protection keys on POWER and (recent) x86. See `pkey_alloc`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From vladimir.kozlov at oracle.com  Thu Nov 11 20:37:26 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 11 Nov 2021 12:37:26 -0800
Subject: [External] : Re: RFC - Improving C2 Escape Analysis
In-Reply-To: <BY5PR21MB14734EB2CE1D14079A6252909A949@BY5PR21MB1473.namprd21.prod.outlook.com>
References: <BY5PR21MB1473143A554A8B9DE9577C159AAA9@BY5PR21MB1473.namprd21.prod.outlook.com>
 <20210930140335.648146897@eggemoggin.niobe.net>
 <ADAF2E9E-5D48-4CF0-9EFB-C68F47E31874@oracle.com>
 <BY5PR21MB147300BCF3E5008B63643D7D9AAE9@BY5PR21MB1473.namprd21.prod.outlook.com>
 <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com>
 <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com>
 <DM6PR21MB1484F06897E51C0399A94F109ABF9@DM6PR21MB1484.namprd21.prod.outlook.com>
 <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com>
 <BY5PR21MB14738F7C7AED2F625389109B9A859@BY5PR21MB1473.namprd21.prod.outlook.com>
 <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com>
 <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com>
 <BY5PR21MB1473B300A3D625054E22C8139A879@BY5PR21MB1473.namprd21.prod.outlook.com>
 <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com>
 <BY5PR21MB14734EB2CE1D14079A6252909A949@BY5PR21MB1473.namprd21.prod.outlook.com>
Message-ID: <4bb3c804-d9fd-b9da-a4d3-c504d2e46933@oracle.com>

Hi Cesar,

On 11/11/21 11:24 AM, Cesar Soares Lucas wrote:
> Hi Vladimir,
> 
> Thank you for the feedback and sorry for the delay in getting back to you!
> 
>  > Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some
>  > time investigating possible solutions for it but "no cigar". Maybe we do
>  > indeed need control flow analysis to resolve this.
> 
> Can you elaborate a bit on the approaches you tried and why you didn't like
> them? By allocation merges, do you mean nested objects like "obj1.obj2.x"?
> Did you try solving both control-flow merge issues and also allocation
> merges?

I mean control flow merges of allocations, like in your "Code Example 4".

I tried to create separate unique instance IDs (in addition to Node::_idx) to use for the merged-allocations case (not the NULL 
case), which would look like one allocation after the merge point with different paths for field initialization. But I 
stumbled on some issues and did not proceed further. After some thinking I decided that it is the wrong approach, since it 
still doesn't solve the main merge issue of flow-insensitive analysis:

https://bugs.openjdk.java.net/browse/JDK-6726999
test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java

The issue with deeply nested allocations `new A(new B(new C()))` will be addressed by the Iterative EA I propose: 
https://bugs.openjdk.java.net/browse/JDK-8276455
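
For orientation, the two shapes discussed above can be sketched as follows. This is only an illustration under assumptions: it is not taken from "Code Example 4" or from Test6726999.java (neither is shown in this thread), the class and method names are hypothetical, and whether C2 actually scalar-replaces either allocation depends on the JDK version and on inlining.

```java
class EscapeAnalysisShapes {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static final class C { final int v; C(int v) { this.v = v; } }
    static final class B { final C c; B(C c) { this.c = c; } }
    static final class A { final B b; A(B b) { this.b = b; } }

    // Control-flow merge of allocations: p refers to one of two allocation
    // sites joined at a merge point. Neither object escapes the method.
    static int mergedAllocation(boolean cond) {
        Point p = cond ? new Point(1, 2) : new Point(3, 4);
        return p.x + p.y;
    }

    // Nested allocations of the form new A(new B(new C())): C is stored
    // into B, and B into A, but A itself never escapes the method.
    static int nestedAllocation(int v) {
        A a = new A(new B(new C(v)));
        return a.b.c.v;
    }

    public static void main(String[] args) {
        System.out.println(mergedAllocation(true));
        System.out.println(mergedAllocation(false));
        System.out.println(nestedAllocation(42));
    }
}
```

In the first method a flow-insensitive analysis sees a single reference that may point to either allocation; in the second, the inner objects only become candidates for elimination once the object holding them has been handled, which is presumably what the extra EA rounds are for.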

> 
>  > There are 2 test files with small methods for different EA cases I used to
>  > see how EA works:
> 
> These examples are very helpful, thank you again!
> 
>  > Yes, I think it would be good to have a prototype if you are comfortable
>  > working with C2 code already. I proposed small RFEs just for warmup ;)
> 
> I talked with my colleagues and we decided to start the work by trying to fix
> the control/data-flow merge issues - *perhaps not for all cases, but at least
> for some of them*. Then, based on our experience with this and some
> benchmarking we'll decide if we really need flow-sensitive analysis and how to
> best approach that.

Use Test6726999.java for that. It may need to be modified to verify the correctness of the results (currently it just prints the result).

> 
> We'll definitely take a look at the RFEs as we move along! Implementing Stadler's
> algorithm was just something that crossed my mind initially, it's very likely
> the last approach we'd try ... I don't want to bite off more than I can chew.

I may look at some of the RFEs myself after I am done with 8276455. Please let me know if you pick one, to avoid duplicated work.

Regards,
Vladimir K

> 
> 
> Regards,
> Cesar
> ------------------------------------------------------------------------------------------------------------------------
> *From:* Vladimir Kozlov <vladimir.kozlov at oracle.com>
> *Sent:* October 29, 2021 5:27 PM
> *To:* Cesar Soares Lucas <Divino.Cesar at microsoft.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; Ron Pressler 
> <ron.pressler at oracle.com>
> *Cc:* John Rose <john.r.rose at oracle.com>; Mark Reinhold <mark.reinhold at oracle.com>; hotspot-dev at openjdk.java.net 
> <hotspot-dev at openjdk.java.net>; Brian Stafford <Brian.Stafford at microsoft.com>; Martijn Verburg 
> <Martijn.Verburg at microsoft.com>; Hohensee, Paul <hohensee at amazon.com>
> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
> On 10/29/21 4:50 PM, Cesar Soares Lucas wrote:
>> Hi Vladimir and Tobias,
>> 
>>  >> Sure, here are four examples of EA and/or scalarization failing due to
>>  >> complicated control/data flow:
>>  >> https://cr.openjdk.java.net/~thartmann/EA_examples
>> 
>>  >> There are 2 test files with small methods for different EA cases I used to
>>  >> see how EA works:
>>  >>
>>  >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>>  >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>> 
>> Thank you for the examples, Tobias/Vladimir. They are very helpful.
>> 
>>  >> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent
>>  >> some time investigating possible solutions for it but "no cigar". Maybe we
>>  >> do indeed need control flow analysis to resolve this.
>> 
>> By "need control flow analysis" you mean the flow-sensitive EA algorithm? My
> 
> Yes.
> 
> To clarify. I investigated solutions in current flow-insensitive EA.
> 
>> first idea to handle these control/data-merge issues was to implement in C2 the
>> same algorithm used by Graal, i.e., the algorithm described in the Stadler et al.
>> PEA paper. Do you think this is reasonable?
> 
> Yes, I think it would be good to have a prototype if you are comfortable working with C2 code already.
> I proposed small RFEs just for warmup ;)
> 
>> 
>>  >> I am currently looking at iterative EA: do more EA rounds if we can
>>  >> eliminate more connected allocations. It was proposed by Vladimir Ivanov and
>>  >> I have a working prototype.
>> 
>> Cool! I'm curious, when do you plan to submit a Pull Request for this?
> 
> I am investigating regressions in some benchmarks.
> 
>> 
>>  >> There is also a suggestion from the Amazon Java group about "C2 Partial Escape
>>  >> Analysis" which needs more discussion:
>>  >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>> 
>> I'd love to hear from them about their experience with these issues and if they
>> have any plans to work on this moving forward! I'll ping them on the thread
>> that you linked above.
> 
> Yes, I would like them to participate too (CCing Paul). They sent the proposal almost 6 months ago and we have not heard
> any additional information since Vladimir Ivanov replied.
> 
> Regards,
> Vladimir K
> 
>> 
>> 
>> Regards,
>> Cesar
>> ------------------------------------------------------------------------------------------------------------------------
>> *From:* Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> *Sent:* October 27, 2021 10:26 AM
>> *To:* Tobias Hartmann <tobias.hartmann at oracle.com>; Cesar Soares Lucas <Divino.Cesar at microsoft.com>; Ron Pressler
>> <ron.pressler at oracle.com>
>> *Cc:* John Rose <john.r.rose at oracle.com>; Mark Reinhold <mark.reinhold at oracle.com>; hotspot-dev at openjdk.java.net
>> <hotspot-dev at openjdk.java.net>; Brian Stafford <Brian.Stafford at microsoft.com>; Martijn Verburg
>> <Martijn.Verburg at microsoft.com>
>> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
>> First, thank you, Cesar, for collecting data about C2 EA shortcomings.
>> 
>> I agree with the cases Tobias pointed out as possible starting points for improving EA.
>> 
>> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some time investigating possible solutions
>> for it, but no cigar. Maybe we do indeed need control flow analysis to resolve this.
>> 
>> I looked through JBS and found a few issues that do not require writing a new EA:
>> 
>> https://bugs.openjdk.java.net/browse/JDK-7149991
>> https://bugs.openjdk.java.net/browse/JDK-8059378
>> https://bugs.openjdk.java.net/browse/JDK-8073358
>> https://bugs.openjdk.java.net/browse/JDK-8155769
>> 
>> Tobias also has a prototype fix for the following bug, which has not been fixed yet:
>> https://bugs.openjdk.java.net/browse/JDK-8236493
>> 
>> There are 2 test files with small methods for different EA cases that I used to see how EA works:
>> 
>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>> 
>> You can start by looking at the RFEs/bugs above, or run these tests and see why scalarization fails in some cases,
>> apart from the known merge issue:
>> 
>> https://bugs.openjdk.java.net/browse/JDK-6853701
>> 
>> I am currently looking at iterative EA: running more EA rounds when that lets us eliminate more connected allocations.
>> It was proposed by Vladimir Ivanov, and I have a working prototype.
>> 
>> There is also a suggestion from the Amazon Java group about "C2 Partial Escape Analysis", which needs more discussion:
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>> 
>> Thanks,
>> Vladimir K
>> 
>> On 10/27/21 3:04 AM, Tobias Hartmann wrote:
>>> Hi Cesar,
>>> 
>>> On 27.10.21 08:20, Cesar Soares Lucas wrote:
>>>> Right. I was suspecting this to be the most critical issue indeed. However, I
>>>> didn't know there was a case where "... the object does not escape on any paths
>>>> but control flow is too complicated for EA to prove that." Is this an issue
>>>> tracked in JBS or perhaps you can show me an example where this happens?
>>> 
>>> Sure, here are four examples of EA and/or scalarization failing due to complicated control/data
>>> flow: https://cr.openjdk.java.net/~thartmann/EA_examples
>>> 
>>> All examples would completely fold with inline types (Valhalla).
>>> 
>>> I'm not sure if these issues are tracked by JBS issues but there's most likely an overlap with some
>>> of the issues you already described.
>>> 
>>> Best regards,
>>> Tobias
>>> 
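For readers unfamiliar with the "allocation merge" problem mentioned in this thread, here is a minimal sketch of its shape. The class and method names are hypothetical and not taken from the thread or from the linked JBS issues. Whether a given JVM scalar-replaces either variant depends on the JDK version, inlining, and flags, so treat this only as an illustration of the pattern.

```java
// Illustrative only: names are hypothetical, not from the thread or from HotSpot.
final class AllocationMergeExample {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Two allocation sites whose results merge at a control-flow join.
    // Neither object escapes the method, but the merged reference is the
    // pattern the thread describes as painful for scalar replacement.
    static int merged(boolean cond, int a, int b) {
        Point p;
        if (cond) {
            p = new Point(a, b);
        } else {
            p = new Point(b, a);
        }
        return p.x - p.y;
    }

    // Hand-rewritten equivalent: merge the field values first, then allocate
    // once, so there is a single allocation site and no merged reference.
    static int split(boolean cond, int a, int b) {
        int x = cond ? a : b;
        int y = cond ? b : a;
        Point p = new Point(x, y);
        return p.x - p.y;
    }

    public static void main(String[] args) {
        System.out.println(merged(true, 5, 3));
        System.out.println(split(true, 5, 3));
    }
}
```

Both methods compute the same result; the difference is only in how many allocation sites feed the final field loads.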

From psandoz at openjdk.java.net  Thu Nov 11 21:44:53 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 11 Nov 2021 21:44:53 GMT
Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator)
 [v10]
In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
Message-ID: <C9W8aZcLRHcyG5LIi9RY2ZiUYif9R5v1rlLGK_98gD8=.95aaa87a-633a-4d4d-aa9e-cce9c0054c8c@github.com>

> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE.
> 
> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend.
> 
> Masked loads/stores are a special form of masked operation that requires additional care to ensure that out-of-bounds accesses throw exceptions. The range checking has not been fully optimized and will require further work.
> 
> No API enhancements were required and only a few additional tests were needed.
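The "composition using blend" fallback described above can be modelled with plain arrays. This is a scalar sketch under stated assumptions, not the incubating Vector API and not the HotSpot implementation: the class `MaskedAddSketch` and its methods are hypothetical, and each array element stands in for one vector lane.

```java
// Scalar model of a masked lane-wise add; all names here are hypothetical.
final class MaskedAddSketch {
    // Unmasked lane-wise add.
    static int[] add(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    // Lane i comes from b where the mask is set, otherwise from a.
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? b[i] : a[i];
        }
        return r;
    }

    // Masked add composed from the two operations above: compute the full
    // result, then keep it only in the lanes selected by the mask.
    static int[] maskedAdd(int[] a, int[] b, boolean[] mask) {
        return blend(a, add(a, b), mask);
    }

    public static void main(String[] args) {
        int[] r = maskedAdd(new int[] {1, 2, 3, 4},
                            new int[] {10, 20, 30, 40},
                            new boolean[] {true, false, true, false});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

With hardware masking the add and the selection can be a single predicated instruction, which is the case the PR description says it improves.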

Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:

 - Merge branch 'master' into JDK-8271515-vector-api
 - Add missing null check post mask unboxing.
 - Merge pull request #2 from nsjian/vector-conversion-fix
   
   AArch64: Incorrect SVE double to int and float to long vector conversion
 - Incorrect double to int and float to long vector conversion
   
   Like JDK-8276151, SVE vector double to int and float to long
   conversions have similar issue. According to Java language
   specification [1], we should convert double/float to
   integer/long directly, instead of converting to long/int and then
   narrowing/extending to target types. Test cases will be updated in
   JDK-8276151.
   
   [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3
 - Merge branch 'master' into JDK-8271515-vector-api
 - Merge pull request #1 from nsjian/JDK-8271515
   
   Address AArch64 review comments from Nick.
 - Address review comments from Nick.
 - Merge branch 'master' into JDK-8271515-vector-api
 - Resolve review comments.
 - Merge branch 'master' into JDK-8271515-vector-api
 - ... and 6 more: https://git.openjdk.java.net/jdk/compare/6f35eede...44697f8b
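
The conversion issue described in the commit list above can be reproduced with scalar casts. The following is a small sketch (the class name is hypothetical, and this is not the SVE code from the PR) showing why converting through an intermediate type and then narrowing or extending gives a different answer from the direct conversion that JLS 5.1.3 specifies.

```java
// Scalar illustration only; the class name is hypothetical.
final class NarrowingConversionSketch {
    // Direct double -> int: out-of-range values saturate to Integer.MAX_VALUE.
    static int doubleToIntDirect(double d) {
        return (int) d;
    }

    // double -> long, then long -> int: the narrowing keeps only the low 32 bits.
    static int doubleToIntViaLong(double d) {
        return (int) (long) d;
    }

    // Direct float -> long.
    static long floatToLongDirect(float f) {
        return (long) f;
    }

    // float -> int saturates first; the widening to long then preserves
    // the already-saturated value.
    static long floatToLongViaInt(float f) {
        return (long) (int) f;
    }

    public static void main(String[] args) {
        System.out.println(doubleToIntDirect(1e10));   // 2147483647
        System.out.println(doubleToIntViaLong(1e10));  // 1410065408
        System.out.println(floatToLongDirect(1e10f));  // 10000000000
        System.out.println(floatToLongViaInt(1e10f));  // 2147483647
    }
}
```

The two-step variants differ from the direct ones only for values outside the range of the intermediate or target type, which is why such bugs tend to need targeted test inputs.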

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5873/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=09
  Stats: 21982 lines in 104 files changed: 16217 ins; 2087 del; 3678 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5873.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5873/head:pull/5873

PR: https://git.openjdk.java.net/jdk/pull/5873

From jvernee at openjdk.java.net  Thu Nov 11 22:06:34 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Thu, 11 Nov 2021 22:06:34 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2]
In-Reply-To: <CZOsJHuS4ikizyScerSAM9OancK4IcgYi5pmMa50SBE=.c2d36800-d4b7-413f-b943-310d353c7bb9@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
 <CZOsJHuS4ikizyScerSAM9OancK4IcgYi5pmMa50SBE=.c2d36800-d4b7-413f-b943-310d353c7bb9@github.com>
Message-ID: <83KgAcV3vKaG38tiMA6R4qOZdOqIIBK-ZTIxWLh65lc=.98eea96e-53ef-4bad-b627-46b12f47d35c@github.com>

On Thu, 11 Nov 2021 14:17:40 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1905:
>> 
>>> 1903:   } else {
>>> 1904:     // Compute a valid move order, using tmp_vmreg to break any cycles
>>> 1905:     ComputeMoveOrder cmo(total_in_args, in_regs, total_c_args, out_regs, in_sig_bt, arg_order, tmp_vmreg);
>> 
>> `ComputeMoveOrder` is still used somewhere, or?
>
> Yes, it's used in
> cpu/x86/universalUpcallHandler_x86_64.cpp:  SharedRuntime::compute_move_order(in_sig_bt,

FWIW, I have a change in panama-foreign repo that replaces that use with a custom class. Will remove ComputeMoveOrder there as well, and it should be completely gone after the next JEP integration, probably in 19 (the JEP for 18 doesn't include that change).
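
For readers who have not looked at what "compute a valid move order, using a temp to break cycles" means, here is a minimal standalone sketch of the general technique. It is not the HotSpot `ComputeMoveOrder` class and not the custom class from panama-foreign; the names `Move`, `sequentialize` and `run`, and the use of plain `int` register ids, are made up for illustration. It assumes all destinations are distinct and that the temp is not otherwise used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a register is an int id, Move{src, dst} copies src into dst.
struct Move { int src; int dst; };

// Turns a set of parallel moves into a sequence that can run one at a time.
// A move is emitted only when no other pending move still reads its dst.
// If nothing can be emitted, the remaining moves form cycles, and one source
// is parked in `tmp` so that the cycle can be unwound.
std::vector<Move> sequentialize(std::vector<Move> pending, int tmp) {
  std::vector<Move> ordered;
  while (!pending.empty()) {
    bool progressed = false;
    for (std::size_t i = 0; i < pending.size() && !progressed; ++i) {
      bool dst_still_read = false;
      for (std::size_t j = 0; j < pending.size(); ++j) {
        if (j != i && pending[j].src == pending[i].dst) {
          dst_still_read = true;
          break;
        }
      }
      if (!dst_still_read) {
        ordered.push_back(pending[i]);
        pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
        progressed = true;
      }
    }
    if (!progressed) {
      // Break a cycle: save one source in tmp and redirect its readers.
      int saved = pending.front().src;
      ordered.push_back(Move{saved, tmp});
      for (Move& m : pending) {
        if (m.src == saved) m.src = tmp;
      }
    }
  }
  return ordered;
}

// Executes moves sequentially on a toy register file (used to check results).
std::vector<int> run(std::vector<int> regs, const std::vector<Move>& moves) {
  for (const Move& m : moves) regs[m.dst] = regs[m.src];
  return regs;
}
```

For a swap of registers 0 and 1 with temp 3, the sketch parks register 0 in the temp, moves 1 into 0, and then moves the temp into 1.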

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From mdoerr at openjdk.java.net  Thu Nov 11 22:20:33 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 22:20:33 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2]
In-Reply-To: <bm2UOEqMEVtYUY0JKGfF0mPV8UZ265iAfu7xsp57iA4=.a8fbe580-f5f7-44e6-9956-41bfa08a63c8@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <bm2UOEqMEVtYUY0JKGfF0mPV8UZ265iAfu7xsp57iA4=.a8fbe580-f5f7-44e6-9956-41bfa08a63c8@github.com>
Message-ID: <WlgfuJYq-tk3HqNVyseJXB-ujPU_vr9-_9Clq_UlSD4=.1efe718d-40f7-47b5-94e2-59480ce77fcf@github.com>

On Thu, 11 Nov 2021 16:19:01 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
>> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Some platform adjustments.

LGTM. Thanks!

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6343

From duke at openjdk.java.net  Thu Nov 11 22:27:46 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 11 Nov 2021 22:27:46 GMT
Subject: Integrated: 8186670: Implement _onSpinWait() intrinsic for AArch64
In-Reply-To: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
References: <kNBCb5Kvq3poBJuzHn0mw_MP5ubeoyWUUvvkhkXt2dA=.70e0af5a-b45a-4962-9550-e518747e35fc@github.com>
Message-ID: <WVlW4Qs6RFvietFDcCC2M4T-rnKXWUl-jDyXQN3HspE=.769e0284-2b57-4d05-aa82-ae287aba2658@github.com>

On Fri, 17 Sep 2021 11:26:03 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
> 
> It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be:
> 
> - `none`: no implementation for spin pauses. This is the default value.
> - `nop`: use `nop` instruction for spin pauses.
> - `isb`: use `isb` instruction for spin pauses.
> - `yield`: use `yield` instruction for spin pauses.
> 
> It also adds `OnSpinWaitInstCount=count`, where `count` specifies the number of `OnSpinWaitInst` instructions to emit and must be in the `1..99` range. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`.
> 
> The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`.
> 
> Testing:
> 
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
> 
> CSR: https://bugs.openjdk.java.net/browse/JDK-8274564
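
The option rules quoted above can be sketched as a small validation function. This is only an illustration of the rules as described in the PR text, not the code that was integrated: the function name `spin_pause_sequence`, the `count_given` parameter, and the choice of a single instruction when no count is given are assumptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the instruction mnemonics for one spin pause,
// or throws if the option combination is rejected.
std::vector<std::string> spin_pause_sequence(const std::string& inst,
                                             bool count_given, int count) {
  if (inst != "none" && inst != "nop" && inst != "isb" && inst != "yield") {
    throw std::invalid_argument("unknown OnSpinWaitInst value: " + inst);
  }
  if (inst == "none") {
    // Using OnSpinWaitInstCount together with 'none' is an error.
    if (count_given) {
      throw std::invalid_argument("OnSpinWaitInstCount needs OnSpinWaitInst != none");
    }
    return {};  // no spin pause implementation
  }
  int n = count_given ? count : 1;  // assumption: one instruction if unset
  if (n < 1 || n > 99) {
    throw std::invalid_argument("OnSpinWaitInstCount must be in 1..99");
  }
  return std::vector<std::string>(static_cast<std::size_t>(n), inst);
}
```

With `inst` set to `isb` and a count of 3, the sketch yields three `isb` entries; with `none` it yields an empty sequence.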

This pull request has now been integrated.

Changeset: 6954b98f
Author:    Evgeny Astigeevich <eastig at amazon.com>
Committer: Paul Hohensee <phh at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/6954b98f8faf29b6c2d13687a7a94e83302bdd85
Stats:     766 lines in 13 files changed: 764 ins; 0 del; 2 mod

8186670: Implement _onSpinWait() intrinsic for AArch64

Reviewed-by: phh, aph

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From dlong at openjdk.java.net  Fri Nov 12 03:24:33 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 12 Nov 2021 03:24:33 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information
In-Reply-To: <L6nxoPoxbbnjCBHztg_rqU3mVSTqN2y6uey_opFwQrY=.41f9c40b-b796-44d8-a13a-34a267575773@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <L6nxoPoxbbnjCBHztg_rqU3mVSTqN2y6uey_opFwQrY=.41f9c40b-b796-44d8-a13a-34a267575773@github.com>
Message-ID: <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com>

On Thu, 11 Nov 2021 16:55:17 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> It would be nice to print both version numbers in the error message.
> Also, I would like to be able to ignore such an error and process the file anyway. Does report_error allow it?

Currently report_error() saves the error string to be printed later, so to have an error message that requires formatting, I guess I would have to allocate the string using malloc or ResourceObj memory.
Right now the only ignore flag is ReplayIgnoreInitErrors.  I could introduce something like ReplayIgnoreAllErrors, or maybe turn this error into a warning.
Christian is waiting on this version number support, so maybe I could create a separate RFE for the above suggestions?
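
To illustrate the storage concern: if the error is only printed later, a formatted message has to live in memory that outlives the caller's stack frame. A rough standalone sketch follows; it uses `std::string` for ownership, which HotSpot itself would not do, and the class `ReplayErrors` and function `unsupported_version_message` are hypothetical names, not ciReplay code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a parser that remembers its first error
// and prints it later.
class ReplayErrors {
  std::string _message;  // owns the text, so formatted messages stay valid
 public:
  void report_error(const std::string& msg) {
    if (_message.empty()) _message = msg;  // keep only the first error
  }
  bool had_error() const { return !_message.empty(); }
  const std::string& message() const { return _message; }
};

// Formats a message that names both the version found and the one supported.
std::string unsupported_version_message(int found, int supported) {
  char buf[96];
  std::snprintf(buf, sizeof(buf),
                "unrecognized version %d, expected at most %d", found, supported);
  return std::string(buf);
}
```

The local buffer is copied into the `std::string` before the function returns, so the stored message does not dangle.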

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From dlong at openjdk.java.net  Fri Nov 12 03:36:59 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 12 Nov 2021 03:36:59 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v2]
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <AGt0odrNeEUYj93qvQB8XxWsCrQm_9AnChHwPqKFUrs=.ebc30e7e-ef25-496e-a244-9424489cc7d2@github.com>

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

Dean Long has updated the pull request incrementally with three additional commits since the last revision:

 - _current_mileage field is never used, stub out access
 - initialize _version to 0
 - remove comment

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6344/files
  - new: https://git.openjdk.java.net/jdk/pull/6344/files/a7580022..20bea849

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=00-01

  Stats: 7 lines in 2 files changed: 1 ins; 5 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6344.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344

PR: https://git.openjdk.java.net/jdk/pull/6344

From dholmes at openjdk.java.net  Fri Nov 12 04:55:34 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 12 Nov 2021 04:55:34 GMT
Subject: RFR: 8276658: Clean up JNI local handles code [v2]
In-Reply-To: <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
 <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>
Message-ID: <AoMxIQjNV77EZ2vp7FN7oRqnpNwl8JdjtP1EreOsysc=.7526f72f-59be-4ccd-9500-080e5727139c@github.com>

On Thu, 11 Nov 2021 13:58:06 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
>> Move the fields to JavaThread and add a JavaThread* argument.
>> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
>> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
>> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
>> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add _is_running initialization.

Marked as reviewed by dholmes (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6336

From dholmes at openjdk.java.net  Fri Nov 12 06:50:33 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 12 Nov 2021 06:50:33 GMT
Subject: RFR: 8277012: Use blessed modifier order in src/utils
In-Reply-To: <Di7gReO-iIxY8RVWB9O_pc1uE3Adw0eVjbJjNKSEY9U=.136760ca-bc3f-4a16-ab6b-90da95e42ad2@github.com>
References: <Di7gReO-iIxY8RVWB9O_pc1uE3Adw0eVjbJjNKSEY9U=.136760ca-bc3f-4a16-ab6b-90da95e42ad2@github.com>
Message-ID: <HygIEqy5q_dBlLGNgqWiU0YPjHrF05xyK_UXg4ZKUlI=.7282b550-c3a4-4e6d-be02-e5bb85153d8c@github.com>

On Thu, 11 Nov 2021 14:32:18 GMT, Magnus Ihse Bursie <ihse at openjdk.org> wrote:

> I ran bin/blessed-modifier-order.sh on source code in src/utils. This script verifies that modifiers are in the "blessed" order, and fixes the order otherwise. I have manually checked the changes made by the script to make sure they are sound.
> 
> There is no clear ownership of this code, but I believe it's kind of hotspot-related.
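
As a rough illustration of what checking and fixing modifier order involves, here is a small standalone sketch. It is not the script itself; it assumes the "blessed" order is the customary order given in the Java Language Specification, and the function names are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Customary modifier order from the JLS (assumed to be what "blessed" means).
const std::vector<std::string>& blessed_order() {
  static const std::vector<std::string> order = {
      "public", "protected", "private", "abstract", "static", "final",
      "transient", "volatile", "synchronized", "native", "strictfp"};
  return order;
}

// Position of a modifier in the blessed order; unknown words sort last.
std::size_t modifier_rank(const std::string& m) {
  const std::vector<std::string>& order = blessed_order();
  return static_cast<std::size_t>(
      std::find(order.begin(), order.end(), m) - order.begin());
}

bool by_rank(const std::string& a, const std::string& b) {
  return modifier_rank(a) < modifier_rank(b);
}

// True if the modifiers already appear in blessed order.
bool is_blessed(const std::vector<std::string>& mods) {
  return std::is_sorted(mods.begin(), mods.end(), by_rank);
}

// Returns the same modifiers rearranged into blessed order.
std::vector<std::string> bless(std::vector<std::string> mods) {
  std::stable_sort(mods.begin(), mods.end(), by_rank);
  return mods;
}
```

For example, `final static public` is reported as out of order and is rewritten to `public static final`.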

Looks fine.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6354

From stuefe at openjdk.java.net  Fri Nov 12 08:24:34 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 12 Nov 2021 08:24:34 GMT
Subject: RFR: 8277012: Use blessed modifier order in src/utils
In-Reply-To: <Di7gReO-iIxY8RVWB9O_pc1uE3Adw0eVjbJjNKSEY9U=.136760ca-bc3f-4a16-ab6b-90da95e42ad2@github.com>
References: <Di7gReO-iIxY8RVWB9O_pc1uE3Adw0eVjbJjNKSEY9U=.136760ca-bc3f-4a16-ab6b-90da95e42ad2@github.com>
Message-ID: <0dDzCW4HXazTrH_66L0Jrq9MIx2k86QqIgqBNPwh-lg=.f8928984-7a43-4b4a-8a23-245b79c4aa65@github.com>

On Thu, 11 Nov 2021 14:32:18 GMT, Magnus Ihse Bursie <ihse at openjdk.org> wrote:

> I ran bin/blessed-modifier-order.sh on source code in src/utils. This script verifies that modifiers are in the "blessed" order, and fixes the order otherwise. I have manually checked the changes made by the script to make sure they are sound.
> 
> There is no clear ownership of this code, but I believe it's kind of hotspot-related.

+1

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6354

From chagedorn at openjdk.java.net  Fri Nov 12 09:22:36 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 12 Nov 2021 09:22:36 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v2]
In-Reply-To: <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <L6nxoPoxbbnjCBHztg_rqU3mVSTqN2y6uey_opFwQrY=.41f9c40b-b796-44d8-a13a-34a267575773@github.com>
 <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com>
Message-ID: <EA1Rjf1u8qqqW9DANJ68cFOC3_A59C3-dZp4Q_YWtis=.e613d724-e883-43a4-9d3e-f32714c1b91e@github.com>

On Fri, 12 Nov 2021 03:21:46 GMT, Dean Long <dlong at openjdk.org> wrote:

>> src/hotspot/share/ci/ciReplay.cpp line 645:
>> 
>>> 643:       _version = parse_int("version");
>>> 644:       if (_version > REPLAY_VERSION) {
>>> 645:         report_error("unrecognized version");
>> 
>> It would be nice to print both version numbers in the error message.
>> Also, I would like to be able to ignore such an error and process the file anyway. Does `report_error` allow it?
>
>> It would be nice to print both version numbers in the error message.
>> Also, I would like to be able to ignore such an error and process the file anyway. Does report_error allow it?
> 
> Currently report_error() saves the error string to be printed later, so to have an error message that requires formatting, I guess I would have to allocate the string using malloc or ResourceObj memory.
> Right now the only ignore flag is ReplayIgnoreInitErrors.  I could introduce something like ReplayIgnoreAllErrors, or maybe turn this error into a warning.
> Christian is waiting on this version number support, so maybe I could create a separate RFE for the above suggestions?

It probably makes sense to turn this into a warning for now and file a follow up RFE as you have suggested.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From chagedorn at openjdk.java.net  Fri Nov 12 09:22:37 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 12 Nov 2021 09:22:37 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v2]
In-Reply-To: <KNFKcqeGpZcbG7U7Ia2J_LzGkltwqQ7Trb6Z5zqLNHI=.6bcc1baf-1c72-4d88-ab87-7244383d12b1@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <KNFKcqeGpZcbG7U7Ia2J_LzGkltwqQ7Trb6Z5zqLNHI=.6bcc1baf-1c72-4d88-ab87-7244383d12b1@github.com>
Message-ID: <KiSp0LU8AJB8cF9LcuSrwJzIpms3IJhiLbAu7S2CB0I=.3cc4d9df-b452-471a-a20b-aeb24a003ac4@github.com>

On Thu, 11 Nov 2021 16:59:39 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - _current_mileage field is never used, stub out access
>>  - initialize _version to 0
>>  - remove comment
>
> src/hotspot/share/ci/ciReplay.cpp line 837:
> 
>> 835:     rec->_state = parse_int("state");
>> 836:     if (_version < 1) {
>> 837:       parse_int("current_mileage");
> 
> Why is it not assigned to `rec->_current_mileage` here?

I guess we could leave this in for old replay files with the initialization further down in `ciReplay::initialize()` if `_version < 1`. What do you think @dean-long ?

You should also update the method comment on L805: `<current_mileage>` -> `<invocation_counter>` (or change it to `<current_mileage>/<invocation_counter>` when leaving in the support for old replay files?).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From chagedorn at openjdk.java.net  Fri Nov 12 09:33:33 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 12 Nov 2021 09:33:33 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v2]
In-Reply-To: <EA1Rjf1u8qqqW9DANJ68cFOC3_A59C3-dZp4Q_YWtis=.e613d724-e883-43a4-9d3e-f32714c1b91e@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <L6nxoPoxbbnjCBHztg_rqU3mVSTqN2y6uey_opFwQrY=.41f9c40b-b796-44d8-a13a-34a267575773@github.com>
 <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com>
 <EA1Rjf1u8qqqW9DANJ68cFOC3_A59C3-dZp4Q_YWtis=.e613d724-e883-43a4-9d3e-f32714c1b91e@github.com>
Message-ID: <aGMdFiFbr6ppAEV0jME1THKez5iMovFSbJt1WZuF08M=.527d2e56-1cec-4e44-a1be-8c1b13ae972b@github.com>

On Fri, 12 Nov 2021 08:55:29 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>>> It would be nice to print both version numbers in the error message.
>>> Also, I would like to be able to ignore such an error and process the file anyway. Does report_error allow it?
>> 
>> Currently report_error() saves the error string to be printed later, so to have an error message that requires formatting, I guess I would have to allocate the string using malloc or ResourceObj memory.
>> Right now the only ignore flag is ReplayIgnoreInitErrors.  I could introduce something like ReplayIgnoreAllErrors, or maybe turn this error into a warning.
>> Christian is waiting on this version number support, so maybe I could create a separate RFE for the above suggestions?
>
> It probably makes sense to turn this into a warning for now and file a follow up RFE as you have suggested.

However, thinking again about this, it should not happen that we parse a version number that's not supported. Maybe we should keep the error as it is indeed unexpected. It should also be easy to check the replay file manually in the error case to see which version number it had. Old replay files should still work as there is no "version X" line. Maybe you should also add `|| _version < 0` on L644. 
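
A minimal sketch of the check I have in mind, written as standalone code rather than as a patch to ciReplay.cpp. The names `read_version`, `check_version`, `VersionCheck` and the constant `kReplayVersion` (with value 1) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumed highest replay file version this build understands.
constexpr int kReplayVersion = 1;

enum class VersionCheck { Ok, TooNew, Negative };

// Old replay files have no "version X" line; treat them as version 0.
int read_version(const std::string& first_line) {
  int v = 0;
  if (std::sscanf(first_line.c_str(), "version %d", &v) == 1) {
    return v;
  }
  return 0;
}

// Rejects versions newer than supported and also negative ones.
VersionCheck check_version(int version) {
  if (version < 0) return VersionCheck::Negative;
  if (version > kReplayVersion) return VersionCheck::TooNew;
  return VersionCheck::Ok;
}
```

With this shape, a file without a version line still parses as version 0 and is accepted, while `version 2` and `version -1` are both rejected.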

But the RFE still makes sense to improve the error reporting and to think about a new flag to ignore all errors.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From mcimadamore at openjdk.java.net  Fri Nov 12 11:16:17 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Fri, 12 Nov 2021 11:16:17 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v24]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <eoGkgm-Zyrnr7AzNy9mm8y1sKSnc5QgxKwRbbQQlz3E=.de691597-aca7-419e-b58f-8402712add66@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Adopt blessed modifier order

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/8c3860f8..79d3d685

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=23
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=22-23

  Stats: 7 lines in 6 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From coleenp at openjdk.java.net  Fri Nov 12 13:10:44 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 13:10:44 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2]
In-Reply-To: <83KgAcV3vKaG38tiMA6R4qOZdOqIIBK-ZTIxWLh65lc=.98eea96e-53ef-4bad-b627-46b12f47d35c@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <SRNSoDTVFRw_mF5XktqAo_Fnf8opDZZxXfpPfM2FcK8=.e427ef57-7621-4b3a-a90f-a3cc95855846@github.com>
 <CZOsJHuS4ikizyScerSAM9OancK4IcgYi5pmMa50SBE=.c2d36800-d4b7-413f-b943-310d353c7bb9@github.com>
 <83KgAcV3vKaG38tiMA6R4qOZdOqIIBK-ZTIxWLh65lc=.98eea96e-53ef-4bad-b627-46b12f47d35c@github.com>
Message-ID: <0gBVXgVFX1iEGdOhB_4TlrkqIAB5997vYcTMqf38YFg=.bc8f3ef5-f8f1-4afd-be13-1e0fb459a439@github.com>

On Thu, 11 Nov 2021 22:02:35 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> Yes, it's used in
>> cpu/x86/universalUpcallHandler_x86_64.cpp:  SharedRuntime::compute_move_order(in_sig_bt,
>
> FWIW, I have a change in panama-foreign repo that replaces that use with a custom class. Will remove ComputeMoveOrder there as well, and it should be completely gone after the next JEP integration, probably in 19 (the JEP for 18 doesn't include that change).

Ok, thanks Jorn.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From coleenp at openjdk.java.net  Fri Nov 12 13:10:44 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 13:10:44 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2]
In-Reply-To: <WlgfuJYq-tk3HqNVyseJXB-ujPU_vr9-_9Clq_UlSD4=.1efe718d-40f7-47b5-94e2-59480ce77fcf@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <bm2UOEqMEVtYUY0JKGfF0mPV8UZ265iAfu7xsp57iA4=.a8fbe580-f5f7-44e6-9956-41bfa08a63c8@github.com>
 <WlgfuJYq-tk3HqNVyseJXB-ujPU_vr9-_9Clq_UlSD4=.1efe718d-40f7-47b5-94e2-59480ce77fcf@github.com>
Message-ID: <HcnaxpTmPvn5rEQjCxUS-cbm6KaZa_Ys86Nd8Iffhrw=.7183013c-3c0b-4924-9c70-e67b3a617146@github.com>

On Thu, 11 Nov 2021 22:17:41 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Some platform adjustments.
>
> LGTM. Thanks!

Thanks for reviewing @TheRealMDoerr .  @shipilev does this look good now?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From shade at openjdk.java.net  Fri Nov 12 14:07:37 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 12 Nov 2021 14:07:37 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2]
In-Reply-To: <bm2UOEqMEVtYUY0JKGfF0mPV8UZ265iAfu7xsp57iA4=.a8fbe580-f5f7-44e6-9956-41bfa08a63c8@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <bm2UOEqMEVtYUY0JKGfF0mPV8UZ265iAfu7xsp57iA4=.a8fbe580-f5f7-44e6-9956-41bfa08a63c8@github.com>
Message-ID: <k1elxbICUjZTYbWGNx1q7L95uqykLDVezTvQUfp63eU=.3639b4e9-d1ac-42d7-b625-d98523872e4c@github.com>

On Thu, 11 Nov 2021 16:19:01 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
>> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Some platform adjustments.

Yeah, I am fine with this.

-------------

Marked as reviewed by shade (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6343

From ihse at openjdk.java.net  Fri Nov 12 14:12:38 2021
From: ihse at openjdk.java.net (Magnus Ihse Bursie)
Date: Fri, 12 Nov 2021 14:12:38 GMT
Subject: Integrated: 8277012: Use blessed modifier order in src/utils
In-Reply-To: <Di7gReO-iIxY8RVWB9O_pc1uE3Adw0eVjbJjNKSEY9U=.136760ca-bc3f-4a16-ab6b-90da95e42ad2@github.com>
References: <Di7gReO-iIxY8RVWB9O_pc1uE3Adw0eVjbJjNKSEY9U=.136760ca-bc3f-4a16-ab6b-90da95e42ad2@github.com>
Message-ID: <4JP7H2okj8RsYPM_9UpEJKDJwjmryFGpfxf1_7vZICI=.020e8c9f-1035-4a5a-b0e8-d47c121833f9@github.com>

On Thu, 11 Nov 2021 14:32:18 GMT, Magnus Ihse Bursie <ihse at openjdk.org> wrote:

> I ran bin/blessed-modifier-order.sh on source code in src/utils. This script verifies that modifiers are in the "blessed" order, and fixes the order otherwise. I have manually checked the changes made by the script to make sure they are sound.
> 
> There is no clear ownership of this code, but I believe it's kind of hotspot-related.

This pull request has now been integrated.

Changeset: c4b44329
Author:    Magnus Ihse Bursie <ihse at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/c4b44329c1d250f790ca82dd419cdf3330da16f5
Stats:     25 lines in 10 files changed: 0 ins; 0 del; 25 mod

8277012: Use blessed modifier order in src/utils

Reviewed-by: dholmes, stuefe

-------------

PR: https://git.openjdk.java.net/jdk/pull/6354

From coleenp at openjdk.java.net  Fri Nov 12 14:25:58 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 14:25:58 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v3]
In-Reply-To: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
Message-ID: <9X29Q3z7nla7QEjfl1RqUQU6K3spIAuAtQFiCxNiCHM=.9c630211-5934-4b8f-ba39-987a2b41f845@github.com>

> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:

 - Merge master
 - Some platform adjustments.
 - 8258192: Obsolete the CriticalJNINatives flag

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6343/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=02
  Stats: 1849 lines in 24 files changed: 0 ins; 1673 del; 176 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6343.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6343/head:pull/6343

PR: https://git.openjdk.java.net/jdk/pull/6343

From pchilanomate at openjdk.java.net  Fri Nov 12 15:15:35 2021
From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo)
Date: Fri, 12 Nov 2021 15:15:35 GMT
Subject: RFR: 8276658: Clean up JNI local handles code [v2]
In-Reply-To: <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
 <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>
Message-ID: <Kh_8wu8rTxauoTYtwE-xaTvea4nGzNGglIDhrYL8rkw=.bd592672-c353-4cfa-9cc5-65ab44fa2361@github.com>

On Thu, 11 Nov 2021 13:58:06 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
>> Move the fields to JavaThread and add a JavaThread* argument.
>> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
>> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
>> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
>> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add _is_running initialization.

Hi Coleen,

Cleanup looks good to me.

Thanks,
Patricio

src/hotspot/share/jfr/dcmd/jfrDcmds.cpp line 181:

> 179:   JNIHandleMark jni_handle_management(THREAD);
> 180: 
> 181:   DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_vm(THREAD));

This method will call into Java below which already checks the thread is in vm so maybe this is not necessary. Even construct_dcmd_instance() has that assert.
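
For context, the pattern that `JNIHandleMark` factors out is a scoped push and pop of a handle block on the current thread. A toy standalone sketch of that RAII shape follows; the types `ToyThread`, `HandleBlock` and `HandleMark` are invented for illustration and do not mirror the real HotSpot declarations.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a block of local handles, linked to the previous block.
struct HandleBlock {
  std::vector<int> handles;
  HandleBlock* prev = nullptr;
};

// Toy stand-in for a thread that tracks its active handle block.
struct ToyThread {
  HandleBlock* active = nullptr;
};

// Pushes a fresh block for the lifetime of the mark and restores the
// previous block on scope exit, so temporary handles do not leak out.
class HandleMark {
  ToyThread* _thread;
  HandleBlock _block;
 public:
  explicit HandleMark(ToyThread* thread) : _thread(thread) {
    _block.prev = thread->active;
    thread->active = &_block;
  }
  ~HandleMark() { _thread->active = _block.prev; }
  HandleMark(const HandleMark&) = delete;
  HandleMark& operator=(const HandleMark&) = delete;
};
```

Handles created while the mark is alive land in the temporary block and disappear with it when the scope ends.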

-------------

Marked as reviewed by pchilanomate (Committer).

PR: https://git.openjdk.java.net/jdk/pull/6336

From duke at openjdk.java.net  Fri Nov 12 16:18:04 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Fri, 12 Nov 2021 16:18:04 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v3]
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <iON4l4nrF85-Ko8-Mm92ShAvM08-Vd9WCel12eFrv3Y=.2743317e-c21e-4632-aca6-c3129d00cc86@github.com>

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.
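
The sign-on-store, authenticate-on-load idea described above can be modelled in a few lines. This is a toy model only: real PAC uses a hardware cipher and keys that software cannot read, whereas the tag function below is a simple non-cryptographic mix chosen so the behaviour is easy to follow. All names (`toy_tag`, `toy_sign`, `toy_authenticate`), the 48-bit address width and the 16-bit tag are assumptions.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kAddrBits = 48;  // assumed virtual address width
constexpr std::uint64_t kAddrMask = (std::uint64_t{1} << kAddrBits) - 1;

// Toy tag over (address, modifier, key). NOT cryptographically secure.
std::uint64_t toy_tag(std::uint64_t addr, std::uint64_t modifier,
                      std::uint64_t key) {
  return (((addr ^ key) * 0x9E3779B97F4A7C15ull) + modifier) & 0xFFFFu;
}

// "Sign": place the tag in the unused upper bits of the return address.
// The modifier plays the role of the stack pointer at function entry.
std::uint64_t toy_sign(std::uint64_t addr, std::uint64_t modifier,
                       std::uint64_t key) {
  addr &= kAddrMask;
  return addr | (toy_tag(addr, modifier, key) << kAddrBits);
}

// "Authenticate": recompute the tag and strip it only if it matches.
bool toy_authenticate(std::uint64_t signed_ptr, std::uint64_t modifier,
                      std::uint64_t key, std::uint64_t* addr_out) {
  std::uint64_t addr = signed_ptr & kAddrMask;
  if (toy_sign(addr, modifier, key) != signed_ptr) return false;
  *addr_out = addr;
  return true;
}
```

In this model an attacker who overwrites the saved return address without knowing the key leaves a stale tag in the upper bits, so the check fails before the address is used.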

Alan Hayward has updated the pull request incrementally with two additional commits since the last revision:

 - Document pauth functions && remove OS split
 - Update UseROPProtection description

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6334/files
  - new: https://git.openjdk.java.net/jdk/pull/6334/files/29471d30..25e62492

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=01-02

  Stats: 369 lines in 9 files changed: 129 ins; 219 del; 21 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Fri Nov 12 16:18:07 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Fri, 12 Nov 2021 16:18:07 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To: <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <DqpP1khabg9YQPvJsdAZm5Bl-aV5tCBhuIC_2yeNvPU=.1dcc624c-b345-42de-acc6-0341a3118dc7@github.com>
Message-ID: <M6PojUDAbHBzvSUFQNiKhNNVRDq13Pk-7w9gxeo5qlE=.e9f1dcc4-8a6a-4ece-864b-81703b09a920@github.com>

On Thu, 11 Nov 2021 08:48:07 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work-in-progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify branch protection configure check
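
For readers new to PAC, the sign-on-store / authenticate-on-load scheme in the quoted description can be modelled roughly as below. This is only a toy sketch in Java, not the HotSpot code from this PR: the class name, the key, the 48-bit address split and the checksum are all assumptions made for illustration, and real hardware uses a dedicated cipher with keys that user code cannot read.

```java
class PacRetSketch {
    // Hypothetical constants, chosen only for this illustration.
    private static final long KEY = 0x5DEECE66DL;
    private static final long ODD_MULTIPLIER = 0x9E3779B1L;
    private static final int ADDRESS_BITS = 48;
    private static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Toy 16-bit code over (address, modifier, key). Not a real cipher.
    static long code(long address, long modifier) {
        long x = address ^ modifier ^ KEY;
        x ^= (x >>> 16) ^ (x >>> 32) ^ (x >>> 48); // fold all bits into the low 16
        return (x * ODD_MULTIPLIER) & 0xFFFFL;
    }

    // "Sign": place the code in the otherwise unused upper bits of the pointer.
    static long sign(long address, long modifier) {
        long plain = address & ADDRESS_MASK;
        return (code(plain, modifier) << ADDRESS_BITS) | plain;
    }

    // "Authenticate": recompute the code and strip it, or fail if it does not match.
    static long authenticate(long signed, long modifier) {
        long plain = signed & ADDRESS_MASK;
        if ((signed >>> ADDRESS_BITS) != code(plain, modifier)) {
            throw new IllegalStateException("return address failed authentication");
        }
        return plain;
    }

    // Convenience wrapper so callers can test a value without handling the exception.
    static boolean verifies(long signed, long modifier) {
        try {
            authenticate(signed, modifier);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long returnAddress = 0x00007F0012345678L; // made-up link register value
        long stackPointer = 0x00007FFFFFFFE000L;  // made-up modifier

        long onStack = sign(returnAddress, stackPointer);
        System.out.println("untouched value verifies: " + verifies(onStack, stackPointer));

        // An attacker overwrites the address bits but cannot recompute the code.
        long tampered = (onStack & ~ADDRESS_MASK) | 0x00007F0012349999L;
        System.out.println("tampered value verifies:  " + verifies(tampered, stackPointer));
    }
}
```

In this model the stack pointer plays the role of the modifier, so a signed value copied to a different frame fails the check as well.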

*Updated UseROPProtection message
*Moved pauth functions into single file
*Added comments
*Removed superfluous modifier arg from macroassembler funcs

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From coleenp at openjdk.java.net  Fri Nov 12 16:22:05 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 16:22:05 GMT
Subject: RFR: 8276658: Clean up JNI local handles code [v3]
In-Reply-To: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
Message-ID: <tgYmnsMtT1K7jZjH-ZABQj3omXBtC9-ERMSBiZtwO4g=.b445b735-baa4-4e28-9d4f-5b72bb21b43f@github.com>

> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
> Move the fields to JavaThread and add a JavaThread* argument.
> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Remove redundant assert.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6336/files
  - new: https://git.openjdk.java.net/jdk/pull/6336/files/f31dfeee..f24e32c1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6336.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6336/head:pull/6336

PR: https://git.openjdk.java.net/jdk/pull/6336

From coleenp at openjdk.java.net  Fri Nov 12 16:22:10 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 16:22:10 GMT
Subject: RFR: 8276658: Clean up JNI local handles code [v2]
In-Reply-To: <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
 <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>
Message-ID: <s9bSGRtg6MxlSQ1PvR-_GcvbYoXTjl4y39vJVWNnOF8=.5866f909-0363-4481-8ea9-877ed92dbbf1@github.com>

On Thu, 11 Nov 2021 13:58:06 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
>> Move the fields to JavaThread and add a JavaThread* argument.
>> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
>> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
>> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
>> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add _is_running initialization.

Thanks for the review, Patricio and David.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6336

From coleenp at openjdk.java.net  Fri Nov 12 16:22:15 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 16:22:15 GMT
Subject: RFR: 8276658: Clean up JNI local handles code [v2]
In-Reply-To: <Kh_8wu8rTxauoTYtwE-xaTvea4nGzNGglIDhrYL8rkw=.bd592672-c353-4cfa-9cc5-65ab44fa2361@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
 <QHf8KJvHHDGrFnM1hx0QhovzK6aB6YciS4y-GRfJagg=.681d8576-3afb-4a48-b21b-72b98a1f2b75@github.com>
 <Kh_8wu8rTxauoTYtwE-xaTvea4nGzNGglIDhrYL8rkw=.bd592672-c353-4cfa-9cc5-65ab44fa2361@github.com>
Message-ID: <lVqlMDPNjTYpJ8ia-O65kH_8S3QtadUVDxI1zS-d5Tg=.1f83db43-0d3c-4d1d-a2d2-55e8a7c20d46@github.com>

On Fri, 12 Nov 2021 15:08:24 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add _is_running initialization.
>
> src/hotspot/share/jfr/dcmd/jfrDcmds.cpp line 181:
> 
>> 179:   JNIHandleMark jni_handle_management(THREAD);
>> 180: 
>> 181:   DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_vm(THREAD));
> 
> This method will call into Java below which already checks the thread is in vm so maybe this is not necessary. Even construct_dcmd_instance() has that assert.

You're right, it's doubly redundant.  I'll remove it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6336

From coleenp at openjdk.java.net  Fri Nov 12 16:22:16 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 16:22:16 GMT
Subject: Integrated: 8276658: Clean up JNI local handles code
In-Reply-To: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
References: <ghp2JMOJnBtdd8Gu4gCIp98JraT8n7YOLvAiAJIrQBU=.8cafc56a-cf56-4209-9ba0-636325123d9e@github.com>
Message-ID: <T-7ZbmjBp5Qr8KH7CEVRrm_RCL2-W0pMRk6VZNobXlc=.a6e0cfa5-fd47-4e59-b401-c163f0c611d7@github.com>

On Wed, 10 Nov 2021 17:16:29 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread.
> Move the fields to JavaThread and add a JavaThread* argument.
> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that incorrectly claims to block for safepoint is removed.
> Lastly, there are at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code.
> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6.
> The commits in this change have been performance tested individually and together with no meaningful differences from mainline.

This pull request has now been integrated.

Changeset: 3b2585c0
Author:    Coleen Phillimore <coleenp at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/3b2585c02bd9d66cc2c8b2d5c16e9a48f4280d07
Stats:     425 lines in 25 files changed: 75 ins; 302 del; 48 mod

8276658: Clean up JNI local handles code

Reviewed-by: dholmes, pchilanomate

-------------

PR: https://git.openjdk.java.net/jdk/pull/6336

From kvn at openjdk.java.net  Fri Nov 12 17:02:44 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 12 Nov 2021 17:02:44 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v2]
In-Reply-To: <AGt0odrNeEUYj93qvQB8XxWsCrQm_9AnChHwPqKFUrs=.ebc30e7e-ef25-496e-a244-9424489cc7d2@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <AGt0odrNeEUYj93qvQB8XxWsCrQm_9AnChHwPqKFUrs=.ebc30e7e-ef25-496e-a244-9424489cc7d2@github.com>
Message-ID: <R_1E263OmEZbwAC-7JsZcGBsjcMmChhNXaYrgVFjFC4=.91f92351-ca13-4b21-8d5b-8f0a4d2b3e71@github.com>

On Fri, 12 Nov 2021 03:36:59 GMT, Dean Long <dlong at openjdk.org> wrote:

>> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
>> 1. added a version number to the replay file
>> 2. removed unused ci fields
>> 3. corrected comment in TestLambdas.java
>
> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - _current_mileage field is never used, stub out access
>  - initialize _version to 0
>  - remove comment

okay

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6344

From kvn at openjdk.java.net  Fri Nov 12 17:02:44 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 12 Nov 2021 17:02:44 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v2]
In-Reply-To: <aGMdFiFbr6ppAEV0jME1THKez5iMovFSbJt1WZuF08M=.527d2e56-1cec-4e44-a1be-8c1b13ae972b@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <L6nxoPoxbbnjCBHztg_rqU3mVSTqN2y6uey_opFwQrY=.41f9c40b-b796-44d8-a13a-34a267575773@github.com>
 <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com>
 <EA1Rjf1u8qqqW9DANJ68cFOC3_A59C3-dZp4Q_YWtis=.e613d724-e883-43a4-9d3e-f32714c1b91e@github.com>
 <aGMdFiFbr6ppAEV0jME1THKez5iMovFSbJt1WZuF08M=.527d2e56-1cec-4e44-a1be-8c1b13ae972b@github.com>
Message-ID: <qbKoMYfu-_k7uLV2rnfUaf1W000BbSAz6igpFiueGxQ=.6803703e-abc9-4ca7-b1ba-9c124f5874bc@github.com>

On Fri, 12 Nov 2021 09:30:38 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> It probably makes sense to turn this into a warning for now and file a follow up RFE as you have suggested.
>
> However, thinking again about this, it should not happen that we parse a version number that's not supported. Maybe we should keep the error as it is indeed unexpected. It should also be easy to check the replay file manually in the error case to see which version number it had. Old replay files should still work as there is no "version X" line. Maybe you should also add `|| _version < 0` on L644. 
> 
> But the RFE still makes sense to improve the error reporting and to think about a new flag to ignore all errors.

Yes, file separate RFE.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From coleenp at openjdk.java.net  Fri Nov 12 17:06:46 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 17:06:46 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2]
In-Reply-To: <k1elxbICUjZTYbWGNx1q7L95uqykLDVezTvQUfp63eU=.3639b4e9-d1ac-42d7-b625-d98523872e4c@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
 <bm2UOEqMEVtYUY0JKGfF0mPV8UZ265iAfu7xsp57iA4=.a8fbe580-f5f7-44e6-9956-41bfa08a63c8@github.com>
 <k1elxbICUjZTYbWGNx1q7L95uqykLDVezTvQUfp63eU=.3639b4e9-d1ac-42d7-b625-d98523872e4c@github.com>
Message-ID: <TlXZCJuajY4iECYEW0BRt7ztjb2DPZc5_VAvnbpRfz4=.a56d237d-81ea-4c4f-97af-9614d9402b00@github.com>

On Fri, 12 Nov 2021 14:04:42 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Some platform adjustments.
>
> Yeah, I am fine with this.

Thanks @shipilev. All the GHA checks passed after resolving the merge conflict above.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From coleenp at openjdk.java.net  Fri Nov 12 17:06:46 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 12 Nov 2021 17:06:46 GMT
Subject: Integrated: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
References: <ob57YhEZroN10gysG3zLpZ3Q8GJr2jllHF3MCKJZViw=.58908e4d-3b94-49d7-9fef-a501e001207d@github.com>
Message-ID: <Tz5dUNVpQ5TPKFYJyPIEUOanJONmBmfwVxUtLtWMDqg=.7e3c512d-7094-40e2-9ae3-f26859960854@github.com>

On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

This pull request has now been integrated.

Changeset: 0d2980cd
Author:    Coleen Phillimore <coleenp at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/0d2980cdd1486b0689a71fc107a1d4c100bd3025
Stats:     1849 lines in 24 files changed: 0 ins; 1673 del; 176 mod

8258192: Obsolete the CriticalJNINatives flag

Reviewed-by: mdoerr, shade

-------------

PR: https://git.openjdk.java.net/jdk/pull/6343

From aph at openjdk.java.net  Fri Nov 12 17:39:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 12 Nov 2021 17:39:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v3]
In-Reply-To: <iON4l4nrF85-Ko8-Mm92ShAvM08-Vd9WCel12eFrv3Y=.2743317e-c21e-4632-aca6-c3129d00cc86@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <iON4l4nrF85-Ko8-Mm92ShAvM08-Vd9WCel12eFrv3Y=.2743317e-c21e-4632-aca6-c3129d00cc86@github.com>
Message-ID: <YBpXXIgg-17dAfwyhI0TugcHh0pvX31nPmfCi_aQS8s=.d1c61b68-cc00-459e-a12a-e6d3d199b249@github.com>

On Fri, 12 Nov 2021 16:18:04 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work-in-progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5254:

> 5252: // Also use before signing to check that the pointer is valid and hasn't already been signed.
> 5253: //
> 5254: void MacroAssembler::check_return_address(Register return_reg) {

This commentary is excellent. Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From dlong at openjdk.java.net  Fri Nov 12 20:23:07 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 12 Nov 2021 20:23:07 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v3]
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <FVaO_tnbw0jKk3lm_FyRTqZgItnJLMlONpz1ETKxQ2I=.52b829d1-c38b-40c1-8afe-3566483c23be@github.com>

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

Dean Long has updated the pull request incrementally with two additional commits since the last revision:

 - turn version error into a warning
 - updated syntax comment

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6344/files
  - new: https://git.openjdk.java.net/jdk/pull/6344/files/20bea849..46fd3fac

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6344.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344

PR: https://git.openjdk.java.net/jdk/pull/6344

From dlong at openjdk.java.net  Fri Nov 12 20:25:37 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 12 Nov 2021 20:25:37 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v2]
In-Reply-To: <AGt0odrNeEUYj93qvQB8XxWsCrQm_9AnChHwPqKFUrs=.ebc30e7e-ef25-496e-a244-9424489cc7d2@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <AGt0odrNeEUYj93qvQB8XxWsCrQm_9AnChHwPqKFUrs=.ebc30e7e-ef25-496e-a244-9424489cc7d2@github.com>
Message-ID: <9HXkfUAMDo56djXYPGHCOEKl3dT3x-DA7vjGeY1P6Xw=.94c4d6af-dc9b-434d-83fe-3c8324794040@github.com>

On Fri, 12 Nov 2021 03:36:59 GMT, Dean Long <dlong at openjdk.org> wrote:

>> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
>> 1. added a version number to the replay file
>> 2. removed unused ci fields
>> 3. corrected comment in TestLambdas.java
>
> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - _current_mileage field is never used, stub out access
>  - initialize _version to 0
>  - remove comment

> I guess we could leave this in for old replay files with the initialization further down in ciReplay::initialize() if _version < 1.

Yes, it's necessary to parse the value for old replay files, but the value is never used.  I'm not sure what you are suggesting about the initialization further down.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From dlong at openjdk.java.net  Fri Nov 12 20:33:37 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 12 Nov 2021 20:33:37 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v3]
In-Reply-To: <FVaO_tnbw0jKk3lm_FyRTqZgItnJLMlONpz1ETKxQ2I=.52b829d1-c38b-40c1-8afe-3566483c23be@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <FVaO_tnbw0jKk3lm_FyRTqZgItnJLMlONpz1ETKxQ2I=.52b829d1-c38b-40c1-8afe-3566483c23be@github.com>
Message-ID: <NObr18i_1dcptDjYH7pMyf3lctzOzIuo9tCPGSxrztI=.a678bfe8-8122-4b4b-839b-e6983b2a3876@github.com>

On Fri, 12 Nov 2021 20:23:07 GMT, Dean Long <dlong at openjdk.org> wrote:

>> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
>> 1. added a version number to the replay file
>> 2. removed unused ci fields
>> 3. corrected comment in TestLambdas.java
>
> Dean Long has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - turn version error into a warning
>  - updated syntax comment

> However, thinking again about this, it should not happen that we parse a version number that's not supported

A user could be using an older JDK but accidentally try a newer replay file.  That was the scenario I had in mind.
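
To make that scenario concrete, here is a rough sketch of the kind of check being discussed, written in Java purely for illustration. It is not the code in ciReplay.cpp: the class name, the method names and the supported version number are assumptions; only the "version X" line and the default of 0 for old files come from this thread.

```java
import java.util.List;

class ReplayVersionSketch {
    // Hypothetical: the newest replay file version this reader understands.
    static final int SUPPORTED_VERSION = 1;

    // Returns the version declared by a "version N" line, or 0 when the file
    // has no such line (the case for replay files written before versioning).
    static int parseVersion(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("version ")) {
                return Integer.parseInt(trimmed.substring("version ".length()).trim());
            }
        }
        return 0;
    }

    // Rejects negative versions as well as versions newer than the reader.
    static boolean isSupported(int version) {
        return version >= 0 && version <= SUPPORTED_VERSION;
    }

    public static void main(String[] args) {
        // The non-version lines are placeholders, not real replay file syntax.
        List<String> oldFile = List.of("placeholder line from an old file");
        List<String> newerFile = List.of("version 2", "placeholder line");

        for (List<String> file : List.of(oldFile, newerFile)) {
            int version = parseVersion(file);
            if (isSupported(version)) {
                System.out.println("version " + version + ": ok");
            } else {
                System.out.println("warning: replay file version " + version
                        + " is not supported (newest known is " + SUPPORTED_VERSION + ")");
            }
        }
    }
}
```

An older JDK reading a newer file takes the warning branch here, while a file without any version line is treated as version 0 and still accepted.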

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From dlong at openjdk.java.net  Fri Nov 12 20:40:07 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 12 Nov 2021 20:40:07 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v4]
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <R-41UNuBHRN7YthwZ5eaXjUT_Jv5GTJsYFpZlMopisA=.9c9bf2c5-c087-4593-bcbf-47eb5707bbdd@github.com>

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

Dean Long has updated the pull request incrementally with one additional commit since the last revision:

  strengthen version check

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6344/files
  - new: https://git.openjdk.java.net/jdk/pull/6344/files/46fd3fac..0552e47a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=02-03

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6344.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344

PR: https://git.openjdk.java.net/jdk/pull/6344

From simonis at openjdk.java.net  Sat Nov 13 00:32:13 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Sat, 13 Nov 2021 00:32:13 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v9]
In-Reply-To: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
Message-ID: <Mz62iqNsk9VjyLEJt31DCrVO7gMjpLOSKHlhRoKeG3o=.ff201198-cc69-40fc-8ac0-1031fb3f7eac@github.com>

> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 cannot create implicit exceptions such as ArrayIndexOutOfBoundsException (AIOOBE) or NullPointerException in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such exceptions occur.
> 
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
> 
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
> 
> 
> ### Solution
> 
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
> 
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
> 
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5     1.432  ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   355.723  ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   887.068  ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  1274.418  ±   88.235  ns/op
> 
> 
> ### Implementation details
> 
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
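
For anyone who wants to experiment with the pattern outside Tomcat, a small self-contained sketch follows. It is not the micro-benchmark from the PR: the class name and the contents of `IS_ALPHA` are assumptions, and the second method is only an illustrative alternative that avoids the implicit exception altogether.

```java
class IsAlphaSketch {
    // Assumed lookup table: true for ASCII letters, false for everything else.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Same shape as the quoted method: an out-of-range index raises an
    // implicit ArrayIndexOutOfBoundsException that is used as control flow.
    static boolean isAlphaWithException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Equivalent result with an explicit range check, so no exception is thrown.
    static boolean isAlphaWithRangeCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '0', -1, 128, 1000};
        for (int c : samples) {
            System.out.println(c + " -> " + isAlphaWithException(c)
                    + " / " + isAlphaWithRangeCheck(c));
        }
    }
}
```

Both methods return the same answer for every input; they differ only in whether out-of-range input goes through the implicit exception path that this PR optimizes.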

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5488/files
  - new: https://git.openjdk.java.net/jdk/pull/5488/files/b3c130c8..536f5398

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=07-08

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From leonid.mesnik at oracle.com  Sat Nov 13 04:08:34 2021
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Sat, 13 Nov 2021 04:08:34 +0000
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
Message-ID: <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>

Hi

It is a HotSpot testing problem rather than a jtreg problem, so I've added the hotspot-dev at openjdk.java.net alias.

It seems the problem is that the WhiteBox API used in testing doesn't correspond to the JDK being tested.

This commit changed the WhiteBox.canWriteJavaHeapArchive() method:
https://github.com/openjdk/jdk/commit/922e86f4ff28c7b17af8e7b5867a40fc76b7fdd7#diff-e75d116b35afd951f114c2b0793b26d0009b441653d6b28d611afcbe0106dfd0

So you might see this linkage error if you try to test an older version of the JDK while the tests have these changes.

Could you please check that the sources you use for testing are exactly the same ones that were used to build the JDK?
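
As a minimal illustration of the failure mode (a sketch only, not the real WhiteBox class), the following Java program declares a native method for which no implementation is ever provided. The class name is hypothetical; the method name is borrowed from the stack trace in the original report.

```java
class WhiteBoxMismatchSketch {
    // Declared on the Java side, but nothing in the running VM or in any
    // loaded library supplies an implementation for it.
    static native boolean canWriteJavaHeapArchive();

    // Calls the native method and reports what happened as a string.
    static String probe() {
        try {
            return "value: " + canWriteJavaHeapArchive();
        } catch (UnsatisfiedLinkError e) {
            return "UnsatisfiedLinkError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

The first call fails at link time with `UnsatisfiedLinkError`, which is the same symptom you get when the test library declares a method that the JDK under test does not implement.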

Leonid

From: jtreg-use <jtreg-use-retn at openjdk.java.net> on behalf of Jaikiran Pai <jai.forums2013 at gmail.com>
Date: Friday, November 12, 2021 at 8:40 PM
To: jtreg-use at openjdk.java.net <jtreg-use at openjdk.java.net>
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In order to reproduce one of the issues I have been looking into, I've
been trying to run a jtreg test case against a Java 17 installation. The
command I use is:

java -jar jtreg.jar -jdk:<path-to-jdk-17-home>
test/jdk/java/..../SomeTest.java

This runs into the following exception:

failed to get value for vm.cds.write.archived.java.heap
java.lang.UnsatisfiedLinkError: 'boolean jdk.test.whitebox.WhiteBox.canWriteJavaHeapArchive()'
     at jdk.test.whitebox.WhiteBox.canWriteJavaHeapArchive(Native Method)
     at requires.VMProps.vmCDSCanWriteArchivedJavaHeap(VMProps.java:413)
     at requires.VMProps$SafeMap.put(VMProps.java:72)
     at requires.VMProps.call(VMProps.java:113)
     at requires.VMProps.call(VMProps.java:60)
     at com.sun.javatest.regtest.agent.GetJDKProperties.run(GetJDKProperties.java:80)
     at com.sun.javatest.regtest.agent.GetJDKProperties.main(GetJDKProperties.java:54)
Test results: failed: 1

Is this something I am doing wrong or is it some genuine issue? I
haven't been able to run jtreg against a downloaded/installed JDK for
many weeks now. Initially I thought I had somehow messed my local jdk
source repo setup so didn't pay much attention to the failures. But now,
I'm trying this on a completely different clean setup and that too runs
into this issue.

Here's the output of jtreg -version:

jtreg 6.1-dev+1
Installed in <some-location>\jtreg\lib\jtreg.jar
Running on platform version 17.0.1 from <some-location>\jdk-17.0.1.
Built with Java(TM) 2 SDK, Version 1.8.0_312-b07 on November 12, 2021.
Copyright (c) 1999, 2021, Oracle and/or its affiliates. All rights reserved.
Use is subject to license terms.
JT Harness, version 6.0 ea b14 (November 12, 2021)
JCov 3.0-2
Java Assembler Tools, version 7.0 ea b09 (November 12, 2021)
TestNG (testng.jar): version 7.3.0
TestNG (jcommander.jar): version unknown
TestNG (guice.jar): version 4.2.3
JUnit (junit.jar): version 4.13.2
JUnit (hamcrest.jar): version 2.2

-Jaikiran



From jai.forums2013 at gmail.com  Sat Nov 13 05:37:46 2021
From: jai.forums2013 at gmail.com (Jaikiran Pai)
Date: Sat, 13 Nov 2021 11:07:46 +0530
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
 <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>
 <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
Message-ID: <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>

I got past this with an extensive workaround for now. I moved/copied 
that test case java file outside of the JDK source tree, then created a 
new/custom TEST.ROOT which is very minimal and has no reference to 
whitebox for bootlibs, then made sure the jtwork directory is also 
outside of the JDK source tree (so that the test is compiled afresh) and 
then ran that test. That helped, but it's only for this test since its 
requirements in the test are very minimal. I don't see a way to get past 
this if I have to run the wider range of jtreg tests that reside in the 
JDK source tree against a pre-built/downloaded Java 17 or any previous 
versions.

-Jaikiran

On 13/11/21 10:26 am, Jaikiran Pai wrote:
> Hello Leonid,
>
> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>> Hi
>>
>> It is a hotspot testing problem rather than a jtreg problem, so I've 
>> added the hotspot-dev at openjdk.java.net alias.
> Thank you for adding the right list.
>> ...
>> Could you please check that you use exactly the same sources during 
>> testing which have been used to build JDK.
>
> Do you mean the sources of the JDK against which the test is being 
> run? I don't have those sources since this test runs against a 
> pre-built binary downloaded from https://jdk.java.net/17/
>
> -Jaikiran
>

From stuefe at openjdk.java.net  Sat Nov 13 06:11:43 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Sat, 13 Nov 2021 06:11:43 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
Message-ID: <vouhrlkCHQrvOjQ3R5-cu-b3TBgwUHvlQKp72NMC0zE=.494e203a-b403-4ea0-898e-09538853063c@github.com>

On Thu, 11 Nov 2021 06:30:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is one of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot. For the whole story please refer to https://bugs.openjdk.java.net/browse/JDK-8275301.
>> 
>> This proposal adds NMT buffer overflow checking. As laid out in JDK-8275301:
>> 
>> - it would give us C-heap overflow checking in release builds
>> - the additional costs are negligible
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. The error reports would also be confusing.
>> - it is a preparation for future code removal (the memory guarding done in debug only in os::malloc() and friends, and possibly the guarding done with CheckJNICalls)
>> 
>> Patch notes:
>> 
>> 1) The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. 
>> 
>> On 64-bit, we don't even need to enlarge the malloc header: we carve some bits out by decreasing the size of the bucket index bit field to 16 bits. The bucket index field is used to store the bucket slot of the malloc site table in NMT detail mode. The malloc site table width is 512 atm, so 65k gives plenty of room for growing the malloc site table should we ever want to.
>> 
>> On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes. That is because there were not enough bits to spare for a canary. On the upside, 8 bytes were not enough anyway, strictly speaking, to guarantee proper alignment e.g. for 128bit data types on all 32-bit platforms. See e.g. the malloc alignment the glibc uses.
>> 
>> I also took the freedom of re-arranging the malloc header fields a bit to minimize the difference between 32-bit and 64-bit platforms, and to align each field optimally according to its size. I also switched from bitfields to real types in order to be able to do a sizeof() on them.
>> 
>> For more details, see the comment in mallocTracker.hpp.
>> 
>> 2) I added a footer canary trailing the user allocation to catch tail buffer overruns. For simplicity (alignment) and to save some cycles I made it a byte only. That is enough to catch most overrun scenarios. If you think this is too small, I'm open to changing it.
>> 
>> 3) I put a bit of work into error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>> 
>> 4) I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>> 
>> (Note that these gtests, to test anything, need to run with NMT switched on. We do this as part of our NMT jtreg-controlled gtests in tier1).
>> 
>> Even though the patch adds more code than it removes, it prepares possible code removal (if we can agree to do that) and the net result will be less complexity, not more. Again, see JDK-8275301 for details.
>> 
>> --------------
>> 
>> Example output a buffer overrun would provide:
>> 
>> 
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
>> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>> 
>> -------
>> 
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 14 days in a row without problems
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge
>  - Let NMT do overflow detection

Friendly Ping.
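
For readers new to the thread, the header/footer canary idea from the quoted patch notes can be sketched roughly as below. This is a toy model, not the code from the PR: the struct layout, the canary values, and the names `ToyHeader`, `toy_malloc`, `toy_check`, and `toy_free` are all made up for illustration. Only the general shape (a 16-bit canary directly before the payload, a one-byte canary directly after it) is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical canary values, chosen only for this sketch.
static const uint16_t kHeaderCanary = 0xE99E;
static const uint8_t  kFooterCanary = 0xE8;

// Toy header: the 16-bit canary is the last field, so it sits
// directly in front of the user payload.
struct ToyHeader {
  uint64_t size;    // payload size in bytes
  uint32_t pad;     // stand-in for flags and similar bookkeeping
  uint16_t bucket;  // stand-in for the 16-bit bucket index
  uint16_t canary;  // immediately precedes the payload
};
static_assert(sizeof(ToyHeader) == 16, "toy header is expected to be 16 bytes");

enum ToyStatus { TOY_OK, TOY_HEADER_BROKEN, TOY_FOOTER_BROKEN };

// Allocate header + payload + one footer byte and plant both canaries.
void* toy_malloc(size_t size) {
  uint8_t* raw = static_cast<uint8_t*>(std::malloc(sizeof(ToyHeader) + size + 1));
  if (raw == nullptr) return nullptr;
  ToyHeader* h = reinterpret_cast<ToyHeader*>(raw);
  h->size = size;
  h->pad = 0;
  h->bucket = 0;
  h->canary = kHeaderCanary;
  uint8_t* payload = raw + sizeof(ToyHeader);
  payload[size] = kFooterCanary;  // footer byte trails the payload
  return payload;
}

// Check both canaries of a block returned by toy_malloc.
ToyStatus toy_check(const void* p) {
  assert(p != nullptr);
  const uint8_t* payload = static_cast<const uint8_t*>(p);
  const ToyHeader* h =
      reinterpret_cast<const ToyHeader*>(payload - sizeof(ToyHeader));
  if (h->canary != kHeaderCanary) return TOY_HEADER_BROKEN;         // underrun
  if (payload[h->size] != kFooterCanary) return TOY_FOOTER_BROKEN;  // overrun
  return TOY_OK;
}

void toy_free(void* p) {
  if (p == nullptr) return;
  std::free(static_cast<uint8_t*>(p) - sizeof(ToyHeader));
}
```

Writing one byte past an 8-byte payload flips `toy_check` from `TOY_OK` to `TOY_FOOTER_BROKEN`, which is the case the example output in the description reports.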

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From david.holmes at oracle.com  Sat Nov 13 06:37:53 2021
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 13 Nov 2021 16:37:53 +1000
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
 <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>
 <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
 <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
Message-ID: <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>

On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
> I got past this with an extensive workaround for now. I moved/copied 
> that test case java file outside of the JDK source tree, then created a 
> new/custom TEST.ROOT which is very minimal and has no reference to 
> whitebox for bootlibs, then made sure the jtwork directory is also 
> outside of the JDK source tree (so that the test is compiled afresh) and 
> then ran that test. That helped, but it's only for this test since its 
> requirements in the test are very minimal. I don't see a way to get past 
> this if I have to run the wider range of jtreg tests that reside in the 
> JDK source tree against a pre-built/downloaded Java 17 or any previous 
> versions.

Basically you're not supposed to do that. You have to test a given 
binary with the tests that existed when that binary was built. Many 
things in the tests can change that will fail to run with an older JDK.

In theory you can use the build number of the binary JDK to checkout the 
tests corresponding to that build using the appropriate build tag.

Cheers,
David

> -Jaikiran
> 
> On 13/11/21 10:26 am, Jaikiran Pai wrote:
>> Hello Leonid,
>>
>> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>>> Hi
>>>
>>> It is a hotspot testing problem rather than a jtreg problem, so I've 
>>> added the hotspot-dev at openjdk.java.net alias.
>> Thank you for adding the right list.
>>> ...
>>> Could you please check that you use exactly the same sources during 
>>> testing which have been used to build JDK.
>>
>> Do you mean the sources of the JDK against which the test is being 
>> run? I don't have those sources since this test runs against a 
>> pre-built binary downloaded from https://jdk.java.net/17/
>>
>> -Jaikiran
>>

From thomas.stuefe at gmail.com  Sat Nov 13 07:56:40 2021
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Sat, 13 Nov 2021 08:56:40 +0100
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
 <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>
 <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
 <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
 <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
Message-ID: <CAA-vtUzt=efM-HRZP3cM1tBFbGWjZj-XGjtCZa6dC3kS5NZ7yQ@mail.gmail.com>

Maybe the easiest way for you would be to get the source drop matching the
binary JDK from the vendor of your JDK, since you may also have
vendor-specific changes (albeit rare, it's possible).

Cheers, Thomas


On Sat, Nov 13, 2021 at 7:38 AM David Holmes <david.holmes at oracle.com>
wrote:

> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
> > I got past this with an extensive workaround for now. I moved/copied
> > that test case java file outside of the JDK source tree, then created a
> > new/custom TEST.ROOT which is very minimal and has no reference to
> > whitebox for bootlibs, then made sure the jtwork directory is also
> > outside of the JDK source tree (so that the test is compiled afresh) and
> > then ran that test. That helped, but it's only for this test since its
> > requirements in the test are very minimal. I don't see a way to get past
> > this if I have to run the wider range of jtreg tests that reside in the
> > JDK source tree against a pre-built/downloaded Java 17 or any previous
> > versions.
>
> Basically you're not supposed to do that. You have to test a given
> binary with the tests that existed when that binary was built. Many
> things in the tests can change that will fail to run with an older JDK.
>
> In theory you can use the build number of the binary JDK to checkout the
> tests corresponding to that build using the appropriate build tag.
>
> Cheers,
> David
>
> > -Jaikiran
> >
> > On 13/11/21 10:26 am, Jaikiran Pai wrote:
> >> Hello Leonid,
> >>
> >> On 13/11/21 9:38 am, Leonid Mesnik wrote:
> >>> Hi
> >>>
> >>> It is a hotspot testing problem rather than a jtreg problem, so I've
> >>> added the hotspot-dev at openjdk.java.net alias.
> >> Thank you for adding the right list.
> >>> ...
> >>> Could you please check that you use exactly the same sources during
> >>> testing which have been used to build JDK.
> >>
> >> Do you mean the sources of the JDK against which the test is being
> >> run? I don't have those sources since this test runs against a
> >> pre-built binary downloaded from https://jdk.java.net/17/
> >>
> >> -Jaikiran
> >>
>

From jai.forums2013 at gmail.com  Sat Nov 13 08:28:14 2021
From: jai.forums2013 at gmail.com (Jaikiran Pai)
Date: Sat, 13 Nov 2021 13:58:14 +0530
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
 <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>
 <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
 <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
 <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
Message-ID: <1fb7457b-d933-836a-5f97-f078a320ab72@gmail.com>

Hello David,

On 13/11/21 12:07 pm, David Holmes wrote:
> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
>> I got past this with an extensive workaround for now. I moved/copied 
>> that test case java file outside of the JDK source tree, then created 
>> a new/custom TEST.ROOT which is very minimal and has no reference to 
>> whitebox for bootlibs, then made sure the jtwork directory is also 
>> outside of the JDK source tree (so that the test is compiled afresh) 
>> and then ran that test. That helped, but it's only for this test 
>> since its requirements in the test are very minimal. I don't see a 
>> way to get past this if I have to run the wider range of jtreg tests 
>> that reside in the JDK source tree against a pre-built/downloaded 
>> Java 17 or any previous versions.
>
> Basically you're not supposed to do that. 

I wasn't aware of that. I used to use this method to selectively run 
newly added jtreg tests against different downloaded versions of JDK and 
assumed it was a supported use case.

Thanks everyone for the inputs.

-Jaikiran


From joe.darcy at oracle.com  Sat Nov 13 17:48:36 2021
From: joe.darcy at oracle.com (Joe Darcy)
Date: Sat, 13 Nov 2021 09:48:36 -0800
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <CAA-vtUzt=efM-HRZP3cM1tBFbGWjZj-XGjtCZa6dC3kS5NZ7yQ@mail.gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
 <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>
 <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
 <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
 <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
 <CAA-vtUzt=efM-HRZP3cM1tBFbGWjZj-XGjtCZa6dC3kS5NZ7yQ@mail.gmail.com>
Message-ID: <0f2c573f-ae6c-c52d-c3a9-e1a92baf6f2d@oracle.com>

And the SCM hashes used to create a JDK build are one of the pieces of 
information in the $JDK/release file.
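
As a rough illustration, a small helper could pull that hash out of the file. The sketch below assumes the `release` file holds `KEY="value"` lines and that the source revision is stored under a `SOURCE` key with a value shaped like `.:git:<hash>`. Both are assumptions about the layout of a particular build, so check your own `$JDK/release` first. The function names `release_property` and `scm_hash` are made up.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Return the value of a KEY="value" line from release-file text,
// or an empty string if the key is not present.
std::string release_property(std::istream& in, const std::string& key) {
  assert(!key.empty());
  const std::string prefix = key + "=";
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) != 0) continue;
    std::string value = line.substr(prefix.size());
    // Tolerate Windows line endings.
    if (!value.empty() && value.back() == '\r') value.pop_back();
    // Strip one pair of surrounding double quotes, if present.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
      value = value.substr(1, value.size() - 2);
    }
    return value;
  }
  return "";
}

// Assumed value format: ".:git:<hash>". Returns the part after the last ':'.
std::string scm_hash(const std::string& source_value) {
  const std::string::size_type pos = source_value.rfind(':');
  return pos == std::string::npos ? source_value : source_value.substr(pos + 1);
}
```

To use it on a real installation, pass a `std::ifstream` opened on `$JDK/release`. If the value does have the assumed shape, the returned hash can then be checked out in a clone of the matching source repository to get the tests for that build.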

-Joe

On 11/12/2021 11:56 PM, Thomas Stüfe wrote:
> Maybe the easiest way for you would be to get the source drop matching the
> binary JDK from the vendor of your JDK, since you may also have
> vendor-specific changes (albeit rare, it's possible).
>
> Cheers, Thomas
>
>
> On Sat, Nov 13, 2021 at 7:38 AM David Holmes <david.holmes at oracle.com>
> wrote:
>
>> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
>>> I got past this with an extensive workaround for now. I moved/copied
>>> that test case java file outside of the JDK source tree, then created a
>>> new/custom TEST.ROOT which is very minimal and has no reference to
>>> whitebox for bootlibs, then made sure the jtwork directory is also
>>> outside of the JDK source tree (so that the test is compiled afresh) and
>>> then ran that test. That helped, but it's only for this test since its
>>> requirements in the test are very minimal. I don't see a way to get past
>>> this if I have to run the wider range of jtreg tests that reside in the
>>> JDK source tree against a pre-built/downloaded Java 17 or any previous
>>> versions.
>> Basically you're not supposed to do that. You have to test a given
>> binary with the tests that existed when that binary was built. Many
>> things in the tests can change that will fail to run with an older JDK.
>>
>> In theory you can use the build number of the binary JDK to checkout the
>> tests corresponding to that build using the appropriate build tag.
>>
>> Cheers,
>> David
>>
>>> -Jaikiran
>>>
>>> On 13/11/21 10:26 am, Jaikiran Pai wrote:
>>>> Hello Leonid,
>>>>
>>>> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>>>>> Hi
>>>>>
>>>>> It is a hotspot testing problem rather than a jtreg problem, so I've
>>>>> added the hotspot-dev at openjdk.java.net alias.
>>>> Thank you for adding the right list.
>>>>> ...
>>>>> Could you please check that you use exactly the same sources during
>>>>> testing which have been used to build JDK.
>>>> Do you mean the sources of the JDK against which the test is being
>>>> run? I don't have those sources since this test runs against a
>>>> pre-built binary downloaded from https://jdk.java.net/17/
>>>>
>>>> -Jaikiran
>>>>

From duke at openjdk.java.net  Mon Nov 15 09:07:11 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 09:07:11 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of
> them and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, all the changes mentioned above would be
> required.
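
To make the sign/authenticate idea in the description concrete, here is a toy model in plain C++. It does not use the real PAC instructions or keys. The "signature" is a deliberately weak, made-up XOR fold of the address, a context value (standing in for the stack pointer), and a secret key, stored in the top 16 bits of a 64-bit value; real hardware uses a keyed cryptographic function instead. All names (`toy_sign`, `toy_auth`, and so on) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Low 48 bits hold the address; the top 16 bits are free for the signature.
static const uint64_t kAddrMask = 0x0000FFFFFFFFFFFFULL;

// Fold a 64-bit value into 16 bits by XOR-ing its four 16-bit chunks.
uint64_t toy_fold(uint64_t x) {
  return (x ^ (x >> 16) ^ (x >> 32) ^ (x >> 48)) & 0xFFFF;
}

// Made-up 16-bit signature over address, context, and key.
uint64_t toy_mac(uint64_t addr, uint64_t context, uint64_t key) {
  return toy_fold(addr) ^ toy_fold(context) ^ toy_fold(key);
}

// "Sign" a return address before it is stored on the stack.
uint64_t toy_sign(uint64_t addr, uint64_t context, uint64_t key) {
  assert((addr & ~kAddrMask) == 0);  // top bits must be unused
  return addr | (toy_mac(addr, context, key) << 48);
}

// "Authenticate" a value loaded back from the stack. On success the plain
// address is written to *out; on failure *out is left untouched.
bool toy_auth(uint64_t signed_ptr, uint64_t context, uint64_t key,
              uint64_t* out) {
  const uint64_t addr = signed_ptr & kAddrMask;
  if ((signed_ptr >> 48) != toy_mac(addr, context, key)) return false;
  *out = addr;
  return true;
}
```

In this model, an attacker who overwrites the stored return address without knowing the key ends up with a signature that no longer matches, so `toy_auth` fails. That failure stands in for the authentication check that causes the segfault on function return in the real scheme.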

Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:

 - Merge master
 - Document pauth functions && remove OS split
 - Update UseROPProtection description
 - Simplify branch protection configure check
 - 8264130: PAC-RET protection for Linux/AArch64
   
 - Add PAC assembly instructions
 - Add AArch64 ROP protection runtime flag
 - Build with branch protection

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6334/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=03
  Stats: 1436 lines in 25 files changed: 490 ins; 150 del; 796 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Mon Nov 15 10:15:40 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 10:15:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
Message-ID: <9F2FQ7FTjc4Jzjf63x0pKeb2VPMsjcPQ-iQUo_rwCf4=.16d9e002-e6c9-4a4a-922c-ccbdf6e00eab@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of
>> them and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>    
>  - Add PAC assembly instructions
>  - Add AArch64 ROP protection runtime flag
>  - Build with branch protection

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:

> 450: 
> 451:   // only r0 is valid at this time, all other registers have been destroyed by the runtime call
> 452:   __ invalidate_registers(false, true, true, true, true, true);

Not so: `lr` is live.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Mon Nov 15 10:15:41 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Mon, 15 Nov 2021 10:15:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
Message-ID: <9psGxGDAGJTaAW2jtH3v3A6jsuq4x7aOMXMgJEyeLLI=.21f995de-9192-4483-a378-0a54e3d3745d@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>  - Add PAC assembly instructions
>  - Add AArch64 ROP protection runtime flag
>  - Build with branch protection

src/hotspot/cpu/aarch64/pauth_aarch64.hpp line 33:

> 31: 
> 32: // Support for ROP Protection in VM code.
> 33: // This is provided by via the AArch64 PAC feature.

"by via" should just be "via"

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Mon Nov 15 10:18:42 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 10:18:42 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
Message-ID: <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>  - Add PAC assembly instructions
>  - Add AArch64 ROP protection runtime flag
>  - Build with branch protection

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:

> 450:   // patch the return address, this stub will directly return to the exception handler
> 451:   __ str(r0, Address(rfp, 1*BytesPerWord));
> 452: 

Please explain the reason for this change, that leaves `lr` live across `restore_live_registers()`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Mon Nov 15 10:23:38 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 10:23:38 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
Message-ID: <jsMk_t2ZCmDPuJxtAknP2eTvCg38xAJZ3AMkSz_bxDQ=.c23d56a6-f22f-4d93-8535-22369831a73b@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>  - Add PAC assembly instructions
>  - Add AArch64 ROP protection runtime flag
>  - Build with branch protection

src/hotspot/cpu/aarch64/pauth_aarch64.hpp line 132:

> 130: // Authenticate or strip a return value. Use for efficiency and only when the safety of the data
> 131: // isn't an issue - for example when viewing the stack.
> 132: //

So, whether this function authenticates or strips the address depends only on debugging? The vague name makes the callers hard to read.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Mon Nov 15 10:31:52 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Mon, 15 Nov 2021 10:31:52 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
Message-ID: <464NS7NEldWHQM0Q8PF_MzPfl9O0CUj9GjbeI_qdjEc=.758f4244-8239-4e5d-bb08-a0dec85c2a06@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>  - Add PAC assembly instructions
>  - Add AArch64 ROP protection runtime flag
>  - Build with branch protection

This is much clearer and looks good to push modulo a minor typo I noted in a comment.

-------------

Marked as reviewed by adinn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Mon Nov 15 10:42:38 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 10:42:38 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <AslxqdYQYOyLnckgXKHb9yB5_UyxQbliLC2DeVIHpG8=.14fb94d9-2cde-4343-b4f1-f7c7c7eeb44f@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <qyoqdCskYNR6Q1WG3fZP-XMWMdM1Uwg8k7nJhFQzoN0=.f41ee0da-eda2-40dd-99c5-9931964b6953@github.com>
 <AslxqdYQYOyLnckgXKHb9yB5_UyxQbliLC2DeVIHpG8=.14fb94d9-2cde-4343-b4f1-f7c7c7eeb44f@github.com>
Message-ID: <hic5h0qN6ooVETV6RFDDBuXA9owugtVoEY9Pwx1_rvw=.9218aa98-fd4c-4d8c-86b2-4101d4633394@github.com>

On Wed, 10 Nov 2021 15:01:51 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> src/hotspot/os_cpu/bsd_aarch64/pauth_bsd_aarch64.inline.hpp line 25:
>> 
>>> 23:  */
>>> 24: 
>>> 25: #ifndef OS_CPU_BSD_AARCH64_PAUTH_BSD_AARCH64_INLINE_HPP
>> 
>> Are these two files different enough to separate them for BSD and Linux?
>
> My motivation was to avoid having any ifdefs - but we need one anyway for the apple ifdef.
> 
> If I merged the two we would end up with just the contents of the BSD version of the file.
> 
> There is also the windows version of the file, which for now has empty functions. If PAC in windows is added, that'll either use the same code or maybe Windows will provide an API (like the Apple one). Merging everything would mean windows gains the UseROPProtection check.

>Are these two files different enough to separate them for BSD and Linux?

Merging these files then broke everything for Windows (because the asm function is different). Having an "ifdef apple, elseif windows, else" chain doesn't really make sense, so I'll split the files out again.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Mon Nov 15 11:01:36 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 11:01:36 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <jsMk_t2ZCmDPuJxtAknP2eTvCg38xAJZ3AMkSz_bxDQ=.c23d56a6-f22f-4d93-8535-22369831a73b@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
 <jsMk_t2ZCmDPuJxtAknP2eTvCg38xAJZ3AMkSz_bxDQ=.c23d56a6-f22f-4d93-8535-22369831a73b@github.com>
Message-ID: <yyWNr32v4R15a7xWsOf2e5BcDAdsraKZsHDgC7fXpCc=.dfb1f70b-676f-4cf5-b3bb-e3a3ce09ef86@github.com>

On Mon, 15 Nov 2021 10:20:15 GMT, Andrew Haley <aph at openjdk.org> wrote:

>whether this function authenticates or strips the address depends only on debugging?

Yes. We only need to strip the value, because we're not jumping to the lr value, only viewing it.

The interface is different to a strip (as we need to pass in the modifier). 

How about something like pauth_authenticate_fast() ? or pauth_authenticate_unsafe() ?

Alternatively, this function is only called by the functions in Frame, so the frequency of use is probably low enough (compared to the sign/auth in every function) that it's not going to cause any performance issues. So we could just replace it with calls to pauth_authenticate. I think that might be the best option.
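
To make the strip-versus-authenticate distinction concrete, here is a toy C++ model. It is only a sketch: it is not the code in this PR and not the hardware algorithm. The `toy_*` names, the 48-bit address width, the key parameter and the hash constant are all assumptions made up for illustration. The point it shows is that a strip needs no modifier and checks nothing, whereas an authenticate needs the same modifier that was used for signing and can fail.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model only: real PAC uses a hardware cipher and per-process keys.
constexpr int kAddrBits = 48;  // assumed virtual address width
constexpr uint64_t kAddrMask = (uint64_t{1} << kAddrBits) - 1;

// Hypothetical stand-in for the hardware signature computation.
// Returns a 16-bit value derived from the address, modifier and key.
inline uint64_t toy_pac(uint64_t addr, uint64_t modifier, uint64_t key) {
  uint64_t h = (addr ^ modifier ^ key) * 0x9E3779B97F4A7C15ULL;
  return h >> kAddrBits;
}

// Sign: place the signature in the unused upper bits of the address.
inline uint64_t toy_sign(uint64_t addr, uint64_t modifier, uint64_t key) {
  assert((addr & ~kAddrMask) == 0 && "address must fit in kAddrBits");
  return addr | (toy_pac(addr, modifier, key) << kAddrBits);
}

// Strip: drop the signature bits without checking them. No modifier needed.
inline uint64_t toy_strip(uint64_t signed_addr) {
  return signed_addr & kAddrMask;
}

// Authenticate: recompute the signature and compare.
// Returns the plain address on success, std::nullopt on a mismatch.
inline std::optional<uint64_t> toy_authenticate(uint64_t signed_addr,
                                                uint64_t modifier,
                                                uint64_t key) {
  uint64_t addr = toy_strip(signed_addr);
  if (toy_sign(addr, modifier, key) != signed_addr) {
    return std::nullopt;
  }
  return addr;
}
```

In this model, a stack walker that only wants to print a return address can call `toy_strip`, while anything that is about to branch to the address would call `toy_authenticate` and treat a failure as fatal.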

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Mon Nov 15 11:11:37 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 11:11:37 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <yyWNr32v4R15a7xWsOf2e5BcDAdsraKZsHDgC7fXpCc=.dfb1f70b-676f-4cf5-b3bb-e3a3ce09ef86@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
 <jsMk_t2ZCmDPuJxtAknP2eTvCg38xAJZ3AMkSz_bxDQ=.c23d56a6-f22f-4d93-8535-22369831a73b@github.com>
 <yyWNr32v4R15a7xWsOf2e5BcDAdsraKZsHDgC7fXpCc=.dfb1f70b-676f-4cf5-b3bb-e3a3ce09ef86@github.com>
Message-ID: <mG3X2wbGNgWTvBJvptINmYx-_s4f8Xw4iIkNJkT1tsU=.29d21bb1-ae4e-4b2b-b9a3-6b4c2db0e52c@github.com>

On Mon, 15 Nov 2021 10:58:06 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> src/hotspot/cpu/aarch64/pauth_aarch64.hpp line 132:
>> 
>>> 130: // Authenticate or strip a return value. Use for efficiency and only when the safety of the data
>>> 131: // isn't an issue - for example when viewing the stack.
>>> 132: //
>> 
>> So, whether this function authenticates or strips the address depends only on debugging? The vague name makes the callers hard to read.
>
>>whether this function authenticates or strips the address depends only on debugging?
> 
> Yes. We only need to strip the value, because we're not jumping to the lr value, only viewing it.
> 
> The interface is different to a strip (as we need to pass in the modifier). 
> 
> How about something like pauth_authenticate_fast() ? or pauth_authenticate_unsafe() ?
> 
> Alternatively, this function is only called by the functions in Frame, so the frequency of use is probably low enough (compared to the sign/auth in every function) that it's not going to cause any performance issues. So we could just replace it with calls to pauth_authenticate. I think that might be the best option.

A simple rule here: function names go with what the release version does. So I'd go with the actual purpose, which is `pauth_strip_addr_for_debuginfo()`. That's right, isn't it? You only want this thing for stack traces, logs, etc.
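
As a rough illustration of that naming rule, the sketch below names the function after what it does in a release build (strip) and keeps the extra check as a debug-only assertion. Everything here is hypothetical: the `toy_` prefix, the 48-bit address width and the trivial signature check are assumptions for the example, not the PR's implementation.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kToyAddrMask = (uint64_t{1} << 48) - 1;  // assumed 48-bit addresses

// Hypothetical checker standing in for a real authenticate: the toy
// "signature" in the top 16 bits is the low 16 bits of (address ^ modifier).
inline bool toy_signature_ok(uint64_t signed_addr, uint64_t modifier) {
  return (signed_addr >> 48) == ((signed_addr ^ modifier) & 0xFFFF);
}

// Named for its release behaviour: it strips. Debug builds additionally
// verify the signature, but callers never rely on that check.
inline uint64_t toy_strip_addr_for_debuginfo(uint64_t signed_addr,
                                             uint64_t modifier) {
#ifndef NDEBUG
  assert(toy_signature_ok(signed_addr, modifier) && "corrupt return address");
#else
  (void)modifier;  // unused when assertions are compiled out
#endif
  return signed_addr & kToyAddrMask;
}
```

A caller that reads `toy_strip_addr_for_debuginfo(...)` can tell at a glance that the result is only suitable for stack traces and logs, which is the readability concern raised above.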

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Mon Nov 15 11:24:46 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 11:24:46 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
 <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>
Message-ID: <CrzmKGLq1Oc7XXvZ5TLWZ7FA4UASqZAtyCJs969Bvg8=.b8c82b75-cb7b-4ab4-a3f1-bdad09b5cd4e@github.com>

On Mon, 15 Nov 2021 10:15:41 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>> 
>>  - Merge master
>>  - Document pauth functions && remove OS split
>>  - Update UseROPProtection description
>>  - Simplify branch protection configure check
>>  - 8264130: PAC-RET protection for Linux/AArch64
>>    
>>    PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>>    of its uses is to protect against ROP based attacks. This is done by
>>    signing the Link Register whenever it is stored on the stack, and
>>    authenticating the value when it is loaded back from the stack. If an
>>    attacker were to try to change control flow by editing the stack then
>>    the authentication check of the Link Register will fail, causing a
>>    segfault when the function returns.
>>    
>>    On a system with PAC enabled, it is expected that all applications will
>>    be compiled with ROP protection. Fedora 33 and upwards already provide
>>    this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>>    PAC instructions that exist in the NOP space - on hardware without PAC,
>>    these instructions act as NOPs, allowing backward compatibility for
>>    negligible performance cost (2 NOPs per non-leaf function).
>>    
>>    Hardware is currently limited to the Apple M1 MacBooks. All testing has
>>    been done within a Fedora Docker image. A run of SpecJVM showed no
>>    difference to that of noise - which was surprising.
>>    
>>    The most important part of this patch is simply compiling using branch
>>    protection provided by GCC/LLVM. This protects all C++ code from being
>>    used in ROP attacks, removing all static ROP gadgets from use.
>>    
>>    The remainder of the patch adds ROP protection to runtime generated
>>    code, in both stubs and compiled Java code. Attacks here are much harder
>>    as ROP gadgets must be found dynamically at runtime. If/when AOT
>>    compilation is added to JDK, then all stubs and compiled Java will be
>>    susceptible to ROP gadgets being found by static analysis and therefore
>>    potentially as vulnerable as C++ code.
>>    
>>    There are a number of places where the VM changes control flow by
>>    rewriting the stack or otherwise. I've done some analysis as to how
>>    these could also be used for attacks (which I didn't want to post here).
>>    These areas can be protected by ensuring the pointers to various stubs and
>>    entry points are stored in memory as signed pointers. These changes are
>>    simple to make (they can be reduced to a type change in common code and
>>    a few additional sign/auth calls in the backend), but there are a lot of them
>>    and the total code change is fairly large. I'm happy to provide a few
>>    work in progress patches.
>>    
>>    In order to match the security benefits of the Apple Arm64e ABI across
>>    the whole of JDK, then all the changes mentioned above would be
>>    required.
>>  - Add PAC assembly instructions
>>  - Add AArch64 ROP protection runtime flag
>>  - Build with branch protection
>
> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:
> 
>> 450:   // patch the return address, this stub will directly return to the exception handler
>> 451:   __ str(r0, Address(rfp, 1*BytesPerWord));
>> 452: 
> 
> Please explain the reason for this change, that leaves `lr` live across `restore_live_registers()`.

In the original code:
* save r0 to the lr location on the stack
* restore_live_registers
* Standard return: remove stack frame, load lr and fp off the stack, jump to lr.

With PAC it would now be:
* Sign r0 then save it to the lr location on the stack
* restore_live_registers
* Standard return: remove stack frame, load lr and fp off the stack, auth lr, jump to lr.

The code in restore_live_registers doesn't touch lr, so it seemed odd to save the value to the stack only to restore it directly afterwards.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Mon Nov 15 11:33:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 11:33:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <CrzmKGLq1Oc7XXvZ5TLWZ7FA4UASqZAtyCJs969Bvg8=.b8c82b75-cb7b-4ab4-a3f1-bdad09b5cd4e@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
 <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>
 <CrzmKGLq1Oc7XXvZ5TLWZ7FA4UASqZAtyCJs969Bvg8=.b8c82b75-cb7b-4ab4-a3f1-bdad09b5cd4e@github.com>
Message-ID: <1MtnvG48AfLFiyinjPMaT6KJ1MdM15mp2k2UMrryCgk=.7d91bba7-f1a7-4a26-8a4d-e1388a8b88ea@github.com>

On Mon, 15 Nov 2021 11:21:37 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:
>> 
>>> 450:   // patch the return address, this stub will directly return to the exception handler
>>> 451:   __ str(r0, Address(rfp, 1*BytesPerWord));
>>> 452: 
>> 
>> Please explain the reason for this change, that leaves `lr` live across `restore_live_registers()`.
>
> In the original code:
> * save r0 to the lr location on the stack
> * restore_live_registers
> * Standard return: remove stack frame, load lr and fp off the stack, jump to lr.
> 
> With PAC it would now be:
> * Sign r0 then save it to the lr location on the stack
> * restore_live_registers
> * Standard return: remove stack frame, load lr and fp off the stack, auth lr, jump to lr.
> 
> The code in restore_live_registers doesn't touch lr, so it seemed odd to save the value to the stack only to restore it directly afterwards.

That's an optimization, though. You shouldn't need to read the code in `restore_live_registers()` to see if it's safe to keep the return address in LR: at best it's pathological coupling, in the sense that the correctness of this code depends on the internal details of `restore_live_registers()`. Let's keep LR live ranges as short as possible.
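
To make the coupling argument concrete, here is a toy C++ model. It is not HotSpot code: `ToyCpu`, `toy_restore_live_registers` and the two `return_via_*` functions are hypothetical names invented for illustration, and the `clobbers_lr` flag stands in for "whatever the helper happens to do internally". It contrasts patching the saved-LR slot in the frame with keeping the return address live in `lr` across the helper call.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file plus the saved-LR slot of the current frame (assumption:
// a real frame holds much more; only the fields needed here are modelled).
struct ToyCpu {
  uint64_t r0 = 0;             // exception handler address to return to
  uint64_t lr = 0;             // link register
  uint64_t frame_lr_slot = 0;  // saved-LR slot in the stack frame
};

// Hypothetical stand-in for restore_live_registers(). Whether it clobbers lr
// is an internal detail that callers should not have to know about.
void toy_restore_live_registers(ToyCpu& cpu, bool clobbers_lr) {
  if (clobbers_lr) {
    cpu.lr = 0xDEAD;  // helper used lr as a scratch register
  }
}

// Variant 1: patch the return address in the frame; the epilogue reloads lr.
uint64_t return_via_frame_slot(ToyCpu cpu, bool clobbers_lr) {
  cpu.frame_lr_slot = cpu.r0;
  toy_restore_live_registers(cpu, clobbers_lr);
  cpu.lr = cpu.frame_lr_slot;  // standard return: load lr off the stack
  return cpu.lr;               // address the stub would jump to
}

// Variant 2: keep the return address live in lr across the helper call.
uint64_t return_via_live_lr(ToyCpu cpu, bool clobbers_lr) {
  cpu.lr = cpu.r0;
  toy_restore_live_registers(cpu, clobbers_lr);
  return cpu.lr;               // only correct if the helper left lr alone
}
```

With `clobbers_lr == false` both variants return the handler address. Only the frame-slot variant still does so if the helper ever starts using `lr` as a scratch register, which is the dependency on internal details being objected to.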

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Mon Nov 15 11:40:39 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 11:40:39 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <1MtnvG48AfLFiyinjPMaT6KJ1MdM15mp2k2UMrryCgk=.7d91bba7-f1a7-4a26-8a4d-e1388a8b88ea@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
 <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>
 <CrzmKGLq1Oc7XXvZ5TLWZ7FA4UASqZAtyCJs969Bvg8=.b8c82b75-cb7b-4ab4-a3f1-bdad09b5cd4e@github.com>
 <1MtnvG48AfLFiyinjPMaT6KJ1MdM15mp2k2UMrryCgk=.7d91bba7-f1a7-4a26-8a4d-e1388a8b88ea@github.com>
Message-ID: <aRDPjr8a5KzXY8KbvrNJCWqYR3e7hPV75IXUufd0jm8=.65eb7220-660b-42a3-ac0f-497c77421ba8@github.com>

On Mon, 15 Nov 2021 11:30:35 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> In the original code:
>> * save r0 to the lr location on the stack
>> * restore_live_registers
>> * Standard return: remove stack frame, load lr and fp off the stack, jump to lr.
>> 
>> With PAC it would now be:
>> * Sign r0 then save it to the lr location on the stack
>> * restore_live_registers
>> * Standard return: remove stack frame, load lr and fp off the stack, auth lr, jump to lr.
>> 
>> The code in restore_live_registers doesn't touch lr, so it seemed odd to save the value to the stack only to restore it directly afterwards.
>
> That's an optimization, though. You shouldn't need to read the code in `restore_live_registers()` to see if it's safe to keep the return address in LR: at best it's pathological coupling, in the sense that the correctness of this code depends on the internal details of `restore_live_registers()`. Let's keep LR live ranges as short as possible.

OK, that's fine, I'll update it (it'll simplify the total code diff too).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Mon Nov 15 11:56:47 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Mon, 15 Nov 2021 11:56:47 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <mG3X2wbGNgWTvBJvptINmYx-_s4f8Xw4iIkNJkT1tsU=.29d21bb1-ae4e-4b2b-b9a3-6b4c2db0e52c@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
 <jsMk_t2ZCmDPuJxtAknP2eTvCg38xAJZ3AMkSz_bxDQ=.c23d56a6-f22f-4d93-8535-22369831a73b@github.com>
 <yyWNr32v4R15a7xWsOf2e5BcDAdsraKZsHDgC7fXpCc=.dfb1f70b-676f-4cf5-b3bb-e3a3ce09ef86@github.com>
 <mG3X2wbGNgWTvBJvptINmYx-_s4f8Xw4iIkNJkT1tsU=.29d21bb1-ae4e-4b2b-b9a3-6b4c2db0e52c@github.com>
Message-ID: <Gi4u7r31v1mCU91c5Z_GuK0IMLDBRoe0f1UgbHZPNpw=.128d30ba-4306-4124-8db7-8cd2ad5e6173@github.com>

On Mon, 15 Nov 2021 11:08:57 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>>whether this function authenticates or strips the address depends only on debugging?
>> 
>> Yes. We only need to strip the value, because we're not jumping to the lr value, only viewing it.
>> 
>> The interface is different to a strip (as we need to pass in the modifier). 
>> 
>> How about something like pauth_authenticate_fast() ? or pauth_authenticate_unsafe() ?
>> 
>> Alternatively, this function is only called by the functions in Frame, so the frequency of use is probably low enough (compared to the sign/auth in every function) that it's not going to cause any performance issues. So we could just replace it with calls to pauth_authenticate. I think that might be the best option.
>
> A simple rule here: function names go with what the release version does. So I'd go with the actual purpose, which is `pauth_strip_addr_for_debuginfo()`. That's right, isn't it? You only want this thing for stack traces, logs, etc.

This function is used by the frame code. So, that means it is used for all stack walks which are far from being simply cosmetic/ornamental. The runtime will rely on this for various different types of thread housekeeping.

The difference here is that in product mode this simply strips the auth bits, whereas in debug mode it actually authenticates as it strips, to give extra verification. So, your suggested name is quite misleading. Likewise, Alan's suggested names are misleading because the primary product operation is to strip, not authenticate.

How about `pauth_strip_verifiable`, with a comment saying that it differs from `pauth_strip` by actually authenticating when debug is enabled?
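
A rough sketch of the behaviour being named, as a toy C++ model. Everything here is an assumption for illustration: the `toy_*` functions are hypothetical, the "signature" is a plain hash kept in the top 16 bits rather than a real PAC computed by hardware, and `NDEBUG` stands in for HotSpot's own debug/product build distinction.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: bits 48..63 of a pointer hold the "signature".
constexpr uint64_t kToyPacMask = 0xFFFF000000000000ULL;

// Hypothetical signature: a hash of address and modifier, not real PAC.
inline uint64_t toy_pac(uint64_t addr, uint64_t modifier) {
  return ((addr ^ modifier) * 0x9E3779B97F4A7C15ULL) & kToyPacMask;
}

inline uint64_t toy_sign(uint64_t addr, uint64_t modifier) {
  uint64_t plain = addr & ~kToyPacMask;
  return plain | toy_pac(plain, modifier);
}

// Strip: drop the signature bits without checking them.
inline uint64_t toy_strip(uint64_t signed_addr) {
  return signed_addr & ~kToyPacMask;
}

// Authenticate: check the signature against the modifier, then strip.
inline bool toy_authenticate(uint64_t signed_addr, uint64_t modifier,
                             uint64_t* out) {
  uint64_t plain = toy_strip(signed_addr);
  if ((signed_addr & kToyPacMask) != toy_pac(plain, modifier)) {
    return false;
  }
  *out = plain;
  return true;
}

// Product builds only strip; debug builds also verify the signature, which
// is why the modifier has to be part of the interface.
inline uint64_t toy_strip_verifiable(uint64_t signed_addr, uint64_t modifier) {
#ifndef NDEBUG
  uint64_t checked = 0;
  bool ok = toy_authenticate(signed_addr, modifier, &checked);
  assert(ok && checked == toy_strip(signed_addr));
#else
  (void)modifier;  // unused when verification is compiled out
#endif
  return toy_strip(signed_addr);
}
```

In both build modes the caller gets the stripped address back; the debug build additionally fails fast if the value on the stack was not signed with the expected modifier.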

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From chagedorn at openjdk.java.net  Mon Nov 15 12:59:36 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 15 Nov 2021 12:59:36 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v4]
In-Reply-To: <R-41UNuBHRN7YthwZ5eaXjUT_Jv5GTJsYFpZlMopisA=.9c9bf2c5-c087-4593-bcbf-47eb5707bbdd@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <R-41UNuBHRN7YthwZ5eaXjUT_Jv5GTJsYFpZlMopisA=.9c9bf2c5-c087-4593-bcbf-47eb5707bbdd@github.com>
Message-ID: <ZrQ7kKztEgecd_9L3n7WYhuymaTquc7aVgJK_8iDiJA=.021437e8-805f-4642-b792-1880515959e2@github.com>

On Fri, 12 Nov 2021 20:40:07 GMT, Dean Long <dlong at openjdk.org> wrote:

>> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
>> 1. added a version number to the replay file
>> 2. removed unused ci fields
>> 3. corrected comment in TestLambdas.java
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
> 
>   strengthen version check

> > I guess we could leave this in for old replay files with the initialization further down in ciReplay::initialize() if _version < 1.
> 
> Yes, it's necessary to parse the value for old replay files, but the value is never used. I'm not sure what you are suggesting about the initialization further down.

OK, I missed that the parsed value has no effect anyway. Then you do not need this code for the initialization on L1416 for `_version < 1`:

m->_current_mileage = rec->_current_mileage;


> > However, thinking again about this, it should not happen that we parse a version number that's not supported
> 
> A user could be using an older JDK but accidentally try a newer replay file. That was the scenario I had in mind.

That's a valid point for emitting a warning instead.
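
As an illustration of the "warn on a too-new replay file" idea, here is a minimal C++ sketch. It is not the ciReplay code: the `version <n>` line format, the supported version number and all `toy`/`kToy` names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical: newest replay file version this (toy) reader understands.
const int kToySupportedVersion = 1;

enum ToyVersionStatus { kToyLegacy, kToySupported, kToyTooNew };

// Classifies the first line of a replay file. A file without a version line
// is treated as version 0, i.e. written before versioning was introduced.
ToyVersionStatus toy_check_replay_version(const char* first_line,
                                          int* version_out) {
  int version = 0;
  if (std::sscanf(first_line, "version %d", &version) != 1) {
    version = 0;  // no version line: old-format file
  }
  *version_out = version;
  if (version > kToySupportedVersion) {
    // Likely an older JDK reading a file written by a newer one: warn
    // instead of silently misparsing.
    std::fprintf(stderr,
                 "warning: replay file version %d is newer than supported "
                 "version %d\n",
                 version, kToySupportedVersion);
    return kToyTooNew;
  }
  return version < 1 ? kToyLegacy : kToySupported;
}
```

A caller could keep parsing fields that only exist in old files when the result is `kToyLegacy`, and skip or reject the file when it is `kToyTooNew`.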

Changes look good to me!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6344

From jai.forums2013 at gmail.com  Mon Nov 15 13:45:13 2021
From: jai.forums2013 at gmail.com (Jaikiran Pai)
Date: Mon, 15 Nov 2021 19:15:13 +0530
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <CAA-vtUzt=efM-HRZP3cM1tBFbGWjZj-XGjtCZa6dC3kS5NZ7yQ@mail.gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
 <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>
 <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
 <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
 <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
 <CAA-vtUzt=efM-HRZP3cM1tBFbGWjZj-XGjtCZa6dC3kS5NZ7yQ@mail.gmail.com>
Message-ID: <468cdc88-86de-8fde-a1b7-44837b9453a5@gmail.com>

The way I used this previously was more for convenience than
anything else. Very specifically, I used to do something like this:

- Work on some bug fix with latest JDK master source repo.

- Add a jtreg test to verify the fix

- Send out a PR and wait for reviews

- On some occasions, the review suggestions include relatively big
changes to the jtreg test case. In such cases, I would make those
changes in the test and verify that the test still passes.
However, I would also want to make sure the test still reproduces the
original issue. So instead of git reverting only the source code
changes, building the current JDK again and then running the updated
test, I would just point the jtreg run to a different, older version of
the JDK (which wouldn't have the fix) by using the
-jdk:<path-to-downloaded-jdk> option. I would then expect the test to
fail with the expected issue.

It was just a convenience, nothing more.

-Jaikiran

On 13/11/21 1:26 pm, Thomas Stüfe wrote:
> Maybe the easiest way for you would be to get the source drop matching the
> binary JDK from the vendor of your JDK, since you may also have
> vendor-specific changes (rare, but possible).
>
> Cheers, Thomas
>
>
> On Sat, Nov 13, 2021 at 7:38 AM David Holmes <david.holmes at oracle.com>
> wrote:
>
>> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
>>> I got past this with an extensive workaround for now. I moved/copied
>>> that test case java file outside of the JDK source tree, then created a
>>> new/custom TEST.ROOT which is very minimal and has no reference to
>>> whitebox for bootlibs, then made sure the jtwork directory is also
>>> outside of the JDK source tree (so that the test is compiled afresh) and
>>> then ran that test. That helped, but it's only for this test since its
>>> requirements in the test are very minimal. I don't see a way to get past
>>> this if I have to run the wider range of jtreg tests that reside in the
>>> JDK source tree against a pre-built/downloaded Java 17 or any previous
>>> versions.
>> Basically you're not supposed to do that. You have to test a given
>> binary with the tests that existed when that binary was built. Many
>> things in the tests can change that will fail to run with an older JDK.
>>
>> In theory you can use the build number of the binary JDK to checkout the
>> tests corresponding to that build using the appropriate build tag.
>>
>> Cheers,
>> David
>>
>>> -Jaikiran
>>>
>>> On 13/11/21 10:26 am, Jaikiran Pai wrote:
>>>> Hello Leonid,
>>>>
>>>> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>>>>> Hi
>>>>>
>>>>> It is a hotspot testing problem rather than a jtreg problem. So I've
>>>>> added the hotspot-dev at openjdk.java.net alias.
>>>> Thank you for adding the right list.
>>>>> ...
>>>>> Could you please check that you use exactly the same sources during
>>>>> testing which have been used to build JDK.
>>>> Do you mean the sources of the JDK against which the test is being
>>>> run? I don't have those sources since this test runs against a
>>>> pre-built binary downloaded from https://jdk.java.net/17/
>>>>
>>>> -Jaikiran
>>>>

From duke at openjdk.java.net  Mon Nov 15 13:59:40 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 13:59:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <Gi4u7r31v1mCU91c5Z_GuK0IMLDBRoe0f1UgbHZPNpw=.128d30ba-4306-4124-8db7-8cd2ad5e6173@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <OLh8pzd5wf1j-KW7PI_0BRoxOEyPNxUvzfsUIW6tl9Y=.251fbd65-ba03-4b66-b132-e7fc32e676f1@github.com>
 <jsMk_t2ZCmDPuJxtAknP2eTvCg38xAJZ3AMkSz_bxDQ=.c23d56a6-f22f-4d93-8535-22369831a73b@github.com>
 <yyWNr32v4R15a7xWsOf2e5BcDAdsraKZsHDgC7fXpCc=.dfb1f70b-676f-4cf5-b3bb-e3a3ce09ef86@github.com>
 <mG3X2wbGNgWTvBJvptINmYx-_s4f8Xw4iIkNJkT1tsU=.29d21bb1-ae4e-4b2b-b9a3-6b4c2db0e52c@github.com>
 <Gi4u7r31v1mCU91c5Z_GuK0IMLDBRoe0f1UgbHZPNpw=.128d30ba-4306-4124-8db7-8cd2ad5e6173@github.com>
Message-ID: <8L4qvqk-eda9UU1BCA3i4yf3JVaJ0UMJxTDDLCO8XKg=.fc1bad40-8925-480d-a8a3-33f9c7650315@github.com>

On Mon, 15 Nov 2021 11:54:09 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

> pauth_strip_verifiable

That name works for me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From hseigel at openjdk.java.net  Mon Nov 15 14:58:49 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Mon, 15 Nov 2021 14:58:49 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags
Message-ID: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>

Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.

The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.

Thanks, Harold

-------------

Commit messages:
 - 8276795: Deprecate seldom used CDS flags

Changes: https://git.openjdk.java.net/jdk/pull/6390/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6390&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276795
  Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6390.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6390/head:pull/6390

PR: https://git.openjdk.java.net/jdk/pull/6390

From eosterlund at openjdk.java.net  Mon Nov 15 15:31:06 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 15 Nov 2021 15:31:06 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v3]
In-Reply-To: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
Message-ID: <xaADeyBGGUM8pun_puKpN_-FLtwo0ccT5nYPiDL-myM=.44e0817a-ad7d-4ca4-a18d-da5cc77c602f@github.com>

> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
> 
> 1. full_gc()
> 2. final_allocation_attempt()
> 
> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
> 
> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
> 
> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
> 
> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
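
The critical-allocation queue described above can be modelled roughly as follows. This is a toy C++ sketch under stated assumptions, not the patch itself: `ToyMetaspace`, `ToyRequest` and the method names are hypothetical, memory is reduced to a byte counter, and the GC is reduced to a call that releases some bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// A pending "critical" allocation: one that already failed and is waiting
// for a GC to free enough space.
struct ToyRequest {
  explicit ToyRequest(std::size_t s) : size(s), satisfied(false) {}
  std::size_t size;
  bool satisfied;
};

class ToyMetaspace {
 public:
  explicit ToyMetaspace(std::size_t free_bytes) : _free(free_bytes) {}

  // Normal allocation: refused while critical requests are waiting, so that
  // late arrivals cannot consume memory freed on behalf of earlier requests.
  bool try_allocate(std::size_t size) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_critical.empty()) return false;
    return take(size);
  }

  // Register a failed allocation as critical before triggering the GC.
  void add(ToyRequest* request) {
    std::lock_guard<std::mutex> guard(_lock);
    _critical.push_back(request);
  }

  // Called when the GC has released memory: serve the queue in FIFO order
  // before any normal allocation gets a chance.
  void release_and_process(std::size_t freed_bytes) {
    std::lock_guard<std::mutex> guard(_lock);
    _free += freed_bytes;
    while (!_critical.empty() && take(_critical.front()->size)) {
      _critical.front()->satisfied = true;
      _critical.pop_front();
    }
  }

 private:
  // Caller must hold _lock.
  bool take(std::size_t size) {
    if (size > _free) return false;
    _free -= size;
    return true;
  }

  std::mutex _lock;
  std::deque<ToyRequest*> _critical;
  std::size_t _free;
};
```

The point of the design is that the window between "GC finished" and "final allocation attempt" no longer matters: memory released by the GC is handed to queued requests under the same lock that normal allocations must take.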

Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:

 - Merge branch 'master' into 8259643_load_unload_bug
 - polish code alignment and rename register/unregister to add/remove
 - 8259643: ZGC can return metaspace OOM prematurely

-------------

Changes: https://git.openjdk.java.net/jdk/pull/2289/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=02
  Stats: 298 lines in 6 files changed: 276 ins; 17 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2289.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net  Mon Nov 15 15:47:51 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 15 Nov 2021 15:47:51 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v3]
In-Reply-To: <xaADeyBGGUM8pun_puKpN_-FLtwo0ccT5nYPiDL-myM=.44e0817a-ad7d-4ca4-a18d-da5cc77c602f@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
 <xaADeyBGGUM8pun_puKpN_-FLtwo0ccT5nYPiDL-myM=.44e0817a-ad7d-4ca4-a18d-da5cc77c602f@github.com>
Message-ID: <pYOnp2Jc5Xd27xMhU_6EqmId-HdxBMMU69RufaBCTxM=.b3711ca8-3031-440c-9ff6-2b314a7c48fa@github.com>

On Mon, 15 Nov 2021 15:31:06 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
>> 
>> 1. full_gc()
>> 2. final_allocation_attempt()
>> 
>> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
>> 
>> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
>> 
>> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
>> 
>> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
> 
>  - Merge branch 'master' into 8259643_load_unload_bug
>  - polish code alignment and rename register/unregister to add/remove
>  - 8259643: ZGC can return metaspace OOM prematurely

Sorry, I ran out of steam with this patch a few months ago. It looks like I already had 3 reviews, so I think I am ready to go. I rebased onto the latest mainline, which only required a small fix to the kind of lock used (a non-safepoint-checking lock with a new rank), due to all the recent lock ranking changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net  Mon Nov 15 16:21:06 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 15 Nov 2021 16:21:06 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v4]
In-Reply-To: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
Message-ID: <6fQXcaACVi0FL90kaNFWUHJ59Ki5eq_06AC2QCY1ZeU=.9dfc8d8f-2f02-4222-875e-67b226030f16@github.com>

> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
> 
> 1. full_gc()
> 2. final_allocation_attempt()
> 
> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
> 
> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
> 
> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
> 
> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  lock rank update

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2289/files
  - new: https://git.openjdk.java.net/jdk/pull/2289/files/1f45fa7f..012603f4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2289.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net  Mon Nov 15 16:43:19 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 15 Nov 2021 16:43:19 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
Message-ID: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>

> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
> 
> 1. full_gc()
> 2. final_allocation_attempt()
> 
> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
> 
> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
> 
> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
> 
> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  style polish in ZGC code

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2289/files
  - new: https://git.openjdk.java.net/jdk/pull/2289/files/012603f4..9c6f1041

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=03-04

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2289.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289

PR: https://git.openjdk.java.net/jdk/pull/2289

From pliden at openjdk.java.net  Mon Nov 15 16:43:22 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 15 Nov 2021 16:43:22 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
 <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
Message-ID: <Muqdc3x-QUjgz12VV80gKnytMowCFEnT9MF1Dx2rNbo=.f079440f-71bb-4e44-bd15-9bad13646dd2@github.com>

On Mon, 15 Nov 2021 16:40:26 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
>> 
>> 1. full_gc()
>> 2. final_allocation_attempt()
>> 
>> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
>> 
>> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
>> 
>> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
>> 
>> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   style polish in ZGC code

Still looks good.

-------------

Marked as reviewed by pliden (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2289

From coleenp at openjdk.java.net  Mon Nov 15 16:52:37 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Mon, 15 Nov 2021 16:52:37 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
 <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
Message-ID: <_k1X1jd1-Em51dcazX20TXmjagqTVc7MnaUWIFtIwk4=.0e106ef7-010d-4031-ac8f-d3b5d783847d@github.com>

On Mon, 15 Nov 2021 16:43:19 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
>> 
>> 1. full_gc()
>> 2. final_allocation_attempt()
>> 
>> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
>> 
>> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
>> 
>> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
>> 
>> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   style polish in ZGC code

src/hotspot/share/runtime/mutexLocker.cpp line 248:

> 246: 
> 247:   def(Metaspace_lock               , PaddedMutex  , nosafepoint-3);
> 248:   def(MetaspaceCritical_lock       , PaddedMonitor, nosafepoint-1, true);

You don't need the true parameter.  That's the default for nosafepoint locks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From stuefe at openjdk.java.net  Mon Nov 15 17:52:34 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Mon, 15 Nov 2021 17:52:34 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
 <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
Message-ID: <E7-FHu4swbS5KXEArqv2Tob_4EcZ-6FS8jwrrWt7ZWY=.860d1840-7ed2-4f0d-b76c-6cdb2fe34b4a@github.com>

On Mon, 15 Nov 2021 16:43:19 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
>> 
>> 1. full_gc()
>> 2. final_allocation_attempt()
>> 
>> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
>> 
>> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
>> 
>> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
>> 
>> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   style polish in ZGC code

Nice to have you back. Change still looks good.

Thanks, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From dlong at openjdk.java.net  Mon Nov 15 21:12:43 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Mon, 15 Nov 2021 21:12:43 GMT
Subject: RFR: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information [v4]
In-Reply-To: <ZrQ7kKztEgecd_9L3n7WYhuymaTquc7aVgJK_8iDiJA=.021437e8-805f-4642-b792-1880515959e2@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
 <R-41UNuBHRN7YthwZ5eaXjUT_Jv5GTJsYFpZlMopisA=.9c9bf2c5-c087-4593-bcbf-47eb5707bbdd@github.com>
 <ZrQ7kKztEgecd_9L3n7WYhuymaTquc7aVgJK_8iDiJA=.021437e8-805f-4642-b792-1880515959e2@github.com>
Message-ID: <DudJH7nvIgNOCYV9aNWcVV4DMcozWRJYVYzweeK3BIM=.8ae57d35-456f-4e16-ac59-d59d8d90dad3@github.com>

On Mon, 15 Nov 2021 12:56:35 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   strengthen version check
>
>> > I guess we could leave this in for old replay files with the initialization further down in ciReplay::initialize() if _version < 1.
>> 
>> Yes, it's necessary to parse the value for old replay files, but the value is never used. I'm not sure what you are suggesting about the initialization further down.
> 
> Ok, I was missing that the parsed value has no effect anyways. Then you do not need this code for the initialization on L1416 for `_version <  1`:
> 
> m->_current_mileage = rec->_current_mileage;
> 
> 
>> > However, thinking again about this, it should not happen that we parse a version number that's not supported
>> 
>> A user could be using an older JDK but accidentally try a newer replay file. That was the scenario I had in mind.
> 
> That's a valid point for emitting a warning instead.
> 
> Changes look good to me!

Thanks @chhagedorn and @vnkozlov.
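
The version check discussed above (warn rather than fail when a replay file is newer than the reader) could be sketched roughly as follows. This is only an illustration in Java, not the actual HotSpot C++ code: the class name, the `version N` line format, and the `SUPPORTED_VERSION` constant are assumptions made for the example.

```java
final class ReplayVersionSketch {
    // Highest replay-file version this (hypothetical) reader understands.
    static final int SUPPORTED_VERSION = 1;

    /**
     * Inspects the first line of a replay file. Returns a warning message if the
     * file declares a newer version than supported, or null if the file is usable.
     * A file without a "version N" line is treated as version 0 (an old file).
     */
    static String checkVersion(String firstLine) {
        int version = 0;
        String prefix = "version "; // assumed line format, for illustration only
        if (firstLine != null && firstLine.startsWith(prefix)) {
            version = Integer.parseInt(firstLine.substring(prefix.length()).trim());
        }
        if (version > SUPPORTED_VERSION) {
            return "warning: replay file version " + version
                 + " is newer than supported version " + SUPPORTED_VERSION;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkVersion("version 1"));         // supported
        System.out.println(checkVersion("version 2"));         // newer file, older reader
        System.out.println(checkVersion("some older record")); // no version line
    }
}
```

Treating a missing version line as version 0 mirrors the point made in the review that old replay files still have to be parsed.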

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From dlong at openjdk.java.net  Mon Nov 15 21:12:43 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Mon, 15 Nov 2021 21:12:43 GMT
Subject: Integrated: 8276095: ciReplay: replay failure due to incomplete
 ciMethodData information
In-Reply-To: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
References: <EjhE5ReChm9c2C6F0K-WE94bWhagEW7y7qcWLlp6EvY=.a49a439b-32f5-40b1-a81c-47ed0d8047ee@github.com>
Message-ID: <UnLqfDRCDU2sVboHyMMiQoLKEjEOKrvDuw7545hdknc=.76631c9c-95c1-4914-b217-25f11784f79f@github.com>

On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long <dlong at openjdk.org> wrote:

> The replay data was missing MethodData::_invocation_counter.  Adding it seems to fix the problem.  @rwestrel please verify if it works for you.  Also, with this change:
> 1. added a version number to the replay file
> 2. removed unused ci fields
> 3. corrected comment in TestLambdas.java

This pull request has now been integrated.

Changeset: 9326eb14
Author:    Dean Long <dlong at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/9326eb14617bf08e3376f854fc022e11d1ef34dd
Stats:     63 lines in 8 files changed: 26 ins; 27 del; 10 mod

8276095: ciReplay: replay failure due to incomplete ciMethodData information

Reviewed-by: chagedorn, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6344

From psandoz at openjdk.java.net  Mon Nov 15 21:51:46 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Mon, 15 Nov 2021 21:51:46 GMT
Subject: Integrated: 8271515: Integration of JEP 417: Vector API (Third
 Incubator)
In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com>
Message-ID: <dPgMZd1yMVslWOP4kijrCtTjXvmgRPgTbM0JJocHREE=.d301031b-62dc-4955-8b50-bba2f103d686@github.com>

On Fri, 8 Oct 2021 21:25:26 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE.
> 
> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend.
> 
> Masked loads/stores are a special form of masked operation that requires additional care to ensure out-of-bounds accesses throw exceptions. The range checking has not been fully optimized and will require further work.
> 
> No API enhancements were required and only a few additional tests were needed.

This pull request has now been integrated.

Changeset: a59c9b2a
Author:    Paul Sandoz <psandoz at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/a59c9b2ac277d6ff6be1700d91ff389f137e61ca
Stats:     21982 lines in 104 files changed: 16217 ins; 2087 del; 3678 mod

8271515: Integration of JEP 417: Vector API (Third Incubator)

Co-authored-by: Sandhya Viswanathan <sviswanathan at openjdk.org>
Co-authored-by: Jatin Bhateja <jbhateja at openjdk.org>
Co-authored-by: Ningsheng Jian <njian at openjdk.org>
Co-authored-by: Xiaohong Gong <xgong at openjdk.org>
Co-authored-by: Eric Liu <eliu at openjdk.org>
Co-authored-by: Jie Fu <jiefu at openjdk.org>
Co-authored-by: Vladimir Ivanov <vlivanov at openjdk.org>
Co-authored-by: John R Rose <jrose at openjdk.org>
Co-authored-by: Paul Sandoz <psandoz at openjdk.org>
Co-authored-by: Rado Smogura <mail at smogura.eu>
Reviewed-by: kvn, sviswanathan, ngasson

-------------

PR: https://git.openjdk.java.net/jdk/pull/5873

From david.holmes at oracle.com  Mon Nov 15 21:59:28 2021
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 16 Nov 2021 07:59:28 +1000
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <468cdc88-86de-8fde-a1b7-44837b9453a5@gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
 <BYAPR10MB2471FBE391C0C48115E285839F969@BYAPR10MB2471.namprd10.prod.outlook.com>
 <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
 <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
 <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
 <CAA-vtUzt=efM-HRZP3cM1tBFbGWjZj-XGjtCZa6dC3kS5NZ7yQ@mail.gmail.com>
 <468cdc88-86de-8fde-a1b7-44837b9453a5@gmail.com>
Message-ID: <98cb4719-387d-fe60-e368-e9805917dba6@oracle.com>

On 15/11/2021 11:45 pm, Jaikiran Pai wrote:
> The way I used to use this previously was more for convenience than 
> anything else. Very specifically, I used to do something like this:
> 
> - Work on some bug fix with latest JDK master source repo.
> 
> - Add a jtreg test to verify the fix
> 
> - Send out a PR and wait for reviews
> 
> - On some occasions, the review suggestions include relatively big 
> changes to the jtreg test case. In such cases, I used to make those 
> changes in the test and verify that the test still passes. 
> However, I would also want to make sure the test still reproduces the 
> original issue. So instead of git reverting only the source code 
> changes, building the current JDK again and then running the updated 
> test, I would just point the jtreg run to a different, older version of 
> a JDK (which wouldn't have the fix) by using 
> -jdk:<path-to-downloaded-jdk>. I would then expect the test to fail with 
> the expected issue.
> 
> It was more of a convenience than anything else.

Sure and most of the time that will work. But if the test relies on 
something that is only present in the later JDK binary then obviously 
the test will fail.

Cheers,
David

> -Jaikiran
> 
> On 13/11/21 1:26 pm, Thomas Stüfe wrote:
>> Maybe the easiest way for you would be to get the source drop matching
>> the binary JDK from the vendor of your JDK, since you may also have
>> vendor-specific changes (albeit rare, it's possible).
>>
>> Cheers, Thomas
>>
>>
>> On Sat, Nov 13, 2021 at 7:38 AM David Holmes <david.holmes at oracle.com>
>> wrote:
>>
>>> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
>>>> I got past this with an extensive workaround for now. I moved/copied
>>>> that test case java file outside of the JDK source tree, then created a
>>>> new/custom TEST.ROOT which is very minimal and has no reference to
>>>> whitebox for bootlibs, then made sure the jtwork directory is also
>>>> outside of the JDK source tree (so that the test is compiled afresh)
>>>> and then ran that test. That helped, but it's only for this test since
>>>> its requirements in the test are very minimal. I don't see a way to get
>>>> past this if I have to run the wider range of jtreg tests that reside
>>>> in the JDK source tree against a pre-built/downloaded Java 17 or any
>>>> previous versions.
>>> Basically you're not supposed to do that. You have to test a given
>>> binary with the tests that existed when that binary was built. Many
>>> things in the tests can change that will fail to run with an older JDK.
>>>
>>> In theory you can use the build number of the binary JDK to checkout the
>>> tests corresponding to that build using the appropriate build tag.
>>>
>>> Cheers,
>>> David
>>>
>>>> -Jaikiran
>>>>
>>>> On 13/11/21 10:26 am, Jaikiran Pai wrote:
>>>>> Hello Leonid,
>>>>>
>>>>> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>>>>>> Hi
>>>>>>
>>>>>> It is a hotspot testing problem rather than a jtreg problem. So I've
>>>>>> added the hotspot-dev at openjdk.java.net alias.
>>>>> Thank you for adding the right list.
>>>>>> ...
>>>>>> Could you please check that you use exactly the same sources during
>>>>>> testing which have been used to build JDK.
>>>>> Do you mean the sources of the JDK against which the test is being
>>>>> run? I don't have those sources since this test runs against a
>>>>> pre-built binary downloaded from https://jdk.java.net/17/
>>>>>
>>>>> -Jaikiran
>>>>>

From dholmes at openjdk.java.net  Mon Nov 15 22:37:36 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 15 Nov 2021 22:37:36 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags
In-Reply-To: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
Message-ID: <C6vvqIIOM68PT48XJa0hLoMfpKATByK-YpJ41P9SdG4=.1ae0136a-34bc-42dc-9f14-9703dbfcde99@github.com>

On Mon, 15 Nov 2021 14:50:43 GMT, Harold Seigel <hseigel at openjdk.org> wrote:

> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
> 
> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
> 
> Thanks, Harold

Hi Harold,

You also need to add "(Deprecated)" to the description of these flags in globals.hpp.

You should also add these flags to the VMDeprecatedOptions.java test.

Thanks,
David

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6390

From stuefe at openjdk.java.net  Tue Nov 16 06:59:54 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 16 Nov 2021 06:59:54 GMT
Subject: RFR: JDK-8277172: Remove stray comment mentioning
 instr_size_for_decode_klass_not_null on x64
Message-ID: <dRPmmIFu3WvtIERt2hsS53A75eGRQbAtYXMRt-onH9c=.85820b96-bc05-438c-b7d1-98ac47692afd@github.com>

Trivial cleanup.

https://bugs.openjdk.java.net/browse/JDK-8241825 removed `instr_size_for_decode_klass_not_null()` on x64 but left a comment in place. Remove obsolete comment.

-------------

Commit messages:
 - remove comment

Changes: https://git.openjdk.java.net/jdk/pull/6384/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6384&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277172
  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6384.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6384/head:pull/6384

PR: https://git.openjdk.java.net/jdk/pull/6384

From dholmes at openjdk.java.net  Tue Nov 16 07:43:40 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 16 Nov 2021 07:43:40 GMT
Subject: RFR: JDK-8277172: Remove stray comment mentioning
 instr_size_for_decode_klass_not_null on x64
In-Reply-To: <dRPmmIFu3WvtIERt2hsS53A75eGRQbAtYXMRt-onH9c=.85820b96-bc05-438c-b7d1-98ac47692afd@github.com>
References: <dRPmmIFu3WvtIERt2hsS53A75eGRQbAtYXMRt-onH9c=.85820b96-bc05-438c-b7d1-98ac47692afd@github.com>
Message-ID: <Ob0MSrv-ttdM8AsrpSUl-lm_-WyeRuBDBj4F--Sot4o=.51e1eb50-3384-437a-8646-e54d1e7d97e8@github.com>

On Mon, 15 Nov 2021 09:03:58 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Trivial cleanup.
> 
> https://bugs.openjdk.java.net/browse/JDK-8241825 removed `instr_size_for_decode_klass_not_null()` on x64 but left a comment in place. Remove obsolete comment.

Looks good and trivial.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6384

From stuefe at openjdk.java.net  Tue Nov 16 07:52:41 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 16 Nov 2021 07:52:41 GMT
Subject: RFR: JDK-8277172: Remove stray comment mentioning
 instr_size_for_decode_klass_not_null on x64
In-Reply-To: <Ob0MSrv-ttdM8AsrpSUl-lm_-WyeRuBDBj4F--Sot4o=.51e1eb50-3384-437a-8646-e54d1e7d97e8@github.com>
References: <dRPmmIFu3WvtIERt2hsS53A75eGRQbAtYXMRt-onH9c=.85820b96-bc05-438c-b7d1-98ac47692afd@github.com>
 <Ob0MSrv-ttdM8AsrpSUl-lm_-WyeRuBDBj4F--Sot4o=.51e1eb50-3384-437a-8646-e54d1e7d97e8@github.com>
Message-ID: <RdWsVelU7FF2wlGVkRM7wz1d6cliVKL3di6MDlb9qxA=.cfaeb2fa-6060-4c74-acdc-37560f94668a@github.com>

On Tue, 16 Nov 2021 07:41:00 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Looks good and trivial.
> 
> Thanks, David

Thanks David.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6384

From stuefe at openjdk.java.net  Tue Nov 16 07:52:42 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 16 Nov 2021 07:52:42 GMT
Subject: Integrated: JDK-8277172: Remove stray comment mentioning
 instr_size_for_decode_klass_not_null on x64
In-Reply-To: <dRPmmIFu3WvtIERt2hsS53A75eGRQbAtYXMRt-onH9c=.85820b96-bc05-438c-b7d1-98ac47692afd@github.com>
References: <dRPmmIFu3WvtIERt2hsS53A75eGRQbAtYXMRt-onH9c=.85820b96-bc05-438c-b7d1-98ac47692afd@github.com>
Message-ID: <yUKgszeztghKFoVFBaff--bQmt6nrbrVy1yA1AJS2p8=.8f758610-c290-4da2-b674-2d70b3c98571@github.com>

On Mon, 15 Nov 2021 09:03:58 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Trivial cleanup.
> 
> https://bugs.openjdk.java.net/browse/JDK-8241825 removed `instr_size_for_decode_klass_not_null()` on x64 but left a comment in place. Remove obsolete comment.

This pull request has now been integrated.

Changeset: 7719a74c
Author:    Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/7719a74cec8c47fd036226b520a5fce7887386da
Stats:     2 lines in 1 file changed: 0 ins; 2 del; 0 mod

8277172: Remove stray comment mentioning instr_size_for_decode_klass_not_null on x64

Reviewed-by: dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/6384

From duke at openjdk.java.net  Tue Nov 16 08:22:22 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Tue, 16 Nov 2021 08:22:22 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v5]
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <T7FVhJJYx_IteKnevRld1x_lkG3UTuhJ_hxdhpOv6Rk=.b0e887d4-97c0-48d3-9171-47c122601ae5@github.com>

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of
> them and the total code change is fairly large. I'm happy to provide a
> few work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

Alan Hayward has updated the pull request incrementally with three additional commits since the last revision:

 - Rename pauth_authenticate_or_strip_return_address
 - Fix windows aarch64 by restoring pauth file split
 - Don't keep LR live across restore_live_registers

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6334/files
  - new: https://git.openjdk.java.net/jdk/pull/6334/files/2c27eb5e..dbd6bda2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=03-04

  Stats: 318 lines in 6 files changed: 233 ins; 70 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From eosterlund at openjdk.java.net  Tue Nov 16 08:38:05 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:38:05 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v6]
In-Reply-To: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
Message-ID: <9kt7RPhsAqWys4i2qaLwebf28SQdYzU7uhdefuQtVvQ=.9bbb723e-e2ec-4c57-ae58-64e57f46aa9f@github.com>

> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
> 
> 1. full_gc()
> 2. final_allocation_attempt()
> 
> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
> 
> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
> 
> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
> 
> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
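
The idea of giving queued "critical" requests precedence over normal allocations can be illustrated with a small toy model. The sketch below is not the metaspace code from the PR: the class, its methods, and the byte-counting arena are hypothetical, and a real implementation would block waiting threads on a monitor rather than poll a flag.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy arena with a FIFO queue of "critical" requests (hypothetical model). */
final class CriticalAllocationSketch {
    static final class Request {
        final long size;
        boolean satisfied;
        Request(long size) { this.size = size; }
    }

    private final Deque<Request> critical = new ArrayDeque<>();
    private long free;

    CriticalAllocationSketch(long capacity) { this.free = capacity; }

    /** Normal path: refuses to allocate while any critical request is pending. */
    synchronized boolean tryAllocate(long size) {
        if (!critical.isEmpty() || size > free) {
            return false;
        }
        free -= size;
        return true;
    }

    /** A starving thread registers its request so later frees are reserved for it. */
    synchronized Request enqueueCritical(long size) {
        Request r = new Request(size);
        critical.addLast(r);
        return r;
    }

    /** Called when memory is freed (e.g. after unloading): serve the queue first, in order. */
    synchronized void release(long bytes) {
        free += bytes;
        while (!critical.isEmpty() && critical.peekFirst().size <= free) {
            Request r = critical.removeFirst();
            free -= r.size;
            r.satisfied = true;
        }
    }

    public static void main(String[] args) {
        CriticalAllocationSketch arena = new CriticalAllocationSketch(100);
        arena.tryAllocate(100);                       // arena is now full
        Request r = arena.enqueueCritical(60);        // failed thread queues up
        arena.release(10);                            // not enough yet
        System.out.println(arena.tryAllocate(5));     // false: critical request pending
        arena.release(70);                            // enough memory is now free
        System.out.println(r.satisfied);              // true: queue served first
    }
}
```

Because `tryAllocate` fails while the queue is non-empty, other threads cannot consume the memory that a GC frees for the queued request, which is the starvation described above.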

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  return bool for Coleen

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2289/files
  - new: https://git.openjdk.java.net/jdk/pull/2289/files/9c6f1041..df0cdc87

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2289.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net  Tue Nov 16 08:38:07 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:38:07 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To: <Muqdc3x-QUjgz12VV80gKnytMowCFEnT9MF1Dx2rNbo=.f079440f-71bb-4e44-bd15-9bad13646dd2@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
 <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
 <Muqdc3x-QUjgz12VV80gKnytMowCFEnT9MF1Dx2rNbo=.f079440f-71bb-4e44-bd15-9bad13646dd2@github.com>
Message-ID: <z45OYw8CTjLh8MhEK7hrYK-lUGDUnNXAh53MWDznXgE=.ee7ff547-0f2c-4d93-81c2-5a02f83abcd0@github.com>

On Mon, 15 Nov 2021 16:39:22 GMT, Per Liden <pliden at openjdk.org> wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   style polish in ZGC code
>
> Still looks good.

Thanks for the reviews @pliden @tstuefe and @coleenp!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net  Tue Nov 16 08:38:13 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:38:13 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To: <_k1X1jd1-Em51dcazX20TXmjagqTVc7MnaUWIFtIwk4=.0e106ef7-010d-4031-ac8f-d3b5d783847d@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
 <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
 <_k1X1jd1-Em51dcazX20TXmjagqTVc7MnaUWIFtIwk4=.0e106ef7-010d-4031-ac8f-d3b5d783847d@github.com>
Message-ID: <yj6V-IqvrtAoaAMMYSsJ56gt5TRW4I4PJB1gs63Z7uk=.5fa90be1-08d4-48c6-99e3-2285f710cb5a@github.com>

On Mon, 15 Nov 2021 16:49:31 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   style polish in ZGC code
>
> src/hotspot/share/runtime/mutexLocker.cpp line 248:
> 
>> 246: 
>> 247:   def(Metaspace_lock               , PaddedMutex  , nosafepoint-3);
>> 248:   def(MetaspaceCritical_lock       , PaddedMonitor, nosafepoint-1, true);
> 
> You don't need the true parameter.  That's the default for nosafepoint locks.

Oh, okay. Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net  Tue Nov 16 08:51:47 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:51:47 GMT
Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler
Message-ID: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>

When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C.
The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C.

-------------

Commit messages:
 - 8266368: Inaccurate after_unwind hook in C2 exception handler

Changes: https://git.openjdk.java.net/jdk/pull/6405/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6405&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8266368
  Stats: 12 lines in 2 files changed: 5 ins; 5 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6405.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6405/head:pull/6405

PR: https://git.openjdk.java.net/jdk/pull/6405

From xxinliu at amazon.com  Tue Nov 16 09:44:13 2021
From: xxinliu at amazon.com (Liu, Xin)
Date: Tue, 16 Nov 2021 01:44:13 -0800
Subject: Is it necessary to check subtype for invokeinterface of private
 method?
Message-ID: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com>

Hi,

I am working on the regex performance in JDK-8274983. Even though it
begins with regex, the problem boils down to invokeinterface calls to
private interface methods. Before hidden classes (JDK-8238358), the
lambda metafactory generated invokespecial for them. Now it generates
invokeinterface instead. C1 doesn't recognize the new code pattern and
generates an ic virtual call for the callsite. If many classes all
implement a common interface, they thrash the ic stub because the
concrete classes are different. InvokePrivateInterfaceMethod.java with
-XX:+TraceCallFixup can reveal this pathological slowness.
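
For readers who want a concrete picture of the call shape being discussed,
here is a minimal sketch. It is not the benchmark named above; the names
Node, A, B, C, doubled and score are made up for illustration. The call to
doubled() inside score() targets a private interface method, and javac
targeting recent releases is expected to emit invokeinterface for it, while
the receivers flowing through that one call site have different concrete
classes.

```java
import java.util.List;

public class PrivateInterfaceCallShape {
    interface Node {
        int weight();

        // Private interface method: it cannot be overridden, so there is
        // only one possible target regardless of the receiver's class.
        private int doubled() {
            return 2 * weight();
        }

        // The call to doubled() below is the call site of interest.
        default int score() {
            return doubled() + 1;
        }
    }

    // Several unrelated implementations, so the same call site sees
    // receivers of different concrete classes.
    static final class A implements Node { public int weight() { return 1; } }
    static final class B implements Node { public int weight() { return 2; } }
    static final class C implements Node { public int weight() { return 3; } }

    static int total(List<Node> nodes) {
        int sum = 0;
        for (Node n : nodes) {
            sum += n.score();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(List.of(new A(), new B(), new C())));
    }
}
```

Because doubled() has a single possible target, a compiler could in
principle bind it statically once the receiver is known to implement Node,
which is the optimization the rest of this mail asks about.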

Is this the intended behavior of C1? I see that C2 actually generates a
checkcast code sequence for this case. I would like to patch up C1
because C1 plays an important role in startup time.

I have a patch to let C1 treat invokeinterface of private interface
methods as invokespecial. In other words, I treat the private interface
methods as effectively final. It runs pretty well until I encounter the
regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. That
leads me to the second question.

In my understanding, the code unsafeCastI2() is essentially a typecast
of a function pointer, isn't it?  My take on "unsafe" in "unsafeCastI2"
is that its behavior is undefined, so why do we need to check for ICCE
here?

    System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1");
    shouldThrowICCE(() ->
        PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1())));

    static I2 unsafeCastI2(Object obj) {
        try {
            MethodHandle mh = MethodHandles.identity(Object.class);
            mh = MethodHandles.explicitCastArguments(mh,
                mh.type().changeReturnType(I2.class));
            return (I2)mh.invokeExact((Object) obj);
        } catch (Throwable e) {
            throw new Error(e);
        }
    }

In the real world, how meaningful is it to detect the error when we
accidentally invoke a private interface method on an object that doesn't
actually implement that interface? I think it's only possible via
MethodHandle and jasm, right? If we say it's undefined behavior, I think
we can skip the typecheck. It would make lambda code faster.

thanks,
--lx


From coleenp at openjdk.java.net  Tue Nov 16 13:36:49 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 16 Nov 2021 13:36:49 GMT
Subject: RFR: 8276177:
 nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with
 "assert(def_ik->is_being_redefined()) failed: should be being redefined to get
 here"
Message-ID: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>

The boolean AND was performed on a flags field shared with another thread, so the _misc_is_being_redefined bit could be set and reset by the other thread's update.  Moved the bit to AccessFlags which has space and an atomic set operation.
Tested with tier1-6, 7-8 in progress.
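
As an illustration of why an atomic set operation matters for a shared flag word, here is a small Java analogy. It is a sketch only, not the HotSpot change: the class AtomicFlagBits, the bit constants and the helper names are hypothetical. Two threads update different bits of the same word; because each update is an atomic read-modify-write, neither thread can overwrite the other's bit with a stale copy of the word.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicFlagBits {
    // Hypothetical bit assignments, for illustration only.
    static final int IS_BEING_REDEFINED = 1 << 0;
    static final int OTHER_FLAG = 1 << 1;

    private final AtomicInteger flags = new AtomicInteger();

    // Each update is retried until it applies to the current value, so a
    // concurrent change to a different bit is never lost.
    void set(int bit) { flags.getAndUpdate(f -> f | bit); }
    void clear(int bit) { flags.getAndUpdate(f -> f & ~bit); }
    boolean isSet(int bit) { return (flags.get() & bit) != 0; }

    private static void toggleThenSet(AtomicFlagBits word, int bit, int iterations) {
        for (int i = 0; i < iterations; i++) {
            word.set(bit);
            word.clear(bit);
        }
        word.set(bit); // finish with the bit set
    }

    // Two threads hammer different bits of the same word; both bits must
    // be set at the end.
    static boolean bothSurvive(int iterations) {
        AtomicFlagBits word = new AtomicFlagBits();
        Thread a = new Thread(() -> toggleThenSet(word, IS_BEING_REDEFINED, iterations));
        Thread b = new Thread(() -> toggleThenSet(word, OTHER_FLAG, iterations));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return word.isSet(IS_BEING_REDEFINED) && word.isSet(OTHER_FLAG);
    }
}
```

With a plain `flags = flags & ~bit` on a non-atomic field, one thread's write could silently undo the other's, which matches the symptom described above.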

-------------

Commit messages:
 - 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here"

Changes: https://git.openjdk.java.net/jdk/pull/6410/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6410&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276177
  Stats: 13 lines in 3 files changed: 7 ins; 2 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6410.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6410/head:pull/6410

PR: https://git.openjdk.java.net/jdk/pull/6410

From duke at openjdk.java.net  Tue Nov 16 14:23:07 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Tue, 16 Nov 2021 14:23:07 GMT
Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection
 for Linux/AArch64 [v6]
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <B7nLJm7Uegt41cWe9U00ZDQ8cdVkjkJav7-aXeXNFaQ=.c72106b7-9dd1-4fe7-9285-42b0e6ffd597@github.com>

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of
> them and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.
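
To make the sign/authenticate idea in the description above more tangible, here is a toy model in Java. It is only a sketch under loud assumptions: real PAC is done by hardware instructions with keys held in system registers and a hardware cipher, whereas the class PacToyModel, the mixing function mac and the 48-bit address split below are invented for illustration. The model stores a short code in the unused upper bits of a return address, derived from the address, a modifier (the stack pointer) and a key, and refuses to use the address if the code no longer matches.

```java
public class PacToyModel {
    // Assumption for this toy: addresses use the low 48 bits, and the
    // upper 16 bits are free to carry the authentication code.
    static final int PAC_SHIFT = 48;
    static final long ADDR_MASK = (1L << PAC_SHIFT) - 1;

    // Stand-in mixing function; NOT the cipher real hardware uses.
    static long mac(long addr, long modifier, long key) {
        long h = (addr ^ Long.rotateLeft(modifier, 17) ^ key) * 0x9E3779B97F4A7C15L;
        h ^= h >>> 29;
        return h >>> PAC_SHIFT; // keep 16 bits
    }

    // "Sign" the return address before it is stored on the stack.
    static long sign(long addr, long sp, long key) {
        long plain = addr & ADDR_MASK;
        return plain | (mac(plain, sp, key) << PAC_SHIFT);
    }

    // "Authenticate" the value loaded back from the stack.
    static long authenticate(long signedLr, long sp, long key) {
        long addr = signedLr & ADDR_MASK;
        if ((signedLr >>> PAC_SHIFT) != mac(addr, sp, key)) {
            throw new IllegalStateException("authentication failed");
        }
        return addr;
    }

    static boolean authenticates(long value, long sp, long key) {
        try {
            authenticate(value, sp, key);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

In this model an attacker who overwrites the stored value without knowing the key has to guess the code, and in the real feature a failed check leads to a fault when the function returns, as the description says.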

Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:

 - Merge master
 - Rename pauth_authenticate_or_strip_return_address
 - Fix windows aarch64 by restoring pauth file split
 - Don't keep LR live across restore_live_registers
 - Merge master
 - Document pauth functions && remove OS split
 - Update UseROPProtection description
 - Simplify branch protection configure check
 - 8264130: PAC-RET protection for Linux/AArch64
   
   PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
   of its uses is to protect against ROP based attacks. This is done by
   signing the Link Register whenever it is stored on the stack, and
   authenticating the value when it is loaded back from the stack. If an
   attacker were to try to change control flow by editing the stack then
   the authentication check of the Link Register will fail, causing a
   segfault when the function returns.
   
   On a system with PAC enabled, it is expected that all applications will
   be compiled with ROP protection. Fedora 33 and upwards already provide
   this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
   PAC instructions that exist in the NOP space - on hardware without PAC,
   these instructions act as NOPs, allowing backward compatibility for
   negligible performance cost (2 NOPs per non-leaf function).
   
   Hardware is currently limited to the Apple M1 MacBooks. All testing has
   been done within a Fedora Docker image. A run of SpecJVM showed no
   difference to that of noise - which was surprising.
   
   The most important part of this patch is simply compiling using branch
   protection provided by GCC/LLVM. This protects all C++ code from being
   used in ROP attacks, removing all static ROP gadgets from use.
   
   The remainder of the patch adds ROP protection to runtime generated
   code, in both stubs and compiled Java code. Attacks here are much harder
   as ROP gadgets must be found dynamically at runtime. If/when AOT
   compilation is added to JDK, then all stubs and compiled Java will be
   susceptible to ROP gadgets being found by static analysis and therefore
   potentially as vulnerable as C++ code.
   
   There are a number of places where the VM changes control flow by
   rewriting the stack or otherwise. I've done some analysis as to how
   these could also be used for attacks (which I didn't want to post here).
   These areas can be protected by ensuring the pointers to various stubs and
   entry points are stored in memory as signed pointers. These changes are
   simple to make (they can be reduced to a type change in common code and
   a few additional sign/auth calls in the backend), but there are a lot of
   them and the total code change is fairly large. I'm happy to provide a few
   work in progress patches.
   
   In order to match the security benefits of the Apple Arm64e ABI across
   the whole of JDK, then all the changes mentioned above would be
   required.
 - Add PAC assembly instructions
 - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6334/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=05
  Stats: 1347 lines in 25 files changed: 521 ins; 18 del; 808 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From hseigel at openjdk.java.net  Tue Nov 16 15:56:00 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Tue, 16 Nov 2021 15:56:00 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2]
In-Reply-To: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
Message-ID: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>

> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
> 
> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
> 
> Thanks, Harold

Harold Seigel has updated the pull request incrementally with one additional commit since the last revision:

  Add (Deprecated) to comments and add options to deprecated test

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6390/files
  - new: https://git.openjdk.java.net/jdk/pull/6390/files/aad3f00b..9d49730e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6390&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6390&range=00-01

  Stats: 11 lines in 2 files changed: 4 ins; 0 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6390.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6390/head:pull/6390

PR: https://git.openjdk.java.net/jdk/pull/6390

From hseigel at openjdk.java.net  Tue Nov 16 15:56:01 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Tue, 16 Nov 2021 15:56:01 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags
In-Reply-To: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
Message-ID: <jES9nScybDwIPmkn1KJDySqN3hsH8L7jnE9W58OWT_U=.074771c9-ea32-49d6-8740-4a4de1a09c5f@github.com>

On Mon, 15 Nov 2021 14:50:43 GMT, Harold Seigel <hseigel at openjdk.org> wrote:

> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
> 
> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
> 
> Thanks, Harold

David, thanks for looking at this change.  Please review the updated commit. It contains the needed additional changes that you pointed out.
Thanks, Harold

-------------

PR: https://git.openjdk.java.net/jdk/pull/6390

From ccheung at openjdk.java.net  Tue Nov 16 17:36:36 2021
From: ccheung at openjdk.java.net (Calvin Cheung)
Date: Tue, 16 Nov 2021 17:36:36 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2]
In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
 <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
Message-ID: <zNAzSJoxxu6aZ5jQfNatkKokr3VYOShk4WMFbr9qfh4=.907d891a-e189-4682-ad52-1d6e5af40dc2@github.com>

On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel <hseigel at openjdk.org> wrote:

>> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
>> 
>> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>> 
>> Thanks, Harold
>
> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add (Deprecated) to comments and add options to deprecated test

LGTM.

-------------

Marked as reviewed by ccheung (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6390

From iklam at openjdk.java.net  Tue Nov 16 17:40:39 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Tue, 16 Nov 2021 17:40:39 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2]
In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
 <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
Message-ID: <tczCP1SfxKtnJIB6-msDgOGeiSAM8mMHw2eKJUsdDto=.a37652be-ab44-4aab-b37b-5cf27af98ccd@github.com>

On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel <hseigel at openjdk.org> wrote:

>> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
>> 
>> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>> 
>> Thanks, Harold
>
> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add (Deprecated) to comments and add options to deprecated test

LGTM

-------------

Marked as reviewed by iklam (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6390

From jorn.vernee at oracle.com  Tue Nov 16 17:51:02 2021
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Tue, 16 Nov 2021 18:51:02 +0100
Subject: Questions about oop handling for Panama upcalls.
Message-ID: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com>

Hi,

For panama-foreign upcalls we spin our own upcall stubs that wrap a 
method handle VM entry for the actual upcall. I want to make sure I have 
the oop handling correct on this.

We receive a list of arguments from native code (all primitives, so no 
oops to handle there), and then prefix that list with a MethodHandle 
oop, before calling into the MH's VM entry. The MH oop can be stored in 
three different places:

1. The MH oop is stored in a global JNI handle, and then resolved right 
before the upcall [1].
2. The MH oop is then stored in the first argument register j_rarg0 for 
the call.
3. During a deopt of the callee, the deoptimization code spills the 
receiver (MH oop) into the frame of the upcall stub. (looks like the 
extending of the frame that happens for instance in c2i adapters doesn't 
make room for the receiver?).

I don't think I need to do anything else for 1., but for 2. and 3. there 
is currently no handling. I wanted to ask how those cases should be 
handled, if at all.

I think 2. could in theory be addressed by implementing 
CodeBlob::preserve_callee_argument_oops. Though, it has been working 
fine so far without this, so I'm wondering if this is even needed. Is 
the caller or callee responsible for handling argument oops (seems to be 
caller, from looking at CompiledMethod::preserve_callee_argument_oops)? 
Or does the caller just handle the receiver if there is one (since deopt 
spills that into the callers frame)? The oop offset is passed to an 
OopClosure in CompiledArgumentOopFinder::handle_oop_offset as an oop* 
[2]. Does the argument register get spilled somewhere and the oop needs 
to be patched in place at that address (by the OopClosure)? Or is this 
just used to mark the oop as alive? (in the latter case, the JNI global 
should be enough I think).

I think 3. could be handled with an OopMap entry at the frame offset 
where the receiver is spilled during a deopt of the callee? Should it be 
an oop or a narrowOop, or does it depend on VM settings? FWIW, the deopt 
code always seems to need a machine word (64-bits) to do the spilling, 
so I think it's an oop? Do I need to zero out that part of the frame 
when allocating the frame so that the GC doesn't mistake some garbage 
that's in there for an oop?

I have a POC patch here for reference [3], that implements the 2 things 
above. This passes our test suite, but I'm not sure about the 
correctness. Looking at what JNI does for upcalls [4], I don't see how 
e.g. the receiver argument that is put on the stack is handled, or what 
happens when the callee deopts (though I think it would just overwrite 
the value on the stack that's there already, since JNI always seems to 
do interpreted calls, where we do compiled calls). But, JNI/the call 
stub might be special cased elsewhere...

Also, the oop is briefly stored in rscratch1 when resolving. I'm 
interested to know when the GC can look at the frame and register state, 
especially with concurrent GCs in mind. I'm assuming it's only during 
the call to the MH VM entry (but the existence of frame::safe_for_sender 
makes me less sure)? AFAIK the call counts as a safepoint (with oop map 
for it typically stored at the return offset). At this safepoint, the 
oop can only be stored at one of the 3 places listed at the start.

Thanks,
Jorn

[1] : 
https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416
[2] : 
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L939-L946
[3] : 
https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:Deopt_Crash
[4] : 
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L339


From duke at openjdk.java.net  Tue Nov 16 18:21:50 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 16 Nov 2021 18:21:50 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm
 Neoverse N1
Message-ID: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>

One `ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 

Testing:
- `make test TEST=gtest`: Passed
- `make run-test TEST=tier1`: Passed
- `make run-test TEST=tier2`: Passed
- `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
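
For context, this is the kind of Java code the change affects. The sketch below is illustrative only (the class SpinWaitDemo and its method are hypothetical, not part of the patch): a consumer busy-waits on a flag and calls `Thread.onSpinWait()` in the loop body. On AArch64 the JIT may turn that call into the instruction selected by `OnSpinWaitInst`; on other platforms or settings it may compile to something else or to nothing.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitDemo {
    // Spins until another thread publishes a value, hinting to the
    // runtime on every iteration that this is a busy-wait loop.
    static int awaitFlag() {
        AtomicBoolean ready = new AtomicBoolean(false);
        int[] payload = new int[1];

        Thread producer = new Thread(() -> {
            payload[0] = 42;
            ready.set(true); // volatile write publishes payload[0]
        });
        producer.start();

        while (!ready.get()) {
            Thread.onSpinWait(); // spin-wait hint; no effect on semantics
        }
        return payload[0]; // visible because ready.get() observed true
    }

    public static void main(String[] args) {
        System.out.println(awaitFlag());
    }
}
```

The hint never changes program results; it only gives the CPU a chance to back off while spinning, which is why the choice of instruction is a tuning matter.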

-------------

Commit messages:
 - 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1

Changes: https://git.openjdk.java.net/jdk/pull/6415/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277137
  Stats: 100 lines in 2 files changed: 100 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6415.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6415/head:pull/6415

PR: https://git.openjdk.java.net/jdk/pull/6415

From phh at openjdk.java.net  Tue Nov 16 19:00:40 2021
From: phh at openjdk.java.net (Paul Hohensee)
Date: Tue, 16 Nov 2021 19:00:40 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm
 Neoverse N1
In-Reply-To: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
Message-ID: <ZFaRy0JpxPM4FqAznTJ4BNDxoB3MJNYbEm4rzOWCDFI=.b34bf5ac-6885-4aa9-a3cd-637e17d2ca3f@github.com>

On Tue, 16 Nov 2021 18:14:15 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

> One `ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
> 
> Testing:
> - `make test TEST=gtest`: Passed
> - `make run-test TEST=tier1`: Passed
> - `make run-test TEST=tier2`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

I'd explicitly set OnSpinWaitInstCount because one has to go find the default value in another file to understand what's going to happen. So I'd add:

FLAG_SET_DEFAULT(OnSpinWaitInstCount, 1);

-------------

Changes requested by phh (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6415

From duke at openjdk.java.net  Tue Nov 16 19:15:11 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 16 Nov 2021 19:15:11 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm
 Neoverse N1 [v2]
In-Reply-To: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
Message-ID: <aFdigbMcMQLsaatbofyFuXA-XNlUaIbA77TumqEk9gI=.cda5b3cb-d625-472c-b08e-46ea69dca145@github.com>

> One `ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
> 
> Testing:
> - `make test TEST=gtest`: Passed
> - `make run-test TEST=tier1`: Passed
> - `make run-test TEST=tier2`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Explicitly set OnSpinWaitInstCount to 1

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6415/files
  - new: https://git.openjdk.java.net/jdk/pull/6415/files/b3b8a23e..56258906

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=00-01

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6415.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6415/head:pull/6415

PR: https://git.openjdk.java.net/jdk/pull/6415

From duke at openjdk.java.net  Tue Nov 16 19:15:14 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 16 Nov 2021 19:15:14 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm
 Neoverse N1 [v2]
In-Reply-To: <ZFaRy0JpxPM4FqAznTJ4BNDxoB3MJNYbEm4rzOWCDFI=.b34bf5ac-6885-4aa9-a3cd-637e17d2ca3f@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <ZFaRy0JpxPM4FqAznTJ4BNDxoB3MJNYbEm4rzOWCDFI=.b34bf5ac-6885-4aa9-a3cd-637e17d2ca3f@github.com>
Message-ID: <S_c4GQwPS4pEYAjt6IhzDSi-8mZh16Rot_T_PUQWifE=.97f54e32-89bb-4892-90c2-ccb859252254@github.com>

On Tue, 16 Nov 2021 18:57:53 GMT, Paul Hohensee <phh at openjdk.org> wrote:

> I'd explicitly set OnSpinWaitInstCount because one has to go find the default value in another file to understand what's going to happen. So I'd add:
> 
> FLAG_SET_DEFAULT(OnSpinWaitInstCount, 1);

Thank you for reviewing.
Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From duke at openjdk.java.net  Wed Nov 17 03:53:58 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Wed, 17 Nov 2021 03:53:58 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates [v5]
In-Reply-To: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
Message-ID: <UnN3RA8mTzpJHPCBvtz6ZVimjzm2-LGeYu1SvQqp6IE=.02cf9938-9b7c-4b55-bb1b-cf88e0f58c0d@github.com>

> for (int i = 0; i < LENGTH; i++) {
>   c[i] = a[i] + 2;
> }
> 
> For the case shown above, after superword optimization with SVE,
> without the patch, the vector add operation always has 2 z-reg inputs,
> like:
> mov     z16.s, #2
> add     z17.s, z17.s, z16.s
> 
> Considering SVE supports basic binary operations with an immediate,
> this pattern could be further optimized to:
> add     z16.s, z16.s, #2
> 
> To implement it, we added some new match rules and assembler rules in
> the aarch64 backend. We also made some extensions on immediate types
> and functions to keep backward compatibility.
> 
> With the patch, only these binary integer vector operations, +(add),
> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
> the optimization. Other vector operations are not supported currently.
> 
> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
> CPU, no new failure.
> 
> There is no obvious performance uplift but it can help remove one
> redundant mov instruction.
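
A self-contained sketch of the loop shapes named in the description, one per supported operation, is given below. The class and method names are made up for illustration, and whether C2 actually vectorizes these loops (and with which instructions) depends on the hardware and VM flags; the code only shows the Java source patterns with a constant second operand.

```java
public class AddImmediateLoops {
    // Each loop combines every element with a small constant, which is
    // the "operation with immediate" shape described above.
    static int[] addImm(int[] a) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + 2;
        }
        return c;
    }

    static int[] subImm(int[] a) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] - 1;
        }
        return c;
    }

    static int[] andImm(int[] a) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] & 6;
        }
        return c;
    }

    static int[] orImm(int[] a) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] | 8;
        }
        return c;
    }

    static int[] xorImm(int[] a) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] ^ 5;
        }
        return c;
    }
}
```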

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

 - Regenerate the asmtest.out.h file for aarch64 after rebasing
   
   Change-Id: I1292449268c73c8f84cc3ffa7a4c859cf79058eb
 - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026
   
   Change-Id: I2004dc45f7f0ab44bc22b48083b185e7b3bd5eea
 - Add some assertion lines for help functions
   
   Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c
 - Split the original patch and leave the existing logic in Assembler entirely untouched
   
   Change-Id: If8ddcef07b15615d7dd0c3063c44d2b705fac6f7
 - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026
   
   Change-Id: I52aa66d200b74ac312c5d40283b94854bc1142e6
 - 8274179: AArch64: Support SVE operations with encodable immediates
   
       for (int i = 0; i < LENGTH; i++) {
         c[i] = a[i] + 2;
       }
   
   For the case shown above, after superword optimization with SVE,
   without the patch, the vector add operation always has 2 z-reg inputs,
   like:
   mov     z16.s, #2
   add     z17.s, z17.s, z16.s
   
   Considering SVE supports basic binary operations with an immediate,
   this pattern could be further optimized to:
   add     z16.s, z16.s, #2
   
   To implement it, we added some new match rules and assembler rules in
   the aarch64 backend. We also made some extensions on immediate types
   and functions to keep backward compatibility.
   
   With the patch, only these binary integer vector operations, +(add),
   -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
   the optimization. Other vector operations are not supported currently.
   
   Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
   CPU, no new failure.
   
   There is no obvious performance uplift but it can help remove one
   redundant mov instruction.
   
   Change-Id: Iaec40e362918118691083fb171cc4dff390b35a2

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6115/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6115&range=04
  Stats: 1476 lines in 12 files changed: 1329 ins; 43 del; 104 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6115/head:pull/6115

PR: https://git.openjdk.java.net/jdk/pull/6115

From david.holmes at oracle.com  Wed Nov 17 06:56:23 2021
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 17 Nov 2021 16:56:23 +1000
Subject: Is it necessary to check subtype for invokeinterface of private
 method?
In-Reply-To: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com>
References: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com>
Message-ID: <df985b39-3969-64e4-1a99-dfb491eee42e@oracle.com>

Hi Xin,

On 16/11/2021 7:44 pm, Liu, Xin wrote:
> Hi,
> 
> I am working on the regex performance in JDK-8274983. Even though it
> begins with regex, the problem boils down to invokeinterface calls to
> private interface methods. Before hidden classes (JDK-8238358), the
> lambda metafactory generated invokespecial for them. Now it generates
> invokeinterface instead. C1 doesn't recognize the new code pattern and
> generates an ic virtual call for the callsite. If many classes all
> implement a common interface, they thrash the ic stub because the
> concrete classes are different. InvokePrivateInterfaceMethod.java with
> -XX:+TraceCallFixup can reveal this pathological slowness.
> 
> Is this the intended behavior of C1? I see that C2 actually generates a
> checkcast code sequence for this case. I would like to patch up C1
> because C1 plays an important role in startup time.

Can't comment on C1 issue.

> I have a patch to let C1 treat invokeinterface of private interface
> methods as invokespecial. In other words, I treat the private interface
> methods as effectively final. It runs pretty well until I encounter the
> regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. That
> leads me to the second question.
> 
> In my understanding, the code unsafeCastI2() is essentially a typecast
> of a function pointer, isn't it?  My take on "unsafe" in "unsafeCastI2"
> is that its behavior is undefined, so why do we need to check for ICCE
> here?

"unsafe" means that the validity of the cast can't be checked 
immediately, but it will be checked when the result is actually used - 
it is not "undefined behaviour". The VM and MethodHandle specifications 
require the subtype check on the receiver:

JVMS 6.5 invokeinterface - Run-time Exception: Otherwise, if the class 
of objectref does not implement the resolved interface, invokeinterface 
throws an IncompatibleClassChangeError.
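
The deferred check can be seen in a small standalone program. The sketch 
below is modelled on the test's unsafeCastI2 helper but uses made-up names 
(DeferredInterfaceCheck, Greeter, Stranger); it assumes, as the 
MethodHandles.explicitCastArguments documentation describes, that a 
reference is passed as an interface type without a cast. The cast itself 
succeeds, and the error only surfaces when an interface method is invoked.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class DeferredInterfaceCheck {
    interface Greeter {
        String greet();
    }

    // Deliberately does NOT implement Greeter.
    static final class Stranger { }

    // Returns obj typed as Greeter without checking that it is one.
    static Greeter unsafeCast(Object obj) {
        try {
            MethodHandle mh = MethodHandles.identity(Object.class);
            mh = MethodHandles.explicitCastArguments(
                    mh, mh.type().changeReturnType(Greeter.class));
            return (Greeter) mh.invokeExact(obj);
        } catch (Throwable t) {
            throw new Error(t);
        }
    }

    static String probe() {
        // No exception here: the reference is merely retyped.
        Greeter g = unsafeCast(new Stranger());
        try {
            return g.greet(); // invokeinterface checks the receiver here
        } catch (IncompatibleClassChangeError e) {
            return "ICCE at use";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```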

>      System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1");
>      shouldThrowICCE(() ->
>          PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1())));
> 
>      static I2 unsafeCastI2(Object obj) {
>          try {
>              MethodHandle mh = MethodHandles.identity(Object.class);
>              mh = MethodHandles.explicitCastArguments(mh,
>                  mh.type().changeReturnType(I2.class));
>              return (I2)mh.invokeExact((Object) obj);
>          } catch (Throwable e) {
>              throw new Error(e);
>          }
>      }
> 
> In real world, how much meaningful we detect the error if we
> accidentally invoke a private interface method where we actually don't
> implement that interface. I think it's only possible via methodhandle
> and jasm, right? if we say it's undefined behavior, I think we can skip
> typecheck. It would make lambda code faster.

I think there are potential security considerations here, but if you 
want to make a case for change then email:

  jls-jvms-spec-comments at openjdk.java.net

Cheers,
David


> thanks,
> --lx
> 

From ngasson at openjdk.java.net  Wed Nov 17 07:42:36 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Wed, 17 Nov 2021 07:42:36 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm
 Neoverse N1 [v2]
In-Reply-To: <aFdigbMcMQLsaatbofyFuXA-XNlUaIbA77TumqEk9gI=.cda5b3cb-d625-472c-b08e-46ea69dca145@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <aFdigbMcMQLsaatbofyFuXA-XNlUaIbA77TumqEk9gI=.cda5b3cb-d625-472c-b08e-46ea69dca145@github.com>
Message-ID: <bjATkUpHxn_MAbv4SZkpSyEwqLzAtfBryGOwdJIZt4A=.2cd05900-7ee0-49a5-86cd-af2a30611425@github.com>

On Tue, 16 Nov 2021 19:15:11 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
>> 
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Explicitly set OnSpinWaitInstCount to 1

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 206:

> 204:     }
> 205: 
> 206:     if (FLAG_IS_DEFAULT(OnSpinWaitInst) && FLAG_IS_DEFAULT(OnSpinWaitInstCount)) {

Should these two be set independently? If I pass `-XX:OnSpinWaitInstCount=2` then `OnSpinWaitInst` will default to "none".
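
To make the concern concrete, here is a toy Java model of the two defaulting strategies. It is only a sketch: the `Flag` class and the `coupled`/`independent` helpers are hypothetical stand-ins for `FLAG_IS_DEFAULT` and the flag-setting code, not HotSpot code.

```java
public class SpinWaitFlagDefaults {
    // Hypothetical stand-in for a VM flag: its value plus whether the user left it at the default.
    static final class Flag<T> {
        T value;
        final boolean isDefault;
        Flag(T value, boolean isDefault) { this.value = value; this.isDefault = isDefault; }
    }

    // Coupled: mirrors the quoted condition, so both flags must be untouched.
    static String coupled(Flag<String> inst, Flag<Integer> count) {
        if (inst.isDefault && count.isDefault) {
            inst.value = "isb";
            count.value = 1;
        }
        return inst.value + " x" + count.value;
    }

    // Independent: each flag receives its platform default on its own.
    static String independent(Flag<String> inst, Flag<Integer> count) {
        if (inst.isDefault) inst.value = "isb";
        if (count.isDefault) count.value = 1;
        return inst.value + " x" + count.value;
    }

    public static void main(String[] args) {
        // Models -XX:OnSpinWaitInstCount=2 with OnSpinWaitInst left alone.
        System.out.println(coupled(new Flag<>("none", true), new Flag<>(2, false)));
        System.out.println(independent(new Flag<>("none", true), new Flag<>(2, false)));
    }
}
```

In this model the coupled check leaves the instruction at "none" once the count is given explicitly, while the independent check still selects "isb".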

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From xxinliu at amazon.com  Wed Nov 17 08:22:10 2021
From: xxinliu at amazon.com (Liu, Xin)
Date: Wed, 17 Nov 2021 00:22:10 -0800
Subject: Is it necessary to check subtype for invokeinterface of private
 method?
In-Reply-To: <df985b39-3969-64e4-1a99-dfb491eee42e@oracle.com>
References: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com>
 <df985b39-3969-64e4-1a99-dfb491eee42e@oracle.com>
Message-ID: <687da655-903b-b6b7-2b38-23e3cf722106@amazon.com>

Hi, David,

Thanks for the heads-up! Now I understand why C2 generates the checkcast
code for invokespecial and invokeinterface. It must conform to the JVM
spec.

C1 does the checkcast for invokespecial right now.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1874

./test/jdk/java/lang/invoke/PrivateInterfaceCall.java is correct. The
ICCE is expected. I will implement invokeinterface by the book.

I don't intend to challenge the JVM spec. Private interface methods
are new to me. I have thought about this again. There's no polymorphism
for them, so C1/C2 can use relocInfo::opt_virtual_call_type to optimize
their callsites, but the typecheck still needs to be in place!

thanks,
--lx


On 11/16/21 10:56 PM, David Holmes wrote:
> 
> Hi Xin,
> 
> On 16/11/2021 7:44 pm, Liu, Xin wrote:
>> Hi,
>>
>> I am working on the regex performance in JDK-8274983. Even though it
>> begins with regex, the problem boils down to invokeinterface to the
>> private interface methods. Before hidden class (JDK-8238358), lambda
>> meta factory generates invokespecial for them. Now it generates
>> invokeinterface instead. C1 doesn't recognize the new code pattern and
>> generates an ic virtual call for the callsite. If many classes all
>> implement a common interface, they trash the ic stub because the
>> concrete classes are different. InvokePrivateInterfaceMethod.java with
>> -XX:+TraceCallFixup can reveal this pathological slowness.
>>
>> Is it the intentional behavior of C1? I see that C2 actually generates
>> checkcast code sequence for this case. I would like to patch up C1
>> because C1 plays an important role for the startup time.
> 
> Can't comment on C1 issue.
> 
>> I have a patch to let C1 treats invokeinterface private interface
>> methods as invokespecial. In other words, I treat the private interface
>> methods as effective final. It runs pretty well until I encounter the
>> regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. That
>> leads me to the second question.
>>
>> In my understanding, the code unsafeCastI2() is essentially a typecast
>> of a function pointer, isn't it?  My take of "unsafe" in "unsafeCastI2"
>> is that its behavior undefined, then why we need to check ICCE here?
> 
> "unsafe" means that the validity of the cast can't be checked
> immediately, but it will be checked when the result is actually used -
> it is not "undefined behaviour". The VM and MethodHandle specifications
> require the subtype check on the receiver:
> 
> JVMS 6.5 invoke_interface - Run-time Exception: Otherwise, if the class
> of objectref does not implement the resolved interface, invokeinterface
> throws an IncompatibleClassChangeError.
> 
>>      System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1");
>>      shouldThrowICCE(() ->
>> PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1())))
>>
>>      static I2 unsafeCastI2(Object obj) {
>>          try {
>>              MethodHandle mh = MethodHandles.identity(Object.class);
>>              mh = MethodHandles.explicitCastArguments(mh,
>> mh.type().changeReturnType(I2.class));
>>              return (I2)mh.invokeExact((Object) obj);
>>          } catch (Throwable e) {
>>              throw new Error(e);
>>          }
>>      }
>>
>> In real world, how much meaningful we detect the error if we
>> accidentally invoke a private interface method where we actually don't
>> implement that interface. I think it's only possible via methodhandle
>> and jasm, right? if we say it's undefined behavior, I think we can skip
>> typecheck. It would make lambda code faster.
> 
> I think there are potential security considerations here, but if you
> want to make a case for change then email:
> 
>   jls-jvms-spec-comments at openjdk.java.net
> 
> Cheers,
> David
> 
> 
>> thanks,
>> --lx
>>

From forax at univ-mlv.fr  Wed Nov 17 08:53:32 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 17 Nov 2021 09:53:32 +0100 (CET)
Subject: Is it necessary to check subtype for invokeinterface of private
 method?
In-Reply-To: <687da655-903b-b6b7-2b38-23e3cf722106@amazon.com>
References: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com>
 <df985b39-3969-64e4-1a99-dfb491eee42e@oracle.com>
 <687da655-903b-b6b7-2b38-23e3cf722106@amazon.com>
Message-ID: <1135408296.1620728.1637139212846.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Liu, Xin" <xxinliu at amazon.com>
> To: "David Holmes" <david.holmes at oracle.com>, "hotspot-dev" <hotspot-dev at openjdk.java.net>
> Cc: "Simonis, Volker" <simonisv at amazon.de>, "Dean Long" <dean.long at oracle.com>
> Sent: Mercredi 17 Novembre 2021 09:22:10
> Subject: Re: Is it necessary to check subtype for invokeinterface of private method?

> Hi, David,
> 
> Thanks for the heads-up! Now I understand why C2 generates the checkcast
> code for invokespecial and invokeinterface. It must conform to the JVM
> spec.
> 
> C1 does the checkcast for invokespecial right now.
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1874
> 
> ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java is correct. The
> ICCE is expected. I will implement invokeinterface by the book.
> 
> I don't intend to challenge the JVM spec. Private interface methods
> are new to me. I have thought about this again. There's no polymorphism
> for them, so C1/C2 can use relocInfo::opt_virtual_call_type to optimize
> their callsites, but the typecheck still needs to be in place!

Yes, the type check is needed because the bytecode verifier does not verify that a value really implements an interface, so a typecheck has to be inserted.
Usually, a failure there means that the world seen by the compiler and the world seen by the VM are not the same, hence the IncompatibleClassChangeError.
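
A minimal, self-contained sketch of that situation, modelled on the `unsafeCastI2` helper quoted from PrivateInterfaceCall.java. The class name `UnsafeCastDemo`, the default method `name()`, and the helper `callThrowsICCE` are made up for illustration, and a public default method is used here rather than a private one.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class UnsafeCastDemo {
    interface I2 {
        default String name() { return "I2"; }
    }

    // Deliberately does NOT implement I2.
    static class D1 { }

    // Because I2 is an interface, explicitCastArguments passes the reference
    // through without a cast, so a D1 ends up statically typed as I2.
    static I2 unsafeCastI2(Object obj) {
        try {
            MethodHandle mh = MethodHandles.identity(Object.class);
            mh = MethodHandles.explicitCastArguments(
                    mh, mh.type().changeReturnType(I2.class));
            return (I2) mh.invokeExact((Object) obj);
        } catch (Throwable e) {
            throw new Error(e);
        }
    }

    // The verifier accepted the bogus I2 reference; the receiver subtype
    // check only happens when invokeinterface is executed.
    static boolean callThrowsICCE() {
        I2 bogus = unsafeCastI2(new D1());
        try {
            bogus.name();
            return false;
        } catch (IncompatibleClassChangeError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ICCE thrown: " + callThrowsICCE());
    }
}
```

The cast itself succeeds silently; the error surfaces only at the call, which is why the check cannot be dropped from the compiled callsite.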

> 
> thanks,
> --lx

regards,
Rémi

> 
> 
> On 11/16/21 10:56 PM, David Holmes wrote:
>> 
>> Hi Xin,
>> 
>> On 16/11/2021 7:44 pm, Liu, Xin wrote:
>>> Hi,
>>>
>>> I am working on the regex performance in JDK-8274983. Even though it
>>> begins with regex, the problem boils down to invokeinterface to the
>>> private interface methods. Before hidden class (JDK-8238358), lambda
>>> meta factory generates invokespecial for them. Now it generates
>>> invokeinterface instead. C1 doesn't recognize the new code pattern and
>>> generates an ic virtual call for the callsite. If many classes all
>>> implement a common interface, they trash the ic stub because the
>>> concrete classes are different. InvokePrivateInterfaceMethod.java with
>>> -XX:+TraceCallFixup can reveal this pathological slowness.
>>>
>>> Is it the intentional behavior of C1? I see that C2 actually generates
>>> checkcast code sequence for this case. I would like to patch up C1
>>> because C1 plays an important role for the startup time.
>> 
>> Can't comment on C1 issue.
>> 
>>> I have a patch to let C1 treats invokeinterface private interface
>>> methods as invokespecial. In other words, I treat the private interface
>>> methods as effective final. It runs pretty well until I encounter the
>>> regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. That
>>> leads me to the second question.
>>>
>>> In my understanding, the code unsafeCastI2() is essentially a typecast
>>> of a function pointer, isn't it?  My take of "unsafe" in "unsafeCastI2"
>>> is that its behavior undefined, then why we need to check ICCE here?
>> 
>> "unsafe" means that the validity of the cast can't be checked
>> immediately, but it will be checked when the result is actually used -
>> it is not "undefined behaviour". The VM and MethodHandle specifications
>> require the subtype check on the receiver:
>> 
>> JVMS 6.5 invoke_interface - Run-time Exception: Otherwise, if the class
>> of objectref does not implement the resolved interface, invokeinterface
>> throws an IncompatibleClassChangeError.
>> 
>>>      System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1");
>>>      shouldThrowICCE(() ->
>>> PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1())))
>>>
>>>      static I2 unsafeCastI2(Object obj) {
>>>          try {
>>>              MethodHandle mh = MethodHandles.identity(Object.class);
>>>              mh = MethodHandles.explicitCastArguments(mh,
>>> mh.type().changeReturnType(I2.class));
>>>              return (I2)mh.invokeExact((Object) obj);
>>>          } catch (Throwable e) {
>>>              throw new Error(e);
>>>          }
>>>      }
>>>
>>> In real world, how much meaningful we detect the error if we
>>> accidentally invoke a private interface method where we actually don't
>>> implement that interface. I think it's only possible via methodhandle
>>> and jasm, right? if we say it's undefined behavior, I think we can skip
>>> typecheck. It would make lambda code faster.
>> 
>> I think there are potential security considerations here, but if you
>> want to make a case for change then email:
>> 
>>   jls-jvms-spec-comments at openjdk.java.net
>> 
>> Cheers,
>> David
>> 
>> 
>>> thanks,
>>> --lx

From duke at openjdk.java.net  Wed Nov 17 09:30:45 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Wed, 17 Nov 2021 09:30:45 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates
In-Reply-To: <XlVFZDTwwYvLPlMddZCPyMUlG-a_ryk-pYxniQBSQu8=.9f72283d-7d54-4839-b45a-3962138ff261@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
 <XlVFZDTwwYvLPlMddZCPyMUlG-a_ryk-pYxniQBSQu8=.9f72283d-7d54-4839-b45a-3962138ff261@github.com>
Message-ID: <c6h5Bp3u-T2hkf5HvZJwrQtJ0lkjWXgWl3qw4BDOqIA=.da482ce6-5324-4e36-8d11-ddeaea5e7604@github.com>

On Tue, 26 Oct 2021 11:37:23 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> for(int i = 0; i < LENGTH; i++) {
>>       c[i] = a[i] + 2;
>>     }
>> 
>> For the case showed above, after superword optimization with SVE,
>> without the patch, the vector add operation always has 2 z-reg inputs,
>> like:
>> mov     z16.s, #2
>> add	z17.s, z17.s, z16.s
>> 
>> Considering sve has supported basic binary operations with immediate,
>> this pattern could be further optimized to:
>> add     z16.s, z16.s, #2
>> 
>> To implement it, we added some new match rules and assembler rules in
>> the aarch64 backend. We also made some extensions on immediate types
>> and functions to keep backward compatible.
>> 
>> With the patch, only these binary integer vector operations, +(add),
>> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>> the optimization. Other vector operations are not supported currently.
>> 
>> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>> CPU, no new failure.
>> 
>> There is no obvious performance uplift but it can help remove one
>> redundant mov instruction.
>
> I'd like you to split this patch into two parts, please.
> First, please use the new functions such as `Assembler::operand_valid_for_logical_immediate(bool is32, uint64_t imm)` only for SVE, leaving the existing logic in `Assembler` entirely untouched. This will cause some duplication, but that's OK. We can review changes to merge functionality in a separate patch. This will be much easier.

Hi @theRealAph , I rebased my patch and retested it internally. Can I have your review :)? Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6115

From erik.osterlund at oracle.com  Wed Nov 17 09:42:22 2021
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Wed, 17 Nov 2021 09:42:22 +0000
Subject: Questions about oop handling for Panama upcalls.
In-Reply-To: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com>
References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com>
Message-ID: <BN0PR10MB5176632CF1C775BC4BE361D1F49A9@BN0PR10MB5176.namprd10.prod.outlook.com>

Hi Jorn,

So you have a jobject in the caller, resolve it, and then need to pass the oop around as an argument to the callee. Our current upcall stubs try to quack like an interpreter in many ways, so that it will look like an i-2-something call. I think you can either try to do the same quacking dance, to pass the oop to the callee, or alternatively the primary question for me seems to be who is the callee? You have a very fixed format for the call, which makes me suspect the callee is some kind of JDK internal code. Another way of dealing with this would be to pass the jobject as a long and just resolve it in the callee instead, if this is indeed JDK internal code. Then this becomes a problem that doesn't need to be solved at all. Just sanity checking.

/Erik

> -----Original Message-----
> From: hotspot-dev <hotspot-dev-retn at openjdk.java.net> On Behalf Of Jorn
> Vernee
> Sent: Tuesday, 16 November 2021 18:51
> To: hotspot-dev at openjdk.java.net
> Subject: Questions about oop handling for Panama upcalls.
> 
> Hi,
> 
> For panama-foreign upcalls we spin our own upcall stubs that wrap a method
> handle VM entry for the actual upcall. I want to make sure I have the oop
> handling correct on this.
> 
> We receive a list of arguments from native code (all primitives, so no oops to
> handle there), and then prefix that list with a MethodHandle oop, before
> calling into the MH's VM entry. The MH oop can be stored in three different
> places:
> 
> 1. The MH oop is stored in a global JNI handle, and then resolved right before
> the upcall [1].
> 2. The MH oop is then stored in the first argument register j_rarg0 for the
> call.
> 3. During a deopt of the callee, the deoptimization code spills the receiver
> (MH oop) into the frame of the upcall stub. (looks like the extending of the
> frame that happens for instance in c2i adapters doesn't make room for the
> receiver?).
> 
> I don't think I need to do anything else for 1., but for 2. and 3. there is
> currently no handling. I wanted to ask how those cases should be handled, if
> at all.
> 
> I think 2. could in theory be addressed by implementing
> CodeBlob::preserve_callee_argument_oops. Though, it has been working
> fine so far without this, so I'm wondering if this is even needed. Is the caller
> or callee responsible for handling argument oops (seems to be caller, from
> looking at CompiledMethod::preserve_callee_argument_oops)?
> Or does the caller just handle the receiver if there is one (since deopt spills
> that into the callers frame)? The oop offset is passed to an OopClosure in
> CompiledArgumentOopFinder::handle_oop_offset as an oop* [2]. Does the
> argument register get spilled somewhere and the oop needs to be patched
> in place at that address (by the OopClosure)? Or is this just used to mark the
> oop as alive? (in the latter case, the JNI global should be enough I think).
> 
> I think 3. could be handled with an OopMap entry at the frame offset where
> the receiver is spilled during a deopt of the callee? Should it be an oop or a
> narrowOop, or does it depend on VM settings? FWIW, the deopt code
> always seems to need a machine word (64-bits) to do the spilling, so I think
> it's an oop? Do I need to zero out that part of the frame when allocating the
> frame so that the GC doesn't mistake some garbage that's in there for an
> oop?
> 
> I have a POC patch here for reference [3], that implements the 2 things
> above. This passes our test suite, but I'm not sure about the correctness.
> Looking at what JNI does for upcalls [4], I don't see how e.g. the receiver
> argument that is put on the stack is handled, or what happens when the
> callee deopts (though I think it would just overwrite the value on the stack
> that's there already, since JNI always seems to do interpreted calls, where
> we do compiled calls). But, JNI/the call stub might be special cased
> elsewhere...
> 
> Also, the oop is briefly stored in rscratch1 when resolving. I'm interested to
> know when the GC can look at the frame and register state, especially with
> concurrent GCs in mind. I'm assuming it's only during the call to the MH VM
> entry (but the existence of frame::safe_for_sender makes me less sure)?
> AFAIK the call counts as a safepoint (with oop map for it typically stored at
> the return offset). At this safepoint, the oop can only be stored at one of the
> 3 places listed at the start.
> 
> Thanks,
> Jorn
> 
> [1] :
> https://github.com/openjdk/panama-foreign/blob/foreign-
> jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416
> [2] :
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/fr
> ame.cpp#L939-L946
> [3] :
> https://github.com/openjdk/panama-foreign/compare/foreign-
> memaccess+abi...JornVernee:Deopt_Crash
> [4] :
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe
> nerator_x86_64.cpp#L339


From aph at openjdk.java.net  Wed Nov 17 09:57:39 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 17 Nov 2021 09:57:39 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates [v5]
In-Reply-To: <UnN3RA8mTzpJHPCBvtz6ZVimjzm2-LGeYu1SvQqp6IE=.02cf9938-9b7c-4b55-bb1b-cf88e0f58c0d@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
 <UnN3RA8mTzpJHPCBvtz6ZVimjzm2-LGeYu1SvQqp6IE=.02cf9938-9b7c-4b55-bb1b-cf88e0f58c0d@github.com>
Message-ID: <v6LSLhZswr7w0-l_idQYSqtyEGs_lS6h2CPtT5Zp-fY=.c580e6e6-2a5c-496d-a1b9-d7b28f740fdc@github.com>

On Wed, 17 Nov 2021 03:53:58 GMT, Fei Gao <duke at openjdk.java.net> wrote:

>> for(int i = 0; i < LENGTH; i++) {
>>       c[i] = a[i] + 2;
>>     }
>> 
>> For the case showed above, after superword optimization with SVE,
>> without the patch, the vector add operation always has 2 z-reg inputs,
>> like:
>> mov     z16.s, #2
>> add	z17.s, z17.s, z16.s
>> 
>> Considering sve has supported basic binary operations with immediate,
>> this pattern could be further optimized to:
>> add     z16.s, z16.s, #2
>> 
>> To implement it, we added some new match rules and assembler rules in
>> the aarch64 backend. We also made some extensions on immediate types
>> and functions to keep backward compatible.
>> 
>> With the patch, only these binary integer vector operations, +(add),
>> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>> the optimization. Other vector operations are not supported currently.
>> 
>> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>> CPU, no new failure.
>> 
>> There is no obvious performance uplift but it can help remove one
>> redundant mov instruction.
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Regenerate the asmtest.out.h file for aarch64 after rebasing
>    
>    Change-Id: I1292449268c73c8f84cc3ffa7a4c859cf79058eb
>  - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026
>    
>    Change-Id: I2004dc45f7f0ab44bc22b48083b185e7b3bd5eea
>  - Add some assertion lines for help functions
>    
>    Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c
>  - Split the original patch and leave the existing logic in Assembler entirely untouched
>    
>    Change-Id: If8ddcef07b15615d7dd0c3063c44d2b705fac6f7
>  - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026
>    
>    Change-Id: I52aa66d200b74ac312c5d40283b94854bc1142e6
>  - 8274179: AArch64: Support SVE operations with encodable immediates
>    
>        for(int i = 0; i < LENGTH; i++) {
>          c[i] = a[i] + 2;
>        }
>    
>    For the case showed above, after superword optimization with SVE,
>    without the patch, the vector add operation always has 2 z-reg inputs,
>    like:
>    mov     z16.s, #2
>    add	z17.s, z17.s, z16.s
>    
>    Considering sve has supported basic binary operations with immediate,
>    this pattern could be further optimized to:
>    add     z16.s, z16.s, #2
>    
>    To implement it, we added some new match rules and assembler rules in
>    the aarch64 backend. We also made some extensions on immediate types
>    and functions to keep backward compatible.
>    
>    With the patch, only these binary integer vector operations, +(add),
>    -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>    the optimization. Other vector operations are not supported currently.
>    
>    Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>    CPU, no new failure.
>    
>    There is no obvious performance uplift but it can help remove one
>    redundant mov instruction.
>    
>    Change-Id: Iaec40e362918118691083fb171cc4dff390b35a2

src/hotspot/cpu/aarch64/aarch64.ad line 2736:

> 2734:   if (is_vshift_con_pattern(n, m) ||
> 2735:       (UseSVE > 0 && m->Opcode() == Op_VectorStoreMask && n->Opcode() == Op_StoreVector) ||
> 2736:       is_vector_arith_imm_pattern(n, m)) {

Indent this line.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6115

From aph at openjdk.java.net  Wed Nov 17 10:01:37 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 17 Nov 2021 10:01:37 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates [v5]
In-Reply-To: <UnN3RA8mTzpJHPCBvtz6ZVimjzm2-LGeYu1SvQqp6IE=.02cf9938-9b7c-4b55-bb1b-cf88e0f58c0d@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
 <UnN3RA8mTzpJHPCBvtz6ZVimjzm2-LGeYu1SvQqp6IE=.02cf9938-9b7c-4b55-bb1b-cf88e0f58c0d@github.com>
Message-ID: <jnPIDwLPyeb4vtXrH1coY56wLa-fRQp7xFxq9NQEzjY=.ebe072d8-13bd-42f6-8428-a439fa55d399@github.com>

On Wed, 17 Nov 2021 03:53:58 GMT, Fei Gao <duke at openjdk.java.net> wrote:

>> for(int i = 0; i < LENGTH; i++) {
>>       c[i] = a[i] + 2;
>>     }
>> 
>> For the case showed above, after superword optimization with SVE,
>> without the patch, the vector add operation always has 2 z-reg inputs,
>> like:
>> mov     z16.s, #2
>> add	z17.s, z17.s, z16.s
>> 
>> Considering sve has supported basic binary operations with immediate,
>> this pattern could be further optimized to:
>> add     z16.s, z16.s, #2
>> 
>> To implement it, we added some new match rules and assembler rules in
>> the aarch64 backend. We also made some extensions on immediate types
>> and functions to keep backward compatible.
>> 
>> With the patch, only these binary integer vector operations, +(add),
>> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>> the optimization. Other vector operations are not supported currently.
>> 
>> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>> CPU, no new failure.
>> 
>> There is no obvious performance uplift but it can help remove one
>> redundant mov instruction.
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Regenerate the asmtest.out.h file for aarch64 after rebasing
>    
>    Change-Id: I1292449268c73c8f84cc3ffa7a4c859cf79058eb
>  - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026
>    
>    Change-Id: I2004dc45f7f0ab44bc22b48083b185e7b3bd5eea
>  - Add some assertion lines for help functions
>    
>    Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c
>  - Split the original patch and leave the existing logic in Assembler entirely untouched
>    
>    Change-Id: If8ddcef07b15615d7dd0c3063c44d2b705fac6f7
>  - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026
>    
>    Change-Id: I52aa66d200b74ac312c5d40283b94854bc1142e6
>  - 8274179: AArch64: Support SVE operations with encodable immediates
>    
>        for(int i = 0; i < LENGTH; i++) {
>          c[i] = a[i] + 2;
>        }
>    
>    For the case showed above, after superword optimization with SVE,
>    without the patch, the vector add operation always has 2 z-reg inputs,
>    like:
>    mov     z16.s, #2
>    add	z17.s, z17.s, z16.s
>    
>    Considering sve has supported basic binary operations with immediate,
>    this pattern could be further optimized to:
>    add     z16.s, z16.s, #2
>    
>    To implement it, we added some new match rules and assembler rules in
>    the aarch64 backend. We also made some extensions on immediate types
>    and functions to keep backward compatible.
>    
>    With the patch, only these binary integer vector operations, +(add),
>    -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>    the optimization. Other vector operations are not supported currently.
>    
>    Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>    CPU, no new failure.
>    
>    There is no obvious performance uplift but it can help remove one
>    redundant mov instruction.
>    
>    Change-Id: Iaec40e362918118691083fb171cc4dff390b35a2

Good job, well done.

-------------

Marked as reviewed by aph (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6115

From aph-open at littlepinkcloud.com  Wed Nov 17 10:02:34 2021
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Wed, 17 Nov 2021 10:02:34 +0000
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates [v5]
In-Reply-To: <v6LSLhZswr7w0-l_idQYSqtyEGs_lS6h2CPtT5Zp-fY=.c580e6e6-2a5c-496d-a1b9-d7b28f740fdc@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
 <UnN3RA8mTzpJHPCBvtz6ZVimjzm2-LGeYu1SvQqp6IE=.02cf9938-9b7c-4b55-bb1b-cf88e0f58c0d@github.com>
 <v6LSLhZswr7w0-l_idQYSqtyEGs_lS6h2CPtT5Zp-fY=.c580e6e6-2a5c-496d-a1b9-d7b28f740fdc@github.com>
Message-ID: <85cbec88-e5ea-85a8-d3b4-593d1a9d7778@littlepinkcloud.com>

On 11/17/21 09:57, Andrew Haley wrote:
>> 2734:   if (is_vshift_con_pattern(n, m) ||
>> 2735:       (UseSVE > 0 && m->Opcode() == Op_VectorStoreMask && n->Opcode() == Op_StoreVector) ||
>> 2736:       is_vector_arith_imm_pattern(n, m)) {
> Indent this line.

Sorry, that was a mistake.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From duke at openjdk.java.net  Wed Nov 17 10:09:38 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Wed, 17 Nov 2021 10:09:38 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable
 immediates
In-Reply-To: <XlVFZDTwwYvLPlMddZCPyMUlG-a_ryk-pYxniQBSQu8=.9f72283d-7d54-4839-b45a-3962138ff261@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
 <XlVFZDTwwYvLPlMddZCPyMUlG-a_ryk-pYxniQBSQu8=.9f72283d-7d54-4839-b45a-3962138ff261@github.com>
Message-ID: <DRzxATKNqNeCEfUOWceHhLMynI3G0Sy7vX7OhuVoshI=.0943dbb3-e4c7-4779-b1c6-177603d52812@github.com>

On Tue, 26 Oct 2021 11:37:23 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> for(int i = 0; i < LENGTH; i++) {
>>       c[i] = a[i] + 2;
>>     }
>> 
>> For the case showed above, after superword optimization with SVE,
>> without the patch, the vector add operation always has 2 z-reg inputs,
>> like:
>> mov     z16.s, #2
>> add	z17.s, z17.s, z16.s
>> 
>> Considering sve has supported basic binary operations with immediate,
>> this pattern could be further optimized to:
>> add     z16.s, z16.s, #2
>> 
>> To implement it, we added some new match rules and assembler rules in
>> the aarch64 backend. We also made some extensions on immediate types
>> and functions to keep backward compatible.
>> 
>> With the patch, only these binary integer vector operations, +(add),
>> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>> the optimization. Other vector operations are not supported currently.
>> 
>> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>> CPU, no new failure.
>> 
>> There is no obvious performance uplift but it can help remove one
>> redundant mov instruction.
>
> I'd like you to split this patch into two parts, please.
> First, please use the new functions such as `Assembler::operand_valid_for_logical_immediate(bool is32, uint64_t imm)` only for SVE, leaving the existing logic in `Assembler` entirely untouched. This will cause some duplication, but that's OK. We can review changes to merge functionality in a separate patch. This will be much easier.

Thanks :) @theRealAph

-------------

PR: https://git.openjdk.java.net/jdk/pull/6115

From thartmann at openjdk.java.net  Wed Nov 17 11:48:49 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 17 Nov 2021 11:48:49 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle
 LongVector.neg
Message-ID: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>

Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541

The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though, because C2 might still compile that path when it cannot prove at parse time that the path is unreachable. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
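
To make the intended behavior concrete, here is a toy Java model of the bail-out. It is not HotSpot code: the class `UnaryOpBailoutSketch`, the `Op` enum and the opcode values are made up for illustration. The only parts taken from the description above are that `NEG` has no vector opcode (modeled as 0) and that the Java side implements `NEG` as a `SUB` from zero.

```java
/** Toy model (not HotSpot code) of an intrinsic that bails out on an unsupported operation. */
final class UnaryOpBailoutSketch {
    enum Op { ABS, NEG }

    // Hypothetical opcode table: 0 stands for "no vector node exists for this operation".
    static int vectorOpcode(Op op) {
        return op == Op.ABS ? 1 : 0;
    }

    // Returns true if the fast path handled the operation, false if it bailed out.
    static boolean tryIntrinsify(Op op, long[] in, long[] out) {
        if (vectorOpcode(op) == 0) {
            return false; // bail out cleanly instead of asserting
        }
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.abs(in[i]); // ABS is the only operation with an opcode in this model
        }
        return true;
    }

    // Generic path: NEG is expressed as a subtraction from zero.
    static long[] lanewise(Op op, long[] in) {
        long[] out = new long[in.length];
        if (!tryIntrinsify(op, in, out)) {
            for (int i = 0; i < in.length; i++) {
                out[i] = (op == Op.NEG) ? 0L - in[i] : Math.abs(in[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(lanewise(Op.NEG, new long[] {1, -2, 0})));
    }
}
```

The point of the sketch is only that an unknown opcode leads to a regular "not intrinsified" result, so the generic path still produces the right answer.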

Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?

Thanks,
Tobias

-------------

Commit messages:
 - 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg

Changes: https://git.openjdk.java.net/jdk/pull/6428/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6428&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8275643
  Stats: 51 lines in 2 files changed: 51 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6428.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6428/head:pull/6428

PR: https://git.openjdk.java.net/jdk/pull/6428

From duke at openjdk.java.net  Wed Nov 17 12:04:34 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Nov 2021 12:04:34 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm
 Neoverse N1 [v2]
In-Reply-To: <bjATkUpHxn_MAbv4SZkpSyEwqLzAtfBryGOwdJIZt4A=.2cd05900-7ee0-49a5-86cd-af2a30611425@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <aFdigbMcMQLsaatbofyFuXA-XNlUaIbA77TumqEk9gI=.cda5b3cb-d625-472c-b08e-46ea69dca145@github.com>
 <bjATkUpHxn_MAbv4SZkpSyEwqLzAtfBryGOwdJIZt4A=.2cd05900-7ee0-49a5-86cd-af2a30611425@github.com>
Message-ID: <vQycJfjFTOQLy6QQY6foamXM9R5w1tzLO1WKaaQ3SOo=.79363f7b-e71e-44db-a434-07da6c0a4f9d@github.com>

On Wed, 17 Nov 2021 07:39:29 GMT, Nick Gasson <ngasson at openjdk.org> wrote:

>> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Explicitly set OnSpinWaitInstCount to 1
>
> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 206:
> 
>> 204:     }
>> 205: 
>> 206:     if (FLAG_IS_DEFAULT(OnSpinWaitInst) && FLAG_IS_DEFAULT(OnSpinWaitInstCount)) {
> 
> Should these two be set independently? If I pass `-XX:OnSpinWaitInstCount=2` then `OnSpinWaitInst` will default to "none".

Hi Nick,
Thank you for reviewing the PR.

> Should these two be set independently?

I don't mind.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From chagedorn at openjdk.java.net  Wed Nov 17 12:25:38 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Wed, 17 Nov 2021 12:25:38 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
Message-ID: <AwA6OiLttXte1jtJkAVIyGCIJNm5qFc0oo-a76DUts0=.2b38d04e-a5e2-4c69-815d-c8a52d483e95@github.com>

On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541
> 
> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
> 
> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
> 
> Thanks,
> Tobias

That looks good to me!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6428

From duke at openjdk.java.net  Wed Nov 17 12:31:10 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Nov 2021 12:31:10 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
Message-ID: <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>

> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
> 
> Testing:
> - `make test TEST=gtest`: Passed
> - `make run-test TEST=tier1`: Passed
> - `make run-test TEST=tier2`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently
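
For readers following the review thread, the difference between the coupled check and the independent one can be modeled in a few lines of Java. This is an illustration only, not the `vm_version_aarch64.cpp` change: the `Flag` class and both method names are hypothetical, and `isDefault` stands in for `FLAG_IS_DEFAULT`. The values "none", "isb", 1 and 2 are the ones mentioned in this thread.

```java
/** Toy model (not the actual HotSpot change) of coupled vs. independent flag defaults. */
final class SpinWaitDefaultsSketch {
    /** A flag value plus whether it was left at its default (hypothetical helper type). */
    static final class Flag<T> {
        T value;
        final boolean isDefault;

        Flag(T value, boolean isDefault) {
            this.value = value;
            this.isDefault = isDefault;
        }
    }

    // Coupled: the CPU-specific defaults apply only if neither flag was set by the user.
    static void setCoupled(Flag<String> inst, Flag<Integer> count) {
        if (inst.isDefault && count.isDefault) {
            inst.value = "isb";
            count.value = 1;
        }
    }

    // Independent: each flag gets its CPU-specific default unless the user set that flag.
    static void setIndependently(Flag<String> inst, Flag<Integer> count) {
        if (inst.isDefault) {
            inst.value = "isb";
        }
        if (count.isDefault) {
            count.value = 1;
        }
    }

    public static void main(String[] args) {
        // Models -XX:OnSpinWaitInstCount=2 with OnSpinWaitInst left untouched.
        Flag<String> inst = new Flag<>("none", true);
        Flag<Integer> count = new Flag<>(2, false);
        setIndependently(inst, count);
        System.out.println(inst.value + "/" + count.value);
    }
}
```

With the coupled variant the same input leaves the instruction at "none", which is the case Nick pointed out.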

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6415/files
  - new: https://git.openjdk.java.net/jdk/pull/6415/files/56258906..a9edcca6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=01-02

  Stats: 9 lines in 2 files changed: 5 ins; 1 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6415.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6415/head:pull/6415

PR: https://git.openjdk.java.net/jdk/pull/6415

From duke at openjdk.java.net  Wed Nov 17 12:31:12 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Nov 2021 12:31:12 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v2]
In-Reply-To: <vQycJfjFTOQLy6QQY6foamXM9R5w1tzLO1WKaaQ3SOo=.79363f7b-e71e-44db-a434-07da6c0a4f9d@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <aFdigbMcMQLsaatbofyFuXA-XNlUaIbA77TumqEk9gI=.cda5b3cb-d625-472c-b08e-46ea69dca145@github.com>
 <bjATkUpHxn_MAbv4SZkpSyEwqLzAtfBryGOwdJIZt4A=.2cd05900-7ee0-49a5-86cd-af2a30611425@github.com>
 <vQycJfjFTOQLy6QQY6foamXM9R5w1tzLO1WKaaQ3SOo=.79363f7b-e71e-44db-a434-07da6c0a4f9d@github.com>
Message-ID: <zpAA82DeCn7291DlOMCi1dcH6qN4jbO0s_j6CUdmbBA=.b1443a27-a0f9-4379-a495-1734be152475@github.com>

On Wed, 17 Nov 2021 12:01:12 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 206:
>> 
>>> 204:     }
>>> 205: 
>>> 206:     if (FLAG_IS_DEFAULT(OnSpinWaitInst) && FLAG_IS_DEFAULT(OnSpinWaitInstCount)) {
>> 
>> Should these two be set independently? If I pass `-XX:OnSpinWaitInstCount=2` then `OnSpinWaitInst` will default to "none".
>
> Hi Nick,
> Thank you for reviewing the PR.
> 
>> Should these two be set independently?
> 
> I don't mind.

Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From thartmann at openjdk.java.net  Wed Nov 17 12:42:37 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 17 Nov 2021 12:42:37 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
Message-ID: <EhrxKLopt8uhH4HUFdvswXqAy_QiZWlkmug3c1FlW5s=.6c507c4a-71fe-47be-8152-e5a06bc5a077@github.com>

On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541
> 
> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
> 
> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
> 
> Thanks,
> Tobias

Thanks for the review, Christian!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6428

From aph at openjdk.java.net  Wed Nov 17 13:47:46 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 17 Nov 2021 13:47:46 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
Message-ID: <GA5LJzhsPSpwC2s3vX3bqBLyl8DxqojU3NLHjZXW_fo=.38348746-6a30-46e4-92b3-f6917b6eff57@github.com>

On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
>> 
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently

Did we establish that this is the right default for Neoverse N1? I know that we've found a benchmark where it's a win, but I'm not sure that's the same thing. On the other hand, do we know of possible cases where ISB makes things worse?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From hseigel at openjdk.java.net  Wed Nov 17 14:28:38 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Wed, 17 Nov 2021 14:28:38 GMT
Subject: RFR: 8276177:
 nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with
 "assert(def_ik->is_being_redefined()) failed: should be being redefined to get
 here"
In-Reply-To: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
References: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
Message-ID: <LLEHcwhrjFNhSid97ELZOKOP1iZLw0AUjvA_MtDuTXU=.003ff3cb-9341-481d-82cb-cc86557f9064@github.com>

On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread.  Moved the bit to AccessFlags which has space and an atomic set operation.
> Tested with tier1-6, 7-8 in progress.
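
As a side note for readers, the general hazard described above (a plain read-modify-write on a flags word that another thread also updates) and the atomic alternative can be sketched in Java. This is an analogy, not the HotSpot `AccessFlags` code: the class name and bit positions are hypothetical, and `AtomicInteger.getAndUpdate` merely stands in for an atomic set operation.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model (not HotSpot code) of flag bits that are updated atomically. */
final class AtomicFlagBitsSketch {
    // Hypothetical bit assignments, chosen only for this example.
    static final int BEING_REDEFINED = 1 << 0;
    static final int OTHER_FLAG = 1 << 1;

    private final AtomicInteger flags = new AtomicInteger();

    // Atomic read-modify-write: a concurrent update to another bit cannot be lost.
    void set(int bit) {
        flags.getAndUpdate(f -> f | bit);
    }

    void clear(int bit) {
        flags.getAndUpdate(f -> f & ~bit);
    }

    boolean isSet(int bit) {
        return (flags.get() & bit) != 0;
    }

    public static void main(String[] args) {
        AtomicFlagBitsSketch f = new AtomicFlagBitsSketch();
        f.set(BEING_REDEFINED);
        f.set(OTHER_FLAG);
        f.clear(OTHER_FLAG);
        System.out.println(f.isSet(BEING_REDEFINED));
    }
}
```

With a plain `flags = flags | bit` on a shared field, two threads can each read the old word and one write can overwrite the other, which is the kind of lost update the description refers to.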

Looks Good!  Thanks for doing this.
Harold

-------------

Marked as reviewed by hseigel (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6410

From jorn.vernee at oracle.com  Wed Nov 17 14:48:37 2021
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Wed, 17 Nov 2021 15:48:37 +0100
Subject: Questions about oop handling for Panama upcalls.
In-Reply-To: <BN0PR10MB5176632CF1C775BC4BE361D1F49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com>
 <BN0PR10MB5176632CF1C775BC4BE361D1F49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
Message-ID: <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com>

Hi Erik,

Thanks for the suggestion.

The callee is a mix of JDK internal and user code. The user gives us a 
method handle that they want to turn into a native function pointer [1], 
and we adapt that using method handle combinators [2] to take only 
primitive arguments according to the registers in which the native 
calling convention passes arguments (essentially each primitive argument 
is a register value). The register values are then reconstructed into 
high-level arguments (through our MH adaptation), and passed to the user 
code. It's this adapted method handle that we call from the upcall stub.

I guess what you're suggesting is that we have some internal Java method 
like this:

     static ... invoke(long methodHandle, ...) {
         MethodHandle mh = resolveJObject(methodHandle);
         return (...) mh.invokeExact(...);
     }

Which is then called from the upcall stub instead.
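
To spell that out a bit, a runnable sketch for one fixed signature could look like the following. Everything except the shape of `invoke` is an assumption for the sake of the example: the `HANDLES` registry and `register` are hypothetical stand-ins for a JNI global handle, and `resolveJObject` here is just a map lookup.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: resolve a long "handle" in the callee and call the MethodHandle. */
final class UpcallInvokeSketch {
    // Hypothetical registry standing in for JNI global handles.
    private static final Map<Long, MethodHandle> HANDLES = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_ID = new AtomicLong(1);

    static long register(MethodHandle mh) {
        long id = NEXT_ID.getAndIncrement();
        HANDLES.put(id, mh);
        return id;
    }

    static MethodHandle resolveJObject(long handle) {
        MethodHandle mh = HANDLES.get(handle);
        if (mh == null) {
            throw new IllegalArgumentException("unknown handle: " + handle);
        }
        return mh;
    }

    // One 'invoke' per signature: this one only fits (long, long) -> long.
    static long invoke(long methodHandle, long a, long b) throws Throwable {
        MethodHandle mh = resolveJObject(methodHandle);
        return (long) mh.invokeExact(a, b);
    }

    static long add(long a, long b) {
        return a + b;
    }

    // Registers 'add' and calls it through the long handle.
    static long demo() {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                    UpcallInvokeSketch.class, "add",
                    MethodType.methodType(long.class, long.class, long.class));
            return invoke(register(mh), 2L, 3L);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Since `invokeExact` needs the call site to match the handle's type exactly, every distinct upcall signature would need its own `invoke` variant.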

I think it could work maybe (would have to see how the performance works 
out), but we have to deal with different signatures, so would have to 
use bytecode spinning to generate these 'invoke' methods on demand, 
which seems like maybe it's a worse medicine (in terms of complexity) 
than adding the correct oop handling in the VM.

I would also just like to get a better understanding of how this is 
supposed to work in the first place (or how it works e.g. in the case of 
nmethods), since I had to implement the correct oop handling in the past 
as well when implementing the intrinsics for down calls, and it's 
probably not the last time I have to deal with something like this...

 > Our current upcall stubs try to quack like an interpreter in many 
ways, so that it will look like an i-2-something call. I think you can 
either try to do the same quacking dance, to pass the oop to the callee

So, I suppose interpreter argument oops are handled through another 
mechanism than OopMaps, maybe something similar to 
CompiledMethod::preserve_callee_argument_oops?

Thanks,
Jorn

[1] : 
https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java#L224
[2] : 
https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/ProgrammableUpcallHandler.java#L157

On 17/11/2021 10:42, Erik Osterlund wrote:
> Hi Jorn,
>
> So you have a jobject in the caller, resolve it, and then need to pass the oop around as an argument to the callee. Our current upcall stubs try to quack like an interpreter in many ways, so that it will look like an i-2-something call. I think you can either try to do the same quacking dance, to pass the oop to the callee, or alternatively the primary question for me seems to be who is the callee? You have a very fixed format for the call, which makes me suspect the callee is some kind of JDK internal code. Another way of dealing with this would be to pass the jobject as a long and just resolve it in the callee instead, if this is indeed JDK internal code. Then this becomes a problem that doesn't need to be solved at all. Just sanity checking.
>
> /Erik
>
>> -----Original Message-----
>> From: hotspot-dev<hotspot-dev-retn at openjdk.java.net>  On Behalf Of Jorn
>> Vernee
>> Sent: Tuesday, 16 November 2021 18:51
>> To:hotspot-dev at openjdk.java.net
>> Subject: Questions about oop handling for Panama upcalls.
>>
>> Hi,
>>
>> For panama-foreign upcalls we spin our own upcall stubs that wrap a method
>> handle VM entry for the actual upcall. I want to make sure I have the oop
>> handling correct on this.
>>
>> We receive a list of arguments from native code (all primitives, so no oops to
>> handle there), and then prefix that list with a MethodHandle oop, before
>> calling into the MH's VM entry. The MH oop can be stored in three different
>> places:
>>
>> 1. The MH oop is stored in a global JNI handle, and then resolved right before
>> the upcall [1].
>> 2. The MH oop is then stored in the first argument register j_rarg0 for the
>> call.
>> 3. During a deopt of the callee, the deoptimization code spills the receiver
>> (MH oop) into the frame of the upcall stub. (looks like the extending of the
>> frame that happens for instance in c2i adapters doesn't make room for the
>> receiver?).
>>
>> I don't think I need to do anything else for 1., but for 2. and 3. there is
>> currently no handling. I wanted to ask how those cases should be handled, if
>> at all.
>>
>> I think 2. could in theory be addressed by implementing
>> CodeBlob::preserve_callee_argument_oops. Though, it has been working
>> fine so far without this, so I'm wondering if this is even needed. Is the caller
>> or callee responsible for handling argument oops (seems to be caller, from
>> looking at CompiledMethod::preserve_callee_argument_oops)?
>> Or does the caller just handle the receiver if there is one (since deopt spills
>> that into the callers frame)? The oop offset is passed to an OopClosure in
>> CompiledArgumentOopFinder::handle_oop_offset as an oop* [2]. Does the
>> argument register get spilled somewhere and the oop needs to be patched
>> in place at that address (by the OopClosure)? Or is this just used to mark the
>> oop as alive? (in the latter case, the JNI global should be enough I think).
>>
>> I think 3. could be handled with an OopMap entry at the frame offset where
>> the receiver is spilled during a deopt of the callee? Should it be an oop or a
>> narrowOop, or does it depend on VM settings? FWIW, the deopt code
>> always seems to need a machine word (64-bits) to do the spilling, so I think
>> it's an oop? Do I need to zero out that part of the frame when allocating the
>> frame so that the GC doesn't mistake some garbage that's in there for an
>> oop?
>>
>> I have a POC patch here for reference [3], that implements the 2 things
>> above. This passes our test suite, but I'm not sure about the correctness.
>> Looking at what JNI does for upcalls [4], I don't see how e.g. the receiver
>> argument that is put on the stack is handled, or what happens when the
>> callee deopts (though I think it would just overwrite the value on the stack
>> that's there already, since JNI always seems to do interpreted calls, where
>> we do compiled calls).? But, JNI/the call stub might be special cased
>> elsewhere...
>>
>> Also, the oop is briefly stored in rscratch1 when resolving. I'm interested to
>> know when the GC can look at the frame and register state, especially with
>> concurrent GCs in mind. I'm assuming it's only during the call to the MH VM
>> entry (but the existence of frame::safe_for_sender makes me less sure)?
>> AFAIK the call counts as a safepoint (with oop map for it typically stored at
>> the return offset). At this safepoint, the oop can only be stored at one of the
>> 3 places listed at the start.
>>
>> Thanks,
>> Jorn
>>
>> [1] :
>> https://github.com/openjdk/panama-foreign/blob/foreign-
>> jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416
>> [2] :
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/fr
>> ame.cpp#L939-L946
>> [3] :
>> https://github.com/openjdk/panama-foreign/compare/foreign-
>> memaccess+abi...JornVernee:Deopt_Crash
>> [4] :
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe
>> nerator_x86_64.cpp#L339

From erik.osterlund at oracle.com  Wed Nov 17 15:14:06 2021
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Wed, 17 Nov 2021 15:14:06 +0000
Subject: Questions about oop handling for Panama upcalls.
In-Reply-To: <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com>
References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com>
 <BN0PR10MB5176632CF1C775BC4BE361D1F49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
 <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com>
Message-ID: <BN0PR10MB517672D2A13028547A57F7DBF49A9@BN0PR10MB5176.namprd10.prod.outlook.com>

Hi Jorn,

In the interpreter world, the expression stack at the call site becomes the locals
of the callee. So everything is passed through the stack. So the upcall stub sets
things up like an interpreter method would have (quack quack), and calls the
i2c adapter if there is an nmethod (quack quack), which will transform the
arguments to the compiled convention of the callee. The argument ownership
then switches from the caller to the callee, once the callee can manifest on the
stack. But if there are safepoints in between, then the caller owns the arguments
until its callee manifests.

Do you want to avoid the pretend-to-be-the-interpreter step because it is costly
in the Panama world to spill arguments to the stack?

/Erik

> -----Original Message-----
> From: Jorn Vernee <jorn.vernee at oracle.com>
> Sent: Wednesday, 17 November 2021 15:49
> To: Erik Osterlund <erik.osterlund at oracle.com>; hotspot-
> dev at openjdk.java.net
> Subject: Re: Questions about oop handling for Panama upcalls.
> 
> Hi Erik,
> 
> Thanks for the suggestion.
> 
> The callee is a mix of JDK internal and user code. The user gives us a method
> handle that they want to turn into a native function pointer [1], and we adapt
> that using method handle combinators [2] to take only primitive arguments
> according to the registers in which the native calling convention passes
> arguments (essentially each primitive argument is a register value). The
> register values are then reconstructed into high-level arguments (through
> our MH adaptation), and passed to the user code. It's this adapted method
> handle that we call from the upcall stub.
> 
> I guess what you're suggesting is that we have some internal Java method
> like this:
> 
>      static ... invoke(long methodHandle, ...) {
>          MethodHandle mh = resolveJObject(methodHandle);
>          return (...) mh.invokeExact(...);
>      }
> 
> Which is then called from the upcall stub instead.
> 
> I think it could work maybe (would have to see how the performance works
> out), but we have to deal with different signatures, so would have to use
> bytecode spinning to generate these 'invoke' methods on demand, which
> seems like maybe it's a worse medicine (in terms of complexity) than adding
> the correct oop handling in the VM.
> 
> I would also just like to get a better understanding of how this is supposed to
> work in the first place (or how it works e.g. in the case of nmethods), since I
> had to implement the correct oop handling in the past as well when
> implementing the intrinsics for down calls, and it's probably not the last time I
> have to deal with something like this...
> 
>  > Our current upcall stubs try to quack like an interpreter in many ways, so
> that it will look like an i-2-something call. I think you can either try to do the
> same quacking dance, to pass the oop to the callee
> 
> So, I suppose interpreter argument oops are handled through another
> mechanism than OopMaps, maybe something similar to
> CompiledMethod::preserve_callee_argument_oops?
> 
> Thanks,
> Jorn
> 
> [1] :
> https://github.com/openjdk/panama-foreign/blob/foreign-
> jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLink
> er.java#L224
> [2] :
> https://github.com/openjdk/panama-foreign/blob/foreign-
> jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/Pr
> ogrammableUpcallHandler.java#L157
> 
> On 17/11/2021 10:42, Erik Osterlund wrote:
> > Hi Jorn,
> >
> > So you have a jobject in the caller, resolve it, and then need to pass the
> oop around as an argument to the callee. Our current upcall stubs try to
> quack like an interpreter in many ways, so that it will look like an i-2-
> something call. I think you can either try to do the same quacking dance, to
> pass the oop to the callee, or alternatively the primary question for me
> seems to be who is the callee? You have a very fixed format for the call,
> which makes me suspect the callee is some kind of JDK internal code.
> Another way of dealing with this would be to pass the jobject as a long and
> just resolve it in the callee instead, if this is indeed JDK internal code. Then
> this becomes a problem that doesn't need to be solved at all. Just sanity
> checking.
> >
> > /Erik
> >
> >> -----Original Message-----
> >> From: hotspot-dev<hotspot-dev-retn at openjdk.java.net>  On Behalf Of
> >> Jorn Vernee
> >> Sent: Tuesday, 16 November 2021 18:51 To:hotspot-
> dev at openjdk.java.net
> >> Subject: Questions about oop handling for Panama upcalls.
> >>
> >> Hi,
> >>
> >> For panama-foreign upcalls we spin our own upcall stubs that wrap a
> >> method handle VM entry for the actual upcall. I want to make sure I
> >> have the oop handling correct on this.
> >>
> >> We receive a list of arguments from native code (all primitives, so
> >> no oops to handle there), and then prefix that list with a
> >> MethodHandle oop, before calling into the MH's VM entry. The MH oop
> >> can be stored in three different
> >> places:
> >>
> >> 1. The MH oop is stored in a global JNI handle, and then resolved
> >> right before the upcall [1].
> >> 2. The MH oop is then stored in the first argument register j_rarg0
> >> for the call.
> >> 3. During a deopt of the callee, the deoptimization code spills the
> >> receiver (MH oop) into the frame of the upcall stub. (looks like the
> >> extending of the frame that happens for instance in c2i adapters
> >> doesn't make room for the receiver?).
> >>
> >> I don't think I need to do anything else for 1., but for 2. and 3.
> >> there is currently no handling. I wanted to ask how those cases
> >> should be handled, if at all.
> >>
> >> I think 2. could in theory be addressed by implementing
> >> CodeBlob::preserve_callee_argument_oops. Though, it has been working
> >> fine so far without this, so I'm wondering if this is even needed. Is
> >> the caller or callee responsible for handling argument oops (seems to
> >> be caller, from looking at
> CompiledMethod::preserve_callee_argument_oops)?
> >> Or does the caller just handle the receiver if there is one (since
> >> deopt spills that into the callers frame)? The oop offset is passed
> >> to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as
> >> an oop* [2]. Does the argument register get spilled somewhere and the
> >> oop needs to be patched in place at that address (by the OopClosure)?
> >> Or is this just used to mark the oop as alive? (in the latter case, the JNI
> global should be enough I think).
> >>
> >> I think 3. could be handled with an OopMap entry at the frame offset
> >> where the receiver is spilled during a deopt of the callee? Should it
> >> be an oop or a narrowOop, or does it depend on VM settings? FWIW, the
> >> deopt code always seems to need a machine word (64-bits) to do the
> >> spilling, so I think it's an oop? Do I need to zero out that part of
> >> the frame when allocating the frame so that the GC doesn't mistake
> >> some garbage that's in there for an oop?
> >>
> >> I have a POC patch here for reference [3], that implements the 2
> >> things above. This passes our test suite, but I'm not sure about the
> correctness.
> >> Looking at what JNI does for upcalls [4], I don't see how e.g. the
> >> receiver argument that is put on the stack is handled, or what
> >> happens when the callee deopts (though I think it would just
> >> overwrite the value on the stack that's there already, since JNI
> >> always seems to do interpreted calls, where we do compiled calls).
> >> But, JNI/the call stub might be special cased elsewhere...
> >>
> >> Also, the oop is briefly stored in rscratch1 when resolving. I'm
> >> interested to know when the GC can look at the frame and register
> >> state, especially with concurrent GCs in mind. I'm assuming it's only
> >> during the call to the MH VM entry (but the existence of
> frame::safe_for_sender makes me less sure)?
> >> AFAIK the call counts as a safepoint (with oop map for it typically
> >> stored at the return offset). At this safepoint, the oop can only be
> >> stored at one of the
> >> 3 places listed at the start.
> >>
> >> Thanks,
> >> Jorn
> >>
> >> [1] :
> >> https://github.com/openjdk/panama-foreign/blob/foreign-
> >> jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L
> >> 416
> >> [2] :
> >>
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/
> >> fr
> >> ame.cpp#L939-L946
> >> [3] :
> >> https://github.com/openjdk/panama-foreign/compare/foreign-
> >> memaccess+abi...JornVernee:Deopt_Crash
> >> [4] :
> >>
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe
> >> nerator_x86_64.cpp#L339

From jorn.vernee at oracle.com  Wed Nov 17 15:35:16 2021
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Wed, 17 Nov 2021 16:35:16 +0100
Subject: Questions about oop handling for Panama upcalls.
In-Reply-To: <BN0PR10MB517672D2A13028547A57F7DBF49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com>
 <BN0PR10MB5176632CF1C775BC4BE361D1F49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
 <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com>
 <BN0PR10MB517672D2A13028547A57F7DBF49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
Message-ID: <89b42995-1504-d3cc-1d37-595610b75801@oracle.com>

On 17/11/2021 16:14, Erik Osterlund wrote:
> Hi Jorn,
>
> In the interpreter world, the expression stack at the call site becomes the locals
> of the callee. So everything is passed through the stack. So the upcall stub sets
> things up like an interpreter method would have (quack quack), and calls the
> i2c adapter if there is an nmethod (quack quack), which will transform the
> arguments to the compiled convention of the callee. The argument ownership
> then switches from the caller to the callee, once the callee can manifest on the
> stack. But if there are safepoints inbetween, then the caller owns the arguments
> until its callee manifests.
Okay, thanks, that makes sense. This probably explains why not 
implementing preserve_callee_argument_oops for the upcall stubs didn't 
cause any problems so far. There probably just weren't any safepoints in 
between the call from the stub and the callee setting up its frame. 
(although I'm still a bit confused here why the callee doesn't make 
space for the receiver in its frame as well).
> Do you want to avoid the pretend to be the interpreter step because it is costly
> in the Panama world to spill arguments to the stack?
I think either one could "work", although it seems like interpreter 
calls require more setup of metadata around calls (which would be 
unneeded if we called into an nmethod I think?). Also, we generate an 
argument shuffle from the native convention to the Java calling 
convention (this is unavoidable). If the native convention passes 
arguments in the same registers that the Java convention expects them in, 
we don't have to generate code for that in the shuffle. Theoretically we 
could also do a pass to minimize the needed shuffle by reordering 
parameters on the MethodHandle. If we went with an interpreted calling 
convention, we would always have to copy across arguments to the stack, 
in a shuffle-ish manner (right now we rely on 
SharedRuntime::java_calling_convention to compute the target registers. 
Would have to implement something similar for the interpreter convention).

It seems to me that in the long run, going with the Java compiled 
calling convention for the upcall is the right choice if we want to be 
able to squeeze out as much speed as possible.
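
For readers following the thread, the method handle adaptation described in the quoted message below (a user-supplied handle wrapped so that it accepts only primitive, register-like values) can be sketched in plain Java. This is only an illustration under stated assumptions: `UpcallAdaptSketch`, `Point`, `makePoint` and `norm1` are hypothetical names, and the sketch does not reproduce what ProgrammableUpcallHandler actually does.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch, not the Panama implementation.
class UpcallAdaptSketch {
    // Stand-in for a high-level argument type seen by user code.
    static final class Point {
        final long x;
        final long y;
        Point(long x, long y) { this.x = x; this.y = y; }
    }

    // Rebuilds the high-level argument from two primitive "register" values.
    static Point makePoint(long x, long y) { return new Point(x, y); }

    // Stand-in for the user's upcall target.
    static long norm1(Point p) { return Math.abs(p.x) + Math.abs(p.y); }

    // Returns a handle of type (long,long)long that takes only primitives.
    static MethodHandle adapt() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(UpcallAdaptSketch.class, "norm1",
                MethodType.methodType(long.class, Point.class));
        MethodHandle rebuild = lookup.findStatic(UpcallAdaptSketch.class, "makePoint",
                MethodType.methodType(Point.class, long.class, long.class));
        // collectArguments replaces the Point parameter of target with the
        // (long,long) parameters of rebuild.
        return MethodHandles.collectArguments(target, 0, rebuild);
    }

    // Calls the adapted handle the way a stub would: primitives only.
    static long call(long a, long b) {
        try {
            return (long) adapt().invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(3L, -4L));
    }
}
```

The point of the sketch is only that the caller of the adapted handle never sees a reference-typed argument; the MethodHandle itself is still an oop that has to be kept alive and passed somehow, which is the question this thread is about.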

Jorn
>
> /Erik
>
>> -----Original Message-----
>> From: Jorn Vernee <jorn.vernee at oracle.com>
>> Sent: Wednesday, 17 November 2021 15:49
>> To: Erik Osterlund <erik.osterlund at oracle.com>;
>> hotspot-dev at openjdk.java.net
>> Subject: Re: Questions about oop handling for Panama upcalls.
>>
>> Hi Erik,
>>
>> Thanks for the suggestion.
>>
>> The callee is a mix of JDK internal and user code. The user gives us a method
>> handle that they want to turn into a native function pointer [1], and we adapt
>> that using method handle combinators [2] to take only primitive arguments
>> according to the registers in which the native calling convention passes
>> arguments (essentially each primitive argument is a register value). The
>> register values are then reconstructed into high-level arguments (through
>> our MH adaptation), and passed to the user code. It's this adapted method
>> handle that we call from the upcall stub.
>>
>> I guess what you're suggesting is that we have some internal Java method
>> like this:
>>
>>       static ... invoke(long methodHandle, ...) {
>>           MethodHandle mh = resolveJObject(methodHandle);
>>           return (...) mh.invokeExact(...);
>>       }
>>
>> Which is then called from the upcall stub instead.
>>
>> I think it could work maybe (would have to see how the performance works
>> out), but we have to deal with different signatures, so would have to use
>> bytecode spinning to generate these 'invoke' methods on demand, which
>> seems like maybe it's a worse medicine (in terms of complexity) than adding
>> the correct oop handling in the VM.
>>
>> I would also just like to get a better understanding of how this is supposed to
>> work in the first place (or how it works e.g. in the case of nmethods), since I
>> had to implement the correct oop handling in the past as well when
>> implementing the intrinsics for down calls, and it's probably not the last time I
>> have to deal with something like this...
>>
>>   > Our current upcall stubs try to quack like an interpreter in many ways, so
>> that it will look like an i-2-something call. I think you can either try to do the
>> same quacking dance, to pass the oop to the callee
>>
>> So, I suppose interpreter argument oops are handled through another
>> mechanism than OopMaps, maybe something similar to
>> CompiledMethod::preserve_callee_argument_oops?
>>
>> Thanks,
>> Jorn
>>
>> [1] :
>> https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java#L224
>> [2] :
>> https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/ProgrammableUpcallHandler.java#L157
>>
>> On 17/11/2021 10:42, Erik Osterlund wrote:
>>> Hi Jorn,
>>>
>>> So you have a jobject in the caller, resolve it, and then need to pass the
>>> oop around as an argument to the callee. Our current upcall stubs try to
>>> quack like an interpreter in many ways, so that it will look like an
>>> i-2-something call. I think you can either try to do the same quacking
>>> dance, to pass the oop to the callee, or alternatively the primary question
>>> for me seems to be who is the callee? You have a very fixed format for the
>>> call, which makes me suspect the callee is some kind of JDK internal code.
>>> Another way of dealing with this would be to pass the jobject as a long and
>>> just resolve it in the callee instead, if this is indeed JDK internal code.
>>> Then this becomes a problem that doesn't need to be solved at all. Just
>>> sanity checking.
>>> /Erik
>>>
>>>> -----Original Message-----
>>>> From: hotspot-dev<hotspot-dev-retn at openjdk.java.net>  On Behalf Of
>>>> Jorn Vernee
>>>> Sent: Tuesday, 16 November 2021 18:51
>>>> To: hotspot-dev at openjdk.java.net
>>>> Subject: Questions about oop handling for Panama upcalls.
>>>>
>>>> Hi,
>>>>
>>>> For panama-foreign upcalls we spin our own upcall stubs that wrap a
>>>> method handle VM entry for the actual upcall. I want to make sure I
>>>> have the oop handling correct on this.
>>>>
>>>> We receive a list of arguments from native code (all primitives, so
>>>> no oops to handle there), and then prefix that list with a
>>>> MethodHandle oop, before calling into the MH's VM entry. The MH oop
>>>> can be stored in three different
>>>> places:
>>>>
>>>> 1. The MH oop is stored in a global JNI handle, and then resolved
>>>> right before the upcall [1].
>>>> 2. The MH oop is then stored in the first argument register j_rarg0
>>>> for the call.
>>>> 3. During a deopt of the callee, the deoptimization code spills the
>>>> receiver (MH oop) into the frame of the upcall stub. (looks like the
>>>> extending of the frame that happens for instance in c2i adapters
>>>> doesn't make room for the receiver?).
>>>>
>>>> I don't think I need to do anything else for 1., but for 2. and 3.
>>>> there is currently no handling. I wanted to ask how those cases
>>>> should be handled, if at all.
>>>>
>>>> I think 2. could in theory be addressed by implementing
>>>> CodeBlob::preserve_callee_argument_oops. Though, it has been working
>>>> fine so far without this, so I'm wondering if this is even needed. Is
>>>> the caller or callee responsible for handling argument oops (seems to
>>>> be caller, from looking at
>>>> CompiledMethod::preserve_callee_argument_oops)?
>>>> Or does the caller just handle the receiver if there is one (since
>>>> deopt spills that into the callers frame)? The oop offset is passed
>>>> to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as
>>>> an oop* [2]. Does the argument register get spilled somewhere and the
>>>> oop needs to be patched in place at that address (by the OopClosure)?
>>>> Or is this just used to mark the oop as alive? (in the latter case, the JNI
>>>> global should be enough I think).
>>>> I think 3. could be handled with an OopMap entry at the frame offset
>>>> where the receiver is spilled during a deopt of the callee? Should it
>>>> be an oop or a narrowOop, or does it depend on VM settings? FWIW, the
>>>> deopt code always seems to need a machine word (64-bits) to do the
>>>> spilling, so I think it's an oop? Do I need to zero out that part of
>>>> the frame when allocating the frame so that the GC doesn't mistake
>>>> some garbage that's in there for an oop?
>>>>
>>>> I have a POC patch here for reference [3], that implements the 2
>>>> things above. This passes our test suite, but I'm not sure about the
>>>> correctness.
>>>> Looking at what JNI does for upcalls [4], I don't see how e.g. the
>>>> receiver argument that is put on the stack is handled, or what
>>>> happens when the callee deopts (though I think it would just
>>>> overwrite the value on the stack that's there already, since JNI
>>>> always seems to do interpreted calls, where we do compiled calls).
>>>> But, JNI/the call stub might be special cased elsewhere...
>>>>
>>>> Also, the oop is briefly stored in rscratch1 when resolving. I'm
>>>> interested to know when the GC can look at the frame and register
>>>> state, especially with concurrent GCs in mind. I'm assuming it's only
>>>> during the call to the MH VM entry (but the existence of
>>>> frame::safe_for_sender makes me less sure)?
>>>> AFAIK the call counts as a safepoint (with oop map for it typically
>>>> stored at the return offset). At this safepoint, the oop can only be
>>>> stored at one of the
>>>> 3 places listed at the start.
>>>>
>>>> Thanks,
>>>> Jorn
>>>>
>>>> [1] :
>>>> https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416
>>>> [2] :
>>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L939-L946
>>>> [3] :
>>>> https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:Deopt_Crash
>>>> [4] :
>>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L339

From shade at openjdk.java.net  Wed Nov 17 15:40:36 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 17 Nov 2021 15:40:36 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3]
In-Reply-To: <RqLUwBSarIPHSo_iL10zgPJGVg8fSe7KPQZbL4ruCaU=.c901b5e4-8e3e-4e66-a1b8-55c7f2edab81@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <t6nZoibe2zyKRfM8pbTgdwA5yRBEfQomcoJMV3h8g4g=.cbe7cc85-d3a8-4d23-80df-957d15cdc989@github.com>
 <RqLUwBSarIPHSo_iL10zgPJGVg8fSe7KPQZbL4ruCaU=.c901b5e4-8e3e-4e66-a1b8-55c7f2edab81@github.com>
Message-ID: <xleenxEq5unOjxvelnSniYUEGmQFxQNmI-p1_fkgP5A=.8794a4c0-e9e2-41a7-bee3-af75df63cb88@github.com>

On Wed, 10 Nov 2021 18:03:00 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   More reviews
>
> Marked as reviewed by sspitsyn (Reviewer).

> Thank you, @sspitsyn! Any more reviews, anyone?

No other reviews? I'd like to integrate this soon.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From duke at openjdk.java.net  Wed Nov 17 15:46:38 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Nov 2021 15:46:38 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
Message-ID: <6dPmyx5EBbz9tN_rgpgcCx6u7v5CJsswOsB0qpEkDKY=.4e4d5f98-df5b-47df-9885-f4cdc84a48d3@github.com>

On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> A one-`ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163.
>> 
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently

Hi Andrew,
Thank you for reviewing.

> Did we establish that this is the right default for Neoverse N1?

This is based on:
- MySQL: https://bugs.mysql.com/bug.php?id=100664
- MongoDB: https://jira.mongodb.org/browse/WT-6872
- Netty: https://github.com/netty/netty/pull/11677
- Customers' benchmarks and workloads.
- Experiments with two and three `ISB` instructions.

> On the other hand, do we know of possible cases where ISB makes things worse?

`Thread.onSpinWait` can make things worse when synchronisation overhead is not on the critical path. It also might not improve performance when there is thread contention, because in that case the spinning thread might not give CPU resources to another thread. This applies to both arm64 and x86_64.
For example, my x86 system:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:            7
CPU MHz:             3097.588
BogoMIPS:            4999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K

Results of `org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter` with 4 threads running on 2 vCPUs:
- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_onSpinWait -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter` 

Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  45.317 ± 1.741  ms/op

- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter`

Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  55.530 ± 4.606  ms/op


The x86 `PAUSE`-based implementation causes a 22.5% slowdown.
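
For reference, the 22.5% figure follows directly from the two average scores above (45.317 ms/op with the `_onSpinWait` intrinsic disabled, 55.530 ms/op with default settings). A minimal sketch of the arithmetic, where the class and method names are made up for illustration and the error bars are ignored:

```java
// Hypothetical helper, only to show how the percentage is derived.
class SlowdownCheck {
    // Relative slowdown of `slower` versus `baseline`, in percent.
    static double slowdownPercent(double baseline, double slower) {
        return (slower - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double intrinsicDisabled = 45.317; // ms/op, -XX:DisableIntrinsic=_onSpinWait
        double intrinsicEnabled = 55.530;  // ms/op, default (PAUSE-based) intrinsic
        System.out.println(slowdownPercent(intrinsicDisabled, intrinsicEnabled));
    }
}
```

Given the reported errors (± 1.741 and ± 4.606 ms/op), the exact percentage should be read as approximate.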

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From aph at openjdk.java.net  Wed Nov 17 16:36:36 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 17 Nov 2021 16:36:36 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
Message-ID: <3zBHQL8aaLzwDww80QuHhjFrRvXrwoQIwvJkQeOvUFs=.ed8bdf82-c311-4519-ab08-c447733bdd5f@github.com>

On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> A one-`ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163.
>> 
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently

Marked as reviewed by aph (Reviewer).

Hi,

> > Did we establish that this is the right default for Neoverse N1?
> 
> This is based on:
> 
>     * MySQL: https://bugs.mysql.com/bug.php?id=100664
> 
>     * MongoDB: https://jira.mongodb.org/browse/WT-6872
> 
>     * Netty: [Use cpu_relax() implementation for aarch64 netty/netty#11677](https://github.com/netty/netty/pull/11677)
> 
>     * Customers' benchmarks and workloads.
> 
>     * Experiments with two and three `ISB` instructions.

OK, I'll buy that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From phh at openjdk.java.net  Wed Nov 17 16:54:40 2021
From: phh at openjdk.java.net (Paul Hohensee)
Date: Wed, 17 Nov 2021 16:54:40 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
Message-ID: <z9nDs98qxyzBAfQBY3JPvshOl_rC5-59i9xeFjKNRjI=.ddbff4c9-070f-4d78-a01c-ff9abb90d0b8@github.com>

On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> A one-`ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163.
>> 
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently

Lgtm.

-------------

Marked as reviewed by phh (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6415

From duke at openjdk.java.net  Wed Nov 17 16:54:41 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Nov 2021 16:54:41 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <3zBHQL8aaLzwDww80QuHhjFrRvXrwoQIwvJkQeOvUFs=.ed8bdf82-c311-4519-ab08-c447733bdd5f@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
 <3zBHQL8aaLzwDww80QuHhjFrRvXrwoQIwvJkQeOvUFs=.ed8bdf82-c311-4519-ab08-c447733bdd5f@github.com>
Message-ID: <CVCD80-Jk9jJj1C5HnSRnjFZ1jLs_UwXkWAz7iUflqE=.1341f507-e1f7-43b0-9652-cd5a8b92bc3a@github.com>

On Wed, 17 Nov 2021 16:32:45 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently
>
> Hi,
> 
>> > Did we establish that this is the right default for Neoverse N1?
>> 
>> This is based on:
>> 
>>     * MySQL: https://bugs.mysql.com/bug.php?id=100664
>> 
>>     * MongoDB: https://jira.mongodb.org/browse/WT-6872
>> 
>>     * Netty: [Use cpu_relax() implementation for aarch64 netty/netty#11677](https://github.com/netty/netty/pull/11677)
>> 
>>     * Customers' benchmarks and workloads.
>> 
>>     * Experiments with two and three `ISB` instructions.
> 
> OK, I'll buy that.

@theRealAph Thank you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From duke at openjdk.java.net  Wed Nov 17 16:54:41 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Nov 2021 16:54:41 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <z9nDs98qxyzBAfQBY3JPvshOl_rC5-59i9xeFjKNRjI=.ddbff4c9-070f-4d78-a01c-ff9abb90d0b8@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
 <z9nDs98qxyzBAfQBY3JPvshOl_rC5-59i9xeFjKNRjI=.ddbff4c9-070f-4d78-a01c-ff9abb90d0b8@github.com>
Message-ID: <54L9vtPkUc1djo2iQLpaZBjGJpwe4cxaCbrulq5TC7o=.a3cfb114-dc74-45c6-8b69-cfc99813e3f1@github.com>

On Wed, 17 Nov 2021 16:50:43 GMT, Paul Hohensee <phh at openjdk.org> wrote:

>> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently
>
> Lgtm.

@phohensee Thank you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From sspitsyn at openjdk.java.net  Wed Nov 17 18:00:40 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 17 Nov 2021 18:00:40 GMT
Subject: RFR: 8276177:
 nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with
 "assert(def_ik->is_being_redefined()) failed: should be being redefined to get
 here"
In-Reply-To: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
References: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
Message-ID: <h21pjMPm7qzeEFy2JmAA_qQ6zn-1Xaxgj3rLvGlS_qQ=.787f5d83-70d0-42a9-a1f7-6602e50991ab@github.com>

On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread.  Moved the bit to AccessFlags which has space and an atomic set operation.
> Tested with tier1-6, 7-8 in progress.
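
As a rough Java analogy of the race described above (the actual fix is C++ code in HotSpot and is not reproduced here): when two threads do a plain read-modify-write on different bits of the same word, one thread's update can overwrite the other's, whereas an atomic update of the word keeps both. The names `AtomicFlagsSketch`, `BIT_A` and `BIT_B` below are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: atomic set/clear of individual bits in a shared word.
class AtomicFlagsSketch {
    static final int BIT_A = 1 << 0;
    static final int BIT_B = 1 << 1;

    private final AtomicInteger flags = new AtomicInteger();

    // Each update retries until it is applied to an unchanged word, so a
    // concurrent change to another bit is never lost.
    void set(int bit)   { flags.getAndUpdate(f -> f | bit); }
    void clear(int bit) { flags.getAndUpdate(f -> f & ~bit); }
    boolean isSet(int bit) { return (flags.get() & bit) != 0; }

    // Two threads toggle different bits of the same word; each one finishes
    // with its own bit set. Expects iterations >= 1.
    static boolean bothBitsSurvive(int iterations) {
        AtomicFlagsSketch s = new AtomicFlagsSketch();
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) { s.clear(BIT_A); s.set(BIT_A); }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) { s.clear(BIT_B); s.set(BIT_B); }
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.isSet(BIT_A) && s.isSet(BIT_B);
    }

    public static void main(String[] args) {
        System.out.println(bothBitsSurvive(100_000));
    }
}
```

With a plain `int` field and `flags |= bit` / `flags &= ~bit` instead of the atomic updates, the same loop could occasionally end with one of the bits cleared.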

Hi Coleen,
Great discovery!
The fix looks good.
Thanks,
Serguei

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6410

From coleenp at openjdk.java.net  Wed Nov 17 19:57:47 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 17 Nov 2021 19:57:47 GMT
Subject: RFR: 8276177:
 nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with
 "assert(def_ik->is_being_redefined()) failed: should be being redefined to get
 here"
In-Reply-To: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
References: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
Message-ID: <Iy2-nENDxMBLms42MFEYnFBRxJgg4VNTQVIgmOpUBFk=.c9b3365b-03ae-44b5-85c3-7da660bbd529@github.com>

On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread.  Moved the bit to AccessFlags which has space and an atomic set operation.
> Tested with tier1-6, 7-8 in progress.

Thank you Serguei and Harold.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6410

From coleenp at openjdk.java.net  Wed Nov 17 19:57:48 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 17 Nov 2021 19:57:48 GMT
Subject: Integrated: 8276177:
 nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with
 "assert(def_ik->is_being_redefined()) failed: should be being redefined to get
 here"
In-Reply-To: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
References: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
Message-ID: <WoNWWq7mgj43BLAtKVEgQmVbQoFSAiAxmCY3NwW9bZs=.2291f20f-48d3-4722-b31b-f9fe03a3ee0e@github.com>

On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread.  Moved the bit to AccessFlags which has space and an atomic set operation.
> Tested with tier1-6, 7-8 in progress.

This pull request has now been integrated.

Changeset: a907b2b1
Author:    Coleen Phillimore <coleenp at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/a907b2b144f2af27392eb7c2f9656fbb1a759618
Stats:     13 lines in 3 files changed: 7 ins; 2 del; 4 mod

8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here"

Reviewed-by: hseigel, sspitsyn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6410

From sspitsyn at openjdk.java.net  Wed Nov 17 22:37:01 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 17 Nov 2021 22:37:01 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be"
Message-ID: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>

The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
The following handshake closures are fixed by this update:
 - UpdateForPopTopFrameClosure
 - SetForceEarlyReturn
 - SetFramePopClosure

-------------

Commit messages:
 - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt
 - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert

Changes: https://git.openjdk.java.net/jdk/pull/6440/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8266593
  Stats: 22 lines in 2 files changed: 10 ins; 6 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6440.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6440/head:pull/6440

PR: https://git.openjdk.java.net/jdk/pull/6440

From sviswanathan at openjdk.java.net  Wed Nov 17 22:57:41 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 17 Nov 2021 22:57:41 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
Message-ID: <fuvWYYcAS3lMEbupsHhAB7L5_r_OkLI8kL2EqOxVExo=.def40272-6820-46fa-b484-8f45afe4023e@github.com>

On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541
> 
> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
> 
> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
> 
> Thanks,
> Tobias
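
To make the `NEG`-as-`SUB` point concrete: negating each lane is the same as subtracting it from zero, which is why the Java code can avoid passing `NEG` to the intrinsic on the expected path. A scalar sketch on plain `long[]` lanes follows; it deliberately does not use the jdk.incubator.vector API, and `NegViaSubSketch` and `negViaSub` are made-up names.

```java
import java.util.Arrays;

// Hypothetical scalar model of "NEG implemented as SUB" on long lanes.
class NegViaSubSketch {
    // Computes 0 - lane for every lane, i.e. the lanewise negation.
    static long[] negViaSub(long[] lanes) {
        long[] out = new long[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = 0L - lanes[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] v = {1L, -2L, Long.MIN_VALUE};
        // Long.MIN_VALUE wraps to itself under both -x and 0 - x.
        System.out.println(Arrays.toString(negViaSub(v)));
    }
}
```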

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6428

From mdoerr at openjdk.java.net  Wed Nov 17 23:02:48 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Wed, 17 Nov 2021 23:02:48 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be"
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <9_lGsNSJueWi-Q6czgJUI8Ps9RuSK4sW8w4HO8uPfHU=.74eee17c-0d7e-4811-a6c6-fe90f60abd09@github.com>

On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
> The following handshake closures are fixed by this update:
>  - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

LGTM. Thanks for fixing it!

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6440

From dlong at openjdk.java.net  Thu Nov 18 00:26:37 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Thu, 18 Nov 2021 00:26:37 GMT
Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler
In-Reply-To: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>
References: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>
Message-ID: <kxKB8rxJAvqWkH0fDGz0rW5D5fh7zo0vVp3NDpayHhk=.9a2aca6c-4da3-4639-8762-092aeff5584c@github.com>

On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C.
> The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C.

Looks good!

-------------

Marked as reviewed by dlong (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6405

From lmesnik at openjdk.java.net  Thu Nov 18 00:35:41 2021
From: lmesnik at openjdk.java.net (Leonid Mesnik)
Date: Thu, 18 Nov 2021 00:35:41 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be"
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <eCXxqDUrTXraJ5CuXnrhJY-c75y5oe1r4n7g-6gAKfc=.a899a8d2-03ec-436c-880d-9a3eb39de65d@github.com>

On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
> The following handshake closures are fixed by this update:
>  - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

Marked as reviewed by lmesnik (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From ngasson at openjdk.java.net  Thu Nov 18 01:31:48 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Thu, 18 Nov 2021 01:31:48 GMT
Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to
 "isb"/1 for Arm Neoverse N1 [v3]
In-Reply-To: <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
 <rZv8bPMXd0EIyGqH3UKloJMXlsw5Oe5KE0kXegj4_nc=.1fcc7aa0-3b49-4321-a0b8-e48f779c54e9@github.com>
Message-ID: <vCixXLL49PE633Rym2_ZyydSCoolK-IaFL7fWqJ5QgY=.0f1fa4d1-d01d-4cc8-89f7-1a3f0ed84ed6@github.com>

On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> A one-`ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163.
>> 
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently

Marked as reviewed by ngasson (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From smarks at openjdk.java.net  Thu Nov 18 01:51:07 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 01:51:07 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
Message-ID: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>

Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.

-------------

Commit messages:
 - extraneous newline
 - Merge branch 'master' into JDK-8276422-disable-finalization-option
 - Simplify InvalidFinalizationOption test.
 - Change InvalidFinalizationOption test to driver mode.
 - Revert extraneous whitespace change to globals.hpp.
 - Renaming within the test class itself.
 - Rename invalid finalization option test.
 - Add test for invalid finalization option syntax or value.
 - Add @bug line to JFR finalization event test.
 - Test that no jdk.FinalizationStatistics events are generated when finalization is disabled
 - ... and 7 more: https://git.openjdk.java.net/jdk/compare/29e552c0...3836cc94

Changes: https://git.openjdk.java.net/jdk/pull/6442/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276422
  Stats: 266 lines in 13 files changed: 249 ins; 0 del; 17 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6442.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Thu Nov 18 01:59:41 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 01:59:41 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
Message-ID: <cJRW-PeEexZJ9YQtiL5xratfEMiPleQz3_QpCQmkqYs=.478cae24-9f93-4837-9744-2da1b328a90c@github.com>

On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks <smarks at openjdk.org> wrote:

> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.

Hi Stuart,

This all looks fine to me. The hotspot part needs a second reviewer (especially as I contributed a chunk of that code :) ).

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6442

From duke at openjdk.java.net  Thu Nov 18 02:44:45 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Thu, 18 Nov 2021 02:44:45 GMT
Subject: Integrated: 8274179: AArch64: Support SVE operations with encodable
 immediates
In-Reply-To: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
References: <iQ6P7TblcS20eQ-acZmAkLU_ikzTkMCUGXorw4LFD_8=.15686e9d-b6a0-4ee8-bd55-ecbe0512b721@github.com>
Message-ID: <osXQyFGpFs64d1JevoofpSvpme6rwxU5UAUuE2mccsM=.8e2876e3-57c1-4cd4-8922-65dc669c6d6e@github.com>

On Tue, 26 Oct 2021 01:58:40 GMT, Fei Gao <duke at openjdk.java.net> wrote:

> for(int i = 0; i < LENGTH; i++) {
>       c[i] = a[i] + 2;
>     }
> 
> For the case shown above, after superword optimization with SVE,
> without the patch, the vector add operation always has 2 z-reg inputs,
> like:
> mov     z16.s, #2
> add	z17.s, z17.s, z16.s
> 
> Considering that SVE supports basic binary operations with an immediate,
> this pattern could be further optimized to:
> add     z16.s, z16.s, #2
> 
> To implement it, we added some new match rules and assembler rules in
> the aarch64 backend. We also extended some immediate types and
> functions while remaining backward compatible.
> 
> With the patch, only these binary integer vector operations, +(add),
> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
> the optimization. Other vector operations are not supported currently.
> 
> Tested tier1 and test/hotspot/jtreg/compiler on an SVE-featured AArch64
> CPU, with no new failures.
> 
> There is no obvious performance uplift but it can help remove one
> redundant mov instruction.

This pull request has now been integrated.

Changeset: 81938001
Author:    Fei Gao <Fei.Gao at arm.com>
Committer: Ningsheng Jian <njian at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/81938001f9bae56c59f4e18b7756089f2cf0bf74
Stats:     1476 lines in 12 files changed: 1329 ins; 43 del; 104 mod

8274179: AArch64: Support SVE operations with encodable immediates

Reviewed-by: aph, ngasson

-------------

PR: https://git.openjdk.java.net/jdk/pull/6115

From pli at openjdk.java.net  Thu Nov 18 04:03:54 2021
From: pli at openjdk.java.net (Pengfei Li)
Date: Thu, 18 Nov 2021 04:03:54 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
Message-ID: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>

Arraycopy partial inlining is a C2 compiler technique that avoids stub
call overhead in small-sized arraycopy operations by generating masked
vector instructions. So far it works on x86 AVX512 only and this patch
enables it on AArch64 with SVE.

We add an AArch64 matching rule for VectorMaskGenNode and refactor that
node a little. The major change is moving the element type field
into its TypeVectMask bottom type. The reason is that AArch64 vector
masks are different for different vector element types.

E.g., an x86 AVX512 vector mask value masking 3 least significant vector
lanes (of any type) is like

`0000 0000 ... 0000 0000 0000 0000 0111`

On AArch64 SVE, this mask value can only be used for masking the 3 least
significant lanes of bytes. But for 3 lanes of ints, the value should be

`0000 0000 ... 0000 0000 0001 0001 0001`

where the least significant bit of each lane matters. So the AArch64
matcher needs to know the vector element type to generate the right masks.
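
As a rough illustration of the two layouts described above, here is a
minimal Java sketch. The class and method names are hypothetical and it
is not part of the patch. It assumes a 512-bit vector, so both mask
values fit in a 64-bit `long`, and it models an SVE predicate as one bit
per vector byte, where only the lowest bit of each element matters.

```java
class VectorMaskLayout {
    // AVX-512 style (assumed model): one mask bit per lane, whatever the lane type.
    static long avx512Mask(int activeLanes) {
        if (activeLanes < 0 || activeLanes > 64) {
            throw new IllegalArgumentException("activeLanes out of range");
        }
        return activeLanes == 64 ? -1L : (1L << activeLanes) - 1;
    }

    // SVE style (assumed model): one predicate bit per vector byte. For an
    // element of elemBytes bytes, only the lowest bit of its group is set.
    static long svePredicate(int activeLanes, int elemBytes) {
        if (activeLanes < 0 || elemBytes <= 0 || activeLanes * elemBytes > 64) {
            throw new IllegalArgumentException("does not fit a 512-bit vector");
        }
        long bits = 0;
        for (int lane = 0; lane < activeLanes; lane++) {
            bits |= 1L << (lane * elemBytes);
        }
        return bits;
    }
}
```

With this model, `svePredicate(3, 1)` gives the same `0111` pattern as
`avx512Mask(3)`, while `svePredicate(3, 4)` gives `0001 0001 0001`.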

After this patch, the C2 generated code for copying a 50-byte array on
AArch64 SVE looks like

  mov     x12, #0x32
  whilelo p0.b, xzr, x12
  add     x11, x11, #0x10
  ld1b    {z16.b}, p0/z, [x11]
  add     x10, x10, #0x10
  st1b    {z16.b}, p0, [x10]
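
Read as a scalar model, the sequence above behaves roughly like the
following Java sketch. This is only an approximation under stated
assumptions: the names are hypothetical, `whilelo` is simplified to
non-negative operands without overflow handling, and the copy length
must not exceed the vector size in bytes.

```java
class MaskedCopyModel {
    // Simplified model of "whilelo p.b, start, limit": lane i is active
    // while start + i < limit (non-negative operands assumed).
    static boolean[] whilelo(int start, int limit, int lanes) {
        boolean[] predicate = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            predicate[i] = start + i < limit;
        }
        return predicate;
    }

    // Copies `length` bytes with one predicated load and one predicated store.
    static void maskedCopy(byte[] src, byte[] dst, int length, int vectorBytes) {
        if (length < 0 || length > vectorBytes) {
            throw new IllegalArgumentException("length must fit in one vector");
        }
        boolean[] p = whilelo(0, length, vectorBytes);
        byte[] z = new byte[vectorBytes];
        // ld1b {z.b}, p/z, [src]: active lanes are loaded, inactive lanes stay zero.
        for (int i = 0; i < vectorBytes; i++) {
            if (p[i]) {
                z[i] = src[i];
            }
        }
        // st1b {z.b}, p, [dst]: only active lanes are written back.
        for (int i = 0; i < vectorBytes; i++) {
            if (p[i]) {
                dst[i] = z[i];
            }
        }
    }
}
```

Because inactive lanes are neither read nor written, a 50-byte copy on a
64-byte vector touches only the first 50 bytes of each array.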

We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
both x86 AVX512 and AArch64 SVE machines and found no issues. We also ran
JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
size arguments on a 512-bit SVE-featured CPU and observed the following
performance changes.

Benchmark                  (length)  (Performance)
ArrayCopyAligned.testByte        10          -2.6%
ArrayCopyAligned.testByte        20          +4.7%
ArrayCopyAligned.testByte        30          +4.8%
ArrayCopyAligned.testByte        40         +21.7%
ArrayCopyAligned.testByte        50         +22.5%
ArrayCopyAligned.testByte        60         +28.4%

The test machine has an SVE vector size of 512 bits, so we see a
performance gain for most array sizes below 64 bytes. For very small
arrays we see a slight regression because a vector load/store may be a
bit slower than 1 or 2 scalar loads/stores.

-------------

Commit messages:
 - 8277168: AArch64: Enable arraycopy partial inlining with SVE

Changes: https://git.openjdk.java.net/jdk/pull/6444/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6444&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277168
  Stats: 87 lines in 16 files changed: 57 ins; 7 del; 23 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6444.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6444/head:pull/6444

PR: https://git.openjdk.java.net/jdk/pull/6444

From jpai at openjdk.java.net  Thu Nov 18 04:17:43 2021
From: jpai at openjdk.java.net (Jaikiran Pai)
Date: Thu, 18 Nov 2021 04:17:43 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
Message-ID: <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>

On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks <smarks at openjdk.org> wrote:

> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.

src/java.base/share/classes/java/lang/ref/Finalizer.java line 195:

> 193: 
> 194:     static {
> 195:         if (Holder.ENABLED) {

Hello Stuart,
My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. Here, however, the `Holder` is used right within the `static` block of the `Finalizer` class, and as the very first thing at that. In this case, is that `Holder` class necessary?
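
To illustrate the point, here is a minimal sketch with hypothetical
class names (it is not the actual `Finalizer` code). `Lazy` uses the
holder idiom as intended, so its `Holder` is initialized only on first
use. `Eager` reads `Holder.ENABLED` at the top of its own `static`
block, so the holder is initialized as part of the outer class's
initialization and nothing is deferred.

```java
import java.util.ArrayList;
import java.util.List;

class HolderDemo {
    // Records the order in which static initializers run.
    static final List<String> EVENTS = new ArrayList<>();

    static boolean compute(String who) {
        EVENTS.add(who + " initialized");
        return true;
    }

    static class Lazy {
        static class Holder {
            // Not a compile-time constant, so reading it triggers initialization.
            static final boolean ENABLED = compute("Lazy.Holder");
        }
        static {
            EVENTS.add("Lazy.<clinit>");
        }
        static void touch() { }                              // forces Lazy to initialize
        static boolean enabled() { return Holder.ENABLED; }  // first use of Holder
    }

    static class Eager {
        static class Holder {
            static final boolean ENABLED = compute("Eager.Holder");
        }
        static {
            // Holder is used immediately, so it is initialized right here.
            if (Holder.ENABLED) {
                EVENTS.add("Eager.<clinit>");
            }
        }
        static void touch() { }                              // forces Eager to initialize
    }
}
```

Calling `Lazy.touch()`, then `Eager.touch()`, then `Lazy.enabled()`
records `Lazy.<clinit>`, `Eager.Holder initialized`, `Eager.<clinit>`,
`Lazy.Holder initialized`, in that order.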

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Thu Nov 18 05:22:35 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 05:22:35 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
Message-ID: <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>

On Thu, 18 Nov 2021 04:13:21 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>
> src/java.base/share/classes/java/lang/ref/Finalizer.java line 195:
> 
>> 193: 
>> 194:     static {
>> 195:         if (Holder.ENABLED) {
> 
> Hello Stuart,
> My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. Here, however, the `Holder` is used right within the `static` block of the `Finalizer` class, and as the very first thing at that. In this case, is that `Holder` class necessary?

Huh, good catch! This was mostly left over from an earlier version of the flag that used system properties, which aren't initialized until after the Finalizer class is initialized.

It might be the case that the Holder can be removed at this point, since the finalization-enabled bit is no longer in a system property and is in a native class member that should be available before the VM is started.

I say "might" though because this occurs early in system startup, and weird things potentially happen. For example, suppose the first object with a finalizer is created before the Finalizer class is initialized. The VM will perform an upcall to Finalizer::register. An ordinary call to a static method will ensure the class is initialized before proceeding with the call, but this VM upcall is a special case.... I'll have to investigate this some more.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Thu Nov 18 05:43:41 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 05:43:41 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be"
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <juZNw818RkNbj2ff6_adOWk7KaD5ABqe2u61oqUH67Q=.927354f6-11b0-444c-a7b7-173e8b39657f@github.com>

On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
> The following handshake closures are fixed by this update:
>  - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

Hi Leonid,

Something seems amiss to me.

First, the checks for `java_thread->threadObj() == NULL` should not be necessary, as the `threadObj` can never be NULL once the thread has been started, and a non-started thread should not be possible by the time you reach the code doing the checks. Even if we nulled out `threadObj` for a terminated thread, the `is_exiting` check would already handle that case.

Second, if the target thread is exiting then surely the suspension check should return false and so we would already give a JVMTI_ERROR_THREAD_NOT_SUSPENDED error?

Thanks,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From dholmes at openjdk.java.net  Thu Nov 18 06:03:45 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 06:03:45 GMT
Subject: RFR: 8276177:
 nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with
 "assert(def_ik->is_being_redefined()) failed: should be being redefined to get
 here"
In-Reply-To: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
References: <bIc3bHZoI18uZLind1OWmwcRLt7MPg9XkQ4AgrMnXJQ=.1be19b81-5b3b-4b22-b00d-c5772f621916@github.com>
Message-ID: <MLwNcdm-BGG5eq3fqEk6vDEsdgqThDbQdrwPxPOlZYc=.787c213b-e09a-484a-8c45-bde750f0f6bb@github.com>

On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread.  Moved the bit to AccessFlags which has space and an atomic set operation.
> Tested with tier1-6, 7-8 in progress.

src/hotspot/share/utilities/accessFlags.hpp line 165:

> 163:   bool is_being_redefined() const       { return (_flags & JVM_ACC_IS_BEING_REDEFINED) != 0; }
> 164:   void set_is_being_redefined()         { atomic_set_bits(JVM_ACC_IS_BEING_REDEFINED); }
> 165:   void clear_is_being_redefined()       { atomic_clear_bits(JVM_ACC_IS_BEING_REDEFINED); }

Shouldn't these have been under Klass flags, not Klass and Method flags?
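
For readers less familiar with why the bit needs an atomic update, here
is a minimal Java sketch of the idea behind `atomic_set_bits` and
`atomic_clear_bits`. It is an illustration only: the class name, the
constant values and the use of `AtomicInteger` are assumptions and not
the HotSpot implementation. A plain `flags |= bit` is a separate read
and write, so two threads updating different bits of the same word can
overwrite each other's change. A compare-and-set loop retries until the
update is applied to the latest value.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicFlags {
    // Hypothetical bit values, chosen only for this example.
    static final int IS_BEING_REDEFINED = 0x1;
    static final int OTHER_FLAG = 0x2;

    private final AtomicInteger flags = new AtomicInteger();

    // Sets the given bits without losing concurrent updates to other bits.
    void atomicSetBits(int bits) {
        int old;
        do {
            old = flags.get();
        } while (!flags.compareAndSet(old, old | bits));
    }

    // Clears the given bits without losing concurrent updates to other bits.
    void atomicClearBits(int bits) {
        int old;
        do {
            old = flags.get();
        } while (!flags.compareAndSet(old, old & ~bits));
    }

    boolean isSet(int bits) {
        return (flags.get() & bits) != 0;
    }
}
```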

-------------

PR: https://git.openjdk.java.net/jdk/pull/6410

From dholmes at openjdk.java.net  Thu Nov 18 06:23:42 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 06:23:42 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2]
In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
 <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
Message-ID: <h30a_qW5M6evGmXgTxsIkWg5oksP3DE-ObBYrWU3mlc=.86174b44-04c9-455a-b76e-662b55b2af54@github.com>

On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel <hseigel at openjdk.org> wrote:

>> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
>> 
>> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>> 
>> Thanks, Harold
>
> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add (Deprecated) to comments and add options to deprecated test

Sorry for the delay - updates look good.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6390

From sspitsyn at openjdk.java.net  Thu Nov 18 06:52:40 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 06:52:40 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be"
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <YbHNfPt2B_vkpfS5py8kEs2_Gmlr6qXTndUJ9mCzWgU=.b88a9b29-0f12-4578-bdda-548ce7e91cdf@github.com>

On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
> The following handshake closures are fixed by this update:
>  - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

Martin and Leonid, thank you for the quick review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From pli at openjdk.java.net  Thu Nov 18 06:58:40 2021
From: pli at openjdk.java.net (Pengfei Li)
Date: Thu, 18 Nov 2021 06:58:40 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
Message-ID: <v6YJjUUxs507gmh_JQNThW34vhS3w0FxQ_YPUUlST-g=.7a0a7e7d-def4-4dc9-b29a-efbad6081983@github.com>

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li <pli at openjdk.org> wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions. So far it works on x86 AVX512 only and this patch
> enables it on AArch64 with SVE.
> 
> We add an AArch64 matching rule for VectorMaskGenNode and refactor that
> node a little. The major change is moving the element type field
> into its TypeVectMask bottom type. The reason is that AArch64 vector
> masks are different for different vector element types.
> 
> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
> lanes (of any type) is like
> 
> `0000 0000 ... 0000 0000 0000 0000 0111`
> 
> On AArch64 SVE, this mask value can only be used for masking the 3 least
> significant lanes of bytes. But for 3 lanes of ints, the value should be
> 
> `0000 0000 ... 0000 0000 0001 0001 0001`
> 
> where the least significant bit of each lane matters. So the AArch64
> matcher needs to know the vector element type to generate the right masks.
> 
> After this patch, the C2 generated code for copying a 50-byte array on
> AArch64 SVE looks like
> 
>   mov     x12, #0x32
>   whilelo p0.b, xzr, x12
>   add     x11, x11, #0x10
>   ld1b    {z16.b}, p0/z, [x11]
>   add     x10, x10, #0x10
>   st1b    {z16.b}, p0, [x10]
> 
> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
> both x86 AVX512 and AArch64 SVE machines and found no issues. We also ran
> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
> size arguments on a 512-bit SVE-featured CPU and observed the following
> performance changes.
> 
> Benchmark                  (length)  (Performance)
> ArrayCopyAligned.testByte        10          -2.6%
> ArrayCopyAligned.testByte        20          +4.7%
> ArrayCopyAligned.testByte        30          +4.8%
> ArrayCopyAligned.testByte        40         +21.7%
> ArrayCopyAligned.testByte        50         +22.5%
> ArrayCopyAligned.testByte        60         +28.4%
> 
> The test machine has an SVE vector size of 512 bits, so we see a
> performance gain for most array sizes below 64 bytes. For very small
> arrays we see a slight regression because a vector load/store may be a
> bit slower than 1 or 2 scalar loads/stores.

The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From sspitsyn at openjdk.java.net  Thu Nov 18 07:00:47 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 07:00:47 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be"
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <3pe7ADvZ3z_slXMHOU3g0kIrhLcsLi0xDIeqIAAmmsM=.27039eee-1c1d-405b-a948-a0bda9acd287@github.com>

On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
> The following handshake closures are fixed by this update:
>  - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

Hi David,

Thank you for reviewing this!
I was also thinking about getting rid of the check `java_thread->threadObj() == NULL`.
Then I decided it was safe to keep it, as it was in the original UpdateForPopTopFrameClosure implementation (but later in the code). I will remove it and retest the fix.

> Second, if the target thread is exiting then surely the suspension check should return
> false and so we would already give a JVMTI_ERROR_THREAD_NOT_SUSPENDED error?

The assert 
`   assert(java_thread == _state->get_thread(), "Must be");`
is fired one line before the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` code is returned.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From thartmann at openjdk.java.net  Thu Nov 18 07:05:41 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Thu, 18 Nov 2021 07:05:41 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
Message-ID: <V73TXrxcBIUzy7r7_bFWIgNytOVb78zawzXr0jGiDOk=.ce427c67-131b-42d0-acf3-a7f2f15f39b8@github.com>

On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541
> 
> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
> 
> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
> 
> Thanks,
> Tobias

Thanks for the review, Sandhya!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6428

From sspitsyn at openjdk.java.net  Thu Nov 18 07:08:15 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 07:08:15 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v2]
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <IRMilekj_sKUbVmaCojRDxv30HVVbpHoTybTfSzYrCc=.27f57c00-c185-461e-a0f6-44ad1a550189@github.com>

> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
> The following handshake closures are fixed by this update:
>  - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge
 - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt
 - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6440/files
  - new: https://git.openjdk.java.net/jdk/pull/6440/files/64f22944..60e784ec

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=00-01

  Stats: 1850 lines in 27 files changed: 1576 ins; 95 del; 179 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6440.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6440/head:pull/6440

PR: https://git.openjdk.java.net/jdk/pull/6440

From dholmes at openjdk.java.net  Thu Nov 18 07:16:42 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 07:16:42 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
Message-ID: <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>

On Thu, 18 Nov 2021 05:20:02 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/ref/Finalizer.java line 195:
>> 
>>> 193: 
>>> 194:     static {
>>> 195:         if (Holder.ENABLED) {
>> 
>> Hello Stuart,
>> My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. Here, however, the `Holder` is used right within the `static` block of the `Finalizer` class, and as the very first thing at that. In this case, is that `Holder` class necessary?
>
> Huh, good catch! This was mostly left over from an earlier version of the flag that used system properties, which aren't initialized until after the Finalizer class is initialized.
> 
> It might be the case that the Holder can be removed at this point, since the finalization-enabled bit is no longer in a system property and is in a native class member that should be available before the VM is started.
> 
> I say "might" though because this occurs early in system startup, and weird things potentially happen. For example, suppose the first object with a finalizer is created before the Finalizer class is initialized. The VM will perform an upcall to Finalizer::register. An ordinary call to a static method will ensure the class is initialized before proceeding with the call, but this VM upcall is a special case.... I'll have to investigate this some more.

@stuart-marks not sure I see how anything is different here compared to the existing logic. The `Finalizer` class is explicitly initialized quite early in the init process, but if a preceding class's initialization created an object with a finalizer then that same upcall would be involved.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From kbarrett at openjdk.java.net  Thu Nov 18 07:19:37 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Thu, 18 Nov 2021 07:19:37 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
Message-ID: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>

On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks <smarks at openjdk.org> wrote:

> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.

I only really reviewed the hotspot changes.

There is nothing here to make the various GCs take advantage of finalization being disabled.  Is the plan to leave that to followup changes?

src/hotspot/share/oops/instanceKlass.hpp line 338:

> 336: 
> 337:   // Queries finalization state
> 338:   static bool finalization_enabled() { return _finalization_enabled; }

Predicate functions like this are often named "is_xxx"; that idiom is common in this class.

src/hotspot/share/prims/jvm.cpp line 694:

> 692: 
> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env))
> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;

missing indentation

-------------

Changes requested by kbarrett (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6442

From kbarrett at openjdk.java.net  Thu Nov 18 07:19:37 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Thu, 18 Nov 2021 07:19:37 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
Message-ID: <WJ1mUrSUgXOp_f_ajBnOO9osSd4CySGDOTSZxxrx0iA=.1478bcd7-6767-4a84-bafd-aa86c4fdbf86@github.com>

On Thu, 18 Nov 2021 06:43:01 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>
> src/hotspot/share/prims/jvm.cpp line 694:
> 
>> 692: 
>> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env))
>> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;
> 
> missing indentation

I think this could just be `return InstanceKlass::finalization_enabled();`.  There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately.
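
The conversion this relies on can be sketched in isolation. This is a minimal illustration, not the HotSpot code: `jboolean`, `JNI_TRUE` and `JNI_FALSE` below are local stand-ins that mirror the usual `jni.h` definitions, and `toy_finalization_enabled` is a hypothetical flag standing in for `InstanceKlass::finalization_enabled()`.

```cpp
// Local stand-ins mirroring the usual jni.h definitions (assumption:
// jboolean is an unsigned 8-bit type, JNI_TRUE is 1, JNI_FALSE is 0).
typedef unsigned char jboolean;
const jboolean JNI_FALSE = 0;
const jboolean JNI_TRUE = 1;

// Hypothetical flag standing in for InstanceKlass::finalization_enabled().
bool toy_finalization_enabled = true;

bool finalization_enabled() { return toy_finalization_enabled; }

// Explicit mapping, as written in the patch under review.
jboolean is_enabled_explicit() {
  return finalization_enabled() ? JNI_TRUE : JNI_FALSE;
}

// Implicit conversion: a C++ bool converts to an integer type as 0 or 1,
// so the result matches the explicit mapping.
jboolean is_enabled_implicit() {
  return finalization_enabled();
}
```

Both functions return the same value for either state of the flag, which is why the shorter form is equivalent here.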

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From shade at openjdk.java.net  Thu Nov 18 07:32:43 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 18 Nov 2021 07:32:43 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
Message-ID: <G3LY1_9uMKwLJv93ncw1CLCj8jh8TqSQ_hgFp7ChmIg=.2b814625-6d7a-438d-8c8d-b91aae9b3ecb@github.com>

On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks <smarks at openjdk.org> wrote:

> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.

From a brief look, it is OK. Minor nits.

src/hotspot/share/prims/jvm.cpp line 694:

> 692: 
> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env))
> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;

Suggestion:

  return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From shade at openjdk.java.net  Thu Nov 18 07:32:44 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 18 Nov 2021 07:32:44 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
Message-ID: <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>

On Thu, 18 Nov 2021 07:13:55 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Huh, good catch! This was mostly left over from an earlier version of the flag that used system properties, which aren't initialized until after the Finalizer class is initialized.
>> 
>> It might be the case that the Holder can be removed at this point, since the finalization-enabled bit is no longer in a system property and is in a native class member that should be available before the VM is started.
>> 
>> I say "might" though because this occurs early in system startup, and weird things potentially happen. For example, suppose the first object with a finalizer is created before the Finalizer class is initialized. The VM will perform an upcall to Finalizer::register. An ordinary call to a static method will ensure the class is initialized before proceeding with the call, but this VM upcall is a special case.... I'll have to investigate this some more.
>
> @stuart-marks not sure I see how anything is different here compared to the existing logic. The `Finalizer` class is explicitly initialized quite early in the init process, but if a preceding class's initialization created an object with a finalizer then that same upcall would be involved.

Do we even have to have a flag on Java side? It looks like these calls are only done as the upcalls from VM, so we might just keep the flag on VM side?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Thu Nov 18 07:37:42 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 07:37:42 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v2]
In-Reply-To: <IRMilekj_sKUbVmaCojRDxv30HVVbpHoTybTfSzYrCc=.27f57c00-c185-461e-a0f6-44ad1a550189@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <IRMilekj_sKUbVmaCojRDxv30HVVbpHoTybTfSzYrCc=.27f57c00-c185-461e-a0f6-44ad1a550189@github.com>
Message-ID: <a9n8loqncN5V4iWQcnS5aNH4GbmL6bf4jIVEaT4_sqg=.7a19693b-0050-4633-8ac1-3397441e850f@github.com>

On Thu, 18 Nov 2021 07:08:15 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge
>  - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt
>  - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert

Wouldn't it suffice to just move the assert then?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From dholmes at openjdk.java.net  Thu Nov 18 07:43:35 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 07:43:35 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
Message-ID: <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>

On Thu, 18 Nov 2021 07:27:30 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> @stuart-marks not sure I see how anything is different here compared to the existing logic. The `Finalizer` class is explicitly initialized quite early in the init process, but if a preceding class's initialization created an object with a finalizer then that same upcall would be involved.
>
> Do we even have to have a flag on Java side? It looks like these calls are only done as the upcalls from VM, so we might just keep the flag on VM side?

@shipilev not sure what you mean by  "a flag on the Java side". The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From shade at openjdk.java.net  Thu Nov 18 07:46:38 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 18 Nov 2021 07:46:38 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
 <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
Message-ID: <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>

On Thu, 18 Nov 2021 07:40:34 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Do we even have to have a flag on Java side? It looks like these calls are only done as the upcalls from VM, so we might just keep the flag on VM side?
>
> @shipilev not sure what you mean by  "a flag on the Java side". The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things.

Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Thu Nov 18 07:58:39 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 07:58:39 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
Message-ID: <jWsvTuYncDMgmHZDJAMI7q6Dp5MSVlAoIr4ynDJJQQQ=.d1cd394f-3397-489b-8631-ac23544c59da@github.com>

On Thu, 18 Nov 2021 07:16:56 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> There is nothing here to make the various GCs take advantage of finalization being disabled. Is the plan to leave that to followup changes?

@kimbarrett I provided the basic VM parts here. I'm not aware of what specifically a GC might optimise if it knows there can be no finalizers, but that seems like something the GC folk should look to providing as a follow up. Thanks.

> src/hotspot/share/oops/instanceKlass.hpp line 338:
> 
>> 336: 
>> 337:   // Queries finalization state
>> 338:   static bool finalization_enabled() { return _finalization_enabled; }
> 
> Predicate functions like this are often named "is_xxx"; that idiom is common in this class.

This was intended as an accessor function, similar to `count()` or `offset()`, not a query as in `is_shared_boot_class()`. As it is a boolean field you could convert it to a query instead.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Thu Nov 18 07:58:39 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 07:58:39 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
In-Reply-To: <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
 <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
 <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
Message-ID: <wvBJBIkLa6ii27Y-haBT-3V1rzd1x3jhmwOQTu4lj3E=.b1110b6a-1818-48ee-bea0-f405ba07be29@github.com>

On Thu, 18 Nov 2021 07:44:05 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> @shipilev not sure what you mean by  "a flag on the Java side". The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things.
>
> Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there?

`registerFinalizer` does not expect to be called and only uses the "flag" as a form of assertion.

`runFinalization` is called from Java code.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Thu Nov 18 08:04:07 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 08:04:07 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v2]
In-Reply-To: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
Message-ID: <tuSDambxc_tQmoAc_Yk6H8L7cTIKRvn7tIRj6a1nyEo=.afb4cf03-0c12-4a95-9173-7b4fb84963ca@github.com>

> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.

Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:

  Include instanceKlass.hpp in arguments.cpp

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6442/files
  - new: https://git.openjdk.java.net/jdk/pull/6442/files/3836cc94..911af0b1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=00-01

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6442.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442

PR: https://git.openjdk.java.net/jdk/pull/6442

From sspitsyn at openjdk.java.net  Thu Nov 18 08:29:41 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 08:29:41 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v2]
In-Reply-To: <IRMilekj_sKUbVmaCojRDxv30HVVbpHoTybTfSzYrCc=.27f57c00-c185-461e-a0f6-44ad1a550189@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <IRMilekj_sKUbVmaCojRDxv30HVVbpHoTybTfSzYrCc=.27f57c00-c185-461e-a0f6-44ad1a550189@github.com>
Message-ID: <ezYy9tnTcol1PE8hdQ_UFaEYglDdDpx6AQ19ObdXMfo=.d15809f6-88cc-41f7-828f-e43700902dba@github.com>

On Thu, 18 Nov 2021 07:08:15 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge
>  - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt
>  - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert

It does not look right to check other conditions if the JavaThread is exiting.
So, I think, the `java_thread->is_exiting()` has to be checked first.
Please let me know if I missed anything.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From alanb at openjdk.java.net  Thu Nov 18 08:42:40 2021
From: alanb at openjdk.java.net (Alan Bateman)
Date: Thu, 18 Nov 2021 08:42:40 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v2]
In-Reply-To: <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
 <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
 <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
Message-ID: <t6r4zgk8qUMXdSDYBxr-V6-KXAaB0nmawicCyc5JZoA=.cca430e6-b4e6-4eb4-b820-5be99f3574dd@github.com>

On Thu, 18 Nov 2021 07:44:05 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> @shipilev not sure what you mean by  "a flag on the Java side". The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things.
>
> Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there?

I think @shipilev asks a good question. This could be done completely in the VM without the changes to j.l.ref.Finalizer. The CLI option is for experimenting, at least in the short term, and it should be benign to have the Finalizer thread running; it just won't do anything.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From thartmann at openjdk.java.net  Thu Nov 18 09:25:38 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Thu, 18 Nov 2021 09:25:38 GMT
Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler
In-Reply-To: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>
References: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>
Message-ID: <aDlIgiJizhev3G0iPhEkV0T-bj0FH9JZAkJt0zYuspk=.f6681508-3998-4815-a7ce-cf5e0bb42a7a@github.com>

On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C.
> The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C.

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6405

From sspitsyn at openjdk.java.net  Thu Nov 18 09:34:13 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 09:34:13 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>

> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
> The following handshake closures are fixed by this update:
>  - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:

  get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6440/files
  - new: https://git.openjdk.java.net/jdk/pull/6440/files/60e784ec..435ab513

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6440.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6440/head:pull/6440

PR: https://git.openjdk.java.net/jdk/pull/6440

From aph at openjdk.java.net  Thu Nov 18 09:36:37 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 18 Nov 2021 09:36:37 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
Message-ID: <QXBei4ATDrOdqbIilMVbJ8R0j6AyzrZut0Ykd-ebXcI=.4cc64d54-c790-4c5c-a9b7-14686e690108@github.com>

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li <pli at openjdk.org> wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions. So far it works on x86 AVX512 only and this patch
> enables it on AArch64 with SVE.
> 
> We add AArch64 matching rule for VectorMaskGenNode and refactor that
> node a little bit. The major change is moving the element type field
> into its TypeVectMask bottom type. The reason is that AArch64 vector
> masks are different for different vector element types.
> 
> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
> lanes (of any type) is like
> 
> `0000 0000 ... 0000 0000 0000 0000 0111`
> 
> On AArch64 SVE, this mask value can only be used for masking the 3 least
> significant lanes of bytes. But for 3 lanes of ints, the value should be
> 
> `0000 0000 ... 0000 0000 0001 0001 0001`
> 
> where the least significant bit of each lane matters. So the AArch64 matcher
> needs to know the vector element type to generate the right masks.
> 
> After this patch, the C2 generated code for copying a 50-byte array on
> AArch64 SVE looks like
> 
>   mov     x12, #0x32
>   whilelo p0.b, xzr, x12
>   add     x11, x11, #0x10
>   ld1b    {z16.b}, p0/z, [x11]
>   add     x10, x10, #0x10
>   st1b    {z16.b}, p0, [x10]
> 
> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
> both x86 AVX512 and AArch64 SVE machines and found no issues. We tested
> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
> size arguments on a 512-bit SVE-featured CPU and measured the performance
> changes below.
> 
> Benchmark                  (length)  (Performance)
> ArrayCopyAligned.testByte        10          -2.6%
> ArrayCopyAligned.testByte        20          +4.7%
> ArrayCopyAligned.testByte        30          +4.8%
> ArrayCopyAligned.testByte        40         +21.7%
> ArrayCopyAligned.testByte        50         +22.5%
> ArrayCopyAligned.testByte        60         +28.4%
> 
> The test machine has an SVE vector size of 512 bits, so we see a performance
> gain for most array sizes less than 64 bytes. For very small arrays we
> see a slight regression because a vector load/store may be a bit slower
> than 1 or 2 scalar loads/stores.

I'll have a look. It'll take me a little time to provision a suitable SVE-enabled AArch64 box.
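
For reference, the two mask layouts described in the quoted message can be sketched as plain bit arithmetic. This is only an illustration under stated assumptions: the function names are hypothetical, a 64-bit integer stands in for the mask register (enough for a 512-bit vector, one bit per byte), and nothing here is the C2 matcher code.

```cpp
#include <cassert>
#include <cstdint>

// Lane-indexed layout (as described for x86 AVX512): one bit per lane,
// independent of the element type.
std::uint64_t lane_indexed_mask(unsigned active_lanes) {
  assert(active_lanes <= 64);
  return active_lanes == 64 ? ~std::uint64_t{0}
                            : (std::uint64_t{1} << active_lanes) - 1;
}

// Byte-indexed layout (as described for AArch64 SVE): one bit per byte of
// the vector, and only the least significant bit of each lane matters.
std::uint64_t byte_indexed_mask(unsigned active_lanes, unsigned element_bytes) {
  assert(element_bytes >= 1 && active_lanes * element_bytes <= 64);
  std::uint64_t mask = 0;
  for (unsigned lane = 0; lane < active_lanes; ++lane) {
    mask |= std::uint64_t{1} << (lane * element_bytes);
  }
  return mask;
}
```

With these definitions, three active byte lanes give `0x7` in both layouts, while three active int lanes (4 bytes each) give `0x111` in the byte-indexed layout, matching the bit patterns in the description.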

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From dholmes at openjdk.java.net  Thu Nov 18 09:58:39 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 09:58:39 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <tHuu38BHH4gNKkRQT9BJ864Gvk9oPNdLH5hrPNOlkkM=.4e17f83c-7c0d-4340-a0aa-ee30ab8efa82@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

IIUC these cases all require that the target is suspended else it is an error. If the target is_exiting then it is not suspended and therefore there is an error. The suspension check should already handle an exiting thread and so there is no need to explicitly add an is_exiting check.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From stefank at openjdk.java.net  Thu Nov 18 10:03:00 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 18 Nov 2021 10:03:00 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline
 caches
Message-ID: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>

We got a report on the zgc-dev list about a large performance issue affecting ZGC:
https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html

One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:

[17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
[17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms

and while this was happening we got a huge number of ICBufferFull safepoints.

It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:

https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103

        CompiledIC *ic = CompiledIC_at(iter.reloc());
        oop ic_oop = ic->cached_oop();
        if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
          // The only exception is compiledICHolder oops which may
          // yet be marked below. (We check this further below).
          if (ic_oop->is_compiledICHolder()) {
            compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
            if (is_alive->do_object_b(
                  cichk_oop->holder_method()->method_holder()) &&
                is_alive->do_object_b(cichk_oop->holder_klass())) {
              continue;
            }
          }
          ic->set_to_clean();
          assert(ic->cached_oop() == NULL,
                 "cached oop in IC should be cleared");
        }
      }


The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:

        CompiledIC *ic = CompiledIC_at(iter.reloc());
        if (ic->is_icholder_call()) {
          // The only exception is compiledICHolder oops which may
          // yet be marked below. (We check this further below).
          CompiledICHolder* cichk_oop = ic->cached_icholder();
          if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
              cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
              continue;
            }
        } else {
          Metadata* ic_oop = ic->cached_metadata();
          if (ic_oop != NULL) {
            if (ic_oop->is_klass()) {
              if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
                continue;
              }
            } else if (ic_oop->is_method()) {
              if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
                continue;
              }
            } else {
              ShouldNotReachHere();
            }
          }
          }
          ic->set_to_clean();
      }


Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.

To understand why this is causing the problems we are seeing it's good to start by reading:
https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall

When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
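
The difference between the two versions can be reduced to a toy model. This is a sketch under assumptions, not the nmethod cleaning code: `ToyCached`, `ToyIC` and both predicates are hypothetical names, and the liveness checks of the real code are collapsed into a single `referent_alive` flag.

```cpp
// Hypothetical classification of what an inline cache currently holds.
enum class ToyCached { ICHolder, Metadata, Nothing };

struct ToyIC {
  ToyCached cached;
  bool referent_alive;  // ignored when cached == ToyCached::Nothing
};

// Models the original code: set_to_clean is only reached when something is
// cached (ic_oop != NULL) and that something is dead.
bool original_would_clean(const ToyIC& ic) {
  if (ic.cached == ToyCached::Nothing) return false;
  return !ic.referent_alive;
}

// Models the rewritten code: set_to_clean sits outside the NULL check, so
// an inline cache with nothing cached falls through and is cleaned too.
bool rewritten_would_clean(const ToyIC& ic) {
  if (ic.cached == ToyCached::ICHolder) return !ic.referent_alive;
  if (ic.cached == ToyCached::Metadata && ic.referent_alive) return false;
  return true;
}
```

The two predicates agree whenever something is cached and differ only for `ToyCached::Nothing`, which is the megamorphic vtable (or already clean) case discussed above.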

But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.

But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
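
The back-and-forth described above can be illustrated with a small counting model. This is a sketch under loud assumptions, not a description of the real ICBuffer: `IcBufferModel` and its methods are hypothetical, the capacity is arbitrary, and every transition is assumed to cost exactly one stub (in reality getting back to a megamorphic call may take more than one transition). A full buffer is modeled as one ICBufferFull safepoint that frees all stubs.

```java
/** Toy model of ICStub consumption; names, capacity and costs are assumptions. */
final class IcBufferModel {
    private final int capacity;
    private int used;
    private int safepoints;

    IcBufferModel(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Takes one stub; a full buffer first triggers a simulated ICBufferFull safepoint. */
    void takeStub() {
        if (used == capacity) {
            safepoints++; // stop-the-world operation that frees the whole buffer
            used = 0;
        }
        used++;
    }

    int safepoints() {
        return safepoints;
    }

    /** Counts simulated safepoints over a number of class unloading cycles. */
    static int simulate(int capacity, int megamorphicSites, int cycles,
                        boolean gcCleansMegamorphic) {
        IcBufferModel buffer = new IcBufferModel(capacity);
        for (int cycle = 0; cycle < cycles; cycle++) {
            if (!gcCleansMegamorphic) {
                continue; // intended behavior: megamorphic ICs are left alone
            }
            for (int site = 0; site < megamorphicSites; site++) {
                buffer.takeStub(); // GC: megamorphic -> clean
                buffer.takeStub(); // Java thread: clean -> back towards megamorphic
            }
        }
        return buffer.safepoints();
    }
}
```

With `gcCleansMegamorphic` set to `false` the model never takes a stub, so it reports no safepoints; with `true` the count grows with the number of megamorphic call sites and cycles.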

G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.

I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html

I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).

I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.

To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
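
When the log is long, tallying the ICBufferFull lines by hand gets tedious. The following is a rough sketch of a helper that counts them and sums their "Total" times. The class name `IcBufferFullTally` and the regular expression are assumptions based only on the safepoint line format shown in this mail; other JDK versions may print the line differently.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts ICBufferFull safepoint lines in -Xlog:gc*,safepoint output (hypothetical helper). */
final class IcBufferFullTally {
    // Assumed line shape: ... Safepoint "ICBufferFull", ..., Total: <n> ns
    private static final Pattern LINE =
            Pattern.compile("Safepoint \"ICBufferFull\".*Total: (\\d+) ns");

    /** Returns a two-element array: {number of matching lines, sum of Total in ns}. */
    static long[] tally(List<String> lines) {
        long count = 0;
        long totalNanos = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                count++;
                totalNanos += Long.parseLong(m.group(1));
            }
        }
        return new long[] {count, totalNanos};
    }
}
```

The lines could come from, for example, `java.nio.file.Files.readAllLines` on a log file written with `-Xlog:gc*,safepoint:file=...`; the file name is up to the user.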

Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:

[38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
[38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
[38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
[38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
[38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
[38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
[38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
[38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
[38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
[38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
[38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
[38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
[38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
[38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
[38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
[38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
[38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
[38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
[38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
[38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
[38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
[38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
[38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
[38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
[38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
[38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
[38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
[38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
[38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
[38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
[38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
[38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
[38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
[38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
[38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
[38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
[38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
[38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
[38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
[38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
[38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
[38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
[38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
[38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
[38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
[38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
[38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
[38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
[38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
[38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
[38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms


Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:

[125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
[125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
[125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
[125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
[125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
[126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
[126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
[126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
[126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
[126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms


I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.

I've run the patch through tier1-7.

Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for the discussion and explanation of the inline caches code.

-------------

Commit messages:
 - Minimize
 - Rewrite
 - Fix guarded by flags

Changes: https://git.openjdk.java.net/jdk/pull/6450/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6450&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277212
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6450.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6450/head:pull/6450

PR: https://git.openjdk.java.net/jdk/pull/6450

From simonis at openjdk.java.net  Thu Nov 18 10:21:01 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 18 Nov 2021 10:21:01 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v10]
In-Reply-To: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
Message-ID: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>

> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 cannot create implicit exceptions such as AIOOBE or NullPointerException in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such an exception occurs.
> 
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
> 
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
> 
> 
> ### Solution
> 
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
> 
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
> 
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
> 
> 
> ### Implementation details
> 
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
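
To make the control-flow pattern concrete, here is a minimal sketch that isolates it. It is not the Tomcat code and not the benchmark from the PR: `IsAlphaSketch`, the table size, and the table contents are assumptions, since the real `IS_ALPHA` table is not shown in this thread. The range-checking variant exists only as a reference to confirm that the exception-driven variant returns the same answers.

```java
/** Isolated sketch of exception-driven control flow; table contents are assumed. */
final class IsAlphaSketch {
    // Assumed lookup table: true for ASCII letters only.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    /** Relies on the implicit ArrayIndexOutOfBoundsException for out-of-range input. */
    static boolean isAlphaWithException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    /** Reference variant with an explicit range check; never throws. */
    static boolean isAlphaWithRangeCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }
}
```

Every call to `isAlphaWithException` with an index outside the table goes through the implicit exception path that this change targets.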

Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:

 - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow
 - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions
 - Fix build issue for minimal/zero build one more time
 - Minor enhancements and fixes requested by Martin
 - Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
 - Fix build issue for minimal/zero build
 - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
 - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
 - Minor updates as requested by @TheRealMDoerr
 - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=09
  Stats: 793 lines in 18 files changed: 778 ins; 0 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From eosterlund at openjdk.java.net  Thu Nov 18 10:32:50 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 18 Nov 2021 10:32:50 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID: <FEl8d1y812vjzLlYNCJXHFXp7rt36fvVNbD8N6mrPi4=.08e82b6d-6007-480c-a5a9-9bd9e523e003@github.com>

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
> 
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
> 
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
> 
> and while this was happening we got a huge number of ICBufferFull safepoints.
> 
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
> 
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         oop ic_oop = ic->cached_oop();
>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           if (ic_oop->is_compiledICHolder()) {
>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>             if (is_alive->do_object_b(
>                   cichk_oop->holder_method()->method_holder()) &&
>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>               continue;
>             }
>           }
>           ic->set_to_clean();
>           assert(ic->cached_oop() == NULL,
>                  "cached oop in IC should be cleared");
>         }
>       }
> 
> 
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         if (ic->is_icholder_call()) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>               continue;
>             }
>         } else {
>           Metadata* ic_oop = ic->cached_metadata();
>           if (ic_oop != NULL) {
>             if (ic_oop->is_klass()) {
>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else if (ic_oop->is_method()) {
>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else {
>               ShouldNotReachHere();
>             }
>           }
>           }
>           ic->set_to_clean();
>       }
> 
> 
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
> 
> To understand why this is causing the problems we are seeing it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
> 
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
> 
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
> 
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
> 
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
> 
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
> 
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
> 
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
> 
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
> 
> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
> 
> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
> 
> 
> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
> 
> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
> 
> 
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
> 
> I've run the patch through tier1-7.
> 
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

Looks good!

-------------

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6450
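
The quoted description suggests running with -Xlog:gc*,safepoint and taking note of the ICBufferFull safepoint lines. A minimal sketch of how such lines could be tallied is shown here, assuming the log format of the excerpts above (a line containing `Safepoint "ICBufferFull"` and ending in `Total: N ns`). The names `IcBufferFullSummary` and `summarize_icbufferfull` are hypothetical helpers for illustration, not part of HotSpot or any JDK tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Summary of the ICBufferFull safepoints found in a log (hypothetical helper type).
struct IcBufferFullSummary {
    std::uint64_t count;     // number of ICBufferFull safepoint lines seen
    std::uint64_t total_ns;  // sum of their "Total: N ns" values
};

// Scans log output line by line. Only lines containing
// Safepoint "ICBufferFull" are counted; the number following the last
// "Total: " on such a line is added to the running sum.
// Assumption: the "Total: " key is followed directly by decimal digits,
// as in the log excerpts quoted in this thread.
IcBufferFullSummary summarize_icbufferfull(std::istream& in) {
    const std::string marker = "Safepoint \"ICBufferFull\"";
    const std::string total_key = "Total: ";
    IcBufferFullSummary summary = {0, 0};
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(marker) == std::string::npos) {
            continue;  // not an ICBufferFull safepoint line
        }
        ++summary.count;
        const std::size_t pos = line.rfind(total_key);
        if (pos == std::string::npos) {
            continue;  // counted, but no total to add
        }
        // std::stoull stops at the first non-digit (the space before "ns").
        summary.total_ns += std::stoull(line.substr(pos + total_key.size()));
    }
    return summary;
}
```

Feeding a log captured during a long "Concurrent Process Non-Strong References" phase through such a function would give a rough count of how many ICBufferFull safepoints were scheduled and how much time they took in total; comparing the numbers before and after the fix is one possible way to check its effect.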

From sspitsyn at openjdk.java.net  Thu Nov 18 10:34:40 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 10:34:40 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <_JMj789jxQAfiksYaaXNDkVxOyYr3bomNH_oUGDaSIk=.6e2435cc-78e5-4306-bfd7-d45f3766e51e@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

It is not correct.
At least, there is this case:

/* non suspended and exiting thread */
    case 6:
        set_watch_ev(1); /* watch JVMTI events */
        popframe_err = (jvmti->PopFrame(frameThr)); /* explode the bomb */
        set_watch_ev(0); /* ignore again JVMTI events */
        if (popframe_err != JVMTI_ERROR_THREAD_NOT_SUSPENDED &&
            popframe_err != JVMTI_ERROR_THREAD_NOT_ALIVE) {
            printf("TEST FAILED: the function PopFrame() returned the error %d: %s\n",
                popframe_err, TranslateError(popframe_err));
            printf("\tBut it should return the error JVMTI_ERROR_THREAD_NOT_SUSPENDED or JVMTI_ERROR_THREAD_NOT_ALIVE.\n");
            return STATUS_FAILED;
        }
        break;
    }

In other cases, the test constructs cases so that the tested thread is alive when expected.
Before the fix, the test failed easily within tens of runs, but now it does not fail in 100 runs.
I'll try to run this test 1000 times on all platforms.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From sspitsyn at openjdk.java.net  Thu Nov 18 10:38:40 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 10:38:40 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <-6jHqTZU-MyvUMDaH_H7GFwBEP84d7IV2vrgLjS2n3w=.fa71006c-959e-4866-be9b-4de8c6525b6f@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Also, if the target thread is exiting, then PopFrame should return the error code `JVMTI_ERROR_THREAD_NOT_ALIVE`, not `JVMTI_ERROR_THREAD_NOT_SUSPENDED`, regardless of what this test expects.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440
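
The ordering argued for in this message (check the exiting status first, then the suspended status) can be sketched with a small model. Everything here is a hypothetical stand-in: `ModelThread`, `ModelError` and `ModelPopFrameClosure` are illustration-only names, not HotSpot's JavaThread, the JVMTI error constants, or the real handshake closures, and the model says nothing about how the real code synchronizes with the target thread.

```cpp
// Hypothetical stand-in for the target thread's state; not HotSpot's JavaThread.
struct ModelThread {
    bool exiting;    // models the is_exiting() status
    bool suspended;  // models the suspended state that PopFrame requires
};

// Hypothetical result codes mirroring the three outcomes discussed above.
enum class ModelError { None, ThreadNotSuspended, ThreadNotAlive };

// Models a handshake closure whose do_thread() checks the exiting status
// early, before looking at anything else about the target.
class ModelPopFrameClosure {
public:
    ModelPopFrameClosure() : _result(ModelError::ThreadNotAlive) {}

    void do_thread(const ModelThread& target) {
        if (target.exiting) {
            // An exiting thread is reported as not alive, even if it is
            // also not suspended.
            _result = ModelError::ThreadNotAlive;
            return;
        }
        if (!target.suspended) {
            _result = ModelError::ThreadNotSuspended;
            return;
        }
        _result = ModelError::None;
    }

    ModelError result() const { return _result; }

private:
    ModelError _result;  // outcome of the most recent do_thread() call
};
```

With this ordering, a thread that is both exiting and not suspended yields the "not alive" outcome, which is the behavior this message argues PopFrame should have.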

From pliden at openjdk.java.net  Thu Nov 18 10:48:46 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Thu, 18 Nov 2021 10:48:46 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID: <9fmEJXfA_BDBUHlUS8P9XA6ZlwXxGNGggnDdP_0wvKs=.4e471ca9-d6e4-473d-b66b-338dabc1f528@github.com>

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
> 
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
> 
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
> 
> and while this was happening we got a huge number of ICBufferFull safepoints.
> 
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
> 
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         oop ic_oop = ic->cached_oop();
>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           if (ic_oop->is_compiledICHolder()) {
>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>             if (is_alive->do_object_b(
>                   cichk_oop->holder_method()->method_holder()) &&
>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>               continue;
>             }
>           }
>           ic->set_to_clean();
>           assert(ic->cached_oop() == NULL,
>                  "cached oop in IC should be cleared");
>         }
>       }
> 
> 
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         if (ic->is_icholder_call()) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>               continue;
>             }
>         } else {
>           Metadata* ic_oop = ic->cached_metadata();
>           if (ic_oop != NULL) {
>             if (ic_oop->is_klass()) {
>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else if (ic_oop->is_method()) {
>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else {
>               ShouldNotReachHere();
>             }
>           }
>           }
>           ic->set_to_clean();
>       }
> 
> 
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
> 
> To understand why this is causing the problems we are seeing it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
> 
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
> 
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
> 
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
> 
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
> 
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
> 
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an oracle-internal stress test).
> 
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
> 
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
> 
> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
> 
> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
> 
> 
> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
> 
> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
> 
> 
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
> 
> I've run the patch through tier1-7.
> 
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

Looks good!

-------------

Marked as reviewed by pliden (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6450
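
The scoping problem described in the quoted text can be illustrated with a simplified model. This is only a sketch under stated assumptions: `ModelIC`, `clean_outer_scope` and `clean_inner_scope` are hypothetical names, the icholder branch is left out, and this is not the HotSpot code or the actual patch. It only shows why placing the clean step outside the null check also cleans inline caches that have no cached metadata.

```cpp
#include <vector>

// Hypothetical model of an inline cache; not HotSpot's CompiledIC.
struct ModelIC {
    bool has_metadata;  // false models a NULL cached value (megamorphic vtable call or clean)
    bool loader_alive;  // only meaningful when has_metadata is true
    bool cleaned;       // set where the real code would call set_to_clean()
};

// Models the rewritten code: the clean step sits outside the null check,
// so every IC that does not hit `continue` is cleaned, including the ones
// without metadata. Returns the number of ICs cleaned.
int clean_outer_scope(std::vector<ModelIC>& ics) {
    int cleaned = 0;
    for (ModelIC& ic : ics) {
        if (ic.has_metadata) {
            if (ic.loader_alive) {
                continue;  // still alive, keep the inline cache
            }
        }
        ic.cleaned = true;  // also reached when has_metadata is false
        ++cleaned;
    }
    return cleaned;
}

// Models the original intent: the clean step is only reached when there
// is cached metadata and its loader is dead. Returns the number cleaned.
int clean_inner_scope(std::vector<ModelIC>& ics) {
    int cleaned = 0;
    for (ModelIC& ic : ics) {
        if (ic.has_metadata) {
            if (ic.loader_alive) {
                continue;  // still alive, keep the inline cache
            }
            ic.cleaned = true;  // dead metadata only
            ++cleaned;
        }
    }
    return cleaned;
}
```

For a set of three model caches (one without metadata, one alive, one dead), the outer-scope variant cleans two of them and the inner-scope variant cleans only the dead one. In the real VM each unnecessary clean in the concurrent case consumes an ICStub, which is how the extra cleans translate into ICBufferFull safepoints according to the description above.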

From eosterlund at openjdk.java.net  Thu Nov 18 11:20:45 2021
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Thu, 18 Nov 2021 11:20:45 GMT
Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler
In-Reply-To: <aDlIgiJizhev3G0iPhEkV0T-bj0FH9JZAkJt0zYuspk=.f6681508-3998-4815-a7ce-cf5e0bb42a7a@github.com>
References: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>
 <aDlIgiJizhev3G0iPhEkV0T-bj0FH9JZAkJt0zYuspk=.f6681508-3998-4815-a7ce-cf5e0bb42a7a@github.com>
Message-ID: <LMLefXc8szu15W0gtTb4XUylTHVkAdbN49HV7zPWNZ4=.4eb7ea7b-197b-4a71-9cd5-0607298be4c1@github.com>

On Thu, 18 Nov 2021 09:22:40 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C.
>> The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C.
>
> Looks good.

Thanks for the reviews, @TobiHartmann and @dean-long.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6405

From eosterlund at openjdk.java.net  Thu Nov 18 11:20:46 2021
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Thu, 18 Nov 2021 11:20:46 GMT
Subject: Integrated: 8266368: Inaccurate after_unwind hook in C2 exception
 handler
In-Reply-To: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>
References: <c0IRGAICnZCtKjSL4QU9IliPSVnNZmzAQc2hrLDUXVw=.90d2b414-a202-4f2a-b69a-39783c24c36f@github.com>
Message-ID: <xbeVgXLy9MF-B3fYYaD08b1qXXtl6oECFPlEhsnv8AQ=.a1caf7b0-8096-484c-b8a7-c587ffaf2015@github.com>

On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C.
> The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C.

This pull request has now been integrated.

Changeset: 2c06bca9
Author:    Erik Österlund <eosterlund at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/2c06bca98fcf9d129d6085e26c225fb26368a558
Stats:     12 lines in 2 files changed: 5 ins; 5 del; 2 mod

8266368: Inaccurate after_unwind hook in C2 exception handler

Reviewed-by: dlong, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/6405

From duke at openjdk.java.net  Thu Nov 18 11:21:54 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 18 Nov 2021 11:21:54 GMT
Subject: Integrated: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults
 to "isb"/1 for Arm Neoverse N1
In-Reply-To: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com>
Message-ID: <2f6H9Qd974dKFthSWKyO4AEUME2TGMeJwBZKrCORGUc=.b22c6589-c758-427d-bc3e-d1e84185a38f@github.com>

On Tue, 16 Nov 2021 18:14:15 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
> 
> Testing:
> - `make test TEST=gtest`: Passed
> - `make run-test TEST=tier1`: Passed
> - `make run-test TEST=tier2`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

This pull request has now been integrated.

Changeset: 38345bd2
Author:    Evgeny Astigeevich <eastig at amazon.com>
Committer: Volker Simonis <simonis at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/38345bd28db83371676f1685806ddc207a833879
Stats:     105 lines in 2 files changed: 105 ins; 0 del; 0 mod

8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1

Reviewed-by: phh, aph, ngasson

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415

From dholmes at openjdk.java.net  Thu Nov 18 12:56:42 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 12:56:42 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <SuxpSKq7hWDvLXzXIsrH8ieuVfZJztmX0xqtYT8yRuY=.0f7e9839-1ab3-47fb-99a3-d426846f0816@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

It seems somewhat subjective whether a thread that is exiting, and thus still on its way to becoming "not alive", needs to report "not alive" versus "not suspended". As there appears to be no synchronization with the target in this case, what stops it from transitioning to "is_exiting" the moment after the "is_exiting" check returns false, but before you hit the assertion?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From dholmes at openjdk.java.net  Thu Nov 18 13:11:37 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 13:11:37 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <zr_C2pe8nJGoYqNsCDkMxHBXxF-FrJ77S_CPfyr9Tyg=.e370de07-4e6e-4b46-8685-7e71bc700e1f@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Ignore that last question - the target is in a handshake so can't change state.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From stefank at openjdk.java.net  Thu Nov 18 13:16:53 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 18 Nov 2021 13:16:53 GMT
Subject: RFR: 8277397: ZGC: Add JFR event for temporary latency measurements
Message-ID: <yD1yxe9zzlk0043jw7jTG2MRkF4Umk8RIGl-eIOX6H0=.e6123f9e-8b75-47b7-b69a-f138d1b1d411@github.com>

I often measure latencies and stalls using JFR events. I'd like to add an event that can be used for these ad-hoc measurements during development and debugging.

-------------

Commit messages:
 - 8277397: ZGC: Add JFR event for temporary latency measurements

Changes: https://git.openjdk.java.net/jdk/pull/6454/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6454&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277397
  Stats: 55 lines in 6 files changed: 55 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6454.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6454/head:pull/6454

PR: https://git.openjdk.java.net/jdk/pull/6454

From hseigel at openjdk.java.net  Thu Nov 18 13:22:48 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Thu, 18 Nov 2021 13:22:48 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2]
In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
 <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>
Message-ID: <7239ZzwFhI50QRFnA_pYj26UEQH2mzlltN5Ve4L-sWw=.32fbd078-7628-4632-876b-24d2aaa9cce3@github.com>

On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel <hseigel at openjdk.org> wrote:

>> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
>> 
>> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>> 
>> Thanks, Harold
>
> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add (Deprecated) to comments and add options to deprecated test

Thanks Calvin, Ioi, and David for the reviews.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6390

From hseigel at openjdk.java.net  Thu Nov 18 13:22:48 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Thu, 18 Nov 2021 13:22:48 GMT
Subject: Integrated: 8276795: Deprecate seldom used CDS flags
In-Reply-To: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
References: <Sex9p6I6KaGfkEMPnyyuLPsgMM6KEm_WLeSuhUkHMs4=.e4fce608-e8ac-4014-8e6b-20a5348af607@github.com>
Message-ID: <z_sIovuNjQP0yjigySGDXuc5RrxVuxb1gfQhdaBlN3I=.9c599b3c-3b6d-45a8-bbcc-e5ae1cb36a90@github.com>

On Mon, 15 Nov 2021 14:50:43 GMT, Harold Seigel <hseigel at openjdk.org> wrote:

> Please review this small change to deprecate seldom used CDS flags.  The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
> 
> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
> 
> Thanks, Harold

This pull request has now been integrated.

Changeset: b3a62b48
Author:    Harold Seigel <hseigel at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/b3a62b48816358ac7dadde4e7893190500ca7b79
Stats:     17 lines in 4 files changed: 8 ins; 0 del; 9 mod

8276795: Deprecate seldom used CDS flags

Reviewed-by: dholmes, ccheung, iklam

-------------

PR: https://git.openjdk.java.net/jdk/pull/6390

From simonis at openjdk.java.net  Thu Nov 18 13:37:49 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 18 Nov 2021 13:37:49 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
Message-ID: <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>

On Thu, 11 Nov 2021 06:30:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is one of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
>> 
>> This proposal adds NMT buffer overflow checking:
>> 
>> - it gives us C-heap overflow checking in release builds
>> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>> 
>> For more details, please see the JBS issue.
>> 
>> ----
>> 
>> Patch notes:
>> 
>> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>> 
>> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>> 
>> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>> 
>> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>> 
>> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>> 
>> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>> 
>> --------------
>> 
>> Example output a buffer overrun would provide:
>> 
>> 
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
>> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>> 
>> -------
>> 
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 4 weeks now without problems
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge
>  - Let NMT do overflow detection

Hi Thomas,

Your change looks good. I only have a few remarks and comments inline.

Best regards,
Volker

src/hotspot/share/services/mallocTracker.cpp line 138:

> 136:   os::print_hex_dump(st, from, to, 1);
> 137:   assert(bad_address >= from, "sanity");
> 138:   // if the corruption is in the block body of in the footer, print out that part too

// If the ... body or in the footer...

src/hotspot/share/services/mallocTracker.cpp line 143:

> 141:   from2 = MAX2(to, from2);
> 142:   address to2 = from2 + 96;
> 143:   if (to2 > to) {

I don't understand this. If `from2 = MAX2(to, from2)` then `from2 >= to`. So shouldn't `to2` (which is `from2 + 96`) always be bigger than `to`?

src/hotspot/share/services/mallocTracker.cpp line 169:

> 167:   //  use SafeFetch but since this is a hot path we don't. If we are
> 168:   //  wrong, we will crash when accessing the canary, which hopefully
> 169:   //  generates distinct crash report.

No need for two spaces after `//`

src/hotspot/share/services/mallocTracker.cpp line 174:

> 172:   // we check here are the bare minimum of what we know will malloc() give us
> 173:   // (which is 64-bit even on 32-bit platforms).
> 174:   if (!is_aligned(this, sizeof(uint64_t))) {

Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions:

> "malloc returns a pointer which is suitably aligned for any built-in  type"

Why is this 64 bit on a 32-bit platform?

src/hotspot/share/services/mallocTracker.hpp line 314:

> 312:   static const uint8_t  _footer_canary_dead_mark = 0xFB;
> 313:   NOT_LP64(static const uint32_t _header_alt_canary_life_mark = 0xFAFA1F1F;)
> 314:   NOT_LP64(static const uint32_t _header_alt_canary_dead_mark = 0xFBFB1F1F;)

Just out of interest, how did you choose these canary marks? Is there some evidence that they appear less frequently in real code/data than other values?

test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 71:

> 69: // this should generate two hex dumps, one with the front header, one with the overwritten
> 70: // portion.
> 71: static void test_overwrite_back_long() {

I think the test isn't really checking that we get two hex dumps, right?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From aph at openjdk.java.net  Thu Nov 18 13:56:38 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 18 Nov 2021 13:56:38 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
Message-ID: <mEbx_v-4vc6R1-sFvt6ws-wCvBc7uLDP66MFhmBmGi8=.a1160df8-9da5-4879-8a7c-c77fe568fc09@github.com>

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li <pli at openjdk.org> wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions. So far it works on x86 AVX512 only and this patch
> enables it on AArch64 with SVE.
> 
> We add AArch64 matching rule for VectorMaskGenNode and refactor that
> node a little bit. The major change is moving the element type field
> into its TypeVectMask bottom type. The reason is that AArch64 vector
> masks are different for different vector element types.
> 
> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
> lanes (of any type) is like
> 
> `0000 0000 ... 0000 0000 0000 0000 0111`
> 
> On AArch64 SVE, this mask value can only be used for masking the 3 least
> significant lanes of bytes. But for 3 lanes of ints, the value should be
> 
> `0000 0000 ... 0000 0000 0001 0001 0001`
> 
> where the least significant bit of each lane matters. So AArch64 matcher
> needs to know the vector element type to generate right masks.
> 
> After this patch, the C2 generated code for copying a 50-byte array on
> AArch64 SVE looks like
> 
>   mov     x12, #0x32
>   whilelo p0.b, xzr, x12
>   add     x11, x11, #0x10
>   ld1b    {z16.b}, p0/z, [x11]
>   add     x10, x10, #0x10
>   st1b    {z16.b}, p0, [x10]
> 
> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested
> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
> size arguments on a 512-bit SVE-featured CPU. We got below performance
> data changes.
> 
> Benchmark                  (length)  (Performance)
> ArrayCopyAligned.testByte        10          -2.6%
> ArrayCopyAligned.testByte        20          +4.7%
> ArrayCopyAligned.testByte        30          +4.8%
> ArrayCopyAligned.testByte        40         +21.7%
> ArrayCopyAligned.testByte        50         +22.5%
> ArrayCopyAligned.testByte        60         +28.4%
> 
> The test machine has SVE vector size of 512 bits, so we see performance
> gain for most array sizes less than 64 bytes. For very small arrays we
> see a slight regression because a vector load/store may be a bit slower
> than 1 or 2 scalar loads/stores.

I'm having a lot of difficulty understanding how this is supposed to work.

Firstly, I'm not seeing a performance increase on a fujitsu-fx700.
Secondly, I'm not surprised: looking at the results of JMH `-prof:perfasm`, it seems to me that the only SVE instructions being executed are _outside_ the timing loop in the `testByte_ArrayCopyAligned_testByte_jmhTest:avgt_jmhStub` method. I'm baffled by what is going on.
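
As background to the quoted description, the two mask layouts it contrasts can be modeled with a small sketch. This is only an illustration of the bit patterns shown in the quote, limited to 64 mask bits; the function names `avx512_style_mask` and `sve_style_mask` are hypothetical and are not part of HotSpot or of any intrinsic API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: AVX-512 style mask, one mask bit per vector lane,
// regardless of the lane's element type.
inline std::uint64_t avx512_style_mask(unsigned lanes) {
  return lanes >= 64 ? ~0ULL : ((1ULL << lanes) - 1);
}

// Hypothetical model: SVE style predicate, one predicate bit per vector
// byte, where only the least significant bit of each lane is set.
inline std::uint64_t sve_style_mask(unsigned lanes, unsigned elem_bytes) {
  std::uint64_t mask = 0;
  for (unsigned i = 0; i < lanes && i * elem_bytes < 64; ++i) {
    mask |= 1ULL << (i * elem_bytes);
  }
  return mask;
}
```

With three lanes, the byte case gives the same value in both models, while three int lanes (4 bytes each) give the `0001 0001 0001` pattern from the description, which is why the matcher would need the element type.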

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From stuefe at openjdk.java.net  Thu Nov 18 13:58:40 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 18 Nov 2021 13:58:40 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
 <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
Message-ID: <KLAkuUEH-EGusfmwfOsfvH_4UnfMJvYEeL4M2PD1jmQ=.acedad5e-ac81-484a-8a38-fd58c2fa7db2@github.com>

On Thu, 18 Nov 2021 12:12:17 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>> 
>>  - Merge
>>  - Let NMT do overflow detection
>
> src/hotspot/share/services/mallocTracker.cpp line 143:
> 
>> 141:   from2 = MAX2(to, from2);
>> 142:   address to2 = from2 + 96;
>> 143:   if (to2 > to) {
> 
> I don't understand this. If `from2 = MAX2(to, from2)` then `from2 >= to`. So shouldn't `to2` (which is `from2 + 96`) always be bigger than `to`?

You are absolutely right, and the code is not very clear either. I'll improve it.
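
The observation can be checked with a minimal sketch. It models addresses as integers and uses `std::max` in place of `MAX2`; the function name `second_dump_end_is_past_first` is hypothetical and only mirrors the three quoted lines, not the rest of the function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the quoted snippet; addresses are plain integers.
inline bool second_dump_end_is_past_first(std::uintptr_t to, std::uintptr_t from2) {
  from2 = std::max(to, from2);            // after clamping, from2 >= to
  const std::uintptr_t to2 = from2 + 96;  // hence to2 >= to + 96
  return to2 > to;                        // true unless the addition wraps around
}
```

Barring address wrap-around, the condition holds for every input, so the `if` in the quoted code does not filter anything.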

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From stuefe at openjdk.java.net  Thu Nov 18 14:22:46 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 18 Nov 2021 14:22:46 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
 <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
Message-ID: <UVTzs12JWIAKnp3QTOLAub9Jmdi754jaoob2TpRWp2M=.4269405a-4dc1-4740-96b9-a6e1c87a1695@github.com>

On Thu, 18 Nov 2021 12:28:55 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>> 
>>  - Merge
>>  - Let NMT do overflow detection
>
> src/hotspot/share/services/mallocTracker.cpp line 174:
> 
>> 172:   // we check here are the bare minimum of what we know will malloc() give us
>> 173:   // (which is 64-bit even on 32-bit platforms).
>> 174:   if (!is_aligned(this, sizeof(uint64_t))) {
> 
> Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions:
> 
>> "malloc returns a pointer which is suitably aligned for any built-in  type"
> 
> Why is this 64 bit on a 32-bit platform?

We know that the alignment has to be *at least* 64-bit since we know we have 64-bit inbuilt types on both 64-bit and 32-bit platforms (`uint64_t`). From experience, I know it is probably more, 16 or 32 bytes. This makes sense since there exist scalar data types larger than 64-bit.

But this code tests the *minimal* necessary alignment only, and I wanted to prevent false positives. For example, we might happen to run with a libc whose malloc implementation returns only 64-bit aligned pointers. That could also happen if someone put a weird malloc() implementation below us (malloc hooks or LD_PRELOAD). I think the assumption that everything malloc() returns is at least 64-bit aligned is pretty safe though.

Ideally, we would have a clear definition of malloc alignment somewhere in `globalDefinitions.hpp`. In hotspot, there is a range of places where such alignment is implicitly assumed: the NMT header size, for instance, or metaspace allocation size, hotspot arena allocation alignment, etc. Basically, everywhere where one either marshals malloc'ed blocks or implements some sort of general-purpose allocator. C++ has `std::max_align_t` for that. Maybe we could use that one. But that's a topic for another RFE.

I'll try to improve the comment.
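
A minimal sketch of the kind of check being discussed, assuming nothing about the actual NMT code: `is_aligned_to` and `malloc_result_is_min_aligned` are hypothetical helpers, and the check deliberately tests only the 8-byte minimum rather than `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: true if p is a multiple of alignment (in bytes).
inline bool is_aligned_to(const void* p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) % alignment) == 0;
}

// Hypothetical helper: allocate a block and test only the minimal
// alignment assumed in the discussion (sizeof(uint64_t), i.e. 8 bytes).
inline bool malloc_result_is_min_aligned(std::size_t size) {
  void* p = std::malloc(size);
  if (p == nullptr) {
    return false;
  }
  const bool ok = is_aligned_to(p, sizeof(std::uint64_t));
  std::free(p);
  return ok;
}
```

Testing against the weakest plausible guarantee keeps false positives away when a replacement malloc returns pointers that are only 8-byte aligned.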

> src/hotspot/share/services/mallocTracker.hpp line 314:
> 
>> 312:   static const uint8_t  _footer_canary_dead_mark = 0xFB;
>> 313:   NOT_LP64(static const uint32_t _header_alt_canary_life_mark = 0xFAFA1F1F;)
>> 314:   NOT_LP64(static const uint32_t _header_alt_canary_dead_mark = 0xFBFB1F1F;)
> 
> Just out of interest, how did you choose these canary marks? Is there some evidence that they appear less frequently in real code/data than other values?

I did an extensive statistical analysis of many core dumps.

...

...

Just kidding, I chose them on a whim to be not zero :) Do you have a better suggestion? I thought about making them ASCII patterns, but those are actually more common in payload data.
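
For readers unfamiliar with the technique, here is a minimal sketch of header and footer canaries around a malloc'ed payload. It is not the NMT implementation: the `sketch` namespace, the 16-byte header layout, and the canary values `0xE99E` and `0xFA` are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

namespace sketch {

constexpr std::uint16_t kHeaderCanary = 0xE99E;  // assumed value, not from the patch
constexpr std::uint8_t  kFooterCanary = 0xFA;    // assumed value, not from the patch
constexpr std::size_t   kHeaderSize   = 16;      // size field first, canary in last 2 bytes

enum class Check { ok, header_broken, footer_broken };

// Allocates header + payload + 1 footer byte and returns the payload pointer.
inline void* guarded_malloc(std::size_t size) {
  auto* raw = static_cast<std::uint8_t*>(std::malloc(kHeaderSize + size + 1));
  if (raw == nullptr) {
    return nullptr;
  }
  std::memset(raw, 0, kHeaderSize);
  std::memcpy(raw, &size, sizeof(size));
  std::memcpy(raw + kHeaderSize - sizeof(kHeaderCanary), &kHeaderCanary,
              sizeof(kHeaderCanary));
  raw[kHeaderSize + size] = kFooterCanary;
  return raw + kHeaderSize;
}

// Verifies the canary directly before the payload, then the one directly after.
inline Check check_block(const void* payload) {
  const auto* p = static_cast<const std::uint8_t*>(payload);
  std::uint16_t canary;
  std::memcpy(&canary, p - sizeof(canary), sizeof(canary));
  if (canary != kHeaderCanary) {
    return Check::header_broken;
  }
  std::size_t size;
  std::memcpy(&size, p - kHeaderSize, sizeof(size));
  return p[size] == kFooterCanary ? Check::ok : Check::footer_broken;
}

inline void guarded_free(void* payload) {
  std::free(static_cast<std::uint8_t*>(payload) - kHeaderSize);
}

}  // namespace sketch
```

The header canary is checked first because a damaged header means the stored size cannot be trusted when locating the footer.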

> test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 71:
> 
>> 69: // this should generate two hex dumps, one with the front header, one with the overwritten
>> 70: // portion.
>> 71: static void test_overwrite_back_long() {
> 
> I think the test isn't really checking that we get two hex dumps, right?

Again, bad wording in the comment. This tests that `MallocHeader::print_block_on_error()` prints a hex dump covering both the header and the corruption address, and that the dump is split into two parts if both are too far apart.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From zgu at openjdk.java.net  Thu Nov 18 14:30:44 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 18 Nov 2021 14:30:44 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <UVTzs12JWIAKnp3QTOLAub9Jmdi754jaoob2TpRWp2M=.4269405a-4dc1-4740-96b9-a6e1c87a1695@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
 <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
 <UVTzs12JWIAKnp3QTOLAub9Jmdi754jaoob2TpRWp2M=.4269405a-4dc1-4740-96b9-a6e1c87a1695@github.com>
Message-ID: <XEaeQynwJqwE5g0dkJ8SeQ-AlygbsD_DIbpI-PU3QoE=.0720f9bc-de84-4b35-8f0c-f9f694ecd290@github.com>

On Thu, 18 Nov 2021 14:14:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/services/mallocTracker.cpp line 174:
>> 
>>> 172:   // we check here are the bare minimum of what we know will malloc() give us
>>> 173:   // (which is 64-bit even on 32-bit platforms).
>>> 174:   if (!is_aligned(this, sizeof(uint64_t))) {
>> 
>> Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions:
>> 
>>> "malloc returns a pointer which is suitably aligned for any built-in  type"
>> 
>> Why is this 64 bit on a 32-bit platform?
>
> We know that the alignment has to be *at least* 64-bit since we know we have 64-bit inbuilt types on both 64-bit and 32-bit platforms (`uint64_t`). From experience, I know it is probably more, 16 or 32 bytes. This makes sense since there exist scalar data types larger than 64-bit.
> 
> But this code tests the *minimal* necessary alignment only, and I wanted to prevent false positives. For example, we might happen to run with a libc whose malloc implementation returns only 64-bit aligned pointers. That could also happen if someone put a weird malloc() implementation below us (malloc hooks or LD_PRELOAD). I think the assumption that everything malloc() returns is at least 64-bit aligned is pretty safe though.
> 
> Ideally, we would have a clear definition of malloc alignment somewhere in `globalDefinitions.hpp`. In hotspot, there is a range of places where such alignment is implicitly assumed: the NMT header size, for instance, or metaspace allocation size, hotspot arena allocation alignment, etc. Basically, everywhere where one either marshals malloc'ed blocks or implements some sort of general-purpose allocator. C++ has `std::max_align_t` for that. Maybe we could use that one. But that's a topic for another RFE.
> 
> I'll try to improve the comment.

> Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions:
> 
> > "malloc returns a pointer which is suitably aligned for any built-in  type"
> 
> Why is this 64 bit on a 32-bit platform?

NMT always assumes (from experiments on various platforms) that malloc memory is 2-machine-word aligned, so it is 64-bit aligned on a 32-bit platform.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From eosterlund at openjdk.java.net  Thu Nov 18 14:36:56 2021
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Thu, 18 Nov 2021 14:36:56 GMT
Subject: Integrated: 8259643: ZGC can return metaspace OOM prematurely
In-Reply-To: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
References: <V-_4hXiYrxvOCdZmmQuwz7zaHxMIZ6MlPg2fcdr2Y7M=.bb702064-3de5-4d56-a040-4883b9419bab@github.com>
Message-ID: <6xrNisEJhIfc0YwEy-5Z-xTCeHLjBYJK8X4jei5NtIU=.3ef261e6-3b73-4ca8-a9a7-bc9fff74b9ea@github.com>

On Thu, 28 Jan 2021 12:55:55 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
> 
> 1. full_gc()
> 2. final_allocation_attempt()
> 
> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
> 
> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations.
> 
> The solution should work for other concurrent GCs (which likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
> 
> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).
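
The idea of giving queued "critical" allocations precedence can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions: the class `MetaspaceModel` and all of its members are hypothetical, it ignores locking and GC entirely, and it does not reflect the API added by the change.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical toy model of a metaspace with a FIFO of critical requests.
class MetaspaceModel {
 public:
  explicit MetaspaceModel(std::size_t free_words) : free_(free_words) {}

  // Memory becoming available again, e.g. after a GC unloaded classes.
  void release(std::size_t words) { free_ += words; }

  // A thread whose allocation failed registers a critical request.
  void enqueue_critical(int id) { critical_.push_back(id); }

  // Normal allocations step aside while critical requests are pending,
  // so they cannot starve the thread that just ran a full GC.
  bool allocate_normal(std::size_t words) {
    if (!critical_.empty()) {
      return false;
    }
    return take(words);
  }

  // Critical requests are satisfied strictly in arrival order.
  bool allocate_critical(int id, std::size_t words) {
    if (critical_.empty() || critical_.front() != id || !take(words)) {
      return false;
    }
    critical_.pop_front();
    return true;
  }

 private:
  bool take(std::size_t words) {
    if (words > free_) {
      return false;
    }
    free_ -= words;
    return true;
  }

  std::size_t free_;
  std::deque<int> critical_;
};
```

In this model, memory released between the full GC and the final allocation attempt cannot be taken by a normal allocation while a critical request is still queued.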

This pull request has now been integrated.

Changeset: 00c388b4
Author:    Erik Österlund <eosterlund at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/00c388b4aba41d5f0874585e9c0a33c4571805f6
Stats:     297 lines in 6 files changed: 276 ins; 17 del; 4 mod

8259643: ZGC can return metaspace OOM prematurely

Reviewed-by: stefank, pliden, stuefe

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From coleenp at openjdk.java.net  Thu Nov 18 14:43:40 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 18 Nov 2021 14:43:40 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID: <ZFfgDOUYfH3aHS4TzmXZ88aOr_HYOgWPbIx9qR5p-Qc=.4d7424bb-ce0a-4d86-ae13-f1b4d38a5564@github.com>

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
> 
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
> 
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
> 
> and while this was happening we got a huge number of ICBufferFull safepoints.
> 
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
> 
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         oop ic_oop = ic->cached_oop();
>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           if (ic_oop->is_compiledICHolder()) {
>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>             if (is_alive->do_object_b(
>                   cichk_oop->holder_method()->method_holder()) &&
>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>               continue;
>             }
>           }
>           ic->set_to_clean();
>           assert(ic->cached_oop() == NULL,
>                  "cached oop in IC should be cleared");
>         }
>       }
> 
> 
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         if (ic->is_icholder_call()) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>               continue;
>             }
>         } else {
>           Metadata* ic_oop = ic->cached_metadata();
>           if (ic_oop != NULL) {
>             if (ic_oop->is_klass()) {
>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else if (ic_oop->is_method()) {
>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else {
>               ShouldNotReachHere();
>             }
>           }
>           }
>           ic->set_to_clean();
>       }
> 
> 
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
> 
> To understand why this is causing the problems we are seeing it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
> 
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
> 
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
> 
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
> 
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
> 
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
> 
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
> 
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
> 
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
> 
> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
> 
> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
> 
> 
> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
> 
> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
> 
> 
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
> 
> I've run the patch through tier1-7.
> 
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

Great job tracking this down! It does look like it was a merge error from the original code that's escaped notice until now. Well done!

src/hotspot/share/code/compiledMethod.cpp line 482:

> 480:       }
> 481:     } else {
> 482:       return true;

I've given up pretending to understand this code, but could you add a one-line comment explaining why you're returning true here? I.e., if ic_metadata is NULL, it's a megamorphic call or already clean and shouldn't be cleaned.
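
To make the case in question concrete, here is a minimal self-contained sketch of the two control-flow shapes. It is not the HotSpot code: `Metadata`, `InlineCache`, `stubs_used`, `clean_buggy` and `clean_fixed` are hypothetical stand-ins, and a null `cached` pointer is assumed to model an inline cache that holds no metadata (a megamorphic vtable call or an already clean IC).

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot metadata; only liveness is modelled.
struct Metadata {
    bool loader_alive;
};

// Hypothetical stand-in for CompiledIC.
struct InlineCache {
    const Metadata* cached;  // nullptr: megamorphic vtable call (or clean IC)
    bool cleaned;
    int  stubs_used;         // models ICStub slots consumed by cleaning

    explicit InlineCache(const Metadata* m)
        : cached(m), cleaned(false), stubs_used(0) {}

    bool set_to_clean() {
        cleaned = true;
        ++stubs_used;
        return true;         // in this model cleaning always succeeds
    }
};

// Buggy shape: set_to_clean() sits outside the null check, so an IC
// without cached metadata falls through and is cleaned as well.
bool clean_buggy(InlineCache& ic) {
    if (ic.cached != nullptr) {
        if (ic.cached->loader_alive) {
            return true;     // metadata still alive: keep the IC
        }
    }
    return ic.set_to_clean();
}

// Fixed shape: the null case returns early and never reaches set_to_clean().
bool clean_fixed(InlineCache& ic) {
    if (ic.cached != nullptr) {
        if (ic.cached->loader_alive) {
            return true;     // metadata still alive: keep the IC
        }
    } else {
        // No cached metadata: megamorphic vtable call or already clean.
        // There is nothing dead to drop, so leave the IC untouched.
        return true;
    }
    return ic.set_to_clean();
}
```

With an `InlineCache` built from `nullptr`, `clean_buggy` marks it cleaned and consumes a stub slot, while `clean_fixed` leaves it alone; both still clean an IC whose metadata is dead.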

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6450

From plevart at openjdk.java.net  Thu Nov 18 14:56:50 2021
From: plevart at openjdk.java.net (Peter Levart)
Date: Thu, 18 Nov 2021 14:56:50 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v2]
In-Reply-To: <t6r4zgk8qUMXdSDYBxr-V6-KXAaB0nmawicCyc5JZoA=.cca430e6-b4e6-4eb4-b820-5be99f3574dd@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
 <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
 <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
 <t6r4zgk8qUMXdSDYBxr-V6-KXAaB0nmawicCyc5JZoA=.cca430e6-b4e6-4eb4-b820-5be99f3574dd@github.com>
Message-ID: <RtH2sOhHogV5vWttU3NuYCyCGvJWHg_WNL3mg-Rj1ag=.76a419ed-284f-41b1-9b4b-e25e3e9d338b@github.com>

On Thu, 18 Nov 2021 08:39:52 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there?
>
> I think @shipilev asks a good question. This could be done completely in the VM without the changes to j.l.ref.Finalizer. The CLI option is for experimenting, at least in the short term, and should be benign to have the Finalizer thread running, it just won't do anything.

Or, you could move the static initialization block that starts the finalizer thread into the Finalizer.FinalizerThread class itself and then arrange for that class to be initialized explicitly immediately after the Finalizer class, but conditionally, only if the option to disable finalization was not specified...
This way the Finalizer class could still be initialized early, but the thread would not be started if it is not needed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From plevart at openjdk.java.net  Thu Nov 18 15:08:45 2021
From: plevart at openjdk.java.net (Peter Levart)
Date: Thu, 18 Nov 2021 15:08:45 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v2]
In-Reply-To: <RtH2sOhHogV5vWttU3NuYCyCGvJWHg_WNL3mg-Rj1ag=.76a419ed-284f-41b1-9b4b-e25e3e9d338b@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
 <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
 <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
 <t6r4zgk8qUMXdSDYBxr-V6-KXAaB0nmawicCyc5JZoA=.cca430e6-b4e6-4eb4-b820-5be99f3574dd@github.com>
 <RtH2sOhHogV5vWttU3NuYCyCGvJWHg_WNL3mg-Rj1ag=.76a419ed-284f-41b1-9b4b-e25e3e9d338b@github.com>
Message-ID: <WnXJ-krRcrrWjNIWs1CmXmEWMNSs18zbYxGuQcgdfRo=.c45b93ae-2f4b-460d-8cf9-8e3fcb5289c1@github.com>

On Thu, 18 Nov 2021 14:53:38 GMT, Peter Levart <plevart at openjdk.org> wrote:

>> I think @shipilev asks a good question. This could be done completely in the VM without the changes to j.l.ref.Finalizer. The CLI option is for experimenting, at least in the short term, and should be benign to have the Finalizer thread running, it just won't do anything.
>
> Or, you could move the static initialization block that starts the finalizer thread into the Finalizer.FinalizerThread class itself and then arrange for that class to be initialized explicitly immediately after the Finalizer class, but conditionally, only if the option to disable finalization was not specified...
> This way the Finalizer class could still be initialized early, but the thread would not be started if it is not needed.

If you then need this "flag" in the assert of registerFinalizer and runFinalization, you could use unsafe.shouldBeInitialized(Finalizer.FinalizerThread.class) as a means to find out whether the flag was set or not...

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From stuefe at openjdk.java.net  Thu Nov 18 15:25:15 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 18 Nov 2021 15:25:15 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3]
In-Reply-To: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
Message-ID: <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>

> This is one of a number of RFEs I plan, to improve and simplify C-heap overflow checking in hotspot.
> 
> This proposal adds NMT buffer overflow checking:
> 
> - it gives us C-heap overflow checking in release builds
> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
> 
> For more details, please see the JBS issue.
> 
> ----
> 
> Patch notes:
> 
> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
> 
> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
> 
> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
> 
> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
> 
> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
> 
> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
> 
> --------------
> 
> Example output a buffer overrun would provide:
> 
> 
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
> 
> -------
> 
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 4 weeks now without problems
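
The header and footer canary scheme described in the patch notes can be sketched as follows. This is a minimal model under stated assumptions, not the code in mallocTracker.hpp: the field layout of `BlockHeader`, the canary values `kHeaderCanary` and `kFooterCanary`, and the names `guarded_malloc`, `check_block` and `guarded_free` are all hypothetical. It only keeps the properties stated above: a 16-byte header whose 16-bit canary directly precedes the payload, and a single-byte footer canary directly after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical canary values, chosen for illustration only.
constexpr std::uint16_t kHeaderCanary = 0xE99E;
constexpr std::uint8_t  kFooterCanary = 0xE8;

// Hypothetical 16-byte header; the canary is the last field so that it
// directly precedes the user payload.
struct BlockHeader {
    std::uint64_t size;    // user payload size in bytes
    std::uint8_t  pad[6];  // stands in for the remaining header fields
    std::uint16_t canary;
};
static_assert(sizeof(BlockHeader) == 16, "header is expected to be 16 bytes");

enum class BlockState { Intact, HeaderBroken, FooterBroken };

// Allocate header + payload + 1-byte footer and return the payload pointer.
void* guarded_malloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(BlockHeader) + size + 1));
    if (raw == nullptr) {
        return nullptr;
    }
    BlockHeader h{};
    h.size = size;
    h.canary = kHeaderCanary;
    std::memcpy(raw, &h, sizeof h);
    raw[sizeof(BlockHeader) + size] = kFooterCanary;
    return raw + sizeof(BlockHeader);
}

// Check the header canary first: if it is broken, the stored size cannot be
// trusted to locate the footer.
BlockState check_block(const void* payload) {
    assert(payload != nullptr);
    const unsigned char* p = static_cast<const unsigned char*>(payload);
    BlockHeader h;
    std::memcpy(&h, p - sizeof(BlockHeader), sizeof h);
    if (h.canary != kHeaderCanary) {
        return BlockState::HeaderBroken;
    }
    if (p[h.size] != kFooterCanary) {
        return BlockState::FooterBroken;
    }
    return BlockState::Intact;
}

void guarded_free(void* payload) {
    if (payload != nullptr) {
        std::free(static_cast<unsigned char*>(payload) - sizeof(BlockHeader));
    }
}
```

Writing one byte past an 8-byte payload overwrites the footer and makes `check_block` report `FooterBroken`; writing one byte before the payload hits the header canary instead. A single footer byte cannot catch an overrun that skips over it, which is the trade-off the notes accept for simplicity.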

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Feedback Volker

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5952/files
  - new: https://git.openjdk.java.net/jdk/pull/5952/files/e04a105d..a1611e78

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=01-02

  Stats: 42 lines in 2 files changed: 19 ins; 5 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5952.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952

PR: https://git.openjdk.java.net/jdk/pull/5952

From stuefe at openjdk.java.net  Thu Nov 18 15:25:20 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 18 Nov 2021 15:25:20 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
 <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
Message-ID: <mV_ShP-2VGu5c24OzG6DOhMSQeomYlI7A8xkurKdnx8=.2cd84227-bb81-48ce-a975-d685b604da60@github.com>

On Thu, 18 Nov 2021 13:34:11 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>> 
>>  - Merge
>>  - Let NMT do overflow detection
>
> Hi Thomas,
> 
> your change looks good. I only have a few remarks and comments inline.
> 
> Best regards,
> Volker

Thanks a lot @simonis for the review!

I massaged the patch a bit, improving comments and rewriting the block print function. I think it's now easier to understand. I also extended the dump size somewhat; now it looks like this:

Corruption and header close together:

NMT Block at 0x00005571a7030330, corruption at: 0x00005571a7030341: 
0x00005571a70302b0:   30 1c 05 a7 71 55 00 00 60 e5 f3 31 fa 7f 00 00
0x00005571a70302c0:   47 e6 f3 31 fa 7f 00 00 11 e6 f3 31 fa 7f 00 00
0x00005571a70302d0:   84 e6 f3 31 fa 7f 00 00 f1 f1 f1 f1 f1 f1 f1 f1
0x00005571a70302e0:   00 00 00 00 f1 f1 f1 f1 fa ab ab ab ab ab ab ab
0x00005571a70302f0:   ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030300:   ab ba ba ba ba ba ba ba 51 00 00 00 00 00 00 00
0x00005571a7030310:   ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030320:   12 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00005571a7030330:   01 00 00 00 00 00 00 00 f1 f1 f1 f1 0f 00 1f fa
0x00005571a7030340:   f1 61 ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030350:   ab ab 00 00 00 00 00 00 61 00 00 00 00 00 00 00
0x00005571a7030360:   ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030370:   21 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00005571a7030380:   10 00 00 00 00 00 00 00 f1 f1 f1 f1 0b 00 1f fa
0x00005571a7030390:   00 00 00 00 00 00 00 00 d8 a5 01 c8 f9 7f 00 00
0x00005571a70303a0:   fa ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a70303b0:   ab ba ba ba ba ba ba ba 61 00 00 00 00 00 00 00 


Corruption and header apart:

NMT Block at 0x0000564da16bee10, corruption at: 0x0000564da16c0e20: 
0x0000564da16bed90:   00 00 00 00 00 00 00 00 00 ba ba ba ba ba ba ba
0x0000564da16beda0:   ba ba ba ba ba ba ba ba 01 01 ba ba 00 00 00 00
0x0000564da16bedb0:   01 00 00 00 00 ba ba ba 64 00 00 00 ba ba ba ba
0x0000564da16bedc0:   d0 ed 6b a1 4d 56 00 00 00 00 00 00 00 00 00 00
0x0000564da16bedd0:   00 ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16bede0:   00 ba ba ba ba ba ba ba 51 20 00 00 00 00 00 00
0x0000564da16bedf0:   ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x0000564da16bee00:   11 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000564da16bee10:   00 20 00 00 00 00 00 00 f1 f1 f1 f1 0f 00 1f fa
0x0000564da16bee20:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee30:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee40:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee50:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee60:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee70:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee80:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 
...
0x0000564da16c0da0:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0db0:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0dc0:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0dd0:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0de0:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0df0:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0e00:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0e10:   f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0e20:   61 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x0000564da16c0e30:   ab ba ba ba ba ba ba ba d1 61 01 00 00 00 00 00
0x0000564da16c0e40:   ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e50:   ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e60:   ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e70:   ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e80:   ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e90:   ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba 


Thanks!

Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From stefank at openjdk.java.net  Thu Nov 18 15:26:10 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 18 Nov 2021 15:26:10 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches [v2]
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID: <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
> 
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
> 
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
> 
> and while this was happening we got a huge number of ICBufferFull safepoints.
> 
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
> 
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         oop ic_oop = ic->cached_oop();
>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           if (ic_oop->is_compiledICHolder()) {
>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>             if (is_alive->do_object_b(
>                   cichk_oop->holder_method()->method_holder()) &&
>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>               continue;
>             }
>           }
>           ic->set_to_clean();
>           assert(ic->cached_oop() == NULL,
>                  "cached oop in IC should be cleared");
>         }
>       }
> 
> 
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         if (ic->is_icholder_call()) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>               continue;
>             }
>         } else {
>           Metadata* ic_oop = ic->cached_metadata();
>           if (ic_oop != NULL) {
>             if (ic_oop->is_klass()) {
>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else if (ic_oop->is_method()) {
>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else {
>               ShouldNotReachHere();
>             }
>           }
>           }
>           ic->set_to_clean();
>       }
> 
> 
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
> 
> To understand why this is causing the problems we are seeing it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
> 
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
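
A minimal sketch of the intended scoping, using simplified stand-in types. `FakeIC`, `FakeMetadata`, `Kind` and `clean_if_dead` are hypothetical names for illustration only; they are not HotSpot declarations, and this is not the actual patch. The point is just that `set_to_clean()` is reached only when there is cached data whose loader is dead:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HotSpot types; not the real declarations.
enum class Kind { Klass, Method };

struct FakeMetadata {
  Kind kind;
  bool loader_alive;            // models is_loader_alive(is_alive)
};

struct FakeIC {
  bool icholder_call;           // models ic->is_icholder_call()
  bool icholder_alive;          // models both holder liveness checks passing
  const FakeMetadata* cached;   // models ic->cached_metadata(); may be null
  bool cleaned;                 // set by set_to_clean()

  void set_to_clean() { cleaned = true; }
};

// Clean the inline cache only if it refers to something that is dead.
inline void clean_if_dead(FakeIC& ic) {
  if (ic.icholder_call) {
    if (ic.icholder_alive) {
      return;                   // corresponds to 'continue' in the loop
    }
    ic.set_to_clean();
    return;
  }

  const FakeMetadata* md = ic.cached;
  if (md != nullptr) {
    // Mirrors the ShouldNotReachHere() branch of the quoted code.
    assert(md->kind == Kind::Klass || md->kind == Kind::Method);
    if (md->loader_alive) {
      return;
    }
    ic.set_to_clean();          // inside the non-null scope
  }
  // md == nullptr: megamorphic vtable call (or already clean); left untouched.
}
```

With this shape, an inline cache with no cached metadata falls through without being touched, which is what the old code did.
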
> 
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
> 
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
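
To make the back-and-forth concrete, here is a toy model of the effect described above. It is not HotSpot code: `ToyICBuffer` and `simulate` are hypothetical, and the model assumes exactly one stub per transition and a buffer that is only emptied by a safepoint, which is a simplification of the real ICStub lifecycle.

```cpp
#include <cassert>
#include <cstddef>

// Toy model, not HotSpot code: a fixed-capacity stub buffer that can only be
// emptied by a stop-the-world "ICBufferFull" safepoint.
struct ToyICBuffer {
  std::size_t capacity;
  std::size_t used;
  std::size_t safepoints;

  // Take one transition stub; run a safepoint first if no slot is free.
  void take_stub() {
    assert(capacity > 0);
    if (used == capacity) {
      ++safepoints;   // models an ICBufferFull safepoint
      used = 0;       // the safepoint frees the whole buffer
    }
    ++used;
  }
};

// Each round, the GC cleans every megamorphic call site and the Java threads
// then re-resolve every one of them. Assumption: one stub per site per
// direction. Returns the number of safepoints the buffer had to request.
inline std::size_t simulate(std::size_t sites, std::size_t rounds,
                            std::size_t capacity) {
  ToyICBuffer buf{capacity, 0, 0};
  for (std::size_t r = 0; r < rounds; ++r) {
    for (std::size_t s = 0; s < sites; ++s) buf.take_stub();  // GC: megamorphic -> clean
    for (std::size_t s = 0; s < sites; ++s) buf.take_stub();  // Java: clean -> megamorphic
  }
  return buf.safepoints;
}
```

In this model, `simulate(100, 1, 10)` requests 19 safepoints for a single round over 100 call sites, and the count grows linearly with the number of rounds. If megamorphic call sites are not cleaned at all, the corresponding call is `simulate(0, rounds, capacity)`, which requests none.
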
> 
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still forces the Java threads to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
> 
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
> 
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
> 
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
> 
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
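
For scanning such logs, a small helper along these lines could be used. It is only a sketch: `icbufferfull_total_ns` is a hypothetical name, and it assumes the safepoint line format shown in the example logs in this message (a `Safepoint "ICBufferFull"` tag and a trailing `Total: <n> ns` field).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns true and stores the "Total:" value (in nanoseconds) if 'line' is an
// ICBufferFull safepoint record in the format assumed above; otherwise returns
// false and leaves 'total_ns' unchanged.
inline bool icbufferfull_total_ns(const std::string& line, long long& total_ns) {
  if (line.find("Safepoint \"ICBufferFull\"") == std::string::npos) {
    return false;
  }
  const std::string key = "Total: ";
  const std::string::size_type pos = line.find(key);
  if (pos == std::string::npos) {
    return false;
  }
  const char* begin = line.c_str() + pos + key.size();
  char* end = nullptr;
  const long long value = std::strtoll(begin, &end, 10);
  if (end == begin) {
    return false;   // no digits after "Total: "
  }
  total_ns = value;
  return true;
}
```

Calling this on every line of a log and summing the results gives the number of ICBufferFull safepoints and the total time spent in them, which can then be compared with the "Concurrent Process Non-Strong References" time for the same GC cycle.
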
> 
> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
> 
> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
> 
> 
> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
> 
> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
> 
> 
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
> 
> I've run the patch through tier1-7.
> 
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:

  Review Coleen

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6450/files
  - new: https://git.openjdk.java.net/jdk/pull/6450/files/8a0aae06..af72104a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6450&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6450&range=00-01

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6450.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6450/head:pull/6450

PR: https://git.openjdk.java.net/jdk/pull/6450

From stefank at openjdk.java.net  Thu Nov 18 15:26:12 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 18 Nov 2021 15:26:12 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID: <fF0hUcZg5sTc-Ou7Qq_UrzOOwfWWznwEUM9M_7OVijo=.adcd966a-1054-4ab2-9900-c6140497f38b@github.com>

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
> 
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
> 
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
> 
> and while this was happening we got a huge number of ICBufferFull safepoints.
> 
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
> 
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         oop ic_oop = ic->cached_oop();
>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           if (ic_oop->is_compiledICHolder()) {
>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>             if (is_alive->do_object_b(
>                   cichk_oop->holder_method()->method_holder()) &&
>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>               continue;
>             }
>           }
>           ic->set_to_clean();
>           assert(ic->cached_oop() == NULL,
>                  "cached oop in IC should be cleared");
>         }
>       }
> 
> 
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         if (ic->is_icholder_call()) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>               continue;
>             }
>         } else {
>           Metadata* ic_oop = ic->cached_metadata();
>           if (ic_oop != NULL) {
>             if (ic_oop->is_klass()) {
>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else if (ic_oop->is_method()) {
>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else {
>               ShouldNotReachHere();
>             }
>           }
>           }
>           ic->set_to_clean();
>       }
> 
> 
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
> 
> To understand why this is causing the problems we are seeing it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
> 
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
> 
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
> 
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
> 
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still forces the Java threads to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
> 
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
> 
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
> 
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
> 
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
> 
> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
> 
> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
> 
> 
> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
> 
> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
> 
> 
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
> 
> I've run the patch through tier1-7.
> 
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

I've updated the patch with a comment. Note that we perform an is_clean() check at the top of the function, so we know that the IC is not "clean" at the new return line.
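
For readers following the scope discussion, here is a minimal C++ sketch of the control flow in question. It is a toy model and not the HotSpot code: `ToyIC`, `ToyState`, `clean_misplaced_scope` and `clean_restored_scope` are hypothetical names, and liveness is reduced to a single flag. Both variants start with the clean check; they differ only in whether an IC without cached metadata reaches the cleaning step.

```cpp
#include <cassert>

// Toy model only; these are not HotSpot types.
enum class ToyState { Clean, Monomorphic, MegamorphicVtable };

struct ToyIC {
  ToyState state;
  bool     holder_alive;  // only meaningful when state == Monomorphic
};

// In this model only a monomorphic IC caches metadata.
inline bool has_metadata(const ToyIC& ic) {
  return ic.state == ToyState::Monomorphic;
}

// Cleaning step placed outside the "has metadata" scope.
// Returns true when the IC was cleaned (standing in for an ICStub being used).
inline bool clean_misplaced_scope(ToyIC& ic) {
  if (ic.state == ToyState::Clean) return false;
  if (has_metadata(ic)) {
    if (ic.holder_alive) return false;  // metadata still alive: keep the IC
  }
  ic.state = ToyState::Clean;           // also reached for MegamorphicVtable
  return true;
}

// Cleaning step restored to the "has metadata" scope.
inline bool clean_restored_scope(ToyIC& ic) {
  if (ic.state == ToyState::Clean) return false;
  // Not clean and no metadata: a megamorphic vtable call, leave it alone.
  if (!has_metadata(ic)) return false;
  assert(ic.state == ToyState::Monomorphic);
  if (ic.holder_alive) return false;
  ic.state = ToyState::Clean;
  return true;
}
```

With a megamorphic vtable IC, the first variant reports a cleaning while the second leaves the IC untouched; ICs with dead metadata are cleaned by both.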

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From eosterlund at openjdk.java.net  Thu Nov 18 15:40:38 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 18 Nov 2021 15:40:38 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches [v2]
In-Reply-To: <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
 <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
Message-ID: <mWrJFR1V5RpCKxQ7HvfHkHArenrdSaLSHxd9BqTxRk4=.b57ca88e-fa4d-455f-8b4a-1a6e84b0fe7c@github.com>

On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>> 
>> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>> 
>> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
>> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>> 
>> and while this was happening we got a huge number of ICBufferFull safepoints.
>> 
>> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
>> 
>> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         oop ic_oop = ic->cached_oop();
>>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           if (ic_oop->is_compiledICHolder()) {
>>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>>             if (is_alive->do_object_b(
>>                   cichk_oop->holder_method()->method_holder()) &&
>>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>>               continue;
>>             }
>>           }
>>           ic->set_to_clean();
>>           assert(ic->cached_oop() == NULL,
>>                  "cached oop in IC should be cleared");
>>         }
>>       }
>> 
>> 
>> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         if (ic->is_icholder_call()) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>>               continue;
>>             }
>>         } else {
>>           Metadata* ic_oop = ic->cached_metadata();
>>           if (ic_oop != NULL) {
>>             if (ic_oop->is_klass()) {
>>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else if (ic_oop->is_method()) {
>>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else {
>>               ShouldNotReachHere();
>>             }
>>           }
>>           }
>>           ic->set_to_clean();
>>       }
>> 
>> 
>> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
>> 
>> To understand why this is causing the problems we are seeing it's good to start by reading:
>> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>> 
>> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>> 
>> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>> 
>> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>> 
>> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>> 
>> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See the description in:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>> 
>> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>> 
>> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>> 
>> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
>> 
>> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
>> 
>> I've run the patch through tier1-7.
>> 
>> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review Coleen

Marked as reviewed by eosterlund (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From psandoz at openjdk.java.net  Thu Nov 18 16:20:49 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 18 Nov 2021 16:20:49 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <fuvWYYcAS3lMEbupsHhAB7L5_r_OkLI8kL2EqOxVExo=.def40272-6820-46fa-b484-8f45afe4023e@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
 <fuvWYYcAS3lMEbupsHhAB7L5_r_OkLI8kL2EqOxVExo=.def40272-6820-46fa-b484-8f45afe4023e@github.com>
Message-ID: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com>

On Wed, 17 Nov 2021 22:54:06 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
>> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541
>> 
>> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though, because C2 might still compile that path when it cannot prove at parse time that the path is unreachable. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
>> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
>> 
>> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
>> 
>> Thanks,
>> Tobias
>
> Looks good to me.

@sviswa7 @jatin-bhateja any thoughts on the other related FIXMEs brought up by Tobias? e.g.


            if (op == AND_NOT) {
                // FIXME: Support this in the JIT.
                that = that.lanewise(NOT);
                op = AND;

-------------

PR: https://git.openjdk.java.net/jdk/pull/6428

From sviswanathan at openjdk.java.net  Thu Nov 18 16:35:40 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Thu, 18 Nov 2021 16:35:40 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
 <fuvWYYcAS3lMEbupsHhAB7L5_r_OkLI8kL2EqOxVExo=.def40272-6820-46fa-b484-8f45afe4023e@github.com>
 <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com>
Message-ID: <xnJtxfxLkooRvrGw9xZiRvOYcpid4HpANmJG_NveTnQ=.0295f7c4-d06b-40cb-88c4-0cd48813f5ef@github.com>

On Thu, 18 Nov 2021 16:18:10 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Looks good to me.
>
> @sviswa7 @jatin-bhateja any thoughts on the other related FIXMEs brought up by Tobias? e.g.
> 
> 
>             if (op == AND_NOT) {
>                 // FIXME: Support this in the JIT.
>                 that = that.lanewise(NOT);
>                 op = AND;

@PaulSandoz Those FIXME notes are from John, pointing out to us where further optimizations are possible; they are not related to correctness. I also looked at vop2ideal; it now handles all the opcodes for the relevant data types (integral/fp).
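
To illustrate why such rewrites are about performance rather than correctness, here is a small C++ sketch that models long lanes as unsigned 64-bit values. It is not the Vector API: `Lanes` and the helper functions are hypothetical names chosen for the example. It assumes two's complement lanes, which is what Java's `long` uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a long vector: one uint64_t bit pattern per lane.
using Lanes = std::vector<std::uint64_t>;

inline Lanes lanewise_not(const Lanes& v) {
  Lanes r(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) r[i] = ~v[i];
  return r;
}

inline Lanes lanewise_and(const Lanes& a, const Lanes& b) {
  assert(a.size() == b.size());
  Lanes r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] & b[i];
  return r;
}

// AND_NOT computed in a single step per lane.
inline Lanes and_not_direct(const Lanes& a, const Lanes& b) {
  assert(a.size() == b.size());
  Lanes r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] & ~b[i];
  return r;
}

// AND_NOT expressed as NOT followed by AND, mirroring the rewrite
// in the quoted snippet.
inline Lanes and_not_rewritten(const Lanes& a, const Lanes& b) {
  return lanewise_and(a, lanewise_not(b));
}

// NEG expressed as a subtraction from zero. Unsigned arithmetic wraps
// modulo 2^64, which matches two's complement negation.
inline Lanes neg_as_sub(const Lanes& v) {
  Lanes r(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) r[i] = std::uint64_t{0} - v[i];
  return r;
}
```

Both forms of AND_NOT produce the same lanes; the rewritten form only costs an extra pass, which is the kind of overhead a dedicated JIT operation could avoid.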

-------------

PR: https://git.openjdk.java.net/jdk/pull/6428

From zgu at openjdk.java.net  Thu Nov 18 16:45:41 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 18 Nov 2021 16:45:41 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3]
In-Reply-To: <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>
Message-ID: <pB39u7JZ0RkQtc1hbQZAbsemG2Mx5E_21V7rwyjIN-A=.b362477e-58cd-4cb8-9955-fceeeb7ec5db@github.com>

On Thu, 18 Nov 2021 15:25:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is one of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
>> 
>> This proposal adds NMT buffer overflow checking:
>> 
>> - it gives us C-heap overflow checking in release builds
>> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>> 
>> For more details, please see the JBS issue.
>> 
>> ----
>> 
>> Patch notes:
>> 
>> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16 bits arguably still is: malloc site table width is 512.
>>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>> 
>> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>> 
>> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>> 
>> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>> 
>> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>> 
>> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>> 
>> --------------
>> 
>> Example output a buffer overrun would provide:
>> 
>> 
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
>> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>> 
>> -------
>> 
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies have run for 4 weeks now without problems
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Feedback Volker

Changes requested by zgu (Reviewer).

src/hotspot/share/runtime/os.cpp line 750:

> 748:   return MemTracker::record_malloc(ptr, size, memflags, stack, level);
> 749: #else
> 750:   if (memblock == NULL) {

I think you also need to subtract malloc_footer_size when calculating memblock_size below. Otherwise, memcpy can overwrite the footer.

I wonder if we should just consolidate malloc_header_size and malloc_footer_size into one malloc_overhead? I don't see them used separately.
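
As a rough illustration of the point about the copy length, here is a toy C++ model of a block with a 16-byte header, a payload, and a one-byte footer canary. None of these names or canary values come from the patch: `ToyHeader`, `toy_malloc`, `toy_realloc` and the constants are assumptions made for the sketch. The copy in `toy_realloc` is bounded by payload sizes only, so it cannot reach the footer byte of the new block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Toy model only; not the NMT definitions from mallocTracker.hpp.
struct ToyHeader {
  std::uint64_t payload_size;  // size requested by the caller
  std::uint8_t  pad[6];        // stand-in for the remaining header fields
  std::uint16_t canary;        // directly precedes the user payload
};
static_assert(sizeof(ToyHeader) == 16, "toy header is expected to be 16 bytes");

constexpr std::uint16_t kHeaderCanary = 0xE99E;  // arbitrary value
constexpr std::uint8_t  kFooterCanary = 0xE8;    // arbitrary value
constexpr std::size_t   kFooterSize   = 1;

// Allocates header + payload + footer and returns the payload pointer.
inline void* toy_malloc(std::size_t size) {
  auto* raw = static_cast<unsigned char*>(
      std::malloc(sizeof(ToyHeader) + size + kFooterSize));
  if (raw == nullptr) return nullptr;
  ToyHeader h{};
  h.payload_size = size;
  h.canary = kHeaderCanary;
  std::memcpy(raw, &h, sizeof h);
  raw[sizeof(ToyHeader) + size] = kFooterCanary;
  return raw + sizeof(ToyHeader);
}

inline ToyHeader toy_header(const void* payload) {
  ToyHeader h;
  std::memcpy(&h, static_cast<const unsigned char*>(payload) - sizeof(ToyHeader),
              sizeof h);
  return h;
}

// True when both canaries are intact. The header canary is checked first
// because a damaged header means payload_size cannot be trusted.
inline bool toy_block_intact(const void* payload) {
  const ToyHeader h = toy_header(payload);
  if (h.canary != kHeaderCanary) return false;
  return static_cast<const unsigned char*>(payload)[h.payload_size] == kFooterCanary;
}

inline void toy_free(void* payload) {
  std::free(static_cast<unsigned char*>(payload) - sizeof(ToyHeader));
}

// realloc as allocate, copy, free. The copy length uses payload sizes on
// both sides; if either bound included the footer byte, the copy could
// run into the new block's footer.
inline void* toy_realloc(void* old_payload, std::size_t new_size) {
  assert(old_payload != nullptr);
  void* fresh = toy_malloc(new_size);
  if (fresh == nullptr) return nullptr;
  const std::size_t old_size = toy_header(old_payload).payload_size;
  std::memcpy(fresh, old_payload, old_size < new_size ? old_size : new_size);
  toy_free(old_payload);
  return fresh;
}
```

A one-byte write past the payload flips the footer byte, which `toy_block_intact` then reports; shrinking or growing through `toy_realloc` leaves both canaries of the new block intact.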

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From psandoz at openjdk.java.net  Thu Nov 18 16:49:45 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 18 Nov 2021 16:49:45 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
 <fuvWYYcAS3lMEbupsHhAB7L5_r_OkLI8kL2EqOxVExo=.def40272-6820-46fa-b484-8f45afe4023e@github.com>
 <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com>
Message-ID: <7weUFVLrZV9GXlxUfOyRekglxGisX-5jl6cxm0KoovY=.5e8165a9-7b9c-4493-abca-a339526eed8a@github.com>

On Thu, 18 Nov 2021 16:18:10 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Looks good to me.
>
> @sviswa7 @jatin-bhateja any thoughts on the other related FIXMEs brought up by Tobias? e.g.
> 
> 
>             if (op == AND_NOT) {
>                 // FIXME: Support this in the JIT.
>                 that = that.lanewise(NOT);
>                 op = AND;

> @PaulSandoz Those fixme notes are from John, pointing out to us where further optimizations are possible; they are not related to correctness. I also looked at the vop2ideal, it now handles all the opcodes for the relevant data types (integral/fp).

Thanks, I also looked at `vop2ideal` and concluded the same.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6428

From aph at openjdk.java.net  Thu Nov 18 16:55:43 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 18 Nov 2021 16:55:43 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <mEbx_v-4vc6R1-sFvt6ws-wCvBc7uLDP66MFhmBmGi8=.a1160df8-9da5-4879-8a7c-c77fe568fc09@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
 <mEbx_v-4vc6R1-sFvt6ws-wCvBc7uLDP66MFhmBmGi8=.a1160df8-9da5-4879-8a7c-c77fe568fc09@github.com>
Message-ID: <b8Kypc9_7OOYvjLWRKw4bVtl60AkQOXEWDwS9AH32a4=.b37b033a-d98b-4fa9-bd3b-719bc98981f1@github.com>

On Thu, 18 Nov 2021 13:53:11 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I'm baffled by what is going on.

Sorry, it looks like I managed to confuse myself. The top of the loop looks like:

    10c     B17: #      out( B18 ) <- in( B27 )  Freq: 4.49963
    10c     # castLL of R2
    10c     sve_whilelo P0, zr, R2       # sve
    110     sve_ldr V16, P0, [R0]       # load vector predicated (sve)
    114     sve_str [R1], P0, V16       # store vector predicated (sve)
    118     B18: #      out( B30 B19 ) <- in( B17 B28 B26 )  Freq: 8.99927
    118
    118     ldarb  R10, [R23]   # byte ! Field: volatile org/openjdk/jmh/runner/InfraControlL2.isDone

... and the bottom

    1a0     cmp  R2, #64
    1a4     bls  B17    # unsigned  P=0.500000 C=-1.000000
    1a8     B28: #      out( B18 ) <- in( B27 )  Freq: 4.49963
    1a8     CALL, runtime leaf nofp 0x0000ffff6d1058f8 jbyte_arraycopy
            No JVM State Info
            #
    1b0     b  B18

So only if the length is < 64 (i.e. 512 bits) do we branch back to B17 to do the `SVE WHILELO` to set the predicate. This is confusing only because the code has been rearranged so that the test for < 64 bytes is at the bottom of the loop.
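
As a rough illustration of what the predicated path in B17 does, here is a small scalar model. It is a sketch under assumptions, not the code C2 emits: `whilelo_bytes` and `predicated_copy` are hypothetical helper names, and the model ignores wrap-around of `start + i`. It assumes the usual WHILELO meaning, where lane `i` is active while `start + i` is lower than `limit` (unsigned):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of a WHILELO-style predicate for byte lanes: lane i is active
// while (start + i) < limit, compared as unsigned values.
inline std::vector<bool> whilelo_bytes(std::uint64_t start, std::uint64_t limit,
                                       std::size_t vector_bytes) {
    std::vector<bool> pred(vector_bytes, false);
    for (std::size_t i = 0; i < vector_bytes; ++i) {
        pred[i] = (start + i) < limit;
    }
    return pred;
}

// Predicated copy: only active lanes are loaded and stored, so one
// vector-wide operation never touches bytes beyond the requested length.
inline void predicated_copy(const std::uint8_t* src, std::uint8_t* dst,
                            const std::vector<bool>& pred) {
    for (std::size_t i = 0; i < pred.size(); ++i) {
        if (pred[i]) {
            dst[i] = src[i];
        }
    }
}
```

With a 64-byte vector and a length of 50, the first 50 lanes are active and the remaining 14 are left untouched, which is why one predicated load/store pair is enough for short copies.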

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From simonis at openjdk.java.net  Thu Nov 18 17:09:42 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 18 Nov 2021 17:09:42 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To: <UVTzs12JWIAKnp3QTOLAub9Jmdi754jaoob2TpRWp2M=.4269405a-4dc1-4740-96b9-a6e1c87a1695@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <GZdP4Skyrnhyww0vSmutTLOwvYW3tCjigaC9lQKTfd8=.fda271b3-06aa-4603-b1c1-fa3f52400be5@github.com>
 <qiFO3w9PJClRnvSSkPrqIy5MJj5L4klQZBEcK32o0qc=.a00f1ec9-e103-4e7a-ba02-6f0b0e1a4163@github.com>
 <UVTzs12JWIAKnp3QTOLAub9Jmdi754jaoob2TpRWp2M=.4269405a-4dc1-4740-96b9-a6e1c87a1695@github.com>
Message-ID: <F4mxz-DjcmNmk61E7IXdA25iCWyzIXCLnMKouJOJ-XA=.fd20734f-23f2-4c7a-aa43-cba5a91941c5@github.com>

On Thu, 18 Nov 2021 14:16:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/services/mallocTracker.hpp line 314:
>> 
>>> 312:   static const uint8_t  _footer_canary_dead_mark = 0xFB;
>>> 313:   NOT_LP64(static const uint32_t _header_alt_canary_life_mark = 0xFAFA1F1F;)
>>> 314:   NOT_LP64(static const uint32_t _header_alt_canary_dead_mark = 0xFBFB1F1F;)
>> 
>> Just out of interest, how did you choose these canary marks? Is there some evidence that they appear less frequently in real code/data than other values?
>
> I did an extensive statistical analysis of many core dumps.
> 
> ...
> 
> ...
> 
> Just kidding, I chose them on a whim to be not zero :) Do you have a better suggestion? I thought about making them an ASCII pattern, but those are actually more common in payload data.

I was just thinking of the usual suspects like 0xcafebabe, 0xbaadbabe or 0xdeadbeef because that would simplify the detection of these markers in core dumps, hs_err files or during debugging. But I'm fine with whatever you choose :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From dcubed at openjdk.java.net  Thu Nov 18 17:18:39 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 18 Nov 2021 17:18:39 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <Io5zyME4R4fiSdR9bV91VhvMzfQoETTuGYG6gRQEimc=.35f5f16a-0848-4b93-bd84-0c41cc5a6df3@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

src/hotspot/share/prims/jvmtiEnvBase.cpp line 1533:

> 1531:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
> 1532:   }
> 1533:   assert(java_thread == _state->get_thread(), "Must be");

This `assert()` is the site of the original test failure. I haven't yet
looked at the locations of the other changes.

The `is_exiting()` check is made under the protection of the
`JvmtiThreadState_lock` so an unsuspended target thread that is
exiting cannot reach the point where the `_state` is updated to
clear the `JavaThread*` so we can't fail the `assert()` if the
`is_exiting()` check has returned `false`.
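
The argument can be modelled with a toy example. This is a sketch of the reasoning only: `ToyThread`, `ToyState`, `toy_exit` and `toy_do_thread` are hypothetical stand-ins, not HotSpot types or functions, and the model assumes the exiting thread sets its flag before it takes the lock to clear the back-pointer:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Toy stand-ins; none of these are HotSpot types.
struct ToyThread {
    std::atomic<bool> exiting{false};
};

struct ToyState {
    std::mutex lock;              // plays the role of JvmtiThreadState_lock
    ToyThread* thread = nullptr;  // back-pointer that the exit path clears
};

// Exit path: the thread first marks itself as exiting, then takes the
// lock to clear the back-pointer in its state.
inline void toy_exit(ToyState& state, ToyThread& self) {
    self.exiting.store(true);
    std::lock_guard<std::mutex> guard(state.lock);
    state.thread = nullptr;
}

// Handshake-style operation: returns false if the target is exiting.
inline bool toy_do_thread(ToyState& state, ToyThread& target) {
    std::lock_guard<std::mutex> guard(state.lock);
    if (target.exiting.load()) {
        return false;  // corresponds to keeping the default "not alive" result
    }
    // The exit path needs the lock held here to clear the pointer, and the
    // flag was still false, so the pointer must still match at this point.
    assert(state.thread == &target);
    return true;
}
```

In this model the assertion can only be reached while the back-pointer is intact, because clearing it requires the same lock that the check holds.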

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From nradomski at openjdk.java.net  Thu Nov 18 17:22:28 2021
From: nradomski at openjdk.java.net (Niklas Radomski)
Date: Thu, 18 Nov 2021 17:22:28 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le [v2]
In-Reply-To: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
Message-ID: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com>

> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision:

  Remove debug clobber code

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6325/files
  - new: https://git.openjdk.java.net/jdk/pull/6325/files/1dec8885..c504b66d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6325&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6325&range=00-01

  Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6325.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6325/head:pull/6325

PR: https://git.openjdk.java.net/jdk/pull/6325

From dcubed at openjdk.java.net  Thu Nov 18 17:26:38 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 18 Nov 2021 17:26:38 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <UaxiSmd-pk9zw_gIc1aU53VdcCXkJNUkyySPxOaOxyw=.78569f8a-9402-4008-94b7-010f51f0bd9a@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>>  - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

I don't see a reason for the change in `SetForceEarlyReturn::doit()`,
but I'm okay with the other changes.

src/hotspot/share/prims/jvmtiEnvBase.cpp line 1401:

> 1399:   if (!self) {
> 1400:     if (!java_thread->is_suspended()) {
> 1401:       _result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;

I don't see an obvious reason for this `is_exiting()` check.

src/hotspot/share/prims/jvmtiEnvBase.cpp line 1625:

> 1623:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
> 1624:   }
> 1625:   assert(_state->get_thread() == java_thread, "Must be");

The `assert()` on L1625 is subject to the same race as the original site.
This `is_exiting()` check is made under the protection of the
`JvmtiThreadState_lock` so it is sufficient to protect that `assert()`.

-------------

Changes requested by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6440

From aph at openjdk.java.net  Thu Nov 18 17:27:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 18 Nov 2021 17:27:41 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
Message-ID: <KVuc3bhu_gny271MhjP2gEOa6rtcoYlzw6fPkGRhRqc=.a79e2cc2-b39c-4083-aef7-ff6fd4b457de@github.com>

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li <pli at openjdk.org> wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions. So far it works on x86 AVX512 only and this patch
> enables it on AArch64 with SVE.
> 
> We add an AArch64 matching rule for VectorMaskGenNode and refactor that
> node a little bit. The major change is moving the element type field
> into its TypeVectMask bottom type. The reason is that AArch64 vector
> masks are different for different vector element types.
> 
> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
> lanes (of any type) is like
> 
> `0000 0000 ... 0000 0000 0000 0000 0111`
> 
> On AArch64 SVE, this mask value can only be used for masking the 3 least
> significant lanes of bytes. But for 3 lanes of ints, the value should be
> 
> `0000 0000 ... 0000 0000 0001 0001 0001`
> 
> where the least significant bit of each lane matters. So the AArch64 matcher
> needs to know the vector element type to generate the right masks.
> 
> After this patch, the C2 generated code for copying a 50-byte array on
> AArch64 SVE looks like
> 
>   mov     x12, #0x32
>   whilelo p0.b, xzr, x12
>   add     x11, x11, #0x10
>   ld1b    {z16.b}, p0/z, [x11]
>   add     x10, x10, #0x10
>   st1b    {z16.b}, p0, [x10]
> 
> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
> both x86 AVX512 and AArch64 SVE machines and found no issues. We tested
> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
> size arguments on a 512-bit SVE-featured CPU and measured the performance
> changes below.
> 
> Benchmark                  (length)  (Performance)
> ArrayCopyAligned.testByte        10          -2.6%
> ArrayCopyAligned.testByte        20          +4.7%
> ArrayCopyAligned.testByte        30          +4.8%
> ArrayCopyAligned.testByte        40         +21.7%
> ArrayCopyAligned.testByte        50         +22.5%
> ArrayCopyAligned.testByte        60         +28.4%
> 
> The test machine has an SVE vector size of 512 bits, so we see a performance
> gain for most array sizes below 64 bytes. For very small arrays we
> see a slight regression because a vector load/store may be a bit slower
> than 1 or 2 scalar loads/stores.
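
The two mask layouts described in the quoted text above (one bit per lane on x86 AVX512, one bit per vector byte on AArch64 SVE) can be sketched as follows. The helper names `per_lane_mask` and `per_byte_mask` are hypothetical, and the 64-bit width is an assumption that matches a 512-bit (64-byte) vector:

```cpp
#include <cstdint>

// Mask for the `lanes` least significant lanes when the mask has one bit
// per lane regardless of element type (the AVX512 style in the text).
inline std::uint64_t per_lane_mask(unsigned lanes) {
    return lanes >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << lanes) - 1;
}

// Mask when there is one bit per vector byte and only the lowest bit of
// each element matters (the SVE style in the text).
inline std::uint64_t per_byte_mask(unsigned lanes, unsigned elem_bytes) {
    std::uint64_t bits = 0;
    for (unsigned i = 0; i < lanes && i * elem_bytes < 64; ++i) {
        bits |= std::uint64_t{1} << (i * elem_bytes);
    }
    return bits;
}
```

For 3 lanes, `per_lane_mask(3)` and `per_byte_mask(3, 1)` both give binary `0111`, while `per_byte_mask(3, 4)` gives binary `0001 0001 0001`, which is why the element type has to be known when the mask is generated.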

Hurrah! I have managed to duplicate your results.

Old:

Benchmark                       (length)  Mode  Cnt   Score   Error  Units
ArrayCopyAligned.testByte             40  avgt    5  23.332 ± 0.016  ns/op


New:

ArrayCopyAligned.testByte             40  avgt    5  18.092 ± 0.093  ns/op


... and in fact your result is much better than this suggests, because the bulk of the test is fetching all of the arguments to arraycopy, not actually copying the bytes. I get it now.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From coleenp at openjdk.java.net  Thu Nov 18 17:34:42 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 18 Nov 2021 17:34:42 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches [v2]
In-Reply-To: <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
 <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
Message-ID: <hXPXBIi-iRbX0UJk4-RkFtfagbHnHFIHASmK8uuhGno=.39e6b786-a027-412f-b48a-bca5a6ca86ab@github.com>

On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>> 
>> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>> 
>> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
>> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>> 
>> and while this was happening we got a huge number of ICBufferFull safepoints.
>> 
>> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
>> 
>> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         oop ic_oop = ic->cached_oop();
>>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           if (ic_oop->is_compiledICHolder()) {
>>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>>             if (is_alive->do_object_b(
>>                   cichk_oop->holder_method()->method_holder()) &&
>>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>>               continue;
>>             }
>>           }
>>           ic->set_to_clean();
>>           assert(ic->cached_oop() == NULL,
>>                  "cached oop in IC should be cleared");
>>         }
>>       }
>> 
>> 
>> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         if (ic->is_icholder_call()) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>>               continue;
>>             }
>>         } else {
>>           Metadata* ic_oop = ic->cached_metadata();
>>           if (ic_oop != NULL) {
>>             if (ic_oop->is_klass()) {
>>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else if (ic_oop->is_method()) {
>>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else {
>>               ShouldNotReachHere();
>>             }
>>           }
>>           }
>>           ic->set_to_clean();
>>       }
>> 
>> 
>> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
>> 
>> To understand why this is causing the problems we are seeing it's good to start by reading:
>> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>> 
>> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>> 
>> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>> 
>> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>> 
>> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>> 
>> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>> 
>> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>> 
>> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>> 
>> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
>> 
>> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
>> 
>> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
>> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
>> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
>> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
>> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
>> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
>> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
>> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
>> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
>> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
>> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
>> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
>> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
>> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
>> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
>> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
>> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
>> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
>> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
>> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
>> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
>> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
>> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
>> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
>> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
>> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
>> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
>> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
>> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
>> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
>> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
>> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
>> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
>> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
>> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
>> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
>> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
>> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
>> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
>> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
>> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
>> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
>> 
>> 
>> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
>> 
>> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
>> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
>> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
>> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
>> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
>> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
>> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
>> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
>> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
>> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
>> 
>> 
>> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
>> 
>> I've run the patch through tier1-7.
>> 
>> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for the discussion and explanation of the inline caches code.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review Coleen

Marked as reviewed by coleenp (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From coleenp at openjdk.java.net  Thu Nov 18 17:34:43 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 18 Nov 2021 17:34:43 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches [v2]
In-Reply-To: <ZFfgDOUYfH3aHS4TzmXZ88aOr_HYOgWPbIx9qR5p-Qc=.4d7424bb-ce0a-4d86-ae13-f1b4d38a5564@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
 <ZFfgDOUYfH3aHS4TzmXZ88aOr_HYOgWPbIx9qR5p-Qc=.4d7424bb-ce0a-4d86-ae13-f1b4d38a5564@github.com>
Message-ID: <8UhBoFffqzWeQJ96suEJbxltD8NOjsX7-MxUYzC20wU=.bbb9e801-600e-49f0-8e65-f5c82f251316@github.com>

On Thu, 18 Nov 2021 14:32:50 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Review Coleen
>
> src/hotspot/share/code/compiledMethod.cpp line 482:
> 
>> 480:       }
>> 481:     } else {
>> 482:       return true;
> 
> I've given up pretending to understand this code, but could you add a one-line comment explaining why you're returning true here?  i.e., if ic_metadata is NULL, it's a megamorphic call or already clean and shouldn't be cleaned.

Thanks for the comment.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From simonis at openjdk.java.net  Thu Nov 18 17:36:41 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 18 Nov 2021 17:36:41 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3]
In-Reply-To: <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>
Message-ID: <9GqbVZKY1Z5fCvB-vuCwqIFwPXEDU1nHd002J3SS2KM=.a1ca56cc-016b-4daf-9f69-5bbf60f32e71@github.com>

On Thu, 18 Nov 2021 15:25:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is part of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
>> 
>> This proposal adds NMT buffer overflow checking:
>> 
>> - it gives us C-heap overflow checking in release builds
>> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>> 
>> For more details, please see the JBS issue.
>> 
>> ----
>> 
>> Patch notes:
>> 
>> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16 bits arguably still is: malloc site table width is 512.
>>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>> 
>> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>> 
>> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>> 
>> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>> 
>> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>> 
>> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>> 
>> --------------
>> 
>> Example output a buffer overrun would provide:
>> 
>> 
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
>> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>> 
>> -------
>> 
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 4 weeks now without problems
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Feedback Volker

Looks good to me now except for Zhengyu's question.

src/hotspot/share/services/mallocTracker.cpp line 134:

> 132: 
> 133:   // This function prints block information, including hex dump, in case of a detected
> 134:   // corruption. The hex dump should show the both block header and the corruption site

..show both, the block header..

test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 69:

> 67: ///////
> 68: 
> 69: // A overwriter farther away from the NMT header; the report should show the hex dump split up

An overwrite

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From stuefe at openjdk.java.net  Thu Nov 18 17:51:42 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 18 Nov 2021 17:51:42 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3]
In-Reply-To: <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>
Message-ID: <L-EaybC6OOncnhsa9bRbYy0knr5qSuNpWlBn4u_B5WM=.55f40093-e235-4622-acbc-9ac0cef826c1@github.com>

On Thu, 18 Nov 2021 15:25:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Feedback Volker

> I think you also need to subtract malloc_footer_size when calculating memblock_size below. Otherwise, memcpy can overwrite the footer.

Oh man, good catch... this is too complicated. See, that's why I want to remove the GuardedMemory layer. Having that gone will be such a relief.

So this is for a resize to a smaller size. We have this:

[guard header] [nmt header] [ ... payload ... ] [nmt footer] [guard footer]

and both nmt header and footer are now, from the POV of GuardedMemory, part of its payload. The os::malloc above already allocates a new block, and we need to copy the user payload while leaving the NMT footer intact.
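
To make the shrink case concrete, here is a minimal sketch of the idea, not the actual patch: `SketchHeader`, the `sketch_*` functions and the canary values are all made up for illustration, and the GuardedMemory layer is left out entirely. It only shows that the copy into the new block has to stop at the new payload size so that the new footer survives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified stand-in for a malloc header; NOT the real NMT layout.
struct SketchHeader {
  std::uint64_t payload_size;  // user-visible size of the block
  std::uint8_t  unused[6];     // filler so the canary sits right before the payload
  std::uint16_t canary;        // header canary directly preceding the payload
};
static_assert(sizeof(SketchHeader) == 16, "sketch assumes a 16-byte header");

constexpr std::uint16_t kHeaderCanary = 0xE99E;  // arbitrary illustration value
constexpr std::uint8_t  kFooterCanary = 0xE8;    // arbitrary illustration value

// Layout: [header][payload ...][1-byte footer]
void* sketch_malloc(std::size_t size) {
  auto* raw = static_cast<std::uint8_t*>(std::malloc(sizeof(SketchHeader) + size + 1));
  if (raw == nullptr) return nullptr;
  SketchHeader h{};
  h.payload_size = size;
  h.canary = kHeaderCanary;
  std::memcpy(raw, &h, sizeof h);
  std::uint8_t* payload = raw + sizeof(SketchHeader);
  payload[size] = kFooterCanary;  // footer trails the user payload
  return payload;
}

// True if both canaries still hold their expected values.
bool sketch_is_intact(const void* payload) {
  const auto* p = static_cast<const std::uint8_t*>(payload);
  SketchHeader h;
  std::memcpy(&h, p - sizeof(SketchHeader), sizeof h);
  return h.canary == kHeaderCanary && p[h.payload_size] == kFooterCanary;
}

void sketch_free(void* payload) {
  if (payload != nullptr) {
    std::free(static_cast<std::uint8_t*>(payload) - sizeof(SketchHeader));
  }
}

// Shrinking resize done as "allocate new block, copy, free old block".
void* sketch_shrink(void* old_payload, std::size_t new_size) {
  void* fresh = sketch_malloc(new_size);
  if (fresh == nullptr) return nullptr;
  // Copy only new_size payload bytes; copying more would run over the new footer.
  std::memcpy(fresh, old_payload, new_size);
  sketch_free(old_payload);
  return fresh;
}
```

With this, shrinking a 32-byte block to 8 bytes leaves `sketch_is_intact()` true for the new block, and a single stray write at offset 8 makes it false.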

I'll first write a repro case - this should have been caught by tests - then I'll think about a solution.

> I wonder if we should just consolidate malloc_header_size and malloc_footer_size into one malloc_overhead? I don't see them used separately.

I'll take a look. I also found that we have two versions of malloc_header_size() - one takes the NMT level, one takes a pointer. That makes me nervous overload-resolution-wise, though it's very probably fine. Maybe we can reduce the complexity a bit. Though I prefer to keep this patch as small as possible.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From mdoerr at openjdk.java.net  Thu Nov 18 18:35:43 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 18 Nov 2021 18:35:43 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le [v2]
In-Reply-To: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
 <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com>
Message-ID: <6ikSOeIWtJPZbIzHuiiEbSmpT60lFaZgOWejMxyAg80=.378fb020-a2e1-42f9-8e38-0985408f87f1@github.com>

On Thu, 18 Nov 2021 17:22:28 GMT, Niklas Radomski <nradomski at openjdk.org> wrote:

>> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.
>
> Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove debug clobber code

Thanks for the update! I think it's good to go.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6325

From nradomski at openjdk.java.net  Thu Nov 18 18:58:39 2021
From: nradomski at openjdk.java.net (Niklas Radomski)
Date: Thu, 18 Nov 2021 18:58:39 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le [v2]
In-Reply-To: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
 <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com>
Message-ID: <Mj2c6LxP7k_x__IZAeDx6fDLv7JWFqEZnnon2RqQF4k=.fe19ebbb-701b-4874-b2d1-991d273c5478@github.com>

On Thu, 18 Nov 2021 17:22:28 GMT, Niklas Radomski <nradomski at openjdk.org> wrote:

>> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.
>
> Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove debug clobber code

Thank you for your reviews! Happy to see that the change has been so well received.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6325

From nradomski at openjdk.java.net  Thu Nov 18 19:07:46 2021
From: nradomski at openjdk.java.net (Niklas Radomski)
Date: Thu, 18 Nov 2021 19:07:46 GMT
Subject: Integrated: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
In-Reply-To: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
References: <Nt7KC3zC3ERbpcjIda_FrsJYNuDtzbd_khX75A1x4aE=.676a48b5-c7f4-4798-a3b8-516bb60dfaa4@github.com>
Message-ID: <fpMaWjJQo80qCIEj3niuU2LH0vqrhR2DgNoJYMSbTGA=.89aa4d63-dd18-4f55-ba53-274ffaddf9fc@github.com>

On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski <nradomski at openjdk.org> wrote:

> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

This pull request has now been integrated.

Changeset: 57eb8647
Author:    Niklas Radomski <nradomski at openjdk.org>
Committer: Martin Doerr <mdoerr at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/57eb864765f38185f8db8f1d37681d6cfe2a3c73
Stats:     1521 lines in 8 files changed: 1519 ins; 0 del; 2 mod

8276927: [PPC64] Port shenandoahgc to linux on ppc64le

Reviewed-by: rkennke, ihse, mdoerr

-------------

PR: https://git.openjdk.java.net/jdk/pull/6325

From smarks at openjdk.java.net  Thu Nov 18 19:30:49 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 19:30:49 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v2]
In-Reply-To: <WnXJ-krRcrrWjNIWs1CmXmEWMNSs18zbYxGuQcgdfRo=.c45b93ae-2f4b-460d-8cf9-8e3fcb5289c1@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
 <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
 <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
 <t6r4zgk8qUMXdSDYBxr-V6-KXAaB0nmawicCyc5JZoA=.cca430e6-b4e6-4eb4-b820-5be99f3574dd@github.com>
 <RtH2sOhHogV5vWttU3NuYCyCGvJWHg_WNL3mg-Rj1ag=.76a419ed-284f-41b1-9b4b-e25e3e9d338b@github.com>
 <WnXJ-krRcrrWjNIWs1CmXmEWMNSs18zbYxGuQcgdfRo=.c45b93ae-2f4b-460d-8cf9-8e3fcb5289c1@github.com>
Message-ID: <wj3PTvnCvI3VBdPWMKZgLTNUNu1RCSXQ2m-tYY1wxZo=.2c76a021-8aff-4a38-b1ec-3c565ff1618c@github.com>

On Thu, 18 Nov 2021 15:05:49 GMT, Peter Levart <plevart at openjdk.org> wrote:

>> Or, you could move the static initialization block that starts the finalizer thread into the Finalizer.FinalizerThread class itself and then arrange for that class to be initialized explicitly immediately after the Finalizer class, but conditionally, only if the option to disable finalization was not specified...
>> This way the Finalizer class could still be initialized early, but the thread would not be started if it is not needed.
>
> If you then need this "flag" in the assert of registerFinalizer and runFinalization, you could use unsafe.shouldBeInitialized(Finalizer.FinalizerThread.class) as a means to find out whether the flag was set or not...

The disable-finalization feature is a bit more than experimental. The goal is to provide a faithful representation of what the system will look like when finalization is removed. Of course most of that is objects' `finalize` methods not being called, but it also includes having no finalizer thread running, as well as having `runFinalization` (a public API) do nothing at all. Thus I think it's useful to have the flag visible to Java.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Thu Nov 18 19:30:51 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 19:30:51 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v2]
In-Reply-To: <wvBJBIkLa6ii27Y-haBT-3V1rzd1x3jhmwOQTu4lj3E=.b1110b6a-1818-48ee-bea0-f405ba07be29@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <f3w2BhEricECDGSIpGaZkuSKjTFus9LK-ZBV9BxvymM=.0a67f8df-912f-47c5-969d-b5a8161c31bf@github.com>
 <sw5vIggPSwpfKnyNbt8UExUxh0YWm5MXRI3tcwZSgPo=.a5b6fb7f-d344-4227-af96-5dee5d3d991e@github.com>
 <U84HM_PqW_WsrjHouuAdxceM6fYs8C9a0LOlhI5TnF8=.3a879bd2-696a-4209-ad39-d6e9d871bb0f@github.com>
 <cTtwBoryFv_Jk5upDHI3n0cAh_fERT_X5Kx9Gvxpx98=.f0021e32-acf1-4e05-b175-504771b98b48@github.com>
 <z41cE3BjXwZFI7VSTlJfbXkxB7hndTdJfrr16svSwKY=.d1dc118f-30bf-4d25-b94b-4ed9679b3159@github.com>
 <wvBJBIkLa6ii27Y-haBT-3V1rzd1x3jhmwOQTu4lj3E=.b1110b6a-1818-48ee-bea0-f405ba07be29@github.com>
Message-ID: <mAAaEV3fEeU4NSgqew6wdu5Hc7XlFBYi73YHZGbQnAY=.2fcd787d-c396-4bb3-9669-7184310f6856@github.com>

On Thu, 18 Nov 2021 07:52:18 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there?
>
> `registerFinalizer` does not expect to be called and only uses the "flag" as a form of assertion.
> 
> `runFinalization` is called from Java code.

@dholmes-ora If the Finalizer class is initialized explicitly and at the right time, then maybe we can do away with the Holder class entirely. Can you point me to where this is done?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Thu Nov 18 20:05:15 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 20:05:15 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
Message-ID: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>

> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
> 
> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).

Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:

  Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6442/files
  - new: https://git.openjdk.java.net/jdk/pull/6442/files/911af0b1..5df8bf9f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=01-02

  Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6442.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442

PR: https://git.openjdk.java.net/jdk/pull/6442

From mchung at openjdk.java.net  Thu Nov 18 20:16:37 2021
From: mchung at openjdk.java.net (Mandy Chung)
Date: Thu, 18 Nov 2021 20:16:37 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <WJ1mUrSUgXOp_f_ajBnOO9osSd4CySGDOTSZxxrx0iA=.1478bcd7-6767-4a84-bafd-aa86c4fdbf86@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
 <WJ1mUrSUgXOp_f_ajBnOO9osSd4CySGDOTSZxxrx0iA=.1478bcd7-6767-4a84-bafd-aa86c4fdbf86@github.com>
Message-ID: <rOixlVlkVwXNzuZwxEOMC68XN_9dV1mOMfetQP_A7Lw=.f90c217a-836c-4d3e-b1a9-655107b7434e@github.com>

On Thu, 18 Nov 2021 06:49:03 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> src/hotspot/share/prims/jvm.cpp line 694:
>> 
>>> 692: 
>>> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env))
>>> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;
>> 
>> missing indentation
>
> I think this could just be `return InstanceKlass::finalization_enabled();`.  There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately.

One typical way for the VM to pass arguments to the library is via private system properties.   System::initPhase1 saves the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties).

You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example.

I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From mchung at openjdk.java.net  Thu Nov 18 21:18:38 2021
From: mchung at openjdk.java.net (Mandy Chung)
Date: Thu, 18 Nov 2021 21:18:38 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
Message-ID: <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com>

On Thu, 18 Nov 2021 20:05:15 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>> 
>> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).
>
> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups.

When finalization is disabled, perhaps `jcmd GC.finalizer_info` should just be made a no-op in the VM.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Thu Nov 18 21:22:54 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 21:22:54 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <rOixlVlkVwXNzuZwxEOMC68XN_9dV1mOMfetQP_A7Lw=.f90c217a-836c-4d3e-b1a9-655107b7434e@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
 <WJ1mUrSUgXOp_f_ajBnOO9osSd4CySGDOTSZxxrx0iA=.1478bcd7-6767-4a84-bafd-aa86c4fdbf86@github.com>
 <rOixlVlkVwXNzuZwxEOMC68XN_9dV1mOMfetQP_A7Lw=.f90c217a-836c-4d3e-b1a9-655107b7434e@github.com>
Message-ID: <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com>

On Thu, 18 Nov 2021 20:13:23 GMT, Mandy Chung <mchung at openjdk.org> wrote:

>> I think this could just be `return InstanceKlass::finalization_enabled();`.  There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately.
>
> One typical way for the VM to pass arguments to the library is via private system properties.   System::initPhase1 saves the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties).
> 
> You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example.
> 
> I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup.

I renamed the function to `is_finalization_enabled` per previous comment, and I also made these cleanups.
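
For reference, the two return forms discussed above behave the same. The sketch below uses local stand-ins (`sketch_jboolean`, `SKETCH_JNI_TRUE`, `SKETCH_JNI_FALSE`, `g_finalization_enabled`) so that it compiles without jni.h; it assumes the usual jni.h definitions, where `jboolean` is an unsigned 8-bit type and the two constants are 1 and 0. It is not the code in jvm.cpp.

```cpp
#include <cassert>

// Local stand-ins (assumption: they mirror the usual jni.h definitions).
typedef unsigned char sketch_jboolean;
constexpr sketch_jboolean SKETCH_JNI_FALSE = 0;
constexpr sketch_jboolean SKETCH_JNI_TRUE  = 1;

// Hypothetical flag standing in for the VM-side state.
static bool g_finalization_enabled = true;

bool is_finalization_enabled() { return g_finalization_enabled; }

// Explicit mapping via the conditional operator.
sketch_jboolean explicit_form() {
  return is_finalization_enabled() ? SKETCH_JNI_TRUE : SKETCH_JNI_FALSE;
}

// Implicit conversion: a C++ bool converts to 1 (true) or 0 (false).
sketch_jboolean implicit_form() {
  return is_finalization_enabled();
}
```

Both functions return 1 when the flag is set and 0 when it is not, so the choice between them is a matter of style.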

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From rkennke at openjdk.java.net  Thu Nov 18 21:35:51 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Thu, 18 Nov 2021 21:35:51 GMT
Subject: Integrated: 8275527: Refactor forward pointer access
In-Reply-To: <lLd1nmhXCgBhAySmq81KrMplWUSWMCJS4OybGZuMjco=.4012548b-6f1d-44ee-a59c-21f1077cba01@github.com>
References: <lLd1nmhXCgBhAySmq81KrMplWUSWMCJS4OybGZuMjco=.4012548b-6f1d-44ee-a59c-21f1077cba01@github.com>
Message-ID: <ABql1QFqXem4aRwxCiAqQiDqrAkZCXvtgoHhd20k70Y=.31336a91-c57f-4af5-a97c-5efe8a47cf4d@github.com>

On Thu, 14 Oct 2021 16:37:02 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> Accessing the forward pointer is currently a little inconsistent. Some code paths call oopDesc::forwardee() / oopDesc::is_forwarded(), some code paths call forwardee() and check it for ==/!= NULL, some code paths even call markWord::decode_pointer() and markWord::is_marked() instead.
> 
> This change attempts to make the situation more consistent. For simple cases it preserves oopDesc::forwardee() / is_forwarded(), some cases need to use the markWord for consistency in concurrent GC, they now use markWord::forwardee() and markWord::is_forwarded(). Also, checking whether or not an object is forwarded is now consistently done using is_forwarded() and not by checking forwardee ==/!= NULL. This also resolves the mess in G1 full GC that changes not-forwarded objects to have a NULL (fake-) pointer. This is not necessary, because we can just as well use the lock bits to determine whether or not the object is forwarded.
> 
> Testing:
>  - [x] tier1
>  - [x] tier2
>  - [x] hotspot_gc

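As a rough illustration of the description above, the following sketch checks "is forwarded" through the low lock bits rather than by comparing the forwardee against NULL. `SketchMark` and `SketchObject` are simplified, hypothetical models; the bit patterns are chosen for illustration and are not claimed to match the real markWord layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a mark word.
class SketchMark {
 public:
  static constexpr std::uintptr_t kLockMask = 0x3;  // low two bits
  static constexpr std::uintptr_t kMarked   = 0x3;  // pattern meaning "forwarded"

  explicit SketchMark(std::uintptr_t v) : value_(v) {}

  // Store the target address with the "forwarded" pattern in the low bits.
  static SketchMark encode_forwardee(void* target) {
    return SketchMark(reinterpret_cast<std::uintptr_t>(target) | kMarked);
  }

  // The lock bits alone decide whether the word holds a forwarding pointer.
  bool is_forwarded() const { return (value_ & kLockMask) == kMarked; }

  // Strip the lock bits to recover the target address.
  void* forwardee() const { return reinterpret_cast<void*>(value_ & ~kLockMask); }

 private:
  std::uintptr_t value_;
};

struct SketchObject {
  SketchMark mark{0x1};  // low bits 01: treated here as "not forwarded"

  void forward_to(SketchObject* to) { mark = SketchMark::encode_forwardee(to); }
  bool is_forwarded() const { return mark.is_forwarded(); }
  SketchObject* forwardee() const {
    return static_cast<SketchObject*>(mark.forwardee());
  }
};
```

A caller would test `obj.is_forwarded()` first and only then read `obj.forwardee()`, so no fake NULL forwardee is needed to mean "not forwarded".
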
This pull request has now been integrated.

Changeset: 89b125f4
Author:    Roman Kennke <rkennke at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/89b125f4d4d6a467185b4b39861fd530a738e67f
Stats:     46 lines in 9 files changed: 4 ins; 26 del; 16 mod

8275527: Refactor forward pointer access

Reviewed-by: tschatzl, stefank

-------------

PR: https://git.openjdk.java.net/jdk/pull/5955

From smarks at openjdk.java.net  Thu Nov 18 21:54:45 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 21:54:45 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <G3LY1_9uMKwLJv93ncw1CLCj8jh8TqSQ_hgFp7ChmIg=.2b814625-6d7a-438d-8c8d-b91aae9b3ecb@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <G3LY1_9uMKwLJv93ncw1CLCj8jh8TqSQ_hgFp7ChmIg=.2b814625-6d7a-438d-8c8d-b91aae9b3ecb@github.com>
Message-ID: <DPvzopmaj7H7f1xkl_y-RD2k0771dSfDrzQvw798vqM=.934e5ce7-2b33-48fe-a5e9-708e6443db44@github.com>

On Thu, 18 Nov 2021 06:47:05 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups.
>
> src/hotspot/share/prims/jvm.cpp line 694:
> 
>> 692: 
>> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env))
>> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;
> 
> Suggestion:
> 
>   return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;

Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From coleenp at openjdk.java.net  Thu Nov 18 22:04:58 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 18 Nov 2021 22:04:58 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with
 SIGSEGV  in InstanceKlass::jni_id_for
Message-ID: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>

Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
Tested with mach5 tier1-3.

-------------

Commit messages:
 - 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for

Changes: https://git.openjdk.java.net/jdk/pull/6466/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6466&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277342
  Stats: 14 lines in 2 files changed: 0 ins; 12 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6466.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6466/head:pull/6466

PR: https://git.openjdk.java.net/jdk/pull/6466

From smarks at openjdk.java.net  Thu Nov 18 22:07:41 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Thu, 18 Nov 2021 22:07:41 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
 <WJ1mUrSUgXOp_f_ajBnOO9osSd4CySGDOTSZxxrx0iA=.1478bcd7-6767-4a84-bafd-aa86c4fdbf86@github.com>
 <rOixlVlkVwXNzuZwxEOMC68XN_9dV1mOMfetQP_A7Lw=.f90c217a-836c-4d3e-b1a9-655107b7434e@github.com>
 <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com>
Message-ID: <xuiaXo5o2W1SfD3zNyAXbbI2xTEKjmNF7OnbdbjqrVg=.f7eb4c6a-2ff7-4a82-a485-f5e3ed9e3d1e@github.com>

On Thu, 18 Nov 2021 21:19:44 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> One typical way for VM to pass the arguments to the library is via private system properties.   System::initPhase1 will save the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties).
>> 
>> You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example.
>> 
>> I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup.
>
> I renamed the function to `is_finalization_enabled` per previous comment, and I also made these cleanups.

Regarding using system properties, my initial prototype did this in the launcher, and it did run into the problem that the Finalizer class is initialized before system properties are available. That's why I created the Holder class, so that reading the property could be delayed until the first upcall to Finalizer::register. I suppose the initialization of Finalizer could be moved later, but that seems more invasive.

The flag needs to be available in the VM in order to avoid upcalls for instances-with-finalizers in the first place. Alan had [suggested](https://bugs.openjdk.java.net/browse/JDK-8276422?focusedCommentId=14456185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14456185) moving the argument processing into the VM, and David suggested putting the flag into InstanceKlass, which seems a sensible place to me. It's also reasonably accessible there to GC implementations, should they want to inspect it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dcubed at openjdk.java.net  Thu Nov 18 22:28:41 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 18 Nov 2021 22:28:41 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for
In-Reply-To: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
Message-ID: <EOoNidgipL8Es30fbv7p5sY0GgvSWf13df5aZPGUfxA=.db97a723-97bb-4fb3-baa5-d28ad9b49ad4@github.com>

On Thu, 18 Nov 2021 21:56:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
> Tested with mach5 tier1-3.

@coleenp - The original failure happened in Tier5...

-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From mchung at openjdk.java.net  Thu Nov 18 22:42:39 2021
From: mchung at openjdk.java.net (Mandy Chung)
Date: Thu, 18 Nov 2021 22:42:39 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <xuiaXo5o2W1SfD3zNyAXbbI2xTEKjmNF7OnbdbjqrVg=.f7eb4c6a-2ff7-4a82-a485-f5e3ed9e3d1e@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
 <WJ1mUrSUgXOp_f_ajBnOO9osSd4CySGDOTSZxxrx0iA=.1478bcd7-6767-4a84-bafd-aa86c4fdbf86@github.com>
 <rOixlVlkVwXNzuZwxEOMC68XN_9dV1mOMfetQP_A7Lw=.f90c217a-836c-4d3e-b1a9-655107b7434e@github.com>
 <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com>
 <xuiaXo5o2W1SfD3zNyAXbbI2xTEKjmNF7OnbdbjqrVg=.f7eb4c6a-2ff7-4a82-a485-f5e3ed9e3d1e@github.com>
Message-ID: <1QdFtRR9FuHw4CehdL7NxWSEsdCHU6roiGiYQEJlEO0=.ed86261e-63ca-4684-93da-f153e252643e@github.com>

On Thu, 18 Nov 2021 22:04:52 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> I renamed the function to `is_finalization_enabled` per previous comment, and I also made these cleanups.
>
> Regarding using system properties, my initial prototype did this in the launcher, and it did run into the problem that the Finalizer class is initialized before system properties are available. That's why I created the Holder class, so that reading the property could be delayed until the first upcall to Finalizer::register. I suppose the initialization of Finalizer could be moved later, but that seems more invasive.
> 
> The flag needs to be available in the VM in order to avoid upcalls for instances-with-finalizers in the first place. Alan had [suggested](https://bugs.openjdk.java.net/browse/JDK-8276422?focusedCommentId=14456185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14456185) moving the argument processing into the VM, and David suggested putting the flag into InstanceKlass, which seems a sensible place to me. It's also reasonably accessible there to GC implementations, should they want to inspect it.

> Alan had suggested moving the argument processing into the VM, and David suggested putting the flag into InstanceKlass, which seems a sensible place to me. It's also reasonably accessible there to GC implementations, should they want to inspect it.

That's still all good.  What I meant is for the VM (not the launcher) to add a private system property to pass the flag to the library code.  The precedent is something like `sun.nio.MaxDirectMemorySize` or `java.lang.Integer.IntegerCache.high`.
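
To make the suggestion concrete, here is a rough sketch of the pattern: the VM hands a set of properties to the library at startup, the library saves all of them, and only the non-private ones stay visible to applications. The class `PrivatePropsSketch` and its methods are hypothetical names used for illustration; this is not the `System` or `jdk.internal.misc.VM` code, and the key `jdk.finalization.disabled` is just the example name used earlier in this thread.

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical model of "private system properties"; not the JDK implementation.
class PrivatePropsSketch {
    // Keys that library code may read but applications should not see (assumed list).
    static final Set<String> PRIVATE_KEYS = Set.of("jdk.finalization.disabled");

    // Snapshot of everything the VM passed up, private keys included.
    private static Map<String, String> saved = Map.of();

    // Models the startup step: save all properties, return only the public subset.
    static Properties init(Map<String, String> fromVm) {
        saved = Map.copyOf(fromVm);
        Properties visible = new Properties();
        fromVm.forEach((k, v) -> {
            if (!PRIVATE_KEYS.contains(k)) {
                visible.setProperty(k, v);
            }
        });
        return visible;
    }

    // Library-internal lookup that still sees the private keys.
    static String getSavedProperty(String key) {
        return saved.get(key);
    }

    public static void main(String[] args) {
        Properties visible = init(Map.of("jdk.finalization.disabled", "true",
                                         "java.version", "18"));
        System.out.println(visible.getProperty("jdk.finalization.disabled"));
        System.out.println(getSavedProperty("jdk.finalization.disabled"));
    }
}
```

The point of the split is that the flag reaches the library without becoming part of the public property set.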

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From david.holmes at oracle.com  Thu Nov 18 23:23:37 2021
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 19 Nov 2021 09:23:37 +1000
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <rOixlVlkVwXNzuZwxEOMC68XN_9dV1mOMfetQP_A7Lw=.f90c217a-836c-4d3e-b1a9-655107b7434e@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com>
 <WJ1mUrSUgXOp_f_ajBnOO9osSd4CySGDOTSZxxrx0iA=.1478bcd7-6767-4a84-bafd-aa86c4fdbf86@github.com>
 <rOixlVlkVwXNzuZwxEOMC68XN_9dV1mOMfetQP_A7Lw=.f90c217a-836c-4d3e-b1a9-655107b7434e@github.com>
Message-ID: <3d0c8442-459c-54ef-6693-6e09cbfa5bbd@oracle.com>

Hi Mandy,

On 19/11/2021 6:16 am, Mandy Chung wrote:
> On Thu, 18 Nov 2021 06:49:03 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> 
>>> src/hotspot/share/prims/jvm.cpp line 694:
>>>
>>>> 692:
>>>> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env))
>>>> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE;
>>>
>>> missing indentation
>>
>> I think this could just be `return InstanceKlass::finalization_enabled();`.  There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately.
> 
> One typical way for VM to pass the arguments to the library is via private system properties.   System::initPhase1 will save the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties).

The Finalizer class is initialized before initPhase1() happens. So to 
use a property the Holder class had to be introduced to be initialized 
after initPhase1().

There is always a choice of having the VM push up a system property to 
the Java code, or the Java code calling down to query the VM. The VM 
call seems simpler/cheaper/cleaner in this case.

Cheers,
David

> You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example.
> 
> I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup.
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/6442
> 

From coleenp at openjdk.java.net  Thu Nov 18 23:27:47 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 18 Nov 2021 23:27:47 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for
In-Reply-To: <EOoNidgipL8Es30fbv7p5sY0GgvSWf13df5aZPGUfxA=.db97a723-97bb-4fb3-baa5-d28ad9b49ad4@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
 <EOoNidgipL8Es30fbv7p5sY0GgvSWf13df5aZPGUfxA=.db97a723-97bb-4fb3-baa5-d28ad9b49ad4@github.com>
Message-ID: <qiR8brwETgFVEoe2KgfNQVK7nucvlnWwAsHdWKGyo6M=.7dad4f5c-1263-4186-9990-ab4973784373@github.com>

On Thu, 18 Nov 2021 22:25:35 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
>> Tested with mach5 tier1-3.
>
> @coleenp - The original failure happened in Tier5...

@dcubed-ojdk thanks Dan.  I'll rerun tier5 on our default platforms.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From dholmes at openjdk.java.net  Thu Nov 18 23:27:50 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 23:27:50 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
Message-ID: <hth04oedqIMLgf4muLy635XGvfzLnCYf2O7sG6rcjvk=.8813cf39-a936-4dc2-b287-59d518cffa7b@github.com>

On Thu, 18 Nov 2021 20:05:15 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>> 
>> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).
>
> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups.

Marked as reviewed by dholmes (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From bchristi at openjdk.java.net  Thu Nov 18 23:39:43 2021
From: bchristi at openjdk.java.net (Brent Christian)
Date: Thu, 18 Nov 2021 23:39:43 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
 <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com>
Message-ID: <YTASD5_5y15YeRn25J_mAc1hqS3sZJM4497QUTEbjzw=.a12a2486-9d75-41be-86d1-a9ebfddadd98@github.com>

On Thu, 18 Nov 2021 21:15:11 GMT, Mandy Chung <mchung at openjdk.org> wrote:

> When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM.

Would it be interesting (perhaps in a follow-up) for GC.finalizer_info to report that the given VM had finalization disabled?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Fri Nov 19 00:14:18 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Fri, 19 Nov 2021 00:14:18 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
Message-ID: <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>

> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
> 
> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).

Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:

  Remove Finalizer.Holder class.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6442/files
  - new: https://git.openjdk.java.net/jdk/pull/6442/files/5df8bf9f..e357eeec

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=02-03

  Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6442.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Fri Nov 19 00:17:41 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Fri, 19 Nov 2021 00:17:41 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
Message-ID: <NG2-01MccUGiwxMb7gXCIGkmjrCH9oXbHny0BW0JhnA=.d38611a7-4aa1-4711-9a4b-fb5e29116fcf@github.com>

On Thu, 18 Nov 2021 04:13:21 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Remove Finalizer.Holder class.
>
> src/java.base/share/classes/java/lang/ref/Finalizer.java line 195:
> 
>> 193: 
>> 194:     static {
>> 195:         if (Holder.ENABLED) {
> 
> Hello Stuart,
> My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. In this case here, the `Holder` is being used right within the `static` block of the `Finalizer` class, that too as the first thing. In this case, is that `Holder` class necessary?

I pushed an update to remove the Holder class. It seems to continue to work fine. Thanks for pointing this out @jaikiran !
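
For anyone following along, a minimal sketch of why the holder buys nothing here: if the outer class's static initializer reads the holder's field, the holder is initialized at that same point. `HolderDemo`, `Outer`, `Holder` and `LOG` are hypothetical names for illustration, not the `Finalizer` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of initialization order with a "lazy" holder class.
class HolderDemo {
    // Records the order in which the static initializers run.
    static final List<String> LOG = new ArrayList<>();

    static class Outer {
        static {
            LOG.add("Outer.<clinit> start");
            // Reading Holder.ENABLED forces Holder to initialize right here.
            if (Holder.ENABLED) {
                LOG.add("Outer sees ENABLED");
            }
            LOG.add("Outer.<clinit> end");
        }

        // Any static access triggers Outer's initialization.
        static void touch() {}

        static class Holder {
            // Not a compile-time constant, so the read above triggers initialization.
            static final boolean ENABLED = compute();

            static boolean compute() {
                LOG.add("Holder.<clinit>");
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Outer.touch();
        LOG.forEach(System.out::println);
    }
}
```

The holder's initializer runs in the middle of the outer initializer, so nothing is deferred compared with a plain static field in the outer class.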

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Fri Nov 19 01:02:41 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 19 Nov 2021 01:02:41 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
Message-ID: <jkeqaOnxyjD0r-pnJvzEYb3zBy8YhpkLDg2ZP8ehgK0=.b1bd30ca-a6ce-411e-98f6-0400043eb307@github.com>

On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>> 
>> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).
>
> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove Finalizer.Holder class.

Good simplification.

src/java.base/share/classes/java/lang/ref/Finalizer.java line 64:

> 62:     }
> 63: 
> 64:     static final boolean ENABLED = isFinalizationEnabled();

private?

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Fri Nov 19 01:06:38 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 19 Nov 2021 01:06:38 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <YTASD5_5y15YeRn25J_mAc1hqS3sZJM4497QUTEbjzw=.a12a2486-9d75-41be-86d1-a9ebfddadd98@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
 <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com>
 <YTASD5_5y15YeRn25J_mAc1hqS3sZJM4497QUTEbjzw=.a12a2486-9d75-41be-86d1-a9ebfddadd98@github.com>
Message-ID: <kGrNumOn8SMaxQk_ytrzJy43rIBpwggv1Ldv00_NmVQ=.ade6b505-0d71-4027-8ff7-960d8097e9d4@github.com>

On Thu, 18 Nov 2021 23:36:23 GMT, Brent Christian <bchristi at openjdk.org> wrote:

> When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM.

Yes that is a trivial change to add. @stuart-marks I can provide the code. You can choose whether to include in this PR or else we can do a follow-up.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Fri Nov 19 01:34:45 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 19 Nov 2021 01:34:45 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
Message-ID: <YHF8iEp-JYWsoiCbqYalPThEXrAr8CYytKwWpZFK57Q=.905fdd9c-0d33-4880-b2cc-c2eb1d5b9e47@github.com>

On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>> 
>> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).
>
> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove Finalizer.Holder class.

@stuart-marks : https://github.com/openjdk/jdk/pull/6469  (didn't intend to actually make a PR but clicked the wrong part of the button :) )

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Fri Nov 19 02:05:42 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 19 Nov 2021 02:05:42 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for
In-Reply-To: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
Message-ID: <LrjgV80FW0lgNt59LkJj3_c-4V6QMqHP7flCbEQEPCk=.02ebb41a-92c7-4c6d-b733-dbeeb3d866fe@github.com>

On Thu, 18 Nov 2021 21:56:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
> Tested with mach5 tier1-3.

Hi Coleen,

The changes in themselves seem fine. My only concern is whether always locking will introduce contention and impact performance. The code was attempting the classic pattern of doing a lock-free query first, but as you note it lacks the necessary memory ordering operations. So if needed we could make the lock-free path work correctly.
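
As an illustration of what a correct lock-free fast path looks like in general, here is a sketch in Java rather than HotSpot C++. It is an analogy under assumptions, not a proposed patch: `IdCache`, `Node` and `idFor` are hypothetical names, and in the VM the acquire/release pairing would be expressed with HotSpot's own ordering primitives instead of `AtomicReference`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical cache of per-offset ids: lock-free lookup, locked insertion.
class IdCache {
    // Immutable list node; once published it is never modified.
    static final class Node {
        final int offset;
        final Node next;

        Node(int offset, Node next) {
            this.offset = offset;
            this.next = next;
        }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();
    private final Object lock = new Object();

    // Walks the list starting at n and returns the node for offset, or null.
    private static Node find(Node n, int offset) {
        for (; n != null; n = n.next) {
            if (n.offset == offset) {
                return n;
            }
        }
        return null;
    }

    Node idFor(int offset) {
        // Fast path: the acquire load pairs with the release store below, so a
        // reader that sees a new head also sees that node's initialized fields.
        Node found = find(head.getAcquire(), offset);
        if (found != null) {
            return found;
        }
        synchronized (lock) {
            // Retry the lookup under the lock; another thread may have added it.
            Node current = head.get();
            found = find(current, offset);
            if (found == null) {
                found = new Node(offset, current);
                head.setRelease(found); // publish only after full construction
            }
            return found;
        }
    }

    public static void main(String[] args) {
        IdCache cache = new IdCache();
        System.out.println(cache.idFor(8) == cache.idFor(8));
    }
}
```

Whether the extra complexity is worth it depends on whether the always-lock version shows measurable contention.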

Thanks,
David

src/hotspot/share/oops/instanceKlass.cpp line 2064:

> 2062: }
> 2063: 
> 2064: /* jni_id_forfor jfieldIds only */

space needed between for's :)

src/hotspot/share/oops/instanceKlass.cpp line 2067:

> 2065: JNIid* InstanceKlass::jni_id_for(int offset) {
> 2066:   MutexLocker ml(JfieldIdCreation_lock);
> 2067:   // Retry lookup after we got the lock

The comment doesn't make sense now as there is only one lookup.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6466

From smarks at openjdk.java.net  Fri Nov 19 02:32:41 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Fri, 19 Nov 2021 02:32:41 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <kGrNumOn8SMaxQk_ytrzJy43rIBpwggv1Ldv00_NmVQ=.ade6b505-0d71-4027-8ff7-960d8097e9d4@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
 <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com>
 <YTASD5_5y15YeRn25J_mAc1hqS3sZJM4497QUTEbjzw=.a12a2486-9d75-41be-86d1-a9ebfddadd98@github.com>
 <kGrNumOn8SMaxQk_ytrzJy43rIBpwggv1Ldv00_NmVQ=.ade6b505-0d71-4027-8ff7-960d8097e9d4@github.com>
Message-ID: <NWK7KqbG-ARwRUXSe2zj5Q8P8H_TICmehZT_G5Dnq40=.5cf3912b-4977-47e6-98f9-e4e411a47bae@github.com>

On Fri, 19 Nov 2021 01:03:22 GMT, David Holmes <dholmes at openjdk.org> wrote:

> > When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM.
> 
> Yes that is a trivial change to add. @stuart-marks I can provide the code. You can choose whether to include in this PR or else we can do a follow-up.

Seems simple enough. Is there any testing that needs to be done for this? Does jcmd output require CSR review? I guess there would be a compatibility issue if there were something that was parsing the output of jcmd. Or is it solely intended to be read by humans?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Fri Nov 19 02:35:44 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Fri, 19 Nov 2021 02:35:44 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <jkeqaOnxyjD0r-pnJvzEYb3zBy8YhpkLDg2ZP8ehgK0=.b1bd30ca-a6ce-411e-98f6-0400043eb307@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
 <jkeqaOnxyjD0r-pnJvzEYb3zBy8YhpkLDg2ZP8ehgK0=.b1bd30ca-a6ce-411e-98f6-0400043eb307@github.com>
Message-ID: <OFHUUV9ddjmpiwYeM0u9J93ePEOq7A2TtXNDWjg9Jqw=.ce00fd2c-a6cf-4118-8c4d-b8794e2edd83@github.com>

On Fri, 19 Nov 2021 00:59:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Remove Finalizer.Holder class.
>
> src/java.base/share/classes/java/lang/ref/Finalizer.java line 64:
> 
>> 62:     }
>> 63: 
>> 64:     static final boolean ENABLED = isFinalizationEnabled();
> 
> private?

Yeah, probably should be private. Other stuff in this class is private except things that are used from outside.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From coleenp at openjdk.java.net  Fri Nov 19 02:39:15 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 19 Nov 2021 02:39:15 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for [v2]
In-Reply-To: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
Message-ID: <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>

> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
> Tested with mach5 tier1-3.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Fix comments.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6466/files
  - new: https://git.openjdk.java.net/jdk/pull/6466/files/47cb164b..c44b86d5

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6466&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6466&range=00-01

  Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6466.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6466/head:pull/6466

PR: https://git.openjdk.java.net/jdk/pull/6466

From coleenp at openjdk.java.net  Fri Nov 19 02:42:41 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 19 Nov 2021 02:42:41 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for [v2]
In-Reply-To: <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
 <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
Message-ID: <eIkGUGFEFc6C1mabxMqBdQG6Hmm9S9Nq4VpWMbt118s=.88e3310c-497c-4307-ba65-6fe90de8e86e@github.com>

On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
>> Tested with mach5 tier1-3.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments.

Thanks for reviewing.  I fixed the comments.  I will see if I can find some jni performance tests tomorrow.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From jpai at openjdk.java.net  Fri Nov 19 04:24:37 2021
From: jpai at openjdk.java.net (Jaikiran Pai)
Date: Fri, 19 Nov 2021 04:24:37 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <NG2-01MccUGiwxMb7gXCIGkmjrCH9oXbHny0BW0JhnA=.d38611a7-4aa1-4711-9a4b-fb5e29116fcf@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <aY7bIf24C-sA8R3hoi6dHmqd2R6QgzGUMg4UiWsy_5w=.a7144786-a08d-4def-9468-f13848200656@github.com>
 <NG2-01MccUGiwxMb7gXCIGkmjrCH9oXbHny0BW0JhnA=.d38611a7-4aa1-4711-9a4b-fb5e29116fcf@github.com>
Message-ID: <75LlCxFbmJ2QwkczlGFpPQN7Gl3gAsfylqufYtIOkcI=.427249fe-b817-4eae-9395-9c7095d05839@github.com>

On Fri, 19 Nov 2021 00:14:34 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/ref/Finalizer.java line 195:
>> 
>>> 193: 
>>> 194:     static {
>>> 195:         if (Holder.ENABLED) {
>> 
>> Hello Stuart,
>> My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. In this case here, the `Holder` is being used right within the `static` block of the `Finalizer` class, that too as the first thing. In this case, is that `Holder` class necessary?
>
> I pushed an update to remove the Holder class. It seems to continue to work fine. Thanks for pointing this out @jaikiran !

Thank you Stuart, this changed version looks fine to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Fri Nov 19 04:59:39 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 19 Nov 2021 04:59:39 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v3]
In-Reply-To: <NWK7KqbG-ARwRUXSe2zj5Q8P8H_TICmehZT_G5Dnq40=.5cf3912b-4977-47e6-98f9-e4e411a47bae@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com>
 <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com>
 <YTASD5_5y15YeRn25J_mAc1hqS3sZJM4497QUTEbjzw=.a12a2486-9d75-41be-86d1-a9ebfddadd98@github.com>
 <kGrNumOn8SMaxQk_ytrzJy43rIBpwggv1Ldv00_NmVQ=.ade6b505-0d71-4027-8ff7-960d8097e9d4@github.com>
 <NWK7KqbG-ARwRUXSe2zj5Q8P8H_TICmehZT_G5Dnq40=.5cf3912b-4977-47e6-98f9-e4e411a47bae@github.com>
Message-ID: <T4cc-J5CulYZAC3vh2SpmJQoPOVFpMxiS_UGLdlTUWk=.9b96be48-1804-47d9-9427-7ec95ba6f9df@github.com>

On Fri, 19 Nov 2021 02:29:33 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>>> When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM.
>> 
>> Yes that is a trivial change to add. @stuart-marks I can provide the code. You can choose whether to include in this PR or else we can do a follow-up.
>
>> > When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM.
>> 
>> Yes that is a trivial change to add. @stuart-marks I can provide the code. You can choose whether to include in this PR or else we can do a follow-up.
> 
> Seems simple enough. Is there any testing that needs to be done for this? Does jcmd output require CSR review? I guess there would be a compatibility issue if there were something that was parsing the output of jcmd. Or is it solely intended to be read by humans?

@stuart-marks No CSR needed for this as no output format is specified. Plus this command already has a simple text response when there are no finalizers queued. E.g.
```
> ../build/linux-x64-debug-finalization/images/jdk/bin/jcmd 27939 GC.finalizer_info
27939:
No instances waiting for finalization found
```

so when finalization is disabled this just becomes:
```
> ../build/linux-x64-debug-finalization/images/jdk/bin/jcmd 28018 GC.finalizer_info
28018:
Finalization is disabled
```

There is a test for this Dcmd, but it doesn't test the "nothing here" case so I don't think it is necessary to augment it for this case:
`hotspot/jtreg/serviceability/dcmd/gc/FinalizerInfoTest.java`

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From kbarrett at openjdk.java.net  Fri Nov 19 05:48:51 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Fri, 19 Nov 2021 05:48:51 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
Message-ID: <_pYTRBr5EsVrnnMnDOgp1KiJGvDpyqb5cPVgWRu88mA=.abedd733-19e3-4f57-9714-cfe7da8f96d1@github.com>

On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>> 
>> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).
>
> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove Finalizer.Holder class.

Marked as reviewed by kbarrett (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From thartmann at openjdk.java.net  Fri Nov 19 07:07:38 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 19 Nov 2021 07:07:38 GMT
Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
Message-ID: <510uXpyEyiEC6_m1bWk_USuJ1kpv5QSgImi1NheRBkw=.52c4fbdf-713c-4868-ade3-2d52c155d915@github.com>

On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541
> 
> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed, though, because C2 might still compile that path when it cannot prove at parse time that the path is unreachable. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
> 
> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
> 
> Thanks,
> Tobias

Thanks for checking!
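
As a side note for readers unfamiliar with the `NEG`-as-`SUB` workaround mentioned in the description, here is a scalar sketch of the idea on plain arrays. It does not use `jdk.incubator.vector` and is not the code in `LongVector.java`; `NegAsSub` and `negViaSub` are names invented for illustration.

```java
final class NegAsSub {
    // Lane-wise negation expressed as subtraction from zero: r[i] = 0 - a[i].
    // Two's-complement wrap-around makes this agree with unary minus for every long,
    // including Long.MIN_VALUE.
    static long[] negViaSub(long[] a) {
        long[] r = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = 0L - a[i];
        }
        return r;
    }
}
```

Because the Java side rewrites the operation this way, `NEG` is never expected to reach the intrinsic at run time, which is exactly the assumption that C2 cannot always confirm at parse time.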

-------------

PR: https://git.openjdk.java.net/jdk/pull/6428

From thartmann at openjdk.java.net  Fri Nov 19 07:10:49 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 19 Nov 2021 07:10:49 GMT
Subject: Integrated: 8275643: C2's unaryOp vector intrinsic does not properly
 handle LongVector.neg
In-Reply-To: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
References: <VIWd1DGe48UKFJdi61wqQqYOOuFed9_yEgqMaz1444k=.43162c19-75d4-4806-81dc-29ad772d7155@github.com>
Message-ID: <lZwj0OwxTtg3IVj84Yw_FeyNekOyrmdvpYkansqXD88=.1a6caa7d-9191-4cb5-82ef-3428c4358f02@github.com>

On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541
> 
> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed, though, because C2 might still compile that path when it cannot prove at parse time that the path is unreachable. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0:
> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394
> 
> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
> 
> Thanks,
> Tobias

This pull request has now been integrated.

Changeset: 47564cae
Author:    Tobias Hartmann <thartmann at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/47564caeb0628e5c03a0e7f04093adce77d6dd3b
Stats:     51 lines in 2 files changed: 51 ins; 0 del; 0 mod

8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg

Reviewed-by: chagedorn, sviswanathan

-------------

PR: https://git.openjdk.java.net/jdk/pull/6428

From thartmann at openjdk.java.net  Fri Nov 19 07:12:54 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 19 Nov 2021 07:12:54 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches [v2]
In-Reply-To: <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
 <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
Message-ID: <xJhJDd9xvsC7f6Eeoqqk-ts2iTHVfdjt89HErKDCr_4=.df22ddb6-51c2-4e9f-bcde-a6cae93c2613@github.com>

On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>> 
>> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>> 
>> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
>> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>> 
>> and while this was happening we got a huge number of ICBufferFull safepoints.
>> 
>> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
>> 
>> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         oop ic_oop = ic->cached_oop();
>>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           if (ic_oop->is_compiledICHolder()) {
>>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>>             if (is_alive->do_object_b(
>>                   cichk_oop->holder_method()->method_holder()) &&
>>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>>               continue;
>>             }
>>           }
>>           ic->set_to_clean();
>>           assert(ic->cached_oop() == NULL,
>>                  "cached oop in IC should be cleared");
>>         }
>>       }
>> 
>> 
>> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         if (ic->is_icholder_call()) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>>               continue;
>>             }
>>         } else {
>>           Metadata* ic_oop = ic->cached_metadata();
>>           if (ic_oop != NULL) {
>>             if (ic_oop->is_klass()) {
>>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else if (ic_oop->is_method()) {
>>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else {
>>               ShouldNotReachHere();
>>             }
>>           }
>>           }
>>           ic->set_to_clean();
>>       }
>> 
>> 
>> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
>> 
>> To understand why this is causing the problems we are seeing it's good to start by reading:
>> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>> 
>> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>> 
>> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>> 
>> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>> 
>> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>> 
>> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>> 
>> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>> 
>> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>> 
>> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
>> 
>> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
>> 
>> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
>> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
>> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
>> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
>> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
>> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
>> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
>> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
>> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
>> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
>> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
>> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
>> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
>> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
>> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
>> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
>> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
>> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
>> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
>> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
>> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
>> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
>> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
>> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
>> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
>> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
>> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
>> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
>> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
>> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
>> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
>> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
>> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
>> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
>> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
>> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
>> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
>> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
>> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
>> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
>> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
>> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
>> 
>> 
>> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
>> 
>> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
>> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
>> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
>> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
>> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
>> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
>> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
>> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
>> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
>> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
>> 
>> 
>> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
>> 
>> I've run the patch through tier1-7.
>> 
>> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review Coleen

Good catch and great summary!
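
To make the scoping issue easy to see outside HotSpot, here is a small model of the control flow in the rewritten loop quoted above. It is a sketch, not the patch: `IcCleaningModel`, `Kind`, `InlineCache` and both method names are invented, liveness is reduced to a boolean, and the guarded variant is only my reading of "restoring the set_to_clean code to the right scope" (skip entries whose cached metadata is NULL).

```java
import java.util.Arrays;
import java.util.List;

final class IcCleaningModel {
    // NO_METADATA stands for ic_oop == NULL (megamorphic vtable call, or already clean).
    enum Kind { ICHOLDER, METADATA, NO_METADATA }

    static final class InlineCache {
        final Kind kind;
        final boolean loaderAlive; // ignored for NO_METADATA

        InlineCache(Kind kind, boolean loaderAlive) {
            this.kind = kind;
            this.loaderAlive = loaderAlive;
        }
    }

    // Control flow of the rewritten loop body; true means set_to_clean() is reached.
    static boolean rewrittenCleans(InlineCache ic) {
        if (ic.kind == Kind.ICHOLDER) {
            if (ic.loaderAlive) {
                return false; // "continue"
            }
        } else {
            if (ic.kind == Kind.METADATA) { // ic_oop != NULL
                if (ic.loaderAlive) {
                    return false; // "continue"
                }
            }
            // ic_oop == NULL falls out of the else branch without a "continue"
        }
        return true; // ic->set_to_clean()
    }

    // Assumed shape of the fix: entries without cached metadata are never cleaned.
    static boolean guardedCleans(InlineCache ic) {
        if (ic.kind == Kind.NO_METADATA) {
            return false;
        }
        return rewrittenCleans(ic);
    }

    static int countCleaned(List<InlineCache> ics, boolean guarded) {
        int cleaned = 0;
        for (InlineCache ic : ics) {
            if (guarded ? guardedCleans(ic) : rewrittenCleans(ic)) {
                cleaned++;
            }
        }
        return cleaned;
    }

    // One live icholder call, one dead metadata entry, two megamorphic vtable calls.
    static List<InlineCache> sample() {
        return Arrays.asList(
            new InlineCache(Kind.ICHOLDER, true),
            new InlineCache(Kind.METADATA, false),
            new InlineCache(Kind.NO_METADATA, false),
            new InlineCache(Kind.NO_METADATA, false));
    }
}
```

On the sample, the rewritten flow cleans three entries (the dead one plus both megamorphic ones), while the guarded flow cleans only the dead one. Each of those extra cleanings is what costs an ICStub in the concurrent case.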

-------------

Marked as reviewed by thartmann (Reviewer).
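
For triaging logs like the ones in the description, a rough helper along these lines may be handy. It is only a sketch: `SafepointLogStats` and `summarize` are invented names, and the regular expression assumes the exact `-Xlog:gc*,safepoint` line layout shown in the quoted examples (`Safepoint "ICBufferFull", ... Total: N ns`), which may differ between JDK versions and log decorations.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SafepointLogStats {
    // Matches the safepoint lines from the quoted logs and captures the "Total" value.
    private static final Pattern IC_BUFFER_FULL =
        Pattern.compile("Safepoint \"ICBufferFull\".*Total: (\\d+) ns");

    // Returns { number of ICBufferFull safepoints, sum of their Total times in ns }.
    static long[] summarize(Iterable<String> lines) {
        long count = 0;
        long totalNs = 0;
        for (String line : lines) {
            Matcher m = IC_BUFFER_FULL.matcher(line);
            if (m.find()) {
                count++;
                totalNs += Long.parseLong(m.group(1));
            }
        }
        return new long[] { count, totalNs };
    }
}
```

Feeding it the lines between two "Concurrent Process Non-Strong References" markers gives a quick count of how many ICBufferFull safepoints a single unlinking phase triggered.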

PR: https://git.openjdk.java.net/jdk/pull/6450

From stefank at openjdk.java.net  Fri Nov 19 08:04:43 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Fri, 19 Nov 2021 08:04:43 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches [v2]
In-Reply-To: <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
 <hxvf5rgvpTrlGwzCyqz2rbGsJD5kNuz3U8_9iVHRh8I=.2f6ce421-2f91-4f0b-a722-3b7ac1346ebd@github.com>
Message-ID: <0tGQlAt66ViSYdooeKTmTfDeOQXmKwxN_0U7CVZ5BTw=.b6ce9010-04ba-463f-8c34-c5ddb2ac1e53@github.com>

On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>> 
>> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>> 
>> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
>> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>> 
>> and while this was happening we got a huge number of ICBufferFull safepoints.
>> 
>> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
>> 
>> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         oop ic_oop = ic->cached_oop();
>>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           if (ic_oop->is_compiledICHolder()) {
>>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>>             if (is_alive->do_object_b(
>>                   cichk_oop->holder_method()->method_holder()) &&
>>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>>               continue;
>>             }
>>           }
>>           ic->set_to_clean();
>>           assert(ic->cached_oop() == NULL,
>>                  "cached oop in IC should be cleared");
>>         }
>>       }
>> 
>> 
>> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
>> 
>>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>>         if (ic->is_icholder_call()) {
>>           // The only exception is compiledICHolder oops which may
>>           // yet be marked below. (We check this further below).
>>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>>               continue;
>>             }
>>         } else {
>>           Metadata* ic_oop = ic->cached_metadata();
>>           if (ic_oop != NULL) {
>>             if (ic_oop->is_klass()) {
>>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else if (ic_oop->is_method()) {
>>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>>                 continue;
>>               }
>>             } else {
>>               ShouldNotReachHere();
>>             }
>>           }
>>           }
>>           ic->set_to_clean();
>>       }
>> 
>> 
>> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
>> 
>> To understand why this is causing the problems we are seeing it's good to start by reading:
>> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>> 
>> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>> 
>> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>> 
>> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>> 
>> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>> 
>> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>> 
>> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>> 
>> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>> 
>> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
>> 
>> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
>> 
>> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
>> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
>> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
>> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
>> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
>> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
>> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
>> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
>> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
>> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
>> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
>> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
>> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
>> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
>> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
>> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
>> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
>> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
>> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
>> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
>> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
>> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
>> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
>> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
>> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
>> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
>> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
>> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
>> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
>> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
>> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
>> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
>> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
>> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
>> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
>> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
>> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
>> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
>> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
>> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
>> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
>> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
>> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
>> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
>> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
>> 
>> 
>> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
>> 
>> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
>> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
>> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
>> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
>> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
>> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
>> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
>> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
>> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
>> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
>> 
>> 
>> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
>> 
>> I've run the patch through tier1-7.
>> 
>> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for the discussion and explanation of the inline cache code.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review Coleen

Thanks all for reviewing!
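
For anyone checking their own -Xlog:gc*,safepoint output for the pattern described in the quoted text, the ICBufferFull lines can be tallied with a few lines of script. This is only a rough sketch: the regular expression is an assumption based on the log lines quoted above, the helper name tally_safepoints is made up for illustration, and other JDK versions or log decorations may need a different pattern.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the quoted logs:
#   ... Safepoint "<name>", Time since last: ..., Total: <n> ns
SAFEPOINT_RE = re.compile(r'Safepoint "(?P<name>[^"]+)".*Total:\s*(?P<total>\d+) ns')


def tally_safepoints(lines):
    """Return {safepoint name: (count, summed total time in ns)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = SAFEPOINT_RE.search(line)
        if match is None:
            continue  # not a safepoint line (e.g. a gc,phases line)
        entry = stats[match.group("name")]
        entry[0] += 1
        entry[1] += int(match.group("total"))
    return {name: (count, total) for name, (count, total) in stats.items()}


# Three lines copied from the logs quoted above.
SAMPLE = [
    '[38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns',
    '[38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns',
    '[125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns',
]

result = tally_safepoints(SAMPLE)
```

A large ICBufferFull count inside a single "Concurrent Process Non-Strong References" phase is the symptom to look for; the summed time alone can look small even when the phase is badly stretched.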

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From jbhateja at openjdk.java.net  Fri Nov 19 08:10:42 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 19 Nov 2021 08:10:42 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <v6YJjUUxs507gmh_JQNThW34vhS3w0FxQ_YPUUlST-g=.7a0a7e7d-def4-4dc9-b29a-efbad6081983@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
 <v6YJjUUxs507gmh_JQNThW34vhS3w0FxQ_YPUUlST-g=.7a0a7e7d-def4-4dc9-b29a-efbad6081983@github.com>
Message-ID: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com>

On Thu, 18 Nov 2021 06:55:34 GMT, Pengfei Li <pli at openjdk.org> wrote:

>> Arraycopy partial inlining is a C2 compiler technique that avoids stub
>> call overhead in small-sized arraycopy operations by generating masked
>> vector instructions. So far it works on x86 AVX512 only and this patch
>> enables it on AArch64 with SVE.
>> 
>> We add an AArch64 matching rule for VectorMaskGenNode and refactor that
>> node a little bit. The major change is moving the element type field
>> into its TypeVectMask bottom type. The reason is that AArch64 vector
>> masks are different for different vector element types.
>> 
>> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
>> lanes (of any type) is like
>> 
>> `0000 0000 ... 0000 0000 0000 0000 0111`
>> 
>> On AArch64 SVE, this mask value can only be used for masking the 3 least
>> significant lanes of bytes. But for 3 lanes of ints, the value should be
>> 
>> `0000 0000 ... 0000 0000 0001 0001 0001`
>> 
>> where the least significant bit of each lane matters. So the AArch64
>> matcher needs to know the vector element type to generate the right masks.
>> 
>> After this patch, the C2 generated code for copying a 50-byte array on
>> AArch64 SVE looks like
>> 
>>   mov     x12, #0x32
>>   whilelo p0.b, xzr, x12
>>   add     x11, x11, #0x10
>>   ld1b    {z16.b}, p0/z, [x11]
>>   add     x10, x10, #0x10
>>   st1b    {z16.b}, p0, [x10]
>> 
>> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
>> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested
>> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
>> size arguments on a 512-bit SVE-featured CPU. We got below performance
>> data changes.
>> 
>> Benchmark                  (length)  (Performance)
>> ArrayCopyAligned.testByte        10          -2.6%
>> ArrayCopyAligned.testByte        20          +4.7%
>> ArrayCopyAligned.testByte        30          +4.8%
>> ArrayCopyAligned.testByte        40         +21.7%
>> ArrayCopyAligned.testByte        50         +22.5%
>> ArrayCopyAligned.testByte        60         +28.4%
>> 
>> The test machine has SVE vector size of 512 bits, so we see performance
>> gain for most array sizes less than 64 bytes. For very small arrays we
>> see a slight regression because a vector load/store may be a bit slower
>> than 1 or 2 scalar loads/stores.
>
> The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR.

Hi @pfustc, the common type system changes look good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From jbhateja at openjdk.java.net  Fri Nov 19 08:24:42 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 19 Nov 2021 08:24:42 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
Message-ID: <yq911veiwu2SLWrfRSQKLZdmH5ZRL_yF-E9xxbg0IgE=.fb2933e7-4554-43c3-ad7b-a72df334741b@github.com>

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li <pli at openjdk.org> wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions. So far it works on x86 AVX512 only and this patch
> enables it on AArch64 with SVE.
> 
> We add an AArch64 matching rule for VectorMaskGenNode and refactor that
> node a little bit. The major change is moving the element type field
> into its TypeVectMask bottom type. The reason is that AArch64 vector
> masks are different for different vector element types.
> 
> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
> lanes (of any type) is like
> 
> `0000 0000 ... 0000 0000 0000 0000 0111`
> 
> On AArch64 SVE, this mask value can only be used for masking the 3 least
> significant lanes of bytes. But for 3 lanes of ints, the value should be
> 
> `0000 0000 ... 0000 0000 0001 0001 0001`
> 
> where the least significant bit of each lane matters. So the AArch64
> matcher needs to know the vector element type to generate the right masks.
> 
> After this patch, the C2 generated code for copying a 50-byte array on
> AArch64 SVE looks like
> 
>   mov     x12, #0x32
>   whilelo p0.b, xzr, x12
>   add     x11, x11, #0x10
>   ld1b    {z16.b}, p0/z, [x11]
>   add     x10, x10, #0x10
>   st1b    {z16.b}, p0, [x10]
> 
> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested
> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
> size arguments on a 512-bit SVE-featured CPU. We got below performance
> data changes.
> 
> Benchmark                  (length)  (Performance)
> ArrayCopyAligned.testByte        10          -2.6%
> ArrayCopyAligned.testByte        20          +4.7%
> ArrayCopyAligned.testByte        30          +4.8%
> ArrayCopyAligned.testByte        40         +21.7%
> ArrayCopyAligned.testByte        50         +22.5%
> ArrayCopyAligned.testByte        60         +28.4%
> 
> The test machine has SVE vector size of 512 bits, so we see performance
> gain for most array sizes less than 64 bytes. For very small arrays we
> see a slight regression because a vector load/store may be a bit slower
> than 1 or 2 scalar loads/stores.

The common type system changes look good to me.
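
For readers following the mask discussion in the quoted description, here is a small sketch of why the element type matters. It models an AVX512 mask as one bit per lane and an SVE predicate as one bit per vector byte with only the lowest bit of each lane set, which is how the description above presents it. The function names are invented for this illustration and are not C2 or HotSpot APIs.

```python
def avx512_mask(active_lanes: int) -> int:
    """One mask bit per lane, regardless of the element type."""
    return (1 << active_lanes) - 1


def sve_mask(active_lanes: int, elem_bytes: int) -> int:
    """One predicate bit per vector byte; only the lowest bit of each
    active lane is set, so the value depends on the element size."""
    mask = 0
    for lane in range(active_lanes):
        mask |= 1 << (lane * elem_bytes)
    return mask


three_bytes = sve_mask(3, elem_bytes=1)  # same bits as the AVX512 mask
three_ints = sve_mask(3, elem_bytes=4)   # one set bit every four positions
```

With three active lanes, the byte case gives 0b0111 and the int case gives 0b0001_0001_0001, matching the two bit patterns in the quoted text.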

-------------

Marked as reviewed by jbhateja (Committer).

PR: https://git.openjdk.java.net/jdk/pull/6444

From stuefe at openjdk.java.net  Fri Nov 19 09:33:19 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 09:33:19 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v4]
In-Reply-To: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
Message-ID: <MyBPQ13D6ne2eGxMMYekfkNtrzdp838uKKhXj4tBHdA=.0cc2230f-0ae3-4736-8a63-9196a15d10bc@github.com>

> This is one of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
> 
> This proposal adds NMT buffer overflow checking:
> 
> - it gives us C-heap overflow checking in release builds
> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
> 
> For more details, please see the JBS issue.
> 
> ----
> 
> Patch notes:
> 
> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
> 
> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
> 
> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
> 
> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
> 
> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
> 
> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
> 
> --------------
> 
> Example output a buffer overrun would provide:
> 
> 
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
> 
> -------
> 
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 4 weeks now without problems

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Extend gtests
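
As a rough illustration of the header and footer canaries described in the patch notes above, here is a toy model. It is a sketch under stated assumptions, not the code in mallocTracker.hpp: the 16-byte header, the 16-bit canary directly preceding the payload and the one-byte footer come from the description, while the canary values, the position of the size field and the function names are hypothetical.

```python
import struct

HEADER_SIZE = 16        # 64-bit header size stated in the patch notes
HEADER_CANARY = 0xE99E  # hypothetical value, chosen for this sketch
FOOTER_CANARY = 0xE8    # hypothetical value, chosen for this sketch


def tracked_malloc(size: int) -> bytearray:
    """Lay out [header | payload | 1-byte footer] in one buffer."""
    block = bytearray(HEADER_SIZE + size + 1)
    # Hypothetical field placement: payload size in the first 8 bytes.
    struct.pack_into("<Q", block, 0, size)
    # The 16-bit canary occupies the last two header bytes, so it sits
    # directly in front of the payload.
    struct.pack_into("<H", block, HEADER_SIZE - 2, HEADER_CANARY)
    block[HEADER_SIZE + size] = FOOTER_CANARY
    return block


def check_block(block: bytearray) -> str:
    """Report which canary, if any, has been overwritten."""
    (canary,) = struct.unpack_from("<H", block, HEADER_SIZE - 2)
    if canary != HEADER_CANARY:
        return "header canary broken"
    (size,) = struct.unpack_from("<Q", block, 0)
    if block[HEADER_SIZE + size] != FOOTER_CANARY:
        return "footer canary broken"
    return "ok"


intact = tracked_malloc(8)

overrun = tracked_malloc(8)
overrun[HEADER_SIZE + 8] = 0x61   # write one byte past the payload

underrun = tracked_malloc(8)
underrun[HEADER_SIZE - 1] = 0x00  # write one byte before the payload
```

Because the header canary is adjacent to the payload, even a one-byte negative overflow is caught in this model, and a one-byte footer is enough to catch the overrun case shown.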

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5952/files
  - new: https://git.openjdk.java.net/jdk/pull/5952/files/a1611e78..d3677c1f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=02-03

  Stats: 61 lines in 4 files changed: 47 ins; 0 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5952.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952

PR: https://git.openjdk.java.net/jdk/pull/5952

From stuefe at openjdk.java.net  Fri Nov 19 09:33:20 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 09:33:20 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks
In-Reply-To: <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <jJzmGoEj_VpsZNwJU0IGAE6atFbN4vl0fRvLwxGyj7M=.e48c2ff9-0fb9-4bfd-a4a1-0a9106a59760@github.com>
 <MHNw41JYAQezDRVbYJjB1t4esIEk-SjKltY4Ox6rexs=.1a843b1f-063b-4832-be99-a3be3d00410f@github.com>
 <kjoLxh2Q9IHdBfOgFjK7B6fF1FGYy3ON9TFuXv1rYto=.0229ade5-4a43-4a10-9d59-35987e67c9ba@github.com>
 <xjh3jusi2H3rzs2_PK2Efo2oN5zsg7aNZQOR6Q0dTIo=.7c57ee1d-7edf-4db8-9c9f-4bc3fb982a5a@github.com>
 <_r5qw_r-3Be7zUJuf4gcb10MFe9varAWAvix_CaJiYs=.758ad563-f2d2-466d-bcb5-1ccc6b547e94@github.com>
 <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com>
Message-ID: <Qo5uecQAEFHpajnn2hzVxXtQGqKIfBOqGXHTPcrlhrM=.ea636fef-ffdf-4412-9f74-b297b8f195b7@github.com>

On Sun, 17 Oct 2021 13:30:17 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

>>> > > > Sorry, we already have GuardedMemory for detecting buffer overrun, why introduce a new one?
>>> > > 
>>> > > 
>>> > > GuardedMemory has a number of disadvantages, and I'd like to remove it in favor of NMT doing buffer overrun checks. For my full reasoning, please see my reasoning in the umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301:
>>> > > Disadvantages of the current solution:
>>> > > 
>>> > > * We have no way to do C-heap checking in release builds. But there, it is sorely missed. We ship release VMs, and those VMs get used in myriad ways with a lot of faulty third-party native code. I would love to be able to flip a product switch at a customer site and have some basic C-heap checks done, without relegating to external tools or debug c-libs.
>>> > > * The debug-only guards in os::malloc() are quite fat, really, a whopping 48 bytes per allocation on 64-bit, 40 bytes on 32-bit. That is for guarding alone. They distort the memory allocation picture, since blowing up every allocation this way causes the underlying libc to do different things. Therefore we have different memory layouts and allocation patterns between debug and release. In addition, we have different code paths too, e.g. in debug os::realloc calls os::malloc + os::free whereas in release builds it calls directly into libc ::realloc. All that means that in debug builds we test something different than what we ship in release builds.
>>> > > * The canary in the headers of the debug-only guards do not directly precede the user portion of the data, so we won't catch negative buffer overflows of only a few bytes.
>>> > > * The guarding added by CheckJNICalls is unnecessarily expensive too, since it copies the memory around, handing a copy of the guarded memory up to the caller.
>>> > > * The fact that three different code sections all do malloc headers incurs unnecessary costs, and the code is unnecessarily complex. It makes also statistics difficult to understand since the silent overhead can be large (compare the rise in RSS with the rise in NMT allocations in a debug build).
>>> > > * None of the current overflow checkers print out hex dumps of the violated memory. That is what the libc usually does and it is very useful.
>>> > > 
>>> > > Thanks, Thomas
>>> > 
>>> > 
>>> > p.s. I contemplated to do NMT overflow checks and removal of old guarding code in one RFE but was concerned that it would be too confusing and get stuck in review limbo. Maybe that was wrong. But this RFE here makes more sense when viewed as part of a whole.
>>> 
>>> Thanks for explanation. So, buffer overrun detection is now only available when NMT is on, vs. always on with GuardedMemory in debug build. Right?
>> 
>> Well, not with this patch obviously. But yes, that would be my proposal. To get "always-on", we could switch NMT on by default in debug builds. "summary" level is not really expensive at all, it uses less memory than GuardedMemory does, and the per-flag accounting does not really add much overhead (GuardedMemory also does some accounting btw).
>> 
>> Though tbh my first priority is to give us overflow checks in release builds. If we only do that and leave GuardedMemory in place I would be happy already. I had two customer cases very recently with heap overwriters, one of which I misused NMT to trigger a crash and analyze the core. A neighboring (non-VM-allocated) block was overwriting the following (VM allocated) heap block. 
>> 
>> Cheers, Thomas
>
> I have no problem on technical side. Changing NMT default value, I believe, needs CSR. Probably should start with a CSR to get a consensus.
> 
> Thanks.
> 
> -Zhengyu

Added a thorough regression test for the `realloc` issue, but I was unable to provoke in practice the theoretical error @zhengyu123 pointed out. When analyzing, I found that, by accident, the current code already works:
- a realloc to a smaller size will memcpy() the original payload with the new size, since we use MIN2(size, memblock_size) and use the new, smaller, payload size. That will leave the NMT footer intact which had been added by the os::malloc above.
- a realloc to a larger size will memcpy() with memblock_size, and Zhengyu is right, that is too large. The effect of that is that we copy the original footer too. But that is fine. Since the footer is only one byte, we will, again, leave the new NMT footer added by os::malloc() intact.
Still, Zhengyu was right, this is a problem. I will experiment with a larger footer since I believe that should fail as predicted (I just want to see my new regression tests actually fire :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From sspitsyn at openjdk.java.net  Fri Nov 19 10:11:39 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Fri, 19 Nov 2021 10:11:39 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Io5zyME4R4fiSdR9bV91VhvMzfQoETTuGYG6gRQEimc=.35f5f16a-0848-4b93-bd84-0c41cc5a6df3@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
 <Io5zyME4R4fiSdR9bV91VhvMzfQoETTuGYG6gRQEimc=.35f5f16a-0848-4b93-bd84-0c41cc5a6df3@github.com>
Message-ID: <NWTidlb6E9Yr-x9c-7mtk8uFFfGa8UhuBuNABgknASg=.c11363c2-9c95-4b56-ab0e-de0d1a68003e@github.com>

On Thu, 18 Nov 2021 17:15:06 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1533:
> 
>> 1531:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
>> 1532:   }
>> 1533:   assert(java_thread == _state->get_thread(), "Must be");
> 
> This `assert()` is the site of the original test failure. I haven't yet
> looked at the locations of the other changes.
> 
> The `is_exiting()` check is made under the protection of the
> `JvmtiThreadState_lock` so an unsuspended target thread that is
> exiting cannot reach the point where the `_state` is updated to
> clear the `JavaThread*` so we can't fail the `assert()` if the
> `is_exiting()` check has returned `false`.

Dan,
Thank you for reviewing this!
I'm not sure I understand you correctly here.
Are you saying that you agree with this change?
In fact, the thread state can not be changed (and the assert fired) after the `is_exiting()` check is made even without `JvmtiThreadState_lock` protection because it is inside of a handshake execution.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From sspitsyn at openjdk.java.net  Fri Nov 19 10:17:39 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Fri, 19 Nov 2021 10:17:39 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <UaxiSmd-pk9zw_gIc1aU53VdcCXkJNUkyySPxOaOxyw=.78569f8a-9402-4008-94b7-010f51f0bd9a@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
 <UaxiSmd-pk9zw_gIc1aU53VdcCXkJNUkyySPxOaOxyw=.78569f8a-9402-4008-94b7-010f51f0bd9a@github.com>
Message-ID: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>

On Thu, 18 Nov 2021 17:18:23 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1401:
> 
>> 1399:   if (!self) {
>> 1400:     if (!java_thread->is_suspended()) {
>> 1401:       _result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;
> 
> I don't see an obvious reason for this `is_exiting()` check.

Okay. I see a similar check in the `force_early_return()` function:

  if (state == NULL) {
    return JVMTI_ERROR_THREAD_NOT_ALIVE;
  }

Would it be better to replace it with this check instead?

  if (java_thread->is_exiting()) {
    return JVMTI_ERROR_THREAD_NOT_ALIVE;
  }

> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1625:
> 
>> 1623:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
>> 1624:   }
>> 1625:   assert(_state->get_thread() == java_thread, "Must be");
> 
> The `assert()` on L1625 is subject to the same race as the original site.
> This `is_exiting()` check is made under the protection of the
> `JvmtiThreadState_lock` so it is sufficient to protect that `assert()`.

Okay, thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From stuefe at openjdk.java.net  Fri Nov 19 14:25:46 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 14:25:46 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v5]
In-Reply-To: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
Message-ID: <vw58538t0fhdIXfH1EY3LNxVqD0-222QQY53xgM2grY=.8f1dc068-22be-4a95-aed9-b487b1087b29@github.com>

> This is part of a number of RFEs I plan to improve and simplify C-heap overflow checking in hotspot.
> 
> This proposal adds NMT buffer overflow checking:
> 
> - it gives us C-heap overflow checking in release builds
> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
> 
> For more details, please see the JBS issue.
> 
> ----
> 
> Patch notes:
> 
> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
> 
> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
> 
> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
> 
> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
> 
> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
> 
> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
> 
> --------------
> 
> Example output a buffer overrun would provide:
> 
> 
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
> 
> -------
> 
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 4 weeks now without problems
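
The header and footer canaries described in the patch notes above can be illustrated with a minimal sketch. It is not the code from the patch: the struct name, field names, canary values, and the `sketch_*` functions are all hypothetical, and the header size is fixed at 16 bytes by using a 64-bit size field. It only shows the idea of a 16-bit canary directly preceding the payload plus a one-byte footer directly following it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical 16-byte header. Field names are illustrative, not HotSpot's.
struct SketchHeader {
  uint64_t size;        // user payload size in bytes
  uint16_t bucket_idx;  // malloc site table bucket (16 bits cover a width of 512)
  uint16_t pos_idx;     // position within the bucket
  uint8_t  flags;       // memory category
  uint8_t  unused;
  uint16_t canary;      // last field, so it directly precedes the payload
};
static_assert(sizeof(SketchHeader) == 16, "header must stay 16 bytes");
static_assert(offsetof(SketchHeader, canary) + sizeof(uint16_t) == sizeof(SketchHeader),
              "canary must be adjacent to the payload");

const uint16_t kHeaderCanary = 0xE99E;  // arbitrary values chosen for this sketch
const uint8_t  kFooterCanary = 0xE8;

enum class BlockState { ok, header_broken, footer_broken };

// Allocates header + payload + 1 footer byte and returns the payload pointer.
void* sketch_malloc(size_t size) {
  unsigned char* raw =
      static_cast<unsigned char*>(std::malloc(sizeof(SketchHeader) + size + 1));
  if (raw == nullptr) return nullptr;
  SketchHeader h = {};
  h.size = size;
  h.canary = kHeaderCanary;
  std::memcpy(raw, &h, sizeof(h));
  raw[sizeof(h) + size] = kFooterCanary;
  return raw + sizeof(h);
}

// Checks both canaries of a block previously returned by sketch_malloc().
BlockState sketch_check(const void* payload) {
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  SketchHeader h;
  std::memcpy(&h, p - sizeof(h), sizeof(h));
  if (h.canary != kHeaderCanary) return BlockState::header_broken;
  if (p[h.size] != kFooterCanary) return BlockState::footer_broken;
  return BlockState::ok;
}

void sketch_free(void* payload) {
  std::free(static_cast<unsigned char*>(payload) - sizeof(SketchHeader));
}
```

Because the header canary sits in the last two bytes of the header, an underflow of even a single byte is visible to the check, which is the property the old debug-only guards lacked.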

Thomas Stuefe has updated the pull request incrementally with 115 additional commits since the last revision:

 - Fix Zhengyu Problem in os::realloc
 - update comment after increasing footer
 - improve test
 - 8277439: G1: Correct include guard name in G1EvacFailureObjectsSet.hpp
   
   Reviewed-by: tschatzl, sjohanss
 - 8277371: Remove unnecessary DefNewGeneration::ref_processor_init()
   
   Reviewed-by: stefank, tschatzl, mli
 - 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule
   
   Reviewed-by: chagedorn, roland
 - 8273039: JShell crashes when naming variable or method "abstract" or "strictfp"
   
   Reviewed-by: vromero
 - 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock
   
   Reviewed-by: rbackman, coleenp
 - 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg
   
   Reviewed-by: chagedorn, sviswanathan
 - 8277102: Dubious PrintCompilation output
   
   Reviewed-by: thartmann, dnsimon
 - ... and 105 more: https://git.openjdk.java.net/jdk/compare/d3677c1f...17a5bc71

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5952/files
  - new: https://git.openjdk.java.net/jdk/pull/5952/files/d3677c1f..17a5bc71

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=03-04

  Stats: 39754 lines in 658 files changed: 28691 ins; 5278 del; 5785 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5952.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952

PR: https://git.openjdk.java.net/jdk/pull/5952

From stuefe at openjdk.java.net  Fri Nov 19 14:29:17 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 14:29:17 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6]
In-Reply-To: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
Message-ID: <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>

> This is part of a number of RFEs I plan to improve and simplify C-heap overflow checking in hotspot.
> 
> This proposal adds NMT buffer overflow checking:
> 
> - it gives us C-heap overflow checking in release builds
> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
> 
> For more details, please see the JBS issue.
> 
> ----
> 
> Patch notes:
> 
> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
> 
> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
> 
> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
> 
> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
> 
> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
> 
> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
> 
> --------------
> 
> Example output a buffer overrun would provide:
> 
> 
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
> 
> -------
> 
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 4 weeks now without problems

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

 - Volker Feedback 2
 - Fix Zhengyu Problem in os::realloc
 - Extend gtests
 - extend footer to 2 bytes
 - Feedback Volker
 - Let NMT do overflow detection

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5952/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=05
  Stats: 434 lines in 11 files changed: 385 ins; 11 del; 38 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5952.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952

PR: https://git.openjdk.java.net/jdk/pull/5952

From rkennke at openjdk.java.net  Fri Nov 19 14:43:59 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 19 Nov 2021 14:43:59 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass
Message-ID: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>

In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when +UseCompressedClassPointers is set. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.

Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.

The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Note that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if the respective maintainers could give it a try.
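
The fragility of the old scheme can be shown with a toy model. This is only a sketch under stated assumptions: `SketchOp`, the two `needs_decode_*` functions, and the fixed klass offset of 8 are hypothetical and do not mirror the real C1 LIR classes. It only contrasts inferring a Klass* load from type and offset with tagging it through a dedicated opcode.

```cpp
#include <cstdint>

enum class BasicType { T_INT, T_ADDRESS, T_OBJECT };
enum class Opcode { Move, LoadKlass };  // LoadKlass stands in for the new opcode

// Hypothetical, heavily simplified LIR operation.
struct SketchOp {
  Opcode    code;
  BasicType type;
  int       offset;
};

const int kKlassOffset = 8;  // assumed klass offset, as in the message above

// Old approach: guess from the type and offset that this is a Klass* load.
bool needs_decode_heuristic(const SketchOp& op, bool compressed_class_pointers) {
  return compressed_class_pointers &&
         op.type == BasicType::T_ADDRESS &&
         op.offset == kKlassOffset;
}

// New approach: only an explicit load-klass operation is ever decoded.
bool needs_decode_explicit(const SketchOp& op, bool compressed_class_pointers) {
  return compressed_class_pointers && op.code == Opcode::LoadKlass;
}
```

In this model an unrelated T_ADDRESS load that happens to use offset 8 is decoded by the heuristic but left alone by the explicit opcode check.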

Testing:
 - [x] tier1 (x86_64)
 - [ ] tier2 (x86_64)
 - [ ] tier3 (x86_64)

-------------

Commit messages:
 - Add debug info for null checks
 - 8277417: C1 LIR instruction for load-klass

Changes: https://git.openjdk.java.net/jdk/pull/6464/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6464&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277417
  Stats: 183 lines in 11 files changed: 134 ins; 37 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6464.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6464/head:pull/6464

PR: https://git.openjdk.java.net/jdk/pull/6464

From iveresov at openjdk.java.net  Fri Nov 19 14:44:00 2021
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Fri, 19 Nov 2021 14:44:00 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass
In-Reply-To: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
Message-ID: <iDYagxj1FvOVwTzALTq-q8iLld2ZuobiY-5eVB6h1dc=.ddb93b76-565d-4ea0-bd37-25dbf36bc17e@github.com>

On Thu, 18 Nov 2021 20:16:27 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when +UseCompressedClassPointers is set. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
> 
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
> 
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Note that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if the respective maintainers could give it a try.
> 
> Testing:
>  - [x] tier1 (x86_64)
>  - [ ] tier2 (x86_64)
>  - [ ] tier3 (x86_64)

Nice! Thank you!

-------------

Marked as reviewed by iveresov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6464

From stuefe at openjdk.java.net  Fri Nov 19 14:45:42 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 14:45:42 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3]
In-Reply-To: <9GqbVZKY1Z5fCvB-vuCwqIFwPXEDU1nHd002J3SS2KM=.a1ca56cc-016b-4daf-9f69-5bbf60f32e71@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <ar7S0Y42f-xf1ii4ntYtOIEd5Nz0_W6wCMDoJD0x-S0=.75583559-34d4-4362-9ff7-b4a8c41d31dc@github.com>
 <9GqbVZKY1Z5fCvB-vuCwqIFwPXEDU1nHd002J3SS2KM=.a1ca56cc-016b-4daf-9f69-5bbf60f32e71@github.com>
Message-ID: <qFSLWtcZ4DU7vNsNRTZD42Gkx78bMhCmLyGcsqh1thc=.6b99678c-08cd-4c4e-86d8-5fc17a18dd4a@github.com>

On Thu, 18 Nov 2021 17:33:50 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Feedback Volker
>
> Looks good to me now except for Zhengyu's question.

Hi @simonis, @zhengyu123,

I somehow messed up and force-pushed. This is what happened since the last review:

- https://github.com/openjdk/jdk/pull/5952/commits/2247b5e6f6d6aff2c54fe51f1e064876bba43963 : increases the footer canary to two bytes.
- https://github.com/openjdk/jdk/pull/5952/commits/188f0ea36a12d20be960ec98eb303669c6fcd714 : extended the gtests, mainly to test realloc more thoroughly.
    - Added one death test to show that realloc also does heap corruption checks on the old block. 
    - Another test - not a death test but a regular test - just to test that realloc works. This was in reaction to the bug Zhengyu found, but I never got it to fire. After analysing I believe the bug was benign. Still, good to have this test.
 - https://github.com/openjdk/jdk/pull/5952/commits/ea6fe31c08af1a7073e3d14a37a572beb43a027c : This one actually fixes the bug Zhengyu found. I kept the fix simple stupid and refrained from cleaning up too much. I just removed two methods which were not needed anymore.
 - https://github.com/openjdk/jdk/pull/5952/commits/3d2a5d00b7dac5411f3c1956a4b5c8b6e1a76a66 : Last one fixes the last typos Volker found.

Thanks again, guys, for your reviews. I plan to give this another round in our test systems before pushing.

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From stefank at openjdk.java.net  Fri Nov 19 15:37:46 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Fri, 19 Nov 2021 15:37:46 GMT
Subject: Integrated: 8277212: GC accidentally cleans valid megamorphic vtable
 inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID: <sqF0cc11jEHGKIeo5HLAAIT9FHmOTI6rdg7F8047jfQ=.a7f1aa00-9265-485f-942e-342e959a0f4c@github.com>

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
> 
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
> 
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
> 
> and while this was happening we got a huge number of ICBufferFull safepoints.
> 
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
> 
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         oop ic_oop = ic->cached_oop();
>         if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           if (ic_oop->is_compiledICHolder()) {
>             compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>             if (is_alive->do_object_b(
>                   cichk_oop->holder_method()->method_holder()) &&
>                 is_alive->do_object_b(cichk_oop->holder_klass())) {
>               continue;
>             }
>           }
>           ic->set_to_clean();
>           assert(ic->cached_oop() == NULL,
>                  "cached oop in IC should be cleared");
>         }
>       }
> 
> 
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
> 
>         CompiledIC *ic = CompiledIC_at(iter.reloc());
>         if (ic->is_icholder_call()) {
>           // The only exception is compiledICHolder oops which may
>           // yet be marked below. (We check this further below).
>           CompiledICHolder* cichk_oop = ic->cached_icholder();
>           if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>               cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>               continue;
>             }
>         } else {
>           Metadata* ic_oop = ic->cached_metadata();
>           if (ic_oop != NULL) {
>             if (ic_oop->is_klass()) {
>               if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else if (ic_oop->is_method()) {
>               if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>                 continue;
>               }
>             } else {
>               ShouldNotReachHere();
>             }
>           }
>           }
>           ic->set_to_clean();
>       }
> 
> 
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
> 
> To understand why this is causing the problems we are seeing it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
> 
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
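
The scoping difference between the two snippets above can be reduced to a small decision function. This is a sketch, not HotSpot code: `SketchIC` and both `cleans_*` functions are hypothetical, and liveness is collapsed into a single flag. `has_metadata == false` models the case where `ic_oop` is NULL.

```cpp
enum class IcKind { IcHolderCall, CachedMetadata };

// Hypothetical stand-in for a CompiledIC as seen by the cleaning loop.
struct SketchIC {
  IcKind kind;
  bool   has_metadata;  // false models ic_oop == NULL (megamorphic vtable call or clean)
  bool   loader_alive;  // liveness of whatever the cache refers to
};

// Mirrors the rewritten code: set_to_clean() is reached whenever no branch
// hits 'continue', including the case where there is no cached metadata.
bool cleans_buggy(const SketchIC& ic) {
  if (ic.kind == IcKind::IcHolderCall) {
    if (ic.loader_alive) return false;
  } else {
    if (ic.has_metadata) {
      if (ic.loader_alive) return false;
    }
  }
  return true;
}

// Mirrors the original intent: without cached metadata there is nothing
// that can die, so the inline cache is left untouched.
bool cleans_fixed(const SketchIC& ic) {
  if (ic.kind == IcKind::IcHolderCall) {
    return !ic.loader_alive;
  }
  if (ic.has_metadata) {
    return !ic.loader_alive;
  }
  return false;
}
```

The two functions agree for every inline cache that holds metadata and differ only for the NULL case, which is exactly the megamorphic vtable call.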
> 
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
> 
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches and exit the class unloading phase and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
> 
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
> 
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
> 
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an oracle-internal stress test).
> 
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
> 
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
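
When scanning such a log by hand gets tedious, the ICBufferFull lines can be counted mechanically. The following is a rough sketch that assumes the safepoint line format used in this message (a `Safepoint "ICBufferFull"` marker and a trailing `Total: <n> ns` field). `SafepointSummary` and `summarize_icbufferfull` are hypothetical helper names, not part of any JDK tool.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical summary of the ICBufferFull safepoints found in a log.
struct SafepointSummary {
  int       count = 0;
  long long total_ns = 0;
};

// Counts lines containing the ICBufferFull marker and sums their "Total:" values.
SafepointSummary summarize_icbufferfull(const std::vector<std::string>& lines) {
  SafepointSummary summary;
  const std::string marker = "Safepoint \"ICBufferFull\"";
  const std::string total_key = "Total: ";
  for (const std::string& line : lines) {
    if (line.find(marker) == std::string::npos) continue;
    std::size_t pos = line.rfind(total_key);
    if (pos == std::string::npos) continue;
    long long ns = 0;
    if (std::sscanf(line.c_str() + pos + total_key.size(), "%lld", &ns) == 1) {
      summary.count++;
      summary.total_ns += ns;
    }
  }
  return summary;
}
```

A high count within a single "Concurrent Process Non-Strong References" phase is the pattern to look for. The sketch reports only the raw numbers and leaves the interpretation to the reader.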
> 
> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints:
> 
> [38.557s][1637062062666ms][info ][gc,phases   ] GC(222) Concurrent Mark Free 0.001ms
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns
> [38.565s][1637062062673ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns
> [38.566s][1637062062674ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns
> [38.567s][1637062062675ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns
> [38.568s][1637062062677ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns
> [38.570s][1637062062678ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns
> [38.571s][1637062062679ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns
> [38.572s][1637062062681ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns
> [38.574s][1637062062682ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns
> [38.575s][1637062062684ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns
> [38.577s][1637062062685ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns
> [38.578s][1637062062686ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns
> [38.579s][1637062062687ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns
> [38.579s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns
> [38.580s][1637062062688ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns
> [38.580s][1637062062689ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns
> [38.581s][1637062062690ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns
> [38.582s][1637062062691ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns
> [38.584s][1637062062692ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns
> [38.585s][1637062062693ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns
> [38.585s][1637062062694ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns
> [38.586s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns
> [38.587s][1637062062695ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns
> [38.588s][1637062062696ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns
> [38.589s][1637062062697ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns
> [38.590s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns
> [38.591s][1637062062699ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns
> [38.591s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns
> [38.592s][1637062062700ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
> [38.592s][1637062062701ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
> [38.594s][1637062062702ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
> [38.595s][1637062062703ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
> [38.597s][1637062062705ms][info ][gc,phases   ] GC(222) Concurrent Process Non-Strong References 39.443ms
> 
> 
> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
> 
> [125.998s][1637065197322ms][info ][gc          ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
> [125.998s][1637065197322ms][info ][gc,cpu      ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
> [125.998s][1637065197322ms][info ][safepoint   ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Mark 38.296ms
> [125.998s][1637065197322ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets
> [126.001s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
> [126.002s][1637065197326ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
> [126.007s][1637065197331ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
> [126.009s][1637065197334ms][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking  ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
> 
> 
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
> 
> I've run the patch through tier1-7.
> 
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.
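
For anyone who wants to follow the diagnostic advice above (run with -Xlog:gc*,safepoint and watch the ICBufferFull lines), here is a minimal sketch of a helper that counts those lines and sums their "Total:" values. It is not part of the JDK; the names ICBufferFullSummary and summarize_icbufferfull are hypothetical, and it only assumes the line layout shown in the example logs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper, not JDK code: summarizes ICBufferFull safepoints
// found in log text that looks like the -Xlog:gc*,safepoint examples.
struct ICBufferFullSummary {
  std::uint64_t count = 0;     // number of ICBufferFull safepoint lines
  std::uint64_t total_ns = 0;  // sum of their "Total: N ns" values
};

inline ICBufferFullSummary summarize_icbufferfull(const std::string& log_text) {
  ICBufferFullSummary summary;
  const std::string marker = "Safepoint \"ICBufferFull\"";
  const std::string total_key = "Total: ";

  std::istringstream in(log_text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.find(marker) == std::string::npos) {
      continue;  // some other log line
    }
    ++summary.count;
    const std::size_t pos = line.rfind(total_key);
    if (pos == std::string::npos) {
      continue;  // counted, but no total to add
    }
    // strtoull stops at the first non-digit, i.e. before " ns".
    summary.total_ns +=
        std::strtoull(line.c_str() + pos + total_key.size(), nullptr, 10);
  }
  return summary;
}
```

Feeding it the text between two "Concurrent Process Non-Strong References" lines gives a rough idea of how much of that phase was spent in ICBufferFull safepoints.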

This pull request has now been integrated.

Changeset: 976c2bb0
Author:    Stefan Karlsson <stefank at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/976c2bb05611cdc7b11b0918aaf50ff693507aae
Stats:     4 lines in 1 file changed: 4 ins; 0 del; 0 mod

8277212: GC accidentally cleans valid megamorphic vtable inline caches

Reviewed-by: eosterlund, pliden, coleenp, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From dcubed at openjdk.java.net  Fri Nov 19 17:31:52 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Fri, 19 Nov 2021 17:31:52 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <cPhjNtoyoQKXpiVbLbtArXKh1d3uftcmZH4RQzuo22U=.c98c571a-ea76-4938-b113-2ae2a9b33a5a@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>>   - UpdateForPopTopFrameClosure
>>   - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Marked as reviewed by dcubed (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From dcubed at openjdk.java.net  Fri Nov 19 17:32:19 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Fri, 19 Nov 2021 17:32:19 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
 <UaxiSmd-pk9zw_gIc1aU53VdcCXkJNUkyySPxOaOxyw=.78569f8a-9402-4008-94b7-010f51f0bd9a@github.com>
 <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>
Message-ID: <M6_JOhIIw2-pfxYPK3LODN7KovedZD2FcZ0cRumjMms=.90ded333-b7da-43d3-986b-b20af6f365f0@github.com>

On Fri, 19 Nov 2021 10:14:23 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1401:
>> 
>>> 1399:   if (!self) {
>>> 1400:     if (!java_thread->is_suspended()) {
>>> 1401:       _result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>> 
>> I don't see an obvious reason for this `is_exiting()` check.
>
> Okay. I see a similar check in the `force_early_return()` function:
> 
>   if (state == NULL) {
>     return JVMTI_ERROR_THREAD_NOT_ALIVE;
>   }
> 
> Would it be better to replace it with this check instead? :
> 
>   if (java_thread->is_exiting()) {
>     return JVMTI_ERROR_THREAD_NOT_ALIVE;
>   }
> 
> Removing this check and keeping the one inside the handshake would be even better.
> 
> I would also add this line for symmetry with two other cases:
> 
> +  MutexLocker mu(JvmtiThreadState_lock);
>   SetForceEarlyReturn op(state, value, tos);

My point is that I don't see why you added the `is_exiting()` check
since I don't see a race in that function, i.e., there's no `assert()` in
this function that you need to protect.

As for adding the `MutexLocker mu(JvmtiThreadState_lock)`, you'll
have to analyze and justify why you would need to add that lock grab
independent of this fix. I'm not seeing a bug there, but I haven't looked
very closely.

>> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1533:
>> 
>>> 1531:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
>>> 1532:   }
>>> 1533:   assert(java_thread == _state->get_thread(), "Must be");
>> 
>> This `assert()` is the site of the original test failure. I haven't yet
>> looked at the locations of the other changes.
>> 
>> The `is_exiting()` check is made under the protection of the
>> `JvmtiThreadState_lock` so an unsuspended target thread that is
>> exiting cannot reach the point where the `_state` is updated to
>> clear the `JavaThread*` so we can't fail the `assert()` if the
>> `is_exiting()` check has returned `false`.
>
> Dan,
> Thank you for reviewing this!
> I'm not sure I understand you correctly here.
> Are you saying that you agree with this change?
> In fact, the thread state can not be changed (and the assert fired) after the `is_exiting()` check is made even without `JvmtiThreadState_lock` protection because it is inside of a handshake execution.

I agree with the `is_exiting()` check addition.

I forgot that we're executing a Handshake `doit()` function. So we have a couple
of reasons why an unsuspended target thread can't change from `!is_exiting()`
to `is_exiting()` while we are in this function.
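
To make the agreed-upon ordering concrete, here is a minimal sketch of a handshake-style closure that checks the exiting status before touching the cached state. Everything in it (MockThread, MockState, MockHandshakeClosure, Result) is a hypothetical stand-in, not the HotSpot types; it only assumes, as discussed above, that the target cannot flip to exiting while the closure runs and that the result defaults to "thread not alive".

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum class Result { None, ThreadNotAlive, ThreadNotSuspended };

struct MockThread {
  bool exiting = false;    // stands in for is_exiting()
  bool suspended = false;  // stands in for is_suspended()
};

struct MockState {
  MockThread* thread = nullptr;  // cleared once an exiting thread has cleaned up
};

class MockHandshakeClosure {
 public:
  MockHandshakeClosure(MockState* state, bool self)
      : _state(state), _self(self), _result(Result::ThreadNotAlive) {}

  // Assumed to run while the target thread cannot change its exiting status.
  void do_thread(MockThread* target) {
    if (target->exiting) {
      return;  // keep the default (ThreadNotAlive); _state may already be stale
    }
    // Only safe to compare after the exiting check above has passed.
    assert(target == _state->thread);
    if (!_self && !target->suspended) {
      _result = Result::ThreadNotSuspended;
      return;
    }
    _result = Result::None;
  }

  Result result() const { return _result; }

 private:
  MockState* _state;
  bool _self;
  Result _result;
};
```

With the early return in place, an exiting target whose state has already been cleaned up never reaches the assert, which is the failure mode the original test hit.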

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From hseigel at openjdk.java.net  Fri Nov 19 17:32:19 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Fri, 19 Nov 2021 17:32:19 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for [v2]
In-Reply-To: <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
 <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
Message-ID: <inREIlhd19mUydwxvlAkuDNuPnvUzvhSd-GHW6KOR28=.dd50a401-2264-4717-a62d-64fe7965829e@github.com>

On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
>> Tested with mach5 tier1-3.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments.

LGTM! Thanks for doing this.
Harold

-------------

Marked as reviewed by hseigel (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6466

From coleenp at openjdk.java.net  Fri Nov 19 17:32:27 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 19 Nov 2021 17:32:27 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for [v2]
In-Reply-To: <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
 <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
Message-ID: <qYmXPgTTO_YdbyDrtsobqPj_CNobNeROTbBY7nXgUEI=.8281d197-f9c0-451d-869e-d5fcc0df750c@github.com>

On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
>> Tested with mach5 tier1-3.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments.

Thanks Harold and David.  I did run some startup and performance tests that might notice JNI (added link to bug for you) and there is no difference in performance.
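
For context, the pattern behind the fix (take the lock before reading the id list, rather than reading first and locking only to create) can be sketched generically as below. This is not the InstanceKlass code: FieldId, IdHolder and id_for are hypothetical names, and std::mutex merely stands in for JFieldIdCreation_lock.

```cpp
#include <forward_list>
#include <mutex>

// Hypothetical illustration only; not the HotSpot data structures.
struct FieldId {
  int offset;
};

class IdHolder {
 public:
  // Find-or-create under the lock. The lock is taken before the list is
  // read, so a reader can never observe a node that is still being linked in.
  const FieldId* id_for(int offset) {
    std::lock_guard<std::mutex> guard(_lock);
    for (const FieldId& id : _ids) {
      if (id.offset == offset) {
        return &id;
      }
    }
    _ids.push_front(FieldId{offset});
    return &_ids.front();  // forward_list nodes stay at stable addresses
  }

 private:
  std::mutex _lock;
  std::forward_list<FieldId> _ids;
};
```

The cost is one uncontended lock acquisition per lookup, which matches the observation above that startup and performance runs showed no difference.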

-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From mdoerr at openjdk.java.net  Fri Nov 19 17:34:44 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Fri, 19 Nov 2021 17:34:44 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass
In-Reply-To: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
Message-ID: <u6ieWX72_T7LZr0QnHK8oGtGVl0u9HMPMhSKJMSisuM=.02f186d2-0d8d-41a2-8f66-765fd624c303@github.com>

On Thu, 18 Nov 2021 20:16:27 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
> 
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
> 
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try.
> 
> Testing:
>  - [x] tier1 (x86_64)
>  - [x] tier2 (x86_64)
>  - [x] tier3 (x86_64)

Looks like a nice change. I only found one problem on Power.

src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 2737:

> 2735:   if (info != NULL) {
> 2736:     add_debug_info_for_null_check_here(info);
> 2737:   }

I think this is incorrect for AIX. Note that the first page is not read protected on that OS. To make it consistent with other places, I suggest:

diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
index a772e48f3be..23e03cb36e3 100644
--- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
@@ -2733,7 +2733,11 @@ void LIR_Assembler::emit_load_klass(LIR_OpLoadKlass* op) {
 
   CodeEmitInfo* info = op->info();
   if (info != NULL) {
-    add_debug_info_for_null_check_here(info);
+    if (!os::zero_page_read_protected() || !ImplicitNullChecks) {
+      explicit_null_check(obj, info);
+    } else {
+      add_debug_info_for_null_check_here(info);
+    }
   }
 
   if (UseCompressedClassPointers) {
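
The decision in the suggested diff boils down to a small predicate, sketched here in isolation. NullCheckKind and choose_null_check are hypothetical names, and the two booleans are plain parameters standing in for os::zero_page_read_protected() and the ImplicitNullChecks flag.

```cpp
// Hypothetical illustration of the condition used in the suggested diff.
enum class NullCheckKind { Explicit, Implicit };

// An implicit check relies on the load faulting at address zero, so it is
// only usable when the zero page is read protected and implicit null checks
// are enabled; otherwise an explicit compare-and-branch is needed.
constexpr NullCheckKind choose_null_check(bool zero_page_read_protected,
                                          bool implicit_null_checks) {
  return (!zero_page_read_protected || !implicit_null_checks)
             ? NullCheckKind::Explicit
             : NullCheckKind::Implicit;
}
```

On a platform where the first page is readable, the first argument is false and the explicit path is always chosen, regardless of the flag.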

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6464

From simonis at openjdk.java.net  Fri Nov 19 17:35:38 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Fri, 19 Nov 2021 17:35:38 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6]
In-Reply-To: <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>
Message-ID: <94GszC2UOkr7OG_XVItGSF0HOXlK3R_8kQwfxK4sxTE=.e4b0e6bb-928f-47a8-bb50-7263d0683d9a@github.com>

On Fri, 19 Nov 2021 14:29:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is part of a number of RFEs with which I plan to improve and simplify C-heap overflow checking in hotspot.
>> 
>> This proposal adds NMT buffer overflow checking:
>> 
>> - it gives us C-heap overflow checking in release builds
>> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>> 
>> For more details, please see the JBS issue.
>> 
>> ----
>> 
>> Patch notes:
>> 
>> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>> 
>> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>> 
>> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>> 
>> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>> 
>> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>> 
>> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>> 
>> --------------
>> 
>> Example output a buffer overrun would provide:
>> 
>> 
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
>> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>> 
>> -------
>> 
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 4 weeks now without problems
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Volker Feedback 2
>  - Fix Zhengyu Problem in os::realloc
>  - Extend gtests
>  - extend footer to 2 bytes
>  - Feedback Volker
>  - Let NMT do overflow detection

Looks good now.

src/hotspot/share/services/mallocTracker.hpp line 435:

> 433:   }
> 434: 
> 435:   static inline void record_new_arena(MEMFLAGS flags) {

Yes, I also wondered why we need these versions so it's good that you could remove them!
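
For readers new to the thread, the header/footer canary idea from the quoted patch notes can be sketched as follows. This is only an illustration under stated assumptions, not the mallocTracker code: the Header layout, the canary values and the names checked_malloc, check_block and checked_free are all hypothetical. It uses the one-byte footer from the original description; the commit list above mentions a later extension to 2 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical block layout: [Header (16 bytes)][payload][1-byte footer].
struct Header {
  std::uint64_t size;    // payload size in bytes
  std::uint8_t pad[6];   // room that a real header would use for other fields
  std::uint16_t canary;  // last header field, directly precedes the payload
};
static_assert(sizeof(Header) == 16, "sketch assumes a 16-byte header");

// Arbitrary illustrative values, not the ones used by NMT.
const std::uint16_t kHeaderCanary = 0xE99E;
const unsigned char kFooterCanary = 0xE8;

enum class BlockStatus { Ok, HeaderCorrupt, FooterCorrupt };

inline void* checked_malloc(std::size_t size) {
  unsigned char* raw =
      static_cast<unsigned char*>(std::malloc(sizeof(Header) + size + 1));
  if (raw == nullptr) {
    return nullptr;
  }
  Header header{};
  header.size = size;
  header.canary = kHeaderCanary;
  std::memcpy(raw, &header, sizeof(header));
  raw[sizeof(Header) + size] = kFooterCanary;  // footer right after the payload
  return raw + sizeof(Header);
}

inline BlockStatus check_block(const void* payload) {
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  Header header;
  std::memcpy(&header, p - sizeof(Header), sizeof(header));
  if (header.canary != kHeaderCanary) {
    return BlockStatus::HeaderCorrupt;  // likely an underrun
  }
  if (p[header.size] != kFooterCanary) {
    return BlockStatus::FooterCorrupt;  // likely an overrun
  }
  return BlockStatus::Ok;
}

inline void checked_free(void* payload) {
  if (payload == nullptr) {
    return;
  }
  std::free(static_cast<unsigned char*>(payload) - sizeof(Header));
}
```

The header canary is checked first because the footer position is derived from the size stored in the header; if the header is damaged, that size cannot be trusted.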

-------------

Marked as reviewed by simonis (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5952

From stuefe at openjdk.java.net  Fri Nov 19 17:48:04 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 17:48:04 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6]
In-Reply-To: <94GszC2UOkr7OG_XVItGSF0HOXlK3R_8kQwfxK4sxTE=.e4b0e6bb-928f-47a8-bb50-7263d0683d9a@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>
 <94GszC2UOkr7OG_XVItGSF0HOXlK3R_8kQwfxK4sxTE=.e4b0e6bb-928f-47a8-bb50-7263d0683d9a@github.com>
Message-ID: <pGaPQUaCZ1zS3zKdwRBpv7NZxVPB8CprhOEfJL4Ex5Y=.ecb28fd1-b899-4047-9f12-5fd1dc90ffa6@github.com>

On Fri, 19 Nov 2021 16:30:41 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> Looks good now.

Many thanks, Volker! Nice to have this issue finally going somewhere. I feared it was stuck in PR limbo till after 18 ships.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From coleenp at openjdk.java.net  Fri Nov 19 17:53:37 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 19 Nov 2021 17:53:37 GMT
Subject: Integrated: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for
In-Reply-To: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
Message-ID: <5kXV0nOOuiIWZTtMqEYo2umU_x6R-xZnJgbF_tv12Uw=.bd70f60a-4230-4634-a665-899403bbc1fb@github.com>

On Thu, 18 Nov 2021 21:56:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
> Tested with mach5 tier1-3.

This pull request has now been integrated.

Changeset: 09e8c8c6
Author:    Coleen Phillimore <coleenp at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/09e8c8c64abf4178a042c79b92d7e08e54467331
Stats:     16 lines in 2 files changed: 0 ins; 13 del; 3 mod

8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV  in InstanceKlass::jni_id_for

Reviewed-by: dholmes, hseigel

-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From coleenp at openjdk.java.net  Fri Nov 19 18:20:11 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 19 Nov 2021 18:20:11 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails
 with SIGSEGV  in InstanceKlass::jni_id_for [v2]
In-Reply-To: <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
References: <pAeVupXPcD-13lzymILAyKfwTmvE0hAbuek4GnsZosg=.55fe59ce-31c2-4532-b573-686cc3275d36@github.com>
 <Im7D9kQOcv79sy_wAcG_jEVtZconnkJxMyqrxMx1cGY=.fec5deef-7db8-4b43-823c-e77c3336a396@github.com>
Message-ID: <iijtK5SE1CpzhmBUNSKOGa71Hrg0pMF6fGOEic6SfCM=.997ea483-f260-476f-8d8a-d29292573201@github.com>

On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
>> Tested with mach5 tier1-3.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments.

I should have pointed out that tier5 also completed with no failures.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From rkennke at openjdk.java.net  Fri Nov 19 18:22:37 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 19 Nov 2021 18:22:37 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2]
In-Reply-To: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
Message-ID: <vfOMZfnQF_K4XZ1a-1TR_d7Wm9zzSwNhSINMUooiQmU=.af43e2fa-12f6-4c8a-83c3-f509e7211f7c@github.com>

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
> 
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
> 
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try.
> 
> Testing:
>  - [x] tier1 (x86_64)
>  - [x] tier2 (x86_64)
>  - [x] tier3 (x86_64)

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix null-check on PPC

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6464/files
  - new: https://git.openjdk.java.net/jdk/pull/6464/files/988036fa..3454c1bf

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6464&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6464&range=00-01

  Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6464.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6464/head:pull/6464

PR: https://git.openjdk.java.net/jdk/pull/6464

From rkennke at openjdk.java.net  Fri Nov 19 18:22:41 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 19 Nov 2021 18:22:41 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2]
In-Reply-To: <u6ieWX72_T7LZr0QnHK8oGtGVl0u9HMPMhSKJMSisuM=.02f186d2-0d8d-41a2-8f66-765fd624c303@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
 <u6ieWX72_T7LZr0QnHK8oGtGVl0u9HMPMhSKJMSisuM=.02f186d2-0d8d-41a2-8f66-765fd624c303@github.com>
Message-ID: <zqXTsO1eSqmWmi8TMbysTOKWdjCa2W3LO1ByjXADITg=.69a1ecd5-910d-46fe-b999-61a00d9395df@github.com>

On Fri, 19 Nov 2021 17:25:31 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix null-check on PPC
>
> src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 2737:
> 
>> 2735:   if (info != NULL) {
>> 2736:     add_debug_info_for_null_check_here(info);
>> 2737:   }
> 
> I think this is incorrect for AIX. Note that the first page is not read protected on that OS. To make it consistent with other places, I suggest:
> 
> diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
> index a772e48f3be..23e03cb36e3 100644
> --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
> +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
> @@ -2733,7 +2733,11 @@ void LIR_Assembler::emit_load_klass(LIR_OpLoadKlass* op) {
>  
>    CodeEmitInfo* info = op->info();
>    if (info != NULL) {
> -    add_debug_info_for_null_check_here(info);
> +    if (!os::zero_page_read_protected() || !ImplicitNullChecks) {
> +      explicit_null_check(obj, info);
> +    } else {
> +      add_debug_info_for_null_check_here(info);
> +    }
>    }
>  
>    if (UseCompressedClassPointers) {

Thank you! I pushed a fix for that.
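
For reference, the condition in the suggested patch boils down to the following decision. This is a Java model for illustration only; the class, enum, and method names are hypothetical, and only the boolean condition mirrors the diff quoted above.

```java
// Illustrative model of the null-check choice; not the actual PPC back end.
final class NullCheckChoice {
    enum Kind { EXPLICIT, IMPLICIT }

    // Mirrors: if (!os::zero_page_read_protected() || !ImplicitNullChecks) explicit, else implicit.
    static Kind choose(boolean zeroPageReadProtected, boolean implicitNullChecks) {
        if (!zeroPageReadProtected || !implicitNullChecks) {
            // A load through a null base would not trap reliably, so test the oop first.
            return Kind.EXPLICIT;
        }
        // The faulting load itself serves as the null check; only debug info is recorded.
        return Kind.IMPLICIT;
    }

    public static void main(String[] args) {
        System.out.println("zero page readable (AIX-like): " + choose(false, true));
        System.out.println("zero page protected:           " + choose(true, true));
        System.out.println("implicit checks disabled:      " + choose(true, false));
    }
}
```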

-------------

PR: https://git.openjdk.java.net/jdk/pull/6464

From sspitsyn at openjdk.java.net  Fri Nov 19 18:25:09 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Fri, 19 Nov 2021 18:25:09 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <BFD75t5EYogiWui5xwhMSobG6_cPz1F0xaozU_7uwpE=.44517cf4-e2f9-48ec-b440-15f0b78991fb@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>   - UpdateForPopTopFrameClosure
>>   - SetForceEarlyReturn
>>   - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Dan, thank you for review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From sspitsyn at openjdk.java.net  Fri Nov 19 18:54:22 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Fri, 19 Nov 2021 18:54:22 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <0c7TGP1yfAhaJo8PLIkHEvNxZmlqIP-3Lr3tw_dO3wU=.71231bf7-4d0e-48d0-bb77-2e275ef0e652@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>   - UpdateForPopTopFrameClosure
>>   - SetForceEarlyReturn
>>   - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

David,
Thank you for your questions.
I'm not sure whether all of them are resolved, though. :)
Please let me know if any are not.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From mdoerr at openjdk.java.net  Fri Nov 19 18:54:22 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Fri, 19 Nov 2021 18:54:22 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <p4Id9pguUCHErjWqK5zR6k9CiMF6jrohrUL7yLuUx5M=.70d826ad-cabe-4f98-9a6a-2e2f38e96d95@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>   - UpdateForPopTopFrameClosure
>>   - SetForceEarlyReturn
>>   - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Still good. Thumbs up from my side.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6440

From smarks at openjdk.java.net  Fri Nov 19 20:16:16 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Fri, 19 Nov 2021 20:16:16 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
Message-ID: <aO5G0K3QYNaindgE_WVpPCYKDdSoL8dm1w0JfRyVnu4=.f7594022-a78f-44cd-bf6b-ac460404892c@github.com>

On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>> 
>> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).
>
> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove Finalizer.Holder class.

Regarding **jcmd** updates, I'm thinking maybe this would be better handled separately. There is the potential update to `GC.finalizer_info` discussed previously. Looking at the **jcmd** tool docs, it seems like `GC.run_finalization` also ought to be updated. And maybe one or more of the other commands (maybe `VM.flags` or `VM.info`?) ought to list whether finalization is enabled or disabled. And of course the tool's doc will need to be updated as well.
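
To make the behavior under discussion concrete, here is a small probe that can be run with and without the new option. It is a sketch, not part of the PR: the class and method names are made up, the option spelling `--finalization=disabled` is taken from JEP 421, and whether a finalizer runs within the timeout depends on the GC in use, so a `false` result is not by itself proof that finalization is disabled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical probe: reports whether a finalizer was observed to run.
final class FinalizationProbe {
    private static final CountDownLatch FINALIZED = new CountDownLatch(1);

    static final class Resource {
        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() {
            FINALIZED.countDown(); // signal that the finalizer thread reached us
        }
    }

    // Returns true if the finalizer ran before the timeout elapsed; false otherwise.
    // With a timeout of 0 the polling loop is skipped and the result is false.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean finalizerRan(long timeoutMillis) {
        new Resource(); // unreachable as soon as it is created
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (System.nanoTime() - deadline < 0) {
                System.gc();              // request a collection (only a hint)
                System.runFinalization(); // request that pending finalizers run
                if (FINALIZED.await(50, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("finalizer observed: " + finalizerRan(2000));
    }
}
```

Running it once as `java FinalizationProbe.java` and once as `java --finalization=disabled FinalizationProbe.java` should show the difference on a JDK that includes this change.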

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From bchristi at openjdk.java.net  Fri Nov 19 20:27:20 2021
From: bchristi at openjdk.java.net (Brent Christian)
Date: Fri, 19 Nov 2021 20:27:20 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
Message-ID: <UJCxizdMxXLe5fGa6UtOD77HteVMDI5CfkUAAoyKq4c=.5d524353-fde1-4665-908c-c2dceeee181c@github.com>

On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
>> 
>> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421).
>
> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove Finalizer.Holder class.

Lib changes and tests look good

-------------

Marked as reviewed by bchristi (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6442

From dholmes at openjdk.java.net  Fri Nov 19 22:54:09 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 19 Nov 2021 22:54:09 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <aO5G0K3QYNaindgE_WVpPCYKDdSoL8dm1w0JfRyVnu4=.f7594022-a78f-44cd-bf6b-ac460404892c@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
 <aO5G0K3QYNaindgE_WVpPCYKDdSoL8dm1w0JfRyVnu4=.f7594022-a78f-44cd-bf6b-ac460404892c@github.com>
Message-ID: <_dbLFHgXpFHVaUbTD5trbAHB01_HF-jA4FGc6kqCmO8=.de90977e-4624-453e-bc30-32acf46cbddc@github.com>

On Fri, 19 Nov 2021 20:13:06 GMT, Stuart Marks <smarks at openjdk.org> wrote:

>> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Remove Finalizer.Holder class.
>
> Regarding **jcmd** updates, I'm thinking maybe this would be better handled separately. There is the potential update to `GC.finalizer_info` discussed previously. Looking at the **jcmd** tool docs, it seems like `GC.run_finalization` also ought to be updated. And maybe one or more of the other commands (maybe `VM.flags` or `VM.info`?) ought to list whether finalization is enabled or disabled. And of course the tool's doc will need to be updated as well.

@stuart-marks no issue with doing dcmd/jcmd changes separately, but I don't think we need to go too far with this. I had considered `GC.run_finalization` but it just says it calls `System.run_finalization` - so no change needed there as it will be documented in System.runFinalization. And `VM.flags` only reports `-XX` flag information. And `VM.info` doesn't seem appropriate for mentioning this either. So no further changes needed to the other Dcmds IMO and no need to update anything on the jcmd tool page either.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From smarks at openjdk.java.net  Sat Nov 20 02:19:09 2021
From: smarks at openjdk.java.net (Stuart Marks)
Date: Sat, 20 Nov 2021 02:19:09 GMT
Subject: RFR: JDK-8276422 Add command-line option to disable finalization
 [v4]
In-Reply-To: <_dbLFHgXpFHVaUbTD5trbAHB01_HF-jA4FGc6kqCmO8=.de90977e-4624-453e-bc30-32acf46cbddc@github.com>
References: <YOoUjYcp7pbHNEgcUWS44lE8V9LM9BDStxO-zjuy1OM=.fbd94042-9325-460f-a71c-8532e486c159@github.com>
 <KdQiau7Z_SiFlF6LZJr2iMCilTHuvVuAcs9t_u-dH0s=.a1fd6719-c3dc-400f-9c65-0c70ba321120@github.com>
 <aO5G0K3QYNaindgE_WVpPCYKDdSoL8dm1w0JfRyVnu4=.f7594022-a78f-44cd-bf6b-ac460404892c@github.com>
 <_dbLFHgXpFHVaUbTD5trbAHB01_HF-jA4FGc6kqCmO8=.de90977e-4624-453e-bc30-32acf46cbddc@github.com>
Message-ID: <5YHSN8bpKua9XcdiifEVMBT8Zqz9-bTOoDM1DJKp0HI=.f59a79c5-776e-4fd9-82b4-49d37795e0f1@github.com>

On Fri, 19 Nov 2021 22:50:49 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Regarding **jcmd** updates, I'm thinking maybe this would be better handled separately. There is the potential update to `GC.finalizer_info` discussed previously. Looking at the **jcmd** tool docs, it seems like `GC.run_finalization` also ought to be updated. And maybe one or more of the other commands (maybe `VM.flags` or `VM.info`?) ought to list whether finalization is enabled or disabled. And of course the tool's doc will need to be updated as well.
>
> @stuart-marks no issue with doing dcmd/jcmd changes separately, but I don't think we need to go too far with this. I had considered `GC.run_finalization` but it just says it calls `System.run_finalization` - so no change needed there as it will be documented in System.runFinalization. And `VM.flags` only reports `-XX` flag information. And `VM.info` doesn't seem appropriate for mentioning this either. So no further changes needed to the other Dcmds IMO and no need to update anything on the jcmd tool page either.

@dholmes-ora OK if you're confident that it's sufficient just to add `GC.finalizer_info` and nothing else, and no docs or additional testing, then I'll just drop in the code from that branch you posted. Of course I'll do a full build & test.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From lmesnik at openjdk.java.net  Sat Nov 20 06:15:30 2021
From: lmesnik at openjdk.java.net (Leonid Mesnik)
Date: Sat, 20 Nov 2021 06:15:30 GMT
Subject: RFR: 8265795:
 vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when
 running with JEP 416
Message-ID: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>

The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event to Unsafe_AllocateInstance.

While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (as opposed to their JVM_ENTRY counterparts) that allocate objects without posting events.

I think we need to implement some common way to handle this, and cover it in a separate issue.
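
For context, the kind of allocation in question looks roughly like this from the Java side. This is a sketch rather than the test added by the PR: the class and method names are made up, and it only shows an object being created through a constructor method handle, which is the path the description says was missing the event.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustrative only: allocates a String through a constructor MethodHandle.
final class MethodHandleAllocation {
    static String allocateViaMethodHandle(String value) {
        try {
            // Handle for the copy constructor String(String); its type is (String)String.
            MethodHandle ctor = MethodHandles.lookup()
                    .findConstructor(String.class, MethodType.methodType(void.class, String.class));
            return (String) ctor.invoke(value);
        } catch (Throwable t) {
            // MethodHandle.invoke declares Throwable; wrap so callers need no checked handling.
            throw new IllegalStateException("constructor handle invocation failed", t);
        }
    }

    public static void main(String[] args) {
        String s = allocateViaMethodHandle("str");
        System.out.println(s + " (fresh object: " + (s != "str") + ")");
    }
}
```

A JVMTI agent with VMObjectAlloc enabled would be expected to see one event for the object created this way once the fix is in place.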

-------------

Commit messages:
 - switch to collerctor.
 - test added.
 - update
 - fix

Changes: https://git.openjdk.java.net/jdk/pull/6478/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6478&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8265795
  Stats: 148 lines in 4 files changed: 146 ins; 2 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6478.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6478/head:pull/6478

PR: https://git.openjdk.java.net/jdk/pull/6478

From sspitsyn at openjdk.java.net  Sat Nov 20 13:29:08 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Sat, 20 Nov 2021 13:29:08 GMT
Subject: RFR: 8265795:
 vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when
 running with JEP 416
In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
Message-ID: <1xupnqHZy2mpOQLkG92XaND-T6ofJY4UvhZbh1poUng=.b6785d7b-e048-4e53-b0c9-e3cbf742c452@github.com>

On Fri, 19 Nov 2021 15:32:24 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

> The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event to Unsafe_AllocateInstance.
> 
> While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (as opposed to their JVM_ENTRY counterparts) that allocate objects without posting events.
> 
> I think we need to implement some common way to handle this, and cover it in a separate issue.

Hi Leonid,
This fix looks good to me.
Thanks,
Serguei

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6478

From lmesnik at openjdk.java.net  Sun Nov 21 00:13:09 2021
From: lmesnik at openjdk.java.net (Leonid Mesnik)
Date: Sun, 21 Nov 2021 00:13:09 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <YiuLH3jW7GEYInAIurzXucLjJDow6QqgJSLTrE8oVaw=.58a279bc-473a-4b11-9e15-3f8496ee4616@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In that case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status to the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>>   - UpdateForPopTopFrameClosure
>>   - SetForceEarlyReturn
>>   - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Marked as reviewed by lmesnik (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From ngasson at openjdk.java.net  Mon Nov 22 01:44:10 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Mon, 22 Nov 2021 01:44:10 GMT
Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection
 for Linux/AArch64 [v6]
In-Reply-To: <B7nLJm7Uegt41cWe9U00ZDQ8cdVkjkJav7-aXeXNFaQ=.c72106b7-9dd1-4fe7-9285-42b0e6ffd597@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <B7nLJm7Uegt41cWe9U00ZDQ8cdVkjkJav7-aXeXNFaQ=.c72106b7-9dd1-4fe7-9285-42b0e6ffd597@github.com>
Message-ID: <92o4fGbGUNrm39p_vfOJV2cIXw2lzq5CLYPhdlf4hwI=.87d970cc-7f40-4e10-8202-ecc3fd47d8be@github.com>

On Tue, 16 Nov 2021 14:23:07 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work-in-progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
> 
>  - Merge master
>  - Rename pauth_authenticate_or_strip_return_address
>  - Fix windows aarch64 by restoring pauth file split
>  - Don't keep LR live across restore_live_registers
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>    
>    PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>    of its uses is to protect against ROP based attacks. This is done by
>    signing the Link Register whenever it is stored on the stack, and
>    authenticating the value when it is loaded back from the stack. If an
>    attacker were to try to change control flow by editing the stack then
>    the authentication check of the Link Register will fail, causing a
>    segfault when the function returns.
>    
>    On a system with PAC enabled, it is expected that all applications will
>    be compiled with ROP protection. Fedora 33 and upwards already provide
>    this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>    PAC instructions that exist in the NOP space - on hardware without PAC,
>    these instructions act as NOPs, allowing backward compatibility for
>    negligible performance cost (2 NOPs per non-leaf function).
>    
>    Hardware is currently limited to the Apple M1 MacBooks. All testing has
>    been done within a Fedora Docker image. A run of SpecJVM showed no
>    difference to that of noise - which was surprising.
>    
>    The most important part of this patch is simply compiling using branch
>    protection provided by GCC/LLVM. This protects all C++ code from being
>    used in ROP attacks, removing all static ROP gadgets from use.
>    
>    The remainder of the patch adds ROP protection to runtime generated
>    code, in both stubs and compiled Java code. Attacks here are much harder
>    as ROP gadgets must be found dynamically at runtime. If/when AOT
>    compilation is added to JDK, then all stubs and compiled Java will be
>    susceptible to ROP gadgets being found by static analysis and therefore
>    potentially as vulnerable as C++ code.
>    
>    There are a number of places where the VM changes control flow by
>    rewriting the stack or otherwise. I've done some analysis as to how
>    these could also be used for attacks (which I didn't want to post here).
>    These areas can be protected by ensuring the pointers to various stubs and
>    entry points are stored in memory as signed pointers. These changes are
>    simple to make (they can be reduced to a type change in common code and
>    a few additional sign/auth calls in the backend), but there are a lot of them
>    and the total code change is fairly large. I'm happy to provide a few
>    work-in-progress patches.
>    
>    In order to match the security benefits of the Apple Arm64e ABI across
>    the whole of JDK, then all the changes mentioned above would be
>    required.
>  - Add PAC assembly instructions
>  - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56

LGTM and we did extensive jtreg testing internally (tier1 + hotspot_all, jdk_core).
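
For readers new to PAC-RET, the sign-on-store / authenticate-on-load idea from the description can be modeled in a few lines of Java. This is a toy: the class name, the 48-bit address split, the 16-bit code width, and the mixing function are all assumptions for the sketch and bear no relation to the real hardware algorithm or to the code in this PR. Real hardware also signals failure with a fault when the pointer is used rather than with an exception.

```java
// Toy model of return-address signing; not the AArch64 algorithm.
final class ToyPointerAuth {
    private static final int ADDRESS_BITS = 48; // assumed virtual address width
    private static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    private final long key; // stands in for the per-process key held by the CPU

    ToyPointerAuth(long key) {
        this.key = key;
    }

    // 16-bit code derived from address, modifier and key (arbitrary mixing, not cryptographic).
    private long code(long address, long modifier) {
        long h = (address ^ key) * 0x9E3779B97F4A7C15L;
        h ^= Long.rotateLeft(modifier ^ key, 29);
        h *= 0xBF58476D1CE4E5B9L;
        h ^= h >>> 32;
        return h & 0xFFFFL;
    }

    // "Sign the link register before storing it": place the code in the unused upper bits.
    long sign(long returnAddress, long stackPointer) {
        long address = returnAddress & ADDRESS_MASK;
        return address | (code(address, stackPointer) << ADDRESS_BITS);
    }

    // "Authenticate after loading it back": recompute the code and compare.
    long authenticate(long signed, long stackPointer) {
        long address = signed & ADDRESS_MASK;
        if ((signed >>> ADDRESS_BITS) != code(address, stackPointer)) {
            throw new IllegalStateException("return address failed authentication");
        }
        return address;
    }

    public static void main(String[] args) {
        ToyPointerAuth pa = new ToyPointerAuth(0x1234_5678_9ABC_DEF0L);
        long lr = 0x0000_7F12_3456_7890L;
        long sp = 0x0000_7FFF_FFFF_E000L;
        long stored = pa.sign(lr, sp);
        System.out.printf("stored on stack: 0x%016x%n", stored);
        System.out.printf("authenticated:   0x%016x%n", pa.authenticate(stored, sp));
    }
}
```

Using the stack pointer as the modifier ties a signed value to the frame it was created in, so a value copied from another frame is also unlikely to authenticate.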

-------------

Marked as reviewed by ngasson (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6334

From dholmes at openjdk.java.net  Mon Nov 22 01:46:09 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 22 Nov 2021 01:46:09 GMT
Subject: RFR: 8265795:
 vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when
 running with JEP 416
In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
Message-ID: <uaBnm-T8kH9wsDFoV_Tq7YzqPGBPPnn8mZFkh_4JhDQ=.c6e59ee7-3a15-4731-9e64-62572c866794@github.com>

On Fri, 19 Nov 2021 15:32:24 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

> The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event to Unsafe_AllocateInstance.
> 
> While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (as opposed to their JVM_ENTRY counterparts) that allocate objects without posting events.
> 
> I think we need to implement some common way to handle this, and cover it in a separate issue.

Hi Leonid,

Functional fix looks good. A couple of minor nits below.

I agree that fixing intrinsics should be a separate issue - I have to worry that the overhead of posting events can dwarf the operation itself. I would guess the intrinsic would need a short-cut to check if the event is enabled and if so drop back to non-intrinsic version.
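
The short-cut I have in mind has roughly this shape, written here as a plain Java model rather than compiler code. All names are hypothetical; the point is only that the fast path pays for a single flag check and everything event-related lives on the slow path.

```java
import java.util.concurrent.atomic.AtomicLong;

// Model of "check if the event is enabled, else take the fast path"; names are illustrative.
final class EventAwareAllocator {
    private volatile boolean eventEnabled;
    private final AtomicLong posted = new AtomicLong();

    void setEventEnabled(boolean enabled) {
        eventEnabled = enabled;
    }

    long eventsPosted() {
        return posted.get();
    }

    Object allocate() {
        if (eventEnabled) {
            // Stands in for dropping back to the non-intrinsic version.
            return allocateAndPost();
        }
        // Stands in for the intrinsic: allocation only, no event overhead.
        return new Object();
    }

    private Object allocateAndPost() {
        Object o = new Object();
        posted.incrementAndGet(); // stands in for posting the allocation event
        return o;
    }

    public static void main(String[] args) {
        EventAwareAllocator allocator = new EventAwareAllocator();
        allocator.allocate();
        allocator.setEventEnabled(true);
        allocator.allocate();
        System.out.println("events posted: " + allocator.eventsPosted());
    }
}
```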

Thanks,
David

test/hotspot/jtreg/serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java line 49:

> 47:         mh.invoke("str");
> 48: 
> 49:         if(getNumberOfAllocation() != 1) {

space after 'if' please

test/hotspot/jtreg/serviceability/jvmti/VMObjectAlloc/libVMObjectAlloc.cpp line 91:

> 89: }
> 90: 
> 91: }

This looks spurious ??

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6478

From ngasson at openjdk.java.net  Mon Nov 22 02:03:13 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Mon, 22 Nov 2021 02:03:13 GMT
Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection
 for Linux/AArch64 [v6]
In-Reply-To: <B7nLJm7Uegt41cWe9U00ZDQ8cdVkjkJav7-aXeXNFaQ=.c72106b7-9dd1-4fe7-9285-42b0e6ffd597@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <B7nLJm7Uegt41cWe9U00ZDQ8cdVkjkJav7-aXeXNFaQ=.c72106b7-9dd1-4fe7-9285-42b0e6ffd597@github.com>
Message-ID: <YHLkParbi-9H8zF1c16HQt0hNwi0unld993k9jCORoE=.0d8408d3-c329-4be2-8ad4-a3230201a614@github.com>

On Tue, 16 Nov 2021 14:23:07 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work-in-progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
> 
>  - Merge master
>  - Rename pauth_authenticate_or_strip_return_address
>  - Fix windows aarch64 by restoring pauth file split
>  - Don't keep LR live across restore_live_registers
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>    
>    PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>    of its uses is to protect against ROP based attacks. This is done by
>    signing the Link Register whenever it is stored on the stack, and
>    authenticating the value when it is loaded back from the stack. If an
>    attacker were to try to change control flow by editing the stack then
>    the authentication check of the Link Register will fail, causing a
>    segfault when the function returns.
>    
>    On a system with PAC enabled, it is expected that all applications will
>    be compiled with ROP protection. Fedora 33 and upwards already provide
>    this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>    PAC instructions that exist in the NOP space - on hardware without PAC,
>    these instructions act as NOPs, allowing backward compatibility for
>    negligible performance cost (2 NOPs per non-leaf function).
>    
>    Hardware is currently limited to the Apple M1 MacBooks. All testing has
>    been done within a Fedora Docker image. A run of SpecJVM showed no
>    difference to that of noise - which was surprising.
>    
>    The most important part of this patch is simply compiling using branch
>    protection provided by GCC/LLVM. This protects all C++ code from being
>    used in ROP attacks, removing all static ROP gadgets from use.
>    
>    The remainder of the patch adds ROP protection to runtime generated
>    code, in both stubs and compiled Java code. Attacks here are much harder
>    as ROP gadgets must be found dynamically at runtime. If/when AOT
>    compilation is added to JDK, then all stubs and compiled Java will be
>    susceptible to ROP gadgets being found by static analysis and therefore
>    potentially as vulnerable as C++ code.
>    
>    There are a number of places where the VM changes control flow by
>    rewriting the stack or otherwise. I've done some analysis as to how
>    these could also be used for attacks (which I didn't want to post here).
>    These areas can be protected by ensuring the pointers to various stubs and
>    entry points are stored in memory as signed pointers. These changes are
>    simple to make (they can be reduced to a type change in common code and
>    a few additional sign/auth calls in the backend), but there are a lot of them
>    and the total code change is fairly large. I'm happy to provide a few
>    work in progress patches.
>    
>    In order to match the security benefits of the Apple Arm64e ABI across
>    the whole of JDK, then all the changes mentioned above would be
>    required.
>  - Add PAC assembly instructions
>  - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56

We're adding a product option `UseROPProtection` which needs a CSR according to https://wiki.openjdk.java.net/display/HotSpot/Hotspot+Command-line+Flags%3A+Kinds%2C+Lifecycle+and+the+CSR+Process
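
For readers unfamiliar with the sign/authenticate cycle described in the quoted commit message, here is a toy model of it. It is only a sketch: real PAC is computed in hardware with keys the program cannot read, and `toy_mac`, `sign`, `authenticate` and the 48-bit address assumption are all hypothetical names and choices made for illustration. The stack pointer is used as the modifier, as the `paciasp`/`autiasp` pair does.

```cpp
#include <cstdint>

using std::uint64_t;

// Assumption for this sketch: 48-bit virtual addresses, so the top
// 16 bits of a pointer are free to hold a signature.
constexpr uint64_t kAddrMask = 0x0000FFFFFFFFFFFFull;

// Hypothetical mixing function standing in for the hardware MAC.
uint64_t toy_mac(uint64_t addr, uint64_t sp, uint64_t key) {
  uint64_t x = (addr ^ (sp * 0x9E3779B97F4A7C15ull)) + key;
  x ^= x >> 29;
  x *= 0xBF58476D1CE4E5B9ull;
  x ^= x >> 32;
  return x >> 48;  // keep 16 bits
}

// "Sign" the link register before it is stored on the stack.
uint64_t sign(uint64_t lr, uint64_t sp, uint64_t key) {
  uint64_t addr = lr & kAddrMask;
  return addr | (toy_mac(addr, sp, key) << 48);
}

// "Authenticate" a value loaded back from the stack. Returns true and
// writes the stripped address to *out only if the signature matches.
bool authenticate(uint64_t signed_lr, uint64_t sp, uint64_t key, uint64_t* out) {
  uint64_t addr = signed_lr & kAddrMask;
  if (sign(addr, sp, key) != signed_lr) {
    return false;  // real hardware would produce a faulting address instead
  }
  *out = addr;
  return true;
}
```

In this model a value that round-trips unchanged authenticates, while flipping any of the upper 16 bits makes `authenticate` return false. A changed address is rejected unless the 16-bit toy signature happens to collide.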

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From dholmes at openjdk.java.net  Mon Nov 22 02:08:04 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 22 Nov 2021 02:08:04 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <ptKMLCt0HmT-eEFU-sBS4O5aefeCxbxjab_9cX6whK4=.173123b3-fe71-4e5e-8b67-cca1054d14f3@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>>   - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Hi Serguei,

I still feel the bug here can be fixed simply by moving assertions, rather than by introducing a change in behaviour as to what error code would be returned.

But I'll leave it to the serviceability folk to decide.

Thanks,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From dholmes at openjdk.java.net  Mon Nov 22 02:08:04 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 22 Nov 2021 02:08:04 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <M6_JOhIIw2-pfxYPK3LODN7KovedZD2FcZ0cRumjMms=.90ded333-b7da-43d3-986b-b20af6f365f0@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
 <UaxiSmd-pk9zw_gIc1aU53VdcCXkJNUkyySPxOaOxyw=.78569f8a-9402-4008-94b7-010f51f0bd9a@github.com>
 <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>
 <M6_JOhIIw2-pfxYPK3LODN7KovedZD2FcZ0cRumjMms=.90ded333-b7da-43d3-986b-b20af6f365f0@github.com>
Message-ID: <GUSGlgQC3DXCzOONJswPVtkF_n-vvdrgjCg2wpj-wMU=.3adea615-bc22-4e40-87fe-846be90c8c6c@github.com>

On Fri, 19 Nov 2021 17:04:42 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> Okay. I see similar check in the `force_early_return()` function:
>> 
>>   if (state == NULL) {
>>     return JVMTI_ERROR_THREAD_NOT_ALIVE;
>>   }
>> 
>> Would it be better to replace it with this check instead?
>> 
>>   if (java_thread->is_exiting()) {
>>     return JVMTI_ERROR_THREAD_NOT_ALIVE;
>>   }
>> 
>> Removing this check and keeping the one inside the handshake would be even better.
>> 
>> I would also add this line for symmetry with two other cases:
>> 
>> +  MutexLocker mu(JvmtiThreadState_lock);
>>   SetForceEarlyReturn op(state, value, tos);
>
> My point is that I don't see why you added the `is_exiting()` check
> since I don't see a race in that function, i.e., there's no `assert()` in
> this function that you need to protect.
> 
> As for adding the `MutexLocker mu(JvmtiThreadState_lock)`, you'll
> have to analyze and justify why you would need to add that lock grab
> independent of this fix. I'm not seeing a bug there, but I haven't looked
> very closely.

The `is_exiting` check changes the behaviour from reporting JVMTI_ERROR_THREAD_NOT_SUSPENDED to JVMTI_ERROR_THREAD_NOT_ALIVE. Arguably it is a more precise answer, but it is somewhat splitting hairs. To me it might be clearer to the developer what their logic error is if they get NOT_SUSPENDED rather than NOT_ALIVE. Either way this change is not needed to fix any known bug and the change in behaviour seems questionable.

>> Dan,
>> Thank you for reviewing this!
>> I'm not sure I understand you correctly here.
>> Are you saying that you agree with this change?
>> In fact, the thread state can not be changed (and the assert fired) after the `is_exiting()` check is made even without `JvmtiThreadState_lock` protection because it is inside of a handshake execution.
>
> I agree with the `is_exiting()` check addition.
> 
> I forgot that we're executing a Handshake `doit()` function. So we have a couple
> of reasons why an unsuspended target thread can't change from `!is_exiting()`
> to `is_exiting()` while we are in this function.

Again this introduces a more precise state check but also changes the behaviour by now reporting NOT_ALIVE instead of NOT_SUSPENDED. The assertion failure can be fixed by simply moving the assertion to after the suspension check.
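
To make the behavioural difference concrete, here is a minimal sketch of the two check orders. It is a toy model, not the code in jvmtiEnvBase.cpp: `ToyThread`, `Err` and both functions are hypothetical names, and the only thing it shows is which error an unsuspended, exiting thread would report under each ordering.

```cpp
// Hypothetical stand-ins for the JVMTI error codes discussed above.
enum class Err { NONE, THREAD_NOT_SUSPENDED, THREAD_NOT_ALIVE };

// Toy model of the target thread's state; not the real JavaThread.
struct ToyThread {
  bool exiting;
  bool suspended;
};

// Ordering with the added is_exiting() check: liveness is tested first.
Err check_exiting_first(const ToyThread& t) {
  if (t.exiting)    return Err::THREAD_NOT_ALIVE;
  if (!t.suspended) return Err::THREAD_NOT_SUSPENDED;
  return Err::NONE;
}

// Ordering without it: only the suspension test runs, and anything that
// touches per-thread state (such as the assert) would come after it.
Err check_suspended_only(const ToyThread& t) {
  if (!t.suspended) return Err::THREAD_NOT_SUSPENDED;
  return Err::NONE;
}
```

For a thread with `exiting == true` and `suspended == false`, the first function returns `THREAD_NOT_ALIVE` and the second returns `THREAD_NOT_SUSPENDED`, which is exactly the change in reported error being debated.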

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From dholmes at openjdk.java.net  Mon Nov 22 02:08:05 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 22 Nov 2021 02:08:05 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
 <UaxiSmd-pk9zw_gIc1aU53VdcCXkJNUkyySPxOaOxyw=.78569f8a-9402-4008-94b7-010f51f0bd9a@github.com>
 <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>
Message-ID: <9_cJhg6lSbDRvLkCZAYYpWwKYWi5gBefhTeGyvOtHGw=.8783abed-f787-4a74-85e2-da8659f9edca@github.com>

On Fri, 19 Nov 2021 10:15:05 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1625:
>> 
>>> 1623:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
>>> 1624:   }
>>> 1625:   assert(_state->get_thread() == java_thread, "Must be");
>> 
>> The `assert()` on L1625 is subject to the same race as the original site.
>> This `is_exiting()` check is made under the protection of the
>> `JvmtiThreadState_lock` so it is sufficient to protect that `assert()`.
>
> Okay, thanks!

Same comment as above.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From lmesnik at openjdk.java.net  Mon Nov 22 04:23:00 2021
From: lmesnik at openjdk.java.net (Leonid Mesnik)
Date: Mon, 22 Nov 2021 04:23:00 GMT
Subject: RFR: 8265795:
 vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when
 running with JEP 416 [v2]
In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
Message-ID: <CR9nV9YwLSiM2_XyOZuNOuCi9_Hbsn7xO3yzy2ZJ5Zc=.c45de772-84d1-4732-8b42-5ea87f82d062@github.com>

> The VMObjectAlloc jvmti event was not generated for objects created using MethodHandle. The fix adds posting of the event into Unsafe_AllocateInstance.
> 
> While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (as opposed to their JVM_ENTRY counterparts) that allocate objects without posting events.
> 
> I think we need to implement some common way to handle this and cover it in another issue.

Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:

  fixed

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6478/files
  - new: https://git.openjdk.java.net/jdk/pull/6478/files/e160dbe3..b37ee052

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6478&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6478&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6478.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6478/head:pull/6478

PR: https://git.openjdk.java.net/jdk/pull/6478

From lmesnik at openjdk.java.net  Mon Nov 22 04:23:01 2021
From: lmesnik at openjdk.java.net (Leonid Mesnik)
Date: Mon, 22 Nov 2021 04:23:01 GMT
Subject: RFR: 8265795:
 vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when
 running with JEP 416 [v2]
In-Reply-To: <uaBnm-T8kH9wsDFoV_Tq7YzqPGBPPnn8mZFkh_4JhDQ=.c6e59ee7-3a15-4731-9e64-62572c866794@github.com>
References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
 <uaBnm-T8kH9wsDFoV_Tq7YzqPGBPPnn8mZFkh_4JhDQ=.c6e59ee7-3a15-4731-9e64-62572c866794@github.com>
Message-ID: <uROrkUiWdWWwvHWLdSTqISZssZnEF6_N29Fc4_aLJlU=.f5280eb1-d981-481e-844a-4a831906cc3c@github.com>

On Mon, 22 Nov 2021 01:38:47 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fixed
>
> test/hotspot/jtreg/serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java line 49:
> 
>> 47:         mh.invoke("str");
>> 48: 
>> 49:         if(getNumberOfAllocation() != 1) {
> 
> space after 'if' please

fixed

-------------

PR: https://git.openjdk.java.net/jdk/pull/6478

From shade at openjdk.java.net  Mon Nov 22 09:21:41 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 22 Nov 2021 09:21:41 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v4]
In-Reply-To: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
Message-ID: <PfjfBiMZ8qI0NObWeP2TrmhgXFYlcP7hQHhjLe75Uh4=.d29aa96f-3ddc-4af2-8d27-07eca411b828@github.com>

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> 
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. 
> 
> Additional testing:
>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes
>  - [x] Linux x86_64 Zero works with `async-profiler`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Fix a comment
 - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
 - More reviews
 - Review feedback
 - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
 - Initial work: runs async-profiler successfully

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5848/files
  - new: https://git.openjdk.java.net/jdk/pull/5848/files/68ef4b63..bc4ba33b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=02-03

  Stats: 44745 lines in 800 files changed: 32663 ins; 5661 del; 6421 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5848.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848

PR: https://git.openjdk.java.net/jdk/pull/5848

From sspitsyn at openjdk.java.net  Mon Nov 22 09:26:03 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Mon, 22 Nov 2021 09:26:03 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with
 "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
 <Ivox2R4Y3x6BDgrGBt-QYm2grMJN5XWeetESBRyYa38=.3df670b6-2633-4a89-adac-2eb21f108689@github.com>
Message-ID: <DkvpYldUuWDdZyWhuf2_l4Uwn3XfSFUr6BbVdIgrf4E=.a91f64e3-221c-4a81-b8f6-284756e7336d@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>>   - UpdateForPopTopFrameClosure
>>  - SetForceEarlyReturn
>>  - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Hi David,
Thank you for looking at this and your comments.
An exiting thread should not be in a suspended state.
Also, I'm pretty sure that the THREAD_NOT_ALIVE error code should normally take priority.
So, I prefer the current fix over moving the assert.
But I kind of understand your concern. Thank you for sharing it!
Thanks,
Serguei

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From ngasson at openjdk.java.net  Mon Nov 22 10:42:13 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Mon, 22 Nov 2021 10:42:13 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2]
In-Reply-To: <vfOMZfnQF_K4XZ1a-1TR_d7Wm9zzSwNhSINMUooiQmU=.af43e2fa-12f6-4c8a-83c3-f509e7211f7c@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
 <vfOMZfnQF_K4XZ1a-1TR_d7Wm9zzSwNhSINMUooiQmU=.af43e2fa-12f6-4c8a-83c3-f509e7211f7c@github.com>
Message-ID: <Mhn6nChI4QH6n1Jh6m7Wna3ei8vMBAOmfxStIWjryvo=.8ca821d3-4f78-4407-a97d-bdd2d41841d6@github.com>

On Fri, 19 Nov 2021 18:22:37 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When we encounter such a load, the result may be decoded if +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
>> 
>> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
>> 
>> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Note that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try.
>> 
>> Testing:
>>  - [x] tier1 (x86_64)
>>  - [x] tier2 (x86_64)
>>  - [x] tier3 (x86_64)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix null-check on PPC

I tested tier1 on 32-bit Arm and AArch64. 32-bit Arm had some failures but they don't seem to be related to this patch.

src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp line 2453:

> 2451:   }
> 2452: 
> 2453:   if (UseCompressedClassPointers) { // On 32 bit arm??

It's probably leftover from when the "arm" port supported both 32- and 64-bit.

-------------

Marked as reviewed by ngasson (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6464

From sspitsyn at openjdk.java.net  Mon Nov 22 10:51:13 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Mon, 22 Nov 2021 10:51:13 GMT
Subject: Integrated: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails
 with "assert(java_thread == _state->get_thread()) failed: Must be"
In-Reply-To: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
References: <PqDFsCk3WFtK9ZA-Blegiu6CzrA7EoiG6jx0raAFF7A=.5ddd8298-66cb-47c4-b53d-ca148ce2fc5b@github.com>
Message-ID: <UoUlpfcuwGWGU6wX0ZDpSbmKlNfb2r5zkq-1nTzDeeY=.6ef61c45-e0df-4886-aebb-ad30c6641f5c@github.com>

On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
> The following handshake closures are fixed by this update:
>   - UpdateForPopTopFrameClosure
>  - SetForceEarlyReturn
>  - SetFramePopClosure

This pull request has now been integrated.

Changeset: 32839ba0
Author:    Serguei Spitsyn <sspitsyn at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/32839ba012f0a0a66e249cd8d12b94499d82ec0a
Stats:     22 lines in 2 files changed: 10 ins; 6 del; 6 mod

8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be"

Reviewed-by: mdoerr, lmesnik, dcubed

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From aph at openjdk.java.net  Mon Nov 22 10:58:14 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 22 Nov 2021 10:58:14 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2]
In-Reply-To: <vfOMZfnQF_K4XZ1a-1TR_d7Wm9zzSwNhSINMUooiQmU=.af43e2fa-12f6-4c8a-83c3-f509e7211f7c@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
 <vfOMZfnQF_K4XZ1a-1TR_d7Wm9zzSwNhSINMUooiQmU=.af43e2fa-12f6-4c8a-83c3-f509e7211f7c@github.com>
Message-ID: <dOf9YRuxcoRvBcQauKcuqlHfyRbQKlIvtWWsKkZqCfE=.08ad6908-7d6b-41ff-92eb-39ee527add52@github.com>

On Fri, 19 Nov 2021 18:22:37 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When we encounter such a load, the result may be decoded if +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
>> 
>> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
>> 
>> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Note that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try.
>> 
>> Testing:
>>  - [x] tier1 (x86_64)
>>  - [x] tier2 (x86_64)
>>  - [x] tier3 (x86_64)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix null-check on PPC

Thanks, a very welcome fix. I wish I had done something like this at the time of the AArch64 port, but I was neither brave enough nor knew enough.

src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 991:

> 989:       // FIXME: OMG this is a horrible kludge.  Any offset from an
> 990:       // address that matches klass_offset_in_bytes() will be loaded
> 991:       // as a word, not a long.

Ha! I am so glad to see this horrible kludge removed.
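
The hazard with the old scheme can be sketched in a few lines. This is a toy model and not the real C1 LIR classes: `ToyOp`, `Opcode`, `Type` and `kKlassOffset` are hypothetical, and the offset value 8 is an assumption taken from the description above.

```cpp
// Toy LIR, not the real C1 classes. All names here are hypothetical.
enum class Type { Int, Address };
enum class Opcode { Move, LoadKlass };

constexpr int kKlassOffset = 8;  // assumed value of klass_offset_in_bytes()

struct ToyOp {
  Opcode code;
  Type type;
  int offset;
};

// Old scheme: infer from type and offset that a load reads a Klass*.
bool decodes_by_heuristic(const ToyOp& op) {
  return op.type == Type::Address && op.offset == kKlassOffset;
}

// New scheme: only the dedicated opcode triggers klass decoding.
bool decodes_by_opcode(const ToyOp& op) {
  return op.code == Opcode::LoadKlass;
}
```

An unrelated address load at offset 8, `ToyOp{Opcode::Move, Type::Address, 8}`, is treated as a klass load by the heuristic but not by the opcode test, which is the misidentification the dedicated opcode rules out.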

-------------

Marked as reviewed by aph (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6464

From mcimadamore at openjdk.java.net  Mon Nov 22 12:02:47 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 22 Nov 2021 12:02:47 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v25]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <RLI2K0IF8MYBNWQuFvTJzdmyxNpzQScJbD0vyCrOwVI=.9a55346f-44d3-4ed5-83d8-66243b910576@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Fix javadoc issues found in CSR review

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5907/files
  - new: https://git.openjdk.java.net/jdk/pull/5907/files/79d3d685..1817975f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=24
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=23-24

  Stats: 10 lines in 4 files changed: 2 ins; 6 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From mcimadamore at openjdk.java.net  Mon Nov 22 12:09:30 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 22 Nov 2021 12:09:30 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v26]
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <Ck-IxUi_PmVxuycKrHWlXSgRy7B3L-XEf23m1Lief8U=.8320f211-3752-4ae8-beab-58b180870790@github.com>

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:

 - Merge branch 'master' into JEP-419
 - Fix javadoc issues found in CSR review
 - Adopt blessed modifier order
 - Merge branch 'master' into JEP-419
 - Revert removal of upcall MH customization
   (This change caused spurious VM crashes, so reverting to baseline)
 - Further tweak upcall safety considerations
 - Clarify safety considerations for upcalls
 - Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress
   (which is consistent with other restricted factories in VaList and NativeSymbol)
 - Streamline javadoc for package-info
 - * Add two new CLinker static methods to compute upcall/downcall method types
   * Clarify section on CLinker downcall type
   * Add section on CLinker safety guarantees
 - ... and 25 more: https://git.openjdk.java.net/jdk/compare/d427c79d...29cc6c60

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5907/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=25
  Stats: 14700 lines in 193 files changed: 6958 ins; 5126 del; 2616 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From mdoerr at openjdk.java.net  Mon Nov 22 12:44:10 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Mon, 22 Nov 2021 12:44:10 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2]
In-Reply-To: <vfOMZfnQF_K4XZ1a-1TR_d7Wm9zzSwNhSINMUooiQmU=.af43e2fa-12f6-4c8a-83c3-f509e7211f7c@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
 <vfOMZfnQF_K4XZ1a-1TR_d7Wm9zzSwNhSINMUooiQmU=.af43e2fa-12f6-4c8a-83c3-f509e7211f7c@github.com>
Message-ID: <hZo8g7lLspsEig7VmjHg8ESAKtJlIM25SNpv8zKm0lY=.86705113-cfff-4b9e-ba26-375be9d3ff60@github.com>

On Fri, 19 Nov 2021 18:22:37 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When we encounter such a load, the result may be decoded if +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
>> 
>> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
>> 
>> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Note that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try.
>> 
>> Testing:
>>  - [x] tier1 (x86_64)
>>  - [x] tier2 (x86_64)
>>  - [x] tier3 (x86_64)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix null-check on PPC

Nice change! Please remove the duplicated `info != NULL` check before integrating.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6464

From mdoerr at openjdk.java.net  Mon Nov 22 12:44:11 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Mon, 22 Nov 2021 12:44:11 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2]
In-Reply-To: <zqXTsO1eSqmWmi8TMbysTOKWdjCa2W3LO1ByjXADITg=.69a1ecd5-910d-46fe-b999-61a00d9395df@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
 <u6ieWX72_T7LZr0QnHK8oGtGVl0u9HMPMhSKJMSisuM=.02f186d2-0d8d-41a2-8f66-765fd624c303@github.com>
 <zqXTsO1eSqmWmi8TMbysTOKWdjCa2W3LO1ByjXADITg=.69a1ecd5-910d-46fe-b999-61a00d9395df@github.com>
Message-ID: <d7JZBQDp5FR97c5X0cOkVlo-4d1SIdCZgh2FViDUdYI=.28ab3dc2-1a8b-47bf-bd16-4c834e1c8d38@github.com>

On Fri, 19 Nov 2021 18:18:28 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 2737:
>> 
>>> 2735:   if (info != NULL) {
>>> 2736:     add_debug_info_for_null_check_here(info);
>>> 2737:   }
>> 
>> I think this is incorrect for AIX. Note that the first page is not read protected on that OS. To make it consistent with other places, I suggest:
>> 
>> diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
>> index a772e48f3be..23e03cb36e3 100644
>> --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
>> +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
>> @@ -2733,7 +2733,11 @@ void LIR_Assembler::emit_load_klass(LIR_OpLoadKlass* op) {
>>  
>>    CodeEmitInfo* info = op->info();
>>    if (info != NULL) {
>> -    add_debug_info_for_null_check_here(info);
>> +    if (!os::zero_page_read_protected() || !ImplicitNullChecks) {
>> +      explicit_null_check(obj, info);
>> +    } else {
>> +      add_debug_info_for_null_check_here(info);
>> +    }
>>    }
>>  
>>    if (UseCompressedClassPointers) {
>
> Thank you! I pushed a fix for that.

Unfortunately, we have the `info != NULL` check twice, now. Otherwise, good.
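
For reference, the condition in the suggested diff boils down to a small decision, sketched here in isolation. The function and enum names are hypothetical; the two parameters stand in for `os::zero_page_read_protected()` and the `ImplicitNullChecks` flag from the quoted code.

```cpp
// Toy model of the null-check choice in the quoted diff.
enum class NullCheck { Explicit, Implicit };

NullCheck choose_null_check(bool zero_page_read_protected,
                            bool implicit_null_checks) {
  // If a load from address 0 would not trap (as on AIX), or implicit
  // checks are disabled, an explicit compare-and-branch is required.
  if (!zero_page_read_protected || !implicit_null_checks) {
    return NullCheck::Explicit;
  }
  // Otherwise it is enough to record debug info at the faulting load.
  return NullCheck::Implicit;
}
```

Only the combination of a read-protected zero page and enabled implicit null checks selects the implicit path; every other combination falls back to the explicit check.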

-------------

PR: https://git.openjdk.java.net/jdk/pull/6464

From zgu at openjdk.java.net  Mon Nov 22 13:44:07 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Mon, 22 Nov 2021 13:44:07 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6]
In-Reply-To: <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>
Message-ID: <suo-Q-mnE3HEZ23N_2XjkSUv-kX6Sd1VTJPIFpQONU4=.8ba5ce09-8d38-4249-8de7-eb064cb3b2ed@github.com>

On Fri, 19 Nov 2021 14:29:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is one of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
>> 
>> This proposal adds NMT buffer overflow checking:
>> 
>> - it gives us C-heap overflow checking in release builds
>> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>> 
>> For more details, please see the JBS issue.
>> 
>> ----
>> 
>> Patch notes:
>> 
>> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>> 
>> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>> 
>> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>> 
>> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>> 
>> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>> 
>> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>> 
>> --------------
>> 
>> Example output a buffer overrun would provide:
>> 
>> 
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
>> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>> 
>> -------
>> 
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 4 weeks now without problems
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Volker Feedback 2
>  - Fix Zhengyu Problem in os::realloc
>  - Extend gtests
>  - extend footer to 2 bytes
>  - Feedback Volker
>  - Let NMT do overflow detection

Marked as reviewed by zgu (Reviewer).
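
The header/footer canary scheme from the quoted patch notes can be sketched roughly as follows. This is a minimal illustration, not the code in mallocTracker.hpp: the struct layout, the canary values and the names `tracked_malloc`, `check_block` and `tracked_free` are all assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified block layout: [ Header | user payload | 2-byte footer ].
// Field names and canary values are illustrative, not taken from HotSpot.
struct Header {
  uint64_t size;       // payload size in bytes
  uint32_t site_info;  // stand-in for NMT bookkeeping data
  uint16_t unused;
  uint16_t canary;     // last header field, directly precedes the payload
};
static_assert(sizeof(Header) == 16, "header is expected to stay 16 bytes");

const uint16_t HEADER_CANARY = 0xE99E;
const uint8_t  FOOTER_BYTE   = 0xE8;
const size_t   FOOTER_SIZE   = 2;

// Allocates header + payload + footer and returns a pointer to the payload.
void* tracked_malloc(size_t size) {
  unsigned char* raw =
      static_cast<unsigned char*>(std::malloc(sizeof(Header) + size + FOOTER_SIZE));
  if (raw == nullptr) return nullptr;
  Header h = { size, 0, 0, HEADER_CANARY };
  std::memcpy(raw, &h, sizeof(h));
  std::memset(raw + sizeof(Header) + size, FOOTER_BYTE, FOOTER_SIZE);
  return raw + sizeof(Header);
}

enum class BlockState { ok, header_broken, footer_broken };

// Verifies both canaries of a block previously returned by tracked_malloc.
BlockState check_block(const void* payload) {
  assert(payload != nullptr);
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  Header h;
  std::memcpy(&h, p - sizeof(Header), sizeof(h));
  if (h.canary != HEADER_CANARY) return BlockState::header_broken;
  for (size_t i = 0; i < FOOTER_SIZE; i++) {
    if (p[h.size + i] != FOOTER_BYTE) return BlockState::footer_broken;
  }
  return BlockState::ok;
}

void tracked_free(void* payload) {
  std::free(static_cast<unsigned char*>(payload) - sizeof(Header));
}
```

Writing one byte past an 8-byte payload overwrites the first footer byte, so `check_block` would report `footer_broken`; a real implementation would print a hex dump and assert at that point, as in the example output quoted above.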

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From eosterlund at openjdk.java.net  Mon Nov 22 14:03:34 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 22 Nov 2021 14:03:34 GMT
Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in
 VM_HeapDumper
Message-ID: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>

The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used.
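
A minimal sketch of the scoping pattern described above, under the assumption that the wrapper owns a heap-allocated implementation and is itself stack-only. The class names `IteratorImpl` and `ScopedObjectIterator` and the function `run_parallel_phase` are hypothetical stand-ins, not the names used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a GC-specific iterator implementation that must be
// created and destroyed in the same scope (for example because it holds a
// ThreadsListHandle-like resource).
class IteratorImpl {
 public:
  static int live;  // number of implementation objects currently alive
  IteratorImpl()  { live++; }
  ~IteratorImpl() { live--; }
  int claim_next() { return _next++; }
 private:
  int _next = 0;
};
int IteratorImpl::live = 0;

// Stack-only wrapper: the implementation object lives exactly as long as the
// enclosing scope, so construction and destruction happen on the same thread.
class ScopedObjectIterator {
 public:
  ScopedObjectIterator() : _impl(new IteratorImpl()) {}
  ScopedObjectIterator(const ScopedObjectIterator&) = delete;
  ScopedObjectIterator& operator=(const ScopedObjectIterator&) = delete;
  int claim_next() { return _impl->claim_next(); }
 private:
  static void* operator new(std::size_t) = delete;  // forbid heap allocation
  std::unique_ptr<IteratorImpl> _impl;              // never exposed publicly
};

int run_parallel_phase() {
  ScopedObjectIterator it;  // constructed right before the parallel work
  int claimed = it.claim_next() + it.claim_next();
  return claimed;           // 'it' and its impl are destroyed when this scope ends
}
```

The design choice is that callers can no longer keep the iterator alive beyond the operation that uses it, which is what a destructor-in-another-thread bug relies on.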

-------------

Commit messages:
 - 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper

Changes: https://git.openjdk.java.net/jdk/pull/6501/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6501&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276696
  Stats: 70 lines in 15 files changed: 35 ins; 11 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6501.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6501/head:pull/6501

PR: https://git.openjdk.java.net/jdk/pull/6501

From lmesnik at openjdk.java.net  Mon Nov 22 17:14:29 2021
From: lmesnik at openjdk.java.net (Leonid Mesnik)
Date: Mon, 22 Nov 2021 17:14:29 GMT
Subject: Integrated: 8265795:
 vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when
 running with JEP 416
In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com>
Message-ID: <QpF0_ZjMxJjatZD6KGDjjfew6nA_W5-pLDgOah9dY6o=.304d2f11-2682-4f91-85f4-037ada596866@github.com>

On Fri, 19 Nov 2021 15:32:24 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

> The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event into Unsafe_AllocateInstance.
> 
> While fixing this bug I noticed that the event is not posted by the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (as opposed to their JVM_ENTRY analogs) that allocate objects without posting events.
> 
> I think we need to implement some common way to handle this, and cover it in a separate issue.

This pull request has now been integrated.

Changeset: 33e2a518
Author:    Leonid Mesnik <lmesnik at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/33e2a518ebcd50e76c559512539fd7c864fd2407
Stats:     148 lines in 4 files changed: 146 ins; 2 del; 0 mod

8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416

Reviewed-by: sspitsyn, dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/6478

From duke at openjdk.java.net  Mon Nov 22 17:35:41 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 22 Nov 2021 17:35:41 GMT
Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection
 for Linux/AArch64 [v7]
In-Reply-To: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
Message-ID: <WwQPhroyXJ8tpaHYZQR37O6WxFzpcN9OLJb-3kSa-54=.e6c1dd83-3c1c-4dd9-9a0f-eef981d27bf2@github.com>

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
> 
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
> 
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
> 
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
> 
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
> 
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of
> them and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
> 
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.
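
The sign-on-store / authenticate-on-load idea from the description can be illustrated with a toy model. This is only a sketch under loud assumptions: real PAC uses a hardware key and a cryptographic function, whereas `toy_code` below is a deliberately trivial XOR fold; the 48-bit address width, the 16-bit code width and all function names are invented for this example. The stack pointer is used as the modifier, as it is for return-address signing.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only. Assumes 48-bit virtual addresses, leaving 16 upper bits
// free to hold an authentication code.
const uint64_t ADDR_MASK = 0x0000FFFFFFFFFFFFULL;

// Made-up stand-in for the hardware MAC: folds address, modifier and key
// into 16 bits. Not cryptographically meaningful.
uint64_t toy_code(uint64_t addr, uint64_t modifier, uint64_t key) {
  uint64_t x = addr ^ modifier ^ key;
  return (x ^ (x >> 16) ^ (x >> 32) ^ (x >> 48)) & 0xFFFF;
}

// "Sign" the link register before it is stored on the stack.
uint64_t sign_lr(uint64_t lr, uint64_t sp, uint64_t key) {
  assert((lr & ~ADDR_MASK) == 0);  // unsigned address must have clear top bits
  return lr | (toy_code(lr, sp, key) << 48);
}

// "Authenticate" the value loaded back from the stack. Returns false if the
// code does not match (real hardware would instead fault on the return).
bool authenticate_lr(uint64_t signed_lr, uint64_t sp, uint64_t key, uint64_t* out) {
  uint64_t addr = signed_lr & ADDR_MASK;
  if ((signed_lr >> 48) != toy_code(addr, sp, key)) return false;
  *out = addr;
  return true;
}
```

In this model, an attacker who overwrites the stored return address without knowing the key cannot produce a matching code, so authentication fails before the function returns to the tampered address.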

Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:

 - Merge master
 - Merge master
 - Rename pauth_authenticate_or_strip_return_address
 - Fix windows aarch64 by restoring pauth file split
 - Don't keep LR live across restore_live_registers
 - Merge master
 - Document pauth functions && remove OS split
 - Update UseROPProtection description
 - Simplify branch protection configure check
 - 8264130: PAC-RET protection for Linux/AArch64
 - ... and 3 more: https://git.openjdk.java.net/jdk/compare/ca31ed53...280abc41

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6334/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=06
  Stats: 1381 lines in 25 files changed: 517 ins; 18 del; 846 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net  Mon Nov 22 17:35:45 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 22 Nov 2021 17:35:45 GMT
Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection
 for Linux/AArch64 [v6]
In-Reply-To: <B7nLJm7Uegt41cWe9U00ZDQ8cdVkjkJav7-aXeXNFaQ=.c72106b7-9dd1-4fe7-9285-42b0e6ffd597@github.com>
References: <Incu1NvV4G3SROSqBQmwIW3kTMb3dzEMvQFLeLAvmng=.c433cad4-5540-4fe9-b4bb-991b8597d973@github.com>
 <B7nLJm7Uegt41cWe9U00ZDQ8cdVkjkJav7-aXeXNFaQ=.c72106b7-9dd1-4fe7-9285-42b0e6ffd597@github.com>
Message-ID: <U6nJpJUFri0LpAiIVLc1sqNVIVWfxZL0Bow5cGKCULM=.45256608-6172-4919-8024-d8b044419e8a@github.com>

On Tue, 16 Nov 2021 14:23:07 GMT, Alan Hayward <duke at openjdk.java.net> wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> 
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>> 
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>> 
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>> 
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>> 
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of
>> them and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>> 
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
> 
>  - Merge master
>  - Rename pauth_authenticate_or_strip_return_address
>  - Fix windows aarch64 by restoring pauth file split
>  - Don't keep LR live across restore_live_registers
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>  - Add PAC assembly instructions
>  - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56

CSR added: https://bugs.openjdk.java.net/browse/JDK-8277543

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From jorn.vernee at oracle.com  Mon Nov 22 19:19:31 2021
From: jorn.vernee at oracle.com (Jorn Vernee)
Date: Mon, 22 Nov 2021 20:19:31 +0100
Subject: Questions about oop handling for Panama upcalls.
In-Reply-To: <89b42995-1504-d3cc-1d37-595610b75801@oracle.com>
References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com>
 <BN0PR10MB5176632CF1C775BC4BE361D1F49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
 <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com>
 <BN0PR10MB517672D2A13028547A57F7DBF49A9@BN0PR10MB5176.namprd10.prod.outlook.com>
 <89b42995-1504-d3cc-1d37-595610b75801@oracle.com>
Message-ID: <a00a1f84-9e59-3418-87dd-428304b58af9@oracle.com>

One more comment on this thread for future readers:

As mentioned before, I had noticed that deoptimization code would 
reconstitute the receiver oop in the upcall stub's frame, so I added an 
extra stack word for that in the upcall frame (thinking at that time 
that the caller was supposed to make room for the receiver on the 
stack). But upon recent inspection of c2i adapter code, I noticed that 
the c2i adapter should already be making room for the receiver as well, 
so there should theoretically be no need for those extra stack words.

It turns out that the deopt code will not recreate the c2i adapter when 
doing a deopt. For that to work for compiled callers, the stack needs to 
be adjusted to make room for the parameters (as the c2i adapter does), 
as well as extra locals. The space that is needed is calculated in 
Deoptimization::fetch_unroll_info_helper by the following code:

  // Compute the amount the oldest interpreter frame will have to adjust
  // its caller's stack by. If the caller is a compiled frame then
  // we pretend that the callee has no parameters so that the
  // extension counts for the full amount of locals and not just
  // locals-parms. This is because without a c2i adapter the parm
  // area as created by the compiled frame will not be usable by
  // the interpreter. (Depending on the calling convention there
  // may not even be enough space).

  // QQQ I'd rather see this pushed down into last_frame_adjust
  // and have it take the sender (aka caller).

  if (deopt_sender.is_compiled_frame() || caller_was_method_handle) {
    caller_adjustment = last_frame_adjust(0, callee_locals);
  } else if (callee_locals > callee_parameters) {
    // The caller frame may need extending to accommodate
    // non-parameter locals of the first unpacked interpreted frame.
    // Compute that adjustment.
    caller_adjustment = last_frame_adjust(callee_parameters, callee_locals);
  }

I think you can probably spot the problem from this: we are doing a 
compiled call in the upcall stub, but the if-statement is not catching 
that case, so we don't make enough space on the stack. (in the case of 
method handles a pessimization seems to be used, since it's not known 
how much room the caller has on the stack for the parameters).
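
To make the shortfall concrete, here is a small standalone model of the branch quoted above. It is a sketch under assumptions: `last_frame_adjust` is simplified to one stack word per local slot, and the `SenderKind` enum is a hypothetical stand-in for the frame queries HotSpot actually performs.

```cpp
#include <cassert>

// Simplified stand-in for last_frame_adjust: number of extra stack words the
// caller frame must grow by, assuming one stack word per local slot.
int last_frame_adjust(int callee_parameters, int callee_locals) {
  assert(callee_locals >= callee_parameters);
  return callee_locals - callee_parameters;
}

// Hypothetical classification of the deopt sender, for illustration only.
enum class SenderKind { interpreted, compiled, upcall_stub };

// Mirrors the structure of the quoted if/else: only compiled senders (or
// method handle callers) get room for all locals including the parameters.
int caller_adjustment(SenderKind sender, bool caller_was_method_handle,
                      int callee_parameters, int callee_locals) {
  if (sender == SenderKind::compiled || caller_was_method_handle) {
    return last_frame_adjust(0, callee_locals);
  } else if (callee_locals > callee_parameters) {
    return last_frame_adjust(callee_parameters, callee_locals);
  }
  return 0;
}
```

For a callee with 3 parameters and 5 locals, a compiled sender yields an adjustment of 5 words, while an upcall stub sender falls into the second branch and yields only 2, even though it made a compiled call and has no interpreter-style parameter area to reuse.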

Jorn

On 17/11/2021 16:35, Jorn Vernee wrote:
> On 17/11/2021 16:14, Erik Osterlund wrote:
>> Hi Jorn,
>>
>> In the interpreter world, the expression stack at the call site 
>> becomes the locals
>> of the callee. So everything is passed through the stack. So the 
>> upcall stub sets
>> things up like an interpreter method would have (quack quack), and 
>> calls the
>> i2c adapter if there is an nmethod (quack quack), which will 
>> transform the
>> arguments to the compiled convention of the callee. The argument 
>> ownership
>> then switches from the caller to the callee, once the callee can 
>> manifest on the
>> stack. But if there are safepoints inbetween, then the caller owns 
>> the arguments
>> until its callee manifests.
> Okay, thanks, that makes sense. This probably explains why not 
> implementing preserve_callee_argument_oops for the upcall stubs didn't 
> cause any problems so far. There probably just weren't any safepoints 
> in between the call from the stub and the callee setting up its 
> frame. (although I'm still a bit confused here why the callee doesn't 
> make space for the receiver in its frame as well).
>> Do you want to avoid the pretend to be the interpreter step because 
>> it is costly
>> in the Panama world to spill arguments to the stack?
> I think either one could "work", although it seems like interpreter 
> calls require more setup of meta data around calls (which would be 
> unneeded if we called into an nmethod I think?). Also, we generate an 
> argument shuffle from the native convention to the Java calling 
> convention (this is unavoidable). If the native convention passes 
> arguments in the same registers that the Java convention expects them 
> in we don't have to generate code for that in the shuffle. 
> Theoretically we could also do a pass to minimize the needed shuffle 
> by reordering parameters on the MethodHandle. If we went with an 
> interpreted calling convention, we would always have to copy across 
> arguments to the stack, in a shuffle-ish manner (right now we rely on 
> SharedRuntime::java_calling_convention to compute the target 
> registers. Would have to implement something similar for the 
> interpreter convention).
>
> It seems to me that in the long run, going with the Java compiled 
> calling convention for the upcall is the right choice if we want to be 
> able to squeeze out as much speed as possible.
>
> Jorn
>>
>> /Erik
>>
>>> -----Original Message-----
>>> From: Jorn Vernee <jorn.vernee at oracle.com>
>>> Sent: Wednesday, 17 November 2021 15:49
>>> To: Erik Osterlund <erik.osterlund at oracle.com>;
>>> hotspot-dev at openjdk.java.net
>>> Subject: Re: Questions about oop handling for Panama upcalls.
>>>
>>> Hi Erik,
>>>
>>> Thanks for the suggestion.
>>>
>>> The callee is a mix of JDK internal and user code. The user gives us 
>>> a method
>>> handle that they want to turn into a native function pointer [1], 
>>> and we adapt
>>> that using method handle combinators [2] to take only primitive 
>>> arguments
>>> according to the registers in which the native calling convention 
>>> passes
>>> arguments (essentially each primitive argument is a register value). 
>>> The
>>> register values are then reconstructed into high-level arguments 
>>> (through
>>> our MH adaptation), and passed to the user code. It's this adapted 
>>> method
>>> handle that we call from the upcall stub.
>>>
>>> I guess what you're suggesting is that we have some internal Java 
>>> method
>>> like this:
>>>
>>>       static ... invoke(long methodHandle, ...) {
>>>           MethodHandle mh = resolveJObject(methodHandle);
>>>           return (...) mh.invokeExact(...);
>>>       }
>>>
>>> Which is then called from the upcall stub instead.
>>>
>>> I think it could work maybe (would have to see how the performance 
>>> works
>>> out), but we have to deal with different signatures, so would have 
>>> to use
>>> bytecode spinning to generate these 'invoke' methods on demand, which
>>> seems like maybe it's a worse medicine (in terms of complexity) than 
>>> adding
>>> the correct oop handling in the VM.
>>>
>>> I would also just like to get a better understanding of how this is 
>>> supposed to
>>> work in the first place (or how it works e.g. in the case of 
>>> nmethods), since I
>>> had to implement the correct oop handling in the past as well when
>>> implementing the intrinsics for down calls, and it's probably not 
>>> the last time I
>>> have to deal with something like this...
>>>
>>>  > Our current upcall stubs try to quack like an interpreter in 
>>> many ways, so
>>> that it will look like an i-2-something call. I think you can either 
>>> try to do the
>>> same quacking dance, to pass the oop to the callee
>>>
>>> So, I suppose interpreter argument oops are handled through another
>>> mechanism than OopMaps, maybe something similar to
>>> CompiledMethod::preserve_callee_argument_oops?
>>>
>>> Thanks,
>>> Jorn
>>>
>>> [1] :
>>> https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java#L224
>>> [2] :
>>> https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/ProgrammableUpcallHandler.java#L157
>>>
>>> On 17/11/2021 10:42, Erik Osterlund wrote:
>>>> Hi Jorn,
>>>>
>>>> So you have a jobject in the caller, resolve it, and then need to
>>>> pass the oop around as an argument to the callee. Our current upcall
>>>> stubs try to quack like an interpreter in many ways, so that it will
>>>> look like an i-2-something call. I think you can either try to do the
>>>> same quacking dance, to pass the oop to the callee, or alternatively
>>>> the primary question for me seems to be who is the callee? You have a
>>>> very fixed format for the call, which makes me suspect the callee is
>>>> some kind of JDK internal code. Another way of dealing with this
>>>> would be to pass the jobject as a long and just resolve it in the
>>>> callee instead, if this is indeed JDK internal code. Then this
>>>> becomes a problem that doesn't need to be solved at all. Just sanity
>>>> checking.
>>>> /Erik
>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-dev <hotspot-dev-retn at openjdk.java.net> On Behalf Of
>>>>> Jorn Vernee
>>>>> Sent: Tuesday, 16 November 2021 18:51
>>>>> To: hotspot-dev at openjdk.java.net
>>>>> Subject: Questions about oop handling for Panama upcalls.
>>>>>
>>>>> Hi,
>>>>>
>>>>> For panama-foreign upcalls we spin our own upcall stubs that wrap a
>>>>> method handle VM entry for the actual upcall. I want to make sure I
>>>>> have the oop handling correct on this.
>>>>>
>>>>> We receive a list of arguments from native code (all primitives, so
>>>>> no oops to handle there), and then prefix that list with a
>>>>> MethodHandle oop, before calling into the MH's VM entry. The MH oop
>>>>> can be stored in three different
>>>>> places:
>>>>>
>>>>> 1. The MH oop is stored in a global JNI handle, and then resolved
>>>>> right before the upcall [1].
>>>>> 2. The MH oop is then stored in the first argument register j_rarg0
>>>>> for the call.
>>>>> 3. During a deopt of the callee, the deoptimization code spills the
>>>>> receiver (MH oop) into the frame of the upcall stub. (looks like the
>>>>> extending of the frame that happens for instance in c2i adapters
>>>>> doesn't make room for the receiver?).
>>>>>
>>>>> I don't think I need to do anything else for 1., but for 2. and 3.
>>>>> there is currently no handling. I wanted to ask how those cases
>>>>> should be handled, if at all.
>>>>>
>>>>> I think 2. could in theory be addressed by implementing
>>>>> CodeBlob::preserve_callee_argument_oops. Though, it has been working
>>>>> fine so far without this, so I'm wondering if this is even needed. Is
>>>>> the caller or callee responsible for handling argument oops (seems to
>>>>> be caller, from looking at
>>>>> CompiledMethod::preserve_callee_argument_oops)?
>>>>> Or does the caller just handle the receiver if there is one (since
>>>>> deopt spills that into the callers frame)? The oop offset is passed
>>>>> to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as
>>>>> an oop* [2]. Does the argument register get spilled somewhere and the
>>>>> oop needs to be patched in place at that address (by the OopClosure)?
>>>>> Or is this just used to mark the oop as alive? (In the latter
>>>>> case, the JNI global should be enough, I think.)
>>>>>
>>>>> I think 3. could be handled with an OopMap entry at the frame offset
>>>>> where the receiver is spilled during a deopt of the callee? Should it
>>>>> be an oop or a narrowOop, or does it depend on VM settings? FWIW, the
>>>>> deopt code always seems to need a machine word (64-bits) to do the
>>>>> spilling, so I think it's an oop? Do I need to zero out that part of
>>>>> the frame when allocating the frame so that the GC doesn't mistake
>>>>> some garbage that's in there for an oop?
>>>>>
>>>>> I have a POC patch here for reference [3], that implements the 2
>>>>> things above. This passes our test suite, but I'm not sure about the
>>>>> correctness.
>>>>>
>>>>> Looking at what JNI does for upcalls [4], I don't see how e.g. the
>>>>> receiver argument that is put on the stack is handled, or what
>>>>> happens when the callee deopts (though I think it would just
>>>>> overwrite the value on the stack that's there already, since JNI
>>>>> always seems to do interpreted calls, where we do compiled calls).
>>>>> But, JNI/the call stub might be special cased elsewhere...
>>>>>
>>>>> Also, the oop is briefly stored in rscratch1 when resolving. I'm
>>>>> interested to know when the GC can look at the frame and register
>>>>> state, especially with concurrent GCs in mind. I'm assuming it's only
>>>>> during the call to the MH VM entry (but the existence of
>>>>> frame::safe_for_sender makes me less sure)?
>>>>> AFAIK the call counts as a safepoint (with oop map for it typically
>>>>> stored at the return offset). At this safepoint, the oop can only be
>>>>> stored at one of the
>>>>> 3 places listed at the start.
>>>>>
>>>>> Thanks,
>>>>> Jorn
>>>>>
>>>>> [1] :
>>>>> https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416
>>>>> [2] :
>>>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L939-L946
>>>>> [3] :
>>>>> https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:Deopt_Crash
>>>>> [4] :
>>>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L339

From darcy at openjdk.java.net  Mon Nov 22 21:55:21 2021
From: darcy at openjdk.java.net (Joe Darcy)
Date: Mon, 22 Nov 2021 21:55:21 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator) [v26]
In-Reply-To: <Ck-IxUi_PmVxuycKrHWlXSgRy7B3L-XEf23m1Lief8U=.8320f211-3752-4ae8-beab-58b180870790@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
 <Ck-IxUi_PmVxuycKrHWlXSgRy7B3L-XEf23m1Lief8U=.8320f211-3752-4ae8-beab-58b180870790@github.com>
Message-ID: <PPmB7JwIHcB9o_EKeZlI_FKzjvlSc288ui_NpfkbUXo=.ad0a87a3-24aa-40c1-b68c-caca7890db6b@github.com>

On Mon, 22 Nov 2021 12:09:30 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
> 
>  - Merge branch 'master' into JEP-419
>  - Fix javadoc issues found in CSR review
>  - Adopt blessed modofier order
>  - Merge branch 'master' into JEP-419
>  - Revert removal of upcall MH customization
>    (This change caused spurious VM crashes, so reverting to baseline)
>  - Further tweak upcall safety considerations
>  - Clarify safety considerations for upcalls
>  - Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress
>    (which is consistent with other restricted factories in VaList and NativeSymbol)
>  - Streamline javadoc for package-info
>  - * Add two new CLinker static methods to compute upcall/downcall method types
>    * Clarify section on CLinker downcall type
>    * Add section on CLinker safety guarantees
>  - ... and 25 more: https://git.openjdk.java.net/jdk/compare/d427c79d...29cc6c60

Marked as reviewed by darcy (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From sviswanathan at openjdk.java.net  Tue Nov 23 01:35:28 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 01:35:28 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
Message-ID: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>

Currently 32-byte instructions are used for small array copy and clear. 
This can be optimized by using 64-byte instructions.

Please review.

Best Regards,
Sandhya

-------------

Commit messages:
 - 8277617: Optimize array copy and clear on x86_64

Changes: https://git.openjdk.java.net/jdk/pull/6512/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277617
  Stats: 15 lines in 4 files changed: 2 ins; 0 del; 13 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6512.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512

PR: https://git.openjdk.java.net/jdk/pull/6512

From dholmes at openjdk.java.net  Tue Nov 23 02:18:10 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 23 Nov 2021 02:18:10 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
Message-ID: <CEQrwfIEc-MRFdH1G-6aVHIfUiojOjGIXytN9m0AcNg=.0791b5ce-bdb7-4c42-aa5b-5d5b0c0f0ce5@github.com>

On Tue, 23 Nov 2021 01:23:04 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

> Currently 32-byte instructions are used for small array copy and clear. 
> This can be optimized by using 64-byte instructions.
> 
> Please review.
> 
> Best Regards,
> Sandhya

This isn't my area but I'm a bit perplexed by the changes. AFAICS this patch does 2 things:

1. It changes all use of `AVX3Threshold` to `VM_Version::avx3_threshold()`
2. It defines `VM_Version::avx3_threshold()` as:
```static int avx3_threshold() { return (supports_serialize() ? 0: AVX3Threshold); }```

but I am at a loss to understand what `supports_serialize()` has to do with using 64-byte instructions for array copy and clear. ??

Thanks,
David

Plus some performance numbers would be useful. Thanks

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 23 02:48:09 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 02:48:09 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <CEQrwfIEc-MRFdH1G-6aVHIfUiojOjGIXytN9m0AcNg=.0791b5ce-bdb7-4c42-aa5b-5d5b0c0f0ce5@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <CEQrwfIEc-MRFdH1G-6aVHIfUiojOjGIXytN9m0AcNg=.0791b5ce-bdb7-4c42-aa5b-5d5b0c0f0ce5@github.com>
Message-ID: <OmM1Id2SsWlHqXlZH_y1IYmSWKn_6X6mCdVD29VhlIg=.f8963f35-cfb1-4a2c-96f1-87fd8d48edde@github.com>

On Tue, 23 Nov 2021 02:14:46 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Currently 32-byte instructions are used for small array copy and clear. 
>> This can be optimized by using 64-byte instructions.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Plus some performance numbers would be useful. Thanks

@dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From dholmes at openjdk.java.net  Tue Nov 23 02:58:07 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 23 Nov 2021 02:58:07 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
Message-ID: <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>

On Tue, 23 Nov 2021 01:23:04 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

> Currently 32-byte instructions are used for small array copy and clear. 
> This can be optimized by using 64-byte instructions.
> 
> Please review.
> 
> Best Regards,
> Sandhya

But what exactly is it that you are checking for? What is the connection between the ISA version and the decision to effectively zero out AVX3Threshold?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 23 04:28:07 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 04:28:07 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
Message-ID: <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>

On Tue, 23 Nov 2021 02:54:51 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Currently 32-byte instructions are used for small array copy and clear. 
>> This can be optimized by using 64-byte instructions.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> But what exactly is it that you are checking for? What is the connection between the ISA version and the decision to effectively zero out AVX3Threshold?

@dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte loads/stores. I could not find a better way to check in the absence of a cpuid bit.
If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). I can also add a comment to this effect on the avx3_threshold() method.
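
To make the link between the feature check and the vector width concrete, here is a minimal C++ sketch of the restricted threshold described above. It is not the HotSpot code: `CpuInfo`, its fields, and `use_64_byte_path` are hypothetical stand-ins, and both the default of 4096 for `AVX3Threshold` and the comparison in the consumer are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the VM_Version queries named in this thread.
struct CpuInfo {
  bool is_intel_family_core;
  bool supports_serialize;
};

// Assumed default for the AVX3Threshold flag, in bytes.
constexpr int AVX3Threshold = 4096;

// Returns 0 on Intel Core CPUs that report SERIALIZE. The SERIALIZE check is
// only a proxy for "has the improved 64-byte load/store implementation",
// because no dedicated cpuid bit exists for that property.
int avx3_threshold(const CpuInfo& cpu) {
  return (cpu.is_intel_family_core && cpu.supports_serialize) ? 0 : AVX3Threshold;
}

// Illustrative consumer (assumed shape): take the 64-byte path once the copy
// is at least as large as the threshold. A threshold of 0 selects it always.
bool use_64_byte_path(const CpuInfo& cpu, std::size_t copy_bytes) {
  return copy_bytes >= static_cast<std::size_t>(avx3_threshold(cpu));
}
```

Under these assumptions, a 16-byte copy would take the 64-byte path only on the newer platform, while an 8 KB copy would take it on both.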

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From dholmes at openjdk.java.net  Tue Nov 23 04:54:06 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 23 Nov 2021 04:54:06 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
 <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
Message-ID: <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>

On Tue, 23 Nov 2021 04:25:23 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> But what exactly is it that you are checking for? What is the connection between the ISA version and the decision to effectively zero out AVX3Threshold?
>
> @dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte loads/stores. I could not find a better way to check in the absence of a cpuid bit.
> If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). I can also add a comment to this effect on the avx3_threshold() method.

@sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 23 05:21:45 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 05:21:45 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v2]
In-Reply-To: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
Message-ID: <8DCW_z8u24RWqc6LhKRd6jXF8gYOF9rvY-AMtz4C2Is=.a983c0a5-a714-4a88-8cd8-dba8d65ac72a@github.com>

> Currently 32-byte instructions are used for small array copy and clear. 
> This can be optimized by using 64-byte instructions.
> 
> Please review.
> 
> Best Regards,
> Sandhya

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  restrict to Intel core and add comment

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6512/files
  - new: https://git.openjdk.java.net/jdk/pull/6512/files/54aa9cee..e0cb890d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=00-01

  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6512.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 23 05:28:06 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 05:28:06 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
 <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
 <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
Message-ID: <JVUACA5wkhF8B6Aik3q7frcOAiUc6u_10eJZv8N-NLc=.eb482f9e-771d-4e0f-b70b-48994ccb2288@github.com>

On Tue, 23 Nov 2021 04:50:42 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> @dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte loads/stores. I could not find a better way to check in the absence of a cpuid bit.
>> If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). I can also add a comment to this effect on the avx3_threshold() method.
>
> @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks.

@dholmes-ora I have implemented your review comments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From jiefu at openjdk.java.net  Tue Nov 23 06:09:05 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Tue, 23 Nov 2021 06:09:05 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
 <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
 <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
Message-ID: <HtYRlJ4gFYuJhzNoqjn0T0YnZ_GluaNqIXrGhG2KHNQ=.a99967f2-3c3f-4bae-a918-1fe4b231a53f@github.com>

On Tue, 23 Nov 2021 04:50:42 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> @dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte loads/stores. I could not find a better way to check in the absence of a cpuid bit.
>> If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). I can also add a comment to this effect on the avx3_threshold() method.
>
> @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks.

> @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform.

It would be good to add a JMH test for this optimization.
Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From stuefe at openjdk.java.net  Tue Nov 23 06:48:14 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 23 Nov 2021 06:48:14 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks
In-Reply-To: <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <jJzmGoEj_VpsZNwJU0IGAE6atFbN4vl0fRvLwxGyj7M=.e48c2ff9-0fb9-4bfd-a4a1-0a9106a59760@github.com>
 <MHNw41JYAQezDRVbYJjB1t4esIEk-SjKltY4Ox6rexs=.1a843b1f-063b-4832-be99-a3be3d00410f@github.com>
 <kjoLxh2Q9IHdBfOgFjK7B6fF1FGYy3ON9TFuXv1rYto=.0229ade5-4a43-4a10-9d59-35987e67c9ba@github.com>
 <xjh3jusi2H3rzs2_PK2Efo2oN5zsg7aNZQOR6Q0dTIo=.7c57ee1d-7edf-4db8-9c9f-4bc3fb982a5a@github.com>
 <_r5qw_r-3Be7zUJuf4gcb10MFe9varAWAvix_CaJiYs=.758ad563-f2d2-466d-bcb5-1ccc6b547e94@github.com>
 <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com>
Message-ID: <jtoxSxzSoYjEhQqoXkmhh-UfsQDICFBwR0gGLS5pl-M=.afb0e6ee-22e3-4d68-9de5-cdcf81139326@github.com>

On Sun, 17 Oct 2021 13:30:17 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

>>> > > > Sorry, we already have GuardedMemory for detecting buffer overrun, why introduce a new one?
>>> > > 
>>> > > 
>>> > > GuardedMemory has a number of disadvantages, and I'd like to remove it in favor of NMT doing buffer overrun checks. For my full reasoning, please see the umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301.
>>> > > Disadvantages of the current solution:
>>> > > 
>>> > > * We have no way to do C-heap checking in release builds. But there, it is sorely missed. We ship release VMs, and those VMs get used in myriad ways with a lot of faulty third-party native code. I would love to be able to flip a product switch at a customer site and have some basic C-heap checks done, without relegating to external tools or debug c-libs.
>>> > > * The debug-only guards in os::malloc() are quite fat, really, a whopping 48 bytes per allocation on 64-bit, 40 bytes on 32-bit. That is for guarding alone. They distort the memory allocation picture, since blowing up every allocation this way causes the underlying libc to do different things. Therefore we have different memory layouts and allocation patterns between debug and release. In addition, we have different code paths too, e.g. in debug os::realloc calls os::malloc + os::free whereas in release builds it calls directly into libc ::realloc. All that means that in debug builds we test something different than what we ship in release builds.
>>> > > * The canary in the headers of the debug-only guards does not directly precede the user portion of the data, so we won't catch negative buffer overflows of only a few bytes.
>>> > > * The guarding added by CheckJNICalls is unnecessarily expensive too, since it copies the memory around, handing a copy of the guarded memory up to the caller.
>>> > > * The fact that three different code sections all do malloc headers incurs unnecessary costs, and the code is unnecessarily complex. It also makes statistics difficult to understand since the silent overhead can be large (compare the rise in RSS with the rise in NMT allocations in a debug build).
>>> > > * None of the current overflow checkers print out hex dumps of the violated memory. That is what the libc usually does and it is very useful.
>>> > > 
>>> > > Thanks, Thomas
>>> > 
>>> > 
>>> > P.S. I contemplated doing the NMT overflow checks and the removal of the old guarding code in one RFE but was concerned that it would be too confusing and get stuck in review limbo. Maybe that was wrong. But this RFE here makes more sense when viewed as part of a whole.
>>> 
>>> Thanks for the explanation. So, buffer overrun detection is now only available when NMT is on, vs. always on with GuardedMemory in debug build. Right?
>> 
>> Well, not with this patch obviously. But yes, that would be my proposal. To get "always-on", we could switch NMT on by default in debug builds. "summary" level is not really expensive at all, it uses less memory than GuardedMemory does, and the per-flag accounting does not really add much overhead (GuardedMemory also does some accounting btw).
>> 
>> Though tbh my first priority is to give us overflow checks in release builds. If we only do that and leave GuardedMemory in place I would be happy already. I had two customer cases very recently with heap overwriters, for one of which I misused NMT to trigger a crash and analyze the core. A neighboring (non-VM-allocated) block was overwriting the following (VM-allocated) heap block.
>> 
>> Cheers, Thomas
>
> I have no problem on the technical side. Changing the NMT default value, I believe, needs a CSR. We should probably start with a CSR to get consensus.
> 
> Thanks.
> 
> -Zhengyu

Thank you @zhengyu123 and @simonis! I'll do one more round of stress tests, then push.
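
For readers following the overrun-check discussion quoted above, the general canary technique can be sketched in a few lines of C++. This is an illustration only, not the layout or API used by NMT or GuardedMemory: the function names, the canary values, and the 16-byte header are all hypothetical. The head canary sits directly before the user payload, so an underrun of even one byte is caught, and the tail canary sits directly after it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical layout: [size: 8 bytes][head canary: 8 bytes][payload][tail canary: 2 bytes]
constexpr std::size_t kHeaderSize = 16;  // keeps the payload 16-byte aligned
constexpr std::uint64_t kHeadCanary = 0xC0FFEEC0FFEEC0FFULL;
constexpr std::uint16_t kTailCanary = 0xBEEF;

void* guarded_malloc(std::size_t size) {
  unsigned char* raw = static_cast<unsigned char*>(
      std::malloc(kHeaderSize + size + sizeof(kTailCanary)));
  if (raw == nullptr) return nullptr;
  const std::uint64_t stored_size = size;
  std::memcpy(raw, &stored_size, sizeof(stored_size));           // payload size
  std::memcpy(raw + 8, &kHeadCanary, sizeof(kHeadCanary));       // directly before payload
  std::memcpy(raw + kHeaderSize + size, &kTailCanary, sizeof(kTailCanary));
  return raw + kHeaderSize;
}

// Returns true if neither canary has been overwritten.
bool guards_intact(const void* payload) {
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  std::uint64_t head = 0;
  std::memcpy(&head, p - 8, sizeof(head));
  if (head != kHeadCanary) return false;  // underrun; do not trust the size
  std::uint64_t size = 0;
  std::memcpy(&size, p - kHeaderSize, sizeof(size));
  std::uint16_t tail = 0;
  std::memcpy(&tail, p + size, sizeof(tail));
  return tail == kTailCanary;             // false means overrun
}

void guarded_free(void* payload) {
  if (payload != nullptr) {
    std::free(static_cast<unsigned char*>(payload) - kHeaderSize);
  }
}
```

In this sketch the per-allocation overhead is 18 bytes; a real implementation would also report the violation (for example with a hex dump) rather than just return a flag.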

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From dholmes at openjdk.java.net  Tue Nov 23 06:52:07 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 23 Nov 2021 06:52:07 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <JVUACA5wkhF8B6Aik3q7frcOAiUc6u_10eJZv8N-NLc=.eb482f9e-771d-4e0f-b70b-48994ccb2288@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
 <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
 <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
 <JVUACA5wkhF8B6Aik3q7frcOAiUc6u_10eJZv8N-NLc=.eb482f9e-771d-4e0f-b70b-48994ccb2288@github.com>
Message-ID: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com>

On Tue, 23 Nov 2021 05:24:41 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks.
>
> @dholmes-ora I have implemented your review comments.

Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use of 64-byte loads/stores? The connection is not at all obvious to anyone not fully conversant with AVX3 and how it is used by the code. Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From pli at openjdk.java.net  Tue Nov 23 08:12:07 2021
From: pli at openjdk.java.net (Pengfei Li)
Date: Tue, 23 Nov 2021 08:12:07 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com>
References: <Rv-7zsL9uLaSjwoS7JIDgZ2-HhIoX00l8aCiDmFd6qw=.b1dbbb3b-0578-4d7f-97ba-1acd3baaa74c@github.com>
 <v6YJjUUxs507gmh_JQNThW34vhS3w0FxQ_YPUUlST-g=.7a0a7e7d-def4-4dc9-b29a-efbad6081983@github.com>
 <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com>
Message-ID: <0NpXDvx0PPQgOnuxjlDayD-n5Y9nojMQhPRul1ysKqk=.b4e4fdc9-10bc-485f-843e-22c4cc360647@github.com>

On Fri, 19 Nov 2021 08:07:13 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR.
>
> Hi @pfustc, the common type system changes look good to me.

Thank you for looking at my PR. This C2 technique was originally developed by @jatin-bhateja from Intel to optimize small-sized memory copy with x86 AVX-512 masked vector instructions. Now I propose to enable it on AArch64 with SVE. Yes, it is beneficial only if the copy size is less than the size of a vector. That is 512 bits on x86, but on AArch64 SVE the maximum copy size that can benefit depends on the hardware's implementation of the scalable vector register (from 128 bits to 2048 bits).

@theRealAph, do you approve this PR, or do you have any specific feedback or suggestions?
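
As a rough illustration of the idea, the following C++ sketch models one masked (predicated) vector copy as a scalar loop. It is not the C2 implementation: `masked_copy` and its parameters are hypothetical, the vector width is passed in bytes to reflect that SVE hardware may implement anything from 16 to 256 bytes, and accepting a copy exactly as large as the vector is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Copies n bytes with a single "vector" operation of vector_bytes lanes,
// modelled as a loop in which lanes at index >= n are masked off.
// Returns false (and copies nothing) when n does not fit in one vector,
// in which case a caller would fall back to the regular arraycopy stub.
bool masked_copy(const std::uint8_t* src, std::uint8_t* dst,
                 std::size_t n, std::size_t vector_bytes) {
  if (n > vector_bytes) return false;
  for (std::size_t lane = 0; lane < vector_bytes; ++lane) {
    const bool active = lane < n;  // predicate bit for this lane
    if (active) dst[lane] = src[lane];
  }
  return true;
}
```

With a 16-byte vector a 5-byte copy is handled inline, while a 17-byte copy is not; the same 17-byte copy would qualify on hardware with wider vectors.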

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From mdoerr at openjdk.java.net  Tue Nov 23 09:32:08 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Tue, 23 Nov 2021 09:32:08 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with
 -XX:-OmitStackTraceInFastThrow [v10]
In-Reply-To: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>
References: <V33bqOWqbYrr4ACfUn3SkS_JbOuICbX68z7jlX37pXQ=.a4a6b7ce-da22-440f-b3c7-d0a99a28fea0@github.com>
 <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>
Message-ID: <kQj1eJBEG0bCvYpvxYoewN1nWLzlCXdoRurPC0WD6Ag=.ea949a21-e2ae-4dfc-994a-5ace91ecb67b@github.com>

On Thu, 18 Nov 2021 10:21:01 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 cannot create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter when such exceptions occur.
>> 
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>> 
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>> 
>> 
>> ### Solution
>> 
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>> 
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>> 
>> 
>> ### Implementation details
>> 
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes that can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
> 
>  - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow
>  - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions
>  - Fix build issue for minimal/zero build one more time
>  - Minor enhancements and fixes requested by Martin
>  - Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>  - Fix build issue for minimal/zero build
>  - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
>  - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
>  - Minor updates as requested by @TheRealMDoerr
>  - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

I think this workaround is ok. C2 currently doesn't support extended exception messages other than NullPointerExceptions. If this change gets accepted, I think we should add C2 support for other primitive Exceptions.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From pliden at openjdk.java.net  Tue Nov 23 09:45:10 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Tue, 23 Nov 2021 09:45:10 GMT
Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in
 VM_HeapDumper
In-Reply-To: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>
References: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>
Message-ID: <OsysZVCzNnOKdfs-cZZ7r3FcY4pEX3O_KVxjNxxHops=.81564c11-59fe-4725-945d-533b58aca44d@github.com>

On Mon, 22 Nov 2021 13:49:02 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used.

Looks good.
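
For readers skimming the thread, the scoping idea in the quoted description can be sketched roughly as follows. This is a minimal sketch, not the patch: `ToyIteratorImpl`, `ToyScopedIterator` and `live_during_operation` are hypothetical names, and the counter merely stands in for a resource such as a ThreadsListHandle that has to be released in the scope that acquired it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical internal iterator. It stands in for an implementation that
// holds a resource which must be released in the scope that acquired it.
class ToyIteratorImpl {
 public:
  explicit ToyIteratorImpl(int* live_count) : _live_count(live_count) {
    ++*_live_count;   // resource acquired
  }
  ~ToyIteratorImpl() {
    assert(*_live_count > 0);
    --*_live_count;   // resource released
  }
 private:
  int* _live_count;
};

// Stack-only wrapper: it owns the implementation and can be neither heap
// allocated nor copied, so the implementation is created and destroyed
// within one scope.
class ToyScopedIterator {
 public:
  explicit ToyScopedIterator(int* live_count)
      : _impl(new ToyIteratorImpl(live_count)) {}
  ToyScopedIterator(const ToyScopedIterator&) = delete;
  ToyScopedIterator& operator=(const ToyScopedIterator&) = delete;
  static void* operator new(std::size_t) = delete;   // no heap allocation
 private:
  std::unique_ptr<ToyIteratorImpl> _impl;
};

// Placeholder for the parallel operation: reports how many implementations
// are alive while the wrapper is in scope.
int live_during_operation(int* live_count) {
  ToyScopedIterator iterator(live_count);
  return *live_count;
}
```

Because the wrapper lives on the stack of the function that runs the operation, the implementation cannot outlive that call, which is the property the description relies on.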

-------------

Marked as reviewed by pliden (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6501

From stefank at openjdk.java.net  Tue Nov 23 09:52:06 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Tue, 23 Nov 2021 09:52:06 GMT
Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in
 VM_HeapDumper
In-Reply-To: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>
References: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>
Message-ID: <k_Zya4UXwRmphJwmoc2zlAZSNUDYSR9sz4wwPA1eUjk=.38d5bd42-ae7f-4aaa-8c5b-a00f2c8f8504@github.com>

On Mon, 22 Nov 2021 13:49:02 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used.

Marked as reviewed by stefank (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6501

From eosterlund at openjdk.java.net  Tue Nov 23 13:42:06 2021
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Tue, 23 Nov 2021 13:42:06 GMT
Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in
 VM_HeapDumper
In-Reply-To: <OsysZVCzNnOKdfs-cZZ7r3FcY4pEX3O_KVxjNxxHops=.81564c11-59fe-4725-945d-533b58aca44d@github.com>
References: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>
 <OsysZVCzNnOKdfs-cZZ7r3FcY4pEX3O_KVxjNxxHops=.81564c11-59fe-4725-945d-533b58aca44d@github.com>
Message-ID: <8246XgMZ_pumK-BgCx9osG0N9jvJxNGmcJVw8s_0oqo=.d723b7d4-ac0e-4897-af9f-65524377ab87@github.com>

On Tue, 23 Nov 2021 09:42:10 GMT, Per Liden <pliden at openjdk.org> wrote:

>> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used.
>
> Looks good.

Thanks for the reviews, @pliden and @stefank!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6501

From eosterlund at openjdk.java.net  Tue Nov 23 14:22:46 2021
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Tue, 23 Nov 2021 14:22:46 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts
Message-ID: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>

The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM.
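
The ordering problem described above can be modelled with two toy RAII guards. This is only a sketch under stated assumptions: `ToyThread`, `ToyNoSafepointLocker` and `ToyBlockInVM` are hypothetical stand-ins and not HotSpot classes, and the model counts a violation where a debug VM would fire the assert.

```cpp
#include <cassert>

// Toy per-thread state; none of these types are HotSpot classes.
struct ToyThread {
  int  no_safepoint_depth = 0;  // > 0 while a "no safepoint" region is open
  bool blocked = false;         // true while in the blocked state
  int  violations = 0;          // blocked transitions made inside such a region
};

// Models taking a lock that does not check for safepoints: holding it
// opens a "no safepoint" region for as long as the guard lives.
class ToyNoSafepointLocker {
 public:
  explicit ToyNoSafepointLocker(ToyThread& t) : _t(t) { ++_t.no_safepoint_depth; }
  ~ToyNoSafepointLocker() {
    assert(_t.no_safepoint_depth > 0);
    --_t.no_safepoint_depth;
  }
 private:
  ToyThread& _t;
};

// Models the transition to the blocked state. A real debug VM would assert
// here; the toy version only records the violation.
class ToyBlockInVM {
 public:
  explicit ToyBlockInVM(ToyThread& t) : _t(t) {
    if (_t.no_safepoint_depth > 0) ++_t.violations;
    _t.blocked = true;
  }
  ~ToyBlockInVM() { _t.blocked = false; }
 private:
  ToyThread& _t;
};

// Order before the fix: lock first, then transition to blocked.
int lock_then_block(ToyThread& t) {
  ToyNoSafepointLocker locker(t);
  ToyBlockInVM tbivm(t);
  return t.violations;
}

// Order after the fix: transition first, take the lock inside it.
int block_then_lock(ToyThread& t) {
  ToyBlockInVM tbivm(t);
  ToyNoSafepointLocker locker(t);
  return t.violations;
}
```

In this model `lock_then_block` records one violation and `block_then_lock` records none, since the guards are destroyed in reverse order and the lock is released before the thread leaves the blocked state.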

-------------

Commit messages:
 - 8277631: ZGC: CriticalMetaspaceAllocation asserts

Changes: https://git.openjdk.java.net/jdk/pull/6520/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6520&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277631
  Stats: 35 lines in 2 files changed: 30 ins; 0 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6520.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6520/head:pull/6520

PR: https://git.openjdk.java.net/jdk/pull/6520

From pliden at openjdk.java.net  Tue Nov 23 14:37:12 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Tue, 23 Nov 2021 14:37:12 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts
In-Reply-To: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
Message-ID: <dA4txGphIG6nh-OWPZ4bioAtNIu61ymfFYvIipsii3s=.42a227fb-f7ab-4c85-ac7c-4cbfe3a92974@github.com>

On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM.

Marked as reviewed by pliden (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6520

From eosterlund at openjdk.java.net  Tue Nov 23 14:38:15 2021
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Tue, 23 Nov 2021 14:38:15 GMT
Subject: Integrated: 8276696: ParallelObjectIterator freed at the wrong time in
 VM_HeapDumper
In-Reply-To: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>
References: <JFVjS5PCXDGX5jmjraERwlIfpyJBrSSH001s9CaO_DE=.6d8b489a-cb67-4ed9-86b2-ee9fe94df314@github.com>
Message-ID: <RU1WUNJVx1eqBs8mKDW0CaW5HtJGn7sFG7noGQsCAIQ=.538901b5-02a8-4bbf-896b-fa70b7424abf@github.com>

On Mon, 22 Nov 2021 13:49:02 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used.

This pull request has now been integrated.

Changeset: f4dc03ea
Author:    Erik Österlund <eosterlund at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/f4dc03ea6de327425ff265c3d2ec16ea7b0e1634
Stats:     70 lines in 15 files changed: 35 ins; 11 del; 24 mod

8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper

Reviewed-by: pliden, stefank

-------------

PR: https://git.openjdk.java.net/jdk/pull/6501

From eosterlund at openjdk.java.net  Tue Nov 23 14:46:08 2021
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Tue, 23 Nov 2021 14:46:08 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts
In-Reply-To: <dA4txGphIG6nh-OWPZ4bioAtNIu61ymfFYvIipsii3s=.42a227fb-f7ab-4c85-ac7c-4cbfe3a92974@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
 <dA4txGphIG6nh-OWPZ4bioAtNIu61ymfFYvIipsii3s=.42a227fb-f7ab-4c85-ac7c-4cbfe3a92974@github.com>
Message-ID: <9tidE4J6H_rFLh6OWbKAEzZziYYULySBJJr9RVgcnlI=.a1b80d06-bb9f-45be-9b83-9ad6a1036788@github.com>

On Tue, 23 Nov 2021 14:33:47 GMT, Per Liden <pliden at openjdk.org> wrote:

>> The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM.
>
> Marked as reviewed by pliden (Reviewer).

Thanks for the reviews, @pliden and @stefank!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6520

From stefank at openjdk.java.net  Tue Nov 23 14:46:08 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Tue, 23 Nov 2021 14:46:08 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts
In-Reply-To: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
Message-ID: <_W_qcnx6Y9G12rJKxfvrKbuFDnUJcs6dC0F8agu5ueE=.63b4b4dd-0396-4431-bf7b-8c04e2f380e0@github.com>

On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM.

Marked as reviewed by stefank (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6520

From jiefu at openjdk.java.net  Tue Nov 23 16:04:23 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Tue, 23 Nov 2021 16:04:23 GMT
Subject: RFR: 8277652: SIGSEGV in ShenandoahBarrierC2Support::verify_raw_mem
 for malformed control flow graph
Message-ID: <kXZWFS0Yj4qU3UF_PEmFY5ca1r7yYiCcgpSb2Yn7qFQ=.d9b39b03-5b46-4c3d-ad2a-605f07d83f3f@github.com>

Hi all,

`ShenandoahBarrierC2Support::verify_raw_mem` crashes because `u->unique_ctrl_out()` [1] returns NULL for a malformed control flow graph.
It can be reproduced by running `compiler/vectorapi/TestIntrinsicBailOut.java` with `-XX:+UseShenandoahGC`.
It would be better to fix it.

Thanks.
Best regards,
Jie

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L1925

-------------

Commit messages:
 - 8277652: SIGSEGV in ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph

Changes: https://git.openjdk.java.net/jdk/pull/6525/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6525&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277652
  Stats: 13 lines in 2 files changed: 13 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6525.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6525/head:pull/6525

PR: https://git.openjdk.java.net/jdk/pull/6525

From rkennke at openjdk.java.net  Tue Nov 23 16:24:07 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 23 Nov 2021 16:24:07 GMT
Subject: RFR: 8277652: SIGSEGV in
 ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph
In-Reply-To: <kXZWFS0Yj4qU3UF_PEmFY5ca1r7yYiCcgpSb2Yn7qFQ=.d9b39b03-5b46-4c3d-ad2a-605f07d83f3f@github.com>
References: <kXZWFS0Yj4qU3UF_PEmFY5ca1r7yYiCcgpSb2Yn7qFQ=.d9b39b03-5b46-4c3d-ad2a-605f07d83f3f@github.com>
Message-ID: <81o2YKFQvTE2C9qqBBDBjC5L1dNyPMRTJw1CcTdD2SA=.6946cbb3-de08-46b7-9724-7c39a989efc3@github.com>

On Tue, 23 Nov 2021 15:59:00 GMT, Jie Fu <jiefu at openjdk.org> wrote:

> Hi all,
> 
> `ShenandoahBarrierC2Support::verify_raw_mem` crashes because `u->unique_ctrl_out()` [1] returns NULL for a malformed control flow graph.
> It can be reproduced by running `compiler/vectorapi/TestIntrinsicBailOut.java` with `-XX:+UseShenandoahGC`.
> It would be better to fix it.
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L1925

Thank you, Jie!
I am currently working on a change that would make the LRB runtime call not consume or produce raw memory at all, and would obsolete your change. See #6526.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6525

From eastig at amazon.co.uk  Tue Nov 23 17:34:44 2021
From: eastig at amazon.co.uk (Astigeevich, Evgeny)
Date: Tue, 23 Nov 2021 17:34:44 +0000
Subject: RFC: improving NMethod code locality in CodeCache
Message-ID: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com>

Hello,
 
We'd like to discuss a proposal for improving NMethod code locality in CodeCache.

We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.

The current NMethod layout is contiguous and consists of the following sections:
* Header: The C++ part of NMethod (class members and other C++ object state). Its size is "sizeof(NMethod)": 344 bytes on arm64 and 352 bytes on x86_64 in JDK 17.
* Relocation
* Constant pool
* Instructions (main code)
* Stub code
* Oops
* Metadata: Class related metadata
* Scopes data: Debugging information
* Scopes pcs: Debugging information
* Dependencies
* Handler table: Exception handler table
* Nul chk table: Implicit Null Pointer exception table
* Speculations
* JVMCI data

We collected the section sizes of C2 nmethods in the DaCapo and Renaissance benchmarks on x86_64 and arm64. The C2 methods were identified with "-XX:+LogCompilation".
Summary of results for jdk17 with tiered compilation:
* DaCapo:
    * arm64 (full data https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_arm64.csv): 
+---------------------+---------+------------+-----------+
|                     |   min   |   max      |   median  |
+---------------------+---------+------------+-----------+
| C2 nmethods         | 152     | 5215       | 916       |
| Total size - bytes  | 271,576 | 38,367,872 | 4,072,616 |
+---------------------+---------+------------+-----------+

Proportion of the total size of a section vs C2 nmethods total size

+---------------+-------+-------+--------+
|    Section    |  min  |  max  | median |
+---------------+-------+-------+--------+
| header        | 4.7%  | 19.3% | 8.0%   |
| consts        | 0.0%  | 0.1%  | 0.0%   |
| instrs        | 39.7% | 49.7% | 44.5%  |
| stub code     | 8.9%  | 11.3% | 10.1%  |
| oops          | 0.2%  | 0.4%  | 0.3%   |
| metadata      | 2.0%  | 3.0%  | 2.3%   |
| scopes data   | 12.2% | 18.6% | 15.9%  |
| scopes pcs    | 7.8%  | 9.0%  | 8.4%   |
| deps          | 0.3%  | 0.8%  | 0.5%   |
| handler table | 1.3%  | 3.3%  | 2.1%   |
| nul_chk table | 1.0%  | 1.6%  | 1.6%   |
+---------------+-------+-------+--------+

    * x86_64 (full data https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_x86_64.csv):
+---------------------+---------+------------+-----------+
|                     |   min   |   max      |   median  |
+---------------------+---------+------------+-----------+
| C2 nmethods         | 155     | 5135       | 889       |
| Total size - bytes  | 264,800 | 35,026,312 | 3,985,744 |
+---------------------+---------+------------+-----------+

Proportion of the total size of a section vs C2 nmethods total size

+---------------+-------+-------+--------+
|    Section    |  min  |  max  | median |
+---------------+-------+-------+--------+
| header        | 5.2%  | 20.6% | 8.3%   |
| consts        | 0.0%  | 0.6%  | 0.1%   |
| instrs        | 49.2% | 60.7% | 55.3%  |
| stub code     | 1.1%  | 1.9%  | 1.4%   |
| oops          | 0.1%  | 0.3%  | 0.2%   |
| metadata      | 1.6%  | 2.9%  | 2.0%   |
| scopes data   | 12.2% | 19.6% | 16.8%  |
| scopes pcs    | 7.8%  | 9.2%  | 8.5%   |
| deps          | 0.3%  | 0.8%  | 0.5%   |
| handler table | 1.5%  | 3.5%  | 2.0%   |
| nul_chk table | 0.9%  | 1.6%  | 1.1%   |
+---------------+-------+-------+--------+

* Renaissance
    * arm64 (full data https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_arm64.csv):
+---------------------+---------+------------+-----------+
|                     |   min   |   max      |   median  |
+---------------------+---------+------------+-----------+
| C2 nmethods         | 155     | 7447       | 1198      |
| Total size - bytes  | 366,248 | 52,840,528 | 4,989,392 |
+---------------------+---------+------------+-----------+

Proportion of the total size of a section vs C2 nmethods total size

+---------------+-------+-------+--------+
|    Section    |  min  |  max  | median |
+---------------+-------+-------+--------+
| header        | 4.8%  | 14.6% | 8.5%   |
| consts        | 0.0%  | 0.1%  | 0.0%   |
| instrs        | 35.7% | 45.6% | 42.8%  |
| stub code     | 8.3%  | 12.0% | 10.1%  |
| oops          | 0.2%  | 0.6%  | 0.4%   |
| metadata      | 2.0%  | 4.1%  | 3.0%   |
| scopes data   | 12.4% | 20.8% | 16.1%  |
| scopes pcs    | 7.8%  | 8.9%  | 8.4%   |
| deps          | 0.4%  | 1.0%  | 0.5%   |
| handler table | 1.2%  | 3.9%  | 2.4%   |
| nul_chk table | 0.9%  | 1.3%  | 1.1%   |
+---------------+-------+-------+--------+

    * x86_64 (full data https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_x86_64.csv):

+---------------------+---------+------------+-----------+
|                     |   min   |   max      |   median  |
+---------------------+---------+------------+-----------+
| C2 nmethods         | 158     | 7242       | 938       |
| Total size - bytes  | 354,952 | 47,019,560 | 3,791,764 |
+---------------------+---------+------------+-----------+

Proportion of the total size of a section vs C2 nmethods total size

+---------------+-------+-------+--------+
|    Section    |  min  |  max  | median |
+---------------+-------+-------+--------+
| header        | 5.4%  | 15.7% | 9.7%   |
| consts        | 0.0%  | 0.1%  | 0.0%   |
| instrs        | 46.1% | 54.4% | 52.7%  |
| stub code     | 1.3%  | 1.9%  | 1.4%   |
| oops          | 0.2%  | 0.5%  | 0.3%   |
| metadata      | 1.9%  | 3.4%  | 2.6%   |
| scopes data   | 12.7% | 23.6% | 17.4%  |
| scopes pcs    | 8.0%  | 9.4%  | 8.6%   |
| deps          | 0.4%  | 1.0%  | 0.5%   |
| handler table | 1.3%  | 4.0%  | 2.5%   |
| nul_chk table | 1.0%  | 1.4%  | 1.2%   |
+---------------+-------+-------+--------+

The data show that, because of the non-executable data interleaved in NMethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and the scopes sections. Arm64 and x86_64 look consistent except for the stub code: on arm64 the stub code is 4-5 times bigger.
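
To make the sparsity claim concrete, the DaCapo median shares from the tables above can be summed as in the sketch below. It is a rough approximation only: the type and function names are hypothetical, the values are copied from the median columns, and adding per-section medians does not give the exact median of the combined share.

```cpp
// Median section shares in percent of total C2 nmethod size, copied from
// the DaCapo tables above. Names are illustrative, not HotSpot identifiers.
struct SectionShares {
  double header, consts, instrs, stub_code, oops, metadata,
         scopes_data, scopes_pcs, deps, handler_table, nul_chk_table;
};

const SectionShares kDacapoArm64Median =
    {8.0, 0.0, 44.5, 10.1, 0.3, 2.3, 15.9, 8.4, 0.5, 2.1, 1.6};
const SectionShares kDacapoX86_64Median =
    {8.3, 0.1, 55.3, 1.4, 0.2, 2.0, 16.8, 8.5, 0.5, 2.0, 1.1};

// Share of bytes that are executable: main code plus stub code.
double executable_share(const SectionShares& s) {
  return s.instrs + s.stub_code;
}

// Share taken by the header and the two scopes sections.
double header_and_scopes_share(const SectionShares& s) {
  return s.header + s.scopes_data + s.scopes_pcs;
}
```

With these medians the executable share comes out at roughly 54.6% on arm64 and 56.7% on x86_64, while the header and scopes sections together account for roughly a third (about 32.3% and 33.6%).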

We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places in memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache, because different parts of memory need to be allocated and released for each nmethod.

There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317), under which the implementation work can be done.

There can be different approaches for the implementation:

1. What to separate:
    a. All code (main plus stub) from other sections.
    b. Or only main code because this is the code where an application should spend most of the time.
    c. Or the header and scope sections.
2. Where to put:
    a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
    b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
    c.  Or in a completely different place (C-heap, Metaspace,...)
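
Option 2b above could look roughly like the following two-ended bump allocator. This is a minimal sketch and not a proposal for the actual CodeHeap: `TwoEndedSegment` and `kAllocFailed` are hypothetical names, offsets stand in for addresses, and alignment, freeing and sweeping are ignored.

```cpp
#include <cstddef>

// Returned when a request does not fit into the remaining gap.
constexpr std::size_t kAllocFailed = static_cast<std::size_t>(-1);

// Hypothetical segment in which code grows upwards from offset 0 and
// non-executable data grows downwards from the end.
class TwoEndedSegment {
 public:
  explicit TwoEndedSegment(std::size_t capacity) : _low(0), _high(capacity) {}

  // Reserves `size` bytes for code; returns the start offset or kAllocFailed.
  std::size_t alloc_code(std::size_t size) {
    if (size > _high - _low) return kAllocFailed;
    std::size_t start = _low;
    _low += size;
    return start;
  }

  // Reserves `size` bytes for data; returns the start offset or kAllocFailed.
  std::size_t alloc_data(std::size_t size) {
    if (size > _high - _low) return kAllocFailed;
    _high -= size;
    return _high;
  }

  // Bytes left between the code end and the data start.
  std::size_t free_bytes() const { return _high - _low; }

 private:
  std::size_t _low;   // first free offset above the code region
  std::size_t _high;  // first used offset of the data region
};
```

For a 100-byte segment, `alloc_code(40)` returns 0 and `alloc_data(30)` returns 70, so all code stays packed at the low end. The segment is full once the two regions meet.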

It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property.

We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.
 
Comments welcome!
 
Thanks,
Evgeny Astigeevich, AWS Corretto Team







From sviswanathan at openjdk.java.net  Tue Nov 23 17:52:40 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 17:52:40 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3]
In-Reply-To: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
Message-ID: <z4UqLNg2RiND4LaahyQZB8ax5aJkPy_VNe4lhaQRMk4=.a1ac4a2c-8b07-417a-bf71-f73ce4f1fe2f@github.com>

> Currently 32-byte instructions are used for small array copy and clear. 
> This can be optimized by using 64-byte instructions.
> 
> Please review.
> 
> Best Regards,
> Sandhya

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  update comment for avx3_threshold() with more details

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6512/files
  - new: https://git.openjdk.java.net/jdk/pull/6512/files/e0cb890d..c90e7004

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=01-02

  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6512.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512

PR: https://git.openjdk.java.net/jdk/pull/6512

From jbhateja at openjdk.java.net  Tue Nov 23 19:06:15 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 23 Nov 2021 19:06:15 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3]
In-Reply-To: <z4UqLNg2RiND4LaahyQZB8ax5aJkPy_VNe4lhaQRMk4=.a1ac4a2c-8b07-417a-bf71-f73ce4f1fe2f@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <z4UqLNg2RiND4LaahyQZB8ax5aJkPy_VNe4lhaQRMk4=.a1ac4a2c-8b07-417a-bf71-f73ce4f1fe2f@github.com>
Message-ID: <auZ4FOXhlHO8TeEC_8543U5P8ieulDKrKE-T8DViRZ4=.ac770862-3f22-47a1-8633-7d7f76acad87@github.com>

On Tue, 23 Nov 2021 17:52:40 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Currently 32-byte instructions are used for small array copy and clear. 
>> This can be optimized by using 64-byte instructions.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update comment for avx3_threshold() with more details

src/hotspot/cpu/x86/vm_version_x86.hpp line 920:

> 918:   // is set to 0 for these platforms.
> 919:   static int avx3_threshold() { return ((is_intel_family_core() &&
> 920:                                 supports_serialize()) ? 0: AVX3Threshold); }

Hi @sviswa7, should we not return a zero threshold only if the user has not explicitly set AVX3Threshold, i.e. in the default case?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 23 22:24:07 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 22:24:07 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <HtYRlJ4gFYuJhzNoqjn0T0YnZ_GluaNqIXrGhG2KHNQ=.a99967f2-3c3f-4bae-a918-1fe4b231a53f@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
 <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
 <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
 <HtYRlJ4gFYuJhzNoqjn0T0YnZ_GluaNqIXrGhG2KHNQ=.a99967f2-3c3f-4bae-a918-1fe4b231a53f@github.com>
Message-ID: <-cmJjHI8NnKQ0YbPeHP_aRW7J797ZT38ZS9dCeGwdSw=.62291db3-14bd-4ed5-ac01-6f783ab5c5fe@github.com>

On Tue, 23 Nov 2021 06:05:48 GMT, Jie Fu <jiefu at openjdk.org> wrote:

>> @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks.
>
>> @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform.
> 
> It would be better to add a jmh test for this opt.
> Thanks.

@DamonFool There are jmh tests for Arraycopy in test/micro/org/openjdk/bench/java/lang/Arraycopy.java.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 23 22:46:04 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 23 Nov 2021 22:46:04 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3]
In-Reply-To: <auZ4FOXhlHO8TeEC_8543U5P8ieulDKrKE-T8DViRZ4=.ac770862-3f22-47a1-8633-7d7f76acad87@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <z4UqLNg2RiND4LaahyQZB8ax5aJkPy_VNe4lhaQRMk4=.a1ac4a2c-8b07-417a-bf71-f73ce4f1fe2f@github.com>
 <auZ4FOXhlHO8TeEC_8543U5P8ieulDKrKE-T8DViRZ4=.ac770862-3f22-47a1-8633-7d7f76acad87@github.com>
Message-ID: <W-iJe9jy6fGTonzHiIjsceMs3MBycaZIVKMsFk44kEA=.7b184a2c-8fe7-4cbd-8b8c-11f1c5aecf18@github.com>

On Tue, 23 Nov 2021 19:01:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   update comment for avx3_threshold() with more details
>
> src/hotspot/cpu/x86/vm_version_x86.hpp line 920:
> 
>> 918:   // is set to 0 for these platforms.
>> 919:   static int avx3_threshold() { return ((is_intel_family_core() &&
>> 920:                                 supports_serialize()) ? 0: AVX3Threshold); }
> 
> Hi @sviswa7, should we not return a zero threshold only if the user has not explicitly set AVX3Threshold, i.e. in the default case?

@jatin-bhateja On these platforms it is beneficial to set the threshold to zero for copy and clear operations and hence the override. I have described that in the comment in detail as well.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From dholmes at openjdk.java.net  Wed Nov 24 05:05:07 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 24 Nov 2021 05:05:07 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3]
In-Reply-To: <W-iJe9jy6fGTonzHiIjsceMs3MBycaZIVKMsFk44kEA=.7b184a2c-8fe7-4cbd-8b8c-11f1c5aecf18@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <z4UqLNg2RiND4LaahyQZB8ax5aJkPy_VNe4lhaQRMk4=.a1ac4a2c-8b07-417a-bf71-f73ce4f1fe2f@github.com>
 <auZ4FOXhlHO8TeEC_8543U5P8ieulDKrKE-T8DViRZ4=.ac770862-3f22-47a1-8633-7d7f76acad87@github.com>
 <W-iJe9jy6fGTonzHiIjsceMs3MBycaZIVKMsFk44kEA=.7b184a2c-8fe7-4cbd-8b8c-11f1c5aecf18@github.com>
Message-ID: <2zFKQ-o4UXauqYztn-Zu02_rCOJ57minkJaZvT7BShk=.249b47b5-e94e-419b-9c8a-e1342ebfe0a4@github.com>

On Tue, 23 Nov 2021 22:43:03 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> src/hotspot/cpu/x86/vm_version_x86.hpp line 920:
>> 
>>> 918:   // is set to 0 for these platforms.
>>> 919:   static int avx3_threshold() { return ((is_intel_family_core() &&
>>> 920:                                 supports_serialize()) ? 0: AVX3Threshold); }
>> 
>> Hi @sviswa7, should we not return a zero threshold only if the user has not explicitly set AVX3Threshold, i.e. in the default case?
>
> @jatin-bhateja On these platforms it is beneficial to set the threshold to zero for copy and clear operations and hence the override. I have described that in the comment in detail as well.

@sviswa7 I tend to agree with @jatin-bhateja. AVX3Threshold is a diagnostic flag, so if someone has deliberately modified it in order to measure something, your change will make that impossible on newer systems. You may want to define a static field to store the actual value for `avx3_threshold()` to return, and initialize it during VM initialization. Or lazily initialize it on first use.
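
Something along these lines is what I have in mind, as a rough sketch only. `Avx3Inputs`, `compute_avx3_threshold` and `ToyVMVersion` are hypothetical; in the VM the inputs would come from the flag machinery (e.g. `FLAG_IS_DEFAULT`) and the existing CPU feature checks rather than from a struct.

```cpp
// Hypothetical inputs for the decision; not HotSpot types.
struct Avx3Inputs {
  int  flag_value;          // current value of AVX3Threshold
  bool flag_is_default;     // true if the user did not set the flag
  bool is_intel_core;       // result of the CPU family check
  bool supports_serialize;  // result of the SERIALIZE feature check
};

// Override the threshold to 0 only when the flag is still at its default,
// so an explicitly chosen value is always respected.
int compute_avx3_threshold(const Avx3Inputs& in) {
  if (in.flag_is_default && in.is_intel_core && in.supports_serialize) {
    return 0;
  }
  return in.flag_value;
}

// Toy stand-in for the class that caches the effective value.
class ToyVMVersion {
 public:
  // Called once during (toy) VM initialization.
  static void initialize(const Avx3Inputs& in) {
    threshold() = compute_avx3_threshold(in);
  }
  static int avx3_threshold() { return threshold(); }

 private:
  // Function-local static keeps the sketch header-only.
  static int& threshold() {
    static int value = 0;
    return value;
  }
};
```

With this shape a user who sets the flag explicitly still gets their value on the newer platforms, and `avx3_threshold()` stays a cheap read of a cached field.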

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From dholmes at openjdk.java.net  Wed Nov 24 05:19:06 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 24 Nov 2021 05:19:06 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts
In-Reply-To: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
Message-ID: <FeAVpfs9WqbCkkyb8LTbp14NLRKtAsGoBkLHiIi9Nuc=.b7860f2d-8e3a-48fc-97b9-fa75ffa04b8f@github.com>

On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM.

Just a comment but it always concerns me that if we have to manually add a TBIVM when using a non-safepoint-checking lock then the lock is mis-classified as a non-safepoint-checking one! :(

That aside changes look fine. A few grammatical nits in the test.

Thanks,
David

src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 130:

> 128: void MetaspaceCriticalAllocation::wait_for_purge(MetadataAllocationRequest* request) {
> 129:   for (;;) {
> 130:     ThreadBlockInVM tbivm(JavaThread::current());

Can't you move the TBIVM outside of the loop now that it is always created?

test/hotspot/jtreg/vmTestbase/gc/gctests/LoadUnloadGC/LoadUnloadGC.java line 55:

> 53:  * VM Testbase keywords: [gc, stress, stressopt, nonconcurrent, monitoring]
> 54:  * VM Testbase readme:
> 55:  * In this test a 1000 classes are loaded and unloaded in a loop.

nit: /a 1000/1000/

test/hotspot/jtreg/vmTestbase/gc/gctests/LoadUnloadGC/LoadUnloadGC.java line 57:

> 55:  * In this test a 1000 classes are loaded and unloaded in a loop.
> 56:  * Class0 gets loaded which results in Class1 getting loaded and so on all
> 57:  * the way uptill class1000.  The classes should be unloaded whenever a

nit: /uptill/up until/ or /up to/

test/hotspot/jtreg/vmTestbase/gc/gctests/LoadUnloadGC/LoadUnloadGC.java line 59:

> 57:  * the way uptill class1000.  The classes should be unloaded whenever a
> 58:  * garbage collection takes place because their classloader is made unreachable
> 59:  * at the end of the each loop iteration. The loop is repeated 1000 times.

nit: s/the each/each/

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6520

From eosterlund at openjdk.java.net  Wed Nov 24 08:27:37 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 24 Nov 2021 08:27:37 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts [v2]
In-Reply-To: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
Message-ID: <Mx2JK6pFK5qq4R-gWv_4wj_VPaLL5Jgd8eFoLNBlm30=.f641ae6f-6fdb-4cc2-92df-ea853cdb576c@github.com>

> The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM.

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  dholmes review comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6520/files
  - new: https://git.openjdk.java.net/jdk/pull/6520/files/ec93bede..ea651174

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6520&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6520&range=00-01

  Stats: 10 lines in 2 files changed: 2 ins; 2 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6520.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6520/head:pull/6520

PR: https://git.openjdk.java.net/jdk/pull/6520

From eosterlund at openjdk.java.net  Wed Nov 24 08:41:07 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 24 Nov 2021 08:41:07 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts [v2]
In-Reply-To: <FeAVpfs9WqbCkkyb8LTbp14NLRKtAsGoBkLHiIi9Nuc=.b7860f2d-8e3a-48fc-97b9-fa75ffa04b8f@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
 <FeAVpfs9WqbCkkyb8LTbp14NLRKtAsGoBkLHiIi9Nuc=.b7860f2d-8e3a-48fc-97b9-fa75ffa04b8f@github.com>
Message-ID: <pCqLbVzBuHJ6HSXa8scFEq8B4fztIhgv188DfFh2Jhw=.c3da8ca2-c773-43ac-9e74-c3c1865cc2f3@github.com>

On Wed, 24 Nov 2021 05:11:23 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   dholmes review comments
>
> src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 130:
> 
>> 128: void MetaspaceCriticalAllocation::wait_for_purge(MetadataAllocationRequest* request) {
>> 129:   for (;;) {
>> 130:     ThreadBlockInVM tbivm(JavaThread::current());
> 
> Can't you move the TBIVM outside of the loop now that it is always created?

> Just a comment but it always concerns me that if we have to manually add a TBIVM when using a non-safepoint-checking lock then the lock is mis-classified as a non-safepoint-checking one! :(
> 
> That aside changes look fine. A few grammatical nits in the test.
> 
> Thanks, David

Thanks for the review David. I fixed your nits.

I agree that this lock should preferably have been a safepoint checking lock. In fact, it *was* a safepoint checking lock when I wrote the code. But that was before we changed the locking rules so that whether we do safepoint checking or not is a function of the rank. Since then, this lock has become constrained to its current low rank by the current set of other locks, and at that low-ish rank it is not allowed to safepoint check unless I move a bunch of other lock ranks around and figure out whether it's okay for those locks to start safepoint checking as well, which isn't entirely obvious.

So this lock might be a case where those new rules end up being a bit awkward, and have led this code to do manual transitions to blocked instead, as an escape hatch from the asserts.
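
The scope ordering used by the fix (transition to blocked first, take the lock inside it) can be sketched with plain RAII types. This is only an illustration: `BlockedScope` and `LockScope` are hypothetical stand-ins for `ThreadBlockInVM` and a locker on `MetaspaceCritical_lock`, and the assert merely models the "no transition while holding the lock" check described in the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real HotSpot classes do far more than this.
static std::vector<std::string> g_events;
static int g_locks_held = 0;

struct BlockedScope {  // models ThreadBlockInVM
  BlockedScope() {
    // Models the assert from the report: no state transition with the lock held.
    assert(g_locks_held == 0 && "transition to blocked while holding the lock");
    g_events.push_back("enter blocked");
  }
  ~BlockedScope() { g_events.push_back("leave blocked"); }
};

struct LockScope {     // models a locker on a non-safepoint-checking lock
  LockScope()  { ++g_locks_held; g_events.push_back("lock"); }
  ~LockScope() { g_events.push_back("unlock"); --g_locks_held; }
};

// Fixed ordering: the locker lives strictly inside the blocked scope.
inline void wait_once() {
  BlockedScope blocked;
  {
    LockScope locker;
    g_events.push_back("wait");
  }
}
```

Swapping the two declarations in `wait_once()` would trip the modelled assert, which is the failure the original report describes.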

-------------

PR: https://git.openjdk.java.net/jdk/pull/6520

From dholmes at openjdk.java.net  Wed Nov 24 09:02:09 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 24 Nov 2021 09:02:09 GMT
Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts [v2]
In-Reply-To: <pCqLbVzBuHJ6HSXa8scFEq8B4fztIhgv188DfFh2Jhw=.c3da8ca2-c773-43ac-9e74-c3c1865cc2f3@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
 <FeAVpfs9WqbCkkyb8LTbp14NLRKtAsGoBkLHiIi9Nuc=.b7860f2d-8e3a-48fc-97b9-fa75ffa04b8f@github.com>
 <pCqLbVzBuHJ6HSXa8scFEq8B4fztIhgv188DfFh2Jhw=.c3da8ca2-c773-43ac-9e74-c3c1865cc2f3@github.com>
Message-ID: <5bQ6PmeDjpFBgxZ6-xVI4uKgo1LCiXXyDOsRkiQjkZg=.e551776e-1f2c-4610-923a-569852138f3f@github.com>

On Wed, 24 Nov 2021 08:38:16 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 130:
>> 
>>> 128: void MetaspaceCriticalAllocation::wait_for_purge(MetadataAllocationRequest* request) {
>>> 129:   for (;;) {
>>> 130:     ThreadBlockInVM tbivm(JavaThread::current());
>> 
>> Can't you move the TBIVM outside of the loop now that it is always created?
>
>> Just a comment but it always concerns me that if we have to manually add a TBIVM when using a non-safepoint-checking lock then the lock is mis-classified as a non-safepoint-checking one! :(
>> 
>> That aside changes look fine. A few grammatical nits in the test.
>> 
>> Thanks, David
> 
> Thanks for the review David. I fixed your nits.
> 
> I agree that this lock should preferably have been a safepoint checking lock. In fact, it *was* a safepoint checking lock when I wrote the code. But that was before we changed the locking rules so that whether we do safepoint checking or not is a function of the rank. Since then, this lock has become constrained to its current low rank by the current set of other locks, and at that low-ish rank it is not allowed to safepoint check unless I move a bunch of other lock ranks around and figure out whether it's okay for those locks to start safepoint checking as well, which isn't entirely obvious.
> 
> So this lock might be a case where those new rules end up being a bit awkward, and have led this code to do manual transitions to blocked instead, as an escape hatch from the asserts.

Ah! I forgot about the rank changes that forced this dichotomy.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6520

From mcimadamore at openjdk.java.net  Wed Nov 24 11:55:11 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Wed, 24 Nov 2021 11:55:11 GMT
Subject: Integrated: 8275063: Implementation of Foreign Function & Memory API
 (Second incubator)
In-Reply-To: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
References: <ZWqSuAbHEr4AWe4pOadf6EHbJhxk0A1ddjnRqcur0h8=.b79dad77-ed29-48db-8892-e76a82aa4cd6@github.com>
Message-ID: <lMUznRCV0PQNur1JEW_x9940VAD3m5xhSAWLWpEZaLA=.ccc1b83b-98e5-4aa6-a237-7c482550ab7e@github.com>

On Tue, 12 Oct 2021 11:16:51 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> 
> [1] - https://openjdk.java.net/jeps/419

This pull request has now been integrated.

Changeset: 96e36071
Author:    Maurizio Cimadamore <mcimadamore at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/96e36071b63b624d56739b014b457ffc48147c4f
Stats:     14700 lines in 193 files changed: 6958 ins; 5126 del; 2616 mod

8275063: Implementation of Foreign Function & Memory API (Second incubator)

Reviewed-by: erikj, psandoz, jvernee, darcy

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From stuefe at openjdk.java.net  Wed Nov 24 12:16:14 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 24 Nov 2021 12:16:14 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6]
In-Reply-To: <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
 <IwunT5rbTfQfEBiFMgvgBn5VBfhbuGwJ7_f3EaWGPEY=.16bb5520-3fbc-4e80-9227-ac6ad299c244@github.com>
Message-ID: <2dExt0esGNAm7M_l0tiCAQvl6kuUMHCjArg2_KkD4aE=.679f5205-1053-4b2d-8631-8ad76a16c7ff@github.com>

On Fri, 19 Nov 2021 14:29:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is part of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
>> 
>> This proposal adds NMT buffer overflow checking:
>> 
>> - it gives us C-heap overflow checking in release builds
>> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>> 
>> For more details, please see the JBS issue.
>> 
>> ----
>> 
>> Patch notes:
>> 
>> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>> 
>> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>> 
>> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>> 
>> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>> 
>> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>> 
>> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>> 
>> --------------
>> 
>> Example output a buffer overrun would provide:
>> 
>> 
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
>> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>> 
>> -------
>> 
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 4 weeks now without problems
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Volker Feedback 2
>  - Fix Zhengyu Problem in os::realloc
>  - Extend gtests
>  - extend footer to 2 bytes
>  - Feedback Volker
>  - Let NMT do overflow detection

Nightlies are clean.
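
For readers skimming the thread, the header and footer canaries described in the quoted patch notes can be sketched roughly as follows. This is an illustration only: the field names, the canary values and the helper functions are assumptions made up for the example and are not the definitions in mallocTracker.hpp. It uses the single-byte footer from the original description (the commit list above mentions a later extension to 2 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Illustrative layout only; names and values are assumptions.
struct Header {
  uint64_t size;        // user payload size
  uint32_t flags;       // stand-in for other NMT bookkeeping
  uint16_t bucket_idx;  // stand-in for the 16-bit bucket index
  uint16_t canary;      // directly precedes the user payload
};
static_assert(sizeof(Header) == 16, "header stays 16 bytes in this sketch");

static const uint16_t kHeaderCanary = 0xE99E;  // arbitrary example value
static const uint8_t  kFooterCanary = 0xE8;    // arbitrary example value

// Allocate header + payload + 1-byte footer; return the payload pointer.
inline void* tracked_malloc(std::size_t size) {
  unsigned char* raw =
      static_cast<unsigned char*>(std::malloc(sizeof(Header) + size + 1));
  if (raw == nullptr) return nullptr;
  Header h{size, 0, 0, kHeaderCanary};
  std::memcpy(raw, &h, sizeof(h));
  raw[sizeof(Header) + size] = kFooterCanary;
  return raw + sizeof(Header);
}

// Returns true if both canaries are intact. A real checker would also
// sanity-check the size field before trusting it.
inline bool check_block(const void* payload) {
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  Header h;
  std::memcpy(&h, p - sizeof(Header), sizeof(h));
  return h.canary == kHeaderCanary && p[h.size] == kFooterCanary;
}

inline void tracked_free(void* payload) {
  std::free(static_cast<unsigned char*>(payload) - sizeof(Header));
}
```

Writing one byte past the end of the payload overwrites the footer, so `check_block()` reports the overrun the next time the block is inspected.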

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From stuefe at openjdk.java.net  Wed Nov 24 12:16:16 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 24 Nov 2021 12:16:16 GMT
Subject: Integrated: JDK-8275320: NMT should perform buffer overrun checks
In-Reply-To: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
References: <h_TohyRCtTn8BzN-h8uTTotWxPr5gTe4CWB04_lS8uA=.3f687b45-3058-42ab-902b-f42cfeb9104d@github.com>
Message-ID: <KqQ3eRIPriFpk4RxedgFOEQ5Hobv1xEayxTi3YkiuIk=.ebb0b3a5-ff23-4288-b268-ab70d364ddb5@github.com>

On Thu, 14 Oct 2021 15:49:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> This is part of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
> 
> This proposal adds NMT buffer overflow checking:
> 
> - it gives us C-heap overflow checking in release builds
> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
> 
> For more details, please see the JBS issue.
> 
> ----
> 
> Patch notes:
> 
> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
>   - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
>   - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
> 
> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
> 
> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
> 
> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
> 
> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
> 
> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
> 
> --------------
> 
> Example output a buffer overrun would provide:
> 
> 
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: 
> 0x00005600f86136a8:   21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8:   00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8:   41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x00005600f86136e8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> #  fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
> 
> -------
> 
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 4 weeks now without problems

This pull request has now been integrated.

Changeset: cf7adae6
Author:    Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/cf7adae6333c7446048ef0364737927337631f63
Stats:     434 lines in 11 files changed: 385 ins; 11 del; 38 mod

8275320: NMT should perform buffer overrun checks
8275301: Unify C-heap buffer overrun checks into NMT

Reviewed-by: simonis, zgu

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From zgu at openjdk.java.net  Wed Nov 24 16:02:17 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Wed, 24 Nov 2021 16:02:17 GMT
Subject: RFR: 8277797: Remove undefined/unused SharedRuntime::trampoline_size()
Message-ID: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>

A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002).

-------------

Commit messages:
 - v0

Changes: https://git.openjdk.java.net/jdk/pull/6540/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6540&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277797
  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6540.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6540/head:pull/6540

PR: https://git.openjdk.java.net/jdk/pull/6540

From simonis at openjdk.java.net  Wed Nov 24 16:41:27 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Wed, 24 Nov 2021 16:41:27 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check traps
 in the interpreter
Message-ID: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>

`null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.

`array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.

The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.

-------------

Commit messages:
 - 8275908: Record null_check traps for calls and array_check traps in the interpreter

Changes: https://git.openjdk.java.net/jdk/pull/6541/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8275908
  Stats: 519 lines in 9 files changed: 509 ins; 2 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6541.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6541/head:pull/6541

PR: https://git.openjdk.java.net/jdk/pull/6541

From sviswanathan at openjdk.java.net  Wed Nov 24 16:55:32 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 24 Nov 2021 16:55:32 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v4]
In-Reply-To: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
Message-ID: <EQhMD_Jyc5voaHn-qjhTjEACFFejYz5v8bAr4ZjFw5E=.4922dd69-4510-4dab-ae58-c75d94ed4ffc@github.com>

> Currently 32-byte instructions are used for small array copy and clear. 
> This can be optimized by using 64-byte instructions.
> 
> Please review.
> 
> Best Regards,
> Sandhya

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  Override threshold only if flag is default

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6512/files
  - new: https://git.openjdk.java.net/jdk/pull/6512/files/c90e7004..021bc659

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=02-03

  Stats: 21 lines in 2 files changed: 14 ins; 6 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6512.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Wed Nov 24 16:55:33 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 24 Nov 2021 16:55:33 GMT
Subject: RFR: 8277617: Optimize array copy and clear on x86_64
In-Reply-To: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
 <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
 <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
 <JVUACA5wkhF8B6Aik3q7frcOAiUc6u_10eJZv8N-NLc=.eb482f9e-771d-4e0f-b70b-48994ccb2288@github.com>
 <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com>
Message-ID: <RDeklzmM1yLA936ohNvKw8x5z0KgWpQN9aE-yg53BF4=.860e9525-81e5-40d1-b2c1-69f59e6ee53d@github.com>

On Tue, 23 Nov 2021 06:49:07 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> @dholmes-ora I have implemented your review comments.
>
> Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use of 64-byte load/store - the connection is not at all obvious for anyone not fully conversant with AVX3 and how it is used by the code. Thanks.

@dholmes-ora @jatin-bhateja I have added a check for FLAG_IS_DEFAULT before overriding the threshold. Let me know if this looks ok to you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From jbhateja at openjdk.java.net  Wed Nov 24 18:39:14 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 24 Nov 2021 18:39:14 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v4]
In-Reply-To: <EQhMD_Jyc5voaHn-qjhTjEACFFejYz5v8bAr4ZjFw5E=.4922dd69-4510-4dab-ae58-c75d94ed4ffc@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <EQhMD_Jyc5voaHn-qjhTjEACFFejYz5v8bAr4ZjFw5E=.4922dd69-4510-4dab-ae58-c75d94ed4ffc@github.com>
Message-ID: <mc689CqtKw_YZ6BG0_porWF-M45fVkJnZszZlNFBw1E=.749f2043-58ed-4111-b16f-a67efc32897f@github.com>

On Wed, 24 Nov 2021 16:55:32 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Currently 32-byte instructions are used for small array copy and clear. 
>> This can be optimized by using 64-byte instructions.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Override threshold only if flag is default

Thanks @sviswa7, the changes look good to me.
Best Regards

-------------

Marked as reviewed by jbhateja (Committer).

PR: https://git.openjdk.java.net/jdk/pull/6512

From psandoz at openjdk.java.net  Wed Nov 24 19:40:31 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Wed, 24 Nov 2021 19:40:31 GMT
Subject: RFR: 8277155: Compress and expand vector operations
Message-ID: <khYJPl1QcmxuX80aPl_hsapCbSnWmB7WmePbUEuZHwM=.fce04a9a-e037-4393-9da4-2a3a98301d8b@github.com>

Add two new cross-lane vector operations, `compress` and `expand`.

An example of such usage might be code that selects elements from array `a` and stores those selected elements in array `z`:


int[] a = ...;

int[] z = ...;
int ai = 0, zi = 0;
while (ai < a.length) {
    IntVector av = IntVector.fromArray(SPECIES, a, ai);
    // query over elements of vector av
    // returning a mask marking elements of interest
    VectorMask<Integer> m = interestingBits(av, ...);
    IntVector zv = av.compress(m);
    zv.intoArray(z, zi, m.compress());
    ai += SPECIES.length();
    zi += m.trueCount();
}


(There's also a more sophisticated version using `unslice` to coalesce matching elements with non-masked stores.)
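
As a rough scalar model of what a single `compress` step is meant to do, assuming the semantics described in the proposal (selected lanes are packed into the low lanes in order, remaining lanes are zero), one could write the following. `compress_lanes` is a hypothetical helper for illustration, written in C++ rather than against the Vector API itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of one compress step: copy the lanes whose mask bit is set
// into the low lanes of the result, in order, and zero-fill the rest.
inline std::vector<int> compress_lanes(const std::vector<int>& lanes,
                                       const std::vector<bool>& mask) {
  assert(lanes.size() == mask.size());
  std::vector<int> out(lanes.size(), 0);
  std::size_t next = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    if (mask[i]) out[next++] = lanes[i];
  }
  return out;
}
```

In the Java loop above, the store mask `m.compress()` would then correspond to a mask with only the first `m.trueCount()` lanes set, so only the packed elements are written to `z`.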

Given that RDP 1 for JDK 18 (2021/12/09) is getting close, we may not get this reviewed in time and included in [JEP 417](https://openjdk.java.net/jeps/417). Still, I think it is worth starting the review now (the CSR is marked provisional).

-------------

Commit messages:
 - 8277155: Compress and expand vector operations

Changes: https://git.openjdk.java.net/jdk/pull/6545/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6545&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277155
  Stats: 5429 lines in 105 files changed: 5315 ins; 21 del; 93 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6545.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6545/head:pull/6545

PR: https://git.openjdk.java.net/jdk/pull/6545

From dholmes at openjdk.java.net  Thu Nov 25 05:14:04 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 25 Nov 2021 05:14:04 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v4]
In-Reply-To: <EQhMD_Jyc5voaHn-qjhTjEACFFejYz5v8bAr4ZjFw5E=.4922dd69-4510-4dab-ae58-c75d94ed4ffc@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <EQhMD_Jyc5voaHn-qjhTjEACFFejYz5v8bAr4ZjFw5E=.4922dd69-4510-4dab-ae58-c75d94ed4ffc@github.com>
Message-ID: <dNu_1z9fozIvODrkwyORFY0MuwEO96T3EuKWU1hltq4=.ce056f08-de91-4745-8d18-5f8ad6825824@github.com>

On Wed, 24 Nov 2021 16:55:32 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Currently 32-byte instructions are used for small array copy and clear. 
>> This can be optimized by using 64-byte instructions.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Override threshold only if flag is default

General change looks okay but I have a query below about startup overhead.

Also what testing has been done for this aside from the benchmarking? AFAICS there is only a single test that currently sets AVX3Threshold to zero so we have very little test coverage for that. With this change it will be zero all the time on some systems and so will now be exercising code paths that do not normally get executed.

Thanks,
David

src/hotspot/cpu/x86/vm_version_x86.cpp line 1893:

> 1891:     return AVX3Threshold;
> 1892:   }
> 1893: }

I am somewhat concerned about the overhead of evaluating this each time it is used. I realize these will only be startup costs while generating the stubs, not part of the stubs themselves, but it still may be a startup impact. Can you run a startup benchmark to see if there is any problem?

I was also thinking the more direct formulation would just be:
```return (is_intel_family_core() && supports_serialize() && FLAG_IS_DEFAULT(AVX3Threshold)) ? 0 : AVX3Threshold;```

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From dholmes at openjdk.java.net  Thu Nov 25 05:18:06 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 25 Nov 2021 05:18:06 GMT
Subject: RFR: 8277797: Remove undefined/unused
 SharedRuntime::trampoline_size()
In-Reply-To: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>
References: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>
Message-ID: <_mUzi5C5miYGHHXIwY2Hv3tCeZ1676_hZH9raqFn4Gw=.2d265a64-c041-4f73-a775-76d7f09470a3@github.com>

On Wed, 24 Nov 2021 15:54:47 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002).

Good and trivial.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6540

From stuefe at openjdk.java.net  Thu Nov 25 05:28:07 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 25 Nov 2021 05:28:07 GMT
Subject: RFR: 8277797: Remove undefined/unused
 SharedRuntime::trampoline_size()
In-Reply-To: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>
References: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>
Message-ID: <WlhYj9KgOQu738PIaC-gf5Ep-jrEwjBjBIpevt6AsxM=.5b257f6a-d99a-42ae-ba74-6da2f59711e0@github.com>

On Wed, 24 Nov 2021 15:54:47 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002).

LGTM

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6540

From eosterlund at openjdk.java.net  Thu Nov 25 09:54:10 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 25 Nov 2021 09:54:10 GMT
Subject: Integrated: 8277631: ZGC: CriticalMetaspaceAllocation asserts
In-Reply-To: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
References: <CVnXCzCvWpdyq01hRvsGJr32qKns-Oa9uyBvggkXzfk=.a2eb08dd-92b7-4849-9ae9-22df9ebd63a8@github.com>
Message-ID: <-KsYYH2luM9cu--SGpZz61ELwEU5H9BKrpSnlmVDc10=.67bd4901-9edb-4ee4-a108-098b947644f3@github.com>

On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM.

This pull request has now been integrated.

Changeset: 3034ae87
Author:    Erik Österlund <eosterlund at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/3034ae87ce4b94c7dc40cfb5a96d6d1e87910bbf
Stats:     39 lines in 2 files changed: 30 ins; 0 del; 9 mod

8277631: ZGC: CriticalMetaspaceAllocation asserts

Reviewed-by: pliden, stefank, dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/6520

From chagedorn at openjdk.java.net  Thu Nov 25 10:32:08 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 25 Nov 2021 10:32:08 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
Message-ID: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>

On Wed, 24 Nov 2021 16:33:35 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
> 
> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
> 
> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
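
For readers unfamiliar with the two kinds of implicit exception discussed in this description, here is a minimal Java sketch. It is not taken from the patch or its test; the class and method names are hypothetical. It only shows where the exceptions originate at the bytecode level: a `NullPointerException` raised by an invoke bytecode on a null receiver, and an `ArrayStoreException` raised by `aastore`.

```java
// Hypothetical illustration, not part of the PR under review.
final class ImplicitExceptionDemo {

    // The call compiles to invokevirtual String.length(); a null receiver
    // makes the JVM raise an implicit NullPointerException at that bytecode.
    static int lengthOf(String receiver) {
        return receiver.length();
    }

    // The assignment compiles to aastore; the JVM checks the value against
    // the array's runtime component type and raises ArrayStoreException
    // if the value is not assignable to it.
    static void store(Object[] array, Object value) {
        array[0] = value;
    }

    // Runs the action and reports the simple name of the exception it threw.
    static String nameOfThrown(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOfThrown(() -> lengthOf(null)));
        // A String[] is passed as Object[]; storing an Integer fails at aastore.
        System.out.println(nameOfThrown(() -> store(new String[1], Integer.valueOf(42))));
    }
}
```

Whether such a throw site later causes an uncommon trap and recompilation depends on the profile the interpreter recorded for it, which is the part this change addresses.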

Otherwise, it looks good to me! But would be good to get a second review for it.

Nice test!

src/hotspot/share/interpreter/interpreterRuntime.cpp line 834:

> 832:                                  THREAD);
> 833: 
> 834:     if(HAS_PENDING_EXCEPTION) {

Missing space

src/hotspot/share/opto/parseHelper.cpp line 301:

> 299: 
> 300: #endif
> 301: 

This line was probably deleted by mistake?

src/hotspot/share/runtime/deoptimization.hpp line 436:

> 434: 
> 435:   static jint total_deoptimization_count();
> 436:   static jint deoptimization_count(const char *reason_str, const char *action_str);

Nit: Asterisk should be at the type: `const char* reason_str`

test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 141:

> 139:     private static void printCounters(TestMode testMode, ImplicitException impExcp, Method throwImplicitException_m, int invocations) {
> 140:         System.out.println("testMode=" + testMode + " exception=" + impExcp + " invocations=" + invocations + "\n" +
> 141:                            "decompilecount=" + WB.getMethodDecompileCount(throwImplicitException_m) + " " +

`getMethodDecompileCount()` seems only to be used to print the counters here but is not verified otherwise. If it is not too complicated, could a specific test for it be added as well?

test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 166:

> 164:     // Checks after the JIT-compiled test method has been invoked 'PerBytecodeTrapLimit' times.
> 165:     private static void checkTwo(TestMode testMode, ImplicitException impExcp, Exception ex, Method throwImplicitException_m, int invocations) {
> 166: 

If I see that correctly, `checkTwo`, `checkThree` and `checkFour` only differ in whether using `PerBytecodeTrapLimit` or `Tier0InvokeNotifyFreq` and could be merged together (if the omitted assertions in `checkThree` and `checkFour` for the exception message compared to `checkTwo` are valid to be added again).

test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 287:

> 285:                 checkTwo(testMode, impExcp, lastException, throwImplicitException_m, invocations);
> 286: 
> 287:                 // Invoke compiled (or interpreted if JDK-8275908 isn't fixed) code 'Tier0InvokeNotifyFreq' times.

As this is the fix for JDK-8275908, you can remove the comment about it :-) It's probably a leftover from JDK-8273563.

test/lib/sun/hotspot/WhiteBox.java line 321:

> 319:     return getMethodCompilationLevel0(method, isOsr);
> 320:   }
> 321:   public         int     getMethodDecompileCount(Executable method) {

As this class is marked as `@Deprecated`, do we need to add the methods here as well?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From duke at openjdk.java.net  Thu Nov 25 10:50:10 2021
From: duke at openjdk.java.net (duke)
Date: Thu, 25 Nov 2021 10:50:10 GMT
Subject: Withdrawn: 8273392: Improve usability of stack-less exceptions due to
 -XX:+OmitStackTraceInFastThrow
In-Reply-To: <sAGoUJfd3NgyqGm3uhT3sqmkPLWZyyuUTfs2QkXECok=.0cacde1b-47af-4256-960f-779a4329f790@github.com>
References: <sAGoUJfd3NgyqGm3uhT3sqmkPLWZyyuUTfs2QkXECok=.0cacde1b-47af-4256-960f-779a4329f790@github.com>
Message-ID: <SY1z7vKeInk-nkXNw0l-NJETydLC1_dokwLl29Mu8CI=.cc74efc4-7e3e-470d-95ba-941760d00d49@github.com>

On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> When running with `-XX:+OmitStackTraceInFastThrow` (which is the default), C2 will optimize certain "hot" implicit exceptions (e.g. AIOOBE, NullPointerException, ...) and replace them with a static, pre-allocated exception without any stack trace.
> 
> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods, we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame carrying the method/line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stack frames for all callers up to the inlining depth of the method in question.
> 
> For the attached JTreg test, we get the following exception in interpreter mode:
> 
> java.lang.NullPointerException: Cannot read the array length because "<parameter2>" is null
>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
> 
> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
> 
> java.lang.NullPointerException
> 
> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
> 
> java.lang.NullPointerException
>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
> 
> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()`, we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
> 
> java.lang.NullPointerException
>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
> 
> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time.
> 
> ## Implementation details
> 
> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`).
> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them.
> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles.
> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()`) and printing routines (`nmethod::print()`), I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but had not added it to the corresponding stats and printing routines, so I've added that section as well.
> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds.
> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions.
> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.
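
The stack-less behaviour described above can be observed with a small program. The following is a minimal sketch, not the JTreg test attached to the PR; the class and method names are hypothetical. It repeatedly triggers an implicit `NullPointerException` and reports the first iteration at which the caught exception carries no stack frames. Whether and when that happens depends on the JVM, the JIT settings, and the iteration count, so no particular output is claimed.

```java
// Hypothetical illustration, not the test from the PR.
final class FastThrowDemo {

    // Reading a.length with a == null raises an implicit NullPointerException.
    static int lengthOf(int[] a) {
        return a.length;
    }

    // Returns the first iteration at which the caught NullPointerException
    // had an empty stack trace, or -1 if that never happened.
    static int firstStacklessThrow(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                lengthOf(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int iterations = 200_000; // arbitrary; chosen only to give the JIT time to kick in
        int first = firstStacklessThrow(iterations);
        if (first < 0) {
            System.out.println("every exception carried a stack trace");
        } else {
            System.out.println("first stack-less exception at iteration " + first);
        }
    }
}
```

Running the same program with `-XX:-OmitStackTraceInFastThrow` should keep the stack traces, at the cost the flag was designed to avoid.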

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5392

From simonis at openjdk.java.net  Thu Nov 25 10:57:11 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 10:57:11 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
Message-ID: <B4rDrP6fJKX2xiZ6BiiGbTaHJGKorfq1Qq4XpdoRxY8=.01d432b9-944b-47e0-8bc1-9c187fb8971b@github.com>

On Thu, 25 Nov 2021 09:12:42 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> src/hotspot/share/interpreter/interpreterRuntime.cpp line 834:
> 
>> 832:                                  THREAD);
>> 833: 
>> 834:     if(HAS_PENDING_EXCEPTION) {
> 
> Missing space

Fixed

> src/hotspot/share/opto/parseHelper.cpp line 301:
> 
>> 299: 
>> 300: #endif
>> 301: 
> 
> This line was probably deleted by mistake?

The line was actually deleted by my editor. I wondered about it at first too, but the file had an extra empty line at the end, which I think we discourage. So in the end I think my editor was right :)
But as there remained no other changes in that file except the deleted empty line I agree that it looks strange now and I'll restore the file to its initial state.

> src/hotspot/share/runtime/deoptimization.hpp line 436:
> 
>> 434: 
>> 435:   static jint total_deoptimization_count();
>> 436:   static jint deoptimization_count(const char *reason_str, const char *action_str);
> 
> Nit: Asterisk should be at the type: `const char* reason_str`

Fixed

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From mdoerr at openjdk.java.net  Thu Nov 25 11:01:06 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 25 Nov 2021 11:01:06 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
Message-ID: <wLF7bn_MO2UnSDUBvzCzUG4-QDaWWoCEJTe2erqAFAU=.d73c9449-857a-4d75-bd1f-eec1f35ec4fd@github.com>

On Wed, 24 Nov 2021 16:33:35 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
> 
> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
> 
> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.

Nice change! Looks good to me besides what was already said.

src/hotspot/share/opto/graphKit.cpp line 3342:

> 3340:         // A non-null value will always produce an exception.
> 3341:         if (!objtp->maybe_null()) {
> 3342:           bool aastore = (java_bc() == Bytecodes::_aastore);

better: is_aastore

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Thu Nov 25 11:01:07 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 11:01:07 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
Message-ID: <9QL7aEAuYwXuPda-c9w-jJwZ787hAIIj0OBsf5c6K8I=.c3c158fb-a62b-450a-9676-9e072631b585@github.com>

On Thu, 25 Nov 2021 09:58:19 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 287:
> 
>> 285:                 checkTwo(testMode, impExcp, lastException, throwImplicitException_m, invocations);
>> 286: 
>> 287:                 // Invoke compiled (or interpreted if JDK-8275908 isn't fixed) code 'Tier0InvokeNotifyFreq' times.
> 
> As this is the fix for JDK-8275908, you can remove the comment about it :-) It's probably a leftover from JDK-8273563.

Right :)
Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Thu Nov 25 11:06:06 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 11:06:06 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <wLF7bn_MO2UnSDUBvzCzUG4-QDaWWoCEJTe2erqAFAU=.d73c9449-857a-4d75-bd1f-eec1f35ec4fd@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <wLF7bn_MO2UnSDUBvzCzUG4-QDaWWoCEJTe2erqAFAU=.d73c9449-857a-4d75-bd1f-eec1f35ec4fd@github.com>
Message-ID: <YygqV-9eq-hhg7Kvy-R_R5-_9QIPVS4hbV77eO8u66o=.125ddcce-a63e-4346-882c-659312290dcc@github.com>

On Thu, 25 Nov 2021 10:54:47 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> src/hotspot/share/opto/graphKit.cpp line 3342:
> 
>> 3340:         // A non-null value will always produce an exception.
>> 3341:         if (!objtp->maybe_null()) {
>> 3342:           bool aastore = (java_bc() == Bytecodes::_aastore);
> 
> better: is_aastore

Fixed both occurrences.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Thu Nov 25 11:11:08 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 11:11:08 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
Message-ID: <Lus0v5j8yR_FC64DZah-2Ob_O4sKKjWYsLL6-V7ldXY=.c6e3f097-229f-4ad9-a388-f157cec1bc21@github.com>

On Thu, 25 Nov 2021 10:14:42 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> test/lib/sun/hotspot/WhiteBox.java line 321:
> 
>> 319:     return getMethodCompilationLevel0(method, isOsr);
>> 320:   }
>> 321:   public         int     getMethodDecompileCount(Executable method) {
> 
> As this class is marked as `@Deprecated`, do we need to add the methods here as well?

Unfortunately yes :(
At first I didn't, but there are still tests using it and they'll get warnings otherwise. To make matters worse, some of them parse the output, so I gave up and added the methods to `sun/hotspot/WhiteBox.java` as well.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Thu Nov 25 17:26:03 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 17:26:03 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
Message-ID: <xFoU-SmXgBCukn6rRHW3jZ_7EfPXfvLRdPcZlsZ4J8Y=.ee1238d3-6004-4581-b11b-ab0216542558@github.com>

On Thu, 25 Nov 2021 09:29:12 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 141:
> 
>> 139:     private static void printCounters(TestMode testMode, ImplicitException impExcp, Method throwImplicitException_m, int invocations) {
>> 140:         System.out.println("testMode=" + testMode + " exception=" + impExcp + " invocations=" + invocations + "\n" +
>> 141:                            "decompilecount=" + WB.getMethodDecompileCount(throwImplicitException_m) + " " +
> 
> `getMethodDecompileCount()` seems only to be used to print the counters here but is not verified otherwise. If it is not too complicated, could a specific test for it be added as well?

Hm, it was used in JDK-8275908 before this fix. Now, with this fix we don't get any recompiles any more :)

The test is already quite big, so I've added a separate test `test/hotspot/jtreg/compiler/uncommontrap/Decompile.java` to verify the new WhiteBox methods introduced by this change.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Thu Nov 25 17:45:08 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 17:45:08 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
Message-ID: <6SeGNDIxY4gsishHn0dj2OZVwqH_jiDgxtC0HeviSTs=.13441674-bea2-48d5-9cc0-5b5bd914af18@github.com>

On Thu, 25 Nov 2021 09:45:56 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 166:
> 
>> 164:     // Checks after the JIT-compiled test method has been invoked 'PerBytecodeTrapLimit' times.
>> 165:     private static void checkTwo(TestMode testMode, ImplicitException impExcp, Exception ex, Method throwImplicitException_m, int invocations) {
>> 166: 
> 
> If I see that correctly, `checkTwo`, `checkThree` and `checkFour` only differ in whether using `PerBytecodeTrapLimit` or `Tier0InvokeNotifyFreq` and could be merged together (if the omitted assertions in `checkThree` and `checkFour` for the exception message compared to `checkTwo` are valid to be added again).

That's a good point, now that the tests have got a little simpler :)

I'll have to add more cases for JDK-8273563 but I think it's just fair to leave it for that change.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Thu Nov 25 17:51:45 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 17:51:45 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v2]
In-Reply-To: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
Message-ID: <1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com>

> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
> 
> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
> 
> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Simplified test OptimizeImplicitExceptions.java and added Decompile.java test. Includes minor fixes requested by Martin and Christian

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6541/files
  - new: https://git.openjdk.java.net/jdk/pull/6541/files/6d12c341..58a107db

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=00-01

  Stats: 230 lines in 6 files changed: 164 ins; 48 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6541.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6541/head:pull/6541

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Thu Nov 25 17:55:07 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 17:55:07 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v2]
In-Reply-To: <wLF7bn_MO2UnSDUBvzCzUG4-QDaWWoCEJTe2erqAFAU=.d73c9449-857a-4d75-bd1f-eec1f35ec4fd@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <wLF7bn_MO2UnSDUBvzCzUG4-QDaWWoCEJTe2erqAFAU=.d73c9449-857a-4d75-bd1f-eec1f35ec4fd@github.com>
Message-ID: <41ebcqQZPbOaMQk6xOSHbRKEbKRPrWRzvZZ-LAawZcM=.53f0048e-6d5c-46fb-9f64-1caa9b8906e9@github.com>

On Thu, 25 Nov 2021 10:58:13 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Simplified test OptimizeImplicitExceptions.java and added Decompile.java test. Includes minor fixes requested by Martin and Christian
>
> Nice change! Looks good to me besides what was already said.

@TheRealMDoerr, @chhagedorn thanks a lot for the quick reviews.

I hope I could address all your concerns and suggestions with my latest push.

@chhagedorn: I'm especially happy that you like the tests. As all too often the effort for a good test is much higher than for the fix itself :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From duke at openjdk.java.net  Fri Nov 26 07:41:24 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Fri, 26 Nov 2021 07:41:24 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members
Message-ID: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>

Changed the visibility, added getters and refactored the following:

1. Card Table Members
2. BOT members
3. ObjectStartArray block members

-------------

Commit messages:
 - Initial patch

Changes: https://git.openjdk.java.net/jdk/pull/6570/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277372
  Stats: 199 lines in 31 files changed: 40 ins; 11 del; 148 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6570.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570

PR: https://git.openjdk.java.net/jdk/pull/6570

From chagedorn at openjdk.java.net  Fri Nov 26 09:23:18 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 26 Nov 2021 09:23:18 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v2]
In-Reply-To: <1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com>
Message-ID: <XVCVSAbT9qFvFkPAPFV665UYtO7_O2lwqYX_Q9OwIAY=.8a307fce-e815-49e6-b70c-0dfb32e06a31@github.com>

On Thu, 25 Nov 2021 17:51:45 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplified test OptimizeImplicitExceptions.java and added Decompile.java test. Includes minor fixes requested by Martin and Christian

Thanks for doing the changes, they look good to me!

> @chhagedorn: I'm especially happy that you like the tests. As all too often the effort for a good test is much higher than for the fix itself :)

Yes, I couldn't agree more. It's sometimes underestimated how much time is needed to come up with a good test. So, I always appreciate the extra effort :)

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6541

From chagedorn at openjdk.java.net  Fri Nov 26 09:23:19 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 26 Nov 2021 09:23:19 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v2]
In-Reply-To: <B4rDrP6fJKX2xiZ6BiiGbTaHJGKorfq1Qq4XpdoRxY8=.01d432b9-944b-47e0-8bc1-9c187fb8971b@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
 <B4rDrP6fJKX2xiZ6BiiGbTaHJGKorfq1Qq4XpdoRxY8=.01d432b9-944b-47e0-8bc1-9c187fb8971b@github.com>
Message-ID: <ezmWyMRxdg1jhncQFrjF3cOiw5sUgnyysPHxBbP0Jdk=.ad18c0da-8a4e-44b2-8753-4ad110d4230e@github.com>

On Thu, 25 Nov 2021 10:52:02 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> src/hotspot/share/opto/parseHelper.cpp line 301:
>> 
>>> 299: 
>>> 300: #endif
>>> 301: 
>> 
>> This line was probably deleted by mistake?
>
> The line was actually deleted by my editor. I first wondered myself, but the file had an extra empty line at the end which I think we discourage. So in the end I think my editor was right :)
> But as there remained no other changes in that file except the deleted empty line I agree that it looks strange now and I'll restore the file to its initial state.

I agree that we should then remove this line to follow the convention but as you've said, it might not be justified to fix these things in otherwise untouched files. Maybe a thing to fix for the next one who edits this file ;)

>> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 141:
>> 
>>> 139:     private static void printCounters(TestMode testMode, ImplicitException impExcp, Method throwImplicitException_m, int invocations) {
>>> 140:         System.out.println("testMode=" + testMode + " exception=" + impExcp + " invocations=" + invocations + "\n" +
>>> 141:                            "decompilecount=" + WB.getMethodDecompileCount(throwImplicitException_m) + " " +
>> 
>> `getMethodDecompileCount()` seems only to be used to print the counters here but is not verified otherwise. If it is not too complicated, could a specific test for it be added as well?
>
> Hm, it was used in JDK-8275908 before this fix. Now, with this fix we don't get any recompiles any more :)
> 
> The test is already quite big so I've added a separate test `test/hotspot/jtreg/compiler/uncommontrap/Decompile.java` to verify the new WhiteBox methods introduced by this change.

Thanks for the effort to add an extra extensive test for it, it looks good!

>> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 166:
>> 
>>> 164:     // Checks after the JIT-compiled test method has been invoked 'PerBytecodeTrapLimit' times.
>>> 165:     private static void checkTwo(TestMode testMode, ImplicitException impExcp, Exception ex, Method throwImplicitException_m, int invocations) {
>>> 166: 
>> 
>> If I see that correctly, `checkTwo`, `checkThree` and `checkFour` only differ in whether using `PerBytecodeTrapLimit` or `Tier0InvokeNotifyFreq` and could be merged together (if the omitted assertions in `checkThree` and `checkFour` for the exception message compared to `checkTwo` are valid to be added again).
>
> That's a good point, now that the tests have got a little simpler :)
> 
> I'll have to add more cases for JDK-8273563 but I think it's just fair to leave it for that change.

That sounds good.

>> test/lib/sun/hotspot/WhiteBox.java line 321:
>> 
>>> 319:     return getMethodCompilationLevel0(method, isOsr);
>>> 320:   }
>>> 321:   public         int     getMethodDecompileCount(Executable method) {
>> 
>> As this class is marked as `@Deprecated`, do we need to add the methods here as well?
>
> Unfortunately yes :(
> I first didn't but there are still tests using it and they'll get warnings otherwise. To make matters worse, some of them parse the output, so I gave up and added the methods to `sun/hotspot/WhiteBox.java` as well.

Oh I see, that's indeed unfortunate. I wasn't aware of that. Then better leave them in until the entire class gets removed at some point. Thanks for the explanation!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From rkennke at openjdk.java.net  Fri Nov 26 09:49:15 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 26 Nov 2021 09:49:15 GMT
Subject: Integrated: 8277417: C1 LIR instruction for load-klass
In-Reply-To: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
References: <voPyK1Nahe_KLkjweEy6mY-ZEJ3RY5uhGYcF-S9fJQg=.adcb830e-d9f8-4bae-a0fb-079cfd7ffc01@github.com>
Message-ID: <i_N7OazBgnkBzzCmn7wzQgDQe6MLJVuELGXBWOE3axs=.87be2a92-9b4b-4b5d-b616-531bb6a8f491@github.com>

On Thu, 18 Nov 2021 20:16:27 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
> 
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
> 
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86, all other platforms only received very basic testing via GHA. It would be nice if the respective maintainers could give it a try.
> 
> Testing:
>  - [x] tier1 (x86_64)
>  - [x] tier2 (x86_64)
>  - [x] tier3 (x86_64)

This pull request has now been integrated.

Changeset: 99e4bda3
Author:    Roman Kennke <rkennke at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/99e4bda303f2c71972a125d0ecaf4cf986c8614a
Stats:     189 lines in 11 files changed: 140 ins; 37 del; 12 mod

8277417: C1 LIR instruction for load-klass

Reviewed-by: iveresov, mdoerr, ngasson, aph

-------------

PR: https://git.openjdk.java.net/jdk/pull/6464

From duke at openjdk.java.net  Fri Nov 26 10:33:34 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Fri, 26 Nov 2021 10:33:34 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v2]
In-Reply-To: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
Message-ID: <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>

> Changed the visibility, added getters and refactored the following:
> 
> 1. Card Table Members
> 2. BOT members
> 3. ObjectStartArray block members

Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:

  Refactoring in hotspot/cpu dir

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6570/files
  - new: https://git.openjdk.java.net/jdk/pull/6570/files/1ffc2d3d..bb85aa48

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=00-01

  Stats: 21 lines in 9 files changed: 0 ins; 0 del; 21 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6570.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570

PR: https://git.openjdk.java.net/jdk/pull/6570

From simonis at openjdk.java.net  Fri Nov 26 11:13:29 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Fri, 26 Nov 2021 11:13:29 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v3]
In-Reply-To: <XVCVSAbT9qFvFkPAPFV665UYtO7_O2lwqYX_Q9OwIAY=.8a307fce-e815-49e6-b70c-0dfb32e06a31@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com>
 <XVCVSAbT9qFvFkPAPFV665UYtO7_O2lwqYX_Q9OwIAY=.8a307fce-e815-49e6-b70c-0dfb32e06a31@github.com>
Message-ID: <tH3NfgAMQu7F5dKaHvqW1V_2CZwquH9mBbcleCOLVJc=.d06005e0-04be-44cc-aa59-35ed84a8813b@github.com>

On Fri, 26 Nov 2021 09:20:09 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix Decompile.java test for non-JVMCI builds
>
> Thanks for doing the changes, they look good to me!
> 
>> @chhagedorn: I'm especially happy that you like the tests. As all too often the effort for a good test is much higher than for the fix itself :)
> 
> Yes, I couldn't agree more. It's sometimes underestimated how much time is needed to come up with a good test. So, I always appreciate the extra effort :)

Thanks for the approval @chhagedorn .

I found that the new `Decompile.java` test failed on linux/x86_32. The reason for this is that 32-bit builds don't include JVMCI and without JVMCI the bimorphic inlining trap is called just `bimorphic` (in contrast to `bimorphic_or_optimized_type_check` for JVMCI builds).
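
To illustrate the distinction, here is a minimal sketch (not the code from the PR). The class `TrapNames`, the method `bimorphicTrapName` and its boolean parameter are hypothetical names, and the caller is assumed to already know whether the build includes JVMCI:

```java
public class TrapNames {
    // Hypothetical helper: returns the trap name a test should expect for the
    // bimorphic inlining trap, depending on whether the build includes JVMCI.
    static String bimorphicTrapName(boolean jvmciIncluded) {
        return jvmciIncluded ? "bimorphic_or_optimized_type_check" : "bimorphic";
    }

    public static void main(String[] args) {
        // How the flag is obtained is left open here; it is simply passed in.
        System.out.println("JVMCI build:     " + bimorphicTrapName(true));
        System.out.println("non-JVMCI build: " + bimorphicTrapName(false));
    }
}
```

A test that compares against the name chosen this way, instead of a hard-coded string, should behave the same on both kinds of builds.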

The fix is trivial and I hope your approval is also valid for the latest version :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Fri Nov 26 11:13:29 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Fri, 26 Nov 2021 11:13:29 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v3]
In-Reply-To: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
Message-ID: <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>

> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
> 
> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
> 
> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Fix Decompile.java test for non-JVMCI builds

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6541/files
  - new: https://git.openjdk.java.net/jdk/pull/6541/files/58a107db..c8564d08

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=01-02

  Stats: 11 lines in 1 file changed: 4 ins; 0 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6541.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6541/head:pull/6541

PR: https://git.openjdk.java.net/jdk/pull/6541

From mdoerr at openjdk.java.net  Fri Nov 26 11:28:03 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Fri, 26 Nov 2021 11:28:03 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v3]
In-Reply-To: <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>
Message-ID: <djP-kE3J-luIOuK7kVEvREmd1WmOWs-pIzM93jKlxiI=.fce5fe7e-3caf-481a-9732-68d2dd7d3dd3@github.com>

On Fri, 26 Nov 2021 11:13:29 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix Decompile.java test for non-JVMCI builds

Right, non-JVMCI platforms need this fix (also PPC and s390). LGTM, now. We'll retest it.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6541

From chagedorn at openjdk.java.net  Fri Nov 26 11:50:08 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 26 Nov 2021 11:50:08 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v3]
In-Reply-To: <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>
Message-ID: <5wWSpqtrhuJgNpKzSkpZcEEwcO3UXv9ZzJspyVu96zo=.32f86314-d08f-4a37-8e39-3ed58da05c18@github.com>

On Fri, 26 Nov 2021 11:13:29 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix Decompile.java test for non-JVMCI builds

Good catch! Looks good.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6541

From lkorinth at openjdk.java.net  Fri Nov 26 12:35:03 2021
From: lkorinth at openjdk.java.net (Leo Korinth)
Date: Fri, 26 Nov 2021 12:35:03 GMT
Subject: RFR: 8269537: memset() is called after operator new [v4]
In-Reply-To: <OLUjj1GGh7kLi5tWiI_keOCWU-VgxtjesyoWHDtifSs=.213f71a6-c4b5-4107-b64f-b59efbfe7309@github.com>
References: <fe0PJHcDQ9Gax0idlyvpt0IQxhDwReN27jV9en3F1Uo=.6853eda5-c0fe-4664-b231-8e8922fa3713@github.com>
 <OLUjj1GGh7kLi5tWiI_keOCWU-VgxtjesyoWHDtifSs=.213f71a6-c4b5-4107-b64f-b59efbfe7309@github.com>
Message-ID: <MOvoI3fzttRrl6Gv3sltm_gOqdFDIAsY7C16XTY-I-Y=.7d806181-a604-4125-9cf4-6715d981b335@github.com>

On Wed, 20 Oct 2021 09:36:38 GMT, Leo Korinth <lkorinth at openjdk.org> wrote:

>> The basic problem is that we are relying on undefined behaviour, as documented in the code:
>> 
>> // This whole business of passing information from ResourceObj::operator new
>> // to the ResourceObj constructor via fields in the "object" is technically UB.
>> // But it seems to work within the limitations of HotSpot usage (such as no
>> // multiple inheritance) with the compilers and compiler options we're using.
>> // And it gives some possibly useful checking for misuse of ResourceObj.
>> 
>> 
>> I am removing the undefined behaviour by passing the type of allocation through a thread local variable.
>> 
>> This solution has some advantages:
>> 1) it is not UB
>> 2) it is simpler and easier to understand
>> 3) it uses less memory (I could make it use even less if I made the enum `allocation_type` a u8)
>> 4) in the *very* unlikely situation that stack memory (or embedded) already equals the data calculated from the address of the object, the code will also work. 
>> 
>> When doing the change, I also updated `allocated_on_stack()` to the new name `allocated_on_stack_or_embedded()` which is much harder to misinterpret.
>> 
>> I also disallow "faking" the memory type by explicitly calling `ResourceObj::set_allocation_type`.
>> 
>> This forced me to change two places that are faking the allocation type of an embedded `GrowableArray` from `STACK_OR_EMBEDDED` to `C_HEAP`. The faking of the type is hard to understand as a `STACK_OR_EMBEDDED` `GrowableArray` can allocate any type of object. My guess is that `GrowableArray` has changed behaviour, or maybe that it was hard to understand because of the old naming of `allocated_on_stack()`. 
>> 
>> I have also tried to update the comments. In doing that I not only changed the comments for this change, but also for the *incorrect* advice to always delete objects you allocate with new.
>> 
>> Testing on debug build tier1-3
>> Testing on release build tier1
>
> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review updates

This comment will keep this pull request alive a bit longer.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5387

From simonis at openjdk.java.net  Fri Nov 26 16:24:13 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Fri, 26 Nov 2021 16:24:13 GMT
Subject: Integrated: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter
In-Reply-To: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
Message-ID: <53AX6UcR3p-pozMTAUGFL5x080jMy7BYYRF0Qa1ats8=.2035356c-1c61-4e9d-8985-29eb5fad4847@github.com>

On Wed, 24 Nov 2021 16:33:35 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
> 
> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
> 
> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.

This pull request has now been integrated.

Changeset: 40fef231
Author:    Volker Simonis <simonis at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/40fef2311c95eca0ec34652f9fc0e56b827b8380
Stats:     637 lines in 9 files changed: 628 ins; 1 del; 8 mod

8275908: Record null_check traps for calls and array_check traps in the interpreter

Reviewed-by: chagedorn, mdoerr

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net  Fri Nov 26 16:24:10 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Fri, 26 Nov 2021 16:24:10 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v3]
In-Reply-To: <5wWSpqtrhuJgNpKzSkpZcEEwcO3UXv9ZzJspyVu96zo=.32f86314-d08f-4a37-8e39-3ed58da05c18@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>
 <5wWSpqtrhuJgNpKzSkpZcEEwcO3UXv9ZzJspyVu96zo=.32f86314-d08f-4a37-8e39-3ed58da05c18@github.com>
Message-ID: <9fnUl67C4gTEG1KUH1p3pEwamphpZy4XB91ShOxyVO0=.7ddfd8d5-04a9-4960-b980-78a9e8d19bdf@github.com>

On Fri, 26 Nov 2021 11:47:11 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix Decompile.java test for non-JVMCI builds
>
> Good catch! Looks good.

Thanks @chhagedorn, @TheRealMDoerr !

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From dholmes at openjdk.java.net  Sat Nov 27 05:00:12 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Sat, 27 Nov 2021 05:00:12 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check
 traps in the interpreter [v3]
In-Reply-To: <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>
References: <RvOm5XIj3o-4aocezdzRYl1HObteBMTWaSWd_hcS_Qk=.c3273db0-b957-4dde-b5b9-bb37865a9d47@github.com>
 <mZm7ekL1RM7utuWfChXkBfc5wsTl5jt_35tsyd6mNR0=.6d79cfe0-d783-4bbb-9d30-616559455b08@github.com>
Message-ID: <41uVcf2dhmwZ_JLdSZz-njNJYeWpG_VYiJ_5W4vM-fA=.f0fccd47-82ea-4758-87d2-3832ebe083d9@github.com>

On Fri, 26 Nov 2021 11:13:29 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> 
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>> 
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix Decompile.java test for non-JVMCI builds

The new tests are not written correctly and are causing failures in our tier3 CI runs.

If you set an explicit GC on the @run line, then you have to ensure no GC option is passed in when running jtreg; otherwise two GCs are requested and the test fails to run.
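
One common way to guard against this is a `@requires` clause, so that jtreg only selects the test when the pinned collector is actually selectable. The following is a minimal sketch, not the fix for this bug: the class name `PinnedGcTest` and its body are hypothetical, and it assumes the usual `vm.gc.<name>` property, which as far as I know is false when a different collector was requested externally.

```java
/*
 * @test
 * @summary Hypothetical example: pin a collector only when it is selectable
 * @requires vm.gc.Serial
 * @run main/othervm -XX:+UseSerialGC PinnedGcTest
 */
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class PinnedGcTest {
    // Names of the collectors reported by the running VM.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // A real test would do its work here; this just reports the collectors.
        System.out.println("Collectors in use: " + collectorNames());
    }
}
```

With the guard in place, a run that passes another GC flag to jtreg should skip the test rather than start a VM with two collectors requested.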

https://bugs.openjdk.java.net/browse/JDK-8277878

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From duke at openjdk.java.net  Sat Nov 27 10:01:07 2021
From: duke at openjdk.java.net (duke)
Date: Sat, 27 Nov 2021 10:01:07 GMT
Subject: Withdrawn: 8218885: Restore pop_frame and force_early_return
 functionality for Graal
In-Reply-To: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com>
References: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com>
Message-ID: <yTCIaz9DWhHlG-B1_ndfmOxHFfmyTdjHt4R9B2lHIic=.b3d65a7d-f7ab-4585-9b9a-ba06bcbfcd2c@github.com>

On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez <never at openjdk.org> wrote:

> This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5625

From tschatzl at openjdk.java.net  Sat Nov 27 12:13:03 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Sat, 27 Nov 2021 12:13:03 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v2]
In-Reply-To: <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
 <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>
Message-ID: <nKKKhrxCum870GF0GDkmUy02OYLWZ-1QSnjtZSebR6E=.cf85a8d7-15c3-40a4-894c-3ed7a81b957b@github.com>

On Fri, 26 Nov 2021 10:33:34 GMT, Vishal Chand <duke at openjdk.java.net> wrote:

>> Changed the visibility, added getters and refactored the following:
>> 
>> 1. Card Table Members
>> 2. BOT members
>> 3. ObjectStartArray block members
>
> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Refactoring in hotspot/cpu dir

@tstuefe : can you check whether the s390 and ppc changes still compile? The changes look straightforward enough, but...

Thanks,
  Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From stuefe at openjdk.java.net  Sat Nov 27 14:30:04 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Sat, 27 Nov 2021 14:30:04 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v2]
In-Reply-To: <nKKKhrxCum870GF0GDkmUy02OYLWZ-1QSnjtZSebR6E=.cf85a8d7-15c3-40a4-894c-3ed7a81b957b@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
 <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>
 <nKKKhrxCum870GF0GDkmUy02OYLWZ-1QSnjtZSebR6E=.cf85a8d7-15c3-40a4-894c-3ed7a81b957b@github.com>
Message-ID: <w12h3fXmD0ipedHcWzEEZ7TghaVW2f3dWh52KmE88PU=.81f5eaf5-9e7f-4ef3-a09e-ed21de72f977@github.com>

On Sat, 27 Nov 2021 12:10:17 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Refactoring in hotspot/cpu dir
>
> @tstuefe : can you check whether the s390 and ppc changes still compile? The changes look straightforward enough, but...
> 
> Thanks,
>   Thomas

@tschatzl We have s390 + ppcle builds now in GHAs thanks to Alexey, and they do look fine (https://github.com/openjdk/jdk/pull/6570/checks?check_run_id=4341321057). Seeing how simple the platform changes are, I think this is okay. 

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From jbhateja at openjdk.java.net  Sat Nov 27 14:52:04 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sat, 27 Nov 2021 14:52:04 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v4]
In-Reply-To: <dNu_1z9fozIvODrkwyORFY0MuwEO96T3EuKWU1hltq4=.ce056f08-de91-4745-8d18-5f8ad6825824@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <EQhMD_Jyc5voaHn-qjhTjEACFFejYz5v8bAr4ZjFw5E=.4922dd69-4510-4dab-ae58-c75d94ed4ffc@github.com>
 <dNu_1z9fozIvODrkwyORFY0MuwEO96T3EuKWU1hltq4=.ce056f08-de91-4745-8d18-5f8ad6825824@github.com>
Message-ID: <ukEsdb2bU_axcjm5xFHnCe_Frj5PrK8lpvf6V3tHBcc=.41234723-7385-4dc9-ad25-68f82c86f69f@github.com>

On Thu, 25 Nov 2021 05:08:42 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Override threshold only if flag is default
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 1893:
> 
>> 1891:     return AVX3Threshold;
>> 1892:   }
>> 1893: }
> 
> I am somewhat concerned about the overhead of evaluating this each time it is used. I realize these will only be startup costs while generating the stubs, not part of the stubs themselves, but it still may be a startup impact. Can you run a startup benchmark to see if there is any problem?
> 
> I was also thinking the more direct formulation would just be:
> ```return (is_intel_family_core() && supports_serialize() && FLAG_IS_DEFAULT(AVX3Threshold)) ? 0 : AVX3Threshold;```

Hi @sviswa7, I agree with @dholmes-ora. Instead of calling this multiple times in a stub, can we call it only once per stub? Since stubs are assembled once and not relocated, it should be okay to call this method only once for the stubs that are going to benefit from this change.
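
To illustrate the idea, here is a small Java model of the suggested ternary, with the result computed once and then reused. This is only a sketch: the real code is C++ in `vm_version_x86.cpp`, and the boolean parameters below are hypothetical stand-ins for `is_intel_family_core()`, `supports_serialize()` and `FLAG_IS_DEFAULT(AVX3Threshold)`.

```java
final class Avx3ThresholdModel {
    // Mirrors the one-line formulation quoted above: the threshold drops to 0
    // only when all three conditions hold, otherwise the flag value is used.
    static int effectiveThreshold(boolean intelCore,
                                  boolean supportsSerialize,
                                  boolean flagIsDefault,
                                  int avx3Threshold) {
        return (intelCore && supportsSerialize && flagIsDefault) ? 0 : avx3Threshold;
    }

    // Stand-in for a stub generator: the threshold is passed in as a local,
    // so it is evaluated once per stub instead of once per size check.
    static int countWidePaths(int[] copySizes, int threshold) {
        int wide = 0;
        for (int size : copySizes) {
            if (size >= threshold) {
                wide++;
            }
        }
        return wide;
    }

    public static void main(String[] args) {
        int threshold = effectiveThreshold(true, true, true, 4096); // evaluated once
        System.out.println(countWidePaths(new int[] {64, 512, 8192}, threshold));
    }
}
```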

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From tobias.hartmann at oracle.com  Mon Nov 29 09:01:05 2021
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 29 Nov 2021 10:01:05 +0100
Subject: [External] : RFC: improving NMethod code locality in CodeCache
In-Reply-To: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com>
References: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com>
Message-ID: <2e87598c-fb18-5189-8cb6-5e791133161f@oracle.com>

Hi Evgeny,

Thanks for sharing these results and starting the discussion.

Some comments below.

On 23.11.21 18:34, Astigeevich, Evgeny wrote:
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.

Is it really a problem with branch prediction or more with instruction caching? With the current
implementation, the hot instructions of a single nmethod are already contiguous but different
nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the
metadata will improve locality but does that really have an effect on branch prediction?

Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)?

> The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and scopes sections. Arm64 and x86_64 look consistent except for the stub code: on arm64 the stub code is 4-5 times bigger.
> 
> We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data.

It would definitely be nice to have this as an option (rather than replacing the current
implementation) but I wonder how feasible it is. There is lots of code that depends on the current
layout and we would need to make all of that dependent on a flag.

> According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released.

Ever since I finished the implementation of the Segmented Code Cache
(https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it. I think that the
additional complexity in the code cache is worth it but of course that has to be proven by a
performance evaluation.

For reference, here's my old thesis and the paper we published back then:
http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf
http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf

> There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317) under which the implementation work can be done.

Yes, that makes sense.

> There can be different approaches for the implementation:
> 
> 1. What to separate:
>     a. All code (main plus stub) from other sections.
>     b. Or only main code because this is the code where an application should spend most of the time.
>     c. Or the header and scope sections.

I would say that from a performance perspective, only the main code matters because the stubs are
used for slow paths. If it simplifies prototyping, I would go with b) first.

> 2. Where to put:
>     a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
>     b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
>     c.  Or in a completely different place (C-heap, Metaspace,...)

It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality
between different nmethods.

Solution b) would only improve code locality in the same nmethod but the overall layout of
executable code in the code cache would still be sparse.

I think c) would be the ideal solution: The code cache would only contain executable code and all
the metadata would be somewhere else. But solution a) would lead to the same layout and might be
easier to implement.

> It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property.

Yes, that is a concern. A thorough performance evaluation is required.

> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.

Hope that helps. I'm curious what others think.

Best regards,
Tobias

From tschatzl at openjdk.java.net  Mon Nov 29 09:52:09 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 29 Nov 2021 09:52:09 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v2]
In-Reply-To: <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
 <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>
Message-ID: <NoVHMuFB4h5GK0ORlSdHiVYnZdaApQIF-uCDO7sGuJM=.6a4cd723-5a9e-4c12-b9e2-f8799a066b26@github.com>

On Fri, 26 Nov 2021 10:33:34 GMT, Vishal Chand <duke at openjdk.java.net> wrote:

>> Changed the visibility, added getters and refactored the following:
>> 
>> 1. Card Table Members
>> 2. BOT members
>> 3. ObjectStartArray block members
>
> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Refactoring in hotspot/cpu dir

I will push it through our testing; in particular, the changes for the SA agent (in `vmstructs_gc.hpp`) are always good to double-check.

src/hotspot/share/gc/shared/blockOffsetTable.hpp line 56:

> 54:   static uint _LogN_words;
> 55:   static uint _N_bytes;
> 56:   static uint _N_words;

The `private` visibility modifier can be removed as this is default at the top of a class.
The static variables should start with a lower case letter after the underscore, something like `_log_n`.

My suggestion would also be to change `N`/`n` to something more understandable, like `size`, and add `block`, i.e. something like `_log_block_size`, `_log_block_size_in_words` similar to the corresponding `CardTable` members etc.

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6570

From duke at openjdk.java.net  Mon Nov 29 12:12:21 2021
From: duke at openjdk.java.net (xpbob)
Date: Mon, 29 Nov 2021 12:12:21 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr
Message-ID: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>

Unsafe is used in many Java frameworks.
When a framework has an Unsafe memory leak, there is no way to know what code is causing it.
This change adds an Unsafe allocation event to JFR.
The event records the allocation size and the allocating stack trace.
The event is off by default.
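
For comparison, an event with the same shape can be written at application level against the public `jdk.jfr` API. This is only a sketch and not the code in this PR: the event name `example.UnsafeAllocation`, the class name and the `record` helper are made up for illustration.

```java
import jdk.jfr.Category;
import jdk.jfr.DataAmount;
import jdk.jfr.Enabled;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.StackTrace;

@Name("example.UnsafeAllocation")   // hypothetical event name
@Label("Unsafe Allocation")
@Category("Example")
@Enabled(false)                     // off by default, as described above
@StackTrace(true)                   // capture the allocating stack when enabled
class UnsafeAllocationEvent extends Event {
    @Label("Allocation Size")
    @DataAmount                     // value is a number of bytes
    long size;

    // Hypothetical helper: emit one event for an allocation of 'bytes' bytes.
    // commit() does nothing unless a recording has enabled this event.
    static void record(long bytes) {
        UnsafeAllocationEvent event = new UnsafeAllocationEvent();
        event.size = bytes;
        event.commit();
    }

    public static void main(String[] args) {
        record(64);
    }
}
```

Because the event is disabled by default, a recording has to enable it explicitly (for example through its settings) before any data shows up.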

-------------

Commit messages:
 - 8277930: Add unsafe allocation event to jfr

Changes: https://git.openjdk.java.net/jdk/pull/6591/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277930
  Stats: 18 lines in 4 files changed: 16 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6591.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591

PR: https://git.openjdk.java.net/jdk/pull/6591

From lutz.schmidt at sap.com  Mon Nov 29 12:19:09 2021
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 29 Nov 2021 12:19:09 +0000
Subject: [External] : RFC: improving NMethod code locality in CodeCache
In-Reply-To: <2e87598c-fb18-5189-8cb6-5e791133161f@oracle.com>
References: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com>
 <2e87598c-fb18-5189-8cb6-5e791133161f@oracle.com>
Message-ID: <3C21E64B-42BA-4818-8623-6B518E00D97D@sap.com>

Hi, 

a few thoughts immediately popped up when reading Evgeny's RFC and Tobias' comments. If my comments seem influenced by s390x - that might well be. It's the architecture I know best. 

 - The biggest concern I have relates to pc-relative addressing. 
    o nmethod constants are currently located next to the instruction section.
      Putting them into a separately allocated area may break the pc-relative limit.
      s390x limit: +/- 4GB, no fallback implemented.
    o relative branches either are
       + short distance, mostly intra-nmethod
       + long distance, mostly inter-nmethod
       + not possible in general, e.g., runtime calls
      The branch optimization (in shorten_branches) might less often be possible.
      One example would be if stub code is moved to a separately allocated area. 
 - When considering performance, it is beneficial to have data which is being 
   patched (frequently) separated from the instruction stream.
   s390x: never modify data in a cache line where instructions are fetched from.
   That will kill your performance big time.
 - I'm not a branch prediction expert. Instruction stream compactness may have an
   influence if the prediction engine not only remembers the branch direction, but
   the (limited length) distance as well. 
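
To make the pc-relative concern concrete, here is a small reachability check. It is a sketch under one assumption: that the relative-long operand is a signed 32-bit count of halfwords, which is what yields the +/- 4GB limit mentioned above. The class and method names are hypothetical.

```java
final class PcRelativeReach {
    // Returns true if 'target' can be addressed pc-relatively from 'pc',
    // assuming a signed 32-bit halfword displacement (about +/- 4GB).
    static boolean reachable(long pc, long target) {
        long delta = target - pc;
        if ((delta & 1) != 0) {
            return false;               // displacement must be halfword aligned
        }
        long halfwords = delta >> 1;    // arithmetic shift keeps the sign
        return halfwords >= Integer.MIN_VALUE && halfwords <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long code = 0x1000_0000L;
        // Constants right next to the code are reachable.
        System.out.println(reachable(code, code + 0x2000));
        // Constants in an area allocated 8GB away are not.
        System.out.println(reachable(code, code + (8L << 30)));
    }
}
```

With constants kept next to the instruction section the check passes trivially; once they move to a separately allocated area, the allocator has to guarantee the distance or the code generator needs a fallback.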

Thanks,
Lutz


From duke at openjdk.java.net  Mon Nov 29 13:02:09 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Mon, 29 Nov 2021 13:02:09 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v2]
In-Reply-To: <NoVHMuFB4h5GK0ORlSdHiVYnZdaApQIF-uCDO7sGuJM=.6a4cd723-5a9e-4c12-b9e2-f8799a066b26@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
 <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>
 <NoVHMuFB4h5GK0ORlSdHiVYnZdaApQIF-uCDO7sGuJM=.6a4cd723-5a9e-4c12-b9e2-f8799a066b26@github.com>
Message-ID: <njeXl4iWpAc7HGJy0r3hy2lojS9NiwPSBfJadmVUBbU=.f94e3078-d747-4922-bd67-ca3694f379ce@github.com>

On Mon, 29 Nov 2021 09:41:54 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Refactoring in hotspot/cpu dir
>
> src/hotspot/share/gc/shared/blockOffsetTable.hpp line 56:
> 
>> 54:   static uint _LogN_words;
>> 55:   static uint _N_bytes;
>> 56:   static uint _N_words;
> 
> The `private` visibility modifier can be removed as this is default at the top of a class.
> The static variables should start with a lower case letter after the underscore, something like `_log_n`.
> 
> My suggestion would also be to change `N`/`n` to something more understandable, like `size`, and add `block`, i.e. something like `_log_block_size`, `_log_block_size_in_words` similar to the corresponding `CardTable` members etc.
> 
> Edit: note that "block" isn't a good word to use here, so scratch that - "block" is any kind of area that is more generic than an object, but does not refer to the BOT entry.

As I understand it, we need to replace "N" with something meaningful. Would something like "entry_size" or "bot_entry_size" work?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From zgu at openjdk.java.net  Mon Nov 29 14:04:11 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Mon, 29 Nov 2021 14:04:11 GMT
Subject: RFR: 8277797: Remove undefined/unused
 SharedRuntime::trampoline_size()
In-Reply-To: <_mUzi5C5miYGHHXIwY2Hv3tCeZ1676_hZH9raqFn4Gw=.2d265a64-c041-4f73-a775-76d7f09470a3@github.com>
References: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>
 <_mUzi5C5miYGHHXIwY2Hv3tCeZ1676_hZH9raqFn4Gw=.2d265a64-c041-4f73-a775-76d7f09470a3@github.com>
Message-ID: <GKHAPlzEvXV38GqeqP06av1XPHbsH5Q3EPaizSDShR4=.83e18741-ab83-4655-ab73-e90c42c67d02@github.com>

On Thu, 25 Nov 2021 05:14:59 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002).
>
> Good and trivial.
> 
> Thanks,
> David

Thanks, @dholmes-ora @tstuefe

-------------

PR: https://git.openjdk.java.net/jdk/pull/6540

From zgu at openjdk.java.net  Mon Nov 29 14:04:11 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Mon, 29 Nov 2021 14:04:11 GMT
Subject: Integrated: 8277797: Remove undefined/unused
 SharedRuntime::trampoline_size()
In-Reply-To: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>
References: <odjG6gibrDM0PFJwVRkHuz-2ft2yg5xLb6xbrQt65r8=.1fc8e18b-0725-46d1-b434-9b6177610480@github.com>
Message-ID: <NA_uAStFUhOH7DmIv4IhTxQgPmQwettw42qfutYpdD8=.7ebcc34e-e3f9-4e67-aac0-53b6ed89ecaa@github.com>

On Wed, 24 Nov 2021 15:54:47 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002).

This pull request has now been integrated.

Changeset: 05ab1767
Author:    Zhengyu Gu <zgu at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/05ab1767684bee0a3b8c8214c610beafaad058f9
Stats:     2 lines in 1 file changed: 0 ins; 2 del; 0 mod

8277797: Remove undefined/unused SharedRuntime::trampoline_size()

Reviewed-by: dholmes, stuefe

-------------

PR: https://git.openjdk.java.net/jdk/pull/6540

From shade at openjdk.java.net  Mon Nov 29 14:10:54 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 29 Nov 2021 14:10:54 GMT
Subject: RFR: 8277893: Arraycopy stress tests
Message-ID: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>

I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable.

A brief tour of these tests:

- Tests all data types;
- Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc;
- Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases;
- Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases;
- Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types;
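
As a rough illustration of the exhaustive small-array idea (not the code in this PR), the sketch below checks every source offset, destination offset and length for same-array `int[]` copies against a simple reference. The class and method names are hypothetical, and only one data type is covered.

```java
import java.util.Arrays;

final class SmallArraycopyCheck {
    // Reference copy with the required overlap semantics:
    // snapshot the source range first, then write it to the destination.
    static void referenceCopy(int[] a, int srcPos, int dstPos, int len) {
        int[] tmp = Arrays.copyOfRange(a, srcPos, srcPos + len);
        for (int i = 0; i < len; i++) {
            a[dstPos + i] = tmp[i];
        }
    }

    // Checks every (size, src, dst, len) combination up to maxSize.
    // Same-array copies hit both overlapping and non-overlapping ranges.
    // Returns the number of combinations checked; throws on a mismatch.
    static int checkAll(int maxSize) {
        int checked = 0;
        for (int size = 0; size <= maxSize; size++) {
            for (int src = 0; src <= size; src++) {
                for (int dst = 0; dst <= size; dst++) {
                    int maxLen = size - Math.max(src, dst);
                    for (int len = 0; len <= maxLen; len++) {
                        int[] expected = new int[size];
                        for (int i = 0; i < size; i++) {
                            expected[i] = i + 1;    // distinct, non-zero values
                        }
                        int[] actual = expected.clone();
                        referenceCopy(expected, src, dst, len);
                        System.arraycopy(actual, src, actual, dst, len);
                        if (!Arrays.equals(expected, actual)) {
                            throw new AssertionError("size=" + size + " src=" + src
                                    + " dst=" + dst + " len=" + len);
                        }
                        checked++;
                    }
                }
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("Checked " + checkAll(16) + " combinations");
    }
}
```

The number of combinations grows quickly with the maximum size, which is why this style only suits small arrays and the large-array cases need sampling instead.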

My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain.

Test times:


# x86_64 (TR 3970X)
  real    9m11.037s
  user    78m2.766s
  sys     0m19.873s

# x86_32 (TR 3970X)
  real    13m39.054s
  user    147m38.308s
  sys     0m10.924s

# x86_64 (i5-11500)
  real    41m32.622s
  user    447m19.986s
  sys     0m21.026s

# AArch64 (ThunderX2)
  real    5m34.210s
  user    45m16.015s
  sys     0m24.723s


Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`.

Additional testing:
 - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy`
 - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy`
 - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy`

-------------

Commit messages:
 - Ready for review

Changes: https://git.openjdk.java.net/jdk/pull/6594/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277893
  Stats: 1181 lines in 12 files changed: 1181 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6594.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6594/head:pull/6594

PR: https://git.openjdk.java.net/jdk/pull/6594

From jvernee at openjdk.java.net  Mon Nov 29 14:42:03 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Mon, 29 Nov 2021 14:42:03 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr
In-Reply-To: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
Message-ID: <vgNHQSIbxt16_7eNnT6A41lVCZ0DRzANn99DL_8xsyI=.15cee6da-31f5-41f7-b943-a70624c27668@github.com>

On Mon, 29 Nov 2021 12:06:02 GMT, xpbob <duke at openjdk.java.net> wrote:

> Unsafe is used in many Java frameworks.
> When a framework has an Unsafe memory leak, there is no way to know what code is causing it.
> This change adds an Unsafe allocation event to JFR.
> The event records the allocation size and the allocating stack trace.
> The event is off by default.

An event like this would help to find allocation sites.

I'd suggest also adding an event for `Unsafe::freeMemory`, as well as recording the memory address in both event types. With that, it should be possible to match up allocations with frees, and leaks could be identified by looking for allocations that don't have a corresponding free with the same address.
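
The matching step could look roughly like the sketch below. It assumes the events have already been read from a recording and sorted by time; the `MemEvent` type and all names are hypothetical and not part of JFR.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeakMatcher {
    // Hypothetical, simplified view of an allocation or free event.
    static final class MemEvent {
        final boolean isFree;
        final long address;
        final long size;

        private MemEvent(boolean isFree, long address, long size) {
            this.isFree = isFree;
            this.address = address;
            this.size = size;
        }

        static MemEvent alloc(long address, long size) {
            return new MemEvent(false, address, size);
        }

        static MemEvent free(long address) {
            return new MemEvent(true, address, 0);
        }
    }

    // Replays events in time order and returns address -> size for every
    // allocation that has no later free with the same address.
    static Map<Long, Long> outstanding(List<MemEvent> eventsInTimeOrder) {
        Map<Long, Long> live = new LinkedHashMap<>();
        for (MemEvent e : eventsInTimeOrder) {
            if (e.isFree) {
                live.remove(e.address);
            } else {
                live.put(e.address, e.size);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<MemEvent> events = Arrays.asList(
                MemEvent.alloc(0x1000, 64),
                MemEvent.alloc(0x2000, 128),
                MemEvent.free(0x1000));
        System.out.println(outstanding(events));
    }
}
```

Replaying in time order matters because the allocator may hand out the same address again after a free; matching on address alone without ordering would pair the wrong events.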

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From duke at openjdk.java.net  Mon Nov 29 14:52:17 2021
From: duke at openjdk.java.net (Scott Gibbons)
Date: Mon, 29 Nov 2021 14:52:17 GMT
Subject: RFR: 8277358: Accelerate CRC32-C
Message-ID: <c2lZEv8jF88RwXMRJ_Xvhh-s6uKa-Gyz5L3oeLICqbo=.0c0b9ae6-1f3d-4fa6-bfe9-afaaaf9c683a@github.com>

Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32.  This change achieves ~4x throughput improvement.

5986.947899319073 MB/s => 24041.05203089616 MB/s
5840.02689336947 MB/s => 24898.781468710356 MB/s
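
For readers who want to relate these figures to the API, here is a small sketch using `java.util.zip.CRC32C`. The throughput helper assumes 1 MB = 10^6 bytes, which reproduces the numbers quoted in this message from the message size, iteration count and runtime; the class and method names are hypothetical and this is not the test's own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

final class Crc32cThroughput {
    // CRC32-C of a whole byte array via the public API.
    static long checksum(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Throughput in MB/s, assuming 1 MB = 1,000,000 bytes.
    static double throughputMBps(long msgSize, long iters, double seconds) {
        return (double) (msgSize * iters) / seconds / 1_000_000.0;
    }

    // Ratio of new to old throughput.
    static double speedup(double beforeMBps, double afterMBps) {
        return afterMBps / beforeMBps;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("crc = %08x%n", checksum(check));
        double before = throughputMBps(512, 20_000_000L, 1.710387358);
        double after = throughputMBps(512, 20_000_000L, 0.425938099);
        System.out.printf("%.1f MB/s => %.1f MB/s (%.2fx)%n",
                before, after, speedup(before, after));
    }
}
```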

********** Original ***********


scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
 offset = 0
msgSize = 512 bytes
  iters = 20000000
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(byte[]) runtime = 1.710387358 seconds
CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds
CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------




*********** With my changes: *************



scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
 offset = 0
msgSize = 512 bytes
  iters = 20000000
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(byte[]) runtime = 0.425938099 seconds
CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds
CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------
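
For readers who want to check the figures, here is a small sketch (not part of the patch or of TestCRC32C.java) that exercises the two `java.util.zip.CRC32C` paths measured above and recomputes the throughput from msgSize, iters and runtime. The class and method names are made up for illustration, and 1 MB is assumed to be 10^6 bytes, which is what reproduces the numbers in the log.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

final class Crc32cSketch {
    /** CRC32-C over a byte[] (the update(byte[]) path in the log). */
    static long crcOfArray(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** CRC32-C over a ByteBuffer (the update(ByteBuffer) path in the log). */
    static long crcOfBuffer(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(ByteBuffer.wrap(data));
        return crc.getValue();
    }

    /** Throughput in MB/s, assuming 1 MB = 10^6 bytes. */
    static double throughputMBps(long msgSize, long iters, double seconds) {
        return (double) msgSize * iters / seconds / 1e6;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("array  crc = %08x%n", crcOfArray(check));
        System.out.printf("buffer crc = %08x%n", crcOfBuffer(check));
        double before = throughputMBps(512, 20_000_000L, 1.710387358);
        double after = throughputMBps(512, 20_000_000L, 0.425938099);
        System.out.printf("%.1f MB/s => %.1f MB/s (%.2fx)%n", before, after, after / before);
    }
}
```

Both paths must return the same value for the same bytes, which is what the repeated `crc = ..., crcReference = ...` lines in the log verify.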

-------------

Commit messages:
 - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
 - Merge branch 'master' into asgibbons-crc32c
 - Asgibbons crc32c (#7)
 - Merge branch 'openjdk:master' into master
 - Revert .gitignore change
 - Move register save to within conditional; add comments
 - Bad merge.
 - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
 - ZZMerge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
 - Use existing CRC32 code with different table for CRC32-C
 - ... and 203 more: https://git.openjdk.java.net/jdk/compare/e9b36a83...10aeaec6

Changes: https://git.openjdk.java.net/jdk/pull/6595/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277358
  Stats: 62 lines in 4 files changed: 40 ins; 1 del; 21 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6595.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6595/head:pull/6595

PR: https://git.openjdk.java.net/jdk/pull/6595

From tschatzl at openjdk.java.net  Mon Nov 29 16:55:08 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 29 Nov 2021 16:55:08 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v2]
In-Reply-To: <njeXl4iWpAc7HGJy0r3hy2lojS9NiwPSBfJadmVUBbU=.f94e3078-d747-4922-bd67-ca3694f379ce@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
 <lZuG8TbhePJ1yQWmt_M0uM_3yEXi86Z623g46A2IZHA=.4d13c4bb-38cb-4375-8896-bf3241f6f560@github.com>
 <NoVHMuFB4h5GK0ORlSdHiVYnZdaApQIF-uCDO7sGuJM=.6a4cd723-5a9e-4c12-b9e2-f8799a066b26@github.com>
 <njeXl4iWpAc7HGJy0r3hy2lojS9NiwPSBfJadmVUBbU=.f94e3078-d747-4922-bd67-ca3694f379ce@github.com>
Message-ID: <lwQY6aVYhd4qrIRDTqRC_Kbv1p6RCt8xshff5CjwGdg=.5aa28e3d-02f5-4f77-a593-99295f750749@github.com>

On Mon, 29 Nov 2021 12:58:58 GMT, Vishal Chand <duke at openjdk.java.net> wrote:

>> src/hotspot/share/gc/shared/blockOffsetTable.hpp line 56:
>> 
>>> 54:   static uint _LogN_words;
>>> 55:   static uint _N_bytes;
>>> 56:   static uint _N_words;
>> 
>> The `private` visibility modifier can be removed as this is default at the top of a class.
>> The static variables should start with a lower case letter after the underscore, something like `_log_n`.
>> 
>> My suggestion would also be to change `N`/`n` to something more understandable, like `size`, and add `block`, i.e. something like `_log_block_size`, `_log_block_size_in_words` similar to the corresponding `CardTable` members etc.
>> 
>> Edit: note that "block" isn't a good word to use here, so scratch that - "block" is any kind of area that is more generic than an object, but does not refer to the BOT entry.
>
> As I understand it, we need to replace "N" with something meaningful. Would something like "entry_size" or "bot_entry_size" work?

I would think that `bot_entry_size` is one byte. Probably "bot_card_size"?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From ddong at openjdk.java.net  Mon Nov 29 17:47:18 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 29 Nov 2021 17:47:18 GMT
Subject: RFR: 8277948: AArch64: Print the correct stack if
 -XX:+PreserveFramePointer when crash
Message-ID: <rpTtCfty6TqLhdYjwtrSn5lzYHK7jQGbtFKaIPc5510=.9c965e3a-b9f1-4bdb-807c-ae6c5584e353@github.com>

Hi,

I found that the native stack frames in the hs log are sometimes inaccurate on AArch64. I am not sure whether this is a known issue or one worth fixing.

The following steps quickly reproduce the problem:

1. Apply the diff below (comment out the dtrace_object_alloc call in the interpreter and force a crash in SharedRuntime::dtrace_object_alloc)

  index 39e99bdd5ed..4fc768e94aa 100644
  --- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
  +++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
  @@ -3558,6 +3558,7 @@ void TemplateTable::_new() {
       __ store_klass_gap(r0, zr);  // zero klass gap for compressed oops
       __ store_klass(r0, r4);      // store klass last

  +/**
       {
         SkipIfEqual skip(_masm, &DTraceAllocProbes, false);
         // Trigger dtrace event for fastpath
  @@ -3567,6 +3568,7 @@ void TemplateTable::_new() {
         __ pop(atos); // restore the return value

       }
  +*/
       __ b(done);
     }

  diff --git a/src/hotspot/cpu/x86/templateTable_x86.cpp b/src/hotspot/cpu/x86/templateTable_x86.cpp
  index 19530b7c57c..15b0509da4c 100644
  --- a/src/hotspot/cpu/x86/templateTable_x86.cpp
  +++ b/src/hotspot/cpu/x86/templateTable_x86.cpp
  @@ -4033,6 +4033,7 @@ void TemplateTable::_new() {
       Register tmp_store_klass = LP64_ONLY(rscratch1) NOT_LP64(noreg);
       __ store_klass(rax, rcx, tmp_store_klass);  // klass

  +/**
       {
         SkipIfEqual skip_if(_masm, &DTraceAllocProbes, 0);
         // Trigger dtrace event for fastpath
  @@ -4041,6 +4042,7 @@ void TemplateTable::_new() {
              CAST_FROM_FN_PTR(address, static_cast<int (*)(oopDesc*)>(SharedRuntime::dtrace_object_alloc)), rax);
         __ pop(atos);
       }
  +*/

       __ jmp(done);
     }
  diff --git a/src/hotspot/share/runtime/sharedRuntime.cpp b/src/hotspot/share/runtime/sharedRuntime.cpp
  index a5de65ea5ab..60b4bd3bcc8 100644
  --- a/src/hotspot/share/runtime/sharedRuntime.cpp
  +++ b/src/hotspot/share/runtime/sharedRuntime.cpp
  @@ -1002,6 +1002,7 @@ jlong SharedRuntime::get_java_tid(Thread* thread) {
    * 6254741.  Once that is fixed we can remove the dummy return value.
    */
   int SharedRuntime::dtrace_object_alloc(oopDesc* o) {
  +  *(int*)0 = 1;
     return dtrace_object_alloc(Thread::current(), o, o->size());
   }


2. `java -XX:+DTraceAllocProbes -Xcomp -XX:-PreserveFramePointer -version`

On x86_64, the native stack in the hs log is complete, but on AArch64 the native stack is incorrect.

At first, I thought PreserveFramePointer might be the cause. Later, I found that whether or not PreserveFramePointer is enabled, the native stack in the hs log is always correct on x86_64 and wrong on AArch64.

After some investigation, I found that this problem is related to the layout of the stack.

On x86_64, whether the code is C/C++, interpreted, or JIT-compiled, the `callee` always puts the `return address` and the `fp` of the `caller` at the bottom of the stack.
Hence, the `callee` can always get the `caller sp` (aka `sender sp`) as `fp + 2`, and if the `caller` is a compiled method, `caller sp` is the key to finding the `caller`'s `caller`, since `caller fp` may be invalid (see frame::sender_for_compiled_frame).


push   %rbp
mov    %rsp,%rbp

         _ _ _ _ _ _
        |           |
        |           |               |
        |_ _ _ _ _ _|               |
        |           |               |
 caller |           | <- caller sp  |
 _ _ _  |_ _ _ _ _ _|               | expand
        |           |               |
        | ret addr  |               | direction
 callee |_ _ _ _ _ _|               |
        |           |               V
        | caller fp | <- fp
        |_ _ _ _ _ _|



On AArch64, however, C/C++ code doesn't put the `return address` and `fp` of the `caller` at the bottom of the stack.
Hence, we cannot use `fp + 2` to calculate the proper `caller sp` (although it is still implemented this way).

When the `caller` is a C1/C2 method A and the `callee` is a C/C++ method B, we cannot get the `caller` of A, since we cannot get A's proper sp value.


stp x29, x30, [sp, #-N]!
mov x29, sp

         _ _ _ _ _ _
        |           |
        |           |               |
        |_ _ _ _ _ _|               |
        |           |               |
 caller |           | <- caller sp  |
 _ _ _  |_ _ _ _ _ _|     -         | expand
                          |         |
          . . . . .       |         | direction
         _ _ _ _ _ _      |         |
        |           |     | N       |
        | ret addr  |     |         |
 callee |_ _ _ _ _ _|     |         |
        |           |     -         V
        | caller fp | <- fp
        |_ _ _ _ _ _|
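
The difference between the two diagrams can be sketched as plain address arithmetic. This is an illustration only, not HotSpot code: the class, the method names, and the frame size N = 48 bytes are assumptions, and addresses are in bytes, so "fp + 2" (two stack slots) becomes fp + 16.

```java
final class FrameLayoutSketch {
    static final long WORD = 8; // bytes per stack slot on both 64-bit targets

    /** x86_64 prologue (push %rbp; mov %rsp,%rbp): saved fp and return address sit directly below caller sp. */
    static long callerSpX86(long fp) {
        return fp + 2 * WORD;
    }

    /** AArch64 C/C++ prologue (stp x29, x30, [sp, #-N]!; mov x29, sp): the pair sits at the low end of an N-byte frame. */
    static long callerSpAArch64(long fp, long frameSizeN) {
        return fp + frameSizeN;
    }

    public static void main(String[] args) {
        long callerSp = 0x7000;
        long n = 48;                       // hypothetical frame size N in bytes
        long fpX86 = callerSp - 2 * WORD;  // fp after push %rbp; mov %rsp,%rbp
        long fpA64 = callerSp - n;         // fp after stp x29, x30, [sp, #-N]!; mov x29, sp
        System.out.printf("x86_64 : fp + 2 words = 0x%x%n", callerSpX86(fpX86));
        System.out.printf("AArch64: fp + 2 words = 0x%x, fp + N = 0x%x%n",
                callerSpX86(fpA64), callerSpAArch64(fpA64, n));
    }
}
```

The catch is that a stack walker does not know N for an arbitrary C/C++ frame, which is why `fp + 2` cannot simply be replaced by `fp + N`.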



I am not very familiar with AArch64 and do not currently know how to fix this issue completely.

Based on my understanding of the implementation, we can get the correct stack trace when PreserveFramePointer is enabled.

Although PreserveFramePointer is disabled by default, I found that some real applications enable it in production.
Therefore, in my opinion, this fix can help troubleshoot crashes in applications that enable PreserveFramePointer on the AArch64 platform.

This patch changes how l_sender_sp is calculated: it uses sender_sp() as the value of l_sender_sp when PreserveFramePointer is enabled.

Any input is appreciated.

Thanks,
Denghui

-------------

Commit messages:
 - 8277948: AArch64: Print the correct stack if -XX:+PreserveFramePointer when crash

Changes: https://git.openjdk.java.net/jdk/pull/6597/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6597&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277948
  Stats: 13 lines in 4 files changed: 11 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6597.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6597/head:pull/6597

PR: https://git.openjdk.java.net/jdk/pull/6597

From sviswanathan at openjdk.java.net  Mon Nov 29 23:30:36 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Mon, 29 Nov 2021 23:30:36 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v5]
In-Reply-To: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
Message-ID: <5Dmjboa1Vh9PwwZnwfiTpmrXSm0D7sNMMm92bufbGSM=.d8494927-8a16-4efb-84b7-15086809d13f@github.com>

> Currently 32-byte instructions are used for small array copy and clear. 
> This can be optimized by using 64-byte instructions.
> 
> Please review.
> 
> Best Regards,
> Sandhya

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  Implement review comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6512/files
  - new: https://git.openjdk.java.net/jdk/pull/6512/files/021bc659..b44b63ed

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=03-04

  Stats: 17 lines in 3 files changed: 2 ins; 3 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6512.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 30 00:10:39 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 30 Nov 2021 00:10:39 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
In-Reply-To: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
Message-ID: <tmkXP_CmlQFh1X78NO1exq26ajbPoLo-o8k1fRncWNY=.ae93dfd2-b44c-43f2-a437-c30e4c9ebc1c@github.com>

> Currently 32-byte instructions are used for small array copy and clear. 
> This can be optimized by using 64-byte instructions.
> 
> Please review.
> 
> Best Regards,
> Sandhya

Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:

  Fix whitespace

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6512/files
  - new: https://git.openjdk.java.net/jdk/pull/6512/files/b44b63ed..190f974c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6512.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net  Tue Nov 30 00:43:04 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 30 Nov 2021 00:43:04 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <i_wxLk5rTL7hCabnpNjD71QgW3eeO3h5S3Z0Cib7juM=.d8967941-e6d0-40fd-a79a-0db05faf62e6@github.com>
 <xGiWOFdT7TkPCySvVTsX5DAOwQSnpV6R45wTdxKUe5k=.995b070d-685c-43be-bcfa-640860ef28cf@github.com>
 <HcpmEdneyRzTCtdGeryGYHERd-pfjmCeHBh1ggmwE2Q=.dde41662-94a6-485e-95bb-61aa03df78eb@github.com>
 <JVUACA5wkhF8B6Aik3q7frcOAiUc6u_10eJZv8N-NLc=.eb482f9e-771d-4e0f-b70b-48994ccb2288@github.com>
 <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com>
Message-ID: <pmBfQ7as9Elm3sCHMcZc-O8_yhwpesletuMbG-_7sIE=.9d7f56ad-8063-4c6f-928d-3c1154bedd97@github.com>

On Tue, 23 Nov 2021 06:49:07 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> @dholmes-ora I have implemented your review comments.
>
> Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use of 64-byte load/store - the connection is not at all obvious for anyone not fully conversant with AVX3 and how it is used by the code. Thanks.

@dholmes-ora @jatin-bhateja I have implemented your review comments.
I have used the direct formulation for avx3_threshold() method as suggested by David.
Reused the avx3_threshold() computation where possible as suggested by Jatin.
The tier1-tier3 testing passed on the platform where avx3_threshold() returns 0. 
No additional observable overhead seen in SPECjvm2008 startup benchmarks on AVX512 platform.
Please let me know if the patch looks ok to you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From duke at openjdk.java.net  Tue Nov 30 02:14:37 2021
From: duke at openjdk.java.net (xpbob)
Date: Tue, 30 Nov 2021 02:14:37 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v2]
In-Reply-To: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
Message-ID: <jgJPI3ipV3EL2_jSyyB4233fYXAmwSSBnfT_-BKcWAA=.4240f7b9-da8b-409b-8900-9e68b801f2ff@github.com>

> Unsafe is used in many Java frameworks.
> When the framework has an unsafe memory leak, there is no way to know what code is causing it.
> Add an unsafe allocation event to JFR.
> It records the allocation size and the allocating stack.
> This event is off by default.

xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'openjdk:master' into JDK-8277930
 - 8277930: Add unsafe allocation event to jfr

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6591/files
  - new: https://git.openjdk.java.net/jdk/pull/6591/files/a30f3618..d883f62d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=00-01

  Stats: 450 lines in 28 files changed: 164 ins; 169 del; 117 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6591.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591

PR: https://git.openjdk.java.net/jdk/pull/6591

From ddong at openjdk.java.net  Tue Nov 30 02:40:06 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Tue, 30 Nov 2021 02:40:06 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr
In-Reply-To: <vgNHQSIbxt16_7eNnT6A41lVCZ0DRzANn99DL_8xsyI=.15cee6da-31f5-41f7-b943-a70624c27668@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
 <vgNHQSIbxt16_7eNnT6A41lVCZ0DRzANn99DL_8xsyI=.15cee6da-31f5-41f7-b943-a70624c27668@github.com>
Message-ID: <UPBT_FMWIDpMBhwsXTcMB0ibaBY72IJlTHRGgG_zq1w=.30676862-bf3d-49d0-bbef-d295b323483e@github.com>

On Mon, 29 Nov 2021 14:39:16 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> I'd suggest also adding an event for `Unsafe::freeMemory`, as well as recording the memory address in both event types. With that, it should be possible to match up allocations with frees, and leaks could be identified by looking for allocations that don't have a corresponding free with the same address.

Unsafe also supports reallocating memory, which complicates the analysis of direct memory leaks.

I also think we need a mechanism to filter events by allocation size, but AFAIK there is no general mechanism for that at present.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From coleenp at openjdk.java.net  Tue Nov 30 02:45:34 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 30 Nov 2021 02:45:34 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
Message-ID: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>

This change seems to keep the test case in the bug from crashing in the ResourceMark destructor.  We have a ResourceMark during stack walking in AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk.
Testing tier1-6 in progress.

-------------

Commit messages:
 - 8265150: AsyncGetCallTrace crashes on ResourceMark

Changes: https://git.openjdk.java.net/jdk/pull/6606/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6606&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8265150
  Stats: 9 lines in 2 files changed: 0 ins; 3 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6606.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6606/head:pull/6606

PR: https://git.openjdk.java.net/jdk/pull/6606

From jiefu at openjdk.java.net  Tue Nov 30 03:42:09 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Tue, 30 Nov 2021 03:42:09 GMT
Subject: Withdrawn: 8277652: SIGSEGV in
 ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph
In-Reply-To: <kXZWFS0Yj4qU3UF_PEmFY5ca1r7yYiCcgpSb2Yn7qFQ=.d9b39b03-5b46-4c3d-ad2a-605f07d83f3f@github.com>
References: <kXZWFS0Yj4qU3UF_PEmFY5ca1r7yYiCcgpSb2Yn7qFQ=.d9b39b03-5b46-4c3d-ad2a-605f07d83f3f@github.com>
Message-ID: <sAopOys0czOI8rcAZhRPxl6xn2slgAXacSZiseZ7wjg=.31d91e8b-834d-4e53-8e99-31f7ca5192ab@github.com>

On Tue, 23 Nov 2021 15:59:00 GMT, Jie Fu <jiefu at openjdk.org> wrote:

> Hi all,
> 
> `ShenandoahBarrierC2Support::verify_raw_mem` crashes due to `u->unique_ctrl_out()` [1] returns NULL for malformed control flow graph.
> It can be reproduced by running `compiler/vectorapi/TestIntrinsicBailOut.java` with `-XX:+UseShenandoahGC`.
> It would be better to fix it.
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L1925

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6525

From jiefu at openjdk.java.net  Tue Nov 30 03:42:09 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Tue, 30 Nov 2021 03:42:09 GMT
Subject: RFR: 8277652: SIGSEGV in
 ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph
In-Reply-To: <81o2YKFQvTE2C9qqBBDBjC5L1dNyPMRTJw1CcTdD2SA=.6946cbb3-de08-46b7-9724-7c39a989efc3@github.com>
References: <kXZWFS0Yj4qU3UF_PEmFY5ca1r7yYiCcgpSb2Yn7qFQ=.d9b39b03-5b46-4c3d-ad2a-605f07d83f3f@github.com>
 <81o2YKFQvTE2C9qqBBDBjC5L1dNyPMRTJw1CcTdD2SA=.6946cbb3-de08-46b7-9724-7c39a989efc3@github.com>
Message-ID: <I3G6lhCUL4lGXDayxqP9iyN8WHbrosOpVfHcKOnG7xU=.c4bffb1c-672a-446f-b342-bf2199efd54c@github.com>

On Tue, 23 Nov 2021 16:20:56 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> Thank you, Jie! I am currently working on a change that would make LRB runtime call not consume or produce raw memory at all, and would obsolete your change. See #6526 .

Thanks @rkennke for fixing it.
So it's time to close this pr.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6525

From dholmes at openjdk.java.net  Tue Nov 30 04:43:06 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 30 Nov 2021 04:43:06 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
In-Reply-To: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
References: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
Message-ID: <dlN-9HbAFhuCvegM5-uaIHPuOylqwNsMqfFgO02m7HM=.ad2b1871-d4ca-48b9-bd4a-4f9e837de9b3@github.com>

On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change seems to keep the test case in the bug from crashing in the ResourceMark destructor.  We have a ResourceMark during stack walking in AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk.
> Testing tier1-6 in progress.

Hi Coleen,

This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :(

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6606

From stuefe at openjdk.java.net  Tue Nov 30 06:07:02 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 30 Nov 2021 06:07:02 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
In-Reply-To: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
References: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
Message-ID: <pojavOvsOHNlK5hSaUYGO2C4nvmqsYYIL8lBccdVy_k=.3eb2dae7-f4f6-4420-85d3-e887060ddf63@github.com>

On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change seems to keep the test case in the bug from crashing in the ResourceMark destructor.  We have a ResourceMark during stack walking in AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk.
> Testing tier1-6 in progress.

LGTM

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6606

From eosterlund at openjdk.java.net  Tue Nov 30 06:07:02 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 30 Nov 2021 06:07:02 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
In-Reply-To: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
References: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
Message-ID: <MFCRG2SzTJPYDoAZb-r3dsqJCz3XcmI_WniavP6uQkE=.158f04a0-a689-48d1-871f-5262f1af9715@github.com>

On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change seems to keep the test case in the bug from crashing in the ResourceMark destructor.  We have a ResourceMark during stack walking in AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk.
> Testing tier1-6 in progress.

Looks good.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6606

From stuefe at openjdk.java.net  Tue Nov 30 06:07:03 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 30 Nov 2021 06:07:03 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
In-Reply-To: <dlN-9HbAFhuCvegM5-uaIHPuOylqwNsMqfFgO02m7HM=.ad2b1871-d4ca-48b9-bd4a-4f9e837de9b3@github.com>
References: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
 <dlN-9HbAFhuCvegM5-uaIHPuOylqwNsMqfFgO02m7HM=.ad2b1871-d4ca-48b9-bd4a-4f9e837de9b3@github.com>
Message-ID: <XvntU7sVRROn8tbYXAquXGsYOh1nwZpaeSPxhW56S-E=.702d6fab-6dce-462b-8331-5089b440ee40@github.com>

On Tue, 30 Nov 2021 04:39:58 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Hi Coleen,
> 
> This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :(
> 
> Thanks, David

Does AsyncGetCallTrace get triggered asynchronously via signal?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6606

From dholmes at openjdk.java.net  Tue Nov 30 06:24:02 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 30 Nov 2021 06:24:02 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
In-Reply-To: <XvntU7sVRROn8tbYXAquXGsYOh1nwZpaeSPxhW56S-E=.702d6fab-6dce-462b-8331-5089b440ee40@github.com>
References: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
 <dlN-9HbAFhuCvegM5-uaIHPuOylqwNsMqfFgO02m7HM=.ad2b1871-d4ca-48b9-bd4a-4f9e837de9b3@github.com>
 <XvntU7sVRROn8tbYXAquXGsYOh1nwZpaeSPxhW56S-E=.702d6fab-6dce-462b-8331-5089b440ee40@github.com>
Message-ID: <R7PMP3hKU0FHA2rgMYh5uPtKaZj8ESAdPIIj7nx_Tdw=.814ac591-4e18-4141-b1f8-513bc32e88b9@github.com>

On Tue, 30 Nov 2021 06:02:08 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> > Hi Coleen,
> > This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :(
> > Thanks, David
> 
> Does AsyncGetCallTrace get triggered asynchronously via signal?

Yes:
```
V [libjvm.so+0x986023] AsyncGetCallTrace+0x1e5
C [libasyncProfiler.so+0x89b4] Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int)+0xd4
C [libasyncProfiler.so+0x9242] Profiler::recordSample(void*, unsigned long long, int, Event*)+0xd2
C [libasyncProfiler.so+0x34f2c] PerfEvents::signalHandler(int, siginfo_t*, void*)+0x8c
```

-------------

PR: https://git.openjdk.java.net/jdk/pull/6606

From duke at openjdk.java.net  Tue Nov 30 07:18:36 2021
From: duke at openjdk.java.net (xpbob)
Date: Tue, 30 Nov 2021 07:18:36 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v3]
In-Reply-To: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
Message-ID: <qVKMRxoNsd4piKW-ocvJ6rphFTQJCsF2nycxNiPryDE=.b7722273-c6d8-4a7a-a6fc-085cdda4727a@github.com>

> Unsafe is used in many Java frameworks.
> When the framework has an unsafe memory leak, there is no way to know what code is causing it.
> Add an unsafe allocation event to JFR.
> It records the allocation size and the allocating stack.
> This event is off by default.

xpbob has updated the pull request incrementally with one additional commit since the last revision:

  add free and Reallocate event

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6591/files
  - new: https://git.openjdk.java.net/jdk/pull/6591/files/d883f62d..f883847f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=01-02

  Stats: 53 lines in 4 files changed: 49 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6591.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591

PR: https://git.openjdk.java.net/jdk/pull/6591

From stuefe at openjdk.java.net  Tue Nov 30 07:21:06 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 30 Nov 2021 07:21:06 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
In-Reply-To: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
References: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
Message-ID: <H48_oGKscAL3FK1akhV6qpK1RcnKLbRs6wrt8ByyLZ0=.66b50984-3be6-4984-b2e9-b338c14e5f99@github.com>

On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change seems to keep the test case in the bug from crashing in the ResourceMark destructor.  We have a ResourceMark during stack walking in AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk.
> Testing tier1-6 in progress.

> > > Hi Coleen,
> > > This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :(
> > > Thanks, David
> > 
> > 
> > Does AsyncGetCallTrace get triggered asynchronously via signal?
> 
> Yes:
> 
> ```
> V [libjvm.so+0x986023] AsyncGetCallTrace+0x1e5
> C [libasyncProfiler.so+0x89b4] Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int)+0xd4
> C [libasyncProfiler.so+0x9242] Profiler::recordSample(void*, unsigned long long, int, Event*)+0xd2
> C [libasyncProfiler.so+0x34f2c] PerfEvents::signalHandler(int, siginfo_t*, void*)+0x8c
> ```

What you could do is keep (on demand only) a secondary resource area per thread. On entering a context that may have been called by a signal handler, and with the current resource area in an unknown state, swap the current resource area pointer in Thread with that prepared secondary resource area, and upon leaving swap back. That way you never touch the original resource area.

Kind of like double buffering for signal contexts.
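
To make the idea concrete, here is a conceptual sketch in Java rather than HotSpot C++. `Arena`, `ThreadState` and `SignalScope` are made-up stand-ins for the resource area, the Thread object and a scope guard; none of them correspond to real HotSpot classes. The sketch assumes the secondary area is prepared outside signal context, since allocating it inside a handler would not be safe.

```java
import java.util.ArrayList;
import java.util.List;

final class SignalArenaSketch {
    /** Stand-in for a resource area: allocations are modeled as list entries. */
    static final class Arena {
        final List<String> allocations = new ArrayList<>();
    }

    /** Stand-in for the per-thread state that holds the current resource area pointer. */
    static final class ThreadState {
        final Arena primary = new Arena();
        Arena secondary;             // prepared on demand, before any signal arrives
        Arena current = primary;

        void prepareSecondary() {
            if (secondary == null) secondary = new Arena();
        }

        void allocate(String what) {
            current.allocations.add(what);
        }
    }

    /** Scope guard: swaps in the secondary area on entry and restores the saved one on exit. */
    static final class SignalScope implements AutoCloseable {
        private final ThreadState thread;
        private final Arena saved;

        SignalScope(ThreadState thread) {
            this.thread = thread;
            this.saved = thread.current;
            thread.current = thread.secondary;   // never touch the interrupted area
        }

        @Override
        public void close() {
            thread.secondary.allocations.clear(); // drop everything allocated in signal context
            thread.current = saved;
        }
    }

    public static void main(String[] args) {
        ThreadState t = new ThreadState();
        t.prepareSecondary();
        t.allocate("normal allocation");
        try (SignalScope scope = new SignalScope(t)) {
            t.allocate("stack walk scratch");     // lands in the secondary area
        }
        System.out.println(t.primary.allocations); // primary area is unchanged by the handler
    }
}
```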

-------------

PR: https://git.openjdk.java.net/jdk/pull/6606

From duke at openjdk.java.net  Tue Nov 30 07:27:27 2021
From: duke at openjdk.java.net (xpbob)
Date: Tue, 30 Nov 2021 07:27:27 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v4]
In-Reply-To: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
Message-ID: <Cbkf16ScpMY0IWH4MnZ8TVWD-1_LMM03k1as3DJohPc=.7c2cafd9-d28d-4649-a86a-c368ad40f2c1@github.com>

> Unsafe is used in many Java frameworks.
> When the framework has an unsafe memory leak, there is no way to know what code is causing it.
> Add an unsafe allocation event to JFR.
> It records the allocation size and the allocating stack.
> This event is off by default.

xpbob has updated the pull request incrementally with one additional commit since the last revision:

  remove whitespace

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6591/files
  - new: https://git.openjdk.java.net/jdk/pull/6591/files/f883847f..790cc817

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6591.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591

PR: https://git.openjdk.java.net/jdk/pull/6591

From duke at openjdk.java.net  Tue Nov 30 08:00:11 2021
From: duke at openjdk.java.net (xpbob)
Date: Tue, 30 Nov 2021 08:00:11 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v4]
In-Reply-To: <Cbkf16ScpMY0IWH4MnZ8TVWD-1_LMM03k1as3DJohPc=.7c2cafd9-d28d-4649-a86a-c368ad40f2c1@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
 <Cbkf16ScpMY0IWH4MnZ8TVWD-1_LMM03k1as3DJohPc=.7c2cafd9-d28d-4649-a86a-c368ad40f2c1@github.com>
Message-ID: <wQP_eDWNiIWESmSFG9ZVVritbDwA4ieHQNQx9u8SJZ8=.e55c338a-ff4b-48b2-92fc-0734dabd1b20@github.com>

On Tue, 30 Nov 2021 07:27:27 GMT, xpbob <duke at openjdk.java.net> wrote:

>> Unsafe is used in many Java frameworks.
>> When a framework leaks memory allocated through Unsafe, there is no way to know what code is causing it.
>> Add an Unsafe allocation event to JFR.
>> It records the allocation size and stack trace.
>> This event is off by default.
>
> xpbob has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove whitespace

Thanks.
I added 3 events:
|event|stack|addr|size|
|-|-|-|-|
|Allocation|true|alloc|true|
|Reallocate|true|before realloc,after realloc|true|
|Free|true|free addr|false|

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From aph at openjdk.java.net  Tue Nov 30 10:24:02 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 30 Nov 2021 10:24:02 GMT
Subject: RFR: 8277948: AArch64: Print the correct native stack if
 -XX:+PreserveFramePointer when crash
In-Reply-To: <rpTtCfty6TqLhdYjwtrSn5lzYHK7jQGbtFKaIPc5510=.9c965e3a-b9f1-4bdb-807c-ae6c5584e353@github.com>
References: <rpTtCfty6TqLhdYjwtrSn5lzYHK7jQGbtFKaIPc5510=.9c965e3a-b9f1-4bdb-807c-ae6c5584e353@github.com>
Message-ID: <neFNs--sjksnHtHcpHXVkJlIebxozXtDMBhIYQjaheA=.82653888-760a-4e55-894e-d6987f67d536@github.com>

On Mon, 29 Nov 2021 17:40:43 GMT, Denghui Dong <ddong at openjdk.org> wrote:

> Hi,
> 
> I found that the native stack frames in the hs log are sometimes inaccurate on AArch64. I am not sure whether this is a known issue or one worth fixing.
> 
> The following steps quickly reproduce the problem:
> 
> 1. Apply the diff (comment out the dtrace_object_alloc call in the interpreter and force a crash in SharedRuntime::dtrace_object_alloc):
> 
>   index 39e99bdd5ed..4fc768e94aa 100644
>   --- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
>   +++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
>   @@ -3558,6 +3558,7 @@ void TemplateTable::_new() {
>        __ store_klass_gap(r0, zr);  // zero klass gap for compressed oops
>        __ store_klass(r0, r4);      // store klass last
> 
>   +/**
>        {
>          SkipIfEqual skip(_masm, &DTraceAllocProbes, false);
>          // Trigger dtrace event for fastpath
>   @@ -3567,6 +3568,7 @@ void TemplateTable::_new() {
>          __ pop(atos); // restore the return value
> 
>        }
>   +*/
>        __ b(done);
>      }
> 
>   diff --git a/src/hotspot/cpu/x86/templateTable_x86.cpp b/src/hotspot/cpu/x86/templateTable_x86.cpp
>   index 19530b7c57c..15b0509da4c 100644
>   --- a/src/hotspot/cpu/x86/templateTable_x86.cpp
>   +++ b/src/hotspot/cpu/x86/templateTable_x86.cpp
>   @@ -4033,6 +4033,7 @@ void TemplateTable::_new() {
>        Register tmp_store_klass = LP64_ONLY(rscratch1) NOT_LP64(noreg);
>        __ store_klass(rax, rcx, tmp_store_klass);  // klass
> 
>   +/**
>        {
>          SkipIfEqual skip_if(_masm, &DTraceAllocProbes, 0);
>          // Trigger dtrace event for fastpath
>   @@ -4041,6 +4042,7 @@ void TemplateTable::_new() {
>               CAST_FROM_FN_PTR(address, static_cast<int (*)(oopDesc*)>(SharedRuntime::dtrace_object_alloc)), rax);
>          __ pop(atos);
>        }
>   +*/
> 
>        __ jmp(done);
>      }
>   diff --git a/src/hotspot/share/runtime/sharedRuntime.cpp b/src/hotspot/share/runtime/sharedRuntime.cpp
>   index a5de65ea5ab..60b4bd3bcc8 100644
>   --- a/src/hotspot/share/runtime/sharedRuntime.cpp
>   +++ b/src/hotspot/share/runtime/sharedRuntime.cpp
>   @@ -1002,6 +1002,7 @@ jlong SharedRuntime::get_java_tid(Thread* thread) {
>     * 6254741.  Once that is fixed we can remove the dummy return value.
>     */
>    int SharedRuntime::dtrace_object_alloc(oopDesc* o) {
>   +  *(int*)0 = 1;
>      return dtrace_object_alloc(Thread::current(), o, o->size());
>    }
> 
> 
> 2. `java -XX:+DTraceAllocProbes -Xcomp -XX:-PreserveFramePointer -version`
> 
> On x86_64, the native stack in the hs log is complete, but on AArch64 it is incorrect.
> 
> At first, I thought this might be caused by PreserveFramePointer. Later, I found that the native stack in the hs log is always correct on x86_64 and always wrong on AArch64, whether or not PreserveFramePointer is enabled.
> 
> After some investigation, I found that this problem is related to the layout of the stack.
> 
> On x86_64, whether the code is C/C++, interpreter, or JIT-compiled, the `callee` always puts the `return address` and `fp` of the `caller` at the bottom of the stack.
> Hence, the `callee` can always get the `caller sp` (aka `sender sp`) as `fp + 2`. If the `caller` is a compiled method, `caller sp` is the key to finding the `caller`'s `caller`, since `caller fp` may be invalid (see frame::sender_for_compiled_frame).
> 
> 
> push   %rbp
> mov    %rsp,%rbp
> 
>          _ _ _ _ _ _
>         |           |
>         |           |               |
>         |_ _ _ _ _ _|               |
>         |           |               |
>  caller |           | <- caller sp  |
>  _ _ _  |_ _ _ _ _ _|               | expand
>         |           |               |
>         | ret addr  |               | direction
>  callee |_ _ _ _ _ _|               |
>         |           |               V
>         | caller fp | <- fp
>         |_ _ _ _ _ _|
> 
> 
> 
> But on AArch64, C/C++ code doesn't put the `return address` and `fp` of the `caller` at the bottom of the stack.
> Hence, we cannot use `fp + 2` to calculate the proper `caller sp` (although it is still implemented this way).
> 
> When the `caller` is a C1/C2 method A and the `callee` is a C/C++ method B, we cannot get the `caller` of A, since we cannot get A's proper sp value.
> 
> 
> stp x29, x30, [sp, #-N]!
> mov x29, sp
> 
>          _ _ _ _ _ _
>         |           |
>         |           |               |
>         |_ _ _ _ _ _|               |
>         |           |               |
>  caller |           | <- caller sp  |
>  _ _ _  |_ _ _ _ _ _|     -         | expand
>                           |         |
>           . . . . .       |         | direction
>          _ _ _ _ _ _      |         |
>         |           |     | N       |
>         | ret addr  |     |         |
>  callee |_ _ _ _ _ _|     |         |
>         |           |     -         V
>         | caller fp | <- fp
>         |_ _ _ _ _ _|
> 
> 
> 
> I am not very familiar with AArch64 and currently have no idea how to fix this issue perfectly.
> 
> Based on my understanding of the implementation, we can get the correct stack trace when PreserveFramePointer is enabled.
> 
> Although PreserveFramePointer is disabled by default, I found that some real applications enable it in production.
> Therefore, in my opinion, this fix can help troubleshoot crashes in applications that enable PreserveFramePointer on AArch64.
> 
> This patch changes how l_sender_sp is calculated: it uses sender_sp() as the value of l_sender_sp when PreserveFramePointer is enabled.
> 
> Any input is appreciated.
> 
> Thanks,
> Denghui

Thank you for this. I'll have a look.

Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. But that won't work with Java, which has its own way to do it.

It would be nice to get -XX:+PreserveFramePointer working correctly.
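
As a rough illustration of the layout difference described in the quoted report, the sketch below models the stack as an array of words. The function names are hypothetical and this is not HotSpot's frame code; `frame_bytes` stands for the `N` in `stp x29, x30, [sp, #-N]!`, which an fp-only walker does not know.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// x86_64-style prologue (push %rbp; mov %rsp,%rbp): the saved fp and the
// return address sit directly below the caller's sp, so sender sp = fp + 2.
inline const std::uintptr_t* sender_sp_x86_style(const std::uintptr_t* fp) {
  return fp + 2;
}

// AArch64 C/C++-style prologue (stp x29, x30, [sp, #-N]!; mov x29, sp):
// the frame record is at the low end of an N-byte frame, so the caller's
// sp is N bytes above fp. N varies per function.
inline const std::uintptr_t* sender_sp_aarch64_c_style(const std::uintptr_t* fp,
                                                       std::size_t frame_bytes) {
  return fp + frame_bytes / sizeof(std::uintptr_t);
}
```

With a frame larger than two words, applying the `fp + 2` rule to an AArch64 C/C++ frame yields an address inside the callee's own frame rather than the caller's sp, which is consistent with the broken traces reported.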

-------------

PR: https://git.openjdk.java.net/jdk/pull/6597

From shade at openjdk.java.net  Tue Nov 30 10:47:55 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 30 Nov 2021 10:47:55 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5]
In-Reply-To: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
Message-ID: <FhMySaluprmLFDkDADtZf3TDRR6mGaZ7G3NpJDbGEeg=.84f3e4d7-46dd-433c-b679-e4869d2a7131@github.com>

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> 
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
> 
> Additional testing:
>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes
>  - [x] Linux x86_64 Zero works with `async-profiler`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
 - Fix a comment
 - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
 - More reviews
 - Review feedback
 - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
 - Initial work: runs async-profiler successfully

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5848/files
  - new: https://git.openjdk.java.net/jdk/pull/5848/files/bc4ba33b..373f15ae

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=03-04

  Stats: 22783 lines in 424 files changed: 13220 ins; 6227 del; 3336 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5848.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848

PR: https://git.openjdk.java.net/jdk/pull/5848

From duke at openjdk.java.net  Tue Nov 30 11:04:44 2021
From: duke at openjdk.java.net (xpbob)
Date: Tue, 30 Nov 2021 11:04:44 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5]
In-Reply-To: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
Message-ID: <J_6zt5y_lyY_JH9Qo4NrmiE6QY1Mr4uY2QBVBofJp8g=.bdb6f67d-c125-4d44-a5f4-b8374838578c@github.com>

> Unsafe is used in many Java frameworks.
> When a framework leaks memory allocated through Unsafe, there is no way to know what code is causing it.
> Add an Unsafe allocation event to JFR.
> It records the allocation size and stack trace.
> This event is off by default.

xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Merge branch 'openjdk:master' into JDK-8277930
 - remove whitespace
 - add free and Reallocate event
 - Merge branch 'openjdk:master' into JDK-8277930
 - 8277930: Add unsafe allocation event to jfr

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6591/files
  - new: https://git.openjdk.java.net/jdk/pull/6591/files/790cc817..b09c744d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=03-04

  Stats: 163 lines in 10 files changed: 58 ins; 83 del; 22 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6591.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591

PR: https://git.openjdk.java.net/jdk/pull/6591

From aph at openjdk.java.net  Tue Nov 30 11:29:11 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 30 Nov 2021 11:29:11 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5]
In-Reply-To: <FhMySaluprmLFDkDADtZf3TDRR6mGaZ7G3NpJDbGEeg=.84f3e4d7-46dd-433c-b679-e4869d2a7131@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <FhMySaluprmLFDkDADtZf3TDRR6mGaZ7G3NpJDbGEeg=.84f3e4d7-46dd-433c-b679-e4869d2a7131@github.com>
Message-ID: <aStt5MUYPXeTY097_aeHnLAmsMPAehrRG9761sRudMI=.a921b5ae-4e9d-4edb-8b9a-f09fb0640a39@github.com>

On Tue, 30 Nov 2021 10:47:55 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
>> 
>> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes
>>  - [x] Linux x86_64 Zero works with `async-profiler`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>  - Fix a comment
>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>  - More reviews
>  - Review feedback
>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>  - Initial work: runs async-profiler successfully

src/hotspot/cpu/zero/frame_zero.cpp line 139:

> 137:   assert(is_interpreted_frame(), "Not an interpreted frame");
> 138:   // These are reasonable sanity checks
> 139:   if (fp() == 0 || (intptr_t(fp()) & (wordSize-1)) != 0) {

Use `is_aligned()` here?
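
For reference, a small sketch of why the two forms are interchangeable. `kWordSize` and `is_aligned_to` are local stand-ins written for this example (assumptions, not the HotSpot definitions), and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWordSize = sizeof(void*);  // stand-in for wordSize

// Local alignment predicate; alignment must be a power of two.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Open-coded form, as in the quoted line: null or misaligned fp fails.
inline bool fails_sanity_check_open_coded(const void* fp) {
  return fp == nullptr ||
         (reinterpret_cast<std::uintptr_t>(fp) & (kWordSize - 1)) != 0;
}

// Same check expressed through the helper.
inline bool fails_sanity_check_with_helper(const void* fp) {
  return fp == nullptr || !is_aligned_to(fp, kWordSize);
}
```

Both functions return the same answer for every pointer, so the choice is about readability and consistency with the other platforms rather than behavior.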

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From egahlin at openjdk.java.net  Tue Nov 30 11:37:13 2021
From: egahlin at openjdk.java.net (Erik Gahlin)
Date: Tue, 30 Nov 2021 11:37:13 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5]
In-Reply-To: <J_6zt5y_lyY_JH9Qo4NrmiE6QY1Mr4uY2QBVBofJp8g=.bdb6f67d-c125-4d44-a5f4-b8374838578c@github.com>
References: <hTSvT0d63lXUYdW8Y4Gk_DjO6rX5RsBnTwKQg9-if64=.cac9f512-62d8-4e6d-a25c-33287635b82d@github.com>
 <J_6zt5y_lyY_JH9Qo4NrmiE6QY1Mr4uY2QBVBofJp8g=.bdb6f67d-c125-4d44-a5f4-b8374838578c@github.com>
Message-ID: <cmKbA7z7mQ1KqymN64g3QK2mWU8qSx79ZRb-4kHPTTk=.c6c3a65e-e72d-4064-9a6d-9419c7ea13a6@github.com>

On Tue, 30 Nov 2021 11:04:44 GMT, xpbob <duke at openjdk.java.net> wrote:

>> Unsafe is used in many Java frameworks.
>> When a framework leaks memory allocated through Unsafe, there is no way to know what code is causing it.
>> Add an Unsafe allocation event to JFR.
>> It records the allocation size and stack trace.
>> This event is off by default.
>
> xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into JDK-8277930
>  - remove whitespace
>  - add free and Reallocate event
>  - Merge branch 'openjdk:master' into JDK-8277930
>  - 8277930: Add unsafe allocation event to jfr

What about overhead (if JFR is disabled)? 

This looks like it could be a hot path for some applications.
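
One generic way to keep a disabled event cheap on a hot path is to test a flag before doing any event work. The sketch below is only an illustration of that pattern under assumed names (`g_alloc_event_enabled`, `record_allocation_event`, `traced_allocate`); it is not the JFR API and not what the patch does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical global switch and counter; not part of JFR.
static std::atomic<bool> g_alloc_event_enabled{false};
static std::atomic<std::size_t> g_events_recorded{0};

// Stand-in for the expensive part (stack capture, event commit).
static void record_allocation_event(void* addr, std::size_t size) {
  (void)addr;
  (void)size;
  g_events_recorded.fetch_add(1, std::memory_order_relaxed);
}

// Allocation wrapper: when the event is disabled, the only added cost
// is one relaxed load and a branch.
inline void* traced_allocate(std::size_t size) {
  void* p = std::malloc(size);
  if (g_alloc_event_enabled.load(std::memory_order_relaxed)) {
    record_allocation_event(p, size);
  }
  return p;
}
```

Whether the real overhead is acceptable would still need to be measured with a benchmark on the actual allocation path.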

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From smonteith at openjdk.java.net  Tue Nov 30 11:41:14 2021
From: smonteith at openjdk.java.net (Stuart Monteith)
Date: Tue, 30 Nov 2021 11:41:14 GMT
Subject: RFR: 8277893: Arraycopy stress tests
In-Reply-To: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
References: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
Message-ID: <BrVZJsi9vQZcBuufDOG6X_NMrUxVjWf5j4TD80Sy8d8=.26ec7ffe-9aca-491f-9fda-d4d3fd0b99f2@github.com>

On Mon, 29 Nov 2021 13:28:33 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable.
> 
> A brief tour of these tests:
> 
> - Tests all data types;
> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc;
> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases;
> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases;
> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types;
> 
> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain.
> 
> Test times:
> 
> 
> # x86_64 (TR 3970X)
>   real	9m11.037s
>   user	78m2.766s
>   sys	0m19.873s
> 
> # x86_32 (TR 3970X)
>   real	13m39.054s
>   user	147m38.308s
>   sys	0m10.924s
> 
> # x86_64 (i5-11500)
>   real    41m32.622s
>   user    447m19.986s
>   sys     0m21.026s
> 
> # AArch64 (ThunderX2)
>   real	5m34.210s
>   user	45m16.015s
>   sys	0m24.723s
> 
> 
> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`.
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy`
>  - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy`
>  - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy`

This looks great, thanks Aleksey. This covers all of the cases I'd reasonably expect to see covered.
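
For readers unfamiliar with the "small arrays exhaustively" idea, here is a minimal sketch of it in C++. It is not the test from the PR: `std::memmove` stands in for the copy under test, and all function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Reference copy that is trivially correct under overlap: go through a temporary.
static void reference_copy(std::vector<int>& a, std::size_t src,
                           std::size_t dst, std::size_t len) {
  std::vector<int> tmp(a.begin() + static_cast<std::ptrdiff_t>(src),
                       a.begin() + static_cast<std::ptrdiff_t>(src + len));
  for (std::size_t i = 0; i < len; i++) a[dst + i] = tmp[i];
}

// Copy under test; memmove handles overlapping (conjoint) ranges.
static void copy_under_test(std::vector<int>& a, std::size_t src,
                            std::size_t dst, std::size_t len) {
  if (len == 0) return;  // avoid passing a possibly-null data() pointer
  std::memmove(a.data() + dst, a.data() + src, len * sizeof(int));
}

// Tries every (size, src, dst, len) combination up to max_size, which
// covers conjoint and disjoint cases and copies touching either edge.
// Returns the number of mismatches against the reference.
static std::size_t exhaustive_small_array_check(std::size_t max_size) {
  std::size_t mismatches = 0;
  for (std::size_t size = 0; size <= max_size; size++) {
    for (std::size_t src = 0; src <= size; src++) {
      for (std::size_t dst = 0; dst <= size; dst++) {
        for (std::size_t len = 0; src + len <= size && dst + len <= size; len++) {
          std::vector<int> expected(size), actual(size);
          for (std::size_t i = 0; i < size; i++) {
            expected[i] = actual[i] = static_cast<int>(i + 1);
          }
          reference_copy(expected, src, dst, len);
          copy_under_test(actual, src, dst, len);
          if (expected != actual) mismatches++;
        }
      }
    }
  }
  return mismatches;
}
```

The search space grows quickly with `max_size`, which is why this style is only practical for small arrays and larger sizes are covered by fuzzing instead.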

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From duke at openjdk.java.net  Tue Nov 30 12:43:35 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Tue, 30 Nov 2021 12:43:35 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v3]
In-Reply-To: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
Message-ID: <J7DyEdC-gCWzqxA0aC9-VMPjD2-lgtSUPIvrvpQvWDA=.8bdf24f9-1bff-42e9-a13e-02a7d3461fac@github.com>

> Changed the visibility, added getters and refactored the following:
> 
> 1. Card Table Members
> 2. BOT members
> 3. ObjectStartArray block members

Vishal Chand has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:

 - Merge branch 'master' into JDK-8277372-refactor
 - Refactoring in hotspot/cpu dir
 - Initial patch

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6570/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=02
  Stats: 223 lines in 40 files changed: 46 ins; 11 del; 166 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6570.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570

PR: https://git.openjdk.java.net/jdk/pull/6570

From shade at openjdk.java.net  Tue Nov 30 13:01:13 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 30 Nov 2021 13:01:13 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5]
In-Reply-To: <aStt5MUYPXeTY097_aeHnLAmsMPAehrRG9761sRudMI=.a921b5ae-4e9d-4edb-8b9a-f09fb0640a39@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <FhMySaluprmLFDkDADtZf3TDRR6mGaZ7G3NpJDbGEeg=.84f3e4d7-46dd-433c-b679-e4869d2a7131@github.com>
 <aStt5MUYPXeTY097_aeHnLAmsMPAehrRG9761sRudMI=.a921b5ae-4e9d-4edb-8b9a-f09fb0640a39@github.com>
Message-ID: <Y1IxoXXmAOJ2Zt26SC_MLg50cwd14kgkdEPp8xDbKVA=.687e9ed6-6484-401f-8357-3ef5a2fca953@github.com>

On Tue, 30 Nov 2021 11:26:04 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>>  - Fix a comment
>>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>>  - More reviews
>>  - Review feedback
>>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>>  - Initial work: runs async-profiler successfully
>
> src/hotspot/cpu/zero/frame_zero.cpp line 139:
> 
>> 137:   assert(is_interpreted_frame(), "Not an interpreted frame");
>> 138:   // These are reasonable sanity checks
>> 139:   if (fp() == 0 || (intptr_t(fp()) & (wordSize-1)) != 0) {
> 
> Use `is_aligned()` here?

I could, but this matches what other platforms are doing in their `frame::is_interpreted_frame_valid()`. If there are no other fixes needed, okay if I keep this one in place? Otherwise, I would need to re-test the whole thing for a minor touchup, which is tedious.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From duke at openjdk.java.net  Tue Nov 30 13:39:41 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Tue, 30 Nov 2021 13:39:41 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v4]
In-Reply-To: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
Message-ID: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>

> Changed the visibility, added getters and refactored the following:
> 
> 1. Card Table Members
> 2. BOT members
> 3. ObjectStartArray block members

Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:

  Rename BOTConstants

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6570/files
  - new: https://git.openjdk.java.net/jdk/pull/6570/files/69ee4a32..48828873

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=02-03

  Stats: 93 lines in 9 files changed: 0 ins; 6 del; 87 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6570.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570

PR: https://git.openjdk.java.net/jdk/pull/6570

From eric.caspole at oracle.com  Tue Nov 30 16:00:00 2021
From: eric.caspole at oracle.com (eric.caspole at oracle.com)
Date: Tue, 30 Nov 2021 11:00:00 -0500
Subject: RFR: 8277358: Accelerate CRC32-C
In-Reply-To: <c2lZEv8jF88RwXMRJ_Xvhh-s6uKa-Gyz5L3oeLICqbo=.0c0b9ae6-1f3d-4fa6-bfe9-afaaaf9c683a@github.com>
References: <c2lZEv8jF88RwXMRJ_Xvhh-s6uKa-Gyz5L3oeLICqbo=.0c0b9ae6-1f3d-4fa6-bfe9-afaaaf9c683a@github.com>
Message-ID: <ce8e7870-8089-84ee-5f83-210bc2a6c97f@oracle.com>

Hi Scott, is there a JMH for this or would an existing zip JMH benefit 
from this change? If there is already one, great, otherwise could you 
add one?
Thanks,
Eric


On 11/29/21 9:52 AM, Scott Gibbons wrote:
> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32.  This change achieves ~4x throughput improvement.
>
> 5986.947899319073 MB/s => 24041.05203089616 MB/s
> 5840.02689336947 MB/s => 24898.781468710356 MB/s
>
> ********** Original ***********
>
>
> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
>   offset = 0
> msgSize = 512 bytes
>    iters = 20000000
> -------------------------------------------------------
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> CRC32C.update(byte[]) runtime = 1.710387358 seconds
> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> -------------------------------------------------------
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds
> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> -------------------------------------------------------
>
>
>
>
> *********** With my changes: *************
>
>
>
> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
>   offset = 0
> msgSize = 512 bytes
>    iters = 20000000
> -------------------------------------------------------
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> CRC32C.update(byte[]) runtime = 0.425938099 seconds
> CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> -------------------------------------------------------
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds
> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
> -------------------------------------------------------
>
> -------------
>
> Commit messages:
>   - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
>   - Merge branch 'master' into asgibbons-crc32c
>   - Asgibbons crc32c (#7)
>   - Merge branch 'openjdk:master' into master
>   - Revert .gitignore change
>   - Move register save to within conditional; add comments
>   - Bad merge.
>   - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
>   - ZZMerge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
>   - Use existing CRC32 code with different table for CRC32-C
>   - ... and 203 more: https://git.openjdk.java.net/jdk/compare/e9b36a83...10aeaec6
>
> Changes: https://git.openjdk.java.net/jdk/pull/6595/files
>   Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=00
>    Issue: https://bugs.openjdk.java.net/browse/JDK-8277358
>    Stats: 62 lines in 4 files changed: 40 ins; 1 del; 21 mod
>    Patch: https://git.openjdk.java.net/jdk/pull/6595.diff
>    Fetch: git fetch https://git.openjdk.java.net/jdk pull/6595/head:pull/6595
>
> PR: https://git.openjdk.java.net/jdk/pull/6595


From tschatzl at openjdk.java.net  Tue Nov 30 16:16:10 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Tue, 30 Nov 2021 16:16:10 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v4]
In-Reply-To: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
References: <klzlVqf82kg2njVyRdKpzmRWN6Jl5E2n-jknOw7sEEw=.a178e411-bdd8-43a8-8e62-c7f9a0921241@github.com>
 <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
Message-ID: <SVqkYy4CD3bbDd7ACQBFRxd9_Qs7QGkxuIE1paiNXiU=.2629a9ea-84fb-4a2e-ad21-c440cedaa5f5@github.com>

On Tue, 30 Nov 2021 13:39:41 GMT, Vishal Chand <duke at openjdk.java.net> wrote:

>> Changed the visibility, added getters and refactored the following:
>> 
>> 1. Card Table Members
>> 2. BOT members
>> 3. ObjectStartArray block members
>
> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename BOTConstants

Getting good :) Some minor comments.

src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp line 290:

> 288:     assert(_bot->offset_array(j) > 0 &&
> 289:            _bot->offset_array(j) <=
> 290:              (u_char) (BOTConstants::bot_card_size_words()+BOTConstants::N_powers-1),

Suggestion:

             (u_char) (BOTConstants::bot_card_size_words() + BOTConstants::N_powers - 1),

Pre-existing: operator has no spaces around it

src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp line 295:

> 293:            (uint) _bot->offset_array(j),
> 294:            (uint) _bot->offset_array(j),
> 295:            (uint) (BOTConstants::bot_card_size_words()+BOTConstants::N_powers-1));

Suggestion:

           (uint) (BOTConstants::bot_card_size_words() + BOTConstants::N_powers - 1));

Pre-existing: spaces around operator

src/hotspot/share/gc/parallel/objectStartArray.hpp line 52:

> 50:   static uint _block_size;
> 51:   static uint _block_size_in_words;
> 52: 

Almost the same naming issue as in the `BlockOffsetTable/SharedArray`; I would prefer if these members (and getters) here were named similarly to the ones there.
It is true that `ObjectStartArray` and `BlockOffsetTable` are basically the same thing, but any eventual merge is another issue.

src/hotspot/share/gc/shared/cardTable.cpp line 416:

> 414:                dirty_cards++, next_entry++);
> 415:           MemRegion cur_cards(addr_for(cur_entry),
> 416:                               dirty_cards*_card_size_in_words);

Suggestion:

                              dirty_cards * _card_size_in_words);

Pre-existing: spaces around operator

src/hotspot/share/gc/shared/cardTable.cpp line 442:

> 440:                dirty_cards++, next_entry++);
> 441:           MemRegion cur_cards(addr_for(cur_entry),
> 442:                               dirty_cards*_card_size_in_words);

Suggestion:

                              dirty_cards * _card_size_in_words);

Pre-existing: spaces around operator

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6570

From sviswanathan at openjdk.java.net  Tue Nov 30 16:47:04 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 30 Nov 2021 16:47:04 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
In-Reply-To: <tmkXP_CmlQFh1X78NO1exq26ajbPoLo-o8k1fRncWNY=.ae93dfd2-b44c-43f2-a437-c30e4c9ebc1c@github.com>
References: <UXnYNz0bqKdWBzAjKTAw7xsgj8ilC3mN7c4s8Xsr6zw=.24b2646d-7da8-4da5-85f8-defa502d90aa@github.com>
 <tmkXP_CmlQFh1X78NO1exq26ajbPoLo-o8k1fRncWNY=.ae93dfd2-b44c-43f2-a437-c30e4c9ebc1c@github.com>
Message-ID: <CubAEg0pvPl5MFmY4Tjd91wwzkA7K4yzjT9iC8yKsng=.52bfefe8-b236-4ad8-b3ac-a8811605bebb@github.com>

On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Currently 32-byte instructions are used for small array copy and clear. 
>> This can be optimized by using 64-byte instructions.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix whitespace

@neliasso Could you please also review this small patch? I would like to get it integrated before the JDK 18 feature freeze.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From kvn at openjdk.java.net  Tue Nov 30 19:29:07 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 30 Nov 2021 19:29:07 GMT
Subject: RFR: 8277893: Arraycopy stress tests
In-Reply-To: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
References: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
Message-ID: <cznx3SLXNb-0vjpmzGZMRkLsN4slomsZtGrNN3fzDSs=.f6f0239f-c3c4-4ade-be42-d0c2c21abdcc@github.com>

On Mon, 29 Nov 2021 13:28:33 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable.
> 
> A brief tour of these tests:
> 
> - Tests all data types;
> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc;
> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases;
> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases;
> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types;
> 
> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain.
> 
> Test times:
> 
> 
> # x86_64 (TR 3970X)
>   real	9m11.037s
>   user	78m2.766s
>   sys	0m19.873s
> 
> # x86_32 (TR 3970X)
>   real	13m39.054s
>   user	147m38.308s
>   sys	0m10.924s
> 
> # x86_64 (i5-11500)
>   real    41m32.622s
>   user    447m19.986s
>   sys     0m21.026s
> 
> # AArch64 (ThunderX2)
>   real	5m34.210s
>   user	45m16.015s
>   sys	0m24.723s
> 
> 
> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`.
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy`
>  - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy`
>  - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy`

I assume that `test/micro/org/openjdk/bench/java/lang` micros cover all these cases. Otherwise you may need to add some.

test/hotspot/jtreg/TEST.groups line 183:

> 181: 
> 182: tier3_compiler = \
> 183:   compiler/arraycopy/stress

Can you introduce a separate group for this? For example, `hotspot_arraycopy_stress`, and use it here.
I am fine with the introduced `tier2|3_compiler` groups, but it will help us at Oracle to have a separate group for `arraycopy` so we can schedule its testing on proper machines.

test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32:

> 30:      * Max array size to test.
> 31:      */
> 32:     static final int MAX_SIZE = 1024*1024 + 1;

Do we really need such big arrays for regression testing? It may make sense for JMH, but not for these tests, I think.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From shade at openjdk.java.net  Tue Nov 30 19:38:22 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 30 Nov 2021 19:38:22 GMT
Subject: RFR: 8278016: Add compiler tests to tier{2,3}
Message-ID: <Wo5EiCbWIdICf3zvz2_aTVr6U_zvEdltBiGKuoTT5oQ=.8439e332-2659-4894-bc75-b152e10bda07@github.com>

I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized that a whole bunch of compiler tests are running there.

Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. This means that many compiler tests are not run regularly by many contributors. But these tests are rather fast themselves and cover important compiler features.

We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff go to tier3.

Sample times for new subgroups (think about this as "How much time they add to existing tiers"):


==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:test/hotspot/jtreg:tier2_compiler             243   243     0     0   
==============================

real	2m16.518s
user	35m40.839s
sys	1m35.334s

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:test/hotspot/jtreg:tier3_compiler             132   132     0     0   
==============================

real	4m31.935s
user	71m54.617s
sys	2m13.073s

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.java.net/jdk/pull/6622/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8278016
  Stats: 43 lines in 1 file changed: 43 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6622.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6622/head:pull/6622

PR: https://git.openjdk.java.net/jdk/pull/6622

From shade at openjdk.java.net  Tue Nov 30 20:29:05 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 30 Nov 2021 20:29:05 GMT
Subject: RFR: 8277893: Arraycopy stress tests
In-Reply-To: <cznx3SLXNb-0vjpmzGZMRkLsN4slomsZtGrNN3fzDSs=.f6f0239f-c3c4-4ade-be42-d0c2c21abdcc@github.com>
References: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
 <cznx3SLXNb-0vjpmzGZMRkLsN4slomsZtGrNN3fzDSs=.f6f0239f-c3c4-4ade-be42-d0c2c21abdcc@github.com>
Message-ID: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com>

On Tue, 30 Nov 2021 19:25:41 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> I assume that `test/micro/org/openjdk/bench/java/lang` micros cover all these cases. Otherwise you may need to add some.

Yes. Performance tests will come separately. This PR covers purely functional tests that verify arraycopies are not foobar-ing array contents, not hitting any asserts, or otherwise crash VMs. Performance tests would run on a limited set of inputs and in `release` bits, so they are bad for verification like this :)
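
To illustrate the kind of functional check described here, the following is a minimal sketch, not the code from the PR. It copies every `(srcPos, dstPos, length)` combination within one small `int[]` and compares the result of `System.arraycopy` against a snapshot-based reference. The class and method names (`ArrayCopyCheck`, `expectedAfterCopy`, `checkConjoint`) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of an exhaustive small-array arraycopy check.
class ArrayCopyCheck {

    // Reference result that stays correct for overlapping (conjoint) ranges:
    // the source range is snapshotted before anything is written.
    static int[] expectedAfterCopy(int[] src, int srcPos, int dstPos, int len) {
        int[] expected = src.clone();
        int[] snapshot = Arrays.copyOfRange(src, srcPos, srcPos + len);
        for (int i = 0; i < len; i++) {
            expected[dstPos + i] = snapshot[i];
        }
        return expected;
    }

    // Checks every valid (srcPos, dstPos, len) within a single array of the
    // given size and returns the number of combinations that were checked.
    static int checkConjoint(int size) {
        int checked = 0;
        for (int srcPos = 0; srcPos <= size; srcPos++) {
            for (int dstPos = 0; dstPos <= size; dstPos++) {
                int maxLen = size - Math.max(srcPos, dstPos);
                for (int len = 0; len <= maxLen; len++) {
                    int[] arr = new int[size];
                    for (int i = 0; i < size; i++) {
                        arr[i] = i + 1; // distinct values expose misplaced elements
                    }
                    int[] expected = expectedAfterCopy(arr, srcPos, dstPos, len);
                    System.arraycopy(arr, srcPos, arr, dstPos, len);
                    if (!Arrays.equals(expected, arr)) {
                        throw new AssertionError("Mismatch at srcPos=" + srcPos
                                + " dstPos=" + dstPos + " len=" + len);
                    }
                    checked++;
                }
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("Checked " + checkConjoint(16) + " conjoint copies");
    }
}
```

A disjoint variant would use two separate arrays; the point of the sketch is only that correctness is compared element by element rather than timed.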

> test/hotspot/jtreg/TEST.groups line 183:
> 
>> 181: 
>> 182: tier3_compiler = \
>> 183:   compiler/arraycopy/stress
> 
> Can you introduce a separate group for this? For example, `hotspot_arraycopy_stress`, and use it here.
> I am fine with the introduced `tier2|3_compiler` groups, but it will help us at Oracle to have a separate group for `arraycopy` so we can schedule its testing on proper machines.

Yes, we can. Actually, working on #6622, I realized these test groups would be introduced anyway. So these new arraycopy tests should probably go to `hotspot_slow_compiler` group, along with other `stress` tests. This would hook arraycopy tests into `hotspot:tier3` automatically if #6622 lands. Tell me if you still want a completely separate test group, or `hotspot_slow_compiler` is enough for current Oracle testing infra.

> test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32:
> 
>> 30:      * Max array size to test.
>> 31:      */
>> 32:     static final int MAX_SIZE = 1024*1024 + 1;
> 
> Do we really need such big arrays for regression testing? It may make sense for JMH, but not for these tests, I think.

My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree.
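
The size arithmetic above can be sketched as follows. This is only an illustration: `MAX_SIZE` is the value quoted from the review, while the class name, the helper methods, and the list of page sizes are assumptions, and object headers and alignment are ignored.

```java
// Hypothetical sketch: payload footprint of a MAX_SIZE-element array per
// element width, and how many full pages of a given size it spans.
class ArrayFootprint {
    static final int MAX_SIZE = 1024 * 1024 + 1; // value quoted from the review

    // Payload bytes only; the array header and alignment padding are ignored.
    static long payloadBytes(int elementBytes) {
        return (long) MAX_SIZE * elementBytes;
    }

    // Number of whole pages of the given size that fit into the payload.
    static long fullPages(long bytes, long pageBytes) {
        return bytes / pageBytes;
    }

    public static void main(String[] args) {
        int[] elementWidths = {1, 2, 4, 8};           // byte, short/char, int/float, long/double
        long[] pageSizes = {4L << 10, 64L << 10, 2L << 20, 4L << 20}; // assumed page sizes
        for (int width : elementWidths) {
            long bytes = payloadBytes(width);
            StringBuilder line = new StringBuilder(width + "-byte elements: " + bytes + " bytes;");
            for (long page : pageSizes) {
                line.append(" ").append(fullPages(bytes, page))
                    .append(" x ").append(page / 1024).append("K");
            }
            System.out.println(line);
        }
    }
}
```

For 8-byte elements this gives 8,388,616 payload bytes, which spans two full 4M pages, in line with the "2*4M" estimate.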

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From kvn at openjdk.java.net  Tue Nov 30 20:39:10 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 30 Nov 2021 20:39:10 GMT
Subject: RFR: 8277893: Arraycopy stress tests
In-Reply-To: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com>
References: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
 <cznx3SLXNb-0vjpmzGZMRkLsN4slomsZtGrNN3fzDSs=.f6f0239f-c3c4-4ade-be42-d0c2c21abdcc@github.com>
 <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com>
Message-ID: <S0Kepg2S7f2olSK1zEbTWJGTxWMj2iaD7OMBfybY2AM=.715150f4-8107-45dc-8742-456e6031ec45@github.com>

On Tue, 30 Nov 2021 20:21:19 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32:
>> 
>>> 30:      * Max array size to test.
>>> 31:      */
>>> 32:     static final int MAX_SIZE = 1024*1024 + 1;
>> 
>> Do we really need such big arrays for regression testing? It may make sense for JMH, but not for these tests, I think.
>
> My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree.

Okay. I was concerned because of the times you show. I am fine with running tests for up to 10-15 mins, but not this:

# x86_64 (i5-11500)
  real    41m32.622s
  user    447m19.986s
  sys     0m21.026s


Do you know why it takes so much time on it?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From shade at openjdk.java.net  Tue Nov 30 20:44:43 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 30 Nov 2021 20:44:43 GMT
Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2]
In-Reply-To: <Wo5EiCbWIdICf3zvz2_aTVr6U_zvEdltBiGKuoTT5oQ=.8439e332-2659-4894-bc75-b152e10bda07@github.com>
References: <Wo5EiCbWIdICf3zvz2_aTVr6U_zvEdltBiGKuoTT5oQ=.8439e332-2659-4894-bc75-b152e10bda07@github.com>
Message-ID: <BP7aMsZkEwE4uCDCDyZUwVMtECktedga1JdG2GeyVbI=.90ca3872-100c-4de5-8d06-ee8c06a9c2f4@github.com>

> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized that a whole bunch of compiler tests are running there.
> 
> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. This means that many compiler tests are not run regularly by many contributors. But these tests are rather fast themselves and cover important compiler features.
> 
> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff go to tier3.
> 
> Sample times for new subgroups (think about this as "How much time they add to existing tiers"):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR   
>    jtreg:test/hotspot/jtreg:tier2_compiler             243   243     0     0   
> ==============================
> 
> real	2m16.518s
> user	35m40.839s
> sys	1m35.334s
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR   
>    jtreg:test/hotspot/jtreg:tier3_compiler             132   132     0     0   
> ==============================
> 
> real	4m31.935s
> user	71m54.617s
> sys	2m13.073s

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Filter out tier1/2 groups too

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6622/files
  - new: https://git.openjdk.java.net/jdk/pull/6622/files/d027cbe0..3a15f32b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=00-01

  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6622.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6622/head:pull/6622

PR: https://git.openjdk.java.net/jdk/pull/6622

From kvn at openjdk.java.net  Tue Nov 30 20:49:03 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 30 Nov 2021 20:49:03 GMT
Subject: RFR: 8277893: Arraycopy stress tests
In-Reply-To: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com>
References: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
 <cznx3SLXNb-0vjpmzGZMRkLsN4slomsZtGrNN3fzDSs=.f6f0239f-c3c4-4ade-be42-d0c2c21abdcc@github.com>
 <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com>
Message-ID: <uV4jgjSIa8creXks2ezzEhl7vxRhFGiqGM0iJ4seWLw=.19949c8f-d490-4327-8879-1cb3dee32b30@github.com>

On Tue, 30 Nov 2021 20:23:04 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> test/hotspot/jtreg/TEST.groups line 183:
>> 
>>> 181: 
>>> 182: tier3_compiler = \
>>> 183:   compiler/arraycopy/stress
>> 
>> Can you introduce a separate group for this? For example, `hotspot_arraycopy_stress`, and use it here.
>> I am fine with the introduced `tier2|3_compiler` groups, but it will help us at Oracle to have a separate group for `arraycopy` so we can schedule its testing on proper machines.
>
> Yes, we can. Actually, working on #6622, I realized these test groups would be introduced anyway. So these new arraycopy tests should probably go to `hotspot_slow_compiler` group, along with other `stress` tests. This would hook arraycopy tests into `hotspot:tier3` automatically if #6622 lands. Tell me if you still want a completely separate test group, or `hotspot_slow_compiler` is enough for current Oracle testing infra.

Please create a separate test group and add it to `hotspot_slow_compiler`. We would not need to change infra settings if more testing is added to this new group later.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From kvn at openjdk.java.net  Tue Nov 30 21:01:08 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 30 Nov 2021 21:01:08 GMT
Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2]
In-Reply-To: <BP7aMsZkEwE4uCDCDyZUwVMtECktedga1JdG2GeyVbI=.90ca3872-100c-4de5-8d06-ee8c06a9c2f4@github.com>
References: <Wo5EiCbWIdICf3zvz2_aTVr6U_zvEdltBiGKuoTT5oQ=.8439e332-2659-4894-bc75-b152e10bda07@github.com>
 <BP7aMsZkEwE4uCDCDyZUwVMtECktedga1JdG2GeyVbI=.90ca3872-100c-4de5-8d06-ee8c06a9c2f4@github.com>
Message-ID: <vsn3Lzb4-fp83nue6g4A7z0ywCZ9L2j4kVJscI-OfRY=.020e4453-ff85-4dc4-9a14-2bd50ef57acd@github.com>

On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized that a whole bunch of compiler tests are running there.
>> 
>> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. This means that many compiler tests are not run regularly by many contributors. But these tests are rather fast themselves and cover important compiler features.
>> 
>> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff go to tier3.
>> 
>> Sample times for new subgroups (think about this as "How much time they add to existing tiers"):
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR   
>>    jtreg:test/hotspot/jtreg:tier2_compiler             243   243     0     0   
>> ==============================
>> 
>> real	2m16.518s
>> user	35m40.839s
>> sys	1m35.334s
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR   
>>    jtreg:test/hotspot/jtreg:tier3_compiler             132   132     0     0   
>> ==============================
>> 
>> real	4m31.935s
>> user	71m54.617s
>> sys	2m13.073s
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Filter out tier1/2 groups too

Looks good to me.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6622

From shade at openjdk.java.net  Tue Nov 30 21:25:07 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 30 Nov 2021 21:25:07 GMT
Subject: RFR: 8277893: Arraycopy stress tests
In-Reply-To: <S0Kepg2S7f2olSK1zEbTWJGTxWMj2iaD7OMBfybY2AM=.715150f4-8107-45dc-8742-456e6031ec45@github.com>
References: <Jf29S7pfw1xlam-1-XYQBl-GzeW0OSApxQYbpvnRkeA=.a0ad4382-5b42-4fec-a57f-542f532119fc@github.com>
 <cznx3SLXNb-0vjpmzGZMRkLsN4slomsZtGrNN3fzDSs=.f6f0239f-c3c4-4ade-be42-d0c2c21abdcc@github.com>
 <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com>
 <S0Kepg2S7f2olSK1zEbTWJGTxWMj2iaD7OMBfybY2AM=.715150f4-8107-45dc-8742-456e6031ec45@github.com>
Message-ID: <9L5CHY8n-6csbW9jfsnXt4pSqnabXH5R7dt2pZFDmdA=.e53d343e-f7fd-46b1-a8af-02dba3fad3ec@github.com>

On Tue, 30 Nov 2021 20:34:46 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree.
>
> Okay. I was concerned because of the times you show. I am fine with running tests for up to 10-15 mins, but not this:
> 
> # x86_64 (i5-11500)
>   real    41m32.622s
>   user    447m19.986s
>   sys     0m21.026s
> 
> 
> Do you know why it takes so much time on it?

That small machine has very slow memory compared to the other ones. The parallelism in stress tests (9 types, 2 forked VMs each) brings that machine to its knees. There is a blurb about that effect here: https://github.com/openjdk/jdk/pull/6594/files#diff-f72fee20a49daaf4e05002372e93f426407ecd429a227393e2ec79e821042c90R40-R47 -- I don't think it would matter much if we trim `MAX_SIZE`, but I'll try tomorrow.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From sspitsyn at openjdk.java.net  Tue Nov 30 23:23:24 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Tue, 30 Nov 2021 23:23:24 GMT
Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark
In-Reply-To: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
References: <MIHWCxuUoKja8jIrBV0b-vMuQWd1lmVIT56s85YHqzM=.3ccfaea9-264e-4769-bc6b-9e44f9b516a0@github.com>
Message-ID: <MrstFlh0JINnDgQheSYkyqo3HFKuGDaOppaE3JdyAig=.6edc8ad7-0666-428b-a7f2-694dd183b54b@github.com>

On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> This change seems to keep the test case in the bug from crashing in the ResourceMark destructor.  We have a ResourceMark during stack walking in AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk.
> Testing tier1-6 in progress.

Hi Coleen,
I'm okay with this workaround.
Thanks,
Serguei

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6606

From sspitsyn at openjdk.java.net  Tue Nov 30 23:26:32 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Tue, 30 Nov 2021 23:26:32 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5]
In-Reply-To: <FhMySaluprmLFDkDADtZf3TDRR6mGaZ7G3NpJDbGEeg=.84f3e4d7-46dd-433c-b679-e4869d2a7131@github.com>
References: <JjNvKdaMic8QCxlSJG-pmw0Ru9eLqwnf3KQ8xGVzETY=.4d44f1e7-503a-4f08-8d62-3bf7eae74a49@github.com>
 <FhMySaluprmLFDkDADtZf3TDRR6mGaZ7G3NpJDbGEeg=.84f3e4d7-46dd-433c-b679-e4869d2a7131@github.com>
Message-ID: <9Qsyq7smTvNNP3a7WwtYaMDiGesLCvPWF1FTxArevT0=.aa62eb0d-0fbd-4f5b-a15c-5553cf88ecff@github.com>

On Tue, 30 Nov 2021 10:47:55 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
>> 
>> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass
>>  - [x] Linux x86_64 Zero works with `async-profiler`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>  - Fix a comment
>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>  - More reviews
>  - Review feedback
>  - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace
>  - Initial work: runs async-profiler successfully

Marked as reviewed by sspitsyn (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From Divino.Cesar at microsoft.com  Thu Nov 11 19:24:30 2021
From: Divino.Cesar at microsoft.com (Cesar Soares Lucas)
Date: Thu, 11 Nov 2021 19:24:30 -0000
Subject: [External] : Re: RFC - Improving C2 Escape Analysis
In-Reply-To: <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com>
References: <BY5PR21MB1473143A554A8B9DE9577C159AAA9@BY5PR21MB1473.namprd21.prod.outlook.com>
 <20210930140335.648146897@eggemoggin.niobe.net>
 <ADAF2E9E-5D48-4CF0-9EFB-C68F47E31874@oracle.com>
 <BY5PR21MB147300BCF3E5008B63643D7D9AAE9@BY5PR21MB1473.namprd21.prod.outlook.com>
 <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com>
 <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com>
 <DM6PR21MB1484F06897E51C0399A94F109ABF9@DM6PR21MB1484.namprd21.prod.outlook.com>
 <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com>
 <BY5PR21MB14738F7C7AED2F625389109B9A859@BY5PR21MB1473.namprd21.prod.outlook.com>
 <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com>
 <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com>
 <BY5PR21MB1473B300A3D625054E22C8139A879@BY5PR21MB1473.namprd21.prod.outlook.com>
 <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com>
Message-ID: <BY5PR21MB14734EB2CE1D14079A6252909A949@BY5PR21MB1473.namprd21.prod.outlook.com>

Hi Vladimir,

Thank you for the feedback and sorry for the delay in getting back to you!

> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some
> time investigating possible solutions for it but "no cigar". Maybe we do
> indeed need control flow analysis to resolve this.

Can you elaborate a bit on the approaches you tried and why you didn't like
them? By allocation merges, do you mean nested objects like "obj1.obj2.x"?
Did you try solving both control-flow merge issues and allocation merges?

> There are 2 test files with small methods for different EA cases I used to
> see how EA works:

These examples have been very helpful, thank you again!

> Yes, I think it would be good to have a prototype if you are comfortable to
> work with C2 code already.  I proposed small RFEs just for warmup ;)

I talked with my colleagues and we decided to start the work by trying to fix
the control/data-flow merge issues - *perhaps not for all cases, but at least
for some of them*. Then, based on our experience with this and some
benchmarking we'll decide if we really need flow-sensitive analysis and how to
best approach that.
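
As a hedged illustration of what a control-flow merge of allocations can look like, consider the sketch below. It is not taken from the linked examples or test files; the class, the `Point` type, and the method names are hypothetical, and whether a given C2 build scalar-replaces either case is an assumption to be checked by measurement, not a statement about the current implementation.

```java
// Hypothetical sketch: one allocation that never escapes versus a reference
// that may point to one of two allocations depending on control flow.
final class AllocationMergeExample {

    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Single non-escaping allocation: the usual candidate for scalar replacement.
    static int noMerge(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // After the branch, p refers to either allocation, so the field loads see a
    // merge of two objects; this is the kind of shape the thread calls hard.
    static int merge(boolean cond, int a, int b) {
        Point p;
        if (cond) {
            p = new Point(a, b);
        } else {
            p = new Point(b, a);
        }
        return p.x - p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Enough iterations that a JIT compiler would normally compile both methods.
        for (int i = 0; i < 1_000_000; i++) {
            sum += noMerge(i, 1) + merge((i & 1) == 0, i, 1);
        }
        System.out.println(sum);
    }
}
```

Neither `Point` escapes its method in either case; the only difference is the merge point after the branch.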

We'll definitely take a look at the RFEs as we move along! Implementing the
Stadler algorithm was just something that crossed my mind initially; it's very
likely the last approach we'd try ... I don't want to bite off more than I can
chew.


Regards,
Cesar
________________________________
From: Vladimir Kozlov <vladimir.kozlov at oracle.com>
Sent: October 29, 2021 5:27 PM
To: Cesar Soares Lucas <Divino.Cesar at microsoft.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; Ron Pressler <ron.pressler at oracle.com>
Cc: John Rose <john.r.rose at oracle.com>; Mark Reinhold <mark.reinhold at oracle.com>; hotspot-dev at openjdk.java.net <hotspot-dev at openjdk.java.net>; Brian Stafford <Brian.Stafford at microsoft.com>; Martijn Verburg <Martijn.Verburg at microsoft.com>; Hohensee, Paul <hohensee at amazon.com>
Subject: Re: [External] : Re: RFC - Improving C2 Escape Analysis

On 10/29/21 4:50 PM, Cesar Soares Lucas wrote:
> Hi Vladimir and Tobias,
>
>  >> Sure, here are four examples of EA and/or scalarization failing due to
>  >> complicated control/data flow:
>  >> https://cr.openjdk.java.net/~thartmann/EA_examples
>
>  >> There are 2 test files with small methods for different EA cases I used to
>  >> see how EA works:
>  >>
>  >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>  >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>
> Thank you for the examples, Tobias/Vladimir. This has been very helpful.
>
>  >> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent
>  >> some time investigating possible solutions for it but "no cigar". Maybe we
>  >> do indeed need control flow analysis to resolve this.
>
> By "need control flow analysis" you mean the flow-sensitive EA algorithm? My

Yes.

To clarify: I investigated solutions in the current flow-insensitive EA.

> first idea to handle these control/data-merge issues was to implement in C2 the
> same algorithm used by GRAAL - i.e., the algorithm described in Stadler et al.
> PEA paper. Do you think this is reasonable?

Yes, I think it would be good to have a prototype if you are comfortable to work with C2 code already.
I proposed small RFEs just for warmup ;)

>
>  >> I am currently looking at iterative EA. Do more EA rounds if we can
>  >> eliminate more connected allocations. It was proposed by Vladimir Ivanov and
>  >> I have a working prototype.
>
> Cool! I'm curious, when do you plan to submit a Pull Request for this?

I am investigating regressions in some benchmarks.

>
>  >> There is also suggestion from Amazon Java group about "C2 Partial Escape
>  >> Analysis" which needs more discussion:
>  >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>
> I'd love to hear from them about their experience with these issues and if they
> have any plans to work on this moving forward! I'll ping them on the thread
> that you linked above.

Yes, I would like them to participate too (CCing Paul). They sent the proposal almost 6 months ago and we have not heard
any additional information since Vladimir Ivanov replied.

Regards,
Vladimir K

>
>
> Regards,
> Cesar
> ------------------------------------------------------------------------------------------------------------------------
> *From:* Vladimir Kozlov <vladimir.kozlov at oracle.com>
> *Sent:* October 27, 2021 10:26 AM
> *To:* Tobias Hartmann <tobias.hartmann at oracle.com>; Cesar Soares Lucas <Divino.Cesar at microsoft.com>; Ron Pressler
> <ron.pressler at oracle.com>
> *Cc:* John Rose <john.r.rose at oracle.com>; Mark Reinhold <mark.reinhold at oracle.com>; hotspot-dev at openjdk.java.net
> <hotspot-dev at openjdk.java.net>; Brian Stafford <Brian.Stafford at microsoft.com>; Martijn Verburg
> <Martijn.Verburg at microsoft.com>
> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
> First. Thank you, Cesar, for collecting data about C2 EA shortcomings.
>
> I agree with the cases Tobias pointed out as possible starting points to improve EA.
>
> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some time investigating possible solutions for
> it but "no cigar". Maybe we do indeed need control flow analysis to resolve this.
>
> I looked through JBS and found a few issues that do not require writing a new EA:
>
> https://bugs.openjdk.java.net/browse/JDK-7149991
> https://bugs.openjdk.java.net/browse/JDK-8059378
> https://bugs.openjdk.java.net/browse/JDK-8073358
> https://bugs.openjdk.java.net/browse/JDK-8155769
>
> Tobias also has a fix prototype for the following bug, which has not been fixed yet:
> https://bugs.openjdk.java.net/browse/JDK-8236493
>
> There are 2 test files with small methods for different EA cases that I used to see how EA works:
>
> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>
> You can start by looking at the RFEs/bugs above, or run these tests and see why scalarization fails in some cases,
> apart from the known merge issue:
>
> https://bugs.openjdk.java.net/browse/JDK-6853701
>
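
For readers unfamiliar with the merge issue mentioned above, here is a minimal sketch of the kind of code shape it is usually taken to mean: two non-escaping allocations whose references merge at a control-flow join. The class and method names (`MergeExample`, `Point`, `sum`) are made up for illustration and do not come from the bug report or the tests listed above. Whether C2 scalarizes this depends on the JDK build in use.

```java
public class MergeExample {
    // Small value-like class; instances never leave sum().
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    public static int sum(boolean cond, int x) {
        Point p;
        if (cond) {
            p = new Point(x, 1);
        } else {
            p = new Point(x, 2);
        }
        // 'p' now refers to one of two distinct allocations (a merge of
        // both references). Neither object escapes, yet the field loads
        // below go through the merged reference.
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long total = 0;
        // Enough iterations to make the method hot for the JIT (assumed count).
        for (int i = 0; i < 200_000; i++) {
            total += sum((i & 1) == 0, i);
        }
        System.out.println(total);
    }
}
```

Running this with and without `-XX:-DoEscapeAnalysis` and comparing allocation rates is one way to check whether the allocations were eliminated on a given build.
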
> I am currently looking at iterative EA: doing more EA rounds when that lets us eliminate more connected allocations.
> It was proposed by Vladimir Ivanov and I have a working prototype.
>
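
As a rough illustration of what "connected allocations" could look like, the sketch below has one local object stored into a field of another local object. This is only an assumed example shape. It is not taken from the prototype, and it makes no claim about which cases a single EA round already handles or which would need further rounds. All names (`ConnectedAllocations`, `Inner`, `Outer`, `compute`) are hypothetical.

```java
public class ConnectedAllocations {
    static final class Inner {
        final int value;

        Inner(int value) {
            this.value = value;
        }
    }

    static final class Outer {
        final Inner inner;

        Outer(Inner inner) {
            this.inner = inner;
        }
    }

    public static int compute(int v) {
        Inner in = new Inner(v);
        // The two allocations are connected: 'in' is stored into a field
        // of 'out', so what can be done with one depends on the other.
        Outer out = new Outer(in);
        return out.inner.value + 1;
    }

    public static void main(String[] args) {
        long total = 0;
        // Enough iterations to make the method hot for the JIT (assumed count).
        for (int i = 0; i < 200_000; i++) {
            total += compute(i);
        }
        System.out.println(total);
    }
}
```
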
> There is also a suggestion from the Amazon Java group about "C2 Partial Escape Analysis", which needs more discussion:
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>
> Thanks,
> Vladimir K
>
> On 10/27/21 3:04 AM, Tobias Hartmann wrote:
>> Hi Cesar,
>>
>> On 27.10.21 08:20, Cesar Soares Lucas wrote:
>>> Right. I was suspecting this to be the most critical issue indeed. However, I
>>> didn't know there was a case where "... the object does not escape on any paths
>>> but control flow is too complicated for EA to prove that." Is this an issue
>>> tracked in JBS or perhaps you can show me an example where this happens?
>>
>> Sure, here are four examples of EA and/or scalarization failing due to complicated control/data
>> flow: https://cr.openjdk.java.net/~thartmann/EA_examples
>>
>> All examples would completely fold with inline types (Valhalla).
>>
>> I'm not sure if these issues are tracked by JBS issues but there's most likely an overlap with some
>> of the issues you already described.
>>
>> Best regards,
>> Tobias
>>